Satyabrata Roy · Deepak Sinwar · Nilanjan Dey · Thinagaran Perumal · João Manuel R. S. Tavares Editors
Innovations in Computational Intelligence and Computer Vision
Proceedings of ICICV 2022
Lecture Notes in Networks and Systems Volume 680
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure, which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes, and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors
Satyabrata Roy, Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
Deepak Sinwar, Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
Nilanjan Dey, Techno International New Town, Kolkata, India
Thinagaran Perumal, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Seri Kembangan, Malaysia
João Manuel R. S. Tavares, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-2601-5 ISBN 978-981-99-2602-2 (eBook)
https://doi.org/10.1007/978-981-99-2602-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.
Organizing Committee
Chief Patron
Shri S. Vaitheeswaran, Chairperson, Manipal University Jaipur

Patron
Dr. G. K. Prabhu, President, Manipal University Jaipur

Co-patrons
Dr. N. N. Sharma, Pro-President, Manipal University Jaipur
Dr. Nitu Bhatnagar, Registrar, Manipal University Jaipur

General Chair
Dr. Arun Shanbhag, Dean, Faculty of Engineering, Manipal University Jaipur

Program Chair
Dr. Vijaypal Singh Dhaka, Manipal University Jaipur
Program Co-chairs
Dr. Sumit Srivastava, Manipal University Jaipur
Dr. Sandeep Chaurasia, Manipal University Jaipur
Dr. Pankaj Vyas, Manipal University Jaipur

Program Secretaries
Dr. Sunil Kumar, Manipal University Jaipur
Dr. Amita Nandal, Manipal University Jaipur

Conveners
Dr. Deepak Sinwar, Manipal University Jaipur
Dr. Satyabrata Roy, Manipal University Jaipur

Finance Chair
Dr. Lal Pratap Verma, Manipal University Jaipur

Registration Committee
Ms. Kuntal Gaur, Manipal University Jaipur
Dr. Amit Chaurasia, Manipal University Jaipur
Dr. Sudhir Sharma, Manipal University Jaipur
Mr. Pradeep Kumar, Manipal University Jaipur

Website Committee
Mr. Krishna Kumar, Manipal University Jaipur
Mr. Dheeraj Sain, Manipal University Jaipur
Mr. Vikas Tatiwal, Manipal University Jaipur
IT Infra Committee
Dr. Arjun Singh, Manipal University Jaipur
Mr. Tarun Jain, Manipal University Jaipur

Transportation and Accommodation Committee
Mr. Virender, Manipal University Jaipur
Mr. Ravinder Kumar, Manipal University Jaipur

Publication Committee
Dr. Deepak Sinwar, Manipal University Jaipur
Dr. Satyabrata Roy, Manipal University Jaipur
Dr. Kusumlata Jain, Manipal University Jaipur
Dr. Deepika Shekhawat, Manipal University Jaipur
Ms. Smaranika Mohapatra, Manipal University Jaipur

Local Organizing Committee
Dr. Arvind Dhaka, Manipal University Jaipur
Dr. Vaibhav Bhatnagar, Manipal University Jaipur
Dr. G. L. Saini, Manipal University Jaipur
Dr. Deepak Panwar, Manipal University Jaipur
Dr. Abhay Sharma, Manipal University Jaipur
Dr. Linesh Raja, Manipal University Jaipur
Dr. Amit Kumar Bairwa, Manipal University Jaipur
Dr. Renu Kumawat, Manipal University Jaipur
Dr. Nitesh Pradhan, Manipal University Jaipur
Dr. Mahesh Jangid, Manipal University Jaipur
Dr. Neha V. Sharma, Manipal University Jaipur
Dr. Somya Goyal, Manipal University Jaipur
Dr. Ghanshyam Raghuwanshi, Manipal University Jaipur
Dr. Kavita Jhajharia, Manipal University Jaipur
Dr. Anju Yadav, Manipal University Jaipur
Dr. Aprna Tripathi, Manipal University Jaipur
Dr. Aditya Sinha, Manipal University Jaipur
Dr. Ganpat Singh Chauhan, Manipal University Jaipur
Dr. Sandeep Kumar Sharma, Manipal University Jaipur
Dr. Amit Kumar Gupta, Manipal University Jaipur
Dr. Suman Bhakar, Manipal University Jaipur
Dr. Sumit Dhariwal, Manipal University Jaipur
Dr. Geeta Rani, Manipal University Jaipur
Dr. Avani Sharma, Manipal University Jaipur
Dr. Shally Vats, Manipal University Jaipur
Dr. Rohit Mittal, Manipal University Jaipur
Dr. Hemlata Goyal, Manipal University Jaipur
Dr. Ashu Yadav, Manipal University Jaipur
Mr. Venkatesh Gauri Shankar, Manipal University Jaipur
Ms. Anita Shrotriya, Manipal University Jaipur
Mr. Subarno Bhattacharyya, Manipal University Jaipur
Mr. Anil Kumar, Manipal University Jaipur
Mr. Anurag Bhatnagar, Manipal University Jaipur
Ms. Nidhi Kundu, Manipal University Jaipur
Mr. Deevesh Choudhary, Manipal University Jaipur
Mr. Jay Prakash Singh, Manipal University Jaipur
Mr. Rajesh Kumar, Manipal University Jaipur
Mr. Arpit Sharma, Manipal University Jaipur
Ms. Bali Devi, Manipal University Jaipur
Preface
This volume comprises research papers presented at the 3rd International Conference on Innovations in Computational Intelligence and Computer Vision (ICICV 2022), organized by the Department of Computer and Communication Engineering, Manipal University Jaipur, India, during November 24–25, 2022. The conference focused on addressing research challenges and innovations in the fields of computational intelligence and computer vision. It received a total of 243 research paper submissions from countries including Japan, Bangladesh, Botswana, Norway, China, Colombia, Ethiopia, France, India, the Philippines, South Africa, and Kosovo. After careful examination by the technical program committee of ICICV 2022 and reviews by at least two reviewers, 57 high-quality submissions related to the conference’s theme were selected for oral presentation. The papers cover recent innovations in computer vision, machine learning, advanced computing, and other allied areas. We express our sincere thanks to Manipal University Jaipur, India, for its whole-hearted support in organizing this conference, and we extend our sincere appreciation for the outstanding work contributed over many months by the Organizing Committee of ICICV 2022. We also wish to thank the chief guest and guest of honor for gracing the event and motivating the participants, and our esteemed keynote speakers for their informative talks. Our special thanks go to all the session chairs, track managers, reviewers, and student volunteers for their outstanding support in organizing this conference. Finally, our thanks go to all the participants who revitalized the event with their valuable research submissions and presentations. We hope that the broad scope of topics related to the fields of computational intelligence and computer vision covered in these proceedings will certainly help the
readers to understand that the methods of computational intelligence and computer vision have become an important element of modern computer science.

Satyabrata Roy (Jaipur, India)
Deepak Sinwar (Jaipur, India)
Nilanjan Dey (Kolkata, India)
Thinagaran Perumal (Selangor, Malaysia)
João Manuel R. S. Tavares (Porto, Portugal)
Contents
Empirical Study of Multi-class Weed Classification Using Deep Learning Network Through Transfer Learning (p. 1)
Mahendra Kumar Gourisaria, Vishal Kumar Sahoo, Biswajit Sahoo, Partha Pratim Sarangi, and Vijander Singh

Secured Face Recognition System Based on Blockchain with Machine Learning (p. 13)
K. Krishnakumar, S. Saravanan, and Amine Naite-Ali

Classifying Paintings/Artworks Using Deep Learning Techniques (p. 25)
Shivam Singh and Sandeep Chaurasia

Hybrid Deep Face Music Recommendation Using Emotions (p. 35)
Divija Sanapala, Raja Muthalagu, and Pranav M. Pawar

Integrating ResNet18 and YOLOv4 for Pedestrian Detection (p. 49)
Nader Salam and T. Jemima Jebaseeli

Image Restoration from Weather Degraded Images Using Markov Random Field (p. 63)
Ajoy Dey, Avra Ghosh, and Sheli Sinha Chaudhuri

A Feature Representation Technique for Angular Margin Loss (p. 73)
Asfak Ali and Sheli Sinha Chaudhuri

LeafViT: Vision Transformers-Based Leaf Disease Detection (p. 85)
H. Keerthan Bhat, Aashish Mukund, S. Nagaraj, and R. Prakash

Conversion of Satellite Images to Google Maps Using GAN (p. 103)
Medha Wyawahare, Ninad Ekbote, Sameer Pimperkhede, Atharva Deshpande, and Aditi Sahastrabudhe

Investigation of Widely Used Implicit and Explicit Communication in Crossing-Decision of Pedestrian in UAE (p. 119)
Shalini Pal, Lokesh Sharma, and Mahesh Jangid

Attention Deficit Hyperactivity Disorder Prediction Using Resting-State Networks (p. 133)
Hetav Patel, Nehil Sood, and Abhishek Sharma

Neuroinformatics Deep Learning Synthesizer Based on Impulse Control Disorder Using LSTM Cells (p. 143)
Vipul Maheshwari, Akshat Soni, and Abhishek Sharma

Wasserstein GANs-Enabled Spectral Normalization on Credit Card Fraud Detection (p. 155)
Kartikey Sharma, Abhishek Sharma, and Sulabh Bansal

Classification of Bipolar Disorder Using Deep Learning Models on fMRI Data (p. 167)
Harsh Chauhan, Poojan Gadhiya, and Abhishek Sharma

Predicting Schizophrenia from fMRI Using Deep Learning (p. 177)
Shail Kardani, Raghav Sharma, and Abhishek Sharma

Patrolling Robot with Facial Detection (p. 189)
Mitrajeet Golsangi, Vivek Ghuge, Divija Godse, Pravin Harne, and Kaushalya Thopate

A Graph-Based Relook Beyond Metadata for Music Recommendation (p. 203)
Vishal Bharadwaj, Aravind S. Mysore, Ninad Sangli, Shraddha Bharadwaj, and Bhaskarjyoti Das

An Interpretability Assisted Empirical Study of Affective Traits in Visual Content of Disinformation (p. 217)
Bhaskarjyoti Das, Shrikar Madhu, Yousha Mahamuni, and Kruthika Suresh

Multi-scale Fusion-Based Object Detection Network for Advance Driver Assistance Systems (p. 233)
Aishwarya R. Dhayighode, Rajarajeswari Subramanian, and Pramod Sunagar

Vectorization of Python Programs Using Recursive LSTM Autoencoders (p. 253)
Ravuri Sai Srichandra, Pingali Shriram Rahul, and Venkataramana Battula

Phenology Detection for Croplands Using Sentinel-2 and Computer Vision Techniques (p. 267)
Yogiraj Bhoomkar, Aman Rastogi, Dwayne Fernandes, Vibhor Deshmukh, and Nitin Damame

Deep Learning-Based Safety Assurance of Construction Workers: Real-Time Safety Kit Detection (p. 279)
Megha Nain, Shilpa Sharma, and Sandeep Chaurasia

Automatic Detection and Classification of Melanoma Using the Combination of CNN and SVM (p. 295)
G. Sandhya, A. Susmitha, M. L. Sravya, M. Sai Ramya, and K. Kiranmai

CyINSAT: Cyclone Dataset from Indian National Satellite for Forecasting (p. 313)
Deap Daru, Aditya Thaker, Akshath Mahajan, Adithya Sanyal, Meera Narvekar, and Debajyoti Mukhopadhyay

Classification of Ocular Diseases: A Vision Transformer-Based Approach (p. 325)
Sai Dheeraj Gummadi and Anirban Ghosh

Analyzing Performance of Masked R-CNN Under the Influence of Distortions (p. 339)
Palak Mahajan, Namrata Karlupia, Pawanesh Abrol, and Parveen K. Lehana

COVID-19 Detection in Chest X-Ray Images Using Non-iterative Deterministic Learning Classifier (p. 353)
Arshi Husain and Virendra P. Vishwakarma

Multimodal Classification via Visual and Lingual Feature Layer Fusion (p. 363)
Avantika Saklani, H. S. Pannu, and Shailendra Tiwari

A Review on Rural Women’s Entrepreneurship Using Machine Learning Models (p. 375)
Shivani Pareek, Vaibhav Bhatnagar, Ramesh Chandra Poonia, Shilpa Sharma, and Debabrata Samanta

Handling Class Imbalance Problem Using Feature Selection Techniques: A Review (p. 397)
Pooja Tyagi, Jaspreeti Singh, and Anjana Gosain

An Intelligent Human Pose Recommendation System Using Feature Fusion Technique (p. 413)
Piyusha Taware, Sakshi Patil, Sonali Sargar, Mayank Anchlia, and Abhishek Bhatt

Ensemble Machine Learning Algorithms for Predicting Cardiovascular Disease (p. 427)
V. Deneshkumar, G. Jithu, and R. Jebitha

A Method for Designing the Requirements of an Information System Using Patterns Under Fuzzy Environment (p. 437)
Mohd. Arif, Tanveer Hassan, Chaudhary Wali Mohammad, Azra Parveen, and Mohd. Sadiq

Performance Analysis of Hybridized Fuzzy Clustering Algorithms Using Metaheuristic Algorithms (p. 445)
Kanika Bhalla and Anjana Gosain

AI-Based Open Logo Detection for Indian Scenario (p. 463)
S. N. Prasanth, R. Aswin Raj, M. A. C. Sivamaran, and V. Sowmya

A Comparison of Machine Translation Methods for Natural Language Processing and Their Challenges (p. 475)
Aarti Purohit, Kuldeep Kumar Yogi, and Rahul Sharma

Globalizing Pre-trained Local BERT Embedding of Text for Sentiment Analysis (p. 489)
Prashantkumar. M. Gavali and Suresh K. Shirgave

Crop Recommendation for Maximizing Crop Yield Using Random Forest (p. 501)
Amit Kumar

Empirical Study of Semantic Analysis to Generate True Time Authenticity Scores for Subreddits Microblogging (p. 517)
Amit Kumar Sharma, Sandeep Chaurasia, Vibhakar Gupta, Mrityunjoy Chowdhury, and Devesh Kumar Srivastava

A Time Series Classifier-Based Ensemble for Predictive Maintenance of Machines (p. 531)
Deepali Deshpande, Dhruva Khanwelkar, Harsh More, Nirvisha Soni, Juhi Rajani, and Chirag Vaswani

Regression Test Case Optimization Using Jaccard Similarity Mapping of Control Flow Graph (p. 545)
Mani Padmanabhan

Investigating the Role of Semantic Analysis in Automated Answer Scoring (p. 559)
Deepender and Tarandeep Singh Walia

Siamese-Discriminative Person Re-identification (p. 573)
Raghav Sharma and Rohit Pandey

Transformers-Based Automated PHP Code Generator (p. 583)
Yatin Tomer, Raghav Sharma, and Rohit Pandey

Design and Analysis of Finite Impulse Response Filter Based on Particle Swarm Optimization and Grasshopper Optimization Algorithms (p. 595)
Sandeep Kumar and Rajeshwar Singh

HOG Feature-Based Offline Handwritten Malayalam Word Clustering with Lexicon Reduction (p. 607)
A. T. Anju, Binu P. Chacko, and K. P. Mohamed Basheer

MathKnowTopic: Creation of a Unified Knowledge Graph-Based Topic Modeling from Mathematical Text Books (p. 619)
M. Srivani, Abirami Murugappan, and T. Mala

Health Data, Social Media, and Privacy Awareness: A Pilot Study (p. 635)
Prachotan Reddy Bathi and G. Deepa

SSIDs as a Source for Point of Interest Suggestion in Smart Cities (p. 647)
Ajay Prasad and Arjun Singh

The Psychological Influence of Online Stigmatization in Social Life due to Monkeypox (p. 661)
Avik Chatterjee, Soumya Mukherjee, Anirbit Sengupta, and Abhijit Das

Reliability Characteristics of Two Units’ Parallel System with Refreshment Facility Subject to Environment Conditions (p. 675)
M. S. Barak, Ajay Kumar, Reena Garg, Ashish Kumar, and Monika Sani

A Comparison of Security Analysis Methods for Smart Contracts Built on the Ethereum Blockchain (p. 685)
Satpal Singh Kushwaha and Sandeep Joshi

IoT-Based Real-Time Crop and Fertilizer Prediction for Precision Farming 4.0 (p. 695)
Shailendra Tiwari, Ravit Garg, N. Kushal, and Manju Khurana

Pied Piper: Meta Search for Music (p. 713)
Pulak Malhotra and Ashwin Rao

Securing a SaaS Application on AWS Cloud (p. 723)
Siddhartha Sourav Panda, Nishanth Kumar Pathi, and Shinu Abhi

Optimized Energy Efficient Task Scheduling in Fog Computing (p. 735)
Shilpa Dinesh Vispute and Priyanka Vashisht

Electrical Vehicles Insulation Detection Using Virtex 7 FPGA (p. 747)
Mahipal Bukya, Rajesh Kumar, and Akhilesh Mathur

Author Index (p. 757)
Editors and Contributors
About the Editors

Satyabrata Roy received his B.Tech. and M.Tech. degrees in Computer Science and Engineering from the Maulana Abul Kalam Azad University of Technology, West Bengal (formerly the West Bengal University of Technology), and KIIT University, Bhubaneswar, respectively, and completed his Ph.D. at Manipal University Jaipur on lightweight security for IoT. He has more than 10 years of teaching and research experience and is currently Associate Professor in the Department of Computer Science and Engineering, Manipal University Jaipur. His major research interests include cryptography and cellular automata. He has published many research papers in refereed journals and international conferences and is currently supervising three research scholars at Manipal University Jaipur. He is a Senior Member of IEEE and a Professional Member of ACM.

Deepak Sinwar is Associate Professor at the Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India. He received his Ph.D. and M.Tech. degrees in Computer Science and Engineering in 2016 and 2010, respectively, and his B.Tech. (with honors) in Information Technology in 2008. To his credit, he has published more than 50 articles in peer-reviewed journals, conference proceedings, and book chapters. His research interests include computational intelligence, data mining, machine learning, reliability theory, computer networks, and pattern recognition. An enthusiastic and motivating academician with more than 12 years of research and academic experience, he is a Life Member of the Indian Society for Technical Education (India), a Senior Member of IEEE, and a Member of the ACM professional society.

Nilanjan Dey is Associate Professor in the Department of Computer Science and Engineering, Techno International New Town, Kolkata, India. He is a Visiting Fellow of the University of Reading, UK, and also holds the position of Adjunct Professor at Ton Duc Thang University, Ho Chi Minh City, Vietnam. Previously, he held
the honorary position of Visiting Scientist at Global Biomedical Technologies Inc., CA, USA (2012–2015). He was awarded his Ph.D. by Jadavpur University in 2015. He is Editor-in-Chief of the International Journal of Ambient Computing and Intelligence (IGI Global) and Series Co-editor of Springer Tracts in Nature-Inspired Computing (Springer Nature), Data-Intensive Research (Springer Nature), and Advances in Ubiquitous Sensing Applications for Healthcare (Elsevier). He has 110 books and over 300 publications in the areas of medical imaging, machine learning, computer-aided diagnosis, data mining, etc. (over 16,500 citations, h-index of 63). Furthermore, he is Indian Ambassador of the International Federation for Information Processing’s Young ICT Group and a Senior Member of IEEE.

Thinagaran Perumal is currently Senior Lecturer at the Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia. He is also currently appointed as Head of Cyber-Physical Systems at the university and has been elected Chair of the IEEE Consumer Electronics Society Malaysia Chapter. He received the 2014 Early Career Award from the IEEE Consumer Electronics Society for his pioneering contributions to the field of consumer electronics. He completed his Ph.D. at Universiti Putra Malaysia in smart technology and robotics. His research interests are in the interoperability aspects of smart homes and the Internet of things (IoT), wearable computing, and cyber-physical systems. He has also headed the National Committee on Standardization for IoT (IEC/ISO TC/G/16) as Chairman since 2018. Some of his eminent works include a proactive architecture for IoT systems, the development of cognitive IoT frameworks for smart homes, and wearable devices for rehabilitation purposes. He is an active member of the IEEE Consumer Electronics Society and its Future Directions Committee on the Internet of Things.
He has been invited to give several keynote lectures and plenary talks on the Internet of Things at various institutions and organizations internationally. He has published several papers in IEEE conferences and journals and serves as a TPC member for several reputed IEEE conferences. He is an active reviewer for the IEEE Internet of Things Journal, IEEE Communications Magazine, IEEE Sensors Journal, and IEEE Transactions on Automation Science and Engineering, to name a few.

João Manuel R. S. Tavares graduated in Mechanical Engineering from the University of Porto, Portugal (1992). In 1995, he obtained an M.Sc. in Electrical and Computer Engineering, in the field of industrial informatics, also at the University of Porto. In 2001, he obtained a Ph.D. degree in Electrical and Computer Engineering from the same university, and in 2015 the Habilitation in Mechanical Engineering. From 1995 to 2000, he was a researcher at the Institute of Biomedical Engineering (INEB). Since 2001, he has been Senior Researcher and Project Coordinator at the Laboratory of Optical and Experimental Mechanics (LOME) of the Institute of Mechanical Engineering and Industrial Management (INEGI). He was Assistant Professor in the Department of Mechanical Engineering (DEMec) of the Faculty of Engineering of the University of Porto (FEUP) between 2001 and 2011, and since then he has been Associate Professor in the same department. Since 2001, he has supervised and co-supervised several M.Sc. and Ph.D. theses. He is Co-author
of more than 550 articles in national and international journals and conferences, Co-editor of 30 international books, and Guest Editor of several special issues of international journals. He has also been involved in several research projects, both as a researcher and as scientific coordinator. He has been a member of several national and international journal editorial boards, is Editor-in-Chief of the Taylor & Francis journal Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, and is Co-editor of the Springer book series “Lecture Notes in Computational Vision and Biomechanics.” His main research areas include computational vision, computational mechanics, scientific visualization, human–computer interaction, and new product development.
Contributors

Shinu Abhi, RACE, REVA University, Bangalore, India
Pawanesh Abrol, University of Jammu, Jammu, J&K, India
Asfak Ali, Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, India
Mayank Anchlia, Samsung Research Institute, Bangalore, India
A. T. Anju, Sullamussalam Science College, University of Calicut, Areekode, Kerala, India
Mohd. Arif, Computer Science and Technology Research Group, Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia (A Central University), New Delhi, India
Sulabh Bansal, The LNM Institute of Information Technology, Jaipur, Rajasthan, India; School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India
M. S. Barak, Department of Mathematics, Indira Gandhi University, Meerpur, Rewari, India
Prachotan Reddy Bathi, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India; Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, USA
Venkataramana Battula, Department of Computer Science and Engineering, MVSR Engineering College, Nadergul, Hyderabad, India
Kanika Bhalla, Guru Gobind Singh Indraprastha University, New Delhi, India
Shraddha Bharadwaj, Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India
Vishal Bharadwaj, Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India
Vaibhav Bhatnagar, Department of Computer Applications, Manipal University Jaipur, Jaipur, Rajasthan, India
Abhishek Bhatt, School of Data Science, Symbiosis Skill and Professional University, Pune, India
Yogiraj Bhoomkar, Xoriant Solutions Pvt. Ltd., Pune, Maharashtra, India
Mahipal Bukya, Department of Electrical and Electronics Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India; Department of Electrical Engineering, Malaviya National Institute of Technology Jaipur, Jaipur, India
Binu P. Chacko, Prajyoti Niketan College, University of Calicut, Pudukad, Kerala, India
Avik Chatterjee, Department of MCA, Techno College (Hooghly Campus), Chinsurah, West Bengal, India
Sheli Sinha Chaudhuri, Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, India
Harsh Chauhan, Department of Computer Science and Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India
Sandeep Chaurasia, Department of Computer Science and Engineering, School of Computing and Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India
Mrityunjoy Chowdhury, Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
Nitin Damame, Xoriant Solutions Pvt. Ltd., Pune, Maharashtra, India
Deap Daru, Dwarkadas Jivanlal Sanghvi College of Engineering, Mumbai, India
Abhijit Das, Department of IT, RCCIIT, Calcutta, West Bengal, India
Bhaskarjyoti Das, Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India
G. Deepa, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
Deepender, School of Computer Applications, Lovely Professional University, Phagwara, Punjab, India
V. Deneshkumar Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India Vibhor Deshmukh Xoriant Solutions Pvt. Ltd., Pune, Maharashtra, India Atharva Deshpande Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, Maharashtra, India Deepali Deshpande Vishwakarma Institute of Technology, Pune, Maharashtra, India Ajoy Dey Jadavpur University, Kolkata, India Aishwarya R. Dhayighode MS Ramaiah Institute of Technology, Bengaluru, India Ninad Ekbote Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, Maharashtra, India Dwayne Fernandes Xoriant Solutions Pvt. Ltd., Pune, Maharashtra, India Poojan Gadhiya Department of Computer Science and Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India Ravit Garg CSED, Thapar Institute of Engineering and Technology, Patiala, India Reena Garg Department of Mathematics, J.C. Bose University of Science and Technology, YMCA, Faridabad, India Prashantkumar. M. Gavali Department of Technology, Shivaji University, Kolhapur, India; DKTE Society’s Textile and Engineering Institute, Ichalkaranji, India Anirban Ghosh SRM University, Amaravati, AP, India Avra Ghosh Jadavpur University, Kolkata, India Vivek Ghuge Department of Computer Science, Vishwakarma Institute of Technology, Pune, India Divija Godse Department of Computer Science, Vishwakarma Institute of Technology, Pune, India Mitrajeet Golsangi Department of Computer Science, Vishwakarma Institute of Technology, Pune, India Anjana Gosain USICT, Guru Gobind Singh Indraprastha University, New Delhi, India Mahendra Kumar Gourisaria School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India Sai Dheeraj Gummadi SRM University, Amaravati, AP, India Vibhakar Gupta Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
Pravin Harne Department of Computer Science, Vishwakarma Institute of Technology, Pune, India Tanveer Hassan Computer Science and Technology Research Group, Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia (A Central University), New Delhi, India Arshi Husain Guru Gobind Singh Indraprastha University, USICT, New Delhi, India Mahesh Jangid Department of Information Technology, Manipal University Jaipur, Jaipur, India R. Jebitha Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India T. Jemima Jebaseeli Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India G. Jithu Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India Sandeep Joshi Manipal University Jaipur, Jaipur, India Shail Kardani Deparment of Computer Science Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India Namrata Karlupia Central University of Jammu, Jammu, J&K, India H. Keerthan Bhat RV College of Engineering, Bengaluru, India Dhruva Khanwelkar Vishwakarma Institute of Technology, Pune, Maharashtra, India Manju Khurana CSED, Thapar Institute of Engineering and Technology, Patiala, India K. Kiranmai Department of ECE, VNITSW, Guntur, Andhra Pradesh, India K. Krishnakumar Department of Multimedia, VIT School of Design, Vellore Institute of Technology, Vellore, India Ajay Kumar Department of Mathematics, J.C. Bose University of Science and Technology, YMCA, Faridabad, India Amit Kumar Andhra University, Visakhapatnam, Andhra Pradesh, India Ashish Kumar Department of Mathematics and Statistics, Manipal University Jaipur, Jaipur, India Rajesh Kumar Department of Electrical Engineering, Malaviya National Institute of Technology Jaipur, Jaipur, India Sandeep Kumar I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India
N. Kushal CSED, Thapar Institute of Engineering and Technology, Patiala, India Satpal Singh Kushwaha Manipal University Jaipur, Jaipur, India Parveen K. Lehana University of Jammu, Jammu, J&K, India Shrikar Madhu Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India Akshath Mahajan Dwarkadas Jivanlal Sanghvi College of Engineering, Mumbai, India Palak Mahajan Central University of Jammu, Jammu, J&K, India Yousha Mahamuni Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India Vipul Maheshwari Department of Computer and Communication Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India T. Mala College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India Pulak Malhotra International Institute of Information Technology, Hyderabad, India Akhilesh Mathur Department of Electrical Engineering, Malaviya National Institute of Technology Jaipur, Jaipur, India K. P. Mohamed Basheer Sullamussalam Science College, University of Calicut, Areekode, Kerala, India Harsh More Vishwakarma Institute of Technology, Pune, Maharashtra, India Soumya Mukherjee Department of Management Studies, Techno India (Hooghly Campus), Chinsurah, West Bengal, India Debajyoti Mukhopadhyay WIDiCoReL Research Lab, Mumbai, India Aashish Mukund RV College of Engineering, Bengaluru, India Abirami Murugappan College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India Raja Muthalagu Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai, UAE Aravind S. Mysore Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India S. Nagaraj Noesys Software Pvt. Ltd., Bengaluru, India Megha Nain Department of Computer Applications, School of Basic Science, Manipal University Jaipur, Jaipur, Rajasthan, India
Amine Naite-Ali Université Paris-Est Créteil, Créteil, France Meera Narvekar Dwarkadas Jivanlal Sanghvi College of Engineering, Mumbai, India Mani Padmanabhan Faculty of Computer Applications (SSL), Research Supervisor (SITE), Vellore Institute of Technology, Vellore, Tamil Nadu, India Shalini Pal Department of Information Technology, Manipal University Jaipur, Jaipur, India Siddhartha Sourav Panda RACE, REVA University, Bangalore, India Rohit Pandey Hughes Systique Corporation, Gurugram, India H. S. Pannu Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India Shivani Pareek Department of Computer Applications, Manipal University Jaipur, Jaipur, Rajasthan, India Azra Parveen Indraprastha Research Laboratory, Indraprastha Institute of Information Sciences Private Limited, New Delhi, India Hetav Patel Computer Science, The LNM Institute of Information Technology, Jaipur, Rajasthan, India Nishanth Kumar Pathi RACE, REVA University, Bangalore, India Sakshi Patil Department of Electronics and Telecommunication, College of Engineering, Pune, India Pranav M. Pawar Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai, UAE Sameer Pimperkhede Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, Maharashtra, India Ramesh Chandra Poonia Professor, Department of Computer Science, CHRIST (Deemed to be University), Delhi NCR, Ghaziabad, Uttar Pradesh, India R. Prakash RV College of Engineering, Bengaluru, India Ajay Prasad University of Petroleum and Energy Studies, Dehra Dun, India S. N. Prasanth Centre for Computation Engineering and Networking, Amrita School of Engineering, Ettimadai, Coimbatore, India Aarti Purohit Department of Computer Science and Engineering, Banasthali University, Tonk, Rajasthan, India Pingali Shriram Rahul Department of Computer Science and Engineering, MVSR Engineering College, Nadergul, Hyderabad, India
R. Aswin Raj Centre for Computation Engineering and Networking, Amrita School of Engineering, Ettimadai, Coimbatore, India Juhi Rajani Vishwakarma Institute of Technology, Pune, Maharashtra, India Ashwin Rao International Institute of Information Technology, Hyderabad, India Aman Rastogi Xoriant Solutions Pvt. Ltd., Pune, Maharashtra, India Mohd. Sadiq Software Engineering Laboratory, Computer Engineering Section, UPFET, Jamia Millia Islamia (A Central University), New Delhi, India Aditi Sahastrabudhe Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, Maharashtra, India Biswajit Sahoo School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India Vishal Kumar Sahoo School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India M. Sai Ramya Department of ECE, VNITSW, Guntur, Andhra Pradesh, India Ravuri Sai Srichandra Department of Computer Science and Engineering, MVSR Engineering College, Nadergul, Hyderabad, India Avantika Saklani Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India Nader Salam Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India Debabrata Samanta Department of Computing Information Technologies, RIT Kosovo (A.U.K), Rochester Institute of Technology – RIT Global Campus n.n., Germia Campus, Pristina, Kosovo Divija Sanapala Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai, UAE G. 
Sandhya Department of ECE, VNITSW, Guntur, Andhra Pradesh, India Ninad Sangli Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India Monika Sani Department of Mathematics and Statistics, Manipal University Jaipur, Jaipur, India Adithya Sanyal Dwarkadas Jivanlal Sanghvi College of Engineering, Mumbai, India Partha Pratim Sarangi School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
S. Saravanan Department of Multimedia, VIT School of Design, Vellore Institute of Technology, Vellore, India Sonali Sargar Department of Electronics and Telecommunication, College of Engineering, Pune, India Anirbit Sengupta Department of ECE, Dr. Sudhir Chandra Sur Institute of Technology & Sports Complex, Calcutta, West Bengal, India Abhishek Sharma Department of Electronics and Communication Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India; School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India Amit Kumar Sharma Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India; Department of Computer Science and Engineering, IcfaiTech, The ICFAI University, Jaipur, India Kartikey Sharma The LNM Institute of Information Technology, Jaipur, Rajasthan, India; School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India Lokesh Sharma Department of Information Technology, Manipal University Jaipur, Jaipur, India Raghav Sharma Hughes Systique Corporation, Gurugram, India; Deparment of Computer Science Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India Rahul Sharma Department of Computer Science and Engineering, ACEIT, Jaipur, Rajasthan, India Shilpa Sharma Department of Computer Applications, School of Basic Science, Manipal University Jaipur, Jaipur, Rajasthan, India Suresh K. Shirgave DKTE Society’s Textile and Engineering Institute, Ichalkaranji, India Arjun Singh Manipal University Jaipur, Jaipur, India Jaspreeti Singh USICT, Guru Gobind Singh Indraprastha University, New Delhi, India Rajeshwar Singh Doaba Khalsa Trust Group of Institutions, Rahon, Punjab, India Shivam Singh Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
Vijander Singh Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India; Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU—Norwegian University of Science and Technology, Alesund, Norway M. A. C. Sivamaran Centre for Computation Engineering and Networking, Amrita School of Engineering, Ettimadai, Coimbatore, India Akshat Soni Department of Electronics and Communication Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India Nirvisha Soni Vishwakarma Institute of Technology, Pune, Maharashtra, India Nehil Sood Electronics and Communication, The LNM Institute of Information Technology, Jaipur, Rajasthan, India V. Sowmya Centre for Computation Engineering and Networking, Amrita School of Engineering, Ettimadai, Coimbatore, India M. L. Sravya Department of ECE, VNITSW, Guntur, Andhra Pradesh, India M. Srivani College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India Devesh Kumar Srivastava Department of Information Technology, Manipal University Jaipur, Jaipur, India Rajarajeswari Subramanian MS Ramaiah Institute of Technology, Bengaluru, India Pramod Sunagar MS Ramaiah Institute of Technology, Bengaluru, India Kruthika Suresh Computer Science Engineering Department, PES University, Bengaluru, Karnataka, India A. Susmitha Department of ECE, VNITSW, Guntur, Andhra Pradesh, India Piyusha Taware Department of Electronics and Telecommunication, College of Engineering, Pune, India Aditya Thaker Dwarkadas Jivanlal Sanghvi College of Engineering, Mumbai, India Kaushalya Thopate Department of Computer Science, Vishwakarma Institute of Technology, Pune, India Shailendra Tiwari Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India Yatin Tomer Hughes Systique Corporation, Gurugram, India Pooja Tyagi USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
Priyanka Vashisht Department of Computer Science and Engineering, Amity University, Gurugram, Haryana, India Chirag Vaswani Vishwakarma Institute of Technology, Pune, Maharashtra, India Virendra P. Vishwakarma Guru Gobind Singh Indraprastha University, USICT, New Delhi, India Shilpa Dinesh Vispute Department of Computer Science and Engineering, The NorthCap University, Gurugram, Haryana, India Chaudhary Wali Mohammad Computer Science and Technology Research Group, Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia (A Central University), New Delhi, India Tarandeep Singh Walia School of Computer Applications, Lovely Professional University, Phagwara, Punjab, India Medha Wyawahare Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, Maharashtra, India Kuldeep Kumar Yogi Department of Computer Science and Engineering, Banasthali University, Tonk, Rajasthan, India
Empirical Study of Multi-class Weed Classification Using Deep Learning Network Through Transfer Learning Mahendra Kumar Gourisaria, Vishal Kumar Sahoo, Biswajit Sahoo, Partha Pratim Sarangi, and Vijander Singh
Abstract In agriculture, weed infestation presents various challenges to quality farming. Weeds compete with crops for resources and hinder crop growth. To combat weed infestation on cropland, farmers use expensive herbicides whose toxic chemicals can lower crop yield and quality. Herbicide application is labor-intensive if done manually, while automation is prohibitively expensive and resource-intensive. With the rise in popularity of deep neural networks, especially CNNs, for image classification, they are an excellent candidate for automating weed classification in real time and potentially automating weed removal using robots/UAVs cost-effectively. This paper adopts a dual training approach, one from scratch and the other using transfer learning, applied to 17,509 weed images spanning nine different classes of weed across five CNN architectures. Various preprocessing techniques are employed to further increase the accuracy of the models. ResNet50V2 achieved the highest classification accuracy of 93%. Keywords Weed detection · Convolutional neural network · Deep learning · Agriculture · ResNet50V2 · Cost-effective
M. K. Gourisaria (B) · V. K. Sahoo · B. Sahoo · P. P. Sarangi School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha 751024, India e-mail: [email protected] V. Singh Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan 302034, India Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU—Norwegian University of Science and Technology, Alesund, Norway © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_1
1 Introduction Crops in farmlands have to compete with weeds for resources like water, sunlight, and nutrients, which degrades crop quality [1] and potentially risks food security. In recent times, a range of herbicides has been used in weed control practices, and weeds have led farmers in Australia to incur a cost of $4.8 billion in grain production [2]. Herbicides are chemical-based weed control agents conventionally applied evenly throughout the farmland. This form of weed control works well, but it also harms crops, is expensive, and is detrimental to the environment. In a report by Medina-Pastor et al. [3], unprocessed produce was found to contain harmful herbicide-linked substances that are toxic to humans. An alternative to even application of herbicide throughout the cropland is selective application, in which the chemical is applied only to areas infested with weeds, saving resources and cost. With recent advances in deep learning technology, weed discrimination can be automated with real-time classification of weeds and crops, saving on labor-associated costs. Machine learning and deep learning have contributed a lot to the fields of technology, robotics, health [4, 5], agriculture [6], etc. Algorithms like Naïve Bayes and support vector machines have been used to study Twitter sentiments [7]. With a convolutional neural network, features can be learned from videos, audio, text, or images automatically, without any human intervention or hardcoded rules [8]. The fields of agriculture and farming have also gained from the advances in deep learning, with applications in livestock monitoring [9], crop disease prediction [10], and commercial weed removal robots [11] that can identify leeks, onions, carrots, parsnips, and potatoes. The commercial robot suffers from the inability to distinguish between weed and crop, which is essential for cost-effective and efficient use of resources.
In this paper, we adopt a dual training approach, one from scratch and the other using transfer learning, on five CNN architectures, namely VGG16, EfficientNet, Inception-ResNet, ResNet, and MobileNet, applied to 17,509 weed images with nine different classes. The organization of our paper is as follows. Section 2 reviews some recent works in the literature on weed detection. Sections 3 and 4 describe data preparation and the algorithms used. Section 5 provides the experimental results. The conclusion and future work are described in Sect. 6.
2 Related Work This section summarizes some of the recent work in the field of weed classification. In a study by Milioto et al. [12], the authors used RGB and near-infrared imagery from their unmanned aerial vehicle to classify sugar beets from weed. Their preprocessing involved separating vegetation from background soil. The image undergoes blob-wise segmentation, which allows for a shallower CNN network, enabling the authors
to make real-time classifications in a UAV using readily available single-board computers such as the NVIDIA Jetson platform. They obtained accuracies above 95%. Knoll et al. [13] published a paper explaining the use of field-programmable gate array (FPGA)-based deep neural networks on agricultural datasets (a Terasic DE1-SoC for plant detection in organic farming). The model consumed only 4 watts of power and ran at 42 frames per second. Sa et al. [14] presented a semantic weed classification approach that utilizes images captured through multiple spectrums of light via a micro aerial vehicle (MAV). The authors make use of the encoder-decoder CNN called SegNet. SegNet infers dense semantic classes while allowing any number of input image channels and class balancing. An embedded GPU with the Nvidia TensorRT library was used for classification. This paper achieved an AUC score of 0.78 and an F1 score of approximately 0.8. Tao et al. [15] proposed a six-class classifier that incorporates a deep convolutional neural network (CNN) and a support vector machine (SVM). This technique greatly improves the classification accuracy of the final model. The VGG network was adopted for the CNN. The VGG-SVM network obtained an accuracy of 92% on a six-class weed classification problem. The study also compares VGG-SVM with five different CNN models, from which it can be concluded that the hybrid model was the best performer. Lease et al. [16] used novel pixel-level weed classification with uniform local binary pattern features for precise weed control. A two-level optimization structure is used: genetic algorithm optimization to select the best uniform local binary pattern configurations, and a covariance matrix adaptation evolution strategy in the neural network to select the best combination of voting weights for each classifier. Their best accuracy was 87.90%. Garibaldi-Marquez et al. [17] implemented VGG16, VGG19, and Xception on a large dataset they generated. The dataset consists of crops, narrow-leaf weeds, and broadleaf weeds (of different varieties). Regions of interest were extracted, and connected component analysis was employed. This three-class classification problem achieved 97% accuracy on average and was compared with shallow learning approaches. Table 1 gives a summary of all the related work.
3 Data Preparation and Methodology This section gives a detailed explanation of the dataset, pre-processing, and algorithms used.
Table 1 Comparative study of related work

| Ref. No. | Paper | Novel work | No. of classes | Metric |
|---|---|---|---|---|
| [12] | Milioto et al. (2017) | Used RGB and near-infrared imagery from their unmanned aerial vehicle. Images undergo blob-wise segmentation, which allows for a shallower CNN network, enabling real-time classification on single-board computers | 2 | 0.97 (Accuracy) |
| [13] | Knoll et al. (2017) | First to use FPGA-based DNNs for agricultural image classification (Terasic DE1-SoC for plant detection in organic farming). The model runs at 4 W of power | 2 | 0.93 (Accuracy) |
| [14] | Sa et al. (2018) | Utilized images captured through multiple spectrums of light via a micro aerial vehicle (MAV). The authors make use of the recently developed encoder-decoder CNN called SegNet, which infers dense semantic classes while allowing any number of input image channels and class balancing | 3 | 0.78 (AUC) |
| [15] | Tao et al. (2022) | Proposed a deep CNN with an SVM classifier which aims to improve the classification accuracy | 6 | 0.92 (Accuracy) |
| [16] | Lease et al. (2022) | Novel pixel-level weed classification with uniform local binary pattern features for precise weed control | 2 | 0.8790 (Accuracy) |
| [17] | Garibaldi-Marquez et al. (2022) | Regions of interest were extracted and connected component analysis was employed | 3 | 0.97 (Accuracy) |
3.1 Dataset Used The dataset used in this study was obtained from Kaggle [18] and contains 17,508 unique 256 × 256 RGB images of nine different weed species. These images were collected from eight different environments across northern Australia. Class and species labels are as follows: 0—Chinese Apple (6.42%); 1—Lantana (6.07%); 2—Parkinsonia (5.89%); 3—Parthenium (5.83%); 4—Prickly Acacia (6.06%); 5—Rubber Vine (5.76%); 6—Siam Weed (6.07%); 7—Snake Weed (5.8%); 8—Others (52%), illustrated in Figs. 1 and 2. The dataset is already divided into training and testing subsets with an 85:15 split ratio. In Fig. 1, the x-axis denotes the class and the y-axis the number of images.
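For a rough sense of scale, the per-class image counts implied by the percentages above and the 85:15 split can be reproduced with a few lines of Python (a sketch using the 17,508-image figure from this section; counts may differ from the real dataset by a few images due to rounding of the published percentages):

```python
total = 17508  # unique images in the dataset

# Class shares (%) as reported for the dataset
shares = {
    "Chinese Apple": 6.42, "Lantana": 6.07, "Parkinsonia": 5.89,
    "Parthenium": 5.83, "Prickly Acacia": 6.06, "Rubber Vine": 5.76,
    "Siam Weed": 6.07, "Snake Weed": 5.80,
}
shares["Others"] = 100 - sum(shares.values())  # the negative/"Others" class

# Approximate image count per class
counts = {name: round(total * pct / 100) for name, pct in shares.items()}

# 85:15 train/test split
n_train = round(total * 0.85)
n_test = total - n_train
```

With these numbers the "Others" class alone holds roughly 9,100 images, which is why the training subsets are downsampled later to remove class bias.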
Fig. 1 Class distribution of the dataset
Fig. 2 Sample images from each class
3.2 Preprocessing The images were split into test and train folders, with the labels for each image given in their respective train and test CSV files. To import the images into Google Colab and make them training-ready, TensorFlow's Keras ImageDataGenerator API (flow_from_dataframe) was used. For preprocessing, the training images were first downsampled to eliminate any bias toward a specific class and split into train and validation subsets. These new downsampled subsets were normalized, randomly cropped into 224 × 224 RGB images, and randomly rotated by up to 15 degrees on either side to increase the variation within images, which helped in training. Table 2 gives a summary of all the preprocessing steps involved.
Table 2 Preprocessing

| Preprocessing | Factor |
|---|---|
| Resize | 224, 224 |
| Random rotation | 0.15 |
| Random translation | 0.1 (h), 0.1 (w) |
| Random flip | True |
| Random contrast | 0.1 |
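The augmentation factors in Table 2 map naturally onto Keras preprocessing layers. The sketch below is a plausible reconstruction of such a pipeline (the layer choices and ordering are our assumption, not the authors' exact code), assuming TensorFlow 2.8 or later:

```python
import tensorflow as tf

# Augmentation pipeline mirroring Table 2 (illustrative sketch)
augment = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),          # normalize pixel values
    tf.keras.layers.Resizing(224, 224),            # resize to 224 x 224
    tf.keras.layers.RandomRotation(0.15),          # random rotation, factor 0.15
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # 0.1 (h), 0.1 (w)
    tf.keras.layers.RandomFlip(),                  # random flip
    tf.keras.layers.RandomContrast(0.1),           # random contrast, factor 0.1
])

# A batch of four 256 x 256 RGB images
batch = tf.random.uniform((4, 256, 256, 3), maxval=255.0)
augmented = augment(batch, training=True)  # shape (4, 224, 224, 3)
```

The random layers are only active when called with `training=True`; at inference time they pass images through unchanged (apart from rescaling and resizing).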
Fig. 3 Proposed workflow for weed classification
Models were trained and tested on Google Colab (GPU environment) and locally in Jupyter Notebook with Nvidia CUDA and cuDNN, Python 3, Scikit-Learn, and TensorFlow 2.8. The local system runs Windows 11 with a 10th-generation Intel i7, an Nvidia GeForce MX330, and 16 GB RAM at 3200 MHz. Figure 3 shows the proposed workflow for weed classification.
4 Materials and Methods This paper employs convolutional neural networks to classify the given image into one of the nine species of weed.
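Before detailing the architectures, the two basic CNN operations described in Sect. 4.1 below, convolution and max pooling, can be illustrated with a minimal NumPy sketch (a didactic example, not the framework implementation; like most deep learning libraries, it actually computes cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN conv-layers)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the max value in each window."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return x[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

# A 1 x 3 vertical-edge detector responds strongly at the 0 -> 1 step
image = np.zeros((4, 4))
image[:, 2:] = 1.0
edges = conv2d(image, np.array([[-1.0, 0.0, 1.0]]))  # shape (4, 2)
pooled = max_pool2d(edges)                           # shape (2, 1)
```

Here every response in `edges` equals 1 because each sliding window straddles the step between the dark and bright halves of the image, which is exactly the kind of basic feature the initial conv-layers of a CNN learn.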
4.1 Convolutional Neural Network (CNN) CNNs contain two main types of layers: convolutional layers and dense layers. In a convolutional layer (conv-layer), a mathematical operation called convolution takes place, in which an operation on two functions produces a third function that describes how the shape of one function modifies the other (e.g., edge detection). The dense layer is the same as the fully connected layer in multilayer perceptron models; in CNNs, dense layers are used after flattening the previous layer's outputs. In our implementation, we use RGB images, each represented as a matrix of pixel values in three planes. The initial layers in a CNN generally extract basic features such as edges. The next layers extract more complex features such as corners, and the deeper we go, the more complex the extracted features become. The output of the final layer is flattened and passed on to the fully connected dense layers, which narrow down to the output layer. To save computation and memory while also suppressing noise from the outputs of the convolutional layers, we use pooling between conv-layers. In our implementation, we use max pooling, which takes the maximum pixel value from the area covered by the pooling kernel. VGG16 This model secured second place in ImageNet 2014. The network was proposed in 2014 in the paper authored by Simonyan et al. [19]. It has an input dimension of 224 × 224 × 3. VGG uses stacks of 3 × 3 conv-layers of stride 1 because they have the same effective receptive field as a single 7 × 7 conv-layer. VGG-16 has a total of 138 million parameters and uses 96 MB of memory per image during a forward pass. This architecture does not use local response normalization, as it was found not to help the network's performance. ResNet50-V2 ResNets solve the problem of vanishing gradients.
Vanishing gradient occurs when a network goes too deep: the gradient propagated from where the loss function is calculated easily goes to zero due to the inherent property of the chain rule. As a result, no weights are updated and therefore no learning occurs. ResNets employ residual connections to solve this issue; the gradients flow directly through the residual connections backward from later layers to initial layers. In ResNetV2 [20], the last nonlinearity is removed, clearing the path from the input to the output and giving better performance. Inception-ResNet-V2 This network [21] combines the concept of the Inception architecture with residual connections. Inception-ResNet-V2 is 164 layers deep and supports various input sizes in this implementation. In the Inception-residual block, residual connections are combined with convolution filters of multiple sizes. The residual connection avoids the problem of degradation due to deep networks while also reducing training time. The Inception layer combines 1 × 1, 3 × 3, and 5 × 5 conv-layers with their
Table 3 Trainable parameters of classifiers with and without transfer learning

| Model | Method 1 (transfer learning) trainable parameters | Method 2 (non-TL) trainable parameters |
|---|---|---|
| VGG16 | 658,697 | 15,373,385 |
| Inception-ResNet-V2 | 1,709,321 | 55,985,513 |
| ResNet-50-V2 | 2,234,633 | 25,753,993 |
| MobileNet-V2 | 1,446,665 | 3,670,537 |
| EfficientNet-V2-S | 1,446,665 | 21,624,153 |
outputs concatenated into a single output vector. The network has 55,873,736 trainable parameters and is 164 layers deep. MobileNet-V2 This network was designed to perform well on mobile devices. Its architecture [22] is based on an inverted residual structure, with residual connections between bottleneck layers. Lightweight depthwise convolutions filter features in the intermediate expansion layer, which serves as a source of nonlinearity. MobileNet-V2 starts with 32 filters, followed by 19 residual bottleneck layers. EfficientNet-V2-S The EfficientNet family [23] scales up models in an effective manner, eliminating the need to spend huge man-hours tuning and scaling a network. It uses a technique called compound scaling, in which the width, depth, and resolution of the network are scaled up uniformly.
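The identity shortcut behind the residual connections discussed for ResNet above can be sketched in a few lines of NumPy (a toy fully connected residual block, for intuition only):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = F(x) + x: even if the learned branch F collapses, x passes through."""
    return relu(x @ w1) @ w2 + x

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))

# With zero weights the learned branch F(x) vanishes, but the shortcut
# still carries the signal (and, symmetrically, the gradient) unchanged.
y = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

Because the shortcut is an identity map, the derivative of the output with respect to the input always contains a "+1" term, which is why gradients can flow backward through very deep ResNets without vanishing.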
4.2 Transfer Learning Transfer learning preserves the weights of a network trained on a previous dataset while training on a new dataset. This helps save learning time and usually obtains higher accuracy in comparison with a raw implementation. For the five architectures used in this study, we tried both a raw (from scratch) implementation and transfer learning [24] with ImageNet weights. Table 3 shows how many parameters are trainable with and without transfer learning.
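A plausible sketch of the two training methods using the Keras applications API follows (the classification head and its layers are our illustrative assumption; the paper itself only specifies the Adamax optimizer, a 0.001 learning rate, categorical cross-entropy, and that the backbone is left trainable):

```python
import tensorflow as tf

def build_model(num_classes=9, weights=None):
    """weights="imagenet" -> Method 1 (transfer learning); None -> Method 2."""
    base = tf.keras.applications.ResNet50V2(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = True  # fine-tune the whole backbone, as in the study
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adamax(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model(weights=None)  # pass weights="imagenet" for transfer learning
```

Swapping the `weights` argument between `"imagenet"` and `None` is the only difference between the two methods in this sketch; the parameter counts of Table 3 additionally reflect which layers were frozen in each configuration.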
5 Experimental Results In this section, experimental outcomes are examined in detail. All the deep learning CNN architectures are evaluated against standard scoring metrics for a classification problem. The best model is determined based on the standard evaluation
Fig. 4 Graphical representation of experimental results
metrics. The metrics used are Sensitivity, Specificity, Precision, Matthews Correlation Coefficient (MCC), F1-Score, and Accuracy. Consequently, this study employs and evaluates five widely used CNN architectures (refer to Sect. 4) with and without transfer learning. When executing transfer learning, base_model.trainable is set to True because experimentation showed that doing so increases accuracy. Every model's learning rate is initialized to 0.001. Categorical cross-entropy is used as the loss function with the Adamax optimizer and accuracy as the target metric. Models were run over 25 epochs. Given below are the results for each CNN architecture with the preprocessing steps explained in Sect. 3.2. ResNet-50-V2 and VGG16 are the best performers, with the highest accuracies of all the models tested. The other models' scores were less impressive, indicating possible overfitting. Figure 4 provides a graphical representation of the experimental results. Figures 5 and 6 show the feature maps of the initial and final layers, respectively. Table 4 shows the experimental results of the scratch implementation, and Table 5 shows the experimental results of transfer learning with ImageNet weights.
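All of the scoring metrics above derive from confusion-matrix counts; for a binary (one-vs-rest) case they reduce to the following standard formulas (a reference implementation for clarity, not code from the paper):

```python
import math

def scores(tp, fp, fn, tn):
    """Standard classification metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # a.k.a. recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(accuracy=accuracy, precision=precision, sensitivity=sensitivity,
                specificity=specificity, f1=f1, mcc=mcc)

m = scores(tp=50, fp=10, fn=5, tn=35)
```

Unlike accuracy, MCC stays near zero (or goes negative) for predictors no better than chance, which is why the weakest models in Tables 4 and 5 show negative MCC despite non-trivial accuracy on this imbalanced dataset.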
6 Conclusion and Future Work This paper took up an image dataset with nine different classes of weed from northern Australia. The dataset was trained on five different popular CNN architectures, namely VGG16, ResNet-50-V2, Inception-ResNet-V2, MobileNet-V2, and
Fig. 5 Neural activation of the initial layers of ResNet50V2
Fig. 6 Neural activation of the final layers of ResNet50V2
M. K. Gourisaria et al.
Table 4 Experimental results of scratch implementation

Model                   Accuracy  Precision  Sensitivity  Specificity  F1 score  MCC
VGG16                   0.9232    0.9517     0.9445       0.8618       0.9481    0.682
Inception-ResNet-50-V2  0.8936    0.9454     0.903        0.8705       0.9237    0.7507
ResNet-50-V2            0.9368    0.9687     0.9458       0.9104       0.9571    0.838
MobileNet-V2            0.3287    0.6372     0.3021       0.4186       0.4098    −0.2434
EfficientNet-V2-S       0.8816    0.9398     0.9072       0.7885       0.9232    0.6673
Table 5 Experimental results of transfer learning with ImageNet weights

Model                   Accuracy  Precision  Sensitivity  Specificity  F1 score  MCC
VGG16                   0.8461    0.9663     0.8485       0.832        0.9039    0.5585
Inception-ResNet-50-V2  0.3651    0.7265     0.3584       0.3947       0.48      −0.1943
ResNet-50-V2            0.443     0.7965     0.3792       0.6643       0.5138    0.0375
MobileNet-V2            0.3187    0.6123     0.2644       0.4853       0.3694    −0.2298
EfficientNet-V2-S       0.4886    0.7784     0.5194       0.3541       0.623     −0.0989
EfficientNet-V2-S. From the experimental results, it is evident that VGG16 and ResNet-50-V2 were the best performers. It must be noted that models with transfer learning performed worse than the raw implementation. This can be attributed to the fact that the models pre-trained on the ImageNet dataset do not contain images of weeds and crops. For future work, we would combine different weed/crop datasets to build a benchmark dataset, analogous to ImageNet or MS-COCO, covering weeds and crops from different farmlands, to help classification models generalize better.
References
1. Iqbal N, Manalil S, Chauhan BS, Adkins SW (2019) Investigation of alternate herbicides for effective weed management in glyphosate-tolerant cotton. Archives Agron Soil Sci
2. Annual costs of weeds in Australia. https://invasives.com.au/wp-content/uploads/2019/01/Cost-of-weeds-report.pdf. Last accessed on 2022/5/10
3. European Food Safety Authority (EFSA), Medina-Pastor P, Triacchini G (2020) The 2018 European Union report on pesticide residues in food. EFSA J 18(4):e06057
4. Singh V, Gourisaria MK, GM H, Rautaray SS, Pandey M, Sahni M, Espinoza-Audelo LF (2022) Diagnosis of intracranial tumors via the selective CNN data modeling technique. Appl Sci 12(6):2900
5. Sarah S, Singh V, Gourisaria MK, Singh PK (2021) Retinal disease detection using CNN through optical coherence tomography images. In: 2021 5th international conference on information systems and computer networks (ISCON). IEEE, pp 1–7
6. Mahmud MS, Zahid A, Das AK, Muzammil M, Khan MU (2021) A systematic literature review on deep learning applications for precision cattle farming. Comput Electron Agric 187:106313
7. Chandra S, Gourisaria MK, Harshvardhan GM, Rautaray SS, Pandey M, Mohanty SN (2021) Semantic analysis of sentiments through web-mined twitter corpus. In: ISIC, pp 122–135
8. Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng 27(4):1071–1092
9. Jung DH, Kim NY, Moon SH, Jhin C, Kim HJ, Yang JS, Kim HS, Lee TS, Lee JY, Park SH (2021) Deep learning-based cattle vocal classification model and real-time livestock monitoring system with noise filtering. Animals 11(2):357
10. Rautaray SS, Pandey M, Gourisaria MK, Sharma R, Das S (2020) Paddy crop disease prediction—a transfer learning technique. Int J Recent Technol Eng 8(6):1490–1495
11. Robocrop spot sprayer: weed removal. https://garford.com/products/robocrop-spot-sprayer/. Last accessed on 2022/5/12
12. Milioto A, Lottes P, Stachniss C (2017) Real-time blob-wise sugar beets vs weeds classification for monitoring fields using convolutional neural networks. ISPRS Ann Photogrammetry Remote Sensing Spatial Inf Sci 4
13. Knoll FJ, Grelcke M, Czymmek V, Holtorf T, Hussmann S (2017) CPU architecture for a fast and energy-saving calculation of convolution neural networks. In: Digital optical technologies 2017, vol 10335, pp 362–370. SPIE
14. Sa I, Chen Z, Popović M, Khanna R, Liebisch F, Nieto J, Siegwart R (2017) Weednet: dense semantic weed classification using multispectral images and MAV for smart farming. IEEE Robot Autom Lett 3(1):588–595
15. Tao T, Wei X (2022) A hybrid CNN–SVM classifier for weed recognition in winter rape field. Plant Methods 18(1):1–12
16. Lease BA, Wong WK, Gopal L, Chiong CW (2020) Pixel-level weed classification using evolutionary selection of local binary pattern in a stochastic optimised ensemble. SN Comput Sci 1(6):1–13
17. Garibaldi-Márquez F, Flores G, Mercado-Ravell DA, Ramírez-Pedraza A, Valentín-Coronado LM (2022) Weed classification from natural corn field-multi-plant images based on shallow and deep learning. Sensors 22(8):3021
18. Corey Lammi DeepWeedsX. https://www.kaggle.com/datasets/coreylammie/deepweedsx. Last accessed on 2022/4/19
19. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
20. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, Cham, pp 630–645
21. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence
22. Howard A, Zhmoginov A, Chen LC, Sandler M, Zhu M (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation
23. Tan M, Le Q (2021) Efficientnetv2: smaller models and faster training. In: International conference on machine learning. PMLR, pp 10096–10106
24. Guo Y, Shi H, Kumar A, Grauman K, Rosing T, Feris R (2019) Spottune: transfer learning through adaptive fine-tuning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4805–4814
Secured Face Recognition System Based on Blockchain with Machine Learning K. Krishnakumar , S. Saravanan , and Amine Naite-Ali
Abstract Face recognition systems have become an integral part of the security mechanisms used in IoT applications. Smart cities and industries have started deploying face recognition applications for security purposes through surveillance cameras and biometric systems. In computer vision, facial recognition has been a widely studied area and has matured. Recent developments in hardware and the use of deep learning for face recognition improve accuracy and considerably reduce processing time after training the model. This chapter proposes a face recognition system with blockchain for secured and tamper-free facial recognition. The face recognition system uses VGG19 to train on image data to identify the face in an image or video. The VGG19 model is utilized to reduce training time and the required sample size. This system uses sample image data to train the model for recognizing the face. With blockchain, the data stored in the face recognition database is secured against unauthorized alteration by hackers or other malicious activities. Keywords Face detection · Blockchain · Computer vision
1 Introduction Face recognition technology is a significant part of the biometric identification domain. The main objective of face recognition is to identify and track human faces by leveraging stored human face images. The system detects the face images of multiple persons using the camera and then compares them with the human faces in the face database to attain face recognition. K. Krishnakumar (B) · S. Saravanan Department of Multimedia, VIT School of Design, Vellore Institute of Technology, Vellore, India e-mail: [email protected] A. Naite-Ali Université Paris-Est Créteil, Créteil, France © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_2
There are many threats and security breaches in this process. Blockchain arose as a leading technology that ensures security in a distributed framework. It is a chain of blocks where each block consists of data and metadata such as the hash of the data and a pointer to the previous block's hash. By using the blockchain mechanism, the data is secured from hacking and other threats. The face recognition system is one of the most popular authentication methods in biometrics. It works based on a mathematical mapping of face features to the stored individual's identity; a real-time capture of the face is compared against the already stored pixel values. A lot of research has been done on enhancing data security. Carrera et al. [1] detailed the wide variety of face recognition algorithms used in image analysis, stating that the main challenges of face recognition are pose variation, feature occlusion, facial expression, and imaging conditions. Face detection methods are classified into four categories: knowledge-based, feature-invariant, template matching, and appearance-based. Fahima et al. [2] proposed a facial recognition system using the discrete wavelet transform and machine learning. Principal component analysis (PCA), eigenvectors of PCA, linear discriminant analysis (LDA), and a convolutional neural network (CNN) are combined using the entropy of detection probability; a recognition rate of 93% is achieved with this method.
2 Literature Review Lou et al. [3] proposed a CNN-based face image recognition model. In order to attain accuracy in identifying image feature information, several processes like pooling, calculation, and abstraction are carried out, and the method achieves a 0.27% higher recognition rate compared with a plain convolutional neural network. As a limitation of this method, computational time is higher compared with other existing methods. Zhao et al. [4] proposed multi-view face recognition using a deep neural network. This method used the well-known PCA for dimensionality reduction on the CAS-PEAL dataset, and a joint Bayesian method is implemented to achieve vector similarity judgment. Blockchain methodology has become popular for data security in face recognition; it provides the face database with tamperproof, immutable, and decentralized characteristics. Goel et al. [5] proposed a security trust model using a CNN: a deep learning model is used in biometric recognition across feature extraction, matching, and template storage stages. As a limitation of this model, computation time is high due to the cryptographic computation. Nasir et al. [6] proposed a secured convolutional neural network (SCNN) using the blockchain method. The proposed secured CNN model is tested using SVGG19 and InceptionV3 models; the results are satisfactory, with low computational time, and the model also tackles other attacks.
Reddy et al. [7] proposed an automatic face expression and gesture detection system with blockchain security. Facial expressions and gestures are stored in BigchainDB, a blockchain technology; this method focuses more on surveillance video recognition using blockchain. Chiu et al. [8] proposed a collaborative surveillance system based on blockchain technology. In order to cover a wide protection area, a distributed architecture is initiated; using a CNN peer-to-peer network based on blockchain, the blocks are distributed to attain secured wide-area protection in surveillance. Tripathi [9] proposed a salient region detection method using blockchain technology. The VGGFace deep neural network is used for feature extraction and logistic regression for classification. Each set of pixel values is combined into a block to make it tamperproof, and the resultant block is sent to other nodes in order to attain high accessibility. Yan et al. [10] proposed a face recognition system based on a CNN. The Convolution Architecture for Feature Extraction framework (Caffe) is used during testing and training; this method attains recognition rates of 98.95% and 98.30% on the ORL and AR face databases, respectively. Páez et al. [11] proposed a verifiable identification system using blockchain methodology. The blockchain network architecture is distributed and decentralized across all nodes; the outcome satisfies different aspects like computation time, memory, and CPU usage, with a focus on document security. Wang et al. [12] proposed a face recognition method concentrating on the protection of privacy and utilizing multi-device diversity. A framework for an effective face recognition system is developed with edge and cloud networks, and the resultant framework is proved to be computationally efficient.
Karczmarek [13] proposed a chain code-based local descriptor with blockchain for face recognition. Fan et al. [14] proposed an e-voting system using blockchain methodology and face recognition to protect user privacy. Tanwar et al. [15] analyzed different methods for ML on blockchain for smart applications: CNN and long short-term memory (LSTM) are used for analyzing attacks on a blockchain network, detailed applications of UAV, SG, and data trading are dealt with, and energy trading system efficiency testing is implemented to verify effectiveness. Qin [16] proposed an alternative representation of facial images for face recognition, clarifying the benefits of reduced image resolution for better classification of face images. It changes the pixel gray value range and achieves a considerable accuracy improvement; a nonlinear transform converts the original pixel gray values into the alternative representation in order to attain a high accuracy rate in face recognition. Zhu et al. [17] proposed an optimized big data face recognition algorithm using a multi-feature fusion deep learning algorithm: the global feature is extracted using two-dimensional principal component analysis (2DPCA), and the local binary pattern (LBP) [18] is used to extract the local texture feature from the image. Using the fused features, the convolutional neural network is trained for face
recognition. Kim et al. [19] used real-time face detection to recognize emotion in the detected face information using big data; after detecting the human face accurately, the emotion is recognized. Li et al. [20] proposed a novel face recognition algorithm using the HGL method, which combines the H-channel of HSV color space with portrait and grayscale information; this information is used to train the CNN for classification. Zhou et al. [21] reported the impact of big data on facial recognition system performance. From the observations, the authors developed the Megvii face recognition system; the report points out the gap between machine recognition and human performance. Qu et al. [22] proposed a CNN-based face recognition algorithm developed using an FPGA; with FPGA parallel processing, the computation time of real-time face recognition is decreased. Wang et al. [23] proposed a face recognition algorithm using the Goldstein branching method to improve face recognition: phase unwrapping improves three-dimensional face reconstruction, and with face radial curve elastic matching, recognition is improved compared to the conventional method. Chen et al. [24] introduced a new face recognition algorithm using an adaptive learning algorithm with an adaptive learning rate; processing speed is increased and an appropriate learning rate is chosen. The paper is organized as follows: Sect. 3 describes the proposed work and the novelty present in it. In Sect. 4, the experiments carried out are explained and compared with existing state-of-the-art methods. Sect. 5 concludes the paper by stating its advantages.
3 Proposed Work Due to recent advancements in technology and hardware, face recognition is being used in various environments. From security cameras on highways to industry, it has played an essential role in securing society. Over decades of research, face detection accuracy has increased from the days of template matching and feature registration, and with today's hardware, deep learning is used in face recognition technology. As face recognition systems store images in a database, cyber-threats are on the rise: the stored data is liable to be accessed by unauthorized persons, leading to security concerns. Even though much research has been proposed to secure the information, cyber-based attacks remain an area of concern. With blockchain technology, the face information in the database can be secured even further: because of blockchain's immutable property, the stored information is more secure. In the blockchain, the face pixel information is extracted from the image datasets and placed in the blockchain nodes. This blockchain acts like an immutable distributed ledger [25]. For each face, an ID is assigned; since this ID is placed on the chain, it cannot be accessed without authorization. Due to this advantage of blockchain, the face information present in the database cannot be accessed
or manipulated by any unauthorized cyber activity. With every node holding the contents of the entire blockchain ledger, the face recognition techniques can be used to identify the face feature information. In this paper, face image features are extracted for face recognition using convolutional neural networks. The layers are adapted to extract invariant features from the image to overcome scale, illumination, and rotation variance, and a regression formulation is used to classify the face. The steps followed in the proposed work are shown in Fig. 1 and are described in the following sections:
• The image pixel values used for training are stored in blocks and distributed to every node so as to create the blockchain architecture.
• Pixel values are extracted from the blockchain to reconstruct the images.

Fig. 1 Flowchart
• From the reconstructed images, features are extracted using the CNN feature extraction layers, and the neural network is trained.
• For the testing images, the feature extraction process is repeated and the prediction made.
• Using the classification layer, the face is recognized.
3.1 Pixel to Blocks In the blockchain, each block consists of four attributes: index, previousHash, data, and hash. To store an image in blocks, the image is first converted to grayscale. From the two-dimensional image, 1D vectors are formed and stored in a matrix along its columns, so that the image is stored column-wise; in the blockchain, each column is treated as the data for one block. Once the data is prepared for the blocks, the hash is created using the SHA3-256 [26] algorithm, which is less time-consuming than other hashing functions when calculating the value. Once the hash value is generated, the image data and hash value are stored in the block. Figure 2 describes the blockchain structure.

Fig. 2 Blockchain structure
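A minimal sketch of this column-wise block construction, using Python's standard hashlib; the field names follow the four attributes above, while the toy image and the genesis hash are assumptions for illustration only.

```python
import hashlib
import numpy as np

def make_block(index, previous_hash, data):
    """One block per image column: index, previousHash, data, hash."""
    h = hashlib.sha3_256(previous_hash.encode() + data.tobytes()).hexdigest()
    return {"index": index, "previousHash": previous_hash,
            "data": data, "hash": h}

# Toy 4x4 grayscale image; each column becomes the data of one block.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)

chain, prev = [], "0" * 64  # hypothetical genesis hash
for i in range(img.shape[1]):
    block = make_block(i, prev, img[:, i])
    chain.append(block)
    prev = block["hash"]

# Each block links to its predecessor, so altering any column's pixels
# invalidates every later hash in the chain.
print(len(chain), chain[1]["previousHash"] == chain[0]["hash"])
```

Linking each block to the previous hash is what gives the tamper-evidence described above: recomputing the hash of a modified block no longer matches the `previousHash` stored in its successor.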
Once the blockchain is formed, the data is broadcast into the network and sent to the other nodes to be added into blocks. With the image data available to all the nodes, each node trains the deep learning face recognition model on the image data to classify the captured image.
3.2 Face Recognition In this paper, feature extraction is based on the local ternary pattern (LTP) feature map, and face recognition is based on the VGG deep learning architecture. The LTP [13, 27] feature map is given as input to the CNN for face recognition [28]. LTP is an extended version of the local binary pattern [29]: LBP thresholds each neighbor against the center pixel to produce a 2-valued pattern (0 or 1), whereas LTP [30] produces a 3-valued pattern. When a neighbor's gray value lies within a width of ±t around the center pixel value c, it is coded as 0; when it is above c + t, it is coded as +1; and when it is below c − t, it is coded as −1. In our paper, the value of t is taken as 5. The LTP expression is given in Eq. (1):

               ⎧  1,   p ≥ c + t
LTP(p, c, t) = ⎨  0,   |p − c| < t        (1)
               ⎩ −1,   p ≤ c − t
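A direct NumPy rendering of Eq. (1) over the 8-neighborhood of each pixel; the tiny test image and the clockwise neighbor ordering are illustrative choices, not taken from the paper.

```python
import numpy as np

def local_ternary_pattern(img, t=5):
    """Per-pixel LTP codes (+1, 0, -1) over the 8-neighborhood, per Eq. (1)."""
    h, w = img.shape
    img = img.astype(np.int32)
    codes = np.zeros((h - 2, w - 2, 8), dtype=np.int8)
    # Neighbors visited clockwise starting at the top-left offset.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for k, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes[..., k] = np.where(nb >= center + t, 1,
                        np.where(nb <= center - t, -1, 0))
    return codes

img = np.array([[10, 12, 30],
                [ 9, 15, 20],
                [14, 16,  5]], dtype=np.uint8)
print(local_ternary_pattern(img, t=5)[0, 0])
```

For the single interior pixel (value 15, t = 5), neighbors within ±5 of the center map to 0 while 30 and 20 map to +1 and 10, 5, and 9 map to −1, matching the case analysis in Eq. (1).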
In the above equation, p denotes a neighbor pixel value and c denotes the center pixel value. The equation gives the output shown in Fig. 3.

Fig. 3 Local ternary pattern output

This feature map acts as the input to the VGG19 deep learning model for face recognition. VGG19 is used to classify the face. VGG19 is a variant of the deep neural network VGG architecture [31], shown in Fig. 4, pre-trained on the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) database [32]. VGG19 contains 16 CNN layers with three fully connected layers and a SoftMax layer, giving it more depth than the other variants of the VGG model. With VGG19, the images are given as input to train the model for face prediction.
4 Experiments The experiment is carried out on a system with the following configuration: Intel i7 processor, Nvidia RTX 2060, 16 GB RAM, and Windows 10. The dataset used in the experiment is AT&T (formerly called the ORL face dataset) [33]. The sample face images are shown in Fig. 5. Keras and TensorFlow packages are used to implement the proposed architecture. A sound evaluation is needed to optimize the classification model for the face classification problem. For classification, three properties specify performance: accuracy, sensitivity, and specificity, expressed as follows:

Accuracy = (TP + TN)/(TP + FN + FP + TN)        (2)
Sensitivity = TP/(TP + FN)                      (3)
Specificity = TN/(TN + FP)                      (4)
In Eqs. (2)–(4), TP stands for True Positive, TN for True Negative, FP for False Positive, and FN for False Negative. From Tables 1, 2, and 3, it can be seen that the proposed method performs better than the conventional CNN model and the LBP + CNN model. Due to the lower sensitivity of LTP to noise in the image, its accuracy is better than the other state-of-the-art methods. Compared to LBP, LTP is invariant to geometric and grayscale variance in the image; because of this, the LTP feature map gives better accuracy than LBP. With the use of SHA3-256, the details are more secure compared to other state-of-the-art hashing techniques. SHA3-256 is also less time-consuming, so it takes less time to create the hash details for the given data.
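Equations (2)–(4) translate directly into code; the confusion counts used below are toy values for illustration only, not results from this experiment.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity per Eqs. (2)-(4)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical confusion counts for a face/non-face decision.
acc, sens, spec = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # → 0.875 0.857 0.895
```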
5 Conclusion With current advancements in technology, hacking into critical data to gain illegal access has become a primary concern. Blockchain is incorporated with the face recognition database to design secure and threat-free applications. In the
Fig. 4 VGG19 architecture
Fig. 5 Sample dataset images AT&T

Table 1 Comparison of accuracy

Evaluation index   Accuracy (%)
CNN                90.33
LBP + CNN          93.43
LBP + VGG19        94.42
LTP + VGG19        96.55

Table 2 Comparison of sensitivity

Evaluation index   Sensitivity (%)
CNN                88.50
LBP + CNN          90.55
LBP + VGG19        90.75
LTP + VGG19        91.75

Table 3 Comparison of specificity

Evaluation index   Specificity (%)
CNN                91.45
LBP + CNN          95.75
LBP + VGG19        96.75
LTP + VGG19        97.10
proposed method, SHA3-256 is used for hashing the data, which makes the proposed model more secure. The image pixel values are kept in blocks, and this information is broadcast so that all nodes with accessibility can access it. This eliminates any threat of hacking of the data by outsiders. To classify the face, VGG19 with the LTP feature map is used to predict faces accurately from the tested database. From the experiments and results, this model gives better accuracy.

Acknowledgements The authors would like to thank the Vellore Institute of Technology for providing research facilities and VIT Seed Grant (SG20210269).
References
1. De Carrera PF (2010) Face recognition algorithms
2. Tabassum F, Imdadul Islam M, Tasin Khan R, Amin MR (2020) Human face recognition with combination of DWT and machine learning. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2020.02.002
3. Lou G, Shi H (2019) Face image recognition based on convolutional neural network, pp 117–124
4. Zhao F, Li J, Zhang L, Li Z, Na SG (2020) Multi-view face recognition using deep neural networks. Futur Gener Comput Syst 111:375–380
5. Goel A, Agarwal A, Vatsa M, Singh R, Ratha N (2019) Securing CNN model and biometric template using blockchain, pp 1–7
6. Nasir IM (2020) SCNN: a secure convolutional neural network using blockchain
7. Reddy RV, Reddy EM (2020) Automatic face expressions and gesture detection system using blockchain security
8. Hung JC, Yen NY, Chang J-W (2019) Lecture notes in electrical engineering 551: frontier computing theory, technologies and applications (FC 2019)
9. Tripathi RK (2020) A novel algorithm for salient region detection, vol 1241. CCIS
10. Yan K, Huang S, Song Y, Liu W, Fan N (2017) Face recognition based on convolution neural network, pp 4077–4081
11. Páez R, Pérez M, Ramírez G, Montes J, Bouvarel L (2020) An architecture for biometric electronic identification document system based on blockchain, pp 1–19
12. Wang Y, NTT Network, NTT Corporation (2020) Secure face recognition in edge and cloud networks: from the ensemble learning perspective, pp 2393–2397
13. Karczmarek P, Kiersztyn A, Pedrycz W, Dolecki M (2017) An application of chain code-based local descriptor and its extension to face recognition. Pattern Recogn 65:26–34
14. Fan W, Kumar S, Jadhav V, Chang S-Y, Park Y (2020) A privacy preserving e-voting system based on blockchain. In: Silicon valley cybersecurity conference. Springer, Cham, pp 148–159
15. Tanwar S, Bhatia Q, Patel P, Kumari A (2020) Machine learning adoption in blockchain-based smart applications: the challenges, and a way forward. IEEE Access 8:474–488
16. Qin Y, Sun L, Xu Y (2020) Exploring of alternative representations of facial images for face recognition. Int J Mach Learn Cybern 11(10):2289–2295
17. Zhu Y, Jiang Y (2020) Optimization of face recognition algorithm based on deep learning multi feature fusion driven by big data. Image Vis Comput 104
18. Zhang H et al (2017) A face recognition method based on LBP feature for CNN. In: 2017 IEEE 2nd advanced information technology, electronic and automation control conference (IAEAC). IEEE
19. Kim J-A, Park RC, Hwang G-H (2017) Real-time emotion analysis service with big data-based user face recognition. J Inst Converg Signal Process 18(2):49–54
20. Li S, Ning X, Yu L, Zhang L, Dong X, Shi Y, He W (2020) Multi-angle head pose classification when wearing the mask for face recognition under the COVID-19 coronavirus epidemic. In: 2020 international conference on high performance big data and intelligent systems (HPBD&IS). IEEE, pp 1–5
21. Zhou E, Cao Z, Yin Q (2015) Naive-deep face recognition: touching the limit of LFW benchmark or not? arXiv:1501.04690
22. Qu X, Wei T, Peng C, Du P (2018) A fast face recognition system based on deep learning. In: 2018 11th international symposium on computational intelligence and design (ISCID), vol 1. IEEE, pp 289–292
23. Wang Z, Zhang X, Yu P, Duan W, Zhu D, Cao N (2020) A new face recognition method for intelligent security. Appl Sci 10(3):852
24. Chen L, Guo X, Geng C (2016) Human face recognition based on adaptive deep convolution neural network. In: 2016 35th Chinese control conference (CCC). IEEE, pp 6967–6970
25. Shankar S, Madarkar J, Sharma P (2020) Securing face recognition system using blockchain technology. In: International conference on machine learning, image processing, network security and data sciences. Springer, Singapore
26. Dworkin MJ (2015) SHA-3 standard: permutation-based hash and extendable-output functions. Federal Information Processing Standards (NIST FIPS) 202
27. Srivastava P, Binh NT, Khare A (2014) Content-based image retrieval using moments of local ternary pattern. Mobile Netw Appl 195:618–625
28. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252
29. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
30. Sun W et al (2020) Face spoofing detection based on local ternary label supervision in fully convolutional networks. IEEE Trans Inf For Secur 15:3181–3196
31. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
32. Russakovsky O et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
33. Vo DM, Lee S-W (2018) Robust face recognition via hierarchical collaborative representation. Inf Sci 432:332–346
Classifying Paintings/Artworks Using Deep Learning Techniques Shivam Singh and Sandeep Chaurasia
Abstract With the emergence of a large number of digitized art collections, people's interest in evaluating artworks, and the successful performance of deep learning techniques, new research expeditions unfold in the cross-domain of artificial intelligence and digital art. The paper focuses on employing deep learning techniques to classify artworks and paintings available online. Convolutional neural networks (CNNs) were employed for image classification and recognition. The paper describes the dataset used, containing the paintings and artworks of the top 50 painters of all time across different art periods. After evaluating the images with the most appropriate CNN model, we obtain a model accuracy of 57%, an improvement of approximately 3% over previous results. Our model was also successful in identifying the genre the paintings belong to. Keywords Art recognition · Convolutional neural networks · Feature extraction
1 Introduction With recent technological advances, artists are shifting toward creating digital copies of their artwork/paintings. Websites like WikiArt and ImageNet are examples that provide artists with a chance to upload their art and also provide a platform for the existing popular art to be presented as digitized. Deep learning techniques are useful in image processing and thus can be utilized to classify paintings. The objective of the research is to recognize the visual content of art using deep learning. The recognition is extremely beneficial to art historians, who are often interested in determining when an object first appeared in the painting or how the portrayal of an object has evolved [1]. Visual object recognition is a field that has seen tremendous advances in recent years due to the widespread induction of deep CNNs. Several objects can assume one of many different poses. Furthermore, lighting in S. Singh · S. Chaurasia (B) Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_3
images varies widely [2]. Describing an image by its raw pixels is a poor way of capturing its object content (e.g., the pixels of a cow in a dark room and of a cow outdoors in the sunshine bear no resemblance to each other). Because of this, an image is usually represented by a feature: a vector describing the discriminative elements within the image, which is ideally invariant to scale, pose, illumination, etc.
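As a toy illustration (not from the paper) of such a feature, a histogram of gradient orientations is unchanged by a global brightness shift, while raw pixel values are not:

```python
import numpy as np

def orientation_histogram(img, bins=8):
    """Crude HOG-like descriptor: magnitude-weighted histogram of
    gradient orientations over a 2-D grayscale array.

    Gradients ignore a constant brightness offset, so the descriptor is
    insensitive to global lighting changes, unlike raw pixels.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # orientation in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize for scale

# A ramp image and a globally brighter copy yield identical descriptors:
img = np.tile(np.arange(32, dtype=float), (32, 1))
d1 = orientation_histogram(img)
d2 = orientation_histogram(img + 100)  # brightness shift
print(np.allclose(d1, d2))  # True
```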
2 Literature Review The current work focuses on building a model that can classify paintings by genre using ConvNets [3]. The dataset used for training contained around 8000 images spanning different artists and painting genres over time. We started by performing exploratory data analysis (EDA) [4], then imported the images as NumPy arrays, and thereafter built our model and classified paintings by genre. Unlike other models [5], ours considered all artists and genres for classification, a possible reason for its lower accuracy. The method used in [6] applied a deep residual neural network to painting classification; however, it first trained the network on the ImageNet dataset and then retrained it on the Kaggle dataset. Training the model twice can be quite complex and tedious, especially with such high-quality images. Another paper [7] reduced the dataset by filtering out all but the top five painters, which restricted the dataset to around 4500 images. The organization of the paper is as follows: it starts with the introduction and the motivation behind the research, followed by the methodology, which describes the dataset used and how our model was built. We then present the results compiled from our research and, finally, the conclusion, which describes our future scope and the purpose of our research.
2.1 Motivation In light of the growing interest in this domain, the research aims to provide an overview of some of the most notable works investigating the application of deep learning-based approaches to pattern extraction and recognition in visual artworks [8]. Artworks are developed primarily for aesthetic purposes and are mainly concerned with painting, drawing, photography, and architecture. In this research, we focus only on paintings and drawings, two of the most studied artwork forms. For natural images, widespread annotation allows learning powerful visual models that recognize a wide range of object categories; such widespread annotation does not exist for paintings. This presents two challenges [9]: the first is how to cope with limited annotation, and the second is how to obtain more annotation in an automated, efficient manner.
Classifying Paintings/Artworks Using Deep Learning Techniques
Fig. 1 Represents the average RGB calculations for three paintings by Rufino Tamayo
Domain Shift: Due to the lack of annotated art, it is often necessary to learn models from natural images and apply them [10] to paintings. This introduces a domain shift problem, as natural images and paintings can have very different low-level statistics. A learned model should ideally be able to adapt to the new domain; however, several studies [11] have shown that there is typically a significant drop in performance when models are learned in one domain and applied to another. Variation in Art: Paintings can vary considerably in depiction style, from photorealistic renderings through particular movements (e.g., Impressionism, Pointillism) to more abstract depictions (Fauvism, Cubism). There are also temporal differences [12] in the depiction of objects in paintings. For example, dogs appear in paintings not only as beloved pets but also in hunting scenes, where they are often depicted significantly smaller. Figure 1 [13] depicts the average RGB calculations for three paintings by Rufino Tamayo.
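Such a per-painting statistic is easy to compute once an image is loaded as a NumPy array; the sketch below uses a synthetic image in place of an actual Tamayo painting:

```python
import numpy as np

def average_rgb(img):
    """Mean R, G, B values over all pixels of an (H, W, 3) image array."""
    return img.reshape(-1, 3).mean(axis=0)

# Synthetic stand-in for a painting loaded via PIL or OpenCV:
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 200  # red channel
img[..., 1] = 100  # green channel
img[..., 2] = 50   # blue channel
print(average_rgb(img))  # [200. 100.  50.]
```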
3 Methodologies 3.1 Dataset The dataset used for the research was Kaggle's Best Artworks of All Time [14]. It is a collection of the artworks of the 50 most influential artists of all time and contains information about each artist, their paintings, and the era to which each painting belonged (e.g., Renaissance, Impressionism, Post-Impressionism, Baroque). The dataset is publicly accessible, and the painting/artist information has been retrieved from
WikiArt. The dataset consists of three files: (1) artists.csv, which contains the information for each artist; (2) images.zip, which contains the collection of full-size images; and (3) resized.zip, the same collection with the images resized and extracted from the folder structure. We performed exploratory data analysis (EDA) on the dataset to get a thorough understanding of the attributes and their roles.
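A first EDA step of this kind can be sketched with the standard library alone. The column names below follow the Kaggle artists.csv schema, but the rows are made-up illustrative values, not entries from the actual file:

```python
import csv, io
from collections import Counter

# Illustrative rows mimicking the artists.csv schema
# (name, genre, nationality, paintings); values here are invented.
sample = """name,genre,nationality,paintings
Artist A,Impressionism,French,120
Artist B,Renaissance,Italian,90
Artist C,Impressionism,French,80
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Aggregate painting counts per genre, as in our genre-frequency EDA
paintings_per_genre = Counter()
for r in rows:
    paintings_per_genre[r["genre"]] += int(r["paintings"])

print(paintings_per_genre.most_common())
# [('Impressionism', 200), ('Renaissance', 90)]
```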
3.2 Model Building After performing EDA on the dataset [9], we cleaned it to obtain the various genres and the number of paintings per genre. Next, we explored the images of the top artists and retrieved the images associated with each of them. The images were then imported as NumPy arrays and resized. To build the model, we identified a style (genre) and performed training on it.
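The import-and-resize step can be sketched as below. The 100 x 100 target size is our inference from Table 2 (a 98 x 98 output after a 3 x 3 convolution implies 100 x 100 inputs), and the nearest-neighbour resize is a pure NumPy stand-in for what the pipeline presumably did with a library such as Pillow or OpenCV:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array to (out_h, out_w, C)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source col for each output col
    return img[rows][:, cols]

# Resize a dummy 200x300 RGB image to the assumed 100x100 model input
img = np.random.randint(0, 256, (200, 300, 3), dtype=np.uint8)
small = resize_nearest(img, 100, 100)
print(small.shape)  # (100, 100, 3)
```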
3.3 Model Testing The model was tested by having a user input the URL of a painting/artwork [15]; the model then predicts the genre/style of that painting along with the prediction probability. The image URL is transformed [16], and the image is classified accordingly (Fig. 2).
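A sketch of this testing flow is given below. The `predict_from_url` helper, the 100 x 100 input size, and the Keras-style `predict` method on `model` are our assumptions for illustration; the pure ranking helper at the top can be run as-is:

```python
import numpy as np

GENRES = ["Impressionism", "Renaissance", "Post-Impressionism",
          "Symbolism", "Baroque"]

def top_prediction(probs, labels=GENRES):
    """Return (label, probability) for the highest-scoring class."""
    i = int(np.argmax(probs))
    return labels[i], float(probs[i])

def predict_from_url(url, model):
    """Fetch an image by URL, preprocess it, and classify it (sketch).

    Requires Pillow; `model` is assumed to expose a Keras-style
    `predict` method returning class probabilities.
    """
    import urllib.request, io
    from PIL import Image
    raw = urllib.request.urlopen(url).read()
    img = Image.open(io.BytesIO(raw)).convert("RGB").resize((100, 100))
    x = np.asarray(img, dtype=np.float32)[None] / 255.0
    return top_prediction(model.predict(x)[0])

# The ranking helper can be exercised without any network access:
print(top_prediction(np.array([0.1, 0.6, 0.1, 0.1, 0.1])))
# ('Renaissance', 0.6)
```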
Fig. 2 Represents the structure of our methodology
4 Results Among pre-existing CNN models based on the ImageNet dataset, AlexNet [17], ResNet50 [18], and VGG-16 [19] are the three architectures that have been most widely used. In our case, we developed a convolutional model to maximize accuracy, focusing on the number of paintings, the training data, and the number of epochs. Initial training was done using ten epochs and yielded an approximate accuracy of 20%. Following this, the top five painting styles were identified and the corresponding images transformed. The final model contained paintings from the top five painting styles and was trained with the Adam optimizer, SparseCategoricalCrossentropy [20] as the loss function, and 50 epochs. It achieved an accuracy of approximately 57% with 3 convolution layers, 3 pooling layers, and 3 further layers, i.e., 2 dense layers and 1 flatten layer, with ReLU [21] as the activation function. To improve the accuracy of our model, we would have to consider a bigger dataset with a greater number of images, as this dataset contained only 8334 images; more images would be available for training, which would make the model better at test time. In [22], artificial neural networks (ANNs) were used to predict the genre among the top five art genres: (1) Impressionism, (2) Renaissance, (3) Post-Impressionism, (4) Symbolism, and (5) Baroque. Their model achieved an accuracy of 55% on these five genres. Our model, employing ConvNets, outperformed theirs by approximately 3% and also handled user input of paintings in the form of a URL, displaying the genre associated with the painting as well as the prediction probability (Fig. 3; Table 1). Note that the parameter value for each pooling layer is 0, as pooling only computes a fixed function of its inputs; no backpropagation learning is involved in it, as mentioned in Table 2 [23].
The purpose of the confusion matrix is to assess the predictive performance of our model. Higher values on the diagonal indicate better performance [24], whereas a low score indicates that the model did not perform well for that particular row and column entry, in this case, the genre. From the table, we can see high scores where the row and column values are the same, viz. Impressionism-Impressionism (322), Renaissance-Renaissance (279), etc. These scores show that our model is able to classify the different genres distinctly with a considerable prediction rate. Figures 4 and 5 and Table 4 show the prediction labels and the accuracy with which the model predicts them. The objective of the model was to predict the genres in the dataset and also to ask the user for the URL of an image and predict [25] the associated genre along with the prediction probability.
Fig. 3 Different layers of our convolutional network

Table 1 Data collected while interpreting the EDA

  Top five genres: Impressionism, Renaissance, Post-Impressionism, Symbolism, Baroque
  Number of paintings per genre: Impressionism, Renaissance, Post-Impressionism, Symbolism, Baroque
  Genre representation over the top 50 artists: Renaissance, Impressionism, Symbolism, Post-Impressionism, Baroque
  Ranking of painters by their number of paintings: Vincent Van Gogh, Edgar Degas, Rembrandt, Alfred Sisley, Andy Warhol
  Most frequent nationalities over the top 50 artists: French, Dutch, Spanish, Italian, Russian
Table 2 ConvNet model, the layer, and the output

  S. No.  Layer (type)       Output shape         Parameter value
  1       Convolution layer  (None, 98, 98, 32)   896
  2       Max pooling        (None, 49, 49, 32)   0
  3       Convolution layer  (None, 47, 47, 64)   18,496
  4       Max pooling        (None, 23, 23, 64)   0
  5       Convolution layer  (None, 21, 21, 128)  73,856
  6       Max pooling        (None, 10, 10, 128)  0
  7       Flatten            (None, 12,800)       0
  8       Dense              (None, 128)          1,638,528
  9       Dense              (None, 5)            645
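The output shapes and parameter counts in Table 2 can be reproduced with a short calculation. This is our reading of the table, assuming 100 x 100 x 3 inputs, 3 x 3 valid convolutions with stride 1, and 2 x 2 max pooling:

```python
def conv_params(k, c_in, c_out):
    """Trainable weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Trainable weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed input: 100 x 100 x 3 paintings; three 3x3 conv layers
# (32, 64, 128 filters), each followed by 2x2 max pooling.
size, report = 100, []
for c_in, c_out in [(3, 32), (32, 64), (64, 128)]:
    size -= 2                                  # valid 3x3 convolution
    report.append(("conv", size, conv_params(3, c_in, c_out)))
    size //= 2                                 # 2x2 max pooling
    report.append(("pool", size, 0))           # pooling trains nothing

flat = size * size * 128                       # 10 * 10 * 128 = 12,800
report.append(("flatten", flat, 0))
report.append(("dense", 128, dense_params(flat, 128)))
report.append(("dense", 5, dense_params(128, 5)))

for layer, dim, params in report:
    print(layer, dim, params)
```

The printed spatial sizes (98, 49, 47, 23, 21, 10) and parameter counts (896; 18,496; 73,856; 1,638,528; 645) match Table 2 exactly under these assumptions.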
Table 3 Confusion matrix score for the different art periods

  Period              Impressionism  Renaissance  Post-impressionism  Symbolism  Baroque
  Impressionism       322            50           67                  44         71
  Renaissance         51             279          47                  33         44
  Post-impressionism  75             41           152                 35         11
  Symbolism           42             36           19                  94         9
  Baroque             17             45           9                   7          98
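As a sanity check, the overall accuracy implied by Table 3 can be computed directly from the counts. The matrix below is our reading of the table:

```python
import numpy as np

# Rows: true period; columns: predicted period, in the order
# Impressionism, Renaissance, Post-Impressionism, Symbolism, Baroque
cm = np.array([
    [322,  50,  67,  44,  71],
    [ 51, 279,  47,  33,  44],
    [ 75,  41, 152,  35,  11],
    [ 42,  36,  19,  94,   9],
    [ 17,  45,   9,   7,  98],
])

accuracy = np.trace(cm) / cm.sum()          # correct / total
per_class = np.diag(cm) / cm.sum(axis=1)    # recall per genre
print(round(accuracy, 3))  # 0.557, consistent with the reported ~57%
print(np.round(per_class, 2))
```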
Fig. 4 Prediction made for labels (genres) which are compared with the true labels
5 Conclusion The research summarized the methodology followed, the motive behind the research, and the results obtained. We also enabled user input, wherein the user could provide a painting, and our model predicted the associated genre along with the prediction probability. Our model was also able to outperform the previous model by approximately 3%. The future scope is to use deep learning techniques further to predict the prices of paintings by analyzing their
Fig. 5 A user testing the model
Table 4 Plotting the first test images, their predicted labels, and true labels

  Labels         True labels    Prediction%  Correctly predicted?
  Baroque        Symbolism      100          No
  Symbolism      Symbolism      99           Yes
  Renaissance    Renaissance    100          Yes
  Symbolism      Symbolism      100          Yes
  Renaissance    Renaissance    94           Yes
  Symbolism      Impressionism  63           No
  Impressionism  Impressionism  61           Yes
  Renaissance    Impressionism  99           No
features and generating a predicted price; such works are sold in online auction houses like Sotheby's and Christie's, and this would give artists a fair idea of the expected price of their paintings and artworks.
References
1. Cetinic E, Lipic T, Grgic S (2019) A deep learning perspective on beauty, sentiment, and remembrance of art. IEEE Access 7:73694–73710
2. Castellano G, Vessio G (2021) Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview. Neural Comput Appl 33(19):12263–12282
3. Zhao W, Zhou D, Qiu X, Jiang W (2021) Compare the performance of the models in art classification. PLoS ONE 16(3):e0248414
4. Crowley E (2017) Visual recognition in art using machine learning. Doctoral dissertation, University of Oxford
5. Srinivasan R, Denton E, Famularo J, Rostamzadeh N, Diaz F, Coleman B (2021, Aug) Artsheets for art datasets. In: Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 2)
6. Smirnov S, Eguizabal A (2018, Oct) Deep learning for object detection in fine-art paintings. In: 2018 metrology for archaeology and cultural heritage (MetroArchaeo). IEEE, pp 45–49
7. Urbi G, Eduardo P, Fredy P (2019) Color intensity variations and art pieces: an examination of Latin American art. SSRN Electron J
8. Amirshahi SA, Hayn-Leichsenring GU, Denzler J, Redies C (2014, Sept) Jenaesthetics subjective dataset: analyzing paintings by subjective scores. In: European conference on computer vision. Springer, Cham, pp 3–19
9. Worth T (2020) Painting2auction: art price prediction with a siamese CNN and LSTM
10. Bialynicka-Birula J (2021) Statistical methods used for identification of art prices determinants. Available at SSRN 3840721
11. Lecoutre A, Negrevergne B, Yger F (2017, Nov) Recognizing art style automatically in painting with deep learning. In: Asian conference on machine learning. PMLR, pp 327–342
12. Blessing A, Wen K (2010) Using machine learning for identification of art paintings. Technical report
13. Tan WR, Chan CS, Aguirre HE, Tanaka K (2016, Sept) Ceci n'est pas une pipe: a deep convolutional network for fine-art paintings classification. In: 2016 IEEE international conference on image processing (ICIP). IEEE, pp 3703–3707
14. https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time
15. Agarwal B, Ali F, Kolli P, Yang X (2014) Project report: team 4 predicting the price of art at auction
16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
17. Bartz C, Jain N, Krestel R (2020, May) Automatic matching of paintings and descriptions in art-historic archives using multimodal analysis. In: Proceedings of the 1st international workshop on artificial intelligence for historical image enrichment and access, pp 23–28
18. Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
19. Ulyanov D, Lebedev V, Vedaldi A, Lempitsky V (2016) Texture networks: feed-forward synthesis of textures and stylized images. arXiv:1603.03417
20. Zhong SH, Huang X, Xiao Z (2020) Fine-art painting classification via two-channel dual path networks. Int J Mach Learn Cybern 11(1):137–152
21. Kelek MO, Calik N, Yildirim T (2019) Painter classification over the novel art painting data set via the latest deep neural networks. Procedia Comput Sci 154:369–376
22. Sandoval C, Pirogova E, Lech M (2021) Adversarial learning approach to unsupervised labeling of fine art paintings. IEEE Access 9:81969–81985
23. Zujovic J, Gandy L, Friedman S (2007) Using neural networks to classify paintings by genre. Northwestern University
24. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
25. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Hybrid Deep Face Music Recommendation Using Emotions Divija Sanapala, Raja Muthalagu, and Pranav M. Pawar
Abstract Facial emotion recognition is a biometric model which uses facial features to determine the emotion and state of a person. Facial expressions are important, as emotions are unique forms of psychological processes. Importantly, emotions are expressed through nonverbal behavior in the face, voice, and body, and different emotions are expressed through different, specific, unique facial configurations that are universal to all cultures, regardless of race, nation, ethnicity, religion, gender, or any other characteristic of the population. This paper explains how to use OpenCV in Python to build a facial emotion recognition system using the pre-trained model DeepFace, a deep learning facial recognition system with an accuracy of 97%. Based on the user's current emotion, a music playlist is generated using K-means clustering. It is often difficult for people to decide which songs to listen to; this model has been developed to ease that confusion. Here, the person's image is captured using a Webcam, the emotion is recognized, and then a playlist is recommended along with links to the songs. Most available methods involve manually playing music, employing wearable computing devices, or categorizing songs based on auditory attributes. In this paper, we propose that the manual sorting and playing be replaced by a recommended song playlist with a YouTube link for each song. The system has an accuracy of 99% for both training and testing on the dataset of songs. Keywords Facial emotion recognition · Pre-trained model · Deep face · OpenCV · Music suggestions · K-means clustering
D. Sanapala · R. Muthalagu (B) · P. M. Pawar Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai, UAE e-mail: [email protected] D. Sanapala e-mail: [email protected] P. M. Pawar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_4
1 Introduction Facial emotion recognition uses algorithms to identify people's faces, analyze facial expressions, and determine their emotional state. It is done by evaluating the faces in photos or videos using cameras on computers, laptops, or smartphones. Facial analysis usually involves three main steps. The first is face detection, which locates the face in the image. The second is facial landmark detection, which extracts information about facial features. The third is facial emotion classification, which analyzes the movements of facial features and categorizes them as happy, angry, sad, etc. For emotion detection, the DeepFace model is used, which detects seven emotions: happy, neutral, angry, disgust, fear, sad, and surprise. Based on the emotion detected, a music recommender system provides suggestions. Playlist creation is usually manual, but in this paper, we try to make it automatic by using emotion detection. Using a music recommender system based on the qualities of previously heard music, the music provider can forecast and then provide the appropriate songs to its customers. This is done using K-means clustering [1]. An emotionally sensitive recommendation system would be better able to comprehend people's needs and sentiments and choose acceptable musical selections for the situation's emotional climate. Traditional theories about music-related emotions range between two primary perspectives: the feelings conveyed by music and the feelings evoked by music. Music recommendations can be used in a variety of contexts, including music therapy, sports, studying, relaxing, and supporting mental and physical activity [2]. The main aim of this paper is to create a system for recommending music based on an individual's emotional state. A person frequently finds it difficult to choose which music to listen to from the vast selection available.
This recommender system helps to solve the choice dilemma, discover new musical compositions, and promote physical and mental health. The design combines generic, music therapy, and recommendation methodologies with machine learning technology. This paper explains how to apply emotion-driven personalization to the process of making music recommendations [3]. The paper is organized into the following sections. Section 2 presents a review of some of the research involving emotion recognition and music recommender systems. Section 3 gives detailed information on the datasets used and the implementation details, elaborates on the proposed methodology, and discusses the two phases of the system: emotion recognition and music recommendation. Section 4 explains the experimental results achieved for both emotion recognition and music recommendation, outlines the conclusions drawn from this study, and relates them to future research and areas open to further improvement.
2 Related Works The PCA algorithm was utilized in paper [4] for dimensionality reduction, with K-nearest neighbor used to determine the face picture category in the testing set. One of the major drawbacks of PCA is that it requires a huge dataset for correct prediction; moreover, if any objects obstruct the face, or if the light is too intense or too faint, it may produce an incorrect prediction. The support vector machine was another technique adopted to overcome the drawbacks of PCA. SVM is a facial recognition technology designed for limited samples and high dimensions. Due to the restricted amount of training data, samples outside the training set may be closer to the separating boundary than the data inside the training set. A study conducted by Mehta et al. [5] proposed methods to recognize and segment the face for localization using SVM and CNN. The paper provides a quick overview of the many approaches and strategies for recognizing emotions, along with the Microsoft HoloLens (MHL), a mixed reality gadget that was used to observe emotion detection in augmented reality (AR). The problems were that sad emotion recognition needed improvement and that the MHL struggled when run multiple times; these were addressed in later papers. In [6], the support vector machine (SVM) was utilized by Vaishali M. Chavan and V. V. Gohokar to characterize five emotional states of human speech: anger, happiness, disappointment, surprise, and neutral emotion. The Danish Emotion Speech (DES) Database was used in this study. The accuracy obtained using the SVM classifier was 68% for the linear kernel, 60% for the polynomial kernel, 55.40% for the RBF kernel, and 60% for the sigmoid kernel. The total time taken using SVM decreased to 15 s for testing the record set and 43 s for testing the 88 values. According to article [7] from 2020, right face asymmetry is found to be better than left face asymmetry.
Face recognition still has many open issues. A later study found a remedy for the problem of diversity in face poses: subject-specific descriptors were employed to develop a three-dimensional pose-invariant method. Convolutional networks are used to handle a variety of problems, such as excessive makeup, pose, and expression. Recent advances in facial expression identification have motivated research in cognitive science and neuroscience in this field. Furthermore, advances in machine learning and computer vision have made emotion recognition considerably more precise and accessible to the general public. In 2020, Teoh et al. [8] explained how to use OpenCV in Python to build a face recognition system utilizing deep learning. It has been shown that a classifier trained on many face photos may reach an accuracy of 91.7% in image recognition and 86.7% in real-time video. The system's accuracy drops when the light intensity is lower.
Another study [9] noted that image processing toolkits like OpenCV, which include the PCA technique and other approaches for face recognition, are incorporated by software developers who want to add face recognition to their applications. Peng Peng, Ivens Portugal, Paulo Alencar, and Donald Cowan (2021) wrote a paper that provides a framework for facial recognition based on PCA which is efficient and easy to develop. A new approach, a dual-channel expression identification system based on emotional philosophy and machine learning theory, was suggested in [10]. The proposed algorithm's initial route uses the Gabor feature of the ROI as input, since features retrieved using CNNs overlook minor alterations in the active areas of facial expressions. To make the most of the detail features of the active facial expression area, that region is first segmented from the original face picture, and its features are extracted using the Gabor transform, with a focus on describing the local region's detail. Paper [11] examined the state of the art and key areas for future study in automated emotion representation, recognition, and prediction. It emphasized the results of emotion analysis and the most recent advances in multimodal emotion recognition, and discussed in detail the techniques for analyzing emotional content in texts, sounds, images, videos, and physiological data. This study demonstrated the viability of automated emotion analysis, which can be highly helpful for increasing the accuracy of computer/machine replies and enabling quicker prediction of the interlocutor's emotional state. Another approach, seen in [12], presented an intelligent music player that can understand music: an animated character performs a song while expressing emotions through facial expressions, and six different music kinds and moods are represented by the GUI's use of six different colors.
From the user's play history, the system also learns the user's preferences, present mood, and musical personality. Most of their contribution takes the form of a ground-breaking music player interface and the investigation of novel music player functionality. Paper [13] addressed manual playlist segmentation and song annotation: to save the time and labor costs of executing this procedure manually, the article introduces an algorithm that automatically creates an audio playlist based on a person's facial expressions. Libraries like OpenCV, EEL, and NumPy were utilized. The technique described in the study aims to cut down on both the system's total computational time and cost, and it also tries to improve the accuracy of the system design. The system's facial expression recognition module is validated by comparison against both user-dependent and user-independent datasets. Research [14] suggested an emotion-based music recommendation system that learns the user's emotion from signals received by wearable computing devices coupled with physiological sensors for galvanic skin response and photoplethysmography. Emotion is a fundamental aspect of human nature and essential to life in general. In this study, arousal and valence prediction using multi-channel physiological data is considered as a solution to the challenge of emotion identification. The goal of paper [15] is to create an Android application that will let users explore lengthy playlists with the least amount of effort possible. The idea behind this
project is to use image processing and a convolutional neural network to detect human emotions and then play music that amplifies those emotions. The system extracts the user's facial characteristics and expressions to ascertain their current mood. Once the emotion is identified, the user is shown a playlist of music appropriate for his or her mood using the YouTube API. The user's local music collection is first arranged according to the emotions each album evokes, frequently determined by considering the lyrics of the music. Similarly, paper [16] suggested an approach that extracts facial expressions and creates a playlist automatically, saving time and work compared to executing the process manually. To determine the user's mood or emotion, the image is taken with a camera and put through a series of processing steps. To identify faces and emotions, the proposed work employed the Viola-Jones method and a multiclass SVM: the Viola-Jones method identifies faces and extracts facial features. The developed approach needs less computation and processing time and lower memory overheads, and it avoids the cost of extra hardware like EEG devices or sensors. The suggested technique outperforms current approaches in terms of efficiency and real-time performance, and it not only assists the user but also ensures that the songs are systematically classified, reducing the user's work in making and managing playlists. Due to the similarity between classes and the ambiguity of the annotation, noisy label facial expression recognition (FER) is more difficult than conventional noisy label classification tasks. Recent research focuses on filtering out large-loss samples to address this issue. Paper [17] presented a novel feature-learning approach to deal with noisy labels.
Rather than concentrating on a subset of features that can be thought of as connected to the noisy labels (which leads to memorizing noisy samples), FER models should learn from the entire set of features that lead to the latent truth. An erasing attention consistency (EAC) strategy was therefore proposed to automatically suppress noisy samples throughout the training phase. However, the currently used algorithms are slow, use extra hardware (such as EEG devices and sensors), raise the system's overall cost, and have substantially lower accuracy. In order to save the time and work often expended on executing this procedure manually, this paper offers an algorithm that suggests a music playlist based on a person's facial expressions. The algorithm aims at cutting down both the system's overall cost and its calculation time, and it also tries to improve the accuracy of the system design (Table 1).
Table 1 Literature review summary

  1. Methodology: PCA was utilized to reduce the size of the dataset, and K-nearest neighbor was used to find the category of face pictures in the test set. Results: this technique can recognize faces under a variety of lighting conditions, facial expressions, etc. Accuracy: 92.50%
  2. Methodology: SVM categorizes the extracted characteristics, and the classified features display facial emotion. Results: this technique is useful for high-dimensional spaces and is memory efficient. Accuracy: 85.7%
  3. Methodology: the CNN technique is able to learn facial features and enhance facial emotion recognition. Results: this model can automatically extract and identify features from facial images, and a high recognition rate is attained. Accuracy: 91%
  4. Methodology: deep learning with Python's OpenCV library is used to create a face recognition system. Results: this model locates and authenticates the face in digital photographs or video frames to detect facial emotions. Accuracy: 92%
  5. Methodology: a Haar cascade classifier highlights the emotions of a person by extracting the facial features. Results: minimum characteristics are used in this method to reduce computational complexity. Accuracy: 92.8%
3 Proposed Method 3.1 Data Collection The data for emotion recognition come from the pre-trained model called DeepFace. For music recommendation, the dataset is the Kaggle Spotify dataset [18]. This dataset has 170,653 rows and 19 columns. The columns include features such as danceability, energy, valence, and loudness, which describe a song's mood. It also has a column called mode with values 0 and 1, where 0 means the mood of the song is sad and 1 means the mood of the song is happy. Feature selection is done by removing noisy and redundant features. Trap, Techno, Techhouse, Trance, Psytrance, Dark Trap, DnB (drums and bass), Hardstyle, Underground Rap, Trap Metal, Emo, Rap, RnB, Pop, and Hiphop are all featured in the CSV's full list of genres of the music playlist.
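The clustering step used for mood grouping can be sketched in a few lines of NumPy. The feature values below are synthetic stand-ins for the Spotify mood columns (danceability, energy, valence), not rows from the actual dataset:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain NumPy k-means; returns (centroids, labels)."""
    # Simple deterministic init: k evenly spaced data points
    centroids = X[np.linspace(0, len(X) - 1, k, dtype=int)].copy()
    for _ in range(iters):
        # Distance of every row to every centroid, then nearest assignment
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Toy rows of (danceability, energy, valence): a low-mood group and a
# high-mood group, standing in for sad (mode 0) and happy (mode 1) songs.
rng = np.random.default_rng(0)
sad = rng.normal([0.3, 0.2, 0.2], 0.05, (20, 3))
happy = rng.normal([0.8, 0.7, 0.9], 0.05, (20, 3))
X = np.vstack([sad, happy])
_, labels = kmeans(X, 2)
print(labels[:20].tolist())  # all one cluster id
print(labels[20:].tolist())  # all the other cluster id
```

A recommendation can then be drawn from the cluster matching the detected emotion.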
3.2 Methodology

The proposed method detects the emotion of the user using real-time video and then suggests a music playlist based on the emotion detected.
Hybrid Deep Face Music Recommendation Using Emotions
Fig. 1 Flowchart of emotion recognition
Emotion Recognition

The first part involves detecting the user's emotion using facial features. The flowchart for facial emotion recognition is shown in Fig. 1. The model is built in Google Colab in Python and involves image processing and feature extraction: an image is taken as input, and the face is detected using facial expression analysis, facial landmark detection, and facial emotion recognition, carried out with deep face and OpenCV.

Deep face (Fig. 2) is a Python framework for face recognition and facial attribute analysis which analyzes gender, emotion, age, and race. Detect, align, normalize, represent, and verify are the five phases of a contemporary face recognition pipeline. Deep face takes care of all these common steps in the background, so we do not have to know the process behind them in depth. With a single line of code, we may call its verification, search, or analysis functions. Deep face was used in the model to get the predictions from the given image. It is a 9-layer neural network with an accuracy of 97%. The deep learning facial recognition algorithm known as DeepFace was developed by a Facebook research team; it recognizes human faces in digital photos. The program was trained on four million photographs shared by Facebook users, with approximately 120 million connection weights.

OpenCV (Fig. 3) is an open-source computer vision library. Computer vision is one of the most fascinating and difficult tasks, and it serves as a link between
Fig. 2 Deep face architectural diagram [21]
computer software and graphics. Computer vision enables software to recognize and learn about the visuals in its environment, and OpenCV is the ideal module for such difficult tasks. OpenCV is a video and image processing library that may be used for picture and video analysis such as facial recognition, license plate reading, photo editing, sophisticated robotic vision, and more. As shown in Fig. 3, after feature extraction the matching process is done, which involves clustering, similarity detection, and classification. A cascade classifier is used to train the images, and the Haar cascade, a pre-trained model, is used to identify the face and eyes and to draw a box on the face identified in the image, as shown in Fig. 4. The model also adds the text of the dominant emotion detected in the image, as shown in Fig. 5.
Fig. 3 OpenCV architectural diagram
Fig. 4 Use of Haar cascade
Fig. 5 Image with the text
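The dominant-emotion labelling shown in Fig. 5 can be sketched as follows. The analysis step yields per-emotion scores; the label drawn on the image is simply the highest-scoring emotion. The score values below are invented for illustration, not taken from the paper's experiments.

```python
# Hypothetical per-emotion scores, as produced by the analysis step; values are invented.
emotion_scores = {
    "angry": 0.01, "disgust": 0.00, "fear": 0.02, "happy": 0.89,
    "sad": 0.03, "surprise": 0.01, "neutral": 0.04,
}

def dominant_emotion(scores):
    # The text overlaid on the image is the emotion with the highest score.
    return max(scores, key=scores.get)

label = dominant_emotion(emotion_scores)
```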
This model also has a real-time demo which uses the webcam: it captures a picture, detects the emotion of the captured picture, and adds the text of the mood to the image.

Music Recommendation

The second part of this project is to suggest a music playlist based on the emotion recognized in the previous step. For this, the Kaggle Spotify dataset is used. This dataset has 170,653 rows and 19 columns, of which 70% are used for training and 30% for testing. The columns include features such as danceability, energy, valence, and loudness, which describe the song's mood. It also has a column called mode with values 0 and 1, where 0 means the mood of the song is sad and 1 means it is happy, as shown in Fig. 6.

Although deep face recognizes seven emotions (happy, neutral, angry, disgust, fear, sad, and surprise), they are broadly divided into two main emotions, happy and sad, for convenience. This is done by grouping sad, angry, fear, and disgust as sad, and happy, surprise, and neutral as happy. K-means clustering is used to cluster the songs into two clusters, happy and sad. After K-means is performed on the dataset, a column called k-means is generated, which also has values of 0 and 1: 0 means the sad cluster, and 1 means the happy cluster. After dropping duplicates and adding the K-means column, the dataset is reduced to 133,638 rows × 20 columns. The model is trained and tested on the features danceability, energy, valence, and loudness, which help to define the mood of the song as sad or happy. The accuracy of testing and training the model for finding the mood of the song is recorded as 99%.
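The grouping of the seven deep face emotions into the two recommendation clusters described above can be written as a simple lookup (the function name is ours, not from the paper's code):

```python
# Seven recognized emotions collapsed into the two clusters used for
# recommendation: sad/angry/fear/disgust -> 0 (sad), happy/surprise/neutral -> 1 (happy).
EMOTION_TO_CLUSTER = {
    "sad": 0, "angry": 0, "fear": 0, "disgust": 0,
    "happy": 1, "surprise": 1, "neutral": 1,
}

def cluster_for(emotion):
    # Returns the value matched against the dataset's k-means/mode columns.
    return EMOTION_TO_CLUSTER[emotion]
```

Songs are then recommended from the rows whose mode and k-means columns both equal this cluster value.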
Fig. 6 Defining mood of song
3.3 Algorithm

This algorithm finds the emotion of a person and suggests a music playlist according to the mood detected, which helps to resolve the confusion people face while deciding which songs to listen to. The code is written in Python: OpenCV is used to find facial expressions, deep face is used to detect the emotion of the person, and a music playlist is then suggested using K-means clustering. YouTube links for the music are also provided for easy access.
4 Experimental Results and Analysis

4.1 Results

This study focused on developing an approach that could detect the emotion of a person and recommend a song according to the mood detected. The accuracy of this model is 99.9% in the training phase and 99.5% in the testing phase. The emotion of the person is identified by capturing a picture using a webcam and then using deep face to recognize the emotion, as shown in Fig. 7. Deep face is a 9-layer neural network trained on more than four million images. Based on the emotion recognized, the playlist is displayed as shown in Fig. 8. For convenience, the top 10 songs are displayed from the dataset where the mode and k-means columns match. This has an accuracy of 99%. In addition, a YouTube link for each song is displayed, as shown in Figs. 9 and 10.

Several papers that use support vector machines (SVM) and convolutional neural networks (CNN) for music recommendation using emotion were evaluated, and their results were compared against the training and testing accuracy of this model. The model used in this paper proves to be more accurate and efficient for music recommendation systems using emotion detection. Deep face is used for emotion recognition from images, and K-means is used for predicting songs based on the emotion recognized (Table 2).
Fig. 7 Emotion identified
Fig. 8 Playlist for happy emotion recognized
4.2 Conclusion

The proposed approach is effective in identifying a person's emotion by using a pre-trained model called deep face and suggesting a song based on the recognized emotion. This helps to reduce the confusion over what song to play, as the model generates songs based on the mood recognized. For convenience, the YouTube link for the song is also provided, so the person can simply click on the link and enjoy listening to the song. The song mood detection is done using K-means clustering. The accuracy recorded for determining the emotion of the person is 97%, and the accuracy of song mood detection is 99% for both testing and training.
Fig. 9 YouTube link displayed
Fig. 10 Accuracy of the model
Table 2 Training and testing accuracies of different models for music recommendation using emotion detection

| Algorithm | Training accuracy | Testing accuracy |
|---|---|---|
| SVM | 0.85 | 0.66 |
| CNN | 0.95 | 0.71 |
| Deep face and K-means clustering (current model) | 0.99 | 0.99 |
4.3 Future Scope

In this paper, for convenience, only two broad moods of songs are recognized, happy and sad. The project can be expanded into more clusters, one for each emotion detected by deep face, i.e., neutral, angry, surprise, fear, and disgust, in addition to the happy and sad used here.
References

1. Vemou K, Horvath A (2021) EDPS TechDispatch: facial emotion recognition, Issue 1. In: Zerdick T (ed). Publications Office. https://doi.org/10.2804/014217
2. Schäfer T, Sedlmeier P, Städtler C, Huron D (2013) Front Psychol 4:511. Published online 13 Aug 2013. https://doi.org/10.3389/fpsyg.2013.00511
3. Schedl M, Zamani H, Chen CW et al (2018) Current challenges and visions in music recommender systems research. Int J Multimed Info Retr 7:95–116. https://doi.org/10.1007/s13735-018-0154-2
4. Mellouk W, Handouzi W (2020) Facial emotion recognition using deep learning: review and insights. Procedia Comput Sci:689–694. https://doi.org/10.1016/j.procs.2020.07.101
5. Nannapaneni R (2019) Human emotion recognition using machine learning. Dell Technologies proven professional knowledge sharing, pp 1–24
6. Padmaja NJ, Rao RR (2019) Telugu based emotion recognition system using hybrid features. Int J Comput Appl 182(37):9–16. https://doi.org/10.5120/ijca2019918359
7. Li L, Mu X, Li S, Peng H et al (2020) A review of face recognition technology. IEEE Access 8:139110–139120. https://doi.org/10.1109/access.2020.3011028
8. Mehendale N (2020) Facial emotion recognition using convolutional neural networks (FERC). SN Appl Sci 2:446. https://doi.org/10.1007/s42452-020-2234-1
9. Teoh KH, Ismail RC, Naziri SZM, Hussin R, Isa MNM, Basir MSSM (2021) Face recognition and identification using deep learning approach. J Phys Conf Ser 1755(1):012006. https://doi.org/10.1088/1742-6596/1755/1/012006
10. Peng P, Portugal I, Alencar P, Cowan D (2021) A face recognition software framework based on principal component analysis. PLoS One:1–46. https://doi.org/10.1371/journal.pone.0254965
11. Marechal C et al (2019) Survey on AI-based multimodal methods for emotion detection, vol 11400. Springer, Cham. https://doi.org/10.1007/978-3-030-16272-6_11
12. Tan SFC, Fan X, Su H, Zhang J (2011) HeartPlayer: a smart music player involving emotion recognition, expression and recommendation. In: Advances in multimedia modeling, pp 483–485
13. Sadhvika CH, Abigna G, Srinivas Reddy P. Emotion based music recommendation system. Int J Emerg Technol Innov Res (JETIR) 9(8):2170–2175. ISSN: 2349-5162
14. Ayata D, Yaslan Y, Kamasak ME (2018) Emotion based music recommendation system using wearable physiological sensors. IEEE Trans Consum Electron 64:196–203. https://doi.org/10.1109/TCE.2018.2844736
15. Guidel A, Sapkota B, Sapkota K (2020) Music recommendation by facial analysis. engineeringsarokar
16. Preema JS, Rajashree SM, Savitri H (2018) Review on facial expression based music player. Int J Eng Res Technol (IJERT) 06(15). ISSN: 2278-0181
17. Zhang Y, Wang C, Ling X, Deng W (2022) Learn from all: erasing attention consistency for noisy label facial expression recognition
18. Clive Owen. DeepFace: face generation using deep learning. https://www.semanticscholar.org/paper/DeepFace-%3A-Face-Generation-using-Deep-Learning-Owen-Ronaldo/587f81ae87b42c18c565694c694439c65557d6d5/figure/0
Integrating ResNet18 and YOLOv4 for Pedestrian Detection

Nader Salam and T. Jemima Jebaseeli
Abstract Ever since the advent of the computer vision research field, object detection has been at its forefront. Among all object detection tasks, pedestrian detection has been the most challenging. The scope and use of pedestrian detection in surveillance, traffic monitoring, and autonomous cars such as the Teslas of the world make it an essential field of research. To this cause, the work done by the YOLO detection algorithms is significant, as they detect objects in a single stage, whereas traditional detectors did the same in multiple stages. But the training and computational cost of YOLO is very high. So in our paper, we propose a simpler YOLOv4 variant in which training is faster and inference times are greatly reduced through the integration of ResNet18. ResNet18 is used for feature extraction, and YOLOv4 is used for classification.

Keywords YOLO · ResNet · Visual geometry group · Feature map
1 Introduction

Object detection consists of object localization and image classification. Object localization scans for elements in a given image and encloses them with bounding boxes, while image classification predicts the class an object belongs to. According to statistics, about 7000 pedestrians die in road accidents in the European Union each year, accounting for around 27% of all traffic fatalities [4]. 70% of the world's population will live in cities by 2050 [5]. As a result, traffic in urban areas will worsen [6], and pedestrians will face more hazards on the road in the future. The employment of self-driving automobiles to reduce

N. Salam · T. Jemima Jebaseeli (B) Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected]
N. Salam e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_5
congestion and improve safety and effectiveness has been universally accepted. Better pedestrian detection applications in object detection address primarily two aspects: faster detection and more accurate detection. The first concerns how quickly objects can be detected in an image and how fast a model can be trained using minimum resources, while the second concerns how accurately the model predicts a given object. The proposed model adheres to YOLOv4 for input image processing, building the backbone, and feeding from the backbone to the head via the neck. The proposed system uses ResNet18 as the backbone of YOLOv4 to reduce the number of trainable parameters and to allow training on CPUs instead of high-end GPUs. The output is pedestrians in an image, each bounded by a box.

The rest of the paper is organized as follows: Sect. 2 discusses related work on pedestrian detection. Section 3 explains the architecture and design techniques of the implementation. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.
2 Related Works

The object recognition domain has extensively investigated pedestrian detection as a subdivision of generic object detection; it enables a range of human-centric activities such as autonomous driving [1–3], security monitoring [4–7], and so on. Despite tremendous progress in overcoming this difficult challenge over the preceding couple of decades, most pedestrian detectors fail catastrophically under situations such as wet or cloudy weather, poor illumination, and crowded backdrops [8]. To that aim, researchers are working to integrate novel sensory modalities and utilize their complementary features to increase accuracy and robustness [1, 9, 10]. Some recently proposed multispectral object tracking methods endeavor to exploit the information in multispectral data under various lighting environments by developing a global illumination-aware method that performs frame-based re-weighting of multispectral characteristics or detection results. However, supplying a global illumination label for the whole image has the obvious disadvantage of not being able to adapt to illumination conditions in the morning, evening, overnight, or in shadowed locations [11, 12].

Certain cutting-edge pedestrian detectors, such as CSP [13], RepLoss [14], ORCNN [15], and ALFNet [16], achieved error rates of around 4% MR−2 at 0.1 false positives per image for partly occluded pedestrians, but performance suffers greatly when severe anomalies are present. Additionally, such part-based detectors are not capable of dealing with the vast diversity of pedestrian positions and occlusions encountered in real-world settings. Conventional multiview detection algorithms estimate the targeted attributes using background removal and then spatially combine many views using conditional random fields (CRF) or mean-field inference [17–19]. These systems usually need the manual design of specific characteristics and are insufficiently reliable.
The main disadvantage of existing multispectral pedestrian detection methods is that they exclusively consider human-related multispectral characteristics within particular bounding boxes, rather than investigating the mutual data of various pedestrian occurrences to generate a sustained and selectively enforced pedestrian representation [11, 12]. Collisions with pedestrians are a major issue in transportation safety [20–22].

Traditional object detection methods started with the work of Dalal and Triggs on HOG-SVM [23] for pedestrian detection. These models used a sliding window with histograms for feature extraction and an SVM for classification. In 2010, DPM [24] was proposed, which had five detectors for the various parts of the human body; by combining the scores of all the detectors, it was able to detect humans in an image. From 2012, object detection models started transitioning from classic computer vision models to CNN models. The first of these is AlexNet [25], which has five convolution layers and three fully connected layers. In the subsequent years, VGG [26], GoogleNet [27], and ResNet [28] were implemented. From an error rate of 28.2% in 2010 with classic computer vision models, the error rate had decreased all the way to 3.57% by 2015 using ResNet.

The problem with using a CNN as an object detector is that too many inputs are fed to the model when the image is cropped using the sliding window method. To overcome this problem, only interesting regions of an image are considered; these are called region proposals. By making use of region proposals, models like RCNN [29], fast RCNN [30], and faster RCNN [31] were developed, which achieved good performance. Mask RCNN [32] extended faster RCNN by adding a branch for mask prediction. All the above architectures are multistage detection models. By 2015, researchers started developing single-stage object detection algorithms.
Models like YOLO [33] and SSD [34] were developed as single-stage detectors. Over the years, many YOLO versions [35, 36] were released by Redmon and subsequently by others.
3 Proposed System

The proposed model for pedestrian detection is organized as a four-phase pipeline. Figure 1 depicts the pipeline of the proposed model.
3.1 Input

The input stage pairs each input image with an annotation text file accepted by the model. The image is pre-processed as per custom need by drawing rectangular boxes on the visible human subjects. Once saved, a text file corresponding to the image is generated. Figure 2 illustrates an image and the corresponding text file.
Fig. 1 Proposed system architecture (diagram of YOLOv4 modified [36])
Fig. 2 a Input image, b image text file
Each row in the input image's text file corresponds to one human detection. The columns of the text file are object_class (human in this case), the x and y coordinates of the box center, width, and height.
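As a sketch (not the authors' released code), one row of this annotation file can be parsed as follows; the sample values are invented for illustration:

```python
# Parse one row of the annotation text file described above:
# object_class, x-center, y-center, width, height.
def parse_label(line):
    parts = line.split()
    return {
        "object_class": int(parts[0]),
        "x": float(parts[1]),
        "y": float(parts[2]),
        "w": float(parts[3]),
        "h": float(parts[4]),
    }

box = parse_label("0 0.512 0.430 0.210 0.780")
```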
3.2 Backbone (ResNet18)

The task of the backbone phase is to extract features. The ResNet model we use has 17 convolution layers and 1 fully connected layer, as shown in Fig. 3. ResNet was introduced to solve the problem of diminishing accuracy that results from stacking many convolution layers, which is caused by the vanishing gradient problem (VGP). VGP arises because the partial derivatives with respect to the parameters of a deep neural network become exponentially smaller, resulting in insignificant updates to those parameters. Deep neural networks normally make use of sigmoid or hyperbolic tangent activations, which cause VGP through backpropagation and the chain rule. ResNet adds skip connections, by which information bypasses layers: a linear combination is performed and the ReLU activation function is applied to compute the activation of any given layer. If the activation of a given layer is smaller than the activations from other layers, adding the activations from the other layers makes the activation bigger. The VGP can be mitigated in this way by adding skip connections. The ResNet18 model has only 2 pooling layers, and all convolution filters are of size 3 × 3. There are only 11 million trainable parameters, which is why we selected this model as the backbone of our YOLOv4 model instead of DarkNet53, which has 53 layers and 40 million parameters (and thus a high computational cost while training).
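The skip connection described above can be sketched in miniature: the block output F(x) is added back to the input x before the ReLU, so the activation survives even when F(x) is tiny. Plain Python lists stand in for feature maps here; this is an illustration of the idea, not ResNet's actual layers.

```python
def relu(v):
    # Element-wise ReLU over a vector standing in for a feature map.
    return [max(0.0, a) for a in v]

def residual_block(x, f):
    # f computes F(x), the output of the block's conv layers;
    # the skip path adds x back before the activation.
    fx = f(x)
    return relu([a + b for a, b in zip(x, fx)])

# With F(x) = 0 (a degenerate block), the input simply passes through the ReLU.
out = residual_block([1.0, -2.0, 3.0], lambda v: [0.0] * len(v))
```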
Fig. 3 ResNet18 model
To increase the accuracy of the model, the BoS techniques used for the backbone are the Mish activation and cross stage partial connections (CSP).

Mish Activation Function
It provides better accuracy, lower loss, and smoother transitions compared to ReLU and leaky ReLU. The equations are as follows:

Mish(x) = x ∗ tanh(softplus(x))

softplus(x) = ln(1 + e^x)    (1)

tanh(x) = (e^x − e^−x) / (e^x + e^−x)    (2)

Combining the two equations, we get the Mish function:

Mish(x) = x ∗ (e^ln(1+e^x) − e^−ln(1+e^x)) / (e^ln(1+e^x) + e^−ln(1+e^x))

Cross Stage Partial Connections (CSP)
This technique is used to improve the learning capability of a CNN. The feature maps (FMs) from the base layer are split into two parts. One part proceeds through a dense block and then a transition layer, whereas the other part is concatenated with the first part's transition. This avoids duplicate gradient information, and the vanishing gradient problem is also avoided because, during backpropagation, the weights are updated along the two split paths. The depth is also decreased because the FM is split into two, and the computation cost is reduced because only one part is sent through the dense block.

BoF
To increase the accuracy of the model, the BoF techniques used for the backbone are CutMix, Mosaic, and DropBlock.

CutMix Augmentation
A random patch is cut from one training image and pasted into another: the cutout of one image is replaced by a patch from another image, together with the second image's ground-truth labeling. This is useful because the algorithm can then predict a human even if parts of the human are not visible in the image.

Mosaic Augmentation
Four training images are combined into one image at different ratios. This technique solves the problem of scale awareness during feature extraction: whether the scale is high or low, the algorithm will be able to detect the subject with high accuracy.
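The Mish activation in Eqs. (1) and (2) above translates directly into code; a minimal sketch:

```python
import math

def softplus(x):
    # softplus(x) = ln(1 + e^x), Eq. (1)
    return math.log(1.0 + math.exp(x))

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))
```

For large positive x, softplus(x) ≈ x and tanh saturates at 1, so Mish behaves like the identity; for negative x it decays smoothly toward zero instead of clipping hard like ReLU.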
DropBlock
This technique works much like dropout regularization, except that instead of dropping a single pixel, it randomly drops contiguous regions from an FM. It takes the parameters block_size and γ: block_size is the size of the region to be dropped, and γ controls how many activation units are dropped.
3.3 Neck

The neck phase collects the FMs from different stages, combines them, and feeds them to the head. A pyramid structure is adopted within the backbone, neck, and head to detect the subject at different scales. Essentially, the head is fed input containing spatial information obtained from the hierarchy of the backbone and semantic information from the hierarchy of the neck. Below, we look into the techniques used in the design of the neck.

Feature Pyramid Network (FPN)
To predict subjects at different scales, FPN adds the up-samples from the previous layer to the corresponding neighbor layer, as shown in Fig. 4. To avoid artifacts of the up-sampling, the result is then passed through a 3 × 3 convolution filter, which produces an FM to be passed to the head.

Modified Path Aggregation Network
The problem with dense layers is that as we move toward the right of the network, we are not able to fine-tune localized parameters, and the prediction will be lost. An FPN takes the information from neighboring layers and combines it. PAN augments a bottom-up path, which makes information from the lower layers available at the top. Instead of the FPN traversing hundreds of layers, PAN uses a skip path to reach the top layer of the neighboring layer. Thus, we get fine-tuned parameters in the top layer, and predictions are not lost. The other issue that arises with FPN is duplication of predicted classes, as info from
Fig. 4 FPN image taken from original FPN source paper [37]
all FMs is not taken into account. PAN concatenates FMs from all the layers, and this removes duplicates.

Head
The head performs all the tasks performed in YOLOv3. YOLOv3 creates bounding boxes with the coordinates (x, y, w, h) and also produces a confidence score for the class. The anchor-based technique is the basis of the YOLOv3 network. We now go into the design techniques for the head block in depth.

BoS
The BoS for the head consists of Mish activation, the modified path aggregation network (PAN), the modified spatial attention module (SAM), and DIoU-NMS.

Modified SAM
The original SAM paper proposes applying max and average pooling to all the input FMs and splitting them into two FMs. The output is fed as input to a convolution layer, and sigmoid activation is applied. The modified SAM in YOLOv4 removes the pooling layers.

DIoU-NMS
Non-max suppression keeps the bounding box with the best confidence score when multiple bounding boxes predict the same object. IoU is computed from the intersection and union of the bounding boxes predicting the same object. Distance IoU (DIoU) takes the center points of the ground-truth bounding box and the predicted bounding box and removes redundant boxes. The penalty term is given by

R_DIoU = ρ²(b, b^gt) / c²

where ρ is the Euclidean distance, b and b^gt are the central points of the bounding box B and the ground truth B^gt, and c is the diagonal length of the smallest box enclosing B and B^gt.

BoF
The BoF techniques used for the head (as mentioned in the original paper) to increase the accuracy of the model are Mosaic, DropBlock, CIoU, CmBN, and SAT.

CIoU
Complete IoU loss reduces the cost through weight adjustment by increasing the overlap of B and B^gt, as in the case of DIoU. It also decreases the central point distance between b and b^gt.
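The DIoU penalty term described above, ρ²(b, b^gt)/c², can be sketched as follows for axis-aligned boxes given as (x-center, y-center, width, height); this is an illustrative implementation, not the authors' code:

```python
def diou_penalty(box, gt):
    # Squared Euclidean distance between the two box centers.
    rho2 = (box[0] - gt[0]) ** 2 + (box[1] - gt[1]) ** 2
    # Corners of the smallest box enclosing both B and B^gt.
    x1 = min(box[0] - box[2] / 2, gt[0] - gt[2] / 2)
    y1 = min(box[1] - box[3] / 2, gt[1] - gt[3] / 2)
    x2 = max(box[0] + box[2] / 2, gt[0] + gt[2] / 2)
    y2 = max(box[1] + box[3] / 2, gt[1] + gt[3] / 2)
    # Squared diagonal length of the enclosing box.
    c2 = (x2 - x1) ** 2 + (y2 - y1) ** 2
    return rho2 / c2
```

The penalty is 0 when the centers coincide and grows as the predicted box drifts away from the ground truth, which is what lets DIoU-NMS prefer the closer of two overlapping predictions.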
CmBN
Batch normalization was introduced so that when the weights or network parameters change during training epochs, the network activation variance and mean do not shift. The authors of CBN (cross-iteration batch normalization) leveraged the statistics from previous iterations to compute the mean and variance from the current as well as the previous iterations. CmBN is a modified version of CBN that collects statistics only from the mini-batches within a single batch.

SAT
Neural networks can classify an image wrongly if a slight bit of noise is added to it, so we need to train on both the ground-truth image and the perturbed image. SAT (self-adversarial training) is an augmentation technique that adds noise to the image such that the detector's performance is made worse; the model is then trained on this new image with the same boundary box and class label.
4 Experiments and Discussion

Data Collection
The proposed model is trained on a custom dataset from Kaggle containing 559 images. The link to the data is provided below.
https://www.kaggle.com/datasets/constantinwerner/human-detection-dataset

Data Preparation
We used LabelImg, a free image annotator which allows us to draw a bounding box over the object that we want to detect and classify. This process was explained earlier in the description of the input layer.

Training
We use Google Colab to train on the images. We make sure CUDA is enabled so that YOLOv4 uses the GPU instead of the CPU. We define the batch, learning rate, kernel size, and channels as a tuple. We first train the image dataset on the ResNet18 backbone; the output is then fed to the detector model, and the two-stage detection is completed. We obtain training weights every 1000 iterations in Google Colab; if training stops for some reason, we can resume from the last saved weights. We then download these weights from the Google Drive folder and use them for testing on our local machine with CUDA enabled. Figure 5 shows the person detection.

We trained our model for 3 h in Google Colab. Training YOLOv4 with CSPDarkNet53 took 8 h, because DarkNet53 has close to 4 times more trainable parameters than the ResNet18 used by our model. We reached 2150 iterations, and the last weight file we downloaded was yolo_person_last_weights.
Fig. 5 Person detection completed on test image
Evaluation
Evaluation is done primarily for the following parameters: mean average precision (mAP), recall, and F1-score, computed from true positives (total humans detected), false positives (total non-humans detected), and false negatives (humans not detected).

Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
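The evaluation metrics above can be computed directly from the raw counts; the example counts below are made up for illustration and are not the paper's measured values:

```python
def precision(tp, fp):
    # Fraction of detections that are actual humans.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual humans that were detected.
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Example with invented counts: 78 humans detected, 22 false alarms, 22 humans missed.
p, r = precision(78, 22), recall(78, 22)
```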
When we tested this weight with our dataset, we got a mAP of 77.67%, compared to the 83.6% achieved by the original YOLOv4 model. The mAP difference is a trade-off we can accept considering the computational cost we have saved; refer to Table 1. Also, we need far fewer resources: a CPU suffices to train this model instead of a GPU like the Tesla P4 provided in Google Colab (Fig. 6).
Table 1 Comparison table for DarkNet YOLOv4 and ResNet18 YOLOv4

| Model | Training time (in h) | Precision (%) | Recall (%) | F1-score (%) | Trainable parameters | mAP (%) |
|---|---|---|---|---|---|---|
| YOLOv4 with DarkNet53 | 8 | 85 | 78 | 81.34 | 40M | 84.6 |
| YOLOv4 with ResNet18 | 3 | 78 | 78 | 78 | 11M | 77.67 |
Fig. 6 mAP for the model based on the training dataset
5 Conclusions

In this paper, we presented a way to detect pedestrians using a modified version of YOLOv4. This was done to obtain a model that is easily trainable, has a lower computational cost, and does not need high-end resources (GPUs) to complete training. We also needed to be sure that implementing such a model did not cause serious trade-offs in efficiency and accuracy, and we made sure our model produced accuracy almost comparable to the original YOLOv4 model. The future of object detection is very promising: in the first half of 2022 alone, two more YOLO models were released, and the present YOLO version is v7, with more single-stage detection algorithms being worked on every year. On our side, we plan to see whether ResNet101 or ResNet152 can be put into the backbone block of YOLOv6, which currently uses EfficientRep as its backbone.
References

1. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3354–3361
2. Geronimo D, Lopez AM, Sappa AD, Graf T (2009) Survey of pedestrian detection for advanced driver assistance systems. IEEE Trans Pattern Anal Mach Intell 32(7):1239–1258
3. Benenson R, Omran M, Hosang J, Schiele B (2014) Ten years of pedestrian detection, what have we learned? In: Proceedings of the European conference on computer vision (ECCV), pp 613–627
4. Ojha S, Sakhare S (2015) Image processing techniques for object tracking in video surveillance—a survey. In: Proceedings of the IEEE international conference on pervasive computing (ICPC), pp 1–6
5. Kumaran SK, Dogra DP, Roy PP (2019) Anomaly detection in road traffic using visual surveillance: a survey. arXiv:1901.08292
6. Wang X, Wang M, Li W (2013) Scene-specific pedestrian detection for static video surveillance. IEEE Trans Pattern Anal Mach Intell 36(2):361–374
Integrating ResNet18 and YOLOv4 for Pedestrian Detection
7. Li X, Ye M, Liu Y, Zhang F, Liu D, Tang S (2017) Accurate object detection using memory-based models in surveillance scenes. Pattern Recognit 67:73–84 8. Cao J, Pang Y, Xie J, Khan FS, Shao L (2020) From handcrafted to deep features for pedestrian detection: a survey. arXiv:2010.00456 9. Ha Q, Watanabe K, Karasawa T, Ushiku Y, Harada T (2017) MFNet: towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5108–5115 10. Shivakumar SS, Rodrigues N, Zhou A, Miller ID, Kumar V, Taylor CJ (2020) PST900: RGB-thermal calibration, dataset and segmentation network. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 9441–9447 11. Li C, Song D, Tong R, Tang M (2019) Illumination-aware faster RCNN for robust multispectral pedestrian detection. Pattern Recognit 85:161–171 12. Guan D, Cao Y, Yang J, Cao Y, Yang MY (2019) Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Inf Fusion 50:148–157 13. Liu W, Liao S, Ren W, Hu W, Yu Y (2019) High-level semantic feature detection: a new perspective for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5187–5196 14. Wang X, Xiao T, Jiang Y, Shao S, Sun J, Shen C (2018) Repulsion loss: detecting pedestrians in a crowd. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7774–7783 15. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware R-CNN: detecting pedestrians in a crowd. In: Proceedings of the European conference on computer vision, pp 637–653 16. Liu W, Liao S, Hu W, Liang X, Chen X (2018) Learning efficient single-stage pedestrian detectors by asymptotic localization fitting. In: Proceedings of the European conference on computer vision, pp 618–634 17. 
Fleuret F, Berclaz J, Lengagne R, Fua P (2007) Multicamera people tracking with a probabilistic occupancy map. IEEE Trans Pattern Anal Mach Intell 30(2):267–282 18. Alahi LJ, Boursier Y, Vandergheynst P (2011) Sparsity driven people localization with a heterogeneous network of cameras. J Math Imag Vis 41(1):39–58 19. Roig G, Boix X, Shitrit HB, Fua P (2011) Conditional random fields for multi-camera object detection. In: 2011 international conference on computer vision. IEEE, pp 563–570 20. Li G, Liao Y, Guo Q, Shen C, Lai W (2021) Traffic crash characteristics in Shenzhen, China from 2014 to 2016. Int J Environ Res Public Health 18:1176 21. Grassi P, Frolov V, Puente Leon F (2011) Information fusion to detect and classify pedestrians using invariant features. Inf Fusion 12:284–292 22. Song X, Shao X, Zhang Q, Shibasaki R, Zhao H, Zha H (2013) A novel dynamic model for multiple pedestrians tracking in extremely crowded scenarios. Inf Fusion 14(3):301–310 23. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893 24. Felzenszwalb P, Girshick R, Mcallester D, Ramanan D (2013) Visual object detection with deformable part models. Commun ACM 56:97–105 25. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25 26. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition 27. Szegedy C et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9 28. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition 29. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation 30. 
Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks 31. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN
N. Salam and T. Jemima Jebaseeli
32. Redmon J, Divvala S, Girshick R, Farhadi A (2015) You only look once: unified, real-time object detection 33. Liu W et al (2016) SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. ECCV 2016. Lecture notes in computer science, vol 9905. Springer, Cham 34. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger, 6517–6525 35. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement 36. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection 37. Lin T-Y, Dollar P, Girshick R (2017) Feature pyramid networks for object detection
Image Restoration from Weather Degraded Images Using Markov Random Field Ajoy Dey, Avra Ghosh, and Sheli Sinha Chaudhuri
Abstract Images often suffer degradation from foggy weather conditions. Scene recovery from such degraded images poses a significant challenge and is often a time-consuming task. Single image haze removal remains one of the most challenging tasks in image processing. Two of the most efficient methods used for image recovery are dark channel prior and colour attenuation prior. Both these methods require an extra filter, such as a guided filter, for refining the estimated transmission map. This paper proposes a single image dehazing method that combines both dark channel prior and colour attenuation prior and uses the Markov random field for refining the transmission map, thus avoiding the use of any extra filter. Keywords Dehazing · Parallel processing · Markov random field · Dark channel prior · Colour attenuation prior
A. Dey · A. Ghosh (B) · S. S. Chaudhuri
Jadavpur University, Kolkata, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_6

1 Introduction Outdoor images are greatly degraded by atmospheric turbidity. Conditions such as fog, smoke, and the presence of water droplets or particles often contribute to such degradation of scene radiance. Airlight (light scattered by atmospheric particles) also interferes with the amount of incoming light, thus degrading the image quality. The degraded image often suffers from low contrast values and loses colour saturation (see Fig. 1). The amount of degradation is largely dependent on the distance between the object and the camera, since the amount of scattering is directly proportional to the distance. The fact that the degradation amount is distance dependent is often utilised in the restoration of such degraded images. Single image dehazing poses a significant challenge since the amount of haze varies drastically with position. It is indeed a difficult task to model such erratic changes. Various methods have been employed to improve the quality of a weather
Fig. 1 Degraded image formation process
degraded image, which include contrast-based, depth-based, and saturation-based methods. The usage of multiple images to gain the necessary knowledge about the degraded image is amongst the very first methods for dehazing. This method is extremely effective but suffers from computational problems. First, multiple ground truths are required, which might not be available in all cases. Secondly, comparing the degraded image with the ground truths consumes a huge amount of time if the image is large. Contrast enhancement methods can also be utilised for the dehazing purpose. Since the degraded image suffers from low contrast, contrast enhancement is often employed to improve the image quality. This method produces poor results, hence it is not used in practical cases. Deep learning can also be used for image restoration. Deep-learning-based models such as Dehazenet [1] can estimate the transmission map to a high precision, giving significant results. Lastly, priors such as dark channel prior [2] provide great results in single image dehazing problems. Filters such as the guided filter [3] are used to fine-tune the approximated transmittance map for better image recovery. Instead of using a filter, Markov random field-based energy minimization [4] can be used to refine the estimated transmittance map.
2 Literature Survey Wang et al. [5] did extensive research on the advances in dehazing algorithms developed to date. The authors also provided a qualitative comparison of different algorithms on the basis of various image parameters such as PSNR, SSIM, BRISQUE [6], and many more. Tan [7] developed an automatic method to facilitate dehazing using a single image. The author developed the algorithm taking two important observations into consideration. First, a comparison between clear and weather degraded images shows that clear images have relatively higher contrast values. Secondly, airlight distribution tends to be smooth over distance. Surface shading has no correlation with the transmission map. Fattal [8] used this observation to get an estimate of the surface albedo, developing a novel method using Markov random field and independent component analysis to accurately estimate the depth information. He et al. [2] proposed a single image dehazing technique based upon dark channel prior, but its time complexity is high. He et al. [3] presented a guided filter to calculate the transmission coefficient, which improves the time complexity over the previous model. Kratz et al. [9] processed the degraded image using a factorial Markov random field, which provided an estimate of the depth information of the degraded image. Ghosh et al. [10] provided an efficient hardware implementation of dark channel prior which can be used in real time. The authors proposed a parallel architecture to calculate the airlight and transmission coefficient. The same authors [11] presented a Raspberry Pi-based single image dehazing machine, which can process and dehaze the image locally.
3 Background The most common model used to describe a weather degraded image is the scattering model given by Eq. 1:

I(x) = J(x)\,t(x) + A\,(1 - t(x)) \quad (1)
here, the degraded image is represented by I(x), the actual scene radiance is denoted by J(x), t(x) represents the transmission of the medium, and A represents the atmospheric light. Equation 1 contains only one known value, I(x); the other three values are unknown, so in order to recover the scene radiance, some sort of approximation is needed. In 2009, He et al. [2] proposed a novel single image dehazing method based on dark channel prior. A patch-based division of a haze-free image contains some pixels with very low intensity values in one or more colour channels. DCP is based on the
above observation. It provides an effective approximation of the atmospheric light. The dark channel of an image is obtained by Eq. 2:

J^{dark}(x) = \min_{c \in \{R,G,B\}} \left( \min_{y \in \Omega(x)} J^c(y) \right) \quad (2)
here, the dark channel is denoted as J^{dark}, c represents the three colour channels, and Ω(x) denotes the local patch. With the use of the dark channel, He et al. derived the following equation for transmission approximation (see Eq. 3):

t(x) = 1 - w \min_{c \in \{R,G,B\}} \left( \min_{y \in \Omega(x)} \frac{I^c(y)}{A^c} \right) \quad (3)
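As an illustration, the dark-channel and transmission estimates of Eqs. 2 and 3 can be sketched in NumPy. The patch size, top-pixel fraction, and value of w below are illustrative choices, not the authors' exact settings:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Eq. 2: per-pixel min over colour channels, then min over a local patch.
    img: H x W x 3 float array in [0, 1]."""
    min_rgb = img.min(axis=2)                     # min over c in {R, G, B}
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    h, w = min_rgb.shape
    dark = np.empty_like(min_rgb)
    for i in range(h):                            # min over the patch Omega(x)
        for j in range(w):
            dark[i, j] = padded[i:i + patch, j:j + patch].min()
    return dark

def estimate_airlight(img, dark, top=0.001):
    """A^c: mean colour of the brightest 0.1% pixels of the dark channel."""
    n = max(1, int(top * dark.size))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def estimate_transmission(img, A, patch=15, w=0.95):
    """Eq. 3: t(x) = 1 - w * dark_channel(I / A)."""
    return 1.0 - w * dark_channel(img / A, patch)
```

The naive double loop keeps the sketch readable; a real implementation would use a grayscale erosion (minimum filter) instead.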
here, w (0 < w < 1) is a parameter which can be adjusted according to the amount of haze to be considered, c denotes the three colour channels, and Ω(x) represents the local patch. I^c denotes a colour channel of the weather degraded image, and A^c denotes the airlight, approximated from the top 0.1% brightest pixels in the dark channel. If the atmosphere is considered homogeneous, the transmission coefficient of a medium is given by Eq. 4:

t(x) = e^{-\beta d(x)} \quad (4)

where the transmission coefficient of the atmosphere is given as t(x), β denotes the medium's scattering factor, and d(x) denotes the scene depth. Equation 4 clearly shows that the attenuation is directly dependent on the scene depth. Zhu et al. [12] provided a novel dehazing method, known as colour attenuation prior, which is based on the linear dependence of scene depth on the saturation and brightness of the hazy image (see Eq. 5):

d(x) = \theta_0 + \theta_1 S(x) + \theta_2 V(x) \quad (5)
where d(x) denotes the scene depth, S(x) denotes the saturation values, and V(x) denotes the brightness values of an image. θ0, θ1, and θ2 are the coefficients of the linear equation, whose values were approximated by the authors using a machine learning model and actual data. The values so obtained are θ0 = 0.12, θ1 = 0.96, θ2 = −0.78. From the above discussed works, it is clearly visible that dark channel prior gives a direct approximation of airlight from the dark channel, whereas colour attenuation prior gives a direct approximation of the transmission map. This paper utilises this and devises a method to perform parallel approximation of airlight and transmission map. The above discussed works, dark channel prior and colour attenuation prior, both use some kind of filter, such as a guided filter, for refining the approximated transmission map. This paper also aims to overcome the problem of using a filter to refine the transmission map. This paper proposes a parallel calculation of airlight and transmission map, followed by transmission map refinement using a Markov random field, inspired by Hao et al. [4] (refer Fig. 2).
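The colour attenuation prior depth estimate (Eqs. 4 and 5) admits an equally small sketch; the scattering factor beta here is an assumed free parameter, not a value fixed by the text:

```python
import numpy as np

# Colour attenuation prior (Eq. 5): depth is a linear function of the
# saturation S(x) and brightness V(x). The theta values are the coefficients
# reported by the authors (0.12, 0.96, -0.78).
THETA0, THETA1, THETA2 = 0.12, 0.96, -0.78

def cap_depth(img):
    """img: H x W x 3 RGB float array in [0, 1]; returns d(x) per Eq. 5."""
    v = img.max(axis=2)                           # HSV value (brightness)
    c_min = img.min(axis=2)
    s = np.where(v > 0, (v - c_min) / np.maximum(v, 1e-12), 0.0)  # saturation
    return THETA0 + THETA1 * s + THETA2 * v

def cap_transmission(depth, beta=1.0):
    """Eq. 4: t(x) = exp(-beta * d(x))."""
    return np.exp(-beta * depth)
```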
Fig. 2 Flow chart of the proposed dehazing algorithm: the input hazy RGB image is processed along two parallel branches. One branch computes the dark channel of the image and calculates the airlight from its top brightest pixels; the other converts the image into the HSV model, calculates the scene depth and then the transmission coefficient, which is refined using a Markov random field. The two branches combine for scene radiance recovery
After the airlight and transmission have been estimated, the scene radiance is given by Eq. 6:

J(x) = \frac{I(x) - A^c}{\max(t(x), t_0)} + A^c \quad (6)

where J(x) is the recovered haze-free scene radiance, I(x) is the hazy image, t(x) is the refined transmission coefficient, t_0 is a lower threshold on the transmission, and A^c is the airlight calculated from the dark channel.
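The recovery step of Eq. 6 is a one-liner in NumPy; t0 = 0.1 below is an illustrative threshold:

```python
import numpy as np

def recover_radiance(I, t, A, t0=0.1):
    """Eq. 6: J(x) = (I(x) - A) / max(t(x), t0) + A.
    I: H x W x 3 hazy image, t: H x W refined transmission, A: airlight (3,)."""
    t_clipped = np.maximum(t, t0)[..., None]      # broadcast over channels
    return (I - A) / t_clipped + A
```

Inverting the scattering model of Eq. 1 this way recovers J exactly wherever the transmission estimate is above the threshold.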
4 Proposed Methodology Markov random fields are often utilised in image processing to segment an image. Generally, a Markov random field model labels each pixel of an image F with fixed labels. The set of unique labels Ω is dependent on a defined neighbourhood (4-neighbour or 8-neighbour) of each pixel. The labelling set Ω can be effectively estimated by proper minimization of the cost function:

V(\Omega, F) = V_1(\Omega, F) + V_2(\Omega) \quad (7)

where V(Ω, F) represents the final energy state, V_1(Ω, F) represents the deviation from labelling, and V_2(Ω) represents the spatial dependencies of pixels on neighbours. Here, the input image field is the transmittance map obtained from the HSV model [12], and the set Ω is the labelling obtained by clustering the transmittance map. The estimated transmittance map can be effectively refined by performing proper minimization of the cost function V(Ω, F). The transmittance map obtained by employing the HSV model [12] is pixel-based, facilitating easy labelling. Let us consider a variable f(p) representing each pixel of the transmittance map, i.e. f ∈ t (the transmission map obtained). Let f_L(p) denote the labelling of each pixel p and f_i(p) represent the actual pixel intensity. Then, the cost function is defined as

V(f_L) = \sum_{p \in t} V_p(f_L(p), f_i(p)) + \sum_{(p,q) \in N} V_{p,q}(f_L(p), f_L(q)) \quad (8)

where q represents the neighbours of p and N represents the 4-neighbourhood. The term V_p(f_L(p), f_i(p)) signifies the cost of associating a label with pixel p. The function is given in the form of the following equation:

V_p(f_L(p), f_i(p)) = \omega \times \log((f_L(p) - f_i(p))^2 + 1) \quad (9)

where ω is a weighing parameter. The next term in Eq. (8) is the smoothing term, which gives the cost of assigning different labels to two neighbouring pixels. It is given as follows:

V_{p,q}(f_L(p), f_L(q)) = \min(\omega_1 \times (f_L(p) - f_L(q))^2, \log((f_L(p) - f_L(q))^2 + 1)) \quad (10)

where ω_1 is a weighing parameter. To solve the Markov random field minimization problem, graph cut methods have been used as referred in [13], which are faster when compared to simulated annealing [14]. The initial labelling of the transmittance map is done using k-means clustering.
5 Experimental Results The experiment is processed on a device with 6 GB RAM, an Intel Core i3-3217U CPU at 1.80 GHz, and the Windows 10 Home Single Language operating system. The initial labelling of the transmittance map was done considering 24 labels, w = 0.2, and w1 = 10. Table 1 provides an extensive comparison of recovered images with ground truth based on image parameters. Table 2 gives a visual depiction of the results obtained. The qualitative performance of this algorithm has been compared with previous work by others. Comparison has been done with the results obtained from dark channel prior [2], colour attenuation prior [12], and Meng's method [15]. The results are provided in Table 3.

Table 1 Comparison of reference image quality parameters for images

Serial No.  Image                 SSIM    PSNR     MSE       Cor.
1           Merry go round        0.4828  27.7387  109.4477  0.4652
2           Children's slide      0.6454  27.8390  106.9483  0.6861
3           Bench                 0.4010  27.9887  103.3239  0.4741
4           Park crossroad        0.4766  27.8945  105.5906  0.5855
5           Tree                  0.4800  27.8582  106.4780  0.6615
6           Swing                 0.6241  28.1162  100.3356  0.4784
7           Children's cage       0.5627  27.7121  110.1197  0.5856
8           Children's playwheel  0.5788  27.8845  105.8351  0.6218
9           Slide                 0.5159  28.0380  102.1581  0.3144
10          Flower graffiti       0.3888  27.7321  109.6136  0.3494
11          Road                  0.5778  28.1170  100.3161  0.6541
12          Tree                  0.4813  27.6406  111.9479  0.6521
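For reference, the full-reference metrics reported in Table 1 can be computed as in the sketch below (SSIM is omitted since it needs windowed statistics; the function names are ours):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return np.mean((a.astype(float) - b.astype(float)) ** 2)

def psnr(a, b, peak=255.0):
    """PSNR in dB for 8-bit images; undefined for identical images (MSE = 0)."""
    return 10.0 * np.log10(peak ** 2 / mse(a, b))

def correlation(a, b):
    """Pearson correlation coefficient between the flattened images."""
    return np.corrcoef(a.ravel().astype(float), b.ravel().astype(float))[0, 1]
```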
6 Conclusion and Future Work Single image haze removal has been performed in this paper. A parallel calculation of airlight and transmission map based on dark channel prior and colour attenuation prior has been proposed. Refinement of the transmission map is carried out by incorporating a Markov random field model. The results obtained suggest this method is quite effective in haze removal. One salient feature of this method is that the image colour distribution is retained. The comparison of qualitative and quantitative results with other existing methods also shows competitive performance. Statistical parametric study (MSE, SSIM, PSNR, correlation, FPS) shows the objective effectiveness of the proposed model.
Table 2 Experimental results for different photos using this method (hazy and dehazed image pairs for: 1 Merry go round, 2 Children's slide, 3 Bench, 4 Park crossroad, 5 Tree, 6 Swing, 7 Children's cage, 8 Children's playwheel, 9 Slide)
Table 3 Qualitative comparison with different algorithms (hazy image, DCP, CAP, Meng, and our method, for: Merry go round, Slide, Bench, Children's cage, Tree, Road, Flower graffiti, Swing)
In the future, we aim to expand our method into a real-time application by improving the time complexity of the method. This work has been carried out in the Digital Control and Image Processing Lab, ETCE Department, Jadavpur University.
References 1. Cai B, Xu X, Jia K, Qing C, Tao D (2016) Dehazenet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198 2. He K, Sun J, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353 3. He K, Sun J, Tang X (2013) Guided image filtering. IEEE Trans Pattern Anal Mach Intell 35(6):1397–1409 4. Hao X, Tan Y, Wang W, Wang G (2020) Image dehazing by incorporating Markov random field with dark channel prior. J Ocean Univ China 19(3):551–560 5. Wang W, Yuan X (2017) Recent advances in image dehazing. IEEE/CAA J Automatica Sinica 4(3):410–436 6. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708 7. Tan RT (2008) Visibility in bad weather from a single image. In: 2008 IEEE conference on computer vision and pattern recognition 8. Fattal R (2008) Single image dehazing. ACM SIGGRAPH 2008 papers on—SIGGRAPH'08 9. Kratz L, Nishino K (2009) Factorizing scene albedo and depth from a single foggy image. In: 2009 IEEE 12th international conference on computer vision 10. Ghosh A, Roy S, Chaudhuri SS (2020) Hardware implementation of image dehazing mechanism using Verilog HDL and parallel DCP. In: 2020 IEEE applied signal processing conference (ASPCON), pp 283–287 11. Ghosh A, Chaudhuri SS (2021) IoT based portable image dehazing machine. In: 2021 8th international conference on signal processing and integrated networks (SPIN), pp 31–35 12. Zhu Q, Mai J, Shao L (2015) A fast single image haze removal algorithm using color attenuation prior. IEEE Trans Image Process 24(11):3522–3533 13. Boykov Y, Veksler O, Zabih R (1999) Fast approximate energy minimization via graph cuts. In: Proceedings of the seventh IEEE international conference on computer vision, vol 1, pp 377–384 14. Jeng F-C, Woods JW (1990) Simulated annealing in compound Gaussian random fields (image processing). IEEE Trans Inf Theor 36(1):94–107 15. Meng G, Wang Y, Duan J, Xiang S, Pan C (2013) Efficient image dehazing with boundary constraint and contextual regularization. In: 2013 IEEE international conference on computer vision, pp 617–624
A Feature Representation Technique for Angular Margin Loss Asfak Ali and Sheli Sinha Chaudhuri
Abstract The most crucial task in deep learning is feature extraction and representation. How features are extracted and represented can lead to a variety of possible solutions to difficult issues in computer vision. Many studies have been conducted to extract features in an efficient manner; however, few studies have been conducted on feature representation. This work proposes a straightforward yet effective feature representation approach for the angular margin loss (Arc) feature. The representation approach makes it possible to detect similarities between distinct characteristics, which is particularly beneficial for many computer vision problems such as face recognition and object tracking. We offer what is perhaps the most comprehensive experimental evaluation of the most recent state-of-the-art face recognition algorithm, ArcFace, using three face recognition benchmarks. In most situations, our technique outperforms the previous state-of-the-art model in terms of accuracy and true positive rate. The code is available at https://github.com/asfakali/FRAML.git Keywords Feature extraction · Feature representation · ArcFace · Feature similarity
A. Ali (B) · S. S. Chaudhuri
Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_7

1 Introduction It is often easier to depict something in a picture than to explain it in words. Humans can gain more information from images than from words. The question now is whether a computer can achieve the same thing. The answer is "Yes": with the recent advancements in machine learning and deep learning, a computer can accomplish it; however, an image is usually expressed as a tensor or an array. In the same way that "a good graph is worth a thousand numbers", researchers have put their attention to finding the best way to represent a feature. The most common features
that can be extracted from images are color, texture, shape, edge, etc., using classical image processing tools. Wang et al. [1] proposed an information-based color feature representation approach that takes into consideration the connection between adjacent components of a standard color histogram, eliminating unnecessary information and representing it as a uniformly quantized color histogram. Ma et al. [2] suggested a textural feature extraction and representation approach based on Gabor functions, which are nothing more than Gaussian functions modulated by complex sinusoids. Shape features can be represented by different approaches such as shape signatures, polygonal approximation, spatial interrelation features, moments, scale-space methods, and shape transform domains. Shape features are helpful since they have a much shorter retrieval time than other traditional feature extraction approaches. However, shape signature and polygonal approximation representations may lead to noise elimination. The spatial interrelation feature representation technique can introduce some "dummy" points. The higher-order moments feature extraction method is very sensitive to noise. Shape transform domain techniques are scale-invariant and robust to noise, but they give poor results under occlusion. In the recent computer vision trend, the most robust feature extraction technique is deep feature extraction. Wu et al. [3] proposed a fully convolutional deep feature extractor network called feature fusion net (FFN) for person re-identification. It combines color histogram features in different color domains like RGB, HSV, YCbCr, Lab, etc., and different texture features like multi-scale and multi-orientation Gabor features into a very compact deep feature. The softmax loss function is applied to their model for person re-identification.
The person re-identification problem is essentially the task of finding the feature similarity between two deep features extracted by the model from two different persons; more generally, it is the task of finding the feature similarity between two objects. So the method can be useful for tasks like object tracking, face recognition, etc. Deng et al. [4] proposed an additive angular margin loss (ArcFace) to obtain highly discriminative features for face recognition. This paper proposes a feature representation approach for the Arc Feature and develops two methods to find the similarity between the features. The rest of the paper is organized as follows: Sect. 2 discusses the background of ArcFace, and Sect. 3 contains the proposed method. The experimental results and the conclusion are in Sects. 4 and 5, respectively.
2 Background Softmax loss is one of the most widely used classification losses in the machine learning and computer vision community, which is represented as

L_1 = -\frac{1}{N} \sum_{i=0}^{N-1} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=0}^{n-1} e^{W_j^T x_i + b_j}} \quad (1)
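A numerically stable NumPy sketch of Eq. 1 (the log-sum-exp trick used below is an implementation detail, not part of the original formulation):

```python
import numpy as np

def softmax_loss(X, W, b, y):
    """Eq. 1: mean cross-entropy over a batch.
    X: N x d features, W: d x n class weights, b: n biases, y: N labels."""
    logits = X @ W + b                               # W_j^T x_i + b_j
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()
```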
Fig. 1 Example of the LFW dataset: a, b are two different images of Aaron Peirsol, and c, d are images of Akhmed Zakayev
where x_i is the deep feature of the i-th sample and y_i is its class. N and n are the batch size and the number of classes. There is a performance gap in deep face recognition under large intra-class appearance variations because the softmax loss function does not optimize the feature embedding for higher similarity and diversity. To fix this issue, Deng et al. [4] proposed the additive angular margin loss, so that the nearest classes are more clearly separated. The additive angular margin loss is given by
N −1 1 es(cos(θ yi +m) ) log s(cos(θ j )) N i=0 es(cos(θ yi +m)) + n−1 j= j, j= y j e
(2)
where s is the radius of the hypersphere distribution of the embedding feature. An angle of θ j exists between each weight W j and the feature xi . s is used to re-scale the feature. As compared with the Arc loss, the softmax loss produces a more noticeable gap between the nearest classes, but it also produces increased ambiguity in feature embedding.
3 Proposed Technique From Sect. 1, we can see that there are different ways to represent any feature, and accordingly we will apply various methods to find the similarity of two features, i.e., objects. In this section, we will discuss the feature representation as a signal and different similarity functions for this feature representation technique. We have used four images of two different persons, Aaron Peirsol and Akhmed Zakayev, two of each, to explain our method. The images are taken from the LFW [5] dataset.
Table 1 Comparison of accuracy of different similarity functions on ResNet50 and MobileNetv2 backbones; the best result is shown in bold font

                                            Arc + ResNet50        Arc + MobileNetv2
Dataset      Similarity function            Crop: False  True     Crop: False  True
LFW [8]      Difference similarity          99.42        99.30    99.10        98.88
             Cross-correlation similarity   99.42        99.23    99.20        98.98
             Cosine similarity              99.42        99.23    99.20        98.98
             SSIM                           99.48        99.32    99.12        98.83
AgeDB30 [9]  Difference similarity          95.32        94.92    91.62        90.83
             Cross-correlation similarity   95.32        94.92    91.63        90.72
             Cosine similarity              95.32        94.92    91.63        90.72
             SSIM                           95.48        94.75    91.55        90.28
CFP-FP [10]  Difference similarity          92.56        91.23    91.50        90.83
             Cross-correlation similarity   92.37        91.11    91.66        90.83
             Cosine similarity              92.56        91.23    91.66        90.83
             SSIM                           92.23        90.97    91.13        90.09
3.1 Feature Representation as a Signal A cosine feature, also known as an angular margin loss feature or Arc Feature, is essentially the softmax of the re-scaled cosine logits. In a 2D space, we may therefore visualize this feature as a signal. For clarification purposes, we have combined the retrieved facial features from two photos each of Aaron Peirsol and Akhmed Zakayev in Table 1. All of the left-hand side figure plots depict various combinations of two people's faces. The features of the images in Fig. 2a, b correspond to Fig. 1b, c, respectively. As the features were extracted from identical photos, they overlapped in both instances. The facial features of two separate images of Aaron Peirsol and Akhmed Zakayev are depicted in Fig. 4a, b, respectively. It is obvious that though the features do not overlap each other due to differences in face alignment, lighting, etc., the trend or shape of the signals is more or less the same in both cases. The facial features of Aaron Peirsol image 2 and Akhmed Zakayev image 2, and of Aaron Peirsol image 1 and Akhmed Zakayev image 1, respectively, are shown in Fig. 6a, b. In this case, the trend of the signal is not the same at all; from this, it can be clearly concluded that we can represent an Arc Feature as a signal.
Fig. 2 Cross-correlation of the same image of the same person: a embedding of Aaron Peirsol's image 2; b embedding of Akhmed Zakayev's image 1
3.2 Feature Similarity Functions Feature similarity matching is one of the important tasks in the fields of computer vision, machine learning, and deep learning. For Arc Feature similarity matching, the authors of ArcFace used the difference similarity metric. In this paper, we discuss four different types of possible metrics to find the similarity between two features: difference similarity, cosine similarity, structural similarity index measure (SSIM), and cross-correlation similarity.
3.2.1 Difference Similarity

The difference similarity is simply the mathematical difference between two features. Let us consider two features f_1 and f_2; then the difference similarity is given by

D = 1 - \sum_{i=0}^{n-1} |f_1^i - f_2^i| \quad (3)

where n is the number of feature points. In our case, n is 512, as the Arc Feature gives a total of 512 feature points. Now, if D > Th, then the features are called similar, else they are not, where Th is some threshold. In the ideal case, D = 1 for similar features.
3.2.2 Cosine Similarity
Cosine similarity identifies the similarity between two vectors by computing their inner product in vector space. In this method, the features are represented as vectors. The cosine similarity of two features f_1 and f_2 is given by

$$\cos(f_1, f_2) = \frac{f_1 \cdot f_2}{\|f_1\| \, \|f_2\|} = \frac{\sum_{i=0}^{n-1} f_1^i f_2^i}{\sqrt{\sum_{i=0}^{n-1} (f_1^i)^2} \, \sqrt{\sum_{i=0}^{n-1} (f_2^i)^2}} \qquad (4)$$

where n is the number of feature points. Now, if cos(f_1, f_2) > Th, the features are called similar, otherwise not, where Th is some threshold. In the ideal case, cos(f_1, f_2) = 1 for similar features.
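A minimal sketch of Eq. (4) (NumPy; illustrative only, not the authors' implementation):

```python
import numpy as np

def cosine_similarity(f1, f2):
    """Eq. (4): inner product of f1 and f2 over the product of their norms."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

f = np.ones(512)
assert abs(cosine_similarity(f, f) - 1.0) < 1e-12       # identical -> 1
assert abs(cosine_similarity(f, 2 * f) - 1.0) < 1e-12   # scale-invariant
```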
3.2.3 Structural Similarity Index Measure
SSIM is mainly used in the computer vision community for finding the similarity between two images. In the image processing domain, an image is treated as a signal; by the same intuition, and since we have shown that a feature can be represented as a signal, we can use SSIM as a similarity metric for signally represented features. For two features f_1 and f_2, the SSIM is given by

$$\mathrm{SSIM}(f_1, f_2) = \frac{(2\mu_{f_1}\mu_{f_2} + C_1)(2\sigma_{f_1 f_2} + C_2)}{(\mu_{f_1}^2 + \mu_{f_2}^2 + C_1)(\sigma_{f_1}^2 + \sigma_{f_2}^2 + C_2)} \qquad (5)$$

where μ_{f_1} and μ_{f_2} are the averages of f_1 and f_2; σ²_{f_1} and σ²_{f_2} their variances; σ_{f_1 f_2} the covariance of f_1 and f_2; and C_1, C_2 are constants stabilizing the division. Now, if SSIM > Th, the features are called similar, otherwise not, where Th is some threshold. In the ideal case, SSIM = 1 for similar features.
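Equation (5) applied to 1-D feature signals can be sketched as (NumPy; the constants C1 and C2 are illustrative choices, not values from the paper):

```python
import numpy as np

def ssim_1d(f1, f2, C1=1e-4, C2=9e-4):
    """Eq. (5) on 1-D feature vectors treated as signals.

    C1, C2 are small stabilizing constants (values here are illustrative).
    """
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()
    cov = ((f1 - mu1) * (f2 - mu2)).mean()
    num = (2 * mu1 * mu2 + C1) * (2 * cov + C2)
    den = (mu1**2 + mu2**2 + C1) * (var1 + var2 + C2)
    return num / den

f = np.sin(np.linspace(0.0, 6.28, 512))
assert abs(ssim_1d(f, f) - 1.0) < 1e-9   # ideal case: SSIM = 1
```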
3.2.4 Cross-Correlation Similarity
Cross-correlation, also called the sliding inner product, can be obtained for any pair of signals; it measures their similarity as a function of the displacement of one relative to the other. At zero lag, the cross-correlation is defined as

$$(f_1 \star f_2) = \sum_{i=0}^{n-1} f_1^i f_2^i \qquad (6)$$

Now, if max_i(f_1 ⋆ f_2) > Th, the features are called similar, otherwise not, where Th is some threshold. In the ideal case, max_i(f_1 ⋆ f_2) = 1 for similar features. Embeddings of different persons are shown in Figs. 4 and 6.
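The peak-of-cross-correlation decision statistic can be sketched as (NumPy; the L2-normalization step is an assumption made here so that the ideal self-match peaks at 1):

```python
import numpy as np

def cross_correlation_similarity(f1, f2):
    """Sliding inner product over all lags; the decision statistic is
    the peak value max_i (f1 x f2) and the lag at which it occurs."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    # L2-normalize so the zero-lag product of a feature with itself is 1
    f1, f2 = f1 / np.linalg.norm(f1), f2 / np.linalg.norm(f2)
    corr = np.correlate(f1, f2, mode="full")        # all lags
    return corr.max(), int(corr.argmax()) - (len(f2) - 1)

f = np.random.default_rng(1).normal(size=512)
peak, lag = cross_correlation_similarity(f, f)
assert lag == 0                    # self-correlation peaks at zero lag
assert abs(peak - 1.0) < 1e-9      # ideal case: max = 1
```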
Fig. 3 Cross-correlation of same image of same person
(a) Cross-correlation of Aaron Peirsol's image 2
(b) Cross-correlation of Akhmed Zakayev's image 1
For example, in the first case, Fig. 3a, b shows the cross-correlation between the two features of the same photo of the same person, Aaron Peirsol's image 2 and Akhmed Zakayev's image 1, respectively. In each case, we get a peak at a lag of zero and max_i(f_1 ⋆ f_2) = 1 because the images are the same. In the second case, Fig. 5a, b shows the cross-correlation between the two features of different images of the same person, Aaron Peirsol and Akhmed Zakayev, respectively. Here max_i(f_1 ⋆ f_2) = 0.78 for Aaron Peirsol and 0.82 for Akhmed Zakayev; although the images differ, the person is the same, so in each case we again get a peak at a lag of zero. Lastly, Fig. 7a, b shows the cross-correlation between the two features of different persons, Aaron Peirsol's image 2 and Akhmed Zakayev's image 2, and Aaron Peirsol's image 1 and Akhmed Zakayev's image 1, respectively. We obtain max_i(f_1 ⋆ f_2) = 0.12 for the first pair and 0.11 for the second. There is no peak at a lag of zero because the persons are different.
Fig. 4 Embedding of different images of the same person
(a) Embedding of Aaron Peirsol's image 1 and Aaron Peirsol's image 2
(b) Embedding of Akhmed Zakayev's image 1 and Akhmed Zakayev's image 2

Fig. 5 Cross-correlation of different images of the same person
(a) Cross-correlation of Aaron Peirsol's image 1 and Aaron Peirsol's image 2
(b) Cross-correlation of Akhmed Zakayev's image 1 and Akhmed Zakayev's image 2
Fig. 6 Embedding of different persons
(a) Embedding of Aaron Peirsol's image 2 and Akhmed Zakayev's image 2
(b) Embedding of Aaron Peirsol's image 1 and Akhmed Zakayev's image 1

Fig. 7 Cross-correlation of different persons
(a) Cross-correlation of Aaron Peirsol's image 2 and Akhmed Zakayev's image 2
(b) Cross-correlation of Aaron Peirsol's image 1 and Akhmed Zakayev's image 1
4 Experiment and Result

4.1 Experiment Setup

The experiment is done on a Colab Notebook with 16 GB RAM and a 12 GB Nvidia Tesla P100 GPU. We used the pre-trained ArcFace model, with ResNet50 [6] and MobileNetv2 [7] as the backbones for this experiment. We adopted two approaches: one with the face cropped, and one without cropping. We employed our model on the LFW, AgeDB30, and CFP-FP datasets separately and measured the accuracy and true positive rate. For data preprocessing, we used five facial points to create the normalized face crops (112 × 112).
4.2 Result

LFW [8], AgeDB30 [9], and CFP-FP [10] are the most widely used datasets for benchmarking face verification models. From Table 1, it is seen that with the ResNet50 backbone the SSIM method outperforms the other similarity methods on LFW and AgeDB30 for the without-crop case. ResNet50 had the greatest performance on the CFP-FP dataset
Table 2 Comparison of true positive rate of different similarity functions on ResNet50 and MobileNetv2 backbones; the best result is shown in red color

| Dataset | Similarity function | Arc + ResNet50, Crop = False | Arc + ResNet50, Crop = True | Arc + MobileNetv2, Crop = False | Arc + MobileNetv2, Crop = True |
|---|---|---|---|---|---|
| LFW [8] | Difference similarity | 86.55 | 86.58 | 85.24 | 85.41 |
| | Cross-correlation similarity | 93.22 | 93.23 | 92.56 | 92.64 |
| | Cosine similarity | 93.21 | 93.23 | 92.56 | 92.64 |
| | SSIM | 93.17 | 93.23 | 92.52 | 92.64 |
| AgeDB30 [9] | Difference similarity | 75.11 | 75.13 | 74.06 | 73.84 |
| | Cross-correlation similarity | 87.53 | 87.53 | 87.00 | 86.90 |
| | Cosine similarity | 87.49 | 87.50 | 86.97 | 86.86 |
| | SSIM | 87.63 | 87.71 | 87.11 | 87.10 |
| CFP-FP [10] | Difference similarity | 85.24 | 72.50 | 69.85 | 71.54 |
| | Cross-correlation similarity | 85.69 | 86.26 | 84.98 | 85.78 |
| | Cosine similarity | 85.61 | 86.19 | 84.86 | 85.71 |
| | SSIM | 85.79 | 86.41 | 85.08 | 85.94 |
for both difference similarity and cosine similarity. When it comes to cropped faces, SSIM performs best on the LFW dataset, while for AgeDB30 and CFP-FP the difference, cosine, and cross-correlation similarities achieve nearly the same accuracy with the ResNet50 model. Similarly, for the MobileNetv2 backbone, cross-correlation similarity and cosine similarity work most accurately in most cases on LFW, AgeDB30, and CFP-FP. Now, in terms of true positive rate, SSIM achieves the best performance on LFW, AgeDB30, and CFP-FP in most cases for both ResNet50 and MobileNetv2; on the LFW dataset, cross-correlation similarity gives the best result, as shown in Table 2.
5 Conclusion

In this paper, we discussed different ways to represent the Arc Feature, namely in vector form and in signal form. We also introduced techniques such as SSIM and cross-correlation for better feature matching, which increase the accuracy and matching rate. We tested the model on three face recognition benchmarks: LFW, AgeDB30, and CFP-FP. It was observed that SSIM and cross-correlation work more accurately than the original ArcFace method in most cases. In the future, the technique may also be applied to other tasks such as object tracking and person re-identification.

Acknowledgements This work has been carried out in the Digital Control and Image Processing Lab, ETCE Department, Jadavpur University.
References

1. Wang SL, Liew AWC (2007) Information-based color feature representation for image classification. In: 2007 IEEE international conference on image processing, pp VI-353–VI-356. https://doi.org/10.1109/ICIP.2007.4379594
2. Ma WY, Manjunath BS (1996) Texture features and learning similarity. In: Proceedings CVPR IEEE computer society conference on computer vision and pattern recognition, pp 425–430. https://doi.org/10.1109/CVPR.1996.517107
3. Wu S et al (2016) An enhanced deep feature representation for person re-identification. http://arxiv.org/abs/1604.07807
4. Deng J, Guo J, Xue N, Zafeiriou S (2019) ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4690–4699
5. Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report, University of Massachusetts, Amherst. Accessed 29 July 2022
6. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
7. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
8. Huang G, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report
9. Sengupta S, Chen J-C, Castillo C, Patel VM, Chellappa R, Jacobs DW (2016) Frontal to profile face verification in the wild. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp 1–9. https://doi.org/10.1109/WACV.2016.7477558
10. Moschoglou S, Papaioannou A, Sagonas C, Deng J, Kotsia I, Zafeiriou S (2017) AgeDB: the first manually collected, in-the-wild age database, pp 1997–2005. https://doi.org/10.1109/CVPRW.2017.250
LeafViT: Vision Transformers-Based Leaf Disease Detection H. Keerthan Bhat, Aashish Mukund, S. Nagaraj, and R. Prakash
Abstract Diseases in plant leaves deplete crop quality and reduce yield. Monitoring the crops periodically is important to mitigate these risks and preserve food integrity. Traditional methods of detecting diseases in plants involve visual inspection by humans, which is error-prone and cumbersome. Advances in deep learning architectures rely on convolutional neural networks (CNNs) and have achieved promising results in the agricultural domain. However, detecting diseases in leaves requires the model to capture fine-grained attributes in the leaf. Toward this, this paper proposes a vision transformer (ViT)-based approach which leverages the self-attention mechanism to identify subtle yet discriminative regions and learn by amplifying them. The methodology also involves data augmentation and sampling strategies to combat the class imbalance problem in datasets. The proposed framework is tested on the Plant Pathology 2020 dataset, and experimental results show an accuracy of 97.15%, which is significantly better than the performance of state-of-the-art CNN-based models on plant disease detection.

Keywords Vision transformers · Leaf disease · Plant pathology
H. Keerthan Bhat (B) · A. Mukund · R. Prakash
RV College of Engineering, Bengaluru, India
e-mail: [email protected]
S. Nagaraj
Noesys Software Pvt. Ltd., Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_8

1 Introduction

Agriculture is the backbone of any country since it provides food, which is a basic necessity for humans, provides a livelihood for many people, and contributes significantly to the economy of the country. As the human population grows, the need for food grows. The United Nations has stated that the human population on Earth may reach 9.7 billion by 2050 [1]. Keeping this enormous growth rate in view, it is essential for mankind to produce good-quality food to meet the requirements of the global
population and ensure that crop waste is minimal. However, the loss in crop production globally due to diseases is estimated to be 20–40%, due to which the global economy suffers a loss of approximately 220 billion dollars annually [2]. This creates an alarmingly urgent need for solutions for plant disease detection and management. It is found that 80–90% of plant diseases occur in the leaves [3]. Therefore, in this paper, the focus of the research is on detecting diseases on the leaf rather than on the whole plant. Conventional methods of detecting diseases on the leaf involve farmers manually inspecting the crops and keeping track of the health of the plants. This process is non-scalable, cumbersome, and error-prone. Advances in technology such as robots and drones are used to automatically detect and capture images of the leaf from the plants; later, image-based plant phenotyping is done to detect diseases digitally [4]. Machine learning (ML) models are trained in a supervised way on labeled datasets so that they are able to learn from the features, recognize patterns, and make predictions on the learned task with minimal human intervention [5]. Artificial intelligence (AI) and ML can be used extensively in agriculture to help farmers optimize the overall growth of their production. Disease detection in plants is seen as an image classification problem: when a computer vision (CV) model is trained using images of diseased and healthy leaves, accurate predictions can be made on whether a previously unseen leaf is healthy or diseased [6]. Many advances in model architectures have produced great strides in deep learning, and it is safe to say that the typical image classification problem is a solved problem [7]. However, leaf disease detection cannot be considered a typical image classification problem. As is evident from Fig. 1, infected leaves have very small spots as signs of infestation, and these hardly visible spots need to be identified in the image and classified accordingly. State-of-the-art CNN architectures depend on convolution operations, where spatial features are captured uniformly. This led us toward the self-attention mechanism [8], which has the capability to identify, while training, which parts of the image are important for making the prediction. This paper proposes the design of a framework using the vision transformer architecture, which uses multihead attention to make the predictions, and it is shown experimentally that this method achieves significantly better results than existing state-of-the-art methods. The research contributions of this paper are as follows:

1. A discussion about the performance of current state-of-the-art techniques in deep learning to detect diseases in leaves. Ablation studies of primarily CNN-based architectures such as VGG-16 [9], ResNet50 [10], DenseNet [11], SqueezeNet [12], and MobileNet [13] are shown.
2. This paper proposes a vision transformer [14]-based framework, LeafViT, to detect diseases in plant leaves. Trained on the Plant Pathology 2020 dataset [15], the framework achieves an accuracy of 97.15% along with 0.91 precision, 0.93 recall, and 0.91 F1-score. It is shown empirically that the proposed framework outperforms state-of-the-art deep learning methods in computer vision.
3. The proposed framework includes novel data augmentation and sampling strategies to combat the class imbalance problem in datasets. This is extremely useful because, when collecting plant disease data in the real world, encountering class imbalance is inevitable.
Fig. 1 Samples of infected leaves from apple plant from Plant Pathology 2020 dataset [15]. Indication of infestation will be minute but notable in the images
The arrangement of the rest of the paper is as follows. Section 2 discusses the existing literature on traditional and deep learning methods for leaf disease detection, fine-grained visual categorization, and vision transformers. Section 3 covers the details of the proposed method, and Sect. 4 gives the implementation details. Section 5 discusses and reports the experiments conducted and results obtained. Finally, Sect. 6 briefly summarizes the contributions of the paper.
2 Related Work

2.1 Shape and Texture-Based Leaf Disease Identification

Developments in computer vision algorithms have resulted in many new pipelines for detecting leaf diseases. Many of them involve extracting important features such as shape, texture, and color from field-acquired leaf images, after which machine learning methods are used to model and classify them. Geometric and histogram features are extracted in [16] before employing a support vector machine (SVM) to classify the images into two tomato leaf virus classes. The method described in [3] combines two frameworks: one for template matching against illumination changes using adaptive orientation code matching, and the other for two-class classification using SVM. In [17], features such as intensity level and color are extracted using a region-growing algorithm. These features are used to train a radial basis function neural network (BRBFFN) for classification.
2.2 Deep Learning-Based Leaf Disease Detection Deep neural networks (DNN) are known to perform well on image classification tasks. The effect of adding more and more layers to CNNs has been illustrated in [9] which served as the basis for better performance in the ImageNet challenge in 2014. Currently, many state-of-the-art CNNs exist for large-scale image classification. In [10], residual networks are employed to achieve higher accuracies with increasing depth. The shortcut connections added in ResNet do not come at extra computation cost and have shown that networks can be substantially deeper. The DenseNet in [11] has feature maps of all previous layers serving as input to a given layer. DenseNets are similar to ResNets but the former has better model compactness and feature reuse. The model illustrated in [13] called MobileNet is more efficient since they use depth-wise separable convolutions. As the name suggests, they can be used in mobiles and other embedded systems with vision applications. The SqueezeNet in [12] replaces 3 × 3 filters with 1 × 1 filters and uses other techniques to shrink the number of parameters by 50 ×. The state-of-the-art networks mentioned above have been used as backbones for leaf disease detection. The method described in [4] uses a version of the deep residual network, ResNet50 as the backbone network, the output of which is fed to two different mask branches for segmentation of leaf instances. This end-to-end deep learning approach quantifies the disease severity by generating segmentation masks for all the leaf images. Models with optimal parameter size like MobileNetv2 and InceptionResNetv2 with pretrained weights from ImageNet dataset have been used in [6] to identify 38 different classes of leaf diseases. This transfer learning approach is also implemented in [18], where decompressed image format is fed to ResNet50 for the classification of diseases in leaves. 
The AFD-Net in [19] uses an EfficientNet [20] backbone along with some modifications in layer connections and activations. However, all the methods explored are based on the convolution operation and convolution has its limitations in detecting the fine-grained features.
2.3 Fine-Grained Visual Categorization (FGVC)

Detecting subtle inter-class differences is not a straightforward task that can be accomplished by state-of-the-art CNNs. Fine-grained visual categorization requires techniques like attention-based learning to be used in conjunction with recurrent or convolutional neural networks to focus on the subtle yet discriminative regions of images. In [21], specific attention modules are combined with CNNs in a three-branch network. This framework works well in classifying images based on overall structure as well as subtle characteristics of images. A coarse-to-fine methodology is proposed in [22], where a recurrent attention CNN learns attention and features region-wise recursively at multiple scales. The network for each scale has a CNN combined with an attention-proposal sub-network, and the region parameters are
transferred from coarse layers to finer ones to perform FGVC. Zheng et al. [23] propose a more efficient method to perform this task with a single CNN. The feature maps produced by the CNN are sent to a tri-linear attention module, and the attention maps generated undergo random attention sampling to detect the fine-grained features in the images.
2.4 Transformer Models in Computer Vision

Transformers have recently been known to perform well in applications with sequential data, such as language processing tasks, where they have become more and more powerful over recent years. A multi-layer bidirectional model is proposed in [24] which obtained new state-of-the-art results in such tasks. Transformers work solely on attention mechanisms and allow for significant parallelization, resulting in shorter training times, as illustrated in [8]. More recently, transformers have been applied to image classification tasks [35]: the images are split into a sequence of patches and fed to transformers for classification. Dosovitskiy et al. [14] illustrate that they produce very good results for image classification when compared with state-of-the-art CNNs. Because of the self-attention mechanism of transformers, they can also be used for detecting fine-grained differences in images. On top of [14], [25] describes a hierarchical approach using a shifted-windows scheme to perform better than [14] on benchmark CV tasks. A pure transformer network has been proposed by [26] which aggregates low-level and middle-level features from each layer; this feature fusion is important for FGVC. He et al. [27] use a similar model to select discriminative patches but, unlike [26], the discriminative tokens are sent directly to the last transformer layer instead of being aggregated. The meta-information available along with images is used to assist FGVC in [28]; joint learning of vision and this extra information has resulted in the model outperforming state-of-the-art approaches as described in [28].
3 Proposed Method

In this paper, leaf disease detection is approached as a fine-grained visual classification (FGVC) problem. To learn the minute features, a vision transformer (ViT) backbone is adopted to leverage the self-attention mechanism, and the framework is named LeafViT.
3.1 Model Architecture

Transformer-based models revolutionized the domain of natural language processing (NLP) [8, 24]. They quickly became the state of the art for modern NLP tasks and are among the most heavily researched and extensively deployed deep learning methods in industry. Recently, it was studied whether the same concepts can be applied in computer vision on image data, and they performed very well on benchmark image classification and object detection tasks [14]. Figure 2 represents the vision transformer architecture used in this paper for leaf disease detection. The usage of a transformer model on datasets such as ImageNet and CIFAR-100 was first proposed by Dosovitskiy et al. [14]. The pipeline of a ViT model consists of seven steps. At the beginning, the input image is broken into patches of smaller fixed size; a patch size of 16 × 16 is used in the proposed model. The patches are then flattened into one-dimensional vectors, and each patch's pixel values are represented as flattened vectors in the input sequence as shown in [14]. For an input image,

$$x \in \mathbb{R}^{H \times W \times C} \qquad (1)$$
Here, H × W represents the resolution of the input image and C indicates the number of channels. The split patches of size P × P are created such that

$$x_p \in \mathbb{R}^{N \times (P \times P \times C)} \qquad (2)$$

Fig. 2 LeafViT model architecture based on vision transformers [14]
and N denotes the total number of patches created:

$$N = \frac{H \times W}{P \times P} \qquad (3)$$
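The patch split of Eqs. (2)-(3) can be sketched as follows (NumPy; a 224 × 224 × 3 input with P = 16 gives N = 196 patches, matching the 16 × 16 patch size the paper states):

```python
import numpy as np

def split_into_patches(x, P):
    """Split an H x W x C image into N = (H*W)/(P*P) flattened patches,
    each of length P*P*C, per Eqs. (2)-(3)."""
    H, W, C = x.shape
    assert H % P == 0 and W % P == 0
    patches = (x.reshape(H // P, P, W // P, P, C)
                .transpose(0, 2, 1, 3, 4)   # group the P x P blocks together
                .reshape(-1, P * P * C))
    return patches

img = np.zeros((224, 224, 3))
patches = split_into_patches(img, 16)
assert patches.shape == (196, 768)   # N = 224*224 / 16^2 = 196, 16*16*3 = 768
```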
Each flattened element is passed into a linear projection layer, which outputs patch embeddings. A matrix is used for assigning positional embeddings after the linear projection: each patch is unrolled and multiplied with the embedding matrix, and a linear combination of the sequence of image patches with the positional embeddings is computed so that their positional information is retained. The output of this stage captures the relative or absolute position of each image patch in the sequence; these embeddings are learnable parameters. Considering the vectorized patches as $I_p$, mapped into a latent D-dimensional embedding space,

$$z_0 = \left[ I_{\mathrm{class}};\; I_p^1 E;\; I_p^2 E;\; \ldots;\; I_p^n E \right] + E_{\mathrm{pos}} \qquad (4)$$
where $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$ is the projection of the patch embedding, $E_{\mathrm{pos}} \in \mathbb{R}^{(n+1) \times D}$ denotes the position embedding [14], and $I_{\mathrm{class}} = z_0^0$. The embedding sequence of $x = (x_1, \ldots, x_n)$ given by (4) is input to the transformer encoder. From Fig. 3, it can be seen that the encoder consists of L layers, and each layer $l$ ($l = 1, \ldots, L$) is composed of a multihead self-attention (MSA) layer and a feed-forward layer. The MSA layer consists of multiple attention layers in parallel. The objective of self-attention is to learn the interactions among all the embedded entities $x$ by encoding each entity with respect to the global contextual information. This is performed by computing three weight matrices with learnable parameters to transform queries ($W_f \in \mathbb{R}^{n \times d_f}$), keys ($W_g \in \mathbb{R}^{n \times d_g}$), and values ($W_v \in \mathbb{R}^{n \times d_v}$). The input sequence $x$ is linearly projected using the weight matrices to obtain $f = x W_f$, $g = x W_g$, and $v = x W_v$. As shown in Fig. 3, the output $z$ of the self-attention layer is given by

$$z = \mathrm{softmax}\!\left( \frac{f g^T}{\sqrt{d_g}} \right) v \qquad (5)$$
The attention scores given in (5) are obtained by applying the softmax function after scaling the dot-product of the query with all keys for each entity in the sequence. Each entity then becomes the weighted sum of all entities in the sequence, where the weights are given by the attention scores. The MSA mechanism in any l-th layer of the encoder comprises h self-attention blocks, which learn the multiple complex relationships among the various elements in the query sequence. Each block has learnable weight matrices $W_f^h$, $W_g^h$, $W_v^h$, where $h = 0, \ldots, (H - 1)$. For an input $x$, the output $z$ obtained from the h self-attention blocks in the multihead attention given by (5)
Fig. 3 Schematic representation of a transformer encoder block [14]. The model will have L such blocks
is then concatenated into a single matrix $\left[ z_l^0, z_l^1, \ldots, z_l^{H-1} \right] \in \mathbb{R}^{n \times H \cdot d_v}$ and projected onto a weight matrix W. The output of the l-th multihead self-attention (MSA) layer is then

$$z_l' = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \quad l = 1, \ldots, L \qquad (6)$$

where LN(·) represents the layer normalization operator and $z_l'$ the representation of the encoded image. It is followed by a fully connected dense feed-forward block in every encoder block,

$$z_l = \mathrm{MLP}(\mathrm{LN}(z_l')) + z_l', \quad l = 1, \ldots, L \qquad (7)$$
Finally, a multiclass classifier head gives the predictions $\hat{y}$, as seen in Fig. 2. A feed-forward network (FFN) is used, consisting of a dense linear layer and a softmax activation function:

$$\hat{y} = \mathrm{FFN}\left(z_L^0\right) \qquad (8)$$
The mathematical formulation in this section follows the ViT paper [14] as well as [29, 30].
3.2 Loss Function

The cross-entropy (CE) loss is used for training the LeafViT architecture. It was found to perform better than other commonly used loss functions such as focal loss, MAE loss, and MSE loss. As depicted in [31], cross-entropy is defined as

$$L_{CE} = -\sum_{i=1}^{n} t_i \log(p_i) \qquad (9)$$

where $t_i$ is the ground truth and $p_i$ is the softmax probability for the i-th class among n classes. Cross-entropy measures the difference between two probability distributions for a given random variable or set of events.
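Equation (9) in code (NumPy; the example target and probabilities are illustrative, not values from the paper):

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """Eq. (9): L = -sum_i t_i log(p_i) for a one-hot target t and
    softmax probabilities p; eps guards against log(0)."""
    return float(-(np.asarray(t) * np.log(np.asarray(p) + eps)).sum())

t = [0, 0, 1, 0]                 # one-hot ground truth over 4 classes
p = [0.05, 0.05, 0.85, 0.05]     # hypothetical softmax output
loss = cross_entropy(t, p)       # equals -log(0.85), a small loss for a
assert abs(loss + np.log(0.85)) < 1e-6   # confident correct prediction
```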
3.3 Class Imbalance

The dataset used for training and validation suffers from class imbalance (more details are given in Sect. 4.1). To combat this, two strategies are adopted: data augmentation and sampling. These strategies are dataset-agnostic and can be used on any dataset which suffers from class imbalance.

Data Augmentation: Five types of augmentation techniques are used: random crop, flip (horizontal and vertical), rotations, translation, and color distortion, as shown in Fig. 4. It was found experimentally that using all of these techniques gives the best training and testing accuracies, as shown in Table 1. Using augmentations, the classes of the dataset were balanced.
Fig. 4 Visualization of various augmentation techniques from original image a, b random crop, c vertical flip, d horizontal flip, e rotate by 90°, f translation, and g color distortion
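The five techniques can be imitated with a crude NumPy sketch (this is not the authors' augmentation pipeline; the shift range and brightness factors are arbitrary stand-ins for the crop and color-distortion steps):

```python
import numpy as np

def augment(img, rng):
    """One randomly composed pass over an H x W x C array in [0, 1]:
    flips, 90-degree rotation, translation, and a brightness distortion."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                       # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                       # vertical flip
    if rng.random() < 0.5:
        img = np.rot90(img)                      # rotate by 90 degrees
    shift = rng.integers(-10, 11)
    img = np.roll(img, shift, axis=1)            # crude translation
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # color distortion
    return img

rng = np.random.default_rng(0)
out = augment(np.full((224, 224, 3), 0.5), rng)
assert out.shape == (224, 224, 3)
```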
Table 1 Performance of LeafViT using different augmentation techniques

| Augmentations | Train accuracy (%) | Test accuracy (%) |
|---|---|---|
| Flip + rotate + translate + crop | 98.45 | 95.01 |
| Flip + rotate + translate + distort | 98.56 | 95.70 |
| Flip + rotate | 98.58 | 95.98 |
| Flip + rotate + translate + distort + crop | **99.73** | **97.15** |

Bold indicates the results from the best performing configuration
Table 2 Performance of LeafViT using different sampling strategies
Sampling strategy
Train accuracy (%)
Test accuracy (%)
Class balanced
99.52
95.73
Progressively balanced
98.83
94.58
Instance balanced
99.73
97.15
Bold indicates the results from the best performing configuration
Data Sampling: Instance-balanced sampling is utilized, as opposed to class-balanced and progressively balanced sampling [32]. As shown in Table 2, instance-balanced sampling gives the best performance on the test set. Instance-balanced sampling is a technique in which the probability of choosing a data point from a class is proportional to the cardinality of that class; in class-balanced sampling, the probability of each class being selected is equal; and progressively balanced sampling uses instance-balanced sampling in the initial epochs and class-balanced sampling in the final epochs.
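The contrast between the two endpoint strategies can be sketched as follows (NumPy; the class sizes are taken from the training split of the dataset used here, and the helper is an illustration, not the authors' sampler):

```python
import numpy as np

def sampling_weights(labels, strategy="instance"):
    """Per-example selection probabilities.

    instance-balanced: every image equally likely, so a class is drawn
    in proportion to its cardinality; class-balanced: every class
    equally likely regardless of size.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    if strategy == "instance":
        w = np.ones(len(labels))
    elif strategy == "class":
        freq = dict(zip(classes, counts))
        w = np.array([1.0 / freq[y] for y in labels])
    else:
        raise ValueError(strategy)
    return w / w.sum()

# training split sizes: rust 497, scab 483, multiple diseases 70, healthy 407
labels = [0] * 497 + [1] * 483 + [2] * 70 + [3] * 407
w_inst = sampling_weights(labels, "instance")
w_cls = sampling_weights(labels, "class")
assert abs(w_cls[:497].sum() - 0.25) < 1e-9   # each class gets 1/4 of the mass
```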
4 Implementation

4.1 Dataset

The work presented in this paper utilizes the Plant Pathology 2020 dataset [15] from Kaggle (URL: https://www.kaggle.com/competitions/plant-pathology-2020-fgvc7/data). It provides healthy and infected images of leaves from the apple plant. The dataset contains 1821 labeled images with four classes, healthy, multiple diseases, rust, and scab, as shown in Fig. 5. The images are in RGB format with resolution 1365 × 2048 or 2048 × 1365. Figure 6 depicts the distribution of the four classes, and Table 3 gives the number of images per class in the dataset; as seen, images in the multiple diseases class are very few compared to the other three, which are almost balanced. An 80:0.7:19.3 ratio is taken for the training, validation, and testing split. It is to be noted that a very small sample is taken for the validation set, since it is only used to determine the best-performing model across all epochs. Table 4 shows the number of images across the four classes in the split sets. Mean-normalized 224 × 224 images are taken as input after applying the data augmentations explained previously.

Fig. 5 Samples from Plant Pathology 2020 dataset containing four classes: rust, scab, multiple diseases, healthy
4.2 Hyperparameters

A standard stochastic gradient descent (SGD) optimizer with a 0.001 learning rate and a cosine scheduler was used. The weight decay was set to 0.0005 and the momentum to 0.9. The batch size was 16; however, it was found that varying the batch size did not affect model performance very much. All models were trained for 30 epochs from ImageNet pretrained weight initialization. The algorithms and methods were built on top of the PyTorch 1.12.0 framework. The methods and evaluations discussed in this paper were run on Google Colab Pro, which provides the following hardware: 2 × vCPU, NVIDIA T4 or Tesla P100 GPU (one used at a time), 32 GB RAM, Linux-based OS.
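The schedule can be sketched as follows (pure Python; the half-period cosine form is one common reading of "cosine scheduler" and is assumed here, with the stated base learning rate and epoch count):

```python
import math

def cosine_lr(epoch, total_epochs=30, base_lr=0.001):
    """Cosine-annealed learning rate for the SGD optimizer described
    above: starts at base_lr and decays smoothly toward 0."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

assert cosine_lr(0) == 0.001                  # starts at the base rate
assert abs(cosine_lr(15) - 0.0005) < 1e-9     # halfway point
```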
96
H. Keerthan Bhat et al.
Fig. 6 Distribution of the classes in the labeled images of Plant Pathology 2020
Table 3  Number of images per class in the dataset

Class               Numbers
Rust                622
Scab                592
Multiple diseases   91
Healthy             516
Total               1821

Table 4  Number of images per class in the training, validation, and testing sets

Class               Training   Validation   Testing
Rust                497        5            120
Scab                483        3            106
Multiple diseases   70         1            20
Healthy             407        4            105
Total               1457       13           351
5 Experiments

This section contains all the experiments and analyses that were performed. The performance of the proposed framework is benchmarked against that of convolution-based deep learning models, and the results from the model are analyzed.
5.1 Ablation Studies

A comparison of the performance of LeafViT against state-of-the-art computer vision architectures such as VGG [9], ResNet [10], DenseNet [11], SqueezeNet [12], and MobileNet [13] on the same training and testing datasets with the same hyperparameters is given. Table 5 demonstrates this comparison using model accuracy as the metric, similar to the comparison given in [33]. Among the variants available in these models, VGG16, ResNet50, and DenseNet161 were used. As is evident from Table 5, the proposed framework, LeafViT, obtains good performance and outperforms all the other methods. From this, it can be said that the convolution-free LeafViT, which incorporates self-attention to capture fine-grained visual features, is a better choice than convolution-based networks for plant disease detection. Among standard vision transformers, three variants with varying computational complexities are available; in increasing number of encoder blocks, and hence number of parameters, these are base, large, and huge. This paper excludes huge from evaluation, since it is too complex a model considering the dataset that is used. A study of the performance of base and large with patch sizes 16 and 32 is shown in Table 6. Large ViT with patch size 16 is observed to perform the best, and hence it is adopted as the backbone in the proposed LeafViT framework. As far as computational cost or number of parameters is concerned, the complexity of LeafViT is comparable with VGG16 in terms of order of magnitude. ResNet50 and DenseNet161 come next in the list, as shown in Table 7. The aforementioned table

Table 5  Performance evaluation of LeafViT with CNN-based architectures
Model architecture   Train accuracy (%)   Test accuracy (%)
VGG16                94.71                93.64
ResNet50             96.29                95.03
DenseNet161          97.46                94.47
MobileNet            94.37                94.75
SqueezeNet           93.48                90.61
LeafViT              99.73                97.15

Bold indicates the results from the best performing configuration
Table 6  Performance across different vision transformer variants

Model architecture   Train accuracy (%)   Test accuracy (%)
ViT B/16             98.61                95.73
ViT B/32             97.73                93.37
ViT L/16             99.73                97.15
ViT L/32             96.29                92.59

Bold indicates the results from the best performing configuration
Table 7  Number of parameters across various models

Model architecture   Number of params
VGG16                138M
ResNet50             25M
DenseNet161          29M
MobileNet            3.5M
SqueezeNet           1.2M
ViT B/16             86M
ViT B/32             88M
ViT L/16             304M
ViT L/32             306M
provides a comparison of the number of parameters across all the models evaluated on the dataset. It should be noted that if a low-complexity model needs to be deployed on an edge device or a smartphone, MobileNet might be a better option, as it trades some performance for a considerable reduction in computational cost and running time.
5.2 Evaluation Metrics

The dataset has four classes: rust, scab, multiple diseases, and healthy. By running the classifier models on the test set, a predicted class label is obtained for each image. The predicted class is compared with the ground-truth class, and the values for the metrics are obtained. Using the predicted and actual labels for each class $c_i$, the values of true positives ($TP_{c_i}$), false positives ($FP_{c_i}$), true negatives ($TN_{c_i}$), and false negatives ($FN_{c_i}$) are obtained for each class. With these, the accuracy, precision, and recall are deduced as explained in [34]. Accuracy is the ratio of the number of correct predictions to the total number of predictions, as shown in (10). Precision is the ratio of the number of positive datapoints classified correctly to the total number of datapoints classified as positive, as depicted in (11). Recall is the ratio of the number of positive datapoints classified correctly to the total number of positive datapoints, as shown in (12). While accuracy indicates how often the model is correct overall, precision indicates how reliable the model is when classifying samples as positive, and recall reflects how often the model is able to detect a specific category.

$$\text{Accuracy}_{c_i} = \frac{TP_{c_i} + TN_{c_i}}{TP_{c_i} + TN_{c_i} + FP_{c_i} + FN_{c_i}} \tag{10}$$
$$\text{Precision}_{c_i} = \frac{TP_{c_i}}{TP_{c_i} + FP_{c_i}} \tag{11}$$
Fig. 7 Confusion matrix from LeafViT where horizontal axis is ground truth and vertical is predicted labels
$$\text{Recall}_{c_i} = \frac{TP_{c_i}}{TP_{c_i} + FN_{c_i}} \tag{12}$$
Generally, a trade-off must be made between precision and recall. Another metric captures both the precision and recall of the classifier for each class by taking their harmonic mean: the F1-score, as denoted in (13).

$$\text{F1-score}_{c_i} = \frac{2 \times \text{Precision}_{c_i} \times \text{Recall}_{c_i}}{\text{Precision}_{c_i} + \text{Recall}_{c_i}} \tag{13}$$
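Metrics (10)–(13) can be computed directly from a confusion matrix. A self-contained sketch (the 4 × 4 matrix below is illustrative only, not the actual matrix of Fig. 7):

```python
# Per-class TP/FP/FN/TN and metrics (10)-(13) from a confusion matrix
# whose rows are ground-truth classes and columns are predictions.
def class_metrics(cm, i):
    total = sum(sum(row) for row in cm)
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(len(cm))) - tp
    fn = sum(cm[i]) - tp
    tn = total - tp - fp - fn
    accuracy = (tp + tn) / total                        # Eq. (10)
    precision = tp / (tp + fp) if tp + fp else 0.0      # Eq. (11)
    recall = tp / (tp + fn) if tp + fn else 0.0         # Eq. (12)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)               # Eq. (13)
    return accuracy, precision, recall, f1

# Illustrative matrix in class order (healthy, multiple, rust, scab).
cm = [[118, 0, 1, 1],
      [0, 17, 2, 1],
      [1, 0, 105, 0],
      [0, 2, 1, 102]]
healthy = class_metrics(cm, 0)
```

Running `class_metrics` over every class index and averaging the per-class values yields the mean precision, recall, and F1-score reported for the whole test set.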
Figure 7 is the confusion matrix of the performance on the test set from the proposed method, LeafViT. The horizontal axis represents the ground truth, while the vertical axis represents the predicted labels. The evaluation metrics are computed class-wise as shown in Table 8. For the entire test set, the LeafViT framework obtains a mean precision of 0.91, a mean recall of 0.93, and a mean F1-score of 0.91. From the table, it can be noted that the metrics are hampered especially by the multiple diseases category: since the leaves in that category may contain both rust and scab, the model can predict one of them without detecting the other. Additionally, graphs of the loss values and of the training and testing accuracy across epochs obtained during the training of the proposed LeafViT are provided in Fig. 8.
Table 8  Accuracy, precision, recall, and F1-score computed class-wise

Class      Accuracy (%)   Precision   Recall   F1-score
Healthy    99.43          1           0.98     0.99
Multiple   97.44          0.65        0.87     0.74
Rust       99.15          0.98        0.99     0.98
Scab       98.29          0.99        0.88     0.93
Overall    –              0.91        0.93     0.91
Fig. 8 Plots from LeafViT training for 30 epochs—a loss curve across epochs, b training accuracy versus epochs, c testing accuracy versus epochs
6 Conclusion

The field of agriculture is plagued by crop diseases and infections, which makes timely identification of infestation crucial. In this paper, leaf disease detection is studied as a fine-grained visual classification (FGVC) problem instead of a typical image classification problem. The paradigm of self-attention via vision transformer models is leveraged to learn and identify apple foliar diseases like rust and scab in the Plant Pathology 2020 dataset. The proposed LeafViT model is to be preferred for its high accuracy and other classification metrics, since it outperforms current state-of-the-art CNN-based models like VGG, ResNet, and DenseNet. Furthermore, the class imbalance in the datasets for these kinds of tasks is studied, and data augmentation and sampling strategies are analyzed to mitigate its adverse effects. It is evident from the experiments conducted that researchers and practitioners can move away from convolution operations toward attention-based models, and the scope of vision transformers in agriculture looks promising. In future work, research will be conducted on lowering the computational costs of transformer-based models while preserving accuracy, to make them more suitable for use on edge devices and smartphones. Additionally, the use of vision transformer-based backbones for object detection and segmentation tasks will be studied, so that the framework will not only detect whether there is a disease in the leaf but also identify and localize the region of infestation.
References

1. United Nations, Department of Economic and Social Affairs (2019) World population prospects 2019: highlights. https://www.un.org/development/desa/publications/world-population-prospects-2019-highlights.html. Last accessed 29 Oct 2022
2. Food and Agriculture Organization of the United Nations (2019) News story. https://www.fao.org/news/story/en/item/1187738/icode/. Last accessed 29 Oct 2022
3. Zhou R, Kaneko S, Tanaka F, Kayamori M, Shimizu M (2014) Disease detection of Cercospora Leaf Spot in sugar beet by robust template matching. Comput Electron Agric 108:58–70
4. Garg K, Bhugra S, Lall B (2021) Automatic quantification of plant disease from field image data using deep learning. IEEE Winter Conf Appl Comput Vis (WACV) 2021:1964–1971. https://doi.org/10.1109/WACV48630.2021.00201
5. Nagaraj S, Seshachalam D, Hucharaddi S (2018) Model order reduction of nonlinear circuit using proper orthogonal decomposition and nonlinear autoregressive with eXogenous input (NARX) neural network. In: 2018 16th ACM/IEEE international conference on formal methods and models for system design (MEMOCODE), pp 1–4. https://doi.org/10.1109/MEMCOD.2018.8556906
6. Hassan S, Maji A, Jasiński M, Leonowicz Z, Jasińska E (2021) Identification of plant-leaf diseases using CNN and transfer-learning approach. Electronics 10:1388. https://doi.org/10.3390/electronics10121388
7. Deng T (2022) A survey of convolutional neural networks for image classification: models and datasets. In: 2022 international conference on big data, information and computer network (BDICN), pp 746–749. https://doi.org/10.1109/BDICN55575.2022.00145
8. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. https://arxiv.org/abs/1706.03762
9. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. https://doi.org/10.48550/ARXIV.1409.1556
10. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. https://doi.org/10.48550/ARXIV.1512.03385
11. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2016) Densely connected convolutional networks. https://doi.org/10.48550/ARXIV.1608.06993
12. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

For α > 0, the hesitation degree η_ik and its cluster-wise mean η_i* are computed as

$$\eta_{ik} = 1 - mb_{ik} - \left(1 - mb_{ik}^{\alpha}\right)^{1/\alpha}, \qquad \eta_i^{*} = \frac{1}{N}\sum_{k=1}^{N} \eta_{ik}, \tag{8}$$

where k ∈ [1, N]. However, since the method considers noisy points as ordinary data points, it is unable to handle them properly.
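Assuming a Yager-type complement with a free parameter α (an assumption of this sketch, since the printed formula is partially garbled; α = 0.5 is an assumed value, and α ≤ 1 keeps η nonnegative), the hesitation degrees and their per-cluster mean can be computed as:

```python
# Hesitation degree eta_ik = 1 - mb_ik - (1 - mb_ik**alpha)**(1/alpha)
# under an assumed Yager-type complement with parameter alpha.
def hesitation(mb, alpha=0.5):
    return 1.0 - mb - (1.0 - mb ** alpha) ** (1.0 / alpha)

def mean_hesitation(memberships, alpha=0.5):
    # eta*_i of Eq. (8): average hesitation over the N datapoints
    etas = [hesitation(m, alpha) for m in memberships]
    return sum(etas) / len(etas)
```

The hesitation vanishes for fully crisp memberships (mb = 0 or mb = 1) and peaks for ambiguous ones, which is what lets the intuitionistic term in the objective penalize uncertain assignments.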
2.4 The Kernel Version of Intuitionistic Fuzzy C-Means Algorithm (KIFCM) [6]

Here, the authors [24] introduced the radial basis kernel function for calculating the distance. The objective function of KIFCM is [25]:

$$OJ_{KIFCM} = \sum_{i=1}^{tc} \sum_{k=1}^{n} mb_{ik}^{*m} \left\| \phi(t_k) - \phi(mb_i) \right\|^{2} + \sum_{i=1}^{tc} \pi_i^{*} e^{1-\pi_i^{*}} \tag{9}$$
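With an RBF kernel K(x, y) = exp(−‖x − y‖²/σ²), the feature-space distance in (9) expands to ‖φ(x) − φ(y)‖² = K(x, x) + K(y, y) − 2K(x, y) = 2(1 − K(x, y)), since K(x, x) = 1. A sketch (σ = 2.0 is an assumed parameter value):

```python
import math

# RBF kernel and the induced feature-space distance used in (9).
def rbf_kernel(x, y, sigma=2.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / sigma ** 2)

def kernel_distance_sq(x, y, sigma=2.0):
    # ||phi(x) - phi(y)||^2 = 2 * (1 - K(x, y)) because K(x, x) = 1
    return 2.0 * (1.0 - rbf_kernel(x, y, sigma))
```

Unlike the Euclidean distance, this kernel distance saturates at 2 for far-apart points, which is what gives kernel-based clustering its robustness to outliers.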
3 Optimization Techniques

This section discusses the four metaheuristic algorithms used in this paper: PSO, FA, GA, and ABC.
Performance Analysis of Hybridized Fuzzy Clustering Algorithms …
449
3.1 Particle Swarm Optimization (PSO)

PSO is easy to implement and describe, and is hinged on the swarm intelligence concept, with global search potential and quick convergence. In the N-dimensional space, the total number of particles is n and 1 ≤ i ≤ n. I_i = [I_i1, I_i2, I_i3, …, I_iN] is the position of particle i, with velocity V_i = [V_i1, V_i2, V_i3, …, V_iN], and its best position is P_b = [P_i1, P_i2, P_i3, …, P_iN]. The best position of all the particles is P_g = [P_1, P_2, P_3, …, P_n]. c_1 and c_2 are known as the acceleration constants; r_1 and r_2 are real numbers in [0, 1]. The algorithm is given as follows.

Step 1: Initialization.
(a) Initialize P_b to its initial position: P_b(i, 0) = P_i(0).
(b) Initialize P_g to the minimum value of the swarm: P_g(0) = argmin f[P_i(0)].
(c) Initialize velocity: V_i = U(−|UB − LB|, |UB − LB|).

Step 2: Repeat.
(a) Update the particle's velocity as:

$$V_i^{n+1} = V_i^{n} + c_1 r_1 \left(P_b - I_i^{n}\right) + c_2 r_2 \left(P_g - I_i^{n}\right) \tag{10}$$

(b) Update the particle's position as [26]:

$$I_i^{n+1} = I_i^{n} + V_i^{n+1} \tag{11}$$

Step 3: Output P_g(n), which holds the best solution.

The drawback of PSO is that in the convergence condition all the particles move in the optimal direction, so a particle can easily fall into a local optimum.
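The steps above can be sketched as a minimal PSO loop minimizing a toy 2-D sphere function (the objective, bounds, and parameter values are illustrative, not from the paper):

```python
import random

# Minimal PSO sketch of update rules (10) and (11).
def pso(f, dim=2, n=20, iters=100, c1=2.0, c2=2.0, lb=-5.0, ub=5.0):
    pos = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    vel = [[random.uniform(-(ub - lb), ub - lb) for _ in range(dim)]
           for _ in range(n)]
    pbest = [p[:] for p in pos]                 # each particle's best
    gbest = min(pbest, key=f)[:]                # swarm's best so far
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Eq. (10): velocity update
                vel[i][d] += (c1 * r1 * (pbest[i][d] - pos[i][d])
                              + c2 * r2 * (gbest[d] - pos[i][d]))
                # Eq. (11): position update
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
            if f(pbest[i]) < f(gbest):
                gbest = pbest[i][:]
    return gbest

def sphere(x):
    return sum(v * v for v in x)

best = pso(sphere)
```

Note that this basic variant has no inertia weight or velocity clamping, which is exactly why particles can overshoot and fall into local optima, as discussed above.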
3.2 Firefly Algorithm (FA)

The Firefly Algorithm is a metaheuristic approach developed by Yang [14]. All fireflies are androgynous, so any firefly can be attracted to any other. The attractiveness among fireflies is directly proportional to the intensity of their light and inversely proportional to the square of the distance between them, i.e., I ∝ 1/d². A firefly with low light intensity is attracted toward a firefly with high light intensity, whereas fireflies of equal intensity move randomly. The intensity is calculated by the objective function of the problem under consideration. The parameter that controls the strength of the randomness or perturbations in FA is α. As the attractiveness among fireflies depends on the light intensity, the attractiveness variant β is calculated for distance d as [27, 28]:
450
K. Bhalla and A. Gosain
$$\beta = \beta_0 e^{-\gamma d^2}$$
In the above equation, γ is the absorption coefficient, d is the distance, and β_0 is the attractiveness at d = 0. The algorithm is given as follows:

Step 1: Initialization.
(a) Initialize all the parameters (α, β, γ, n).
(b) Initialize a population of n fireflies randomly.
(c) Evaluate the fitness of the initial population at x_i by f(x_i) for i = 1, …, n.

Step 2: Repeat.
∀i = 1 … n, ∀j = 1 … n: if j is brighter (better) than i,

$$x_i^{t+1} = x_i^{t} + \beta_0 e^{-\gamma d_{ij}^{2}} \left(x_j^{t} - x_i^{t}\right) + \alpha_t \epsilon_i^{t} \tag{12}$$
Step 3: Output the firefly that has the best solution.

Here, α_t and ε_i^t are the randomization parameter and the randomized variable at time t, respectively. The drawback of the Firefly Algorithm lies in the articulation of attractiveness and the variation of the light intensity.
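A single firefly move per Eq. (12) can be sketched as follows (the parameter values β_0, γ, and α are illustrative):

```python
import math
import random

# One firefly move following Eq. (12): the dimmer firefly xi moves
# toward the brighter firefly xj plus a random perturbation.
def move_firefly(xi, xj, beta0=1.0, gamma=1.0, alpha=0.1):
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))   # squared distance
    beta = beta0 * math.exp(-gamma * d2)             # attractiveness
    return [a + beta * (b - a) + alpha * (random.random() - 0.5)
            for a, b in zip(xi, xj)]
```

With the random term switched off, the move lands strictly between the two fireflies, closer to the brighter one the nearer they are.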
3.3 Artificial Bee Colony (ABC)

In this algorithm, the Artificial Bee Colony comprises three groups: working bees, spectators, and lookouts. Half of the colony consists of the working artificial bees and the remaining half of the spectators. For each food particle, there is only one working bee; hence, the number of working bees equals the number of food particles. The working bees determine the food sources in the neighborhood and share this information with the spectators in the hive, and the spectators select a food source [12, 13, 29–31]. A working bee whose food source has been abandoned becomes a lookout and starts a new search for food. The selection of a food particle by a spectator bee depends on the amount of nectar F(θ) of that food particle. The algorithm is given as follows:

Step 1: Initialization.
(a) Initialize the population x_i, i = 1 … n.

Step 2: Repeat.
(a) Produce new solutions v_i for the working bees using
$$v_{ij} = x_{ij} + \theta_{ij}\left(x_{ij} - x_{kj}\right) \tag{13}$$
(b) Apply the greedy selection process for the working bees.
(c) Calculate the probability P_i of the food particle located at θ_i for the solutions x_i using

$$P_i = \frac{F(\theta_i)}{\sum_{k=1}^{S} F(\theta_k)} \tag{14}$$

(d) Produce the new solutions v_i for the spectators from the solutions x_i selected depending on the updated P_i.
(e) Determine the abandoned solution for the lookouts, if it exists, and replace it with a new randomly produced solution x_i using

$$x_i^{j} = x_{\min}^{j} + \text{rand}[0, 1]\left(x_{\max}^{j} - x_{\min}^{j}\right), \quad \forall j = 1 \ldots D \tag{15}$$
(f) Iterate.

Step 3: Output the solution that has the best fitness.
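The candidate generation and greedy selection of step 2 can be sketched as follows, assuming the standard ABC form v_ij = x_ij + θ_ij(x_ij − x_kj) with θ_ij drawn uniformly from [−1, 1] (an assumption, since the printed equation is partially garbled):

```python
import random

# Candidate generation: perturb one dimension j of solution x toward
# or away from a randomly chosen neighbor xk.
def new_candidate(x, xk, j):
    v = x[:]
    theta = random.uniform(-1.0, 1.0)
    v[j] = x[j] + theta * (x[j] - xk[j])
    return v

# Greedy selection: keep the better (lower-cost) of old and candidate.
def greedy_select(x, v, f):
    return v if f(v) < f(x) else x
```

Only one coordinate changes per candidate, so the search stays local around each food source while the lookout phase of Eq. (15) handles global restarts.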
3.4 Genetic Algorithm (GA)

This algorithm involves the generation of a random population of n chromosomes. The fitness value f(x_i) of each chromosome i in the population is evaluated. According to the fitness values, two chromosomes are selected from population P, T times, to generate the new population [31]. The algorithm is given as follows:

Step 1: Initialization.
(a) Initialize population P of size n and the maximum number of iterations Max_it.
(b) Generate an initial population of n chromosomes, Y_i (i = 1 … n).

Step 2: Repeat.
(a) Selection: Select parents from the population based on the fitness value f(x_i), with probability

$$P(x_i) = \frac{f(x_i)}{\sum_{j=1}^{n} f(x_j)} \tag{16}$$

(b) Cross-over: Generate offspring from the parents by applying the crossover operation.
(c) Mutation: Mutate the new offspring by applying the mutation operation.
(d) Replace: Replace the old population with the new population.
(e) Iterate.
Step 3: Output the chromosome that has the best solution.
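The fitness-proportional selection of Eq. (16) is commonly implemented as roulette-wheel sampling; a sketch on a toy population (names and fitness values are illustrative):

```python
import random

# Roulette-wheel parent selection implementing Eq. (16): individual
# x_i is drawn with probability f(x_i) / sum_j f(x_j).
def select_parent(population, fitness):
    total = sum(fitness)
    r = random.uniform(0.0, total)
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if r <= acc:
            return individual
    return population[-1]   # guard against floating-point round-off
```

Two calls to `select_parent` provide the parent pair used by the crossover step 2(b).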
4 Data Set Used

For experimental purposes, we have considered the synthetic data sets D10, D12, and D14. D10 is a noiseless data set of points {x_i}, i = 1 … 10; D12 is the union of D10 and two outliers; and D14 is the union of D10 and four outliers. The data sets are shown in Figs. 1, 2, and 3.
Fig. 1 D10 data set
Fig. 2 D12 data set
Fig. 3 D14 data set
5 Experimental Analysis and Results

We have implemented hybridized FCM, IFCM, and KIFCM using MATLAB on the synthetic data sets D10, D12, and D14 discussed above. The iterations and best cost for each algorithm are shown in Tables 1, 2, and 3. Results are also shown graphically in Figs. 4–15.

Table 1  D10 data set

Fuzzy clustering   Optimization   Iterations   Best cost
FCM                PSO            1000         1.3928e−267
FCM                FA             1000         7.713e−21
FCM                ABC            200          3.4666e−23
FCM                GA             100          27.558
IFCM               PSO            1000         2.0379e−268
IFCM               FA             1000         4.9099e−21
IFCM               ABC            200          4.4609e−22
IFCM               GA             100          26.1821
KIFCM              PSO            1000         1.3057e−270
KIFCM              FA             1000         3.3846e−21
KIFCM              ABC            200          1.6838e−21
KIFCM              GA             100          25.03
Table 2  D12 data set

Fuzzy clustering   Optimization   Iterations   Best cost
FCM                PSO            1000         2.9208e−267
FCM                FA             1000         1.0854e−20
FCM                ABC            200          1.4516e−22
FCM                GA             100          26.0344
IFCM               PSO            1000         3.5098e−271
IFCM               FA             1000         4.7189e−21
IFCM               ABC            200          9.64e−23
IFCM               GA             100          25.2035
KIFCM              PSO            1000         2.9041e−277
KIFCM              FA             1000         1.067e−20
KIFCM              ABC            200          1.3739e−23
KIFCM              GA             100          25.045
Table 3  D14 data set

Fuzzy clustering   Optimization   Iterations   Best cost
FCM                PSO            1000         8.2682e−277
FCM                FA             1000         1.2506e−20
FCM                ABC            200          5.2454e−22
FCM                GA             100          27.8688
IFCM               PSO            1000         2.6555e−274
IFCM               FA             1000         8.6023e−21
IFCM               ABC            200          7.675e−23
IFCM               GA             100          25.7882
KIFCM              PSO            1000         1.2095e−272
KIFCM              FA             1000         1.1727e−21
KIFCM              ABC            200          4.7433e−23
KIFCM              GA             100          25.048
5.1 Results Observed for D10 Data Set

For the D10 data set, hybridized KIFCM outperformed hybridized FCM and hybridized IFCM; moreover, KIFCM hybridized with PSO performed better than with ABC, FA, or GA.
Fig. 4 Fuzzy clustering with PSO for D10
Fig. 5 Fuzzy clustering with FA for D10
5.2 Results Observed for D12 Data Set

For the D12 data set, hybridized KIFCM outperformed hybridized FCM and hybridized IFCM; moreover, KIFCM hybridized with FA performed better than with PSO, ABC, or GA.
Fig. 6 Fuzzy clustering with ABC for D10
Fig. 7 Fuzzy clustering with GA for D10
5.3 Results Observed for D14 Data Set

For the D14 data set, hybridized KIFCM outperformed hybridized FCM and hybridized IFCM; moreover, KIFCM hybridized with FA performed better than with PSO, ABC, or GA.
Fig. 8 Fuzzy clustering with PSO for D12
Fig. 9 Fuzzy clustering with FA for D12
6 Conclusion and Future Scope

In this paper, we have hybridized three fuzzy clustering algorithms, FCM, IFCM, and KIFCM, using the metaheuristic algorithms GA, PSO, FA, and ABC. We have carried out an experimental analysis on the three synthetic data sets D10, D12, and D14 using MATLAB. It is observed that optimized KIFCM outperformed optimized IFCM and optimized FCM. The results are shown graphically and in tabular form. Future work could look into various other stochastic algorithms that can be hybridized with fuzzy clustering to further enhance performance.
Fig. 10 Fuzzy clustering with ABC for D12
Fig. 11 Fuzzy clustering with GA for D12
Fig. 12 Fuzzy clustering with PSO for D14
Fig. 13 Fuzzy clustering with FA for D14
Fig. 14 Fuzzy clustering with ABC for D14
Fig. 15 Fuzzy clustering with GA for D14
References

1. Kaur P, Soni AK, Gosain A (2013) RETRACTED: a robust kernelized intuitionistic fuzzy c-means clustering algorithm in segmentation of noisy medical images, pp 163–175
2. Yu X (2014) Blurred trace infrared image segmentation based on template approach and immune factor. Infrared Phys Technol 67:116–120
3. Yu X et al (2016) Target extraction of banded blurred infrared images by immune dynamical algorithm with two-dimensional minimum distance immune field. Infrared Phys Technol 77:94–99
4. Meenakshi S, Suganthi M, Sureshkumar P (2019) Segmentation and boundary detection of fetal kidney images in second and third trimesters using kernel-based fuzzy clustering. J Med Syst 43(7):1–12
5. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
6. Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, pp 32–57
7. Bezdek JC (2013) Pattern recognition with fuzzy objective function algorithms. Springer Science and Business Media
8. Dave RN (1991) Characterization and detection of noise in clustering. Pattern Recogn Lett 12(11):657–664
9. Dave RN (1993) Robust fuzzy clustering algorithms. In: [Proceedings 1993] second IEEE international conference on fuzzy systems. IEEE
10. Kaur P, Gupta P, Sharma P (2012) Review and comparison of kernel based fuzzy image segmentation techniques. Int J Intell Syst Appl 4(7):50
11. Zhang D-Q, Chen S-C (2003) Clustering incomplete data using kernel-based fuzzy c-means algorithm. Neural Process Lett 18(3):155–162
12. Bansal JC, Sharma H, Jadon SS (2013) Artificial bee colony algorithm: a survey. Int J Adv Intell Paradigms 5(1/2):123–159
13. Karaboga D (2010) Artificial bee colony algorithm. Scholarpedia 5(3):6915
14. Fister I, Yang X-S, Fister D (2014) Firefly algorithm: a brief review of the expanding literature. In: Cuckoo search and firefly algorithm, pp 347–360
15. Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques, pp 6–7
16. Wu X et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
17. Kesavaraj G, Sukumaran S (2013) A study on classification techniques in data mining. In: 2013 fourth international conference on computing, communications and networking technologies (ICCCNT). IEEE
18. Kaur P, Gosain A (2011) A density oriented fuzzy c-means clustering algorithm for recognising original cluster shapes from noisy data. Int J Innovative Comput Appl 3(2):77–87
19. Xiang Y et al (2013) Automatic segmentation of multiple sclerosis lesions in multispectral MR images using kernel fuzzy c-means clustering. In: 2013 IEEE international conference on medical imaging physics and engineering. IEEE
20. Saikumar T et al (2013) Image segmentation using variable kernel fuzzy C means (VKFCM) clustering on modified level set method. In: Computer networks and communications (NetCom). Springer, New York, pp 265–273
21. Kaur P, Gosain A (2010) Density-oriented approach to identify outliers and get noiseless clusters in fuzzy C-means. In: International conference on fuzzy systems. IEEE
22. Gosain A, Dahiya S (2016) Performance analysis of various fuzzy clustering algorithms: a review. Procedia Comput Sci 79:100–111
23. Tasdemir K, Merényi E (2011) A validity index for prototype-based clustering of data sets with complex cluster structures. IEEE Trans Syst Man Cybern Part B (Cybern) 41(4):1039–1053
24. Tushir M, Srivastava S (2010) A new kernelized hybrid c-mean clustering model with optimized parameters. Appl Soft Comput 10(2):381–389
25. Chaira T (2011) A novel intuitionistic fuzzy C means clustering algorithm and its application to medical images. Appl Soft Comput 11(2):1711–1717
26. Niu Q, Huang X (2011) An improved fuzzy C-means clustering algorithm based on PSO. J Softw 6(5):873–879
27. Senthilnath J, Omkar SN, Mani V (2011) Clustering using firefly algorithm: performance study. Swarm Evol Comput 1(3):164–171
28. Hassanzadeh T, Meybodi MR (2012) A new hybrid approach for data clustering using firefly algorithm and K-means. In: The 16th CSI international symposium on artificial intelligence and signal processing (AISP 2012). IEEE
29. Price K, Storn RM, Lampinen JA (2006) Differential evolution: a practical approach to global optimization. Springer Science & Business Media
30. Pinto ARF, Crepaldi AF, Nagano MS (2018) A genetic algorithm applied to pick sequencing for billing. J Intell Manuf 29(2):405–422
31. Ding Y, Fu X (2016) Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm. Neurocomputing 188:233–238
AI-Based Open Logo Detection for Indian Scenario S. N. Prasanth, R. Aswin Raj, M. A. C. Sivamaran, and V. Sowmya
Abstract Logo detection from images and videos has a wide range of applications. Most existing logo datasets do not contain the logos of Indian companies. Since there is no existing dataset with Indian companies' logos, we created our own dataset covering three different industries with 4 different classes for each industry. In this work, we have trained a YOLOv5 model to detect Indian logos in closed-set and QMUL-OpenLogo dataset settings, with and without pretrained weights. In addition to this, we also created a GUI using Python's Tkinter library. The Python code for the GUI can be accessed from our GitHub repository (https://github.com/Prasanth-S-n/IndianLogoDetection).

Keywords Open logo · YOLOv5 · Tkinter
1 Introduction

A logo is a unique graphic symbol used for brand recognition or property protection. Identifying logos in images or videos is a challenging pattern recognition task with a wide range of applications, such as copyright infringement detection, vehicle logo detection for traffic control [1, 2], and autonomous cars [3–5]. The task is challenging due to varying logo sizes across resolutions, occlusion, uncontrolled illumination, and background clutter. In addition to this, new logos tend to come out often, resulting in a never-ending increase in the number of classes.

S. N. Prasanth (B) · R. A. Raj · M. A. C. Sivamaran · V. Sowmya Centre for Computation Engineering and Networking, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Ettimadai, Coimbatore, India e-mail: [email protected] R. A. Raj e-mail: [email protected] V. Sowmya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_35
464
S. N. Prasanth et al.
In this paper, we trained and analyzed the performance of YOLOv5 on our logo detection dataset containing images with only Indian companies' logos. The dataset contains images with more than one logo of the same class or of different classes. The model is trained and tested with and without pretrained weights. Finally, we also created a GUI using the Tkinter Python library with provisions to load any YOLOv5 weights, to browse and load images of interest, to specify the desired confidence score, and to detect and save the output. The GUI can be used to understand the scalability of the model and the prediction time taken on deployment. Using the GUI, users can predict the logos present in an image in minimal time without extensive manual labelling, and the save button allows the predictions to be saved for later use. The rest of the paper is organized as follows: Sect. 2 briefs the related works, Sect. 3 describes our dataset and the YOLOv5 architecture, Sect. 4 presents the experiments that were carried out along with the results and our GUI implementation, and Sect. 5 concludes the paper, followed by references.
2 Related Works

In the literature, a significant amount of effort has been devoted to logo detection. Existing methods demand large amounts of accurately annotated image data, which makes them not scalable to realistic deployment; there are very few exceptions, such as [6, 7]. The methods described in [6, 7] do not require many accurately annotated images but have the disadvantage of exploiting a large volume of data. In [8], a new dataset named 'Logos in the Wild' is published, and models trained with this dataset achieve better performance. In [9], it was shown that training for a new logo class does not require large amounts of data. To address the never-ending increase in the number of classes, transfer learning can be used. In [10], the authors proposed a multi-perspective cross-class (MPCC) domain adaptation method to generalize from fully supervised logo classes to new unseen classes. With the help of transfer learning, the model can be trained for new classes with less data by using weights previously trained to solve the same or a similar problem. In [11], the authors showed a real-time implementation of logo detection on the open-source BeagleBoard.
3 Dataset and Method

The training of any model requires access to a dataset, so our first objective was to create a logo detection dataset containing only Indian companies' logos as classes. We collected images for our dataset using Google image search, and we also included images captured using a phone camera.
We have collected logos with respect to three industries, namely food, automobile, and pharmaceutical. Our dataset consists of 584 images for 12 classes, with 4 classes per industry. The dataset statistics for the closed set are given in Table 1. The images were annotated using ImageAnnotationLab (GUI software) and exported as one .txt file per image containing the annotations in YOLOv5 format. Models are trained with and without pretrained weights for comparison. We evaluated all the models with the same test data, which includes images for all the classes. The training and testing data contain images with more than one object of the same class or of different classes. Figure 1 shows the pipeline of our proposed method. Figure 2 shows a few sample images from our dataset. After training the model, we created a simple GUI using the Tkinter Python library. Though there are lots of GUI frameworks, Tkinter is the only framework that is built into the Python standard library. One main disadvantage of Tkinter is that interfaces created with it look outdated, which is not our main concern. The GUI that we have created contains the following buttons. Figure 6 shows our GUI along with the components' functions.

A. YOLOv5

We have used the YOLOv5 model to detect Indian companies' logos due to its advantage in inference speed and in detecting small or far objects when compared with the faster

Table 1  Total number of images in each class for a closed set

Class           Train   Test   Validation   Total images
Ashok Leyland   28      42     7            77
Divis           28      2      7            37
Dr. Reddy       28      17     7            52
Force           28      7      7            42
Gland           28      7      7            42
Haldiram        28      9      7            44
Mahindra        28      9      7            44
Mother Dairy    28      11     7            46
Parle           28      13     7            48
Sun             28      6      7            41
Tata            28      20     7            55
Unilever        28      13     7            48
Multi-class     0       7      1            8
Total images    336     163    85           584
Fig. 1 Pipeline for our proposed method
Fig. 2 Sample images present in our dataset
region-based convolutional neural network (Faster R-CNN) model; moreover, YOLOv5 uses spatial pyramid pooling and therefore imposes no fixed image size for training. You Only Look Once (YOLO) is a state-of-the-art, widely used real-time object detector that has achieved top performance on two official object detection benchmarks (Pascal VOC and Microsoft COCO). The network architecture of YOLOv5 consists of three parts: backbone (CSPDarknet), neck (PANet), and head (the YOLO layer). The cross stage partial network (CSPNet) solves the problem of repeated gradient information in large-scale backbones by integrating the gradient changes into the feature map, thereby decreasing the model's parameters and FLOPS, which not only improves inference speed and accuracy but also reduces model size. YOLOv5 incorporates CSPNet into Darknet, making CSPDarknet its backbone. The spatial pyramid pooling (SPP) layer of YOLOv5 removes the fixed-image-size constraint of conventional CNN models. PANet improves the use of accurate localization signals in the lower layers, which enhances object localization accuracy, by adopting a new feature pyramid network (FPN) structure with an enhanced bottom-up path that improves the propagation of low-level features; at the same time, adaptive feature pooling, which links all feature levels to the feature grid, propagates useful information from each feature level directly to the following subnetwork. Finally, the head of YOLOv5 generates feature maps at three different scales, enabling multiscale prediction and allowing the model to handle small, medium, and large objects [12]. The architecture of the YOLOv5 model can be seen in Fig. 3.
Fig. 3 Architecture of YOLOv5
4 Experimentation and Result
To accelerate computation, the experiments were carried out with GPU-enabled PyTorch on a single NVIDIA Tesla K80. The evaluation is performed on images of different sizes and from different environments, and the models are analyzed using metrics such as precision, recall, and mAP. For the closed set, 35 images per class are taken; in openset configuration 2, 6 classes (2 per industry) are used for training and the rest for testing. Results for the closed set are shown in Tables 2 and 3, and results for openset configuration 2 in Tables 4 and 5. In all these experiments, the proposed framework performed better with pretrained weights. The framework performs best at an IoU threshold of 0.5, and its performance declines as the IoU threshold increases (Figs. 4, 5, 6, 7 and 8).
(a) Closed Set
(b) Openset Configuration 2
The model trained with the openset2 configuration does not perform as well as the model trained on the closed set because, in the closed set, all classes have training images. The results also show that training a new logo class does not require a large amount of data, as described in the literature. In Fig. 5, we can see that the
Table 2 Comparison between YOLOv5 model with pretrained weights and without pretrained weights for closed set

Methods                            Epochs   Precision   Recall   mAP@0.5   mAP@0.5:0.95
YOLOv5 without pretrained weights   100       83.1       74.4      80.5        56.4
YOLOv5 with pretrained weights       50       96.8       87.5      93.4        74.5

Bold signifies best result obtained
Table 3 Comparison between YOLOv5 model with pretrained weights and without pretrained weights at different IoU values for closed set

Methods / mAP@IoU            @0.6   @0.7   @0.8   @0.9
Without pretrained weights   57.9   55.7   50.4   40.2
With pretrained weights      71.5   71.1   68.6   58.6

Bold signifies best result obtained
Table 4 Comparison between YOLOv5 model with pretrained weights and without pretrained weights for openset configuration 2

Methods                            Epochs   Precision   Recall   mAP@0.5   mAP@0.5:0.95
YOLOv5 without pretrained weights   100       95.2       47.9      48.3        35
YOLOv5 with pretrained weights      100       99.3       49        50.1        41.1

Bold signifies best result obtained
Table 5 Comparison between YOLOv5 model with pretrained weights and without pretrained weights at different IoU values for openset configuration 2

Methods / mAP@IoU            @0.6   @0.7   @0.8   @0.9
Without pretrained weights   34.9   34.4   33.5   28.1
With pretrained weights      41.1   41.1   40.6   37.1

Bold signifies best result obtained
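The IoU thresholds in Tables 3 and 5 refer to the intersection-over-union overlap required between a predicted box and a ground-truth box before the prediction counts as correct. A minimal sketch of the computation, with hypothetical boxes given as (xmin, ymin, xmax, ymax):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection shifted by half its width relative to the ground truth:
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 4))  # -> 0.3333
```

At a threshold of 0.5 this example detection would be rejected as a false positive, which illustrates why the mAP scores in the tables above fall as the IoU threshold rises.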
model trained with pretrained weights predicts much better than the other models, while the models trained without pretrained weights or in the openset2 configuration perform poorly and lead to misclassification. We can also see that YOLOv5 is capable of detecting logos partially occluded by other objects, though with lower confidence. Training for a greater number of epochs may lead to better results. Our GUI's capabilities and functions can be seen in Figs. 6, 7 and 8.
Fig. 4 Training results for each model. a closed set with pretrained weights, b closed set without pretrained weights, c openset2 with pretrained weights, d openset2 without pretrained weights
Fig. 4 (continued)
5 Conclusion
In the present work, we proposed a framework based on the YOLOv5 model to detect Indian logos in closed-set and QMUL-OpenLogo dataset settings, with and without pretrained weights. It was observed that the framework with pretrained weights performed better than the framework without them. We also developed a GUI for the proposed framework. As future work, we would like to extend the GUI to accept not only images but also video or webcam input and to display detections frame by frame. We would also like to add an option to predict only the classes desired by the user.
Fig. 5 Predicted labels from each model. a closed set with pretrained weights, b closed set without pretrained weights, c openset2 with pretrained weights, d openset2 without pretrained weights
[GUI button callouts from Fig. 6: to load YOLOv5 weights; to set the confidence threshold; to save the result image; to detect logos in the image; to browse and select an image; to close the application]
Fig. 6 Interface after launching the application
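For illustration, the six buttons shown in Fig. 6 could be wired up with Tkinter roughly as follows. The callback names here are placeholders, not the authors' actual implementation:

```python
# (label, callback-name) pairs mirroring the six GUI buttons described above.
BUTTONS = [
    ("Load weights", "load_weights"),      # load YOLOv5 weights
    ("Set confidence", "set_confidence"),  # set the confidence threshold
    ("Browse image", "browse_image"),      # browse and select an image
    ("Detect", "detect_logos"),            # detect logos in the image
    ("Save result", "save_result"),        # save the result image
    ("Quit", "quit_app"),                  # close the application
]

def build_app(handlers):
    """Create the window and attach one button per entry in BUTTONS.
    `handlers` maps callback names to zero-argument functions.  Tkinter is
    imported inside the function so this sketch also loads headless."""
    import tkinter as tk
    root = tk.Tk()
    root.title("Logo detection")
    for label, name in BUTTONS:
        tk.Button(root, text=label, command=handlers[name]).pack(side="left")
    return root

# Usage (opens a window, so it is not run here):
# app = build_app({name: (lambda n=name: print(n)) for _, name in BUTTONS})
# app.mainloop()
```

Tkinter being part of the standard library, as noted above, no extra dependency is needed for the GUI layer.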
Fig. 7 After loading the browsed image
Fig. 8 Detecting the logos in the browsed image after loading the weights and setting the confidence threshold
References 1. Psyllos AP, Anagnostopoulos C-NE, Kayafas E (2010) Vehicle logo recognition using a SIFT-based enhanced matching scheme. IEEE Trans Intell Transp Syst 11(2):322–328 2. Senthilkumar T, Sivanandam SN (2013) Logo classification of vehicles using SURF based on low detailed feature recognition. Int J Comput Appl 3:5–7 3. Deepika N, Sajith Variyar VV (2017) Obstacle classification and detection for vision-based navigation for autonomous driving. In: International conference on advances in computing, communications and informatics (ICACCI). IEEE 4. Babu RN, Sowmya V, Soman KP (2019) Indian car number plate recognition using deep learning. In: 2019 2nd international conference on intelligent computing, instrumentation and control technologies (ICICICT), vol 1. IEEE 5. Emani S et al (2019) Obstacle detection and distance estimation for autonomous electric vehicle using stereo vision and DNN. In: Soft computing and signal processing. Springer, Singapore, pp 639–648
6. Su H, Gong S, Zhu X (2017) Weblogo-2m: scalable logo detection by deep learning from the web. In: Workshop of the IEEE international conference on computer vision 7. Su H, Gong S, Zhu X (2018) Scalable deep learning logo detection. arXiv preprint arXiv:1803.11417 8. Tüzkö A, Herrmann C, Manger D, Beyerer J (2017) Open set logo detection and retrieval. arXiv preprint arXiv:1710.10891. https://doi.org/10.48550/arXiv.1710.10891 9. Su H, Zhu X, Gong S (2018) Open logo detection challenge. arXiv preprint arXiv:1807.01964 10. Su H, Gong S, Zhu X (2021) Multi-perspective cross-class domain adaptation for open logo detection. Comput Vis Image Underst 204:103156. https://doi.org/10.1016/j.cviu.2020.103156 11. George M, Kehtarnavaz N, Estevez L (2011) Real-time implementation of logo detection on open source BeagleBoard. SPIE Proc. https://doi.org/10.1117/12.871952 12. Xu R, Lin H, Lu K, Liu Y (2021) A forest fire detection system based on ensemble learning. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Feb 2021
A Comparison of Machine Translation Methods for Natural Language Processing and Their Challenges Aarti Purohit, Kuldeep Kumar Yogi, and Rahul Sharma
Abstract Machine translation (MT) plays an important role in natural language processing (NLP) by automating translation between natural languages. MT emerged as a computational-linguistics subfield of NLP and AI aimed at automating tasks and giving machines human-like intelligence. The demand for MT systems grows day by day. Using the various approaches to MT, natural language processing can be applied in fields such as spam filtering, robotics, sentiment analysis, automated learning, and real-time translation applications (text-to-text, text-to-speech, and vice versa). Several approaches, such as rule-based MT, statistical MT, and hybrid or neural MT, are used to implement translation; for example, Google Translate currently uses the NMT approach to translate many language pairs. However, machine translation still faces numerous challenges in achieving high accuracy. In this paper, we discuss some approaches and the current challenges in MT for natural language processing, along with a comparative analysis of the existing MT approaches. Keywords Machine translation · Machine translation approaches · NLP · MT challenges
1 Introduction
In this era of information technology, we use computers and the Internet in our daily life to obtain information in every field by reading newspapers and searching topics and queries on the web in many languages, such as English and Hindi. The web is a large knowledge base of information in textual, PDF, or spoken form in multiple languages, and can be
A. Purohit (B) · K. K. Yogi Department of Computer Science and Engineering, Banasthali University, Tonk, Rajasthan, India e-mail: [email protected] R. Sharma Department of Computer Science and Engineering, ACEIT, Jaipur, Rajasthan, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_36
476
A. Purohit et al.
accessed using Google or other search websites. Humans communicate and share information in spoken or written language. Around 6000 languages are used for communication and information sharing, but no one is capable of understanding every language, owing to cultural diversity and differences in grammar and structure. To extract the correct meaning of data we can use translation. The idea of automating translation from one language to another gave rise to natural language processing (NLP), a subfield of AI. Cognitive science and linguistics study natural languages to extract the correct meaning of words, sentences, and phrases [1]. To process natural languages with computers we use NLP. NLP is currently used in many application areas, including sentiment analysis, chatbots, voice assistants, grammar checking, email filtering, automatic summarization, and machine translation. In the following sections we discuss machine translation, the approaches used to process natural language, and the current challenges faced by researchers.
2 Translation Process Translation is the process, manual or automated, of converting input source-language text into target-language text with the same meaning. Figure 1 shows the translation process, including the analysis, semantic, and syntactic levels of the processing task. Machine translation (MT), a part of computational linguistics, is the fully automated translation process concerned with the approaches and system tools used to translate text or speech from a source language into a target language, e.g., English to Hindi. The machine plays the central role in translation, with no human intervention. Many experts and researchers have been working on machine translation for all Indian languages since the 1980s and 1990s, but 100% accuracy has still not been achieved. A machine translation system follows almost all phases of NLP to complete the translation process.
Fig. 1 Translation process
Fig. 2 Machine translation process
Figure 2 shows the phases of machine translation for natural language processing, including the steps of analyzing the source language, transferring to an intermediate representation, and generating the target language. There are various criteria for checking the accuracy of a translation, such as word, sentence, and phrase meanings, grammatical structure, and adequacy. Algorithms used to evaluate the accuracy of MT systems and measure linguistic quality include WER, BLEU, and METEOR.
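Of these metrics, WER is the simplest: the minimum number of word insertions, deletions, and substitutions needed to turn the system output into the reference, divided by the reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / len(ref)

# One dropped word out of a six-word reference:
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # -> 0.167
```

BLEU and METEOR are more elaborate, weighing n-gram overlap rather than raw edit operations.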
3 Approaches of Machine Translation
3.1 Rule Based Machine Translation (RBMT) [1]
This approach is also known as knowledge-based or classical machine translation. An RBMT system uses bilingual dictionaries and grammar rules to apply the morphological and syntactic rules of the languages. It requires linguistic knowledge of both the source language (SL) and the target language (TL) to perform syntactic analysis, semantic analysis, and morphological processing during translation. The basic processing steps of rule-based machine translation are shown in Fig. 3.
Fig. 3 RBMT steps
There are three sub-approaches of RBMT. Figure 4 shows the direct, transfer-based, and Interlingua approaches, each applying a different level of analysis to the source language in order to obtain a structure from which the target language can be generated.
(a) Direct Machine Translation
This is the oldest and least popular approach. The system uses bilingual dictionaries for word-to-word conversion from SL to TL, and requires little syntactic and semantic analysis. In a direct MT system, the first step is to extract the root word in the morphological analysis phase. The second step uses two-way (bilingual) dictionaries to find the exact translation. Finally, the words are rearranged according to the syntactic rules of the TL.
(b) Interlingua Machine Translation
In this approach the source language is transformed into an abstract intermediate language (a universal language) that is neutral with respect to both the source and target languages, so that the target language can be generated from this transitional representation. By
Fig. 4 RBMT approaches
using an intermediate language, we can convert the SL into multiple languages. Knowledge-based natural language translation (KANT) is the only Interlingua MT system used commercially.
(c) Transfer Based Machine Translation
In this approach translation rules are used to translate source text into target text. Unlike the Interlingua approach, the rules are implemented for a specific pair of languages. A transfer-based MT system works in three stages: analysis, transfer, and generation. In the first stage, a parser produces a syntactic representation of the SL. In the next stage, the system converts the SL syntactic representation into an equivalent TL intermediate representation. Finally, a TL morphological analyzer generates the TL text. Using this approach, about 90% accuracy can be achieved.
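As a toy illustration of the three direct-MT steps above, the following sketch looks up each word in a small invented, romanised English-Hindi dictionary and then applies a crude SVO-to-SOV reordering rule (real systems need far richer dictionaries and rules):

```python
# Toy bilingual dictionary (invented, romanised entries for illustration).
DICT = {"i": "main", "eat": "khata hoon", "rice": "chawal"}

def direct_translate(sentence):
    """Steps 1-2: look up each (lowercased) word in the bilingual dictionary,
    keeping unknown words unchanged."""
    return [DICT.get(w.lower(), w) for w in sentence.split()]

def reorder_svo_to_sov(words):
    """Step 3: a crude reordering rule for an SOV target language,
    assuming a three-unit subject-verb-object input."""
    if len(words) == 3:
        s, v, o = words
        return [s, o, v]
    return words

translated = reorder_svo_to_sov(direct_translate("I eat rice"))
print(" ".join(translated))  # -> "main chawal khata hoon"
```

The weakness of the direct approach is visible even here: any sentence that does not fit the hard-coded pattern passes through unreordered.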
3.2 Example Based Machine Translation (EBMT) [1] It is a data-driven or memory based approach. It uses corpus to find examples b/w SL and TL. Here, mapping approach is used to find similarity b/w words, syntax, semantics of SL and TL to identify approximate matching translation output. Example based model uses retrieval approach to match sentences to find correct translation output and adaption model is used to find that translation which can be used further. If match is not found exactly, most relevant match is used as approximate output.
3.3 Statistical Machine Translation (SMT) [1] SMT is corpus based or data driven approach. Here, we need a large amount of bilingual corpus for knowledge of source language and target language to set parameters. In this approach mathematical calculations are used to build statistical model to analyze bilingual corpus. SMT model includes: (i) language model, (ii) translation model, (iii) decoder model as shown in Fig. 5. SMT model uses Baye’s Theorem and translation will be done using probabilistic approach. Language model (LM) chooses most probable translation for words or phrases in input sentence to assign translation as probability P(T). Then, translation model(TM) calculates conditional probability between target sentences and source sentence, P(T/S), while, last decoder model searches the best possible translation with maximum product between two above-derived probabilities to get a higher accuracy result for translation.
Fig. 5 Statistical machine translation approach
Fig. 6 Hybrid approach [1]
3.4 Hybrid Machine Translation (HMT) [1]
The HMT approach combines the RBMT and SMT approaches. It is classified into two types: (i) rule-based MT using a corpus and linguistic rules, after which a statistical model is applied to correct the output of the RBMT system; this is used where post-editing is required and is called the RBMT-guided hybrid approach. (ii) Rules are used at the preprocessing stage to prepare input for the statistical system, and again at the postprocessing stage to normalize the model's output; this is called the statistically guided hybrid approach. A large set of linguistic rules derived from the training corpus is needed (Fig. 6).
3.5 Neural Machine Translation (NMT) [2] In this approach, neural network is used to perform machine translation. It uses single network instead of pipelining of separate tasks. One-hot encoding is used to transform input sequence into numbers in neural network to do embedding. While seq2seq model used after 2014 increases possibilities for researchers to achieve high accuracy outputs in natural language processing. By using seq2seq model it becomes possible to train a network model with input and output sequences to get improved results when compared with other approaches. But, NMT model cannot learn from rare samples and random inputs creates an unbalanced training data, which is a challenging task for researchers and experts.
4 Some Related Work Over time, much work has been done on machine translation, and all of the approaches have been used to translate many language pairs. In this section, we discuss representative work for each approach.
4.1 Rule Based Machine Translation (RBMT)
The rule-based MT approach was used for English-to-Finnish translation in [3]. This MT model was used to analyze news translations in WMT 2017. As different languages have different word orders and grammatical rules, POS tags are key to converting the source language into the target language; rules were designed to simplify English POS tags so that an input sentence could be transformed into Finnish output. The model provides high-quality results when the input is grammatically correct: output for ordinary news was satisfactory, while results for sports and other sections were poor, as less grammatical text creates a major problem that badly affects translation quality. The authors of [4] proposed a Punjabi-to-Hindi MT system. It follows the direct MT approach, performing word-by-word replacement, and consists of three modules: preprocessing, tokenization, and token extraction using Unicode format. It helps with entity recognition and ambiguity removal. The model produced an error rate of about 24.26%. The author of [5] proposed an English-to-Bengali MT system in which a knowledge base and a MySQL database store tags for English words and their equivalent Bengali words. As a rule-based system it works well, but the corpus grows large, and creating such a huge corpus becomes complicated.
4.2 Example Based Machine Translation (EBMT)
The EBMT approach was applied by Sharma et al. [6] to English-to-Hindi machine translation, using 1000 English sentences collected from well-known newspapers. They evaluated the system using F-score, BLEU, WER, and NIST, and found good results compared with RBMT and SMT. The author of [7] points out a drawback of the AnglaBharti II system, whose hybridization of RBMT relies on handcrafted rules; an example-based approach is used to address the problem of scaling the bilingual parallel corpus, which is very limited for Indian languages. The architecture achieves 40-80% accuracy for English-to-Hindi translation.
4.3 Statistical Machine Translation (SMT)
In [8], English-to-Hindi machine translation is performed using a bidirectional SMT system proposed by IBM Research Lab, India. The system uses a parallel corpus of one hundred thousand English-Hindi sentence pairs, and the authors report BLEU and NIST score improvements of 7.16% and 2.46%, respectively. The authors of [9] applied an SMT approach to 6000 sentences for English-to-Urdu translation, of which 5000 were used for training, 800 for tuning, and 200 for testing; the BLEU score obtained after tuning was a respectable 9.035. In the same manner, the authors of [10] applied SMT to English-to-Urdu translation on 20,000 sentences and obtained a BLEU score of 37.10. Term translation in the context of SMT is addressed in [11]: the authors designed a model that encourages consistent translation of terms with high strength, selecting the desired translation for SL terms using domain information. A unithood model is implemented to translate an SL string into a TL string as a whole unit. With this SMT approach they achieved improved translation results.
4.4 Hybrid Machine Translation (HMT)
According to [12], a hybrid approach is used to translate English to Marathi. The researchers feed the input to a statistical system, then apply grammatical correction rules to the output as the RBMT stage, replacing incorrect words to obtain the final output. The results improve because this model overcomes the shortcomings of both the statistical and rule-based approaches. The authors of [13] proposed a hybrid approach combining EBMT, RBMT, and SMT in one model to translate Hindi to English. They cover four sentence types (simple present and past tense, present continuous and past continuous tense) to train the model, and achieve good results. The authors of [14] used a combination of NMT and SMT as a hybrid approach for Chinese-English and English-German translation; they use SMT word predictions together with NMT word predictions and combine them to obtain accurate translations without relying on the original vocabulary. The authors of [15] used both rule-based and statistical MT for translating Turkish spoken language to sign language: the Turkish input is evaluated grammatically and translation rules are applied to generate output, which is then boosted by SMT to increase translation quality. The authors of [16] use a hybrid approach (NMT + RBMT) for Sanskrit-to-Hindi translation, exploiting deep learning features to remove the disadvantages of the RBMT and SMT approaches; the system overcomes the issue of slow feedback, and system speed is improved.
4.5 Neural Machine Translation (NMT)
A neural network is applied as an RNN-based encoder-decoder NMT model for Arabic dialects in [2]. A multi-task learning (MTL) model is designed to encode the source language and implement a decoder for each language pair. A small training set is used to train the model to translate from Arabic dialects to Modern Standard Arabic (MSA) and generate the correct translation sequence. This model addresses data scarcity and insufficient orthography for Arabic, and works well for both low-resource and rich-resource languages, providing fast and better results. In [17], artificial-neural-network-based machine translation (ANMT) is applied to Punjabi-to-English translation, with reported BLEU scores of 60.68 on short sentences, 39 on medium sentences, and 26.38 on long sentences. The authors of [18] used SMT features, such as an n-gram language model, with NMT under a log-linear framework to translate Chinese to English; the quality improvement reaches 2.33 BLEU points on the NIST test set. The authors of [19] used an NMT approach based on an LSTM encoder-decoder architecture with the Bahdanau attention mechanism to translate English text to Urdu. They collected a corpus from the UMC005 English-Urdu corpus (Urdu and Bible texts), news, and web scraping to train the model on a total of 542,810 English tokens and 123,636 Urdu tokens, then used automatic evaluation metrics such as BLEU, F-measure, and WER to check translation quality; the model achieves a good level of accuracy. The authors of [20] applied a recurrent neural network (RNN) as a generalization of feed-forward networks to sequence-to-sequence mapping. It was applied to the WMT'14 English-to-French MT task using an SMT system as the baseline, with word embeddings and an easily trained four-layer deep LSTM. When analyzed using the BLEU score, the LSTM outperformed the SMT model, including on long sentences.
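BLEU, reported throughout the studies above, is built on modified n-gram precision, in which each hypothesis n-gram count is clipped by its count in the reference. A sketch of the unigram case:

```python
from collections import Counter

def clipped_unigram_precision(hypothesis, reference):
    """Modified unigram precision: hypothesis word counts are clipped by
    their counts in the reference before dividing by hypothesis length."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    clipped = sum(min(count, ref[word]) for word, count in hyp.items())
    return clipped / sum(hyp.values())

# "the" occurs 3 times in the hypothesis but only once in the reference,
# so its contribution is clipped to 1; "cat" matches once: (1 + 1) / 4.
print(clipped_unigram_precision("the the the cat", "the cat sat"))  # -> 0.5
```

Full BLEU combines clipped precisions for n-grams up to length 4 with a brevity penalty; this sketch shows only the clipping idea that prevents degenerate repeated-word outputs from scoring well.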
5 Comparative Analysis of MT Approaches
Tables 1 and 2 compare the machine translation approaches for NLP according to the methods used, the research domains, and the evaluation results.
Table 1 Analysis of different machine translation models

Author | MT approach | Language pair | Domain (corpus used) | Performance level
Lehal et al. (2008) | RBMT | Punjabi to Hindi | News corpus WMT 2017 | Error rate of 24.26%
Adak [5] | RBMT | English to Bengali | English sentences | Error rate depends on input sentence length
Sharma et al. [6] | EBMT | English to Hindi | 1000 English sentences from newspapers | BLEU score up to 0.91
Godase and Govilkar [7] | EBMT | English to Hindi | Bilingual parallel corpus | 40% to 80% accurate
Udupa and Faruquie [8] | SMT | English to Hindi | English–Hindi sentences | BLEU score 7.16% and NIST score 2.46%
Ali et al. [9] | SMT | English to Urdu | 6000 sentences, English domain | BLEU score improved up to 9.035
Ali et al. [10] | SMT | English to Urdu | 20,000 sentences, English domain | BLEU score 37.10
Wang et al. [14] | Hybrid (NMT + SMT) | Chinese to English and English to German | Chinese domain | Accuracy improved using NMT with SMT by up to 10%
Kayahan and Gungor [15] | Hybrid (SMT + RBMT) | Turkish spoken language to sign language | Turkish input sentences (spoken) | Boosted translation quality, with 53% accuracy
Singh et al. [16] | Hybrid (NMT + RBMT) | Sanskrit to Hindi | Generic huge domain | BLEU score of 61.2%, higher by 41% than the existing system
Dhariya et al. [13] | Hybrid (EBMT + RBMT + SMT) | Hindi to English | Travel, shopping, and diary-conversation domains | Accuracy 71.75%
Sutskever et al. [20] | NMT | English to French | WMT 14 corpus | BLEU score 36.5
Deep et al. [17] | NMT | Punjabi to English | Parallel corpus of 259,623 Punjabi sentences from online resources | BLEU score 38.30
Wang et al. (2016) | NMT | Chinese to English | 30 K parallel corpus for both source and target language | BLEU score improved by up to 2.33
Wahid (2022) | NMT | English text to Urdu | News, web scraping, and Urdu and Bible texts from the UMC005 English–Urdu corpus | 70% accuracy
Table 2 Benefits and limitations of MT approaches

1. Rule based MT approach
Benefits: only single-language text required; domain independent; linguists have total control over the process; reusability.
Limitations: requires good dictionaries; needs expert knowledge for setting rules; a large number of rules makes the task complex.

2. Statistical MT approach
Benefits: avoids hard decision states; speedy, quality outputs; works better in the presence of unexpected inputs.
Limitations: needs a large bilingual corpus, and creating a corpus is costly; limited work on some language pairs may affect output; errors are hard to predict; mathematical computations are difficult to handle; needs a training data set to train the model.

3. Hybrid MT approach
Benefits: high accuracy; higher performance than SMT and RBMT; more flexible and robust for high-dimensional data.
Limitations: classifier-dependent method; depends on the combination of features selected.

4. Neural MT approach
Benefits: uses structurally rich models; much better generalization through learned rich structures; features are learned rather than specified in advance as parameters.
Limitations: needs a large bilingual corpus and a data set to train the NMT model; heavy computation and specialized hardware are needed at training time; hard to understand the system's decision process.
6 Conclusion and Future Scope In this paper we have discussed the evolution of MT and the various approaches used to implement the automation that has helped researchers build many MT systems, such as Google Translate and Microsoft Translator. Developers still face many challenges in reaching 100% translation accuracy for many language pairs because of the issues discussed above. To resolve ambiguity, the field of word sense disambiguation (WSD) came into existence; it has been used by researchers for many years, and much work has been done in many languages. WSD remains an active research area for many Indian language pairs, which still face ambiguity-related issues. Tables 1 and 2 show a comparative analysis of the existing MT approaches. From this study, we conclude that the rule-based approach is easy to implement, with accuracy depending on the rules designed; it works well with a small corpus but is domain dependent. The other, more recent approaches need a large bilingual corpus and more computation time, and depend on the features selected by the model during training. Although many language pairs have been studied, there is still huge scope for research in many Indian languages and in other domains, where language processing faces issues of morphology, semantics, lexical and grammatical errors, and cultural differences between source and target languages.
References 1. Anbukkarasi S et al (2019) Machine translation (MT) techniques for Indian languages. IJRTE 8(2S4), July 2019. ISSN: 2277-3878 2. Baniata LH et al (2018) A neural machine model for Arabic dialects that utilizes multi-task learning (MTL). Comput Intell Neurosci 10. Article ID 7534712 3. Hurskainen A et al (2017) Rule-based machine translation from English to Finnish. In: Proceedings of the second conference on machine translation, Jan 2017 4. Josan GS, Lehal GS (2008) A Punjabi to Hindi machine translation system. In: Proceedings of COLING'08: 22nd international conference on computational linguistics: demonstration papers. Manchester, UK, pp 157–160, Aug 2008 5. Adak C (2014) A bilingual machine translation system: English & Bengali. In: Proceedings of the 2014 first international conference on automation, control, energy and systems (ACES). Adisaptagram, India, pp 1–4, Feb 2014 6. Sharma R et al (2014) Implementation of example based machine translation system. Int J Eng Res Technol (IJERT) 3(3), Mar 2014. ISSN: 2278-0181 7. Godase A, Govilkar S (2015) Machine translation development for Indian languages and its approaches. Int J Nat Lang Comput 4(2):55–74. https://doi.org/10.5121/ijnlc.2015.4205 8. Udupa R, Faruquie TA (2004) An English-Hindi statistical machine translation system. In: Lecture notes in artificial intelligence (Subseries lecture notes in computer science), vol 3248. Springer, Berlin, pp 254–262 9. Ali A, Siddiq S, Malik MK (2010) Development of parallel corpus and English to Urdu statistical machine translation. Int J Eng 10(05):3–6 10. Ali A, Hussain A, Malik MK (2013) Model for English-Urdu statistical machine translation. World Appl Sci J 24(10):1362–1367. https://doi.org/10.5829/idosi.wasj.2013.24.10.760
11. Xiong D et al (2016) Topic-based term translation models for statistical machine translation. Artif Intell 232:54–75 12. Salunke P et al (2015) A research work on English to Marathi hybrid translation system. IJCSIT 6(3). ISSN: 0975-9646 13. Dhariya O et al (2017) A hybrid approach for Hindi-English machine translation. In: 31st international conference on information networking (ICOIN-2017) 14. Wang X, Tu Z, Zhang M (2018) Incorporating statistical machine translation word knowledge into neural machine translation. IEEE/ACM Trans Audio Speech Lang Process (2018) 15. Kayahan D, Gungor T (2019) A hybrid translation system from Turkish spoken language to Turkish sign language, pp 1–6. https://doi.org/10.1109/INISTA.2019.8778347 16. Singh M, Kumar R, Chana I (2019) Improving neural machine translation using rule-based machine translation. ICSCC 17. Deep K, Kumar A, Goyal V (2021) Machine translation system using deep learning for Punjabi to English. In: Proceedings of the international conference on paradigms of computing, communication and data sciences. Algorithms for intelligent systems. Springer, Singapore, pp 865–878 18. He W, He Z, Wu H, Wang H (2016) Improved neural machine translation with SMT features. Proc AAAI Conf Artif Intell 30(1). https://doi.org/10.1609/aaai.v30i1.9983 19. Andrabi SAB, Wahid A (2022) Machine translation system using deep learning for English to Urdu. Comput Intell Neurosci 2022:7873012. https://doi.org/10.1155/2022/7873012. PMID: 35024046; PMCID: PMC8747903 20. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 4:3104–3112
Globalizing Pre-trained Local BERT Embedding of Text for Sentiment Analysis Prashantkumar. M. Gavali and Suresh K. Shirgave
Abstract A sentiment analysis (SA) model calculates the opinion of a text after representing the text as a learned feature vector. The vector representation generated by the model is crucial for sentiment analysis. Modern transformer models, including the generative pre-training (GPT) model and Bidirectional Encoder Representations from Transformers (BERT), learn text representations dynamically. However, they ignore the sentiment orientation of text from a global perspective. On the other hand, models such as the Convolutional Neural Network (CNN) are also utilized to learn text representations. With pre-trained sentiment-aware static word embeddings as input, such a model captures the global representation of text, but the local dynamic representations are disregarded. In this paper, the suggested model therefore combines the local dynamic representation with the sentiment-oriented global text representation to produce an accurate text representation. It acquires the global representation by feeding refined pre-trained word embeddings to a CNN, and the dynamic local representation by utilizing a pre-trained BERT transformer. The combined global and local representation from the CNN and pre-trained BERT presents the text more precisely from a sentiment point of view. The results show that the proposed model identifies sentiment with greater accuracy than the pre-trained transformer alone. Keywords BERT · Deep learning models · Text embedding · Sentiment analysis
Prashantkumar. M. Gavali Department of Technology, Shivaji University, Kolhapur, India Prashantkumar. M. Gavali (B) · S. K. Shirgave DKTE Society’s Textile and Engineering Institute, Ichalkaranji, India e-mail: [email protected] S. K. Shirgave e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_37
1 Introduction Sentiment analysis, a study within natural language processing, determines the polarity of input text. Various machine learning models determine the sentiment of text automatically. For sentiment identification, a model encodes the text representation in a vector. A vector that represents the text is known as an embedding. It is connected to the actual sentiment classifier network, and it has a substantial impact on sentiment accuracy. Modern transformer models generate a dynamic representation of text. GPT [1] and BERT [2] are popular transformers used to learn an embedding of the text and, in turn, to identify sentiment. Transformers learn a dynamic embedding of text by considering the context in which each word is used: the embeddings of word pairs are multiplied to learn the relationship between words, and a large-valued output indicates that the words are strongly related to each other. BERT has demonstrated promising performance on a variety of natural language tasks, but it only learns the text representation from the local context and ignores the sentiment-aware global representation. On the other hand, various models, such as the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), are utilized to express complex and effective text embeddings. Word2Vec and GloVe [3], two well-known pre-trained static global word embeddings, are employed as input. Pre-trained models are trained on a massive corpus of general information to acquire neural network parameters, and they can be utilized for different tasks; they reduce training time and computational cost. A text representation learned from these refined pre-trained word embeddings captures the global representation. But, as the input word embeddings are static, it only captures the global representation and ignores the local, context-aware, dynamic representation of the text [4].
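The word-to-word relatedness scoring described above can be sketched with plain dot products. The tiny 4-dimensional vectors below are invented for illustration (real BERT embeddings are 768-dimensional):

```python
import numpy as np

# Toy word vectors (invented for illustration; not actual model weights).
vecs = {
    "movie":  np.array([0.9, 0.1, 0.0, 0.2]),
    "film":   np.array([0.8, 0.2, 0.1, 0.1]),
    "banana": np.array([0.0, 0.1, 0.9, 0.0]),
}

def relatedness(w1, w2):
    """Dot product of word vectors: a larger value means more strongly related."""
    return float(np.dot(vecs[w1], vecs[w2]))

print(relatedness("movie", "film"))    # large: semantically related words
print(relatedness("movie", "banana"))  # small: unrelated words
```

Attention layers in a transformer compute exactly this kind of pairwise product (after learned projections) to decide how strongly each word should influence the representation of every other word.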
In order to learn both local and global text representations, we propose a model that combines a CNN and a pre-trained BERT transformer. The classifier receives the merged local and global embeddings. The results indicate that the combination of local and global representations is more effective for sentiment analysis on many benchmark datasets. The remainder of the paper is structured as follows. Section 2 provides a review of the literature on text representation using various deep learning and transformer models. Section 3 gives an example identifying the research gap. Section 4 explains the model architecture. Section 5 discusses the experimental setting, while Sect. 6 presents sentiment results on several sentiment datasets.
2 Literature Survey Text embeddings have been developed using a variety of strategies such as the vector space model, bag-of-words, weighting systems [5], and deep learning models. Of these, deep learning has provided promising results, as a deeper neural network learns the complex features of the input text on its own. An extremely deep neural network [6] consisting of 29 convolutional layers has been used to represent text at the character level; the learned representation improved the performance of different text classification tasks. A CNN [7] has also been used for sentiment classification at the sentence level. It uses one-dimensional filters to develop feature maps which are connected to a feed forward neural network for sentiment classification. But a CNN fails to capture long-term dependencies, so the CNN has been combined with an RNN [8]. In the combined model, the CNN considers word features within the given filter size, while the RNN considers dependencies beyond the filter size to represent text and identify sentiment. But an RNN has limited capacity to handle long-term dependencies, so an LSTM model has been combined with a CNN instead to get better sentiment results [9]. Going beyond sentence representation, forward gated recurrent units have been used to capture the relationships between sentences for representing a document [10]. This model identified document sentiment more accurately than other baseline models. However, almost all of these networks used pre-trained word embedding techniques such as Word2Vec and GloVe, which fail to give sentiment-oriented word embeddings. So, to get a better sentiment-oriented representation of text, many attempts [11–13] have been made to obtain sentiment-oriented word embeddings. More powerful seq-to-seq transformer models provide a dynamic representation of text by considering context.
Context Vector (CoVe) [14] generates text representations using an LSTM encoder with two layers on top of pre-trained GloVe. This model provided better results for various NLP tasks but is incapable of capturing cross-lingual information. Embeddings from Language Models (ELMo) [15] is another deep bidirectional LSTM used to represent text on the fly, without a lookup table like Word2Vec and GloVe; it has provided improved sentiment-oriented text representations. These models are LSTM-based, but the LSTM model has limitations in encoding text. Therefore, attention-based transformer models are employed to represent the text. One attention-based transformer model is GPT, which considers text in one direction only. Therefore, BERT is employed to improve text representation by considering both the forward and reverse directions of text. Different variants of BERT have been used to identify the sentiment of text by representing it with its dynamic context. SentiBERT [16] mixes context-specific representation with a binary constituency parse tree to record semantic composition, and performs admirably in phrase-level sentiment categorization. Further, a few attempts [17–19] have been made to fine-tune the pre-trained embeddings learned by the BERT model. LSTM layers [17] were used to tune the parameters for identifying the sentiment of text. In another paper [19], after getting the embedding of text from BERT, it is connected to a feed forward neural network for fine-tuning; this has shown better results on various datasets for sentiment analysis.
As sentiment analysis is a highly domain-dependent task, different variants of BERT have developed domain-dependent text representations [20–22]. Some of the variants are also utilized for aspect-level sentiment analysis [22–27] to understand the sentiment of an aspect. Summary: Generating a correct text representation from the model is a crucial task in sentiment analysis. For this, dynamic local representations as well as global sentiment representations are important. The popular BERT encoder learns the local representation of text with high accuracy but fails to capture the global sentiment representation. Conversely, several well-known pre-trained representations, such as GloVe and Word2Vec, offer a static global representation. Numerous word embedding refinement methods have modified pre-trained Word2Vec embeddings toward sentiment orientation, but they fail to capture local dynamic representations.
3 Research Gap Identification In this investigation, we have focused on the already trained BERT [2] structure. It contains 12 hidden blocks as well as 12 attention heads. This model was trained on Wikipedia and BooksCorpus, and the model weights are set to the original authors' weights. Consider the following two statements: "I am happy by using this product" and "I am sad by using this product." The embeddings given by the pre-trained BERT model are close to each other: the cosine similarity between these two sentences is 0.9969. From the sentiment point of view, the cosine similarity between these sentences should be small, as the two statements are of opposite sentiment. This happens because the distance between "happy" and "sad" is minimal. In conclusion, BERT fails to capture the global representation of text from a sentiment point of view. To tackle the problem of global representation, we can use well-known refined Word2Vec word embeddings. In Word2Vec embeddings, the distance between "happy" and "sad" is 0.5354. But these word embeddings are fixed and do not provide dynamic, context-aware word embeddings. So it is hypothesized that the combination of BERT embeddings and sentiment-aware global word embeddings improves sentiment analysis results.
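The gap can be reproduced in miniature with cosine similarity. The vectors below are invented toy stand-ins for sentence embeddings, not actual BERT outputs; they only mimic the reported behavior (0.9969 similarity for context-only embeddings, a much smaller value after sentiment refinement):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy sentence embeddings (illustrative only).
happy_sent = np.array([0.80, 0.59, 0.10])   # "I am happy by using this product"
sad_sent   = np.array([0.81, 0.57, 0.12])   # "I am sad by using this product"

# Context-only embeddings keep the two vectors nearly identical...
print(round(cosine(happy_sent, sad_sent), 4))  # close to 1.0

# ...whereas sentiment-refined vectors push opposite polarities apart.
happy_ref = np.array([0.9, 0.4, 0.1])
sad_ref   = np.array([0.1, 0.4, 0.9])
print(round(cosine(happy_ref, sad_ref), 4))    # much smaller
```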
4 Model Architecture The suggested model architecture is depicted in Fig. 1. It is made up of two main components. The first component provides the local text representation from the pre-trained BERT model, while the second provides the global representation from the CNN model. Both components are integrated to combine the local pre-trained BERT text embedding with the global text embedding. Following is a detailed description of each component.
Fig. 1 Proposed model architecture
4.1 Local Pre-trained BERT Text Representation This component mainly includes BERT preprocessing, BERT encoding, and a feed forward classifier to learn the local text embedding. These are discussed in the subsections that follow. BERT Preprocess The BERT preprocessing step converts the original text to a numerical tensor. It converts raw input into a fixed-length sequence consisting of word ids generated from the
vocabulary, together with start, end, and padding tokens. It also includes an input mask and input type ids. This representation is supplied to the BERT encoder as input. Pre-trained BERT Encoder The pre-trained BERT [2] model provides a dense representation of the input text. It outputs 768-dimensional tensors. This intermediate text representation generated by the BERT model is forwarded to a feed forward neural network. Feed Forward Neural Network It is made up of a couple of dense layers of 350 and 150 neurons, respectively, with every neuron connected to every neuron of the next layer. At the end, the 150 neurons give the representation of the input text.
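The preprocessing step can be sketched with a toy vocabulary. The token ids below are hypothetical (real BERT uses a WordPiece vocabulary of roughly 30,000 entries), but the structure — start/end tokens, padding to a fixed length, and an input mask — follows the description above:

```python
# Toy vocabulary (ids are hypothetical; real BERT uses WordPiece ids).
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
         "i": 1, "am": 2, "happy": 3}

def preprocess(text, seq_len=8):
    """Convert raw text to fixed-length word ids, an input mask, and type ids."""
    ids = ([vocab["[CLS]"]]                                  # start token
           + [vocab.get(w, vocab["[UNK]"]) for w in text.lower().split()]
           + [vocab["[SEP]"]])                               # end token
    # Truncate, then pad with [PAD] up to the fixed sequence length.
    ids = ids[:seq_len] + [vocab["[PAD]"]] * (seq_len - len(ids))
    mask = [1 if t != vocab["[PAD]"] else 0 for t in ids]    # input mask
    type_ids = [0] * seq_len                                 # single-segment input
    return ids, mask, type_ids

ids, mask, type_ids = preprocess("I am happy")
print(ids)   # [101, 1, 2, 3, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```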
4.2 Global Pre-trained Text Representation This component learns a global text embedding with a Convolutional Neural Network. It consists of preprocessing, an embedding layer, Conv1D and maxpooling layers, along with concatenation and average pooling. These are discussed in the subsections that follow. Preprocessing The input text is preprocessed to remove stop words, replace short forms (such as "can't" with "cannot"), remove non-alphabetic tokens, filter out short tokens, and perform word stemming. The preprocessed text is transmitted to the embedding layer. Embedding Layer The embedding layer develops the text representation using the lookup table given by pre-trained Word2Vec embeddings. For example, if each word is represented by a 300-dimensional embedding, then a sentence of 17 words is represented by a 17 × 300 matrix. Conv1D Layers and Maxpooling Layer The proposed model consists of three Conv1D layers with kernel sizes 3, 4, and 5, respectively, with eight filters each. These kernels convolve over the output of the embedding layer only in the y-axis direction. Maxpooling layers identify the most important features for further processing. Concatenation and Global Average Pooling All the complex features learned by the convolutional layers are combined vertically. To get the average vector of features, the result is then passed to the global average pooling layer.
Globalizing Pre-trained Local BERT Embedding of Text for Sentiment …
495
4.3 Sentiment Classifier This component includes a concatenation layer and a sentiment-identifier feed forward neural network. Both are explained in the following subsections. Concatenation The concatenation layer combines the text embeddings learned from the pre-trained BERT model and the CNN model. This combined embedding represents the local as well as the global representation of the input text. Sentiment Identifier Feed Forward Neural Network At the end, a basic feed forward classifier receives the concatenated output as input. This network classifies the input text into the corresponding sentiment class.
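The concatenation-and-classify step is then straightforward. The dimensions below follow the text (150-dim BERT head output); the 24-dim CNN feature size and the single dense layer are assumptions for illustration, and the weights are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
local_repr  = rng.normal(size=150)   # stand-in for the BERT feed-forward head output
global_repr = rng.normal(size=24)    # stand-in for the CNN branch output

# Concatenation layer: local + global text embedding.
combined = np.concatenate([local_repr, global_repr])   # shape (174,)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A single dense layer standing in for the sentiment feed-forward classifier.
W = rng.normal(size=(2, combined.size))
probs = softmax(W @ combined)        # probability per sentiment class
print(probs.shape)                   # (2,) — e.g. positive / negative
```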
5 Experimental Setup 5.1 Purpose of Experiment The experiment's primary objective is to determine the effect of integrating the BERT text embedding with a sentiment-aware global text embedding. The effectiveness of the combined text representation is checked on the sentiment analysis task.
5.2 Datasets We have used three benchmark datasets covering binary-class and multiclass sentiment classification. IMDB and Electronics Device Review are binary-class sentiment classification datasets, while SST-5 is a multiclass sentiment classification dataset. These are as follows: IMDB [28]: It consists of 2000 movie reviews labeled as positive or negative. Of these, 1800 reviews are utilized for training and 200 are used for testing. Electronics Device Review [29]: It consists of a total of 5650 electronics device reviews, all of which are subjective. Of the total reviews, 4500 are used for training while 1150 are used for testing. SST-5 [30]: Every review in this dataset is labeled with one to five stars to indicate the sentiment level. The dataset is available as parse trees with sentiment labels for words and phrases. It consists of a total of 11,855 reviews, of which 8891 and 2964 reviews are used for training and testing, respectively.
5.3 Hyper-parameters A few values of a neural network must be specified prior to training; these are known as hyper-parameters, and they have a significant effect on the outcome. We have defined the hyper-parameters and set them the same for the different models to validate the sentiment outcomes. The learning rate is set to 0.01, the optimizer is Adam, and the number of epochs is set to 8.
6 Result Analysis 6.1 Evaluation Matrix Accuracy and F1 score have been utilized to assess the effectiveness of the suggested model. The F1 score is more useful as it concurrently measures precision and recall. Models have been evaluated with the macro-average F1 score and the weighted average F1 score. The macro-average F1 score simply takes the average of the F1 scores of all classes. For a binary classification problem, the macro-F1 score is calculated by the following formula:

$$\text{Macro F1 score} = \left(F1_{\text{positive}} + F1_{\text{negative}}\right)/2, \qquad (1)$$

where $F1_{\text{positive}}$ and $F1_{\text{negative}}$ indicate the F1 scores for the positive and negative classes. On the other hand, the weighted average F1 score weights each class by the number of samples available for training. For a binary classification problem, the weighted F1 score is calculated using the following formula:

$$\text{Weighted F1 score} = \frac{S_{\text{positive}}}{S_{\text{total}}} \cdot F1_{\text{positive}} + \frac{S_{\text{negative}}}{S_{\text{total}}} \cdot F1_{\text{negative}}, \qquad (2)$$

where $S_{\text{positive}}$ and $S_{\text{negative}}$ indicate the numbers of positive and negative samples, respectively, while $S_{\text{total}}$ indicates the total number of samples.
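Formulas (1) and (2) translate directly into code. The sample F1 scores and class counts below are invented to show how class imbalance pulls the weighted average toward the majority class:

```python
def macro_f1(f1_pos, f1_neg):
    """Formula (1): unweighted mean of per-class F1 scores."""
    return (f1_pos + f1_neg) / 2

def weighted_f1(f1_pos, f1_neg, s_pos, s_neg):
    """Formula (2): per-class F1 scores weighted by training sample counts."""
    s_total = s_pos + s_neg
    return (s_pos / s_total) * f1_pos + (s_neg / s_total) * f1_neg

# Illustrative values: 900 positive vs 100 negative training samples.
print(macro_f1(0.90, 0.60))               # 0.75
print(weighted_f1(0.90, 0.60, 900, 100))  # 0.87 — majority class dominates
```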
6.2 Sentiment Analysis Result From Table 1, it is observed that the proposed combination of BERT and CNN (BERT + CNN) provided better results than the BERT-only model developed for sentiment analysis. It provided better results on all three datasets. The proposed model
has given almost a 24.09% improvement on the binary datasets and a 22.58% improvement on the multiclass dataset. The improvement in binary-class classification is significantly greater than that in multiclass classification. This is because the number of training samples per class in the multiclass classification problem is smaller. In addition, it is also tricky to capture the sentiment-level differences among five classes: the word embedding needs to distinguish between "good," "better," and "best." Figures 2, 3, and 4 show the accuracy graphs, with epochs on the x-axis and accuracy on the y-axis, for the IMDB, Electronics Device, and SST-5 datasets, respectively. Every graph shows the accuracy of the BERT model and of the proposed BERT + CNN model. All graphs show that the proposed BERT + CNN model provides better accuracy than the pre-trained BERT model alone.

Table 1 Sentiment analysis result

Dataset                       Classifier    Accuracy   Macro-average F1 score   Weighted average F1 score
IMDB                          BERT          60.66      67.00                    67.00
IMDB                          BERT + CNN    93.28      85.00                    85.00
Electronics device reviews    BERT          83.00      83.00                    83.00
Electronics device reviews    BERT + CNN    85.00      85.00                    85.00
SST-5                         BERT          0.31       0.14                     0.17
SST-5                         BERT + CNN    0.38       0.16                     0.20
Fig. 2 Model accuracy on IMDB dataset
Fig. 3 Model accuracy of electronics device review
Fig. 4 Model accuracy of SST-5 dataset
7 Conclusion and Future Scope The text embedding generated by a model must consider the dynamic nature of natural language, and it should also consider the global meaning of the text. The BERT model develops a dynamic representation, while various deep learning models use already trained word embeddings for a static global representation. In this study, we proposed a model that handles both dynamic and static word embeddings to represent the local and global representations of text at the same time. The proposed model provides better sentiment results on various benchmark datasets. In the future, pre-trained word embeddings will be modified to take into account the various levels of sentiment words, such as good, better, and best. Such revised word embeddings should be effective for improving sentiment classification across multiple classes, and in conjunction with the BERT embedding they may further improve sentiment analysis results. Acknowledgements Shivaji University, Kolhapur, Maharashtra, India, is funding a part of this research under the Research Initiation Scheme.
References 1. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training 2. Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1, June 2019 3. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation 4. Shen Y, Liu J (2021) Comparison of text sentiment analysis based on BERT and Word2vec. In: 2021 IEEE 3rd international conference on frontiers technology of information and computer (ICFTIC), pp 144–147 5. Turney PD, Pantel P (2010) From frequency to meaning: vector space models of semantics. J Artif Intell Res 37:141–188 6. Conneau A, Schwenk H, LeCun Y, Barrault L (2017) Very deep convolutional networks for text classification. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics, vol 1. Valencia, Spain, pp 1107–1116, 3–7 April 2017 7. Kim Y (2014) Convolutional neural networks for sentence classification. arXiv 8. Hassan A, Mahmood A (2018) Convolutional recurrent deep learning model for sentence classification. IEEE Access, March 2018 9. Zhou C, Sun C, Liu Z, Lau F (2015) A C-LSTM neural network for text classification. arXiv 10. Tang D, Qin B, Liu T (2015) Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1422–1432, 17–21 Sept 2015 11. Tang D, Wei F, Qin B, Yang N, Liu T, Zhou M (2016) Sentiment embeddings with applications to sentiment analysis. IEEE Trans Knowl Data Eng 28(2) 12. Wang Y, Huang G, Li J, Li H, Zhou Y, Jiang H (2021) Refined global word embeddings based on sentiment concepts for sentiment analysis. IEEE Access 13.
Yu L, Lai K, Zhang X (2018) Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Trans Audio Speech Language Process 26(3) 14. McCann B, Bradbury J, Xiong C, Socher R (2017) Learned in translation: contextualized word vectors. Adv Neural Inf Proc Syst 30:6294–6305 15. Peters ME, Neumann M, Iyyer M, Gardner M (2018) Deep contextualized word representations. In: Proceedings of NAACL-HLT, pp 2227–2237 16. Yin D, Meng T, Chang K (2020) SentiBERT: a transferable transformer-based architecture for compositional sentiment semantics. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 3695–3706, 5–10 July 2020 17. Agarwal S, Datta S, Patra B (2021) Sentiment analysis of short informal text by tuning BERT-Bi-LSTM. In: IEEE EUROCON 2021—19th international conference on smart technologies, July 2021 18. Li X, Wang X, Liu H (2021) Research on fine-tuning strategy of sentiment analysis model based on BERT. In: 2021 international conference on communications, information system and computer engineering (CISCE), May 2021 19. Samir A, Elkaffas SM, Madbouly MM (2022) Twitter sentiment analysis using BERT. In: 2021 31st international conference on computer theory and applications (ICCTA), Oct 2022 20. Rietzler A, Stabinger S, Opitz P, Engl S (2019) Adapt or get left behind: domain adaptation through BERT language model fine-tuning for aspect-target sentiment classification. https://doi.org/10.48550/arXiv.1908.11860 21. Du C, Sun H, Wang J, Qi Q, Liao J (2020) Adversarial and domain-aware BERT for cross-domain sentiment analysis. In: Proceedings of the 58th annual meeting of the association for computational linguistics, July 2020 22. Xu H, Liu B, Shu L, Yu P (2020) DomBERT: domain-oriented language model for aspect-based sentiment analysis. Findings of the association for computational linguistics: EMNLP 2020, Nov 2020
23. Sun C, Huang L, Qin X (2019) Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In: Proceedings of NAACL-HLT 2019, Minneapolis, Minnesota, pp 380–385, 2–7 June 2019 24. Li X, Bing L, Zhang W, Lam W (2019) Exploiting BERT for end-to-end aspect-based sentiment analysis. In: Proceedings of the 2019 EMNLP workshop W-NUT: the 5th workshop on noisy user-generated text, Hong Kong, pp 34–41, 4 Nov 2019 25. Wu Z, Ong D (2021) Context-guided BERT for targeted aspect-based sentiment analysis. In: The thirty-fifth AAAI conference on artificial intelligence (AAAI-21), pp 14094–14102 26. Liu B, Xu H (2021) Understanding pre-trained BERT for aspect-based sentiment analysis. In: Proceedings of the 28th international conference on computational linguistics, Jan 2021 27. Wang Y, Chen Q, Wang W (2021) Multitask BERT for aspect based sentiment analysis. In: 2021 IEEE international conference on smart computing (SMARTCOMP), Aug 2021 28. Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting on association for computational linguistics, p 271, July 2004. https://doi.org/10.3115/1218955.1218990. Dataset Available at: https://www.cs.cornell.edu/people/pabo/movie-review-data/ 29. Guan Z, Chen L, Zhao W et al (2016) Weakly-supervised deep learning for customer review sentiment classification. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence. AAAI Press, pp 3719–3725 30. Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642. Dataset Available at: https://nlp.stanford.edu/sentiment/
Crop Recommendation for Maximizing Crop Yield Using Random Forest Amit Kumar
Abstract The agriculture sector is a vital part of India's economy. About 54.6% of the workforce is employed in agricultural and allied activities, and 18.8% of India's gross value added (GVA) is generated by these activities. One of the common problems faced by young Indian farmers is choosing the right crop for their soil conditions, which has led to a significant setback in agricultural productivity. This study will help farmers determine which crop is suitable to grow on their soil; thus, the prime motive of this study is to create economic welfare for farmers. The dataset used in this study comes from different Indian government websites and is publicly available. Based on seven different attributes, i.e., nitrogen, phosphorus, potassium, temperature, relative humidity, pH value, and rainfall, a crop is recommended to grow. Four different machine learning algorithms, i.e., Naive Bayes, decision tree, logistic regression, and random forest, were applied to the data. Random forest's testing accuracy (R²) was about 99%, and hence it was used to develop and deploy a cloud-based app which recommends a particular crop to be grown in a particular soil. Keywords Agriculture · Crop recommendation system · Machine learning · Naive Bayes · Decision tree · Logistic regression · Random forest
1 Introduction India ranks among the top five countries in terms of agricultural production worldwide. For India's agricultural production to progress further, it is crucial to promote the economic welfare of farmers. A farmer's livelihood is frequently threatened by unpredictable factors such as weather, prices, and crop disease in developing countries. Many farmers live on the edge of extreme uncertainty, sometimes barely surviving, other times sitting just above the survival threshold. Farmers cannot predict rainfall, the price at which they will sell their produce, or whether diseases will infect their crops. Agricultural production in India is varied, with different consumer demands, production costs, and climate conditions. The profitability of a crop depends on demand, weather forecasts, and costs of cultivation, such as seeds, fertilizers, labor, and machinery [1]. A variety of subsidies are provided by state governments on raw materials and on interest on capital loans. In addition, farmers receive support through the benchmarking of minimum harvest sale prices. Traditionally, crop profitability is only realized after the harvest. Since there has been a rapid increase in population in recent years, current crop production cannot meet the needs of the population. Soil rich in nutrients plays an increasingly important role in the production of food and agriculture. A proper balance of nutrients has become increasingly important in farming with the advancement of nutrient research [2]. Growing an unsuitable crop on a particular soil poses a risk in farming. This type of risk occurs mainly when farmers do not know which crop to grow in particular weather and soil conditions, and it often adversely affects farmers' income. A farmer's ability to make an informed decision becomes more difficult as the risk becomes more complex. In order to make effective farming decisions, farmers need information on many aspects; good risk management decisions require accurate information and reliable data. Machine learning (ML) models predict whether a crop is profitable or not based on key attributes such as weather forecasts, the yield of the crop, supply and demand for the crop, and its predicted sale price. These days, recommendation systems play a very prominent role in websites like Netflix and YouTube, where they recommend movies similar to the user's interests. They are also used on e-commerce platforms like Flipkart, Snapdeal, and Amazon [3].
A. Kumar (B) Andhra University, Visakhapatnam, Andhra Pradesh 530003, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_38
The application of recommendation systems in the agriculture field can have a great impact on crop productivity. Based on different attributes like weather, soil quality, water availability, the amount of fertilizer used, temperature, and humidity, a machine learning model can be built which recommends a suitable crop to grow in a particular region. To protect a crop from disease, a pesticide recommendation system can also be built [4]. Thus, by using this technology, agricultural scientists can not only increase productivity but also improve the quality of crops, making them environment friendly with no side effects after consumption. By choosing the best crops and fertilizers, farmers can increase both their harvests and their profits. Objectives of the study The main objectives of this study are as follows: • To develop a crop recommendation model using machine learning techniques which depends on different factors like soil nutrients, rainfall, temperature, and other weather conditions. • To develop and launch a cloud-based computer application (app) for recommending crops to farmers. In order to accomplish this task, the proposed model uses machine learning algorithms and regression techniques from statistics, such as Naive Bayes, decision tree,
Crop Recommendation for Maximizing Crop Yield …
503
logistic regression, and random forest. The rest of the paper is organized as follows: Section 2 discusses the literature review. Section 3 describes the methodologies used in the study. Section 4 contains the exploratory data analysis of the dataset, while Sect. 5 contains the results i.e., the comparison of all the models used in this study and the outlook of the app. Finally, Sect. 6 summarizes and concludes the findings.
2 Literature Review

High crop productivity is achieved when seed quality is improved genetically. Genetic improvement of seeds can be achieved by studying the combination of genotype and phenotype in a unique environment [5]. Other factors that can increase productivity are the quality of soil, weed control, the amount of nutrients, and the management of water [6]. Learning-based mechanisms also contribute to crop productivity improvement and precision agriculture development [7]. According to [8], greenhouse gas emission forecasting methods can be used to estimate potato yields on Iranian farms. Data was collected in person for several crops and then tested by experts under various conditions. Electrical power, fertilizers, and seed quality all influence the forecasting output. The forecasting accuracy for gas emissions was about 98%, while that for weather was 99%. Reference [9] detected soil moisture using internet of things (IoT) wireless sensors. The sensors collected information on the micronutrients present in the soil, pH levels, and moisture content; thus, [9] combined IoT and data mining techniques for smart farming. Reference [10] used a coordinate descent algorithm for calibrating an agricultural model. A root zone water quality model (RZWQM) was developed to determine the amount of fertilizer and water required for cultivating a crop. The R² obtained for the model was about 77%, and the model efficiency was about 64%. After using this model, the yield of the crops increased by 7% and the profits of the farmers increased by 10%. Reference [11] used an optimized model of drip irrigation for spring maize in Northwest China. In order to determine the impact of fertilizer and water on the crops, an experiment was conducted that lasted about two years.
From the experiments, it was shown that yields were lower in areas with low irrigation and fertilization amounts, while yields increased where the optimized model was used. Based on different agricultural crops, different approaches have been developed in the recent past for recommending crops. Reference [12] developed a framework based on support vector machine (SVM) which uses information regarding the soil and yield of a crop to recommend a crop. The framework also predicts the amount of fertilizer required by a particular crop. Reference [13] used machine learning algorithms like random forest and Naive Bayes to predict the rainfall in a particular region. Based on the intensity of the rainfall, the regions were classified into three groups, i.e., high, mid, and low levels of rainfall. This forecasting helped
504
A. Kumar
farmers to know the uncertain behavior of the rainfall. Reference [14] predicted crop yields using the K-means clustering algorithm and SVM. Several attributes like rainfall, soil nutrient levels, temperature, etc., were taken as independent variables, and principal component analysis was used for reducing the dimensions of the dataset. The accuracy achieved by the proposed model was about 97%. Reference [15] used big data techniques to achieve precision in agriculture. Based on historical data, [15] used a linear regression algorithm to predict the temperature and rainfall of a particular region in three different seasons, i.e., monsoon, winter, and summer. Reference [16] used a K-means clustering model to classify soils based on their chemical properties. Sensors were used to capture and store the soil data. As a result, a visual representation of the classification of soil into four categories was shown, which is helpful for assessing the extent of each soil type in a particular region. To manage irrigation in agriculture, [17] presented two machine learning models called partial least squares regression (PLSR) and adaptive neuro-fuzzy inference systems (ANFIS). Based on soil, weather, and crop conditions, the models could predict the amount of water to be used in the irrigation system. In the field of crop and pesticide prediction, many researchers have been working to develop more efficient algorithms based on the latest technologies [17–19]. Using ensemble learning models with multiple classifiers, [18] generated crop recommendations, while [20] developed pest control methods.
3 Methodologies

3.1 Logistic Regression (LR)

Logistic regression is used when the outcome, or dependent variable, is discrete for a given set of input variables [21, 22]. Logistic regression utilizes the logit function as the basis for modeling a binary dependent variable, and generally produces binary outcomes, i.e., true/false, yes/no, etc. Based on the sigmoid function S(t), logistic regression transforms a linear function β0 + β1X into a probability value in the range [0, 1]. Based on Eq. (1), logistic regression plots the probabilities on an S-shaped curve.
$$P(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \quad (1)$$

where P(X) is the probability of an outcome, X is the given set of inputs, and β0 and β1 are the coefficients.
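As a minimal illustration of Eq. (1), the logistic transform can be computed directly in Python (the coefficients and input below are made up for illustration, not fitted to the study's data):

```python
import math

def logistic_probability(x, b0, b1):
    """Eq. (1): squash the linear term b0 + b1*x into a probability in (0, 1)."""
    t = b0 + b1 * x
    return math.exp(t) / (1 + math.exp(t))

# Illustrative coefficients only (not fitted to the paper's dataset)
p = logistic_probability(2.0, -1.0, 0.8)
print(round(p, 3))  # 0.646
```

At x with linear term 0 the output is exactly 0.5, which is the decision boundary of the S-shaped curve.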
3.2 Decision Tree Classifier (DTC)

The decision tree classifier uses a tree structure and splits on the attribute that is most important for predicting the output. Gini impurity and information entropy are common metrics used to decide the splits. The tree is split based on conditions at the root node, thereby reducing the impurity of the nodes [23, 24]. The better split is the one that most reduces the impurity of the child nodes; the split with the least impurity, or entropy, defined by Eq. (2), creates a decision node. A complete decision tree is generated by repeating the process on all the remaining nodes.

$$S = -\sum_{i=1}^{C} p_i \log_2 p_i \quad (2)$$

where S stands for entropy and ranges from 0 to 1, C is the number of distinct classes, and p_i is the probability of class i.
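Equation (2) can be sketched in a few lines of Python (a toy illustration; the function name and example distributions are ours, not the paper's code):

```python
import math

def entropy(probs):
    """Eq. (2): Shannon entropy of a class distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A 50/50 node is maximally impure (1 bit for two classes);
# a 90/10 node is closer to pure, so it makes the better child of a split.
print(round(entropy([0.5, 0.5]), 3))  # 1.0
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```

A split is preferred when the weighted entropy of the resulting children is lower than that of the parent node.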
3.3 Random Forest Classifier (RFC)

Random forest first builds new datasets from the original data by randomly selecting rows with replacement (bootstrapping). A decision tree is trained on each of the bootstrapped datasets independently, and for each tree the model randomly selects a subset of features and uses only those for training. Since this is a classification problem, the prediction is made by majority voting over all the decision trees [25, 26]. This classification can also be expressed mathematically as shown in Eq. (3).

$$H(x) = \arg\max_{Y} \sum_{i=1}^{n} I(h_i(x) = Y) \quad (3)$$

where h_i(x) is the prediction of a single decision tree, Y is the dependent variable, and I stands for the indicator function.
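The bootstrap-and-vote scheme of Eq. (3) can be sketched as follows (a minimal illustration with hypothetical helper names and toy votes; the paper's actual implementation is not shown):

```python
import random
from collections import Counter

def bootstrap(rows, rng):
    """Sample rows with replacement to build one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(tree_predictions):
    """Eq. (3): majority vote over the individual trees' class predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical votes from five trees for one test sample
print(forest_predict(["rice", "rice", "maize", "rice", "coffee"]))  # rice
```

Averaging many decorrelated trees in this way is what gives the random forest its robustness compared with a single decision tree.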
3.4 Naive Bayes A naive Bayesian classification employs the Bayes theorem, with the assumption that the features predicting the target value are independent. The highest probability class is selected by calculating the probability of each class. Despite the fact that Naive
Bayes violates the independence assumption, it still produces significantly lower error rates than several sophisticated machine learning algorithms [27]. In classification, the zero-one loss function is likely responsible for its good performance. The error in this function is defined as the number of incorrect predictions. As long as the correct class has the greatest probability, the loss function does not penalize inaccurate probability estimates, unlike other loss functions such as squared error [28, 29]. Despite inter-attribute dependencies frequently leading to inaccurate probabilities, Naive Bayes classification still performs well. Given a feature vector X = (x1, x2, ..., xn) of input variables and a class variable y as output, according to Bayes' theorem, the probability of y given the input features X is as shown in Eq. (4).

$$P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)} \quad (4)$$

Since Naive Bayes assumes that all the input features are independent of each other, P(X|y) can be written as shown in Eq. (5).

$$P(X \mid y) = P(x_1 \mid y) \, P(x_2 \mid y) \, P(x_3 \mid y) \cdots P(x_n \mid y) \quad (5)$$

Since the denominator remains constant for all classes, the posterior probability can be given as shown in Eq. (6).

$$P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \quad (6)$$

Naive Bayes selects the class y with the highest probability. The argmax operation simply finds the argument with the maximum value of a target function. Therefore, the predicted class is given as shown in Eq. (7).

$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y) \quad (7)$$
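Equation (7) can be illustrated with a small Python sketch (toy priors and likelihoods, hypothetical function name; log-probabilities are summed instead of multiplied to avoid numerical underflow):

```python
import math

def naive_bayes_predict(priors, likelihoods, features):
    """Eq. (7): pick the class y maximizing P(y) * prod_i P(x_i | y).

    priors: {class: P(y)}; likelihoods: {class: [P(x_i|y) for feature i]}.
    """
    def score(y):
        return math.log(priors[y]) + sum(math.log(likelihoods[y][i]) for i in features)
    return max(priors, key=score)

# Toy numbers, illustrative only (not from the paper's dataset)
priors = {"rice": 0.5, "maize": 0.5}
likelihoods = {"rice": [0.8, 0.9], "maize": [0.3, 0.4]}
print(naive_bayes_predict(priors, likelihoods, features=[0, 1]))  # rice
```

Because argmax only compares scores, the constant denominator P(X) of Eq. (4) can be dropped, exactly as Eq. (6) states.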
3.5 Evaluation Metrics

The coefficient of determination (R²) is used as the evaluation metric for measuring the accuracy of all the models. R², as shown in Eq. (8), is a statistical measure of how much of the variation in the outcome is explained by the model's predictions [30].

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \quad (8)$$
where yi = actual values, ŷi = predicted values, n = total number of data points, RSS = residual sum of squares, Σ(yi − ŷi)², and TSS = total sum of squares, Σ(yi − ȳ)².
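Equation (8) translates directly into code (a minimal sketch with invented sample values):

```python
def r_squared(actual, predicted):
    """Eq. (8): R^2 = 1 - RSS/TSS."""
    mean_y = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in actual)                # total sum of squares
    return 1 - rss / tss

# Invented sample values for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]
print(round(r_squared(y_true, y_pred), 3))  # 0.995
```

A perfect model gives RSS = 0 and hence R² = 1; a model no better than predicting the mean gives R² = 0.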
4 Exploratory Data Analysis

The dataset was downloaded from a publicly available Indian governmental website [31]. Table 1 shows the description of the dataset used in this study. The dependent variable was a categorical variable containing 22 different crops. There were eight attributes in total, with 2200 observations in the dataset. There were no null or duplicate values, so the dataset was clean. An exploratory data analysis was carried out, and as shown in Fig. 1, the nitrogen, phosphorous, and potassium requirements of all the crops were compared. From Fig. 1, it can be inferred that nitrogen requirements are highest for crops like cotton, coffee, banana, muskmelon, watermelon, and rice, while the requirement is lower and almost equal for the rest of the crops. Phosphorous requirements are highest for crops like apple, grapes, and banana, while the requirement is lower and almost equal for the rest of the crops. Potassium requirement is highest for crops like apple, grapes, and chickpea, while the requirement is lower and almost equal for the rest of the crops. Figure 2 shows the rainfall requirement (mm) for each crop. A good amount of rainfall (i.e., 150–250 mm) is needed for crops like rice, coconut, jute, and coffee, while crops like watermelon, muskmelon, lentil, mothbeans, mungbean, etc., need less rainfall (i.e., 50–150 mm) to grow. Figure 3 shows the temperature requirement (Celsius) for each crop. Crops like blackgram, mango, and papaya need temperatures above 30 °C, while crops like chickpea and kidney beans require around 20 °C. Figure 4 shows the relationship between the pH value of the soil and each crop. The pH value indicates the acidity or alkalinity of the soil water: a pH value from 0 to 7 indicates that the soil water is acidic, a pH value of 7 is neutral, and a pH value above 7 indicates that the soil water is basic.
Crops like chickpea, blackgram, and orange need a pH level above 7 (basic), while rice, maize, mango, banana, coconut, etc., need a pH value below 7 (acidic). Figure 5 shows the relationship between relative humidity and all crops. Crops like rice, mung beans, pomegranate, muskmelon, apple, orange, papaya, and coconut need relative humidity above 80%, while crops like chickpea, kidney beans, pigeon peas, and moth beans require the lowest humidity levels, from 20 to 50%. The combined effect of humidity and rainfall on all the crops is shown in Fig. 6. From Fig. 6, it can be inferred that crops like rice and orange need high amounts of rainfall and humidity to grow, and crops like chickpeas and kidney beans require low rainfall and humidity levels. Crops like grapes, watermelon, apple, and cotton need low rainfall but high humidity levels to grow.
Table 1 Dataset description

| S. No. | Variable name | Description | Data information |
|--------|---------------|-------------|------------------|
| 1 | N | Nitrogen content's ratio in soil (kg/ha) | Continuous value |
| 2 | P | Phosphorous content's ratio in soil (kg/ha) | Continuous value |
| 3 | K | Potassium content's ratio in soil (kg/ha) | Continuous value |
| 4 | Temperature | Temperature (Celsius) | Continuous value |
| 5 | Humidity | Ratio of relative humidity in % | Continuous value |
| 6 | pH | Soil's pH value | Continuous value, ranges from 0 to 14 |
| 7 | Rainfall | Rainfall (mm) | Continuous value |
| 8 | Label | Dependent variable (22 different crops) | Maize, rice, kidneybeans, chickpea, mothbeans, pigeonpeas, blackgram, mungbean, pomegranate, lentil, mango, banana, watermelon, grapes, apple, muskmelon, papaya, orange, cotton, coconut, coffee, jute |
The correlation matrix of all the variables is plotted in Fig. 7. The attributes phosphorous and potassium are highly positively correlated, with a correlation value of about 0.74. There is a negative correlation between nitrogen and phosphorus (−0.23), and the correlation between nitrogen and potassium is also negative (−0.14). Nitrogen and humidity have a weak positive correlation of about 0.20. The rest of the attributes have very little correlation with each other.
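The pairwise values behind Fig. 7 are Pearson correlation coefficients, which can be computed as follows (a plain-Python sketch with made-up columns; in practice a library routine such as pandas' `DataFrame.corr()` would be used):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two attribute columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linearly related columns give r = 1.0 (toy data, not the crop dataset)
print(round(pearson([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # 1.0
```

Values near ±1 indicate strong linear dependence (like the 0.74 for P and K), while values near 0 indicate little linear relationship.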
5 Results

5.1 Accuracy Comparison of All the Models

The dataset was divided into two parts: a training dataset and a testing dataset. 80% of the data was used for training, and the remaining 20% was used for testing. Logistic regression (LR), decision tree (DT), random forest (RF), and Naive Bayes (NB) machine learning algorithms were trained and tested on the dataset.
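The 80/20 split described above can be sketched in plain Python (illustrative only; the seed and function name are our assumptions, not the study's code):

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle and hold out a fraction of the rows for testing (80/20 here)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(2200))  # the dataset has 2200 observations
train, test = train_test_split(rows)
print(len(train), len(test))  # 1760 440
```

Each of the four models is then fitted on the 1760 training rows and scored on the 440 held-out rows.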
Fig. 1 N–P–K values comparison between all the crops
Fig. 2 Relationship between rainfall and crop type
An accuracy comparison of all the algorithms used in this study was carried out, and the results are presented in Table 2. As shown in Table 2, random forest outperformed all other algorithms with the highest testing R² of about 99%. The next best algorithms were Naive Bayes, logistic regression, and decision tree, with testing accuracies of about 98%, 95%, and 90%, respectively.
Fig. 3 Relationship between temperature and crop type
Fig. 4 Relationship between pH value and all crops
Fig. 5 Relationship between relative humidity and all crops
Fig. 6 Combined effect of rainfall and humidity on crops
Fig. 7 Correlation matrix between all the variables
Table 2 Accuracy of all the models

| S. No. | ML algorithm | Training R² (%) | Testing R² (%) |
|--------|--------------|-----------------|----------------|
| 1 | Logistic regression (LR) | 98.6 | 95.2 |
| 2 | Decision tree (DT) | 95.0 | 90.0 |
| 3 | Random forest (RF) | 100 | 99.0 |
| 4 | Naive Bayes (NB) | 99.9 | 98.9 |
5.2 Outlook of the App

As the accuracy of random forest was the highest, it was used for the development of the computer application. Figure 8 shows the outlook of the app [32], which can be used by anyone to predict a suitable crop based on inputs like N–P–K, temperature, humidity, pH, and rainfall for a particular region.
6 Conclusion

Using historical data, machine learning models can predict reasonably accurately whether a crop will be profitable. This study used four different machine learning algorithms to recommend crops according to weather conditions and soil nutrients. Random forest outperformed the rest of the algorithms in this study with a testing R² of about 99%. Through this work, farmers can increase their agricultural productivity and prevent soil degradation on cultivated land. They can also reduce the use of chemicals in crop production and make better use of water resources. Further research can be conducted by considering more varieties of crops; the current research focuses on twenty-two crops due to the limited availability of data. In future studies, soil fertility could be assessed under more granular geographical conditions, based on micronutrient data such as sulfur, zinc, iron, and manganese. A machine learning framework could also be built to recommend the optimum amount of pesticides and fertilizers for a particular crop. By doing so, the production of quality crops and the profits of farmers can be increased.
Fig. 8 Outlook of the app
References 1. Gathala MK, Timsina J, Islam MS, Rahman MM, Hossain MI, Harun-Ar-Rashid M, McDonald A (2015) Conservation agriculture based tillage and crop establishment options can maintain farmers’ yields and increase profits in South Asia’s rice-maize systems: evidence from Bangladesh. Field Crops Res 172:85–98 2. Römheld V, Kirkby EA (2010) Research on potassium in agriculture: needs and prospects. Plant Soil 335(1):155–180 3. Linden G, Smith B, York J (2003) Amazon. com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7(1):76–80 4. Suma V, Shetty RA, Tated RF, Rohan S, Pujar TS (2019) CNN based leaf disease identification and remedy recommendation system. In: 2019 3rd international conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 395–399 5. Parent B, Tardieu F (2014) Can current crop models be used in the phenotyping era for predicting the genetic variability of yield of plants subjected to drought or high temperature? J Exp Bot 65(21):6179–6189 6. Khoury CK, Bjorkman AD, Dempewolf H, Ramirez-Villegas J, Guarino L, Jarvis A, Struik PC (2014) Increasing homogeneity in global food supplies and the implications for food security. Proc Natl Acad Sci 111(11):4001–4006 7. Rehman TU, Mahmud MS, Chang YK, Jin J, Shin J (2019) Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Comput Electron Agric 156:585–605 8. Khoshnevisan B, Rafiee S, Omid M, Mousazadeh H, Rajaeifar MA (2014) Application of artificial neural networks for prediction of output energy and GHG emissions in potato production in Iran. Agric Syst 123:120–127 9. Muniasamy A (2022) Applications of data mining techniques in smart farming for sustainable agriculture. In: Research anthology on strategies for achieving agricultural sustainability. IGI Global, pp 454–491 10. 
Bhar A, Kumar R, Qi Z, Malone R (2020) Coordinate descent based agricultural model calibration and optimized input management. Comput Electron Agric 172:105353 11. Zou H, Fan J, Zhang F, Xiang Y, Wu L, Yan S (2020) Optimization of drip irrigation and fertilization regimes for high grain yield, crop water productivity and economic benefits of spring maize in Northwest China. Agric Water Manag 230:105986 12. Suresh G, Kumar AS, Lekashri S, Manikandan R (2021) Efficient crop yield recommendation system using machine learning for digital farming. Int J Mod Agric 10(1):906–914 13. Sharma AK, Chaurasia S, Srivastava DK (2018) Supervised rainfall learning model using machine learning algorithms. In: International conference on advanced machine learning technologies and applications. Springer, Cham, pp 275–283 14. Khan H, Ghosh SM (2020) Crop yield prediction from meteorological data using efficient machine learning model. In: Proceedings of international conference on wireless communication. Springer, Singapore, pp 565–574 15. Bendre MR, Thool RC, Thool VR (2015) Big data in precision agriculture: Weather forecasting for future farming. In: 2015 1st international conference on next generation computing technologies (NGCT). IEEE, pp 744–750 16. Hot E, Popović-Bugarin V (2016) Soil data clustering by using K-means and fuzzy K-means algorithm. Telfor J 8(1):56–61 17. Navarro-Hellín H, Martinez-del-Rincon J, Domingo-Miguel R, Soto-Vallés F, Torres-Sánchez R (2016) A decision support system for managing irrigation in agriculture. Comput Electron Agric 124:121–131 18. Pudumalar S, Ramanujam E, Rajashree RH, Kavya C, Kiruthika T, Nisha J (2017) Crop recommendation system for precision agriculture. In: 2016 eighth international conference on advanced computing (ICoAC). IEEE, pp 32–36
19. Kumar A, Sarkar S, Pradhan C (2019) Recommendation system for crop identification and pest control technique in agriculture. In: 2019 international conference on communication and signal processing (ICCSP). IEEE, pp 0185–0189 20. Kumar S, Balakrishnan K (2019) Development of a model recommender system for agriculture using apriori algorithm. In: Cognitive informatics and soft computing. Springer, Singapore, pp 153–163 21. Alenzi HZ, Aljehane NO (2020) Fraud detection in credit cards using logistic regression. Int J Adv Comput Sci Appl 11(12) 22. Salillari D, Prifti L (2016) A multinomial logistic regression model for text in Albanian language. J Adv Math 12(7):6407–6411 23. Shouman M, Turner T, Stocker R (2011) Using decision tree for diagnosing heart disease patients. In: Proceedings of the ninth Australasian data mining conference, vol 121, pp 23–30 24. Maji S, Arora S (2019) Decision tree algorithms for prediction of heart disease. In: Information and communication technology for competitive strategies. Springer, Singapore, pp 447–454 25. Azar AT, Elshazly HI, Hassanien AE, Elkorany AM (2014) A random forest classifier for lymph diseases. Comput Methods Programs Biomed 113(2):465–473 26. Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1. IEEE, pp 278–282 27. Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2):103–130 28. Friedman JH (1997) On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Min Knowl Disc 1(1):55–77 29. Webb GI, Keogh E, Miikkulainen R (2010) Encyclopedia of machine learning. Naïve Bayes 15:713–714 30. Ozer DJ (1985) Correlation and the coefficient of determination. Psychol Bull 97(2):307 31. https://www.icfa.org.in/ 32. https://recommend-crops.herokuapp.com/
Empirical Study of Semantic Analysis to Generate True Time Authenticity Scores for Subreddits Microblogging Amit Kumar Sharma, Sandeep Chaurasia, Vibhakar Gupta, Mrityunjoy Chowdhury, and Devesh Kumar Srivastava
Abstract Fake news is now widely disseminated through social media networking platforms. These stories contain factual inaccuracies and distorted material, and as a result of these inaccuracies, unfavorable events have arisen in society. Various verification methods and algorithms have been used on social media platforms to manage such content. In prior studies, the linguistic structures of text were employed to detect false or authentic content, and machine learning and deep learning algorithms were used to discern temporal patterns. Complex neural network architectures have been employed in recent research to aid in selecting the right parameters from the dataset and in computing superior outcomes. In order to identify whether certain articles or statements are true or fake, this study employed two separate datasets: one to train the model and the other to generate real-time authenticity scores. The deep neural network LSTM and Bi-LSTM techniques were used to train the suggested model. The approach was successfully tested on real-time subreddit blogs, generating authenticity measures with their probability scores and verifying the blogs' trustworthy sources.
A. K. Sharma · S. Chaurasia (B) · V. Gupta · M. Chowdhury Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur 303007, India e-mail: [email protected] A. K. Sharma e-mail: [email protected] V. Gupta e-mail: [email protected] M. Chowdhury e-mail: [email protected] D. K. Srivastava Department of Information Technology, Manipal University Jaipur, Jaipur 303007, India A. K. Sharma Department of Computer Science and Engineering, IcfaiTech, The ICFAI University, Jaipur 302031, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_39
517
518
A. K. Sharma et al.
Keywords Reddit · Kaggle · Deep neural networks · LSTM · Bi-LSTM · Natural language processing · Text analysis · Semantic analysis
1 Introduction

Fake news has spread quickly on social media, posing a serious danger to society. During the COVID-19 epidemic, social media became a very quick method of disseminating information; nevertheless, a large number of fake news stories were published in the process, causing confusion and ambiguity in the minds of the public [1–3]. Fake news was spread via different mediums in Spain during COVID-19 (Fig. 1) [4]. Fake news pieces have an influence on several levels of society, including the stock market, business, government, and politics. The earlier study was based on word embedding and several text metadata elements [4]. Word vectors are generated using word embedding techniques such as TF-IDF, Word2Vec, and GloVe, and these vectors are commonly utilized in machine learning and deep learning systems [5–7]. RNN, LSTM, Bi-LSTM, CNN, and hybrid models have all been widely employed in deep learning for social media content classification [8, 9]. Prior studies have shown that using a deep learning network model with a word embedding model produces better outcomes than previous state-of-the-art approaches [10, 11]. Various datasets taken from Twitter, Facebook, and other news websites were utilized in prior research [12]. In this research, two types of datasets were employed for training and testing the model: one was produced from Kaggle, while the other was retrieved from the Reddit social media network, which has a large number of individuals who are sensitive to such news. The Kaggle dataset [13] was used to train the models, and the real-time statements from the Reddit dataset were utilized to test the results. This research focuses on determining the authenticity probability ratings of fraudulent and true real-time statements or articles in order to determine whether they are fake or true. The findings were obtained using LSTM and Bi-LSTM deep neural networks with a sequential layered architecture.
2 Related Work

The spread of fake news includes misinformation that has serious consequences for society. One paper has focused on the likelihood that a hoax that began at a specific time will continue to spread indefinitely in online social networks; a model based on branching process theory is developed, which considers the basic possible responses that users can have after reading a post whose content is a hoax [14]. Many studies have been conducted in the past to identify bogus news on social media. Several approaches and methodologies have been used to extract diverse features and find the appropriate patterns in the data. Different machine learning algorithms have been presented in the research [5] to detect fake news. The researchers
Fig. 1 Spread of fake news via different mediums in Spain during COVID-19 [4]
observed that BERT and comparable pre-trained algorithms are the most effective at detecting fake news, especially with small datasets. To gather tweets containing hate speech terms, the researchers used a crowdsourced hate speech lexicon [15]. Crowdsourcing was used to categorize these tweets into hate speech, offensive language, and neither, and a multi-class classifier was trained to discriminate between these categories. The study [6] proposes a method for categorizing messages on Twitter into three categories: hateful, offensive, and clean. That research conducted experiments using the Twitter dataset, using N-grams as features and passing their TF-IDF values to a variety of machine learning models. The work [8] presented a methodology for sentiment analysis and classification problems; the experimental findings suggest that the C-LSTM outperforms both the CNN and the LSTM in these tasks and can obtain good outcomes. The study [16] proposes a machine learning ensemble voting classifier-based smart detection system to handle both real and false news categorization problems. The research [10] evaluates and compares several ways to alleviate this problem, including classic machine learning approaches like Naive Bayes and popular deep learning approaches like hybrid CNN and RNN, and offers the groundwork for choosing a machine learning or deep learning strategy that strikes the right balance of accuracy and portability. A fake news detection model based on a bidirectional LSTM model has been presented in the publication [9]. The model's performance was evaluated using two freely available unstructured news article datasets. The results reveal that the bidirectional LSTM model outperforms other approaches for detecting false news, such as CNN, vanilla RNN, and unidirectional LSTM, in terms of accuracy.
Two fresh datasets for the task of detecting fake news and identifying proof of fake content in online news have been explored in [17], where an N-gram model identifies fraudulent content. A convolutional bi-LSTM model for automatically recognizing unsuitable language remarks has been suggested [11] and tested on real-world language, considerably outperforming both handcrafted-feature and pattern-based techniques. The paper [18] combines CNN with LSTM, with a small modification: it proposes the NA-CNN-LSTM text classification model, which
has no activation function in the CNN. The suggested model outperforms the traditional CNN or LSTM on the text classification dataset, according to the experimental findings. To categorize tweets into rumor and non-rumor, the suggested technique [13] combines CNN and bi-LSTM with GloVe embedding. All of the tests were conducted using freely available data from Kaggle, the world's largest data science community. Experimental results reveal that the suggested model outperforms the baseline model.
3 Methodologies

Text analysis has been done using a variety of technologies, such as data cleansing, natural language processing, word embedding, machine learning, and deep learning [17, 19, 20].
3.1 Data Cleaning

The data was gathered from multiple sources; however, because the raw data was unstructured and multilingual, it had to be cleaned in order to obtain the intended results. Tokenization, case folding, eliminating stop words, deleting hyperlinks and symbols, and stemming are all examples of data preparation.
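A minimal sketch of these preparation steps (the abbreviated stop-word list and function name are our assumptions; stemming is omitted for brevity):

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}  # abbreviated list

def clean(text):
    """Case folding, hyperlink and symbol removal, tokenization, stop-word removal."""
    text = text.lower()                          # case folding
    text = re.sub(r"https?://\S+", " ", text)    # delete hyperlinks
    text = re.sub(r"[^a-z\s]", " ", text)        # delete symbols and digits
    tokens = text.split()                        # tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(clean("Breaking: the STORY is FAKE!!! see https://example.com"))
# ['breaking', 'story', 'fake', 'see']
```

A stemmer (e.g., Porter stemming) would then reduce each remaining token to its root form before embedding.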
3.2 Word Embedding

To construct the input data for neural networks, word embedding is necessary. Because machine and deep learning models do not understand characters, the words of each sentence must be translated into numbers. Each sentence is broken down into words, and each word is turned into part of an integer sequence. The network architectures use these integers as input.
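The word-to-integer conversion can be sketched in plain Python (a toy illustration; real pipelines typically use a library tokenizer, and the padding convention here is an assumption):

```python
def build_vocab(sentences):
    """Assign each word a positive integer index (0 is reserved for padding)."""
    vocab = {}
    for s in sentences:
        for w in s.split():
            vocab.setdefault(w, len(vocab) + 1)
    return vocab

def encode(sentence, vocab, max_len=6):
    """Turn words into their integer ids and pad to a fixed length."""
    ids = [vocab.get(w, 0) for w in sentence.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

vocab = build_vocab(["fake news spreads fast", "true news spreads slowly"])
print(encode("fake news spreads fast", vocab))  # [1, 2, 3, 4, 0, 0]
```

These fixed-length integer sequences are what the LSTM and Bi-LSTM layers consume, usually after an embedding layer maps each id to a dense vector.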
3.3 Deep Neural Network

RNN

The RNN approach has been used to identify sequential information and offer a probability-based decision over short word sequences [6]. The bidirectional LSTM method has been utilized for longer sequences [8, 16].
Fig. 2 Block diagram of an LSTM cell
LSTM

For binary classification, the LSTM deep learning architecture predicts the outcomes. A typical LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The three gates regulate the flow of data into and out of the cell, and the cell remembers values over long periods. The LSTM is fed words that have been transformed into fixed-length sequences. The forget gate discards information that is no longer useful from the cell state (Fig. 2).

Bi-LSTM

Bidirectional recurrent neural networks are made up of two independent RNNs coupled together. This structure lets the network store both backward and forward sequence information at each time step. The bidirectional LSTM processes inputs in two directions, one from the past to the future and the other from the future to the past; it differs from the unidirectional LSTM in that the backward pass preserves information from the future.
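The gate arithmetic of a single LSTM step (cf. Fig. 2) can be sketched with scalar states in plain Python; a Bi-LSTM simply runs two such passes over the sequence in opposite directions. The weights and helper names below are illustrative, not the paper's trained parameters:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar state. w maps each gate to (wx, wh, b)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate memory
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g   # cell state: keep part of the old memory, add new
    h = o * math.tanh(c)     # hidden state exposed to the next layer/time step
    return h, c

w = {k: (0.5, 0.5, 0.0) for k in "figo"}  # illustrative weights only
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
print(-1 < h < 1)  # True: the hidden state stays bounded by tanh
```

In a framework implementation all quantities are vectors and the weights are learned, but the gating structure is exactly this.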
4 Proposed Model This component of the study proposes a deep neural-based model that predicts probabilistic scores to determine if news is false or not. The outcomes of the sequential LSTM and bi-LSTM deep neural network models were compared in this study. During data preparation, the word embedding model was used to produce word vectors that would be used as input for deep neural networks. The models were trained using the Kaggle dataset, and their effectiveness was evaluated using Reddit news statements and articles (Fig. 3).
Fig. 3 Proposed sequential text deep learning model
522
A. K. Sharma et al.
Table 1 Kaggle dataset sample

Article | Label
Merkel ally cites thousands cyber-attacks Russian IP … | True
Latina restaurant owner threatened called stage Trump rally video community … | Fake
Vietnam dissidents daughter calls Melania Trump help reuters … | True
Pope invited radical proabortion socialist named Bernie … | Fake
Merkel ally cites thousands cyber-attacks Russian IP … | True
Latina restaurant owner threatened called stage Trump rally video community … | Fake
Vietnam dissidents daughter calls Melania Trump help reuters … | True
4.1 Dataset Collection This research has used two different datasets: the first dataset has been prepared from Kaggle, and the second dataset has been created from Reddit. Dataset 1: This dataset comprises fake and true news, with around 45,000 rows in total. There are four fields in the dataset: title, text, subject, and date. From this dataset, we prepared a csv file with text and labels for fake and true (Table 1). This dataset was utilized to train the proposed model in this study. Dataset 2: This dataset was produced with the intention of putting the proposed model to the test. There are probability ratings for both fake and true threads in the sample (Table 2). This dataset was built by extracting items relating to news and politics from Reddit using the Reddit scraper.
4.2 Data Preprocessing The dataset required considerable preprocessing, which included converting upper case to lower case and deleting special symbols, white spaces, and stop words, among other things. The frequency of words in particular sentences is significant in text analysis and is what this study employs. In the next step of data preparation, TF-IDF, a word embedding model based on how frequently each word occurs in the corpus, was employed. The TF-IDF word embedding produces word vectors, which are then fed into a deep neural network model.
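The TF-IDF weighting described above can be sketched with the bare formula tf × log(N/df). Actual pipelines (e.g., scikit-learn's TfidfVectorizer) add smoothing and normalization, so the exact values differ; the two-document corpus here is a toy example:

```python
import math

# Bare-bones TF-IDF: tf = term frequency within the document,
# idf = log(N / number of documents containing the term).
def tfidf(corpus):
    n = len(corpus)
    docs = [doc.split() for doc in corpus]
    vocab = sorted({w for d in docs for w in d})
    df = {w: sum(w in d for d in docs) for w in vocab}
    vectors = [[(d.count(w) / len(d)) * math.log(n / df[w]) for w in vocab]
               for d in docs]
    return vocab, vectors

vocab, vecs = tfidf(["fake news spreads fast", "true news informs"])
# "news" occurs in every document, so its idf (and weight) is zero.
```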
Table 2 Reddit dataset sample

ID | Title | URL | Score | comms_num
g2edod | Elon Musk’s promised ventilators never delivered to California hospitals, governor’s office says | https://www.foxcarolina.com/elon-musks-promised-ventilators-never-delivered-to.html | 72 | 395
2bl8d7 | ISIS orders all women and girls in Mosul to undergo female genital mutilation | http://www.theguardian.com/world/2014/jul/24/isis-women-girls-fgm-mosul-un | 348 | 1109
1vxp05 | Justin Bieber arrested for drag racing/DUI (Miami) | http://www.nbcmiami.com/Justin-Bieber-Arrested-for-Drag-Racing-in-Miami-Beach.html | 732 | 3607
2bl8d7 | ISIS orders all women and girls in Mosul to undergo female genital mutilation | http://www.theguardian.com/world/2014/jul/24/isis-women-girls-fgm-mosul-un | 348 | 1109
1vxp05 | Justin Bieber arrested for drag racing/DUI (Miami) | http://www.nbcmiami.com/Justin-Bieber-Arrested-for-Drag-Racing-in-Miami-Beach.html | 732 | 3607
1xyegt | Actress Ellen Page has come out as a lesbian | http://www.hrc.org/ | 1085 | 2299
Table 3 Parameters from sequential LSTM model

Layer (type) | Output shape | Param #
embedding (Embedding) | (None, 250, 64) | 3,200,000
spatial_dropout1d (Dropout) | (None, 250, 64) | 0
lstm (LSTM) | (None, 64) | 33,024
dense (Dense) | (None, 2) | 130

Total params: 3,233,154. Trainable params: 3,233,154. Non-trainable params: 0
4.3 LSTM Deep Learning Architecture The LSTM, or long short-term memory, architecture is made up of four layers. The first layer is the input layer of shape (250, 64), indicating that an article may contain at most 250 word inputs and that each word vector is 64 dimensions long. The second layer is the LSTM layer, which contains 64 neurons with a dropout of 0.5. A spatial dropout layer is also included, and the output layer of two neurons completes the network, with the article being categorized into the class with the highest probability. The model was trained with a batch size of 128 for 5 epochs, and the binary classification problem was monitored using the binary cross-entropy loss function. For classifying articles using the scores calculated at the final layer, P_True and P_False are compared, and the label is assigned according to which of them is greater, i.e., label = max(P_True, P_False). The LSTM model was trained with the parameters listed in Table 3.
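The parameter counts in Table 3 can be cross-checked from the layer shapes. The vocabulary size is not stated in the text; 3,200,000 embedding weights divided by 64 dimensions implies 50,000 words, which is assumed here:

```python
# Verifying the Param # column of Table 3 from the layer shapes.
vocab_size, embed_dim, lstm_units, classes = 50_000, 64, 64, 2  # vocab size inferred

embedding_params = vocab_size * embed_dim                       # embedding layer
# An LSTM has 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)
dense_params = lstm_units * classes + classes                   # output layer
total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
```

This reproduces the 3,200,000 / 33,024 / 130 entries and the 3,233,154 total reported in Table 3.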
4.4 Bi-LSTM Deep Learning Architecture The bi-LSTM model, also known as the bidirectional LSTM model, has four layers. The first layer is the input layer (250, 64), where 250 is the article length and 64 is the word embedding length. The next layer is the bidirectional LSTM with 64 units, followed by a hidden dense layer with 64 units and finally the output layer of two neurons, with the article being categorized into the class with the highest probability. This model uses the same loss function as the LSTM, namely binary cross-entropy. For classifying articles using the scores calculated at the final layer, P_True and P_False are compared, and the label is assigned according to which of them is greater, i.e., label = max(P_True, P_False). The model was trained with the parameters listed in Table 4.
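The bi-LSTM parameter counts in Table 4 can be cross-checked the same way; the bidirectional wrapper simply doubles the LSTM weights (one forward and one backward pass over the sequence), and the same inferred 50,000-word vocabulary is assumed:

```python
# Verifying the Param # column of Table 4 from the layer shapes.
vocab_size, embed_dim, units = 50_000, 64, 64    # vocab size inferred as before

embedding_params = vocab_size * embed_dim                     # embedding layer
bidir_params = 2 * 4 * ((embed_dim + units) * units + units)  # forward + backward LSTM
dense_params = 2 * units * 64 + 64                            # (None, 128) -> (None, 64)
out_params = 64 * 2 + 2                                       # two-class output layer
total = embedding_params + bidir_params + dense_params + out_params
print(bidir_params, dense_params, out_params, total)
```

This reproduces the 66,048 / 8256 / 130 entries and the 3,274,434 total reported in Table 4.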
Table 4 Parameters from sequential bi-LSTM model

Layer (type) | Output shape | Param #
embedding (Embedding) | (None, 250, 64) | 3,200,000
bidirectional (Bidirectional) | (None, 128) | 66,048
dense (Dense) | (None, 64) | 8256
dense_1 (Dense) | (None, 2) | 130

Total params: 3,274,434. Trainable params: 3,274,434. Non-trainable params: 0
5 Experiments and Results 5.1 Experiment Setup In this study, the models were trained on the Kaggle dataset and tested on the Reddit dataset. Of the Kaggle data, 90% was utilized to train the model, while 10% was used to test the model’s accuracy. For experimental evaluation, this study used a cloud-based Python IDE provided by Google. The following libraries were used to analyze the results: scikit-learn version 0.22.2.post1, Seaborn version 0.11.1, Keras version 2.4.3, Matplotlib version 3.2.2, Pandas version 1.1.5, NumPy version 1.19.5, and NLTK version 3.2.5.
5.2 Experiment Results Result 1: The first result was obtained from the sequential LSTM architecture: after 5 epochs, the accuracy of the trained model was 99.38% and the validation accuracy was 99.89% (Table 5). Result 2: The next results were obtained from the sequential bi-LSTM architecture: after 5 epochs, the accuracy of the trained model was 99.67% and the validation accuracy was 99.95% (Table 6).

Table 5 Loss and accuracy of LSTM model

Epoch | Loss | Accuracy | Validation loss | Validation accuracy
Epoch 1 | 0.4152 | 0.8589 | 0.1621 | 0.9680
Epoch 2 | 0.2113 | 0.9495 | 0.1232 | 0.9895
Epoch 3 | 0.1015 | 0.9734 | 0.0467 | 0.9931
Epoch 4 | 0.0565 | 0.9912 | 0.0023 | 0.9956
Epoch 5 | 0.0113 | 0.9938 | 0.0004 | 0.9989
Table 6 Loss and accuracy of bi-LSTM model

Epoch | Loss | Accuracy | Validation loss | Validation accuracy
Epoch 1 | 0.3657 | 0.9001 | 0.1823 | 0.9743
Epoch 2 | 0.2042 | 0.9626 | 0.1345 | 0.9902
Epoch 3 | 0.0902 | 0.9814 | 0.0362 | 0.9950
Epoch 4 | 0.0265 | 0.9944 | 0.0019 | 0.9983
Epoch 5 | 0.0092 | 0.9967 | 0.0002 | 0.9995
Table 7 Probability scores received from LSTM and bi-LSTM for five true time sentences

Sentences | LSTM Fake | LSTM True | Bi-LSTM Fake | Bi-LSTM True
Sentence 1 | 0.8836 | 0.1128 | 0.9757 | 0.0246
Sentence 2 | 0.7952 | 0.2034 | 0.5468 | 0.4357
Sentence 3 | 0.5842 | 0.4239 | 0.8687 | 0.1298
Sentence 4 | 0.1029 | 0.8974 | 0.3556 | 0.6192
Sentence 5 | 0.3156 | 0.6734 | 0.1035 | 0.8894
Result 3: Further, both the LSTM and bi-LSTM models were tested on four Reddit dataset sentences and one article created from the labeled Kaggle dataset. In this experiment, five true time statements were selected and the probability scores for the fake and true classes were calculated (Table 7). Sentence 1: Vote in Senate on ‘Dreamers’ hinges on a bipartisan pact: McConnell WASHINGTON (Reuters)…. Sentence 2: Las Vegas shooting: Isis claims responsibility for deadliest gun massacre in US history. Sentence 3: Kanye West halts Sydney concert after two fans refuse to stand up…. Sentence 4: Man who wore horns in Capitol riot moved to Virginia jail that serves organic food. Sentence 5: 67% of Illinois adults have received at least one vaccine dose and more than 50% are fully vaccinated.
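Applying the label = max(P_True, P_False) rule from Sect. 4 to the bi-LSTM scores listed in Table 7 reproduces the verdicts reported below for the five statements; a short sketch:

```python
# Turn the per-class probability scores (Table 7, bi-LSTM columns) into labels
# by taking the class with the higher score.
bilstm_scores = {                      # sentence -> (P_Fake, P_True)
    1: (0.9757, 0.0246), 2: (0.5468, 0.4357), 3: (0.8687, 0.1298),
    4: (0.3556, 0.6192), 5: (0.1035, 0.8894),
}
labels = {s: ("fake" if p_fake > p_true else "true")
          for s, (p_fake, p_true) in bilstm_scores.items()}
print(labels)   # sentences 1-3 come out fake, 4-5 true
```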
5.3 Results Evaluation In the experiment results, both the LSTM and bi-LSTM models achieved high training and validation accuracy (Tables 5 and 6). In comparison with the LSTM, the findings suggest that the bi-LSTM model is better trained (Figs. 5 and 7). The loss values calculated for both models fall after every epoch, meaning both models have been trained well (Figs. 4 and 6). This research also calculates the probability scores for true and fake news sentences (Figs. 8 and 9). The results (Table 7) show that the bi-LSTM provides the better probability scores for the statements: sentences 1, 2, and 3 are fake, while sentences 4 and 5 are true (Table 7).

Fig. 4 LSTM model loss of train and test data

Fig. 5 LSTM model accuracy of train and test data
Fig. 6 Bi-LSTM model loss of train and test data
Fig. 7 Bi-LSTM model accuracy of train and test data
Fig. 8 Probability scores of fake and true statements by LSTM model
Fig. 9 Probability scores of fake and true statements by bi-LSTM model
6 Conclusion and Future Work The manual identification of true and fake news is a very challenging task for people. This research provides a solution for identifying fake or real news, based on deep learning architectures that provide better results for sequential text analysis. In this research, the bi-LSTM architecture provides better training and test accuracy compared to the LSTM architecture. The proposed architectures have been successfully applied to real-time statements and, through their probability scores, the fake or true statements were identified. Further automated verification would therefore be the future work of our study, and further areas are opened for verifying the news through motif datum-based indexed analysis or generalized online confidence-based mining.
References

1. Thorne J et al (2017) Fake news stance detection using stacked ensemble of classifiers. In: Proceedings of the 2017 EMNLP workshop: natural language processing meets journalism
2. Talwar S et al (2020) Sharing of fake news on social media: application of the honeycomb framework and the third-person effect hypothesis. J Retail Consum Serv 57:102197
3. Buntain C, Golbeck J (2017) Automatically identifying fake news in popular twitter threads. In: 2017 IEEE international conference on smart cloud (SmartCloud). IEEE
4. Fernández-Torres MJ, Almansa-Martínez A, Chamizo-Sánchez R (2021) Infodemic and fake news in Spain during the COVID-19 pandemic. Int J Environ Res Public Health 18(4):1781
5. Khan JY et al (2021) A benchmark study of machine learning models for online fake news detection. Mach Learn Appl 4:100032
6. Viswapriya SE, Gour A, Chand BP (2021) Detecting hate speech and offensive language on twitter using machine learning
7. Ombabi AH et al (2017) Deep learning framework based on Word2Vec and CNN for users interests classification. In: 2017 Sudan conference on computer science and information technology (SCCSIT). IEEE
8. Zhou C et al (2015) A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630
9. Bahad P, Saxena P, Kamal R (2019) Fake news detection using bi-directional LSTM-recurrent neural network. Procedia Comput Sci 165:74–82
10. Han W, Mehta V (2019) Fake news detection in social networks using machine learning and deep learning: performance evaluation. In: 2019 IEEE international conference on industrial internet (ICII). IEEE
11. Yenala H et al (2018) Deep learning for detecting inappropriate content in text. Int J Data Sci Analytics 6(4):273–286
12. Brassard-Gourdeau É, Khoury R (2018) Impact of sentiment detection to recognize toxic and subversive online comments. arXiv preprint arXiv:1812.01704
13. Rani N, Das P, Bhardwaj AK (2021) A hybrid deep learning model based on CNN-BiLSTM for rumor detection. In: 2021 6th international conference on communication and electronics systems (ICCES). IEEE
14. Coluccia A (2020) On the probabilistic modeling of fake news (hoax) persistency in online social networks and the role of debunking and filtering. Internet Technol Lett 3(5):e204
15. Davidson T et al (2017) Automated hate speech detection and the problem of offensive language. In: Proceedings of the international AAAI conference on web and social media, vol 11, no 1
16. Mahabub A (2020) A robust technique of fake news detection using ensemble voting classifier and comparison with other classifiers. SN Appl Sci 2(4):1–9
17. Konagala V, Bano S (2020) Fake news detection using deep learning: supervised fake news detection analysis in social media with semantic similarity method. In: Deep learning techniques and optimization strategies in big data analytics. IGI Global, pp 166–177
18. Luan Y, Lin S (2019) Research on text classification based on CNN and LSTM. In: 2019 IEEE international conference on artificial intelligence and computer applications (ICAICA). IEEE
19. Kosmajac D, Keselj V (2017) Language identification in multilingual, short and noisy texts using common N-grams. In: 2017 IEEE international conference on big data (big data). IEEE
20. Kanakaraj M, Guddeti RMR (2015) NLP based sentiment analysis on Twitter data using ensemble classifiers. In: 2015 3rd international conference on signal processing, communication and networking (ICSCN). IEEE
A Time Series Classifier-Based Ensemble for Predictive Maintenance of Machines Deepali Deshpande , Dhruva Khanwelkar , Harsh More , Nirvisha Soni , Juhi Rajani , and Chirag Vaswani
Abstract Maintenance of machines and equipment is crucial since timely maintenance ensures that they work efficiently, and their damage and the consequent expenses are prevented. However, most of the time, it is hard to predict the failure of machines beforehand due to unforeseen circumstances, which can result in massive losses. Though several studies have been conducted that use machine learning techniques for detecting the points of failure, such detections would not be helpful when put into production, as it is imperative to foresee their occurrence rather than considering it as a standard classification task. This study aims to leverage state-of-the-art time series classification algorithms to raise warning flags before any failure occurs. An ensemble of three algorithms, namely time series forest classifier, catch22 classifier, and Arsenal is proposed, which outperforms an LSTM model. Results show that the proposed approach achieves a recall of 83.7% against the 66.5% recall of the LSTM model and is practically more adept at raising alarms before an actual failure. Keywords Arsenal · Catch22 · Maximum voting · Predictive maintenance · Time series forest
1 Introduction Reliance on machines has solved an array of problems with the solution of automation. Almost every other physically intensive task has been automated to use machines. This still remains one of the best arguments favoring more and more automation and has subsequently given rise to an industry that focuses on building these machines. Introducing machines to solve intensive tasks didn’t just solve a problem but also gave rise to one—maintenance. Irregular maintenance severely affects the output of the machine and hence defeats the purpose of its introduction. Maintenance alone D. Deshpande · D. Khanwelkar · H. More · N. Soni · J. Rajani (B) · C. Vaswani Vishwakarma Institute of Technology, Pune, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_40
531
532
D. Deshpande et al.
contributes majorly to the 56% of gross value added (GVA) captured by the service industry in India. Hence, knowing well in advance about a potential maintenance repair optimizes capital expenditure at a high level and improves machine health at an individual level. One way to develop a system that can accurately predict when a machine is malfunctioning or about to malfunction is to write a set of rules that define the machine’s health using its working state parameters. This, however, is momentary and, as a result, doesn’t capture enough information, which may lead to many faulty predictions or even predictions that don’t have enough utility. This leads to a solution in which the system doesn’t just focus on a single instance but on a sequential series of instances. With machine learning algorithms, such systems gain the ability to model the data that has been and is being generated, formally define the temporary state that immediately precedes a malfunction, and hence identify it in advance. The main aim of this study is to leverage state-of-the-art time series algorithms to classify whether maintenance is required in the foreseeable future. To create appropriate projections, time series algorithms take data from a specific period as input and learn the underlying pattern represented by the data points. An ensemble of three classifiers is employed to boost the performance even more. Such an ensemble combines the abilities of multiple models to form a new model that explains the data better than its components. One of the integral parts of the ensemble is the time series forest classifier, built using multiple decision trees over random intervals. This is accompanied by Catch22, a model that is a signature of time series with respect to properties and scaling factors, and Arsenal, which relies on the power of simple linear classifiers using random convolutional kernels.
These algorithms are trained on the data in question, and their per-instance predictions are then subjected to max voting, which decides the final output of the ensemble model. In the subsequent sections, we discuss the previously implemented approaches, describe the algorithms used in this study, expand upon the methodology, and compare the results of the proposed approach with already implemented techniques to highlight its effectiveness.
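The max-voting step described above amounts to taking, for each instance, the label most of the three classifiers agree on. A minimal sketch (the classifier outputs here are made-up toy labels):

```python
from collections import Counter

# Combine aligned per-instance predictions from several classifiers
# by majority (max) vote.
def max_vote(predictions_per_model):
    # predictions_per_model: one list of labels per classifier, aligned by instance
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

tsf     = [0, 1, 0, 1]   # toy outputs standing in for the three classifiers
catch22 = [0, 1, 1, 1]
arsenal = [1, 1, 0, 0]
print(max_vote([tsf, catch22, arsenal]))   # → [0, 1, 0, 1]
```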
2 Literature Survey Predictive maintenance (PdM) can be approached in a variety of ways. Feature engineering approaches and machine learning models are used to estimate failure risks up to five days ahead of time using sensor data [1]. The maximum accuracy is determined by comparing different machine learning models from the literature. When three different machine learning models were compared on the supplied dataset, the gradient boosting classifier gave the highest prediction accuracy of 99% for one-day-in-advance prediction [1]. By categorizing research according to ML algorithms, the study in [2] intends to thoroughly evaluate current breakthroughs in ML approaches widely applied to PdM for smart manufacturing in I4.0. The goal of the study conducted by Theissler, Andreas, et al. is to provide a thorough overview of
A Time Series Classifier-Based Ensemble for Predictive Maintenance …
533
ML-enabled PdM in automotive applications to a wide spectrum of readers. Firstly, the most important ML subspecialties for predictive maintenance are identified. Articles on ML-based PdM for automotive systems are fully analyzed and categorized from both a use case and a machine learning standpoint. The most common use cases and commonly used machine learning algorithms are identified, and open difficulties in the subject are highlighted along with suggested prospective research options as a substantial contribution [3]. In another study, the model was trained using eight different algorithms, and it concludes that the isolation forest algorithm performs the best, although its training time is a bit higher. The anomaly scores were calculated using the trained models on an unknown mixed normal and anomalous dataset [4]. According to the established threshold, the scores were then classified as anomalous or normal: if the anomaly score exceeds the threshold, it is labeled 1 as anomalous, and 0 if it is less than the threshold. In a paper published by Dr. P. Karuppusamy, a random forest model is constructed to predict the breakdown of several machines in the manufacturing industry [5]. The results are compared with the decision tree (DT) algorithm, demonstrating that the DT approach is more precise and accurate. Various machine learning techniques were covered, including generalized linear models, random forest, gradient boosting machine, and deep learning [6]. ML techniques were leveraged for the predictive maintenance of equipment, and these models were compared. Another paper analyzes the operational condition of hydro generators; a radial basis function neural network (RBFNN) performs well in analyzing the sensor data [7]. By utilizing Azure Machine Learning Studio to train a random forest technique, the recommended predictive maintenance methodology makes it possible to apply dynamic decision rules for maintenance management.
Preliminary results show that the technique correctly forecasts different machine states with high accuracy (95%), based on a data set of 530,731 data readings on 15 different machine parameters collected in real time from the tested cutting machine [8]. A design and operating strategy for predictive maintenance systems is proposed in [9]. IoT and cellular communication networks are used to monitor and report on machine conditions and states continuously. The link between features and state is derived by processing machine condition signals to extract features. In the operating phase, these relationships are utilized to anticipate the likelihood of failure and, as a result, the decision to perform service. Decision trees and decision forests are utilized to represent the linkages discovered. Another study explains how ML techniques were implemented to design and develop a nuclear infrastructure predictive maintenance framework. The system can predict reactor infrastructure and engine failure and classify cycle counts. Furthermore, it improves accuracy while requiring no additional processing power. The predictions were performed using support vector machine and logistic regression techniques [10]. In this study, we aim to bridge the gaps evident in previous studies and implement an efficient ensemble-based classifier for early-stage detection of machine failures. We implement three time series classifiers, namely time series forest, Catch22, and
Arsenal, which have showcased state-of-the-art results in several time series classification tasks. In the upcoming sections, we discuss the robustness of the proposed approach and highlight the superior performance against an LSTM model.
3 Materials and Methods 3.1 Time Series Forest Classifier In the TSF classifier, the splitting rule is utilized to estimate the most efficient way to divide a node in a time series tree, which serves as the basis for a time series forest. For simplicity, the root node is considered in the following condition:

f_k(t_1, t_2) ≤ τ    (1)

where τ is the threshold [11]. The training instances that satisfy the condition go to the left node, whereas the rest go to the right node. There is a requirement for a splitting rule to determine the optimal division S as well. The criterion for splitting is a combination of gain in entropy and a distance measure. In tree models, entropy gain is frequently employed as a splitting criterion. The node’s entropy can be stated as follows:

Entropy = −∑_{c=1}^{C} γ_c log γ_c    (2)
where γ_c stands for the number of instances for the corresponding classes. The difference between the weighted sum of entropy at the child nodes and the parent node, where the weight at a child node represents the percentage of cases assigned to that child node, is used to measure the increase in entropy for a split. Entropy is a metric that indicates how beneficial it is to be able to distinguish across classes. One drawback of TSF is that the number of candidate splits might be considerable, and numerous candidates breaking with the same entropy are common. The entrance gain is employed as a separation criterion in constructing a time series tree, which employs a top-down, recursive technique similar to that used in traditional decision tree methods. Moreover, a node is designated as a leaf if the entropy gain does not improve. A group of time series trees is called a time series forest (TSF). TSF predicts that a testing case will be the dominant class based on the votes from all-time series trees [11]. The random sampling strategy employed in random forests (RF) is also taken into consideration [1]. At each node, RF only checks √p features chosen at random from the p features. Every time series tree node has O(√M) interval widths and O(√M) beginning positions chosen at random. If the features in question have varied scales, the split with the highest entrance gain for each feature type f_k can be chosen, as described above. Finally, the split with
maximum gain in entropy is selected as the best split among the various features and this process continues until no improvement is observed in the gain.
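The entropy-gain part of the splitting criterion can be sketched as follows; the full Entrance criterion also adds the distance measure, which is omitted here, and the feature values and labels are toy data:

```python
import math

# Class entropy of a set of labels: -sum(p_c * log p_c).
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n)
                for c in (labels.count(l) for l in set(labels)))

# Score every candidate threshold tau on one interval feature f_k by the
# drop in entropy, weighted by the share of instances sent to each child.
def best_threshold(feature, labels):
    parent = entropy(labels)
    best = (float("-inf"), None)
    for tau in sorted(set(feature))[:-1]:
        left  = [l for f, l in zip(feature, labels) if f <= tau]
        right = [l for f, l in zip(feature, labels) if f > tau]
        gain = parent - (len(left) / len(labels)) * entropy(left) \
                      - (len(right) / len(labels)) * entropy(right)
        best = max(best, (gain, tau))
    return best   # (entropy gain, threshold)

gain, tau = best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
# tau = 0.2 separates the two classes perfectly, so the gain is log 2.
```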
3.2 Canonical Time Series Characteristics (Catch22) Catch22 is a set of 22 explanatory qualities for analyzing time series data [12]. The purpose of catch22 is to create a compact and meaningful subset of the 7658 time series characteristics found in the hctsa toolkit. These characteristics might be applied to any mining situation. The feature selection and testing technique described in [12], however, mostly relies on the UCR archive’s categorization. Because more than half of the data in the UCR archive has been normalized, 766 of the 7658 hctsa traits that were susceptible to mean and variance were removed. This was subsequently whittled down to 4791 choices by eliminating characteristics that can’t be estimated on more than 80% of datasets. Data features like recurring values and negative values cause this failure. To minimize the number of characteristics even more, a three-step procedure was adopted. A decision tree classifier was used to perform stratified cross-validation across each dataset in the UCR repository for each attribute. The class-balanced accuracy metric was used to keep attributes that performed much better than blind chance. The balanced correctness of these key characteristics was then sorted, and those that fell below a certain level were deleted. The correlation matrix of the remaining attributes was subjected to hierarchical clustering to reduce redundancy. The resultant clusters were further ranked according to their accuracy, and from each of the resulting 22 clusters, a feature was taken based on accuracy findings, calculation time, and comprehensibility. The catch22 features cover a wide range of ideas, including time series statistics, linear correlations, and entropy, to name a few. The obvious method to utilize catch22 for classification is as a transform before developing a classifier. The catch22 characteristics are input to the random forest in this investigation, and the results are collected.
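For illustration, a transform of this kind maps each series to a short, fixed-length feature vector that a standard classifier can then consume. The three features below are simple stand-ins chosen for brevity, not the actual 22 hctsa-derived features of catch22:

```python
# Illustrative stand-ins for catch22-style summary features:
# each function maps a whole time series to a single number.
def prop_above_mean(x):
    m = sum(x) / len(x)
    return sum(v > m for v in x) / len(x)

def lag1_autocorr(x):
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def featurize(x):
    # the resulting vector would then be fed to a random forest, as in the study
    return [prop_above_mean(x), lag1_autocorr(x), max(x) - min(x)]

print(featurize([1.0, 2.0, 3.0, 4.0]))
```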
3.3 Arsenal: A ROCKET Ensemble Random convolutional kernel transform (ROCKET) is considered one of the most significant recent advancements in time series classification because it is a very fast classifier with cutting-edge accuracy. The essence of ROCKET lies in using randomly initialized convolutional kernels to generate several summary statistics and selecting the most useful ones with the help of a linear classifier. When each kernel is applied to a series, the maximum value and the percentage of positive values are noted and combined into a feature vector. A linear ridge regression classifier is then constructed using these features. However, when ROCKET is integrated with the hierarchical vote collective of transformation-based ensembles (HIVE-COTE) [1], the ridge regressor fails to accurately output probability values associated with multiple classes.
As a result, a collection of smaller ROCKET classifiers called Arsenal was introduced for integration with HIVE-COTE 2.0 (HC2) [2]. A majority vote is used to classify new instances in Arsenal. Although Arsenal takes longer to construct than ROCKET, it is a superior option for HC2 because of its improved probability estimates. Arsenal is not more accurate than default ROCKET on its own, and HC2 performs similarly when employing the same probability-generation approach as ROCKET.
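A toy sketch of the ROCKET transform underlying Arsenal: each random kernel yields two features, the maximum activation and the proportion of positive values (PPV), and a ridge classifier would then be fitted on the resulting feature matrix. The kernel length, kernel count, and input series below are illustrative values, not ROCKET's actual hyperparameters:

```python
import random

# Valid (no-padding) 1-D convolution of a series with one kernel.
def convolve(series, kernel):
    k = len(kernel)
    return [sum(w * series[i + j] for j, w in enumerate(kernel))
            for i in range(len(series) - k + 1)]

# Apply several random kernels; keep max activation and PPV per kernel.
def rocket_features(series, n_kernels=3, rng=None):
    rng = rng or random.Random(0)
    feats = []
    for _ in range(n_kernels):
        kernel = [rng.gauss(0, 1) for _ in range(3)]   # random weights
        out = convolve(series, kernel)
        ppv = sum(v > 0 for v in out) / len(out)       # proportion of positives
        feats += [max(out), ppv]
    return feats

feats = rocket_features([0.0, 1.0, 0.5, -0.2, 0.3, 0.8])
```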
4 Experimentation See Fig. 1.
4.1 Dataset In this study, the pump sensor data available on Kaggle is used for training the model to predict the failure of a water pump before it breaks [13]. The data consists of the records of a water pump, and the sensor readings pertinent to this pump which have been collected per minute. There are a total of 52 sensors considered in the data; however, not all are used for the final model. There are around 200k records of the
Fig. 1 Project flow diagram of the methodology of this study. The high-dimensional pump sensor data undergoes feature engineering along with feature selection to filter out unimportant sensor values. The three time series models (time series forest classifier, Catch22, and Arsenal) are then fitted on this processed data, after which maximum voting takes place, and the final outputs are considered for the result analysis
Fig. 2 Distribution of the classes in the data
pump functioning normally, 14k records of the pump being in a recovery phase, and 7 records of the pump being broken (Fig. 2).
4.2 Feature Selection Since the data under consideration consists of 51 sensors, which makes the data highly dimensional and computationally expensive, feature selection becomes a crucial step in this process. To identify the dataset’s most important features, we employ an extra tree classifier and rank the sensors according to their importance. The most important features in the data are shown in Fig. 3. Most of the sensors have very high importance for predicting the machine status, whereas some sensors, such as sensors 38, 40, 13, 51, 9, 8 have relatively less importance. However, we conduct more tests for these variables rather than deleting them directly. Figure 4 shows the correlation of the selected variables with the output variable machine status. From the figure, we can infer that many variables have a strong negative correlation with the machine status and would be of utmost importance. Sensors 13 and 51 do not correlate well with the output variable and hence are dropped. Of the remaining variables, their inter-correlation needs to be checked before finally filtering out the set of features that would be fed to the models. Several sensors in the dataset have a very high correlation with each other and record similar signals during the entire period. Figure 5 depicts one such correlation between sensor signals, and several such inter-correlations are prevalent in the dataset. Feeding such data to the models would be computationally expensive and would also result in poor performance of the models. Thus, selecting only one sensor from similar subsets is warranted.
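The extra-trees ranking step can be sketched with scikit-learn on synthetic stand-in data; the real input would be the pump-sensor columns against machine_status, so the column count and data below are made up:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                   # five toy "sensor" columns
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # status driven by sensors 0 and 1

# Fit the forest and rank the columns by impurity-based feature importance.
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]
print(ranking)   # the informative sensors should appear first
```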
538
D. Deshpande et al.
Fig. 3 Feature importance using extra tree classifier
Fig. 4 Heat map of correlation of the sensors with the output variable (machine status)
Fig. 5 Sensors 6 to 9 have similar features
Sensors 8 and 9 are highly correlated with sensor 6, whose correlation with the machine status is the highest; hence we consider only sensor 6 in the final set and drop the other two. Similarly, sensor 5 has a moderate correlation with the output variable and is correlated with sensor 6, which has already been retained for modeling, so sensor 5 is dropped as well. After these selections, data from 10 sensors remains for further processing and classification.
4.3 Feature Engineering

Feature engineering is an important aspect of time series classification problems. While the original data is recorded at equally spaced time stamps, additional features need to be introduced so that the models capture the temporal relationship between records and ultimately perform better. Firstly, running averages of the sensor data are calculated over a rolling window of the 60 previous instances, i.e., the average readings of the past hour are taken into consideration. Rolling averages can be strong information providers: if a new instance causes the average to deviate from normal, this is easily detected, helping the machine learning models make sound decisions. Similarly, rolling standard deviations and rolling maximum values are computed, based on the hypothesis that these rolling statistics detect anomalies more prominently than raw sensor readings. Furthermore, another variable is introduced that measures how much variation has occurred in the observed mean; this feature extracts the fluctuations of the mean from the normal values to highlight spikes or anomalies. Since the goal of the model is to predict the failure beforehand, there is no need to detect recovery phases separately; hence, they are labeled as broken as well. To serve the objective of early detection, records 12 h before the actual failure are also marked as broken. Finally, the data consists of 10 features, with around 200,000 instances of normal functioning and 19,000 instances of failure, comprising sensor records before and after the actual failure.
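The rolling statistics described above can be sketched with pandas. The column name and the spike value are hypothetical; the window of 60 follows the paper's one-hour assumption over minute-level readings.

```python
import numpy as np
import pandas as pd

def add_rolling_features(df, window=60):
    """Append rolling statistics over the previous `window` readings
    (one hour of minute-level data) for every sensor column."""
    out = df.copy()
    for col in df.columns:
        roll = df[col].rolling(window, min_periods=1)
        out[f"{col}_mean"] = roll.mean()
        out[f"{col}_std"] = roll.std().fillna(0.0)
        out[f"{col}_max"] = roll.max()
        # Deviation of the current reading from its rolling mean,
        # meant to highlight sudden spikes or drops.
        out[f"{col}_dev"] = df[col] - out[f"{col}_mean"]
    return out

# Hypothetical sensor trace: steady readings with a spike at the end
df = pd.DataFrame({"sensor_06": np.r_[np.ones(100), 5.0]})
feats = add_rolling_features(df)
print(feats["sensor_06_dev"].iloc[-1])  # large deviation flags the spike
```

The deviation feature stays near zero during normal operation and jumps when a reading breaks away from its recent average, which is exactly the behavior the paper's anomaly-highlighting feature relies on.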
4.4 Model Training

The water pump sensor data contains seven actual failures over five months. For training the three models used in this study, the data is split so that the training set contains five of the failures, while the test data includes the remaining two. As a result, the training data comprises 128,161 records and the test data 92,159. This data is then scaled using the min–max scaler and converted to a nested pandas DataFrame with a single column, which is the required input format for the classifiers used in this study. The time series forest classifier, with the number of estimators set to 50, is then fitted to this nested data. In parallel, the Catch22 classifier is also trained with this
data, which first undergoes transformation via the catch22 transformer; the transformed features are then fed to a random forest with five trees. Finally, the data is fed to Arsenal, with 100 kernels per ROCKET transform and five estimators. Once the results of all three classifiers are obtained, they undergo maximum voting, i.e., the majority of the classifier outputs for every test input is taken as the ensemble's final output. To compare the performance of the proposed approach, an LSTM model is trained with the same transformed data that is given as input to the ensemble. The LSTM model contains three blocks, each consisting of an LSTM layer followed by a Leaky ReLU and a dropout layer. The Adam optimizer with a learning rate of 0.01 is used during compilation, and the model is trained for 50 epochs. The results of both models are then analyzed to gain insights.
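The maximum-voting step can be sketched as follows. The three prediction arrays are stand-ins for the outputs of the fitted time series forest, Catch22, and Arsenal classifiers; the actual sktime models are omitted here for brevity.

```python
import numpy as np

def majority_vote(predictions):
    """Combine per-classifier label predictions (shape: n_classifiers x n_samples)
    by taking the most frequent label for each sample."""
    preds = np.asarray(predictions)
    n_samples = preds.shape[1]
    return np.array([np.bincount(preds[:, i]).argmax() for i in range(n_samples)])

# Stand-ins for the outputs of the three fitted classifiers on four test windows
tsf     = np.array([0, 1, 1, 0])  # time series forest
catch22 = np.array([0, 1, 0, 0])  # Catch22 pipeline
arsenal = np.array([1, 1, 1, 0])  # Arsenal
print(majority_vote([tsf, catch22, arsenal]))  # -> [0 1 1 0]
```

With an odd number of classifiers and binary labels, ties cannot occur, which is one practical reason for choosing three base models.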
5 Results and Discussion

The outcomes of the investigations conducted in this study are discussed in this section. Firstly, we analyze the performance of the proposed ensemble model. Then we look at the results of the LSTM model for comparison. Furthermore, we take a realistic view of the results to analyze how such a model might function in industry. Finally, we visualize the predictions to observe whether the system successfully predicts the failure early. Figure 6 shows the confusion matrix of the ensemble method on the test data. From the figure, we can see that the model performs strongly when predicting the normal instances of the machine: around 99.5% accuracy is observed when identifying cases where the pump functions normally, with a meager 392 records misclassified. False positives can be excused in this problem since they would not lead to any loss, whereas failure to detect the problem is unacceptable. Around 1599 records are classified as normal when they should have been predicted as broken; the performance is nevertheless impressive, as 83.7% of the records that need to be flagged are correctly predicted. Figure 7, on the other hand, shows the confusion matrix of the LSTM model, which is given the same input data as the ensemble model. The LSTM model is also strong at predicting instances when the pump functions normally: around 98% accuracy is observed, with a meager 1.4% of records misclassified. However, the recall observed for the LSTM model on class 1 is just 66.5%, which is detrimental to the objective of this study. While LSTMs are known to perform well on time series problems, the model shows only modest performance in identifying the records where the pump must be declared broken.
The metrics used previously are useful for gaining insight into the statistical performance of the proposed methodology; however, these measures do not shed light on how exactly the model might perform when put into production. The model outputs continuous alarm signals, since it has been trained to identify the records leading up
A Time Series Classifier-Based Ensemble for Predictive Maintenance …
541
Fig. 6 Confusion matrix of the ensemble method
Fig. 7 Confusion matrix of the LSTM method
to the actual failure as potential warnings. But in industry, if the model raises an alarm, that alarm remains effective for a certain window of time, as necessary measures or repairs would be taken immediately, making the model's subsequent alarms redundant. Moreover, the confusion matrix needs to be redefined accordingly to obtain the actual performance of the model. Firstly, the outputs of both the proposed approach and the LSTM model are processed to remove redundant alarms. For the purpose of this study, a window of 12 h is considered, i.e., when the model raises an alarm, it is valid for 12 h before the next signals are taken into consideration. Thus, if the model raises alarms continuously within the 12-h window, they are all aggregated to the first instance of the alarm. With this processing, the total alarms raised by both models reduce, giving a realistic picture of their response. The test data contains two instances when the machine actually fails. Post-aggregation of redundant alarms, the proposed approach flags potential failures 16 times, against 14 alarms raised by the LSTM model. With these results, we define a modified confusion matrix. The true positives are those instances where the machine failed and an alarm was raised in the 12-h
Fig. 8 Realistic confusion matrices of the proposed and the LSTM model
window preceding it. Conversely, false negatives are the instances where the machine fails but no alarm was raised in the window before the failure. The instances where an alarm was raised beyond the 12-h window were flagged as false positives. The rest of the instances were considered true negatives. From Fig. 8, we can observe that of the 16 alarms raised by the proposed model, 1 is a true positive, i.e., one of the two failures of the test dataset was forecast by this model. There were 14 false alarms; however, in this case, false alarms are not as harmful as false negatives, since false alarms would only prompt extra precautions. Since the model fails to detect the second failure, 1 false negative appears in the confusion matrix. On the other hand, the confusion matrix of the LSTM model shows that both failures remain undetected by it. The remaining 12 alarms are false positives, and these alarms are raised well before the false positives of the proposed model, which may be undesirable. Thus, if both models were put into production, the proposed approach would fare better than the LSTM model. To visualize the effectiveness of the predictions made by the ensemble, Fig. 9 plots the principal components of the data four days before and after one of the failures, along with the predictions made by the model. As seen in the figure, the model raises an alarm three times before the failure on July 8th, with the last alarm raised just minutes before the actual failure. Thus, we can safely say that the proposed model is efficient at detecting failure well in advance and can ultimately prevent a lot of damage to the water pump.
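The alarm-aggregation step described above can be sketched as follows. The timestamps are hypothetical; alarms are expressed in hours from the start of the test set, and each kept alarm suppresses any further alarm within the next 12 hours.

```python
def aggregate_alarms(alarm_times, window_hours=12):
    """Collapse alarms raised within `window_hours` of an earlier kept alarm
    onto that first alarm; times are in hours from the start of the test set."""
    kept = []
    for t in sorted(alarm_times):
        # Keep an alarm only if the previous kept alarm's window has expired
        if not kept or t - kept[-1] >= window_hours:
            kept.append(t)
    return kept

# Hypothetical alarm timestamps (hours): a burst of continuous alarms
# around hour 100 collapses onto a single flagged failure warning.
print(aggregate_alarms([100.0, 100.5, 103.0, 111.9, 140.0]))
```

This is what reduces a continuous stream of model outputs to the 16 (ensemble) and 14 (LSTM) distinct alarms compared in Fig. 8.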
6 Conclusion and Future Scope

This paper focuses on using state-of-the-art time series algorithms to determine whether maintenance will be necessary in the near future. Ensemble learning has been used as an effective machine learning method that aims to improve prediction accuracy by
Fig. 9 Plot of model predictions in the period of July 4th to July 12th, 2018. Blue lines depict the alarms raised by the model, whereas the red line indicates the actual status of the machine. The gray lines indicate the principal components of the sensor data
combining results from multiple models. Results of three models, namely the time series forest classifier, canonical time series characteristics (Catch22), and Arsenal, are used. The resultant model accurately forecasts the machine's normal instances, with an accuracy of 99.5%. This performance is then compared with the results of the LSTM model, which achieves an accuracy of 98.6% when identifying instances of normal functioning. However, on comparing the sensitivities of the models, the proposed approach outperforms the LSTM model with a sensitivity of 83.5% as against 66.7%, highlighting the ability of our model to capture more of the records leading up to the actual failures. Moreover, comparing the models after processing their outputs reveals that the ensemble model raises correct alarms before one of the two failures in the test dataset, while the LSTM model fails to do so for both failures. Thus, this study shows promising results for the early detection of water pump failures, and its applications can be extended to other machines requiring frequent maintenance in industry. The future scope of this study is to obtain a more robust dataset and to investigate a diverse collection of features, mainly in the frequency domain. More complex and computationally expensive algorithms such as HIVE-COTE 2.0 and ROCKET can be explored and their effectiveness investigated. Algorithms with the potential to perform well when fed the original, unprocessed data could be implemented to eliminate several steps of our methodology. This study aims to open up new paths in the area of predictive maintenance utilizing machine learning and to take a step further toward complete automation of this process.
References

1. Ouda E et al (2021) Machine learning and optimization for predictive maintenance based on predicting failure in the next five days. In: Proceedings of the 10th international conference on operations research and enterprise systems. SCITEPRESS—Science and Technology Publications. https://doi.org/10.5220/0010247401920199
2. Çınar ZM et al (2020) Machine learning in predictive maintenance towards sustainable smart manufacturing in Industry 4.0. Sustainability 12(19):8211. https://doi.org/10.3390/su12198211
3. Theissler A et al (2021) Predictive maintenance enabled by machine learning: use cases and challenges in the automotive industry. Reliab Eng Syst Saf 107864. https://doi.org/10.1016/j.ress.2021.107864
4. Riazi M et al (2019) Detecting the onset of machine failure using anomaly detection methods. Big Data Anal Knowl Discov 3–12. https://doi.org/10.1007/978-3-030-27520-4_1
5. Karrupusamy P (2020) Machine learning approach to predictive maintenance in manufacturing industry—a comparative study (4):246–255. Inventive Research Organization, Jan 2021. https://doi.org/10.36548/jscp.2020.4.006
6. Butte S, Prashanth AR, Patil S (2018) Machine learning based predictive maintenance strategy: a super learning approach with deep neural networks. In: 2018 IEEE workshop on microelectronics and electron devices, WMED 2018, pp 1–5, May 2018. https://doi.org/10.1109/WMED.2018.8360836
7. Nadai N, Melani AHA, Souza GFM, Nabeta SI (2017) Equipment failure prediction based on neural network analysis incorporating maintainers inspection findings. In: Proceedings—annual reliability and maintainability symposium, Mar 2017. https://doi.org/10.1109/RAM.2017.7889684
8. Paolanti M et al (2018) Machine learning approach for predictive maintenance in Industry 4.0. In: 2018 14th IEEE/ASME international conference on mechatronic and embedded systems and applications (MESA), July 2018. IEEE. https://doi.org/10.1109/mesa.2018.8449150
9. Kaparthi S, Bumblauskas D (2020) Designing predictive maintenance systems using decision tree-based machine learning techniques. Int J Qual Reliab Manag (4):659–686. https://doi.org/10.1108/ijqrm-04-2019-0131
10. Predictive maintenance architecture development for nuclear infrastructure using machine learning. ScienceDirect. https://www.sciencedirect.com/science/article/pii/S1738573319306783. Accessed 13 Apr 2022
11. Deng H et al (2013) A time series forest for classification and feature extraction. Inf Sci 142–153. https://doi.org/10.1016/j.ins.2013.02.030
12. Lubba CH et al (2019) Catch22: CAnonical Time-series CHaracteristics. Data Min Knowl Discov 6:1821–1852. https://doi.org/10.1007/s10618-019-00647-x
13. Pump_sensor_data | Kaggle (2022). https://www.kaggle.com/datasets/nphantawee/pump-sensor-data. Accessed 17 June 2022
Regression Test Case Optimization Using Jaccard Similarity Mapping of Control Flow Graph Mani Padmanabhan
Abstract In modern engineering, the complexity of cyber-physical system development is coupled with the software that controls it, and software testing plays a major role. The expected outcomes in a test case suite are an important source for testing the software code, and test results ensure the quality and correctness of the developed software. Modification of the source code changes the expected outcomes that regression testing must check, and validating a large number of test cases multiple times incurs high execution time and development cost. A methodology is therefore proposed for the selection of test paths during regression testing. The selected paths cover the new expected outcomes in the modified software program, thereby reducing the testing time. The regression test path selection methodology is based on Jaccard similarity mapping of the control flow graph (CFG) and the artificial bee colony (ABC) algorithm, used to find the new test paths that were not tested in the older version of the software. Our experiments show that the proposed approach identifies the new test paths for test case generation during regression testing, its fault detection capability is high compared with a structural novel test path selection methodology, and the code coverage criteria for the modified programs show high effectiveness with fewer selected test paths. Keywords Software engineering · Regression testing · Test suite optimization · Artificial bee colony algorithm · Jaccard similarity mapping
1 Introduction

Software testing is an activity that checks whether the actual results match the desired results. According to the Pareto principle, software can be largely bug-free if the critical 20% of its code is tested, but the principle does not imply that there is no need to test the remaining 80% of
M. Padmanabhan (B) Faculty of Computer Applications (SSL), Research Supervisor (SITE), Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_41
545
546
M. Padmanabhan
code. Code coverage is a major activity in software testing [1]. The International Software Testing Qualifications Board (ISTQB) defines regression testing as "testing of a previously tested program following modification to ensure that defects have not been introduced or uncovered in unchanged areas of the software, as a result of the changes made. It is performed when the software or its environment is changed." A software test case is a set of preconditions, inputs, actions, expected results, and postconditions. Generating test cases in regression testing for newly added code is critical and depends on the source lines of code (LOC) of the program. Automated software testing relies on the test cases for the expected outcome. Regression testing examines the structure of the software code to determine critical errors and consumes 60–80% of the software development time. Automated testing is hard to keep within time bounds, yet the test paths must be precise and achieve high code coverage. Several test path selection methodologies have been proposed in the software testing research community. However, these techniques generally focus on all-path coverage, and a large number of test cases are generated during the automated process by the existing techniques [2]. This process requires more time and cost. Regression testing needs to complete the testing process quickly for the modified source code. There have been many search-based testing (SBT) techniques in regression testing, such as the genetic algorithm (GA), model-driven engineering (MDE), the BAT algorithm, the crow search algorithm, and the hybrid genetic algorithm, but the primary scope of these techniques is code coverage criteria. Test suite minimization is the core concept in regression testing; in the same view, the code coverage criteria are to be achieved for the modified paths in the software program.
Figure 1 describes the three major processes in regression testing: eliminating redundant test cases from the test suite, selecting the test cases that changed during software re-engineering, and ordering the test cases based on end-user requirements [1]. This research paper presents a methodology to select the test paths whose program flow changed during software re-engineering. The main contributions of this research are listed as follows:
• Generate the control flow graph (CFG) from the modified software code.
Fig. 1 Process of regression testing
• Propose an algorithm to convert the CFG to an adjacency matrix.
• Apply Jaccard similarity mapping to find the changes in the control flow.
• Compare the cyclomatic complexity of the CFGs to find the approximate number of new test paths.
• Apply the artificial bee colony algorithm to find the new paths in the modified software for new test path selection.
• Demonstrate the quality of the proposed methodology through several programming experiments using the JAVA programming language.
The rest of the paper is divided into four sections. Test path selection methodologies frequently used by the testing research community are elaborated in the background study. The proposed test path selection technique using the ABC algorithm is then summarized, and experimental proof of the proposed approach and future directions, together with the conclusion of this research, are provided in the following sections.
2 Background Study

Quality is the primary requirement of a software product. The software development life cycle looks to adopt innovative testing methodologies to ensure that software products meet quality standards [1]. Regression testing is the process of identifying defects that have been introduced in the uncovered areas of software; it is a type of change-related software testing. High-quality test cases are listed in regression testing to manage risks such as functional and non-functional bugs in software automation.
2.1 Control Flow Graph

Control flow graphs are visualized as node–edge linking diagrams with a hierarchical structure [3]. The general purpose of this structure is to describe the flow of the process, analyze it, and identify mismatches with user requirements during software development. Software developers rely on the visualized CFG for debugging purposes, and during alpha testing CFGs are responsible for composing path flow tests. A CFG captures programming structures such as functions and loops. A loop contains the root node, which is the first node executed in the program flow; a self-loop goes from a child node back to a parent node, and the back edge flows from the child node along the path. This path flow identification methodology is applied in the proposed research methodology.
2.2 Cyclomatic Complexity of CFG

Cyclomatic complexity is one of the metrics developed for software testing; it monitors the software development life cycle during the design phases [1]. The cyclomatic complexity of a CFG indicates the number of possible paths inside a piece of software code. A program is a combination of functions put together to perform a task, and each of these functions has several independent nodes and edges in the CFG that produce test paths based on various parameters. In regression testing the number of test paths increases and the test suite complexity is high, which leads to poor quality and high time complexity. The intersection of the older version of the program and the updated version needs to be identified during regression testing, and the selected paths to be tested are then listed. The detailed tracing methodology is elaborated in the proposed methodology.
2.3 Finding Similarity

Similarity finding is a fundamental data mining problem. Textually similar items in the modified version of the source code during regression testing can be found using "min hashing" [4]. The problem in regression testing when we search for similar items of any kind is that their number may be large. The data mining technique of Jaccard similarity mapping is well suited for finding similar items in programs. To find duplicate values in the flows, the similarity of sets S and T is |S ∩ T|/|S ∪ T|, the ratio of the size of the intersection of S and T to the size of their union. The Jaccard similarity of S and T is denoted sim(S, T). Textual similarity is important during testing, and this proposed technique reduces the cost and time of regression testing.
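The definition sim(S, T) = |S ∩ T| / |S ∪ T| can be sketched directly in Python. The two "versions" below are hypothetical sets of statement shingles, used only to illustrate how an edit changes the similarity score.

```python
def jaccard_similarity(s, t):
    """sim(S, T) = |S intersect T| / |S union T| for two non-empty sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

# Hypothetical statement shingles from two versions of a program:
# three of the four items survive the modification.
v1 = {"read input", "check color", "set RED", "set GREEN"}
v2 = {"read input", "check color", "set RED", "set YELLOW"}
print(jaccard_similarity(v1, v2))  # -> 0.6
```

A score of 1.0 means the two versions are identical, while lower scores flag the regions of the program flow that regression testing must revisit.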
2.4 Related Research

In software testing, a metaheuristic is a higher-level search algorithm that provides a suitable solution in the test case generation process [3]. The artificial bee colony algorithm is applied to numerical problems for searching data; the honey-gathering behavior of bees visiting flowers is the basic concept of the algorithm, which balances a global search with local matching. In this research, test case selection during regression testing requires comparing test data across a large test suite. Test path optimization is one of the major focuses of this research, and the ABC algorithm is well suited to the proposed approach as per the literature study.
In the literature, various optimization techniques have been adopted in the fault identification field of software testing. Genetic algorithm-based test suite selection is the most commonly used technique. In white box testing, approaches such as code-based testing, data-flow testing, and control flow testing adopt various algorithms [5]. Control flow testing gives access to the program structure details with path transactions, and test path selection emphasizes the program structure. Berndt et al. [5] used a genetic algorithm for breeding test cases in the software development process. An ant colony optimization-based technique was developed by Campus et al. (2018) for unit test suite generation. Haider et al. [6] used a fuzzy logic system and the bee algorithm for test suite selection. Mala et al. (2010) proposed an automated software testing technique using the parallel behavior of bees in the ABC algorithm. Dahiya et al. [7] generated software test suites using the ABC algorithm. Akay et al. [3] used the ABC algorithm with one bee per modified method based on the requirements of the tester during white box testing. In our own research in past years, Padmanabhan ([8, 9], 2019) showed test case generation based on graphical representations of specification diagrams of software code, providing efficient code coverage and a high number of test cases. Regression testing, however, needs fewer test cases with high coverage criteria, and optimization techniques are the best solution for fewer test cases [10]. In order to provide optimized test paths during regression testing, Jaccard similarity tracing has been implemented in the proposed methodology.
3 Proposed Methodology

This paper projects a way to identify test paths for test case generation in regression testing using the ABC algorithm and Jaccard similarity tracing. The methodology is validated with code coverage and cyclomatic complexity to check the criticality of paths that should not be skipped. To ensure the quality of the methodology, JAVA programming code has been tested in two phases. Figure 2 shows the overall flow of the proposed test path selection method.
3.1 Transformation of Control Flow Graph

The first step of the approach is to convert the software code into a control flow graph (CFG). The CFG provides a moderate extraction of the paths [8–14], branch distances, and program flow as a combination of nodes and edges [15]. The sample JAVA program and the CFG for version 1.0, with limited functionality, are given in Fig. 3.
Fig. 2 Flow of test path selection approach
Fig. 3 CFG for traffic light
Regression Test Case Optimization Using Jaccard Similarity Mapping …
551
Algorithm 1: Sample JAVA pseudocode for traffic light.

public void itemStateChanged(ItemEvent ie) {
    JRadioButton selected = (JRadioButton) ie.getSource();
    String textOnButton = selected.getText();
    if (textOnButton.equals("Red")) {
        message.setForeground(Color.RED);
        message.setText("STOP");
    } else {
        message.setForeground(Color.GREEN);
        message.setText("GO");
    }
}
The approximate number of independent paths is identified based on the cyclomatic complexity V(G), where E denotes the number of edges and N the number of nodes in the CFG. Formula (2) gives the number of new test paths possible in the version 2.0 software code.

V(G) = E − N + 2    (1)

NC = CCj − CCi    (2)
Here, CCi denotes the cyclomatic complexity of software version 1.0, and CCj the cyclomatic complexity of the modified software version 2.0. NC is an approximate count of the independent new test paths during regression testing. To find the exact number of test paths in regression testing, the CFG matrix is formulated. For the sample JAVA program given in Fig. 3, CCi = 2. Algorithm 2 below describes the conversion of the CFG into an adjacency matrix.

Algorithm 2: Adjacency matrix from CFG.
Step 1: Start
Step 2: Initialize a matrix with 0 s
Step 3: Iterate over the vertices in the flow graph
Step 4: For every jth vertex in the adjacency list, traverse its edges
Step 5: For each vertex i with which the jth vertex has an edge, set mat[i][j] = 1
Step 6: End
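Algorithm 2 can be sketched in Python as follows. The adjacency-list fragment is a hypothetical reading of the if/else structure of Algorithm 1 (node 0 entry, node 1 condition, nodes 2 and 3 the two branches), not the paper's exact node numbering.

```python
def adjacency_matrix(cfg, n_nodes):
    """Algorithm 2 as code: cfg maps each node j to the nodes it has edges to;
    mat[i][j] = 1 when the CFG has an edge from node j to node i (Step 5)."""
    mat = [[0] * n_nodes for _ in range(n_nodes)]
    for j, targets in cfg.items():       # Step 3: iterate over vertices
        for i in targets:                # Step 4: traverse the edges of vertex j
            mat[i][j] = 1                # Step 5: mark the edge in the matrix
    return mat

# Hypothetical CFG of Algorithm 1 (v1.0): 0 -> 1 (condition),
# 1 -> 2 (then branch) and 1 -> 3 (else branch)
cfg_v1 = {0: [1], 1: [2, 3]}
print(adjacency_matrix(cfg_v1, 4))
```

The resulting binary matrix is the structure the Jaccard tracing step compares across program versions.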
3.2 Jaccard Similarity Tracing

The fundamental problem in regression testing is to examine test paths for similarity. Similarity is found using a mirror technique for items that have almost the same content but differ in the range of operations that can be run on a computer or other electronic system.
NC = On − Op    (3)

where
Op is derived from the adjacency matrix of the older version,
On is derived from the adjacency matrix of the new version, and
NC is the number of new test paths.

Tp = Σ (n = 1 to NC) Pn (On − Op)    (4)
An important usage of Jaccard similarity tracing is that it lends itself well to finding textually similar data in large programs [16–19]. Denote the adjacency-matrix comparison of the two sets by Adj(x, y), where X → Op and Y → On. Adj(x, y) < 0 is not possible, because the size of a matrix cannot be negative. Adj(x, y) = 0 if x = y, since then x ∪ x = x ∩ x = x and y ∪ y = y ∩ y = y. Adj(x, y) > 0 if x ≠ y, since then x ∩ y is strictly smaller than x ∪ y, so Adj(x, y) is positive; this result is processed for test path selection using the artificial bee colony algorithm.
3.3 Test Path Selection Artificial bee colony (ABC) algorithm is a single objective finder, and there are three types of bee consider in ABC algorithm [3]. The random searches in the adjacency matrix based on the programming flow are carried out by scout bee. The new path is identify and save in the flow for the selection which are carried out by onlooker bee. The employed bee occurred the path which it had visited in the older version of the programming. Figure 4 shows the ABC algorithm-based activity for test path selection. The following activity has taken in the CFG and validated with NC based on the Jaccard similarity and cyclomatic complexity of the programming. • • • • • •
Initialization of programming flow. Scout bee is search the node in the CFG for new space. Onlooker bee is save the flow in the CFG on their respective path. Compare possible node with employed bee. Save the position trace (path) for test cases selection. Repeat until all the nodes are visited.
Regression testing of a program needs only a minimum number of test paths in order to reduce the runtime [20–23]. The proposed regression test path selection methodology has this advantage over other methodologies. The experimental evaluation on JAVA programs demonstrates the accuracy of the proposed methodology.
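As a simplified stand-in for the bee-based search, the new-path outcome can be illustrated by enumerating acyclic paths in the two CFG versions and taking their set difference: paths present only in the new version are the candidates the onlooker bees would save. The node numbering and graphs below are hypothetical, not the paper's exact example.

```python
def all_paths(cfg, start, end, path=None):
    """Enumerate acyclic start-to-end paths in a CFG given as an adjacency dict."""
    path = (path or []) + [start]
    if start == end:
        return [tuple(path)]
    paths = []
    for nxt in cfg.get(start, []):
        if nxt not in path:  # avoid revisiting nodes (the scout bee's search space)
            paths += all_paths(cfg, nxt, end, path)
    return paths

# Version 1.0 vs. 2.0 of a small CFG: node 3 (a new branch) exists only in v2.0
cfg_v1 = {0: [1], 1: [2, 4], 2: [4]}
cfg_v2 = {0: [1], 1: [2, 3, 4], 2: [4], 3: [4]}
new_paths = set(all_paths(cfg_v2, 0, 4)) - set(all_paths(cfg_v1, 0, 4))
print(sorted(new_paths))  # only the paths introduced by the modification
```

Only the newly introduced path survives the difference, which is exactly the subset of paths regression testing needs to cover.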
Fig. 4 ABC activity for test path selection
4 Experimental Evaluation

The proposed method has been validated on eleven standard JAVA applications. In the first example, a JAVA Applet program that identifies the traffic signal and changes the signal based on the input value is considered for test path selection. Figure 3 shows version 1.0 of the code. This code has been modified based on the needs of the user, and a few looping statements have been added for a yellow signal in the JAVA Applet program. Figure 5 shows the modified CFG for version 2.0. The CFG-based adjacency matrix Adj(Ex1_v2.0) has been formulated following the proposed methodology, with mat[i][j] = 1 wherever the CFG of Fig. 5 contains an edge between nodes i and j.
The matrix is generated from the CFG and contains all the test paths. All these paths are equalized in length, and the number of test paths for this matrix is five. The cyclomatic complexity of version 1.0 and version 2.0 [V(Ex1_v1.0) = CCi, V(Ex1_v2.0) = CCj] has been validated, with CCj = 5. Here, NC = CCj − CCi, so based on the proposed methodology the number of new test paths is NC = 3 for the sample JAVA Applet program of example 1. Jaccard similarity tracing-based validation is applied to example 1, that is, Adj(x, y), where X → Op = 2 and Y → On = 5. The number of new test paths is NC = On − Op, so the Jaccard similarity tracing of the developed Adj(x, y) also gives three new test paths in the first JAVA program example.
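These counts can be checked directly. The sketch below is hypothetical helper code, not the author's tool: the CFG edges are an assumption reconstructed from Fig. 5, the v1.0 path set {path 1, path 3} follows the discussion around Table 1, and V(G) = E − N + 2 is the standard cyclomatic complexity formula.

```python
# Hedged sketch: cyclomatic complexity of the v2.0 CFG (CCj) and Jaccard-based
# counting of new test paths. Edge list and path strings are assumptions
# reconstructed from Fig. 5 and Table 1.
CFG = {1: [2, 8], 2: [3, 4], 3: [9], 4: [5, 6],
       5: [9], 6: [7, 8], 7: [9], 8: [9], 9: []}
E = sum(len(succ) for succ in CFG.values())   # 12 edges
N = len(CFG)                                  # 9 nodes
cc_v2 = E - N + 2                             # V(G) = E - N + 2 = 5 -> CCj

v1 = {"1-2-3-9", "1-2-4-6-7-9"}               # Op = 2 paths carried over from v1.0
v2 = {"1-2-3-9", "1-2-4-5-9", "1-2-4-6-7-9",
      "1-2-4-6-8-9", "1-8-9"}                 # On = 5 paths in v2.0
jaccard = len(v1 & v2) / len(v1 | v2)         # 2 / 5 = 0.4
nc = len(v2) - len(v1 & v2)                   # number of new test paths
print(cc_v2, jaccard, nc)                     # 5 0.4 3
```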
M. Padmanabhan
Fig. 5 CFG for regression testing (Ex1_V2.0)
Table 1 shows the test paths based on the modified program in regression testing. The ABC algorithm is applied to find the targeted test paths. Scout bees search the space for new positions. Onlooker bees save the flow; here, paths 1 and 3 come from the older version (V1.0), as these already-tested paths have few changes along their respective routes. Comparing candidate nodes with the employed bees over the traces shows that paths 2, 4, and 5 are the paths modified during the update. Table 2 shows the new test paths based on the proposed approach.
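Under the reading stated above (paths 1 and 3 being the unchanged V1.0 paths), the targeted paths of Table 2 are simply the v2.0 paths not already present in v1.0; a minimal illustrative sketch:

```python
# Sketch: the targeted test paths (Table 2) are the v2.0 paths absent from v1.0.
v1 = {"1-2-3-9", "1-2-4-6-7-9"}                       # unchanged paths 1 and 3
v2 = ["1-2-3-9", "1-2-4-5-9", "1-2-4-6-7-9",
      "1-2-4-6-8-9", "1-8-9"]                         # all five v2.0 paths
targeted = [p for p in v2 if p not in v1]
print(targeted)   # ['1-2-4-5-9', '1-2-4-6-8-9', '1-8-9']
```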
5 Results and Discussion

A comparative evaluation against the GA-based test path selection technique has been carried out during test case generation for eleven standard JAVA programs. Commonly used methodologies from the software testing literature, such as genetic algorithms and specification diagram-based approaches, are compared with the proposed approach. Figure 6 shows the effectiveness of the proposed approach.
Table 1 Test path based on control flow graph

        Trace flow      Output flow
Path 1  1–2–3–9         Output: RED
Path 2  1–2–4–5–9       Output: YELLOW
Path 3  1–2–4–6–7–9     Output: GREEN
Path 4  1–2–4–6–8–9     Invalid input()
Path 5  1–8–9           Waiting mode()

Node legend: 1 → Start: Get text(); 2 → if; 3 → Output: RED; 4 → if else; 5 → Output: YELLOW; 6 → else; 7 → Output: GREEN; 8 → New app(); 9 → End

Table 2 Targeted test path

        Trace flow      Output flow
Path 1  1–2–4–5–9       Output: YELLOW
Path 2  1–2–4–6–8–9     Invalid input()
Path 3  1–8–9           Waiting mode()
Fig. 6 GA-based approach with proposed approach (comparison of methodology; y-axis: LOC)
The proposed artificial bee colony algorithm-based test path selection approach has been applied to eleven standard JAVA programs in two phases. Version 1.0 covers fewer steps of the user requirements; version 2.0 reflects regression testing with all the user requirements fulfilled. The proposed Jaccard similarity tracing identifies the targeted test paths in the V2.0 program. The number of test cases for the experimental programs in V1.0 and V2.0 is provided in Fig. 7, and the total number of test paths compared against cyclomatic complexity is elaborated in Fig. 8. The background study shows the need for metrics in software engineering; the cyclomatic complexity metrics are compared with the number of new paths obtained by the proposed methodology. The results show that
Fig. 7 Number of test cases in regression testing (test cases in V1 and V2 for programs including class constructor, factorial of given number, method overriding in JAVA, multi-thread, and passing parameters; x-axis: LOC)
ABC with Jaccard similarity tracing achieved all-path coverage with a simple and easy implementation. Based on the graphical analysis of the outcomes, effort savings after targeted path selection were observed of 67% for the palindrome verification program, 53% for the passing parameters program, 65% for the simple calculator program, 34% for the simple GUI using JFRAME program, and 77% for the method overloading program. The total number of test paths with the reduced percentage is shown in Fig. 9. These results reflect the efficiency of the proposed methodology, which supports optimal test case generation during regression testing under time constraints.

Fig. 8 Test path comparison software metrics (comparative results for cyclomatic complexity, number of adjacency matrices, total number of test paths, and number of new paths; y-axis: LOC)
Fig. 9 Test cases reduced metrics (test cases reduced percentage, total number of test paths, and cyclomatic complexity per program)
6 Conclusions

This paper presented a regression testing methodology for test path selection. The adjacency matrix of the control flow graph helps in building the optimal test paths, and the artificial bee colony algorithm with Jaccard similarity tracing extracts a limited set of test paths. The experiments performed on JAVA software show the optimized test paths for generating test cases; using this approach, a tester can generate a set of test paths from the CFG while avoiding duplication. For validation purposes, the methodology combines the cyclomatic complexity of program versions 1.0 and 2.0. In the experimental results, the selected test paths reduced the cost and time of regression software testing, and almost 100% path coverage was achieved. Future research will cover additional complex programs and Web applications in evolutionary system software, targeting optimal critical paths with automated test path selection.
References 1. Subramanian GH, Pendharkar PC, Pai DR (2017) An examination of determinants of software testing and project management effort. J Comput Inf Syst 57:123–129 2. Zhang P, Yu J, Ji S (2020) ADF-GA: data flow criterion based test case generation for Ethereum smart contracts. In: Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops, Seoul Republic of Korea, June 2020, pp 754–761 3. Akay B, Karaboga D (2012) A modified artificial bee colony algorithm for real-parameter optimization. Inf Sci 192:120–142 4. Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets, 2nd edn. Cambridge University Press, USA 5. Berndt D, Fisher J, Johnson L, Pinglikar J, Watkins A (2003) Breeding software test cases with genetic algorithms. In: Proceedings of the 36th annual Hawaii international conference on system sciences, Big Island, HI, USA 6. Haider AA, Rafiq S, Nadeem A (2012) Test suite optimization using fuzzy logic. In: 2012 International conference on emerging technologies, Islamabad, Pakistan, pp 1–6
7. Dahiya SS, Chhabra JK, Kumar S (2010) Application of artificial bee colony algorithm to software testing. In: 2010 21st Australian software engineering conference, Auckland, New Zealand, pp 149–154 8. Mani Padmanabhan (2018) Test path identification for internet of things using transaction based specification. In: Proceedings of the international conference on current trends towards converging technologies, Coimbatore, India, pp 1–6 9. Mani Padmanabhan (2022) Test case generation for Arduino programming instructions using functional block diagrams. Trends Sci 19(8):3472 10. Agrawal AP, Kaur A (2018) A Comprehensive comparison of ant colony and hybrid particle swarm optimization algorithms through test case selection. Data Eng Intell Comput 542:397– 405 11. Wang H, Xing J, Yang Q, Song W, Zhang X (2016) Generating effective test cases based on satisfiability modulo theory solvers for service-oriented workflow applications: effective test cases for service-oriented workflow applications. Softw Test Verif Reliab 26:149–169 12. Mani Padmanabhan, Prasanna M (2017) Validation of automated test cases with specification path. J Stat Manag Syst 20:535–542 13. Memon A, Gao Z, Nguyen B, Dhanda S, Nickell E, Siemborski R, Micco J (2017) Taming Google-scale continuous testing. In: Proceedings of the IEEE/ACM 39th international conference on software engineering: software engineering in practice track, Buenos Aires, Argentina, pp 233–242 14. Mani Padmanabhan (2022) Rapid medical guideline systems for COVID-19 using databasecentric modeling and validation of cyber-physical systems. In: Cyber-physical systems. Elsevier, pp 161–170 15. Bondy JA, Murty USR (2008) Graph theory, 1st edn. Springer Publishing Company, Incorporated 16. Mani Padmanabhan (2020) Test path identification for virtual assistants based on a chatbot flow specifications. In: Das KN, Bansal JC, Deep K, Nagar AK, Pathipooranam P, Naidu RC (eds) Soft computing for problem solving. 
Springer, Singapore, pp 913–925 17. Azizi M, Do H (2018) Graphite: a greedy graph-based technique for regression test case prioritization. In: Proceedings of the IEEE international symposium on software reliability engineering workshops, Memphis, Tennessee, pp 245–251 18. Mani Padmanabhan, Prasanna M (2017) Test case generation for embedded system software using UML interaction diagram. J Eng Sci Technol 12:860–874 19. Mani Padmanabhan, Prasanna M (2017) Test case generation for real-time system software using specification diagram. Int J Intell Eng Syst 10:166–175 20. Wang H, Xing J, Yang Q, Wang P, Zhang X, Han D (2017) Optimal control based regression test selection for service-oriented workflow applications. J Syst Softw 124:274–288 21. Mukherjee R, Patnaik KS (2021) A survey on different approaches for software test case prioritization. J King Saud Univ Comput Inf Sci 33:1041–1054 22. Agrawal AP, Kaur A (2018) A comprehensive comparison of ant colony and hybrid particle swarm optimization algorithms through test case selection. In: Satapathy SC, Bhateja V, Raju KS, Janakiramaiah B (eds) Data engineering and intelligent computing. Springer, Singapore, pp 397–405 23. Eghbali S, Tahvildari L (2016) Test case prioritization using lexicographical ordering. IEEE Trans Softw Eng 42:1178–1195 24. Chen J, Zhu L, Chen TY, Towey D, Kuo FC, Huang R, Guo Y (2018) Test case prioritization for object-oriented software: an adaptive random sequence approach based on clustering. J Syst Softw 135:107–125
Investigating the Role of Semantic Analysis in Automated Answer Scoring

Deepender and Tarandeep Singh Walia
Abstract The technology underlying automatic scoring is still in its infancy. The study's primary goal is to provide a new method that can be utilized for computerized response grading, and semantic analysis is considered to fulfill this objective. The work takes a deep learning approach based on the long short-term memory (LSTM) technique. This was done by joining the semitone-filtered spectrogram's first-order difference to the original input, an improvement that produced a dramatic boost in transcription efficiency. Recurrent neural networks (RNNs) have an advantage over feed-forward neural networks (FNNs) in music response scoring because they can learn the temporal dependence of sequential input. The long short-term memory unit's memory block is updated only while an input or forget gate is open, and gradients may move across memory cells without being multiplied at each time step. Together, the network's backward and forward layers provide access to the past and future of the specified period. According to the studies conducted, the suggested method outperforms the state of the art: accuracy ranges from 95 to 97% on the filtered dataset compared with the unfiltered dataset, and precision ranges from 0.90 to 0.95 on filtered datasets, compared with 0.84–0.90 on unprocessed datasets. According to the simulation findings, the proposed long short-term memory approach performs better than the traditional system in terms of F1 score, precision, and recall.

Keywords Natural language processing · Long short-term memory · Automated scoring system · Ambiguity · Recurrent neural network
Deepender (B) · T. S. Walia School of Computer Applications, Lovely Professional University, Phagwara, Punjab, India e-mail: [email protected] T. S. Walia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_42
Deepender and T. S. Walia
1 Introduction

The term "automated scoring" describes the practice of using computers to assess items that would traditionally be reviewed by humans [1]. AI scoring is the scientific field that developed many of the techniques and algorithms powering modern automated scoring systems. Scoring essays submitted in a school environment with high-tech computers is known as "automated essay scoring" (AES) [2, 3]; it serves as a form of educational evaluation and also works as a natural language processing tool. The goal of this study is to provide a new method employing semantic analysis to automate and simplify scoring for Hindi literature [4, 5]. Various machine learning approaches are recognized as viable options for accomplishing this goal. To enhance precision and efficiency, it has been shown that filtering the dataset using an optimizer is necessary [6]. The authors of the study suggested here have combined sentiment analysis with a machine learning and optimization technique [7, 8]. The optimal information for a machine learning model to learn from is obtained via optimization. As a result, the suggested work supplies an intelligent and high-performance means of analyzing long text in Hindi [9–11]. Findings from this study might be useful in the text analysis of school and university data published in Hindi. Long Hindi texts are the focus of the research team's work [12], and toward that end several machine learning strategies are explored. Filtering the dataset is necessary to improve accuracy and performance when using an optimizer [13, 14]. The work incorporates text analysis and machine learning; by optimizing the machine learning setup, only the highest quality training data is used [15–17]. As a result, the proposed study creates a very efficient and astute strategy for analyzing extensive Hindi literature [18].
This kind of study has the potential to significantly alter information-processing procedures in classrooms and libraries that utilize Hindi language resources [19, 20].
1.1 Role of Text Mining and Analysis in Automated Essay Scoring

Text mining, also known as text analytics, is a branch of artificial intelligence (AI) that makes use of natural language processing (NLP) [21, 22] to convert the free-form (unstructured) [23] text found in documents and databases into more conventional, structured data for analysis or to power machine learning (ML) [24] algorithms. Figure 1 illustrates the text mining method. Transcriptions from call centers, Internet reviews, questionnaires, and focus groups are just a few examples; a treasure trove of untapped information lies in these unprocessed text files [25]. Utilizing text mining and analytics, these hitherto unutilized data sources may now be put to use.

Fig. 1 Process of text mining (steps shown: text information, text preprocessing, feature selection, data mining, evaluation, application)

Preprocessing texts to extract information that can be read by a computer is known as text analysis. Text analysis is used to create structured data from unstructured textual material; it is much like trying to make sense of reams of disparate documents by chopping them up into more digestible pieces. The term "automated scoring" describes the practice of using computers to evaluate tasks formerly performed by humans [26, 27]. The term "AI scoring" is sometimes used interchangeably with "artificial intelligence," the field that provides some of the techniques and resources used in automated scoring systems. Automated essay scoring (AES) refers to the process through which academic essays are graded using a computerized grading system [28]; it is an educational assessment tool that also performs NLP tasks. There are several benefits of using automated scoring for educational assessment: students are given more time to write and revise without waiting for grades and comments from teachers, and such a shorter practice-feedback cycle may better develop students' writing abilities.
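As an illustration of the Fig. 1 pipeline (a toy sketch, not the authors' system; the example sentences are invented), the preprocessing and feature-extraction steps that turn raw answers into structured vectors can look like this:

```python
# Toy text-mining sketch: preprocessing -> vocabulary (feature selection) ->
# structured term-frequency vectors ready for a learning model.
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

docs = ["The automated scorer grades essays.",
        "Essays are graded by the automated scorer!"]
bags = [Counter(preprocess(d)) for d in docs]
vocab = sorted(set().union(*bags))                 # shared feature vocabulary
vectors = [[bag[w] for w in vocab] for bag in bags]
print(vocab)
print(vectors)
```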
1.2 Role of NLP in Automated Essay Scoring

The study of how to program computers to efficiently analyze and assess vast amounts of natural language data is at the heart of natural language processing, which draws on linguistics, computer science, and artificial intelligence [29, 30]. The goal of natural language processing is to build computers that can understand and respond to written or spoken language much as a human would. Teachers may use AES to assign marks to essays based on the results of computer analysis; the system evaluates academic articles using natural language processing (NLP), a subfield of AI that mimics human language processing capabilities [31].
1.3 Manual and Automated Scoring

Manual scoring is required for any subjective questions, file upload questions, or diagram questions. Subjective questions require manual scoring because of the free-form nature of the responses, and a diagram question must be graded by hand because there is no single right answer. For file upload questions, the question creator may open the candidate's answer file through the supplied URL and study it in detail. The person who created the questions or administers the test must read the responses, evaluate the accompanying illustrations and explanations, and then score them [32, 33]. For the purposes of this discussion, "automated scoring" refers to the practice of using computers to do tasks that would otherwise be performed by humans. The term "automated" usually refers to the employment of computers, while "scoring" is more open to interpretation: traditionally, points are awarded to a response by a human grader. In recent years, automated scoring systems have emerged as a central concern of NLP research. Automatic scoring is accomplished by deducing both grammatical and semantic relations between the student's response and the reference answer. Compared with human graders, where an answer may receive a high score from one grader and a poor score from another, this technology is proving reliable in its ratings.
2 Literature Review

Several studies have been proposed for data categorization, text analysis, and automated essay scoring. Some have utilized supervised machine learning, while others have employed unsupervised machine learning; furthermore, some academics have suggested hybrid methods that accomplish text analysis with a classifier. The following summary presents existing research along with methodology and objective. Kouloumpis et al. [4] reviewed mechanisms for Twitter text analysis. Paltoglou et al. [5] performed unsupervised text analysis for social platforms such as Twitter. Saif et al. [6] focused on alleviating data sparsity during the analysis of Twitter text. Ghiassi et al. [7] applied a hybrid neural network for Twitter text analysis. Ortega et al. [8] made use of unsupervised learning for Twitter text. Anjaria et al. [9] considered influence factor-based opinion mining of Twitter content. Musto et al. [10] compared lexicon-based mechanisms for text analysis. Using a combined set of techniques, Khan et al. [11] mined Twitter for user feedback. Martínez-Cámara et al. [12] considered classifier ensembles for Twitter text analysis. Text analysis of tweets was the subject of a survey by Kharde et al. [13]. Azzouza et al. [14] performed real-time analysis of Twitter text. Tripathy et al. [15] classified review texts using supervised
learning. Zainuddin et al. [16] proposed hybrid Twitter aspect-based classification to perform text analysis. To analyze Twitter texts effectively, Fouad et al. [17] used feature selection and a classifier ensemble. Asghar et al. [18] considered a hybrid classification mechanism for Twitter text analysis. Alsaeedi et al. [19] studied text analysis mechanisms for Twitter data. Kumar et al. [21] explained automated essay scoring. Sentiment analysis using deep learning architectures was examined by Yadav et al. [22]. Machine learning-based consumer sentiment analysis for recommending buyers and retailers based on reviews was examined by Yi et al. [23]. In [24], Yuan et al. reflected on recent developments in sentiment analysis using deep learning. Load balance scheduling for intelligent sensor deployment in the industrial Internet of things was the topic of work by Sah et al. [25]. Polynomial computation using unipolar stochastic logic and a correlation approach was introduced by Chu et al. [26]. Chandrasekaran et al. [27] addressed multimodal analysis for social media applications. Ramesh et al. [28] reviewed automated essay scoring systems. A review of automated essay scoring (AES) research and development was undertaken by Lim et al. [29]. Klebanov et al. [30] focused on automated essay scoring.
3 Proposed Work

3.1 Problem Definition

Several studies have been proposed for data classification and text analysis. Some used supervised and some unsupervised machine learning approaches; some considered multi-model sentiment analysis, while others considered hybrid classification. Moreover, some researchers considered polynomial computation and load balance scheduling, and some considered deep learning approaches for automated essay scoring. The existing solutions take a long time, and there is a need to improve accuracy along with the performance of machine learning, as well as to increase scalability by integrating optimization mechanisms. The proposed system is intended to provide a more reliable, flexible, and scalable solution for long Hindi text classification through a hybrid machine learning approach in which an optimization mechanism and NLP-based automated essay scoring work together.
3.2 Proposed Research Methodology

The primary goal of this research is to suggest a new method that might be utilized for the computerized grading of student responses. Semantic analysis is being considered in
research toward this end. The LSTM technique has been studied using a deep learning methodology, and the proposed work follows an experimental research methodology. Existing research related to deep learning and classification has been considered, including studies that supported supervised and unsupervised machine learning for data analysis and models that were multi-model, hybrid, or based on polynomial functions. The problems and issues of existing automated scoring research (scalability, performance, and accuracy) have then been considered. The proposed model takes a novel approach to resolving these issues, making use of optimization, NLP, and classifiers together with deep learning for automated essay scoring. After building the proposed model, the performance and accuracy parameters were compared.
3.3 Process Flow of Work

To perform classification using an LSTM-based machine learning model, the initial answers are extracted from the dataset. Parameters for the LSTM model are initialized; then the data is split for training and testing. The classification operation is performed during testing, and the confusion matrix is calculated to obtain accuracy. Finally, the accuracy of the proposed LSTM approach is compared with that of the conventional model (Fig. 2).
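The gating behavior described in the abstract (the memory cell updated only through the input and forget gates) can be sketched as a minimal single-step LSTM cell in NumPy. This is a generic textbook cell with toy random weights, an assumption for illustration, not the trained model used in the experiments:

```python
# Minimal LSTM cell step in NumPy: i, f, o gates guard the memory cell c.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One time step; W, U, b stack the input, forget, output, and cell gates."""
    z = W @ x + U @ h + b
    n = h.size
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c_new = f * c + i * g            # memory updated only through the gates
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d, n = 4, 3                           # toy input and hidden sizes
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):     # run a short input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)               # (3,) (3,)
```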
4 Result and Discussion

The answers have been collected for classification and used to train the proposed LSTM model. The confusion matrix extracted for the conventional mechanism is shown in Table 1, and its accuracy parameters are shown in Table 2. The confusion matrix extracted for the proposed mechanism is shown in Table 3, and its accuracy parameters are shown in Table 4.
Data set → classify answers for automated answering → data preprocessing → initialization of LSTM parameters → data splitting (70% training, 30% testing) → classification → get confusion matrix → accuracy parameters → comparative analysis
Fig. 2 Process flow of proposed work

Table 1 Confusion matrix for conventional mechanism

         Case A   Case B   Case C   Case D
Case A   336      21       24       19
Case B   12       359      18       11
Case C   13       11       361      15
Case D   10       25       24       341

TP: 1397 and overall accuracy: 87.31%
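The per-class figures of Table 2 follow directly from this confusion matrix. The short check below is illustrative; the row = classified, column = truth orientation is inferred from the reported n(truth) and n(classified) values:

```python
# Recompute Table 2's accuracy, precision, recall, and F1 from Table 1's
# confusion matrix (rows = classified, columns = truth; orientation inferred).
cm = [[336, 21, 24, 19],
      [12, 359, 18, 11],
      [13, 11, 361, 15],
      [10, 25, 24, 341]]
total = sum(map(sum, cm))
for k in range(4):
    tp = cm[k][k]
    classified = sum(cm[k])                   # n (classified): row sum
    truth = sum(row[k] for row in cm)         # n (truth): column sum
    tn = total - classified - truth + tp
    accuracy = (tp + tn) / total
    precision = tp / classified
    recall = tp / truth
    f1 = 2 * precision * recall / (precision + recall)
    print(k + 1, round(accuracy * 100, 2), round(precision, 2),
          round(recall, 2), round(f1, 2))
# class 1 -> 93.81 0.84 0.91 0.87, as in Table 2
```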
Table 2 Accuracy parameter for conventional mechanism

Class   n (truth)   n (classified)   Accuracy (%)   Precision   Recall   F1 score
1       371         400              93.81          0.84        0.91     0.87
2       416         400              93.88          0.90        0.86     0.88
3       427         400              93.44          0.90        0.85     0.87
4       386         400              93.5           0.85        0.88     0.87
Table 3 Confusion matrix for the proposed mechanism

         Case A   Case B   Case C   Case D
Case A   368      11       12       9
Case B   6        379      9        6
Case C   7        6        378      9
Case D   7        19       15       359

TP: 1484 and overall accuracy: 92.75%
Table 4 Accuracy parameter for the proposed mechanism

Class   n (truth)   n (classified)   Accuracy (%)   Precision   Recall   F1 score
1       388         400              96.75          0.92        0.95     0.93
2       415         400              96.44          0.95        0.91     0.93
3       414         400              96.38          0.94        0.91     0.93
4       383         400              95.94          0.90        0.94     0.92
4.1 Comparison Analysis

A comparative analysis of all the parameters calculated for the conventional and the proposed work follows.

Accuracy

Table 5 displays the accuracy of the conventional and proposed work for each of the four classes (1, 2, 3, and 4).

Table 5 Comparative analysis of accuracy

Class   Conventional work (%)   Proposed work (%)
1       93.81                   96.75
2       93.88                   96.44
3       93.44                   96.38
4       93.5                    95.94
Fig. 3 Comparative analysis of accuracy
Taking into account the data in Table 5, Fig. 3 shows the accuracy of the conventional work in comparison with the proposed work.

Precision

Table 6 displays the precision of the conventional and proposed work for each of the four classes (1, 2, 3, and 4). Taking into account the data in Table 6, Fig. 4 shows the precision of the conventional work in comparison with the proposed work.

Recall Value

Table 7 displays the recall value of the conventional and proposed work for each of the four classes (1, 2, 3, and 4). Taking into account the data in Table 7, Fig. 5 shows the recall value of the conventional work in comparison with the proposed work.

F1-Score

Table 8 displays the F1-score of the conventional and proposed work for each of the four classes (1, 2, 3, and 4).
Table 6 Comparative analysis of precision

Class   Conventional work   Proposed work
1       0.84                0.92
2       0.90                0.95
3       0.90                0.94
4       0.85                0.90

Fig. 4 Comparison of precision
Table 7 Comparison of recall value

Class   Conventional work   Proposed work
1       0.91                0.95
2       0.86                0.91
3       0.85                0.91
4       0.88                0.94

Fig. 5 Comparative analysis of recall value
Table 8 Comparative analysis of F1-score

Class   Conventional work   Proposed work
1       0.87                0.93
2       0.88                0.93
3       0.87                0.93
4       0.87                0.92

Taking into account the data in Table 8, Fig. 6 shows the F1-score of the conventional work in comparison with the proposed work.

Fig. 6 Comparison of F1-score

A comparison of the features and outcomes of the proposed and conventional approaches is made in Table 9, considering performance, accuracy parameters, the optimization mechanism, and the classification approach.
Table 9 Comparison of conventional and proposed work

Features                  Conventional [24]     Proposed
Performance               Comparatively low     Comparatively high
Hybrid classification     Not applicable        Applicable
Multi-model               Yes                   Yes
Optimization mechanism    Not applicable        PSO
Model used                ANN                   LSTM
Accuracy                  Less than 94%         Above 95%
Precision                 Less than 90%         Above 90%
Recall value              Less than 90%         Above 90%
F1-score                  Less than 90%         Above 90%
5 Conclusion

A novel approach for automated answer scoring using semantic analysis has been proposed. As a result, accuracy for the filtered dataset is 95–97% versus 93–94% for the unfiltered dataset; precision is 0.90–0.95 versus 0.84–0.90; recall is 0.91–0.95 versus 0.85–0.91; and F1-score is 0.92–0.93 versus 0.87–0.88. The simulation results conclude that accuracy is above 95%, and the precision and recall values of the proposed LSTM model are better than those of the conventional system.
6 Future Scope

Such research would play a significant role in semantic analysis. Factors such as the number of epochs, the number of iterations, and the hidden layers play significant roles in decision-making during classification. This type of work could be enhanced using advanced optimization techniques.
References 1. Read J (2005) Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL student research workshop. Association for Computational Linguistics, pp 43–48 2. Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. In: CS224N project report, Stanford, vol 1, no 2009, p 12
3. Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: LREC, 4, 2010, vol 10, no 2010 4. Kouloumpis E, Wilson T, Moore JD (2011) Twitter sentiment analysis: the good the bad and the omg! ICWSM 11(538–541):164 5. Paltoglou G, Thelwall M (2012) Twitter, MySpace, Digg: unsupervised sentiment analysis in social media. ACM Trans Intell Syst Technol (TIST) 3(4):66 6. Saif H, He Y, Alani H (2012) Alleviating data sparsity for Twitter sentiment analysis. In: 2012: CEUR workshop proceedings (CEUR-WS.org) 7. Ghiassi M, Skinner J, Zimbra D (2013) Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network. Expert Syst Appl 40(16):6266–6282 8. Ortega R, Fonseca A, Montoyo A (2013) SSA-UO: unsupervised Twitter sentiment analysis. In: Second joint conference on lexical and computational semantics (* SEM), vol 2, pp 501–507 9. Anjaria M, Guddeti RMR (2014) Influence factor based opinion mining of Twitter data using supervised learning. In: 2014 Sixth international conference on communication systems and networks (COMSNETS), pp 1–8 10. Musto C, Semeraro G, Polignano M (2014) A comparison of lexicon-based approaches for sentiment analysis of microblog posts. Inf Filtering Retrieval 59 11. Khan FH, Bashir S, Qamar U (2014) TOM: Twitter opinion mining framework using hybrid classification scheme. Decis Support Syst 57:245–257 12. Martınez-Cámara E, Gutiérrez-Vázquez Y, Fernández J, Montejo Ráez A, Munoz-Guillena R (2015) Ensemble classifier for Twitter sentiment analysis 13. Kharde V, Sonawane P (2016) Sentiment analysis of Twitter data: a survey of techniques. arXiv preprint arXiv:1601.06971 14. Azzouza N, Akli-Astouati K, Oussalah A, Bachir BA (2017) A real-time Twitter sentiment analysis using an unsupervised method. In: Proceedings of the 7th international conference on web intelligence, mining and semantics. ACM, p 15 15. 
Tripathy A, Rath SK (2017) Classification of the sentiment of reviews using supervised machine learning techniques. Int J Rough Sets Data Anal (IJRSDA) 4(1):56–74 16. Zainuddin N, Selamat A, Ibrahim R (2017) Hybrid sentiment classification on Twitter aspectbased sentiment analysis. Appl Intell 1–15 17. Fouad MM, Gharib TF, Mashat AS (2018) Efficient Twitter sentiment analysis system with feature selection and classifier ensemble. In: International conference on advanced machine learning technologies and applications. Springer, Berlin, pp 516–527 18. Asghar MZ, Kundi FM, Ahmad S, Khan A, Khan F (2018) T-SAF: Twitter sentiment analysis framework using a hybrid classification scheme. Expert Syst 35(1) 19. Alsaeedi A (2019) A study on sentiment analysis techniques of Twitter data. (IJACSA) Int J Adv Comput Sci Appl 10(2) 20. Drees L, Kusche J, Roscher R (2020) Multi-modal deep learning with sentinel-3 observations for the detection of oceanic internal waves. ISPRS Ann Photogram Remote Sens Spatial Inf Sci 813–820, V-2-2020 21. Kumar V, Boulanger D (2020) Explainable automated essay scoring: deep learning really has pedagogical value. Front Educ 22. Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a review. Artif Intell Rev 53(6):4335–4385. https://doi.org/10.1007/s10462-019-09794-5 23. Yi S, Liu X (2020) Machine learning based customer sentiment analysis for recommending shoppers, shops based on customers’ review. Complex Intell Syst 6(3):621–634. https://doi. org/10.1007/s40747-020-00155-2 24. Yuan JH, Wu Y, Lu X, Zhao YY, Qin B, Liu T (2020) Recent advances in deep learning based sentiment analysis. Sci China Technol Sci 63(10):1947–1970. https://doi.org/10.1007/s11431020-1634-3 25. Sah DK, Nguyen TN, Cengiz K, Dumba B, Kumar V (2021) Load-balance scheduling for intelligent sensors deployment in industrial internet of things. Cluster Comput 1–13 26. 
Chu SI, Wu CL, Nguyen TN, Liu BH (2021) Polynomial computation using unipolar stochastic logic and correlation technique. IEEE Trans Comput
Investigating the Role of Semantic Analysis in Automated Answer Scoring
571
27. Chandrasekaran G, Nguyen TN, Hemanth DJ (2021) Multimodal sentimental analysis for social media applications: a comprehensive review. In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, p e1415 28. Ramesh D, Sanampudi SK (2021) An automated essay scoring systems: a systematic literature review. Artif Intell Rev. https://doi.org/10.1007/s10462-021-10068-2 29. Lim CT, Bong CH, Wong WS, Lee NK (2021) A comprehensive review of automated essay scoring (AES) research and development. Pertanika J Sci Technol 29(3):1875–1899 30. Klebanov BB, Madnani N (2021) Automated essay scoring. In: Synthesis lectures on human language technologies, Nov 2021, 314 p. https://doi.org/10.2200/S01121ED1V01Y202108HL T052 31. Ludwig S, Mayer C, Hansen C, Eilers K, Brandt S (2021) Automated essay scoring using transformer models. Psych 3:897–915. https://doi.org/10.3390/psych3040056 32. Lu C, Cutumisu M (2021) Integrating deep learning into an automated feedback generation system for automated essay scoring. In: Conference: educational data mining 2021 at online from Paris, France 33. Filho AH, Concatto F, do Prado HA, Ferneda E (2021) Comparing feature engineering and deep learning methods for automated essay scoring of Brazilian National High School Examination. In: Proceedings of the 23rd international conference on enterprise information systems (ICEIS 2021)
Siamese-Discriminative Person Re-identification Raghav Sharma and Rohit Pandey
Abstract Person re-identification is a well-known deep learning technique for identifying a person across images captured from different non-overlapping camera views. For re-identification, a technique must be able to differentiate and match photographs of individuals regardless of image variations. Significant changes in viewpoint, illumination, resolution, occlusion, color, pose, and camera angle contribute to these variations. Recently, many algorithms have been proposed for the person re-identification task that can handle these variations. However, these algorithms do not perform well when the training and testing data differ. This issue is addressed in this work, which develops a discriminative network for re-identification tasks. The proposed network is a combination of a classifier-based feature extractor and a siamese network. It produces a score between −1 and 1, depending on similarity or dissimilarity, which is used to find the similarity between the probe and gallery images. It was trained on the Market-1501 dataset and tested on the MOT17, Duke-MTMC, and MSMT17 datasets. The proposed network has shown a 28.62% gain in overall accuracy on Duke-MTMC and an increase of 200% on MSMT17. It attained 80.7% accuracy on MOT17 despite not having been specifically trained on that data, which shows that the network can work well even when trained on a different set of data. Keywords Person re-identification · Siamese · Discriminator
R. Sharma (B) · R. Pandey Hughes Systique Corporation, Gurugram, India e-mail: [email protected] R. Pandey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_43
1 Introduction Identifying a person across multiple non-overlapping cameras is known as person re-identification. The cameras may be spread throughout a building, or image/video data could be used. Re-identification has various uses, including finding someone, looking for criminals, monitoring potential issues, evaluating personal safety, and many more. There are numerous challenges, such as a variable number of classes, intra-class variation, illumination, very low-resolution images, occlusion, uniform dressing, and different scales. To handle such challenges, many algorithms have been proposed, like Auto-ReID [4], DG-Net [15], st-ReID [7], Parameter-Free Spatial Attention [8], CTL Model [11], RGT&RGPR [1], and LDS [13]. In most practical scenarios, networks are trained on one dataset and expected to perform on a different one; however, most existing algorithms fail in this case. To overcome this problem, a siamese discriminative model has been proposed which can work efficiently on different datasets without requiring fine-tuning on new datasets.
1.1 Motivation Nowadays, we rely heavily on multiple cameras and machines for security. Multiple cameras produce multiple images of a person from different angles, and a person can look very different across angles or poses. To track a person, the algorithm should identify them regardless of angle or pose, as shown in Fig. 1. This has multiple applications, such as tracking an individual or suspect, keeping track of the number of people entering and exiting a facility, pinpointing a person's location in real time from CCTV footage, and monitoring retail sales. These applications inspired this work.
1.2 Challenges As discussed, the algorithm must match or distinguish between two images to perform re-identification. To accomplish this, it must handle the issues discussed in this section, including intra-class and inter-class variation, occlusion, lighting, scale, and uniform dressing. Varying Number of Classes: The number of identities (classes) varies continuously in the real world. For instance, it is impossible to predict how many individuals will attend an event. The algorithm must be capable of managing a variable number of classes. Intra-class Variation: People look very different based on their pose, camera angle, outdoor or indoor conditions, and background, as shown in Fig. 2. Even though all the images in the figure belong to the same class/person, it is hard to identify them as
Fig. 1 The query and the result in different camera views; images taken from the V47 dataset [9]
Fig. 2 Sample images of Market-1501 dataset [14] show the intra-class and scale variation
being in the same class due to their pose and appearance. Re-identification should be able to handle this issue. Low-Resolution Images: Person re-identification generally applies to CCTV cameras, which produce low-resolution images. It is challenging to process low-resolution images with high accuracy. Uniform Dressing: Sometimes, people look the same even if they belong to different identities (classes) because of identical dress, as shown in Fig. 3. All the images belong to different identities (classes); however, they seem to belong to the same identity because of the similar dress. Variation in Various Datasets: Many datasets have been proposed for the person re-identification task, as discussed in Sect. 3. These datasets are very different, as shown in Table 1, so a network trained on one dataset may not perform well on another. This work attempts to address this challenge along with the others discussed in this section.
Fig. 3 Sample images of Market-1501 dataset [14] show uniform dressing
1.3 Paper Organization The challenges addressed in this work are discussed in Sect. 1.2, and the numerous algorithms proposed to address them are discussed in Sect. 2. The overall proposed idea, architecture, and training and testing details are discussed in Sect. 4. Section 3 describes the datasets used to train and test the proposed algorithm. Finally, Sect. 5 contains the results of this study, which demonstrate that the proposed network performs well even when testing and training data come from different sources.
2 Literature Survey This section contains an overview of the deep neural network models proposed for the person re-identification problem. Many proposed models improve re-identification performance either by developing a new approach or by modifying an existing algorithm. These approaches can be classified into two categories: classification-based and siamese-based.
2.1 Model Based on Classification A classification-based model classifies the input image into a particular class. This approach needs a large amount of data, and the model will easily overfit if the training data has few instances per class. Wu et al. [12] use handcrafted features and a classification model to perform re-identification. Su et al. [6] proposed a deep learning model that extracts features to perform re-id in two subparts: the first extracts the pedestrian's global features, and the second extracts local features of the pedestrian's body. Finally, it combines both features and uses a classification model to perform re-id. Mscan [2] used a robust multi-scale context-aware network to extract features. The base model used here (DG-Net) [15] also belongs to this category; it employs a GAN network to learn the pedestrian's appearance and structure.
2.2 Model Based on Siamese The siamese network is widely used in re-id problems. It overcomes the classification-based model's need for many instances per class. A siamese network-based model produces a score for an input image pair, which tells us how similar or dissimilar the pair is. The proposed model falls into this category.
3 Dataset Many datasets have been proposed for the re-identification problem, such as Duke-MTMC [5], MSMT17 [10], Market-1501 [14], and MOT17 [3]. In this work, the proposed network was trained on Market-1501 and tested on the Duke-MTMC, MSMT17, and MOT17 datasets. All these datasets are discussed and compared in this section. Market-1501 The Market-1501 dataset was gathered by six outdoor cameras near a local supermarket. It has 32K bounding boxes for 1501 identities. Every person in this dataset was captured by at least two cameras, most of them by six, and annotated using the deformable part model (DPM). Duke-MTMC The Duke-MTMC dataset was captured by eight still cameras at Duke University. More than two million individual frames were gathered, with up to fifty-four people depicted in a frame. It was recorded as 1080p video at 60 fps and contains approximately 2834 identities (persons). MSMT17 This dataset was collected by 15 cameras, twelve outdoor and three indoor. Data collection took one month, covering 4 different weather conditions and 180 video hours in total. Bounding boxes were annotated using Faster RCNN; the dataset includes 126K boxes of 4101 identities. MOT17 MOT17 contains multiple object types, including pedestrians as well as vehicles, bicycles, and motorbikes; for person re-identification, only the pedestrian class has been utilized. This dataset was built from video footage of various public places. In several places and at different frame rates, 1389 videos were shot collectively. There are 33,705 frames with around 900,000 bounding boxes and 4K tracks. Data were gathered in different settings, including indoor and outdoor scenes, static and moving cameras, cloudy weather, and low, medium, and high viewpoints. Table 1 shows a comparison between the datasets in terms of scenes, detectors, bounding boxes, and identities.
Table 1 Dataset comparison

Dataset     | BBoxes | Identities | Cameras | Detector    | Scene
Market-1501 | 32K    | 1.5K       | 6       | DPM         | Outdoor
Duke-MTMC   | 36K    | 1.8K       | 8       | Hand        | Outdoor
MSMT17      | 126K   | 4.1K       | 15      | Faster RCNN | Indoor, outdoor
MOT17       | 900K   | 3.9K       | –       | DPM, RCNN   | Indoor, outdoor
Fig. 4 Overall idea of the network
4 Proposed Methodology This work aims to develop a network that can perform re-identification even when the training and testing data distributions differ; the network should therefore be minimally impacted by data distribution. Image comparison is only minimally affected by data distribution, so using a siamese network to compare two images for re-identification can improve cross-data accuracy (trained on one dataset and tested on another). This work includes two modules: the base model and the discriminator. The base model is a classification-based model used to generate image features. The output of the base model is fed into the discriminator, which produces an output ranging from −1 to 1 based on the similarity of the input features. The discriminator is used to convert the base model into a siamese network, as shown in Fig. 4. In this work, the network was initially trained as a generator and then converted to a siamese network to take advantage of the siamese model for re-identification. This has dramatically enhanced cross-data efficiency, as shown in Table 3. The base model and discriminator are discussed in this section.
4.1 Base Model The DG-NET-inspired [15] base model consists of two modules, a generator and a re-id learner, for image generation and re-id learning, respectively. The generator module is a GAN-type network that employs an appearance encoder and a structure encoder to generate images. The appearance encoder records a person's physical appearance, while the structure encoder records the individual's pose and surroundings. The re-id learner module then utilizes the appearance encoder to learn re-identification. Proposed Changes in Base Model In this work, ResNet50 was replaced with ResNet101 in DG-NET. The DG-NET model is a classification model that performs well on the re-identification task; however, it is of little utility if the training and testing data come from different sources. This issue was resolved by adding a discriminator to the DG-NET model to transform it into a siamese model. The discriminator used for this conversion is detailed in the next section.
4.2 Discriminator The objective is to determine the similarity between two images using the base model and a discriminator. The discriminator is a model that matches or distinguishes between two images based on the features generated by the base model. For instance, two images I1 and I2 are input to the base model, which generates D(I1) and D(I2), respectively. D(I1) and D(I2) are then fed to the discriminator, whose goal is to predict Y ∈ {−1, +1}: Y is expected to be −1 when I1 and I2 are of different classes, and 1 when I1 and I2 are of the same class. To reach this objective, a multi-layer perceptron network has been trained:

Y = tanh(W1ᵀ × R(Wᵀ × C(D(I1), D(I2)) + b) + b1)    (1)
Equation 1 is the mathematical representation of the discriminator, where b and b1 are bias terms, Wᵀ and W1ᵀ are the weights of the fully connected layers, R(·) denotes the ReLU activation function, C(·) denotes the concatenation function, and Y denotes the output score. To push Y toward {−1, +1}, the proposed network was trained on the Market-1501 dataset. Network and Training of the Discriminator The discriminator is a feed-forward network containing two fully connected layers; the network architecture is shown in Table 2. The images to be compared first pass through the base model, and the features obtained from the base model are fed to the discriminator. Based on the features, it produces a score between −1 and 1.
Table 2 Discriminator network

Type of layer | Activation | Input | Output | Details
Flatten       | –          | –     | 1024   | –
Merge         | –          | 1024  | 2048   | –
Dropout       | –          | 2048  | 2048   | 0.3
Dense         | Relu       | 2048  | 512    | –
Dropout       | –          | 512   | 512    | 0.3
Dense         | Tanh       | 512   | 1      | –
To train this network, images are fed in pairs. If both images belong to the same class, the ground truth is 1; if they belong to different classes, the ground truth is −1. The triplet loss function has been used to train the discriminator.
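The inference path of Eq. (1) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the 1024-dimensional base-model features and the randomly initialized weights are stand-ins for the learned parameters, and the dropout layers of Table 2 are omitted since they are inactive at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed base-model feature size (Table 2 lists a 1024-dim input per image)
FEAT = 1024

# Random stand-ins for the learned W, b, W1, b1 of Eq. (1)
W, b = rng.normal(scale=0.02, size=(2 * FEAT, 512)), np.zeros(512)
W1, b1 = rng.normal(scale=0.02, size=(512, 1)), np.zeros(1)

def discriminator(d1, d2):
    """Score two base-model feature vectors: tanh(W1' relu(W' [d1;d2] + b) + b1)."""
    merged = np.concatenate([d1, d2])          # Merge layer: 1024 + 1024 -> 2048
    hidden = np.maximum(0.0, merged @ W + b)   # Dense 2048 -> 512 with ReLU
    return np.tanh(hidden @ W1 + b1)[0]        # Dense 512 -> 1 with tanh, in [-1, 1]

score = discriminator(rng.normal(size=FEAT), rng.normal(size=FEAT))
assert -1.0 <= score <= 1.0
```

During training, feature pairs with ground truth ±1 would be scored this way and the weights updated, as described above.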
5 Result and Discussion This work utilized the features produced by the base model to determine the similarity or dissimilarity between two images. The proposed network produces a score between −1 and 1 (1 for the same identity and −1 for different identities). This score has been used to determine the Rank-1 accuracy of the network. This work focuses on improving the network's cross-data accuracy (trained on one dataset and tested on another). The network was trained on the Market-1501 dataset and tested on different datasets to validate the proposed model. Table 3 shows the comparison of the proposed network with DG-NET. It demonstrates that the proposed network improved overall accuracy by 28.62% on Duke-MTMC and by 200% on MSMT17. The proposed network is also used to track people via person re-identification. The MOT17 [3] dataset was used to evaluate person tracking; the multi-object tracking accuracy (MOTA) of the proposed network was 80.7%. Table 4 shows the result of the proposed method on the MOT17 dataset. For testing purposes, only 15% of the total testing dataset was used. These outcomes indicate that the proposed model performs well with cross-data.
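As an illustration of how a pairwise similarity score can be turned into Rank-1 accuracy (a hypothetical helper, not the authors' evaluation code): for each probe, the highest-scoring gallery image must carry the probe's identity.

```python
import numpy as np

def rank1_accuracy(scores, probe_ids, gallery_ids):
    """scores[i, j]: similarity in [-1, 1] between probe i and gallery image j."""
    best = np.argmax(scores, axis=1)  # highest-scoring gallery match per probe
    hits = np.array(gallery_ids)[best] == np.array(probe_ids)
    return hits.mean()

# Toy example with made-up scores: probe 0 (id 7) matches, probe 1 (id 5) misses
scores = np.array([[0.9, -0.2],
                   [0.1,  0.4]])
print(rank1_accuracy(scores, probe_ids=[7, 5], gallery_ids=[7, 3]))  # prints 0.5
```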
Table 3 Performance of DG-NET and proposed model on the cross dataset

Method | Market-1501 (%) | Duke-MTMC (%) | MSMT17 (%) | MOT17 (%)
DG-Net | 94.8            | 42.62         | 17.11      | –
Ours   | 93.9            | 56.8          | 34.52      | 80.7

Trained on Market-1501 and tested on a different dataset
Table 4 Result of the proposed method on MOT17 dataset

No. of frames | MOTA | No. of matches | No. of objects
599           | 80.7 | 7820           | 9678
FP            | FN   | IDSW           | MT
13            | 0    | 1858           | 63
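As a cross-check, the MOTA value in Table 4 is consistent with the standard definition MOTA = 1 − (FN + FP + IDSW) / GT, under the assumption that "No. of objects" is the ground-truth object count:

```python
# Counts from Table 4 (assuming "No. of objects" is the ground-truth count GT)
fp, fn, idsw, gt = 13, 0, 1858, 9678
mota = 1 - (fn + fp + idsw) / gt   # standard MOTA definition
print(f"MOTA = {mota:.1%}")        # prints MOTA = 80.7%
```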
6 Conclusion The proposed network's outcome has been discussed in Sect. 5. The network was evaluated based on its Rank-1 accuracy, which improved significantly with the proposed changes. The network was trained using the Market-1501 dataset and evaluated using the Duke-MTMC, MSMT17, and MOT17 datasets. It achieves an accuracy of 80.7% on the MOT17 dataset despite not being trained on it. The results indicate that the proposed model can perform well on cross-datasets (trained on one dataset and tested on another).
References 1. Gong Y, Zeng Z (2021) An effective data augmentation for person re-identification. CoRR abs/2101.08533. https://arxiv.org/abs/2101.08533 2. Li D, Chen X, Zhang Z, Huang K (2017) Learning deep context-aware features over body and latent parts for person re-identification. CoRR abs/1710.06555. http://arxiv.org/abs/1710. 06555 3. Milan A, Leal-Taixé L, Reid ID, Roth S, Schindler K (2016) MOT16: a benchmark for multiobject tracking. CoRR abs/1603.00831. http://arxiv.org/abs/1603.00831 4. Quan R, Dong X, Wu Y, Zhu L, Yang Y (2019) Auto-reid: searching for a part-aware convnet for person re-identification. CoRR abs/1903.09776. http://arxiv.org/abs/1903.09776 5. Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: European conference on computer vision workshop on benchmarking multi-target tracking 6. Su C, Li J, Zhang S, Xing J, Gao W, Tian Q (2017) Pose-driven deep convolutional model for person re-identification. CoRR abs/1709.08325. http://arxiv.org/abs/1709.08325 7. Wang G, Lai J, Huang P, Xie X (2018) Spatial-temporal person re-identification. CoRR abs/1812.03282. http://arxiv.org/abs/1812.03282 8. Wang H, Fan Y, Wang Z, Jiao L, Schiele B (2018) Parameter-free spatial attention network for person re-identification. CoRR abs/1811.12150. http://arxiv.org/abs/1811.12150 9. Wang S, Lewandowski M, Annesley J, Orwell J (2011) Re-identification of pedestrians with variable occlusion and scale. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops). IEEE, pp 1876–1882 10. Wei L, Zhang S, Gao W, Tian Q (2018) Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 79–88 11. Wieczorek M, Rychalska B, Dabrowski J (2021) On the unreasonable effectiveness of centroids in image retrieval. CoRR abs/2104.13643. https://arxiv.org/abs/2104.13643
12. Wu S, Chen Y, Li X, Wu A, You J, Zheng W (2016) An enhanced deep feature representation for person re-identification. CoRR abs/1604.07807. http://arxiv.org/abs/1604.07807 13. Zang X, Li G, Gao W, Shu X (2021) Learning to disentangle scenes for person re-identification. CoRR abs/2111.05476. https://arxiv.org/abs/2111.05476 14. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision, pp 1116–1124 15. Zheng Z, Yang X, Yu Z, Zheng L, Yang Y, Kautz J (2019) Joint discriminative and generative learning for person re-identification. CoRR abs/1904.07223. http://arxiv.org/abs/1904.07223
Transformers-Based Automated PHP Code Generator Yatin Tomer, Raghav Sharma, and Rohit Pandey
Abstract Over the past ten years, technological development has accelerated, attaining human-like performance in various natural language processing tasks. Numerous studies are being undertaken in this area, and natural language programming tools have been developed that take a natural language description and generate source code. With natural language programming, communicating with machines is possible without grasping the syntax of each programming language individually. Several tools have been developed with features such as code completion, generation of brief code samples, and code suggestions. This paper presents a method capable of generating source code from a natural language description. In this work, a transformer-based language model is employed and trained on a PHP dataset collected from multiple platforms; the model can thus generate PHP code from natural language. PHP is a common server-side scripting language, used by 77.4% of all websites according to a W3Techs survey. The model has been tested on various problems, and the results are rather encouraging: it achieves 85% accuracy when tested on 40 sample problems. Keywords Auto code generator · PHP code generator · Transformer · Encoder–decoder-based architecture · Code generation
Y. Tomer (B) · R. Sharma · R. Pandey Hughes Systique Corporation, Gurugram, India e-mail: [email protected] R. Sharma e-mail: [email protected] R. Pandey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_44
1 Introduction Automation is growing in popularity in today's world. It is a current hot topic with a great deal of ongoing research. With the constant advancement of technology, it is expected that software and machines will be employed more and more to automate routine chores previously completed by humans. In its simplest form, automation is the development and use of technology to create and deliver goods and services with little to no human involvement [12]. Automation technologies, techniques, and procedures increase the efficiency, dependability, and speed of numerous jobs that humans previously carried out. The sophisticated algorithms that power self-driving cars are one of many examples of automation encountered daily, along with automatic telephone switchboards and electronic navigation systems. Automating even a small portion of software development is a current research topic. Automating even modest operations can save software programmers time [7], which conserves resources across numerous sectors. Code generation will become more crucial as technology advances and skilled software developers remain in short supply. The most notable application of language modeling among the various fields is in programming languages. The machine learning community has been conducting research in this field to find [1] new methods to auto-complete, generate, correct, or analyze human-written code. Many researchers have studied code generation from natural language [2]: such systems create appropriate code snippets or completely executable source code from natural language descriptions. Automatic code generation can take many forms, such as creating short code samples, auto-filling lines of code, creating unit test cases from source code, and more. In this article, a deep learning model is proposed to automate code generation, producing source code from a given natural language description.
To perform sequence-to-sequence learning on the dataset in this project, a transformer-based model is utilized that can translate natural language descriptions into PHP source code.
1.1 Problem Statement Creating source code from a natural language description requires concentrating on two key elements. The first is the model, and the second is the training data. When working with deep learning models, it is crucial to feed high-quality data into the model to receive high-quality results, and the model also needs a large amount of data to produce good results. To ensure this, several platforms were searched and reliable, high-quality data was gathered. Choosing a suitable model for this task is equally critical. Because this problem closely resembles sequence-to-sequence generation, an encoder–decoder architecture is employed, which is ideally equipped to handle problems of this type. There are numerous other models based on encoder–decoder architecture, such as LSTM and CNN. A transformer-based paradigm
has been utilized to address the positional and sequence issues. It is the model that works best for these kinds of problems since it eliminates the long-term dependency issues that other models have.
1.2 Motivation In recent years, automation has become a rapidly expanding field. With the advancement of technology over the past few years [8], there has been growth in the production of goods that bring automation into daily life. Automation is used in many areas and can typically be observed in professions like analytics, facility management, customer service, etc. Recently, automation in household products has become popular: with smartphone-based control of fans, TVs, ACs, and door locks, customers can now schedule and manage everyday activities. Automation extends beyond essential appliances to advanced technologies like self-driving cars, social robots, and many other similar items. Additionally, numerous studies are being conducted in the field of code generation by machines. Applications that can create code benefit programmers in a variety of ways. First, they can finish tasks more quickly, which boosts productivity. Second, they improve programmers' programming skills. This paper therefore presents an automatic code generator that can produce PHP source code from a description in natural language. PHP is a widely used server-side scripting language in web development. PHP is becoming more and more popular, and well-known platforms like WordPress and Drupal use it. It is a popular programming language that can be used to create GUI-based apps and server-side scripts. Web developers can benefit greatly from a tool that can generate PHP scripts in various situations, including creating static and dynamic web pages and web-based apps, creating desktop graphical user interfaces, managing website content, and many more. It will reduce the amount of time needed to complete tedious coding chores and will assist developers by generating the code for fundamental features used in websites.
This kind of technology eliminates the need for people to carry out tedious, repetitive operations and can also be used to carry out sophisticated calculations and algorithms.
1.3 Paper Organization The problem statement of this work is described in Sect. 1.1, and Sect. 1.2 describes the motivation behind it. Related work in the field of code generation is discussed in Sect. 2. The approach used and the model architecture, including the encoder, decoder, self-attention layers, and loss function, are described in Sect. 3. Section 4 describes the dataset used, the types of data used for training, and the data augmentation. Section 5 contains the results, which demonstrate testing on different datasets. The last section of this paper, Sect. 6, presents the conclusion of this study.
2 Literature Auto-code generation is the completion of a text sequence based on prior inputs. The field of code generation has seen a lot of activity. In an IDE, neural synthesis technologies like GitHub Copilot [6], Kite, and TabNine [4] offer code snippets with the express goal of boosting a user's productivity. TabNine and Kite are also available on the market for generating code. Both tools use the GPT-2 model, an open-source model for generating the next sequence of text based on an input text. GPT-2 is a 1.5-billion-parameter transformer-based language model [3], trained on a dataset of 8 million web pages. This large transformer model was built for tasks like summarization, neural machine translation, etc. For preprocessing, the text is tokenized using a byte-level version of Byte Pair Encoding (BPE) with a vocabulary size of 50,257 [8]. It performs well when producing small code snippets and auto-completing lines of code, but it cannot handle sophisticated and large language constructs [11]. A system put forth by Chong and Pucella [10] produces action-based application interfaces from descriptions in plain language; their primary goal is to convert ordinary English into higher-level logical statements using type-logical grammar. Knoll and Mezini proposed Pegasus, a programming language that can read structured line-by-line natural language in English, German, and Arabic and generate the matching Java program; its core capabilities are reading natural language, producing source code, and expressing natural language. A compiler designed by Somasundaram and Swaminathan [10] parses a program's natural language description to produce an intermediate representation that can be easily translated into the target language. Lin et al. proposed a system that converts a natural language description into the matching shell command, using recurrent neural networks (RNNs) and semantic parsing for the translation.
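As background for the BPE tokenization mentioned above, the merge-learning step can be illustrated with a toy character-level sketch. This is a simplified illustration only: GPT-2's actual tokenizer operates on bytes and learns a 50,257-token vocabulary, and the word counts below are made up.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Greedily merge the most frequent adjacent symbol pair, as in classic BPE."""
    vocab = Counter({tuple(w): c for w, c in words.items()})
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():       # count adjacent pairs, weighted
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)           # most frequent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, count in vocab.items():       # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += count
        vocab = new_vocab
    return merges

print(learn_bpe_merges({"low": 5, "lower": 2, "lowest": 3}, 2))
```

Frequent subwords ("lo", then "low") become single tokens, which is what lets a fixed-size vocabulary cover arbitrary text and code.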
A voice programming interface that receives spelled words (speech), natural language, or paraphrase text and generates the associated Java program was proposed by Begel and Graham. The GPyT (GPT-based Python Code Model) hosted by HuggingFace [9] is a GPT model trained from scratch on Python code; the current GPyT model has been trained for only two epochs on 80 GB of Python code. Additionally, the performance of this model is illustrated with the help of various examples. AlphaCode from DeepMind [5] is a competitive code generation system that tackles programming-competition tasks. It employs a vast transformer language model to produce code, which has been pre-trained on GitHub code and fine-tuned on competitive programming tasks. In programming competitions with over 5000 participants on the Codeforces platform, it achieved an average ranking in the top 54.3% and produced positive outcomes.
Transformers-Based Automated PHP Code Generator
587
3 Approach An accurate, dependable training dataset and model are essential prerequisites for autonomous code generation. The PHP code generator is intended to produce PHP source code from a sequence of natural language input. The primary process involves: • First and most important, the dataset is the factor on which the performance of the model depends. In this work, the main aim is to include all different categories of problems in the dataset. • Train the transformer-based language model on the PHP dataset. • Test the trained model on several problems to assess its performance.
3.1 Model Architecture Encoder–decoder architecture is typically the best option when working with a sequence-to-sequence language model. The encoder maps an input sequence of symbol representations (a1, a2, a3, ..., an) to a sequence of continuous representations, which the decoder then uses to construct an output sequence of symbols (b1, b2, b3, ..., bn) one element at a time. The transformer encodes each position and applies the attention mechanism [13] to relate distant words of both inputs and outputs to one another. It uses stacked self-attention and position-wise, fully connected layers in both the encoder and decoder. In this architecture, the input passed to the encoder is a sequence of tokenized characters, and the output is a sequence of PHP code. Python's default tokenizer was used to tokenize the code, while the Spacy tokenizer was utilized to tokenize the natural language. Encoder The encoder is composed of N = 6 identical layers, with each layer consisting of a multi-head attention sub-layer and a fully connected position-wise feed-forward sub-layer (as shown in Fig. 1). The first sub-layer, multi-head attention, linearly projects the queries, keys, and values (Q, K, V) n times with different learned projections and performs the attention function in parallel to produce the output. The second sub-layer, the feed-forward layer, is applied to each position individually and identically. It includes two linear transformations with a ReLU activation in between (1):

FFN(x) = max(0, x W1 + b1) W2 + b2,   (1)

where W1 and W2 are the weights the two layers employ. Combining the input embedding with a positional encoding for each word in the input sequence constitutes the initial input. Since this model does not employ recurrent networks, positional indices must be assigned to each token to maintain the sequential information.
Initial encoder-layer tensors are generated by combining the source-sequence embeddings with the embeddings of the position-index tensor. The encoder layers then
Fig. 1 Encoder architecture
process these tensors. The src tensor then undergoes a multi-head self-attention procedure, which assists the model in focusing on the essential characteristics of the src tensor. Attention The attention mechanism of the transformer can be regarded as mapping a query and a set of key-value pairs to an output, with the query, keys, values, and outputs all being vectors. The transformer employs Scaled Dot-Product Attention (2), where the outputs are weighted sums of the values, and the weight applied to each value is derived from the dot product of the query with the corresponding key:

Attention(Q, K, V) = softmax(QK^T / √d_k) V,   (2)
where Q, K, V are the query, key, and value vectors, and d_k is the key dimension. Scaled dot-product attention reduces to highly optimized matrix multiplication code; it is significantly faster and more efficient in terms of storage space. The inputs are processed by the connected layers and activation function, which then provide the output of the encoder layer, and the process repeats for each layer. Multi-head attention consists of numerous attention layers stacked in parallel, with different linear transformations of the same input, in which the query and key vectors collaborate to determine the correct attention weights. It improves the model's ability to learn and, thus, its performance. Multi-head attention (3) permits the model to perform the attention function concurrently on distinct representation sub-spaces of the input sequence:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O,   (3)
where head_i is Attention(Q W_i^Q, K W_i^K, V W_i^V), and the W are all learnable parameter matrices. Decoder The decoder consists of N = 6 identical layers, and its architecture closely resembles that of the encoder. Each decoder layer consists of a position-wise feed-forward sub-layer, a multi-head self-attention sub-layer as in the encoder, and an additional multi-head attention sub-layer that attends over the output of the encoder stack (Fig. 2). The input to the decoder is a combination of the output embedding and positional encoding, where the input is masked and offset by one position to ensure that the prediction for position i depends only on the known outputs at positions before i. The outputs of the attention layer are combined with the target input and normalized to form the decoder layer's final outputs.
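The scaled dot-product attention of Eq. (2) can be sketched in a few lines of dependency-free Python. This is a toy illustration, not the paper's trained model; all function names here are our own:

```python
import math

def softmax(row):
    # numerically stable softmax over one list of scores
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    # plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    # Eq. (2): softmax(Q K^T / sqrt(d_k)) V, one query per row of Q
    d_k = len(K[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])   # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)
```

Each output row is a convex combination of the value rows, weighted by how strongly the query matches each key.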
Fig. 2 Decoder architecture
3.2 Loss Function The dataset has been augmented to mask variable literals. This means that the model can predict many accurate values for a given variable, as long as the predictions are consistent across the code. This indicates that the training labels are not fully certain; therefore, it makes more sense to consider them correct with a probability of 1 − smooth_eps and erroneous otherwise. This is the function of label smoothing. By adding label smoothing to cross-entropy, it is ensured that the model does not become overconfident in predicting variables that may have been augmented. The model is trained using the loss function seen in (4):

loss(x, y) = − Σ_{c=1}^{C} W_c log( exp(x_{n,c}) / Σ_{i=1}^{C} exp(x_{n,i}) ) y_{n,c},   (4)
where x is the input, y is the target, W is the class weight, C is the number of classes, and n indexes the mini-batch dimension. The validation loss and training loss are utilized to assess when the model has been trained; the model with the lowest validation loss is the final trained model. It is vital to note that label smoothing results in significantly greater loss values than in models that do not employ it. However, this is to be expected, given that the label predictions are not intended to be exact: as long as the predictions are consistent across the target code sequence, several suitable options exist for the variables.
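A minimal plain-Python sketch of the idea behind Eq. (4) with smoothing applied to the target distribution. This is our own illustration, not the paper's training code; `eps` plays the role of smooth_eps, and class weights are omitted for brevity:

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def smoothed_cross_entropy(logits, target, eps=0.1):
    # the target class gets probability 1 - eps; the rest share eps uniformly
    C = len(logits)
    logp = log_softmax(logits)
    smooth = [eps / (C - 1)] * C
    smooth[target] = 1.0 - eps
    return -sum(p * lp for p, lp in zip(smooth, logp))
```

With `eps=0.0` this reduces to the ordinary negative log-likelihood; with `eps > 0` the loss value is larger, matching the observation above that smoothed models report higher losses.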
4 Dataset In this project, a PHP dataset containing approximately 3000 problem statements and their corresponding PHP code is utilized. The problems for the dataset were collected from different platforms. Any dataset must meet the minimum requirements of completeness and precision; without them, any final outcome is open to biases and wrong conclusions. No matter how sophisticated the algorithm, the analysis is entirely dependent on the dataset's quality. The quality of the training data has immense implications for the model's development. Before beginning the process of training the model on this dataset, a great deal of time and energy was spent confirming the accuracy and dependability of the data. The data source significantly impacts the accuracy of the model's predictions. Most of the time, the data obtained is raw and unstructured; it is necessary to collect the data, organize and sequence it, and compile it into a format that can be used to train a model. Therefore, it is ensured that the data is collected from various sources, which enhances the performance of the model. The questions in the dataset are taken from platforms such as GeeksforGeeks, Javatpoint, Tutorialspoint, and W3Schools, and some are self-written. The structure of the dataset is a problem statement followed by its PHP code (Fig. 3).
4.1 Types of Questions The initial step in training a model is the collection of the training dataset. The performance of the model and the quality of its findings heavily depend on the training data. The method for adding data to the dataset is to collect data from as many categories as feasible; the collection comprises data from approximately 60 different categories. Integrating diverse data types into the dataset increases the model's ability to learn and ensures that the model can predict outcomes for various kinds of data. The collection contains numerous types of questions, covering every conceivable question type. As shown in Fig. 4, the dataset comprises topics such as number systems, Strings, Arrays, Functions, Operators, Classes, and others. It is also essential to ensure that the dataset does not become unbalanced while being appended; to avoid this, the number of questions inside each category is kept balanced. Fig. 3 Format of the dataset used in the model. The first line of every problem starts with '#', followed by the problem statement; the PHP code of that problem lies below the problem statement. After another '#', a new problem starts. This format is applied to the whole dataset.
Fig. 4 Types of questions
4.2 Data Augmentation Quantity and diversity of data are crucial to the effectiveness of a deep learning model; consequently, training neural network models requires a massive dataset. In machine learning, collecting and categorizing data is a difficult task. Data augmentation can therefore alter the dataset: it resolves the issues of limited data and data fluctuation, and enhances the model's performance in a variety of scenarios. Data augmentation has been utilized to enhance the size of the dataset because the dataset lacks sufficient data points. Python's built-in tokenizer is utilized to tokenize the code. While tokenizing PHP code, variable names are masked with random alphabetic characters so that the trained model may learn that variable names can be arbitrary, and comprehend the fundamental logic underlying PHP syntax. Algorithm 1 below depicts the structure of the augmentation procedure.
5 Results This section analyzes the performance of the model at each level based on the outcomes of test cases. The model's performance has been examined on datasets with varying numbers of data points. The purpose of testing the model is to determine how the quantity of data impacts the model's performance and how essential the dataset's diversity is to modeling. In light of this, the model was trained on datasets containing varying numbers of data points, and the findings were gathered. To evaluate the model's performance, the loss approach is employed (i.e., train loss and validation loss, which are used to assess
Algorithm 1 Algorithm for Augmentation
1: define a list a, which stores the tokenized PHP string
2: for i = 1, 2, ..., len(a) do
3:   if token type = 1 (NAME) and value of token[i − 1] = "$" and token value is not UPPERCASE then
4:     if the token's string value is in the masked-variable list then
5:       append it to the final tokenized list
6:     else mask the variable randomly and append it to the final tokenized list
7:     end if
8:   else if token type = string and not empty then
9:     if "$" is present in the first place then
10:      replace the remaining string with a random variable
11:    else append it to the final tokenized list
12:    end if
13:  else append all remaining tokens to the final tokenized list
14:  end if
15: end for
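The masking idea of Algorithm 1 can be sketched in plain Python. This is our own simplified version using a regular expression over PHP's `$variable` syntax instead of the paper's tokenizer-based implementation; the function name and the 3-letter mask format are our own choices:

```python
import random
import re
import string

def mask_php_variables(code, seed=None):
    # Replace each distinct $variable with a random lowercase name,
    # consistently across the snippet (same variable -> same mask);
    # all-uppercase names (constants, superglobal-style identifiers)
    # are left untouched, mirroring Algorithm 1.
    rng = random.Random(seed)
    mapping = {}

    def repl(match):
        name = match.group(1)
        if name.isupper():
            return match.group(0)
        if name not in mapping:
            mask = name
            while mask == name or mask in mapping.values():
                mask = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
            mapping[name] = mask
        return "$" + mapping[name]

    return re.sub(r"\$([A-Za-z_]\w*)", repl, code)
```

Because the mapping is reused for repeated occurrences, the masked code stays internally consistent, which is exactly the property the loss function above relies on.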
Fig. 5 Effect on losses with different datasets: a train loss, b validation loss
how the model fits the training data and validation data, respectively). Based on the training dataset, it can be determined how much the model is learning. Figure 5a and b illustrates the train loss and validation loss for various settings; the losses decrease as the size of the dataset increases. To evaluate the model's performance, 40 questions were selected at the beginning and used throughout testing to determine how increasing the dataset size affected the code generation for these specific problems. The testing problems come from various categories to ensure that the model's learning is not limited to specific types of challenges. The model's performance improved following training on larger datasets. The findings are displayed in Fig. 6, which depicts the number of correct, erroneous, and partially correct solutions to the given problem set. As no prior study has been conducted in this field, no existing models produce PHP code from natural language; however, some models can generate Python or C++ code. The model has been evaluated on the different datasets, and the results can be seen in Fig. 5.
Fig. 6 Performance of the model on given problem set
Accuracy of the model on the given problem set:

Trainings     Correct (%)   Partially correct (%)   Incorrect (%)
Training 1    40            37.50                   22.50
Training 2    67.5          15                      17.5
Training 3    85            7.5                     7.5
6 Conclusion In this work, we illustrate the performance of the model based on the number of data points currently possessed (approx. 3000 problems). It performed quite well and achieved 85% accuracy on the test problems presented. The model can generate source code for small problems that do not rely on complex approaches. It has been observed that the model's performance improves as the dataset grows; a deep learning model requires a vast amount of data to work optimally. Due to the limited size of the dataset, the model is currently only capable of producing accurate results for small and straightforward problems. This implies that despite having fewer data points, the model has given fairly good results; therefore, if the dataset size increases, the model will perform significantly better.
References
1. Allamanis M, Barr ET, Devanbu P, Sutton C (2018) A survey of machine learning for big code and naturalness. ACM Comput Surv (CSUR) 51(4):1–37
2. Choudhury A (2019) 10 AI applications that can generate code themselves
3. Lee JS, Hsiang J (2020) Patent claim generation by fine-tuning OpenAI GPT-2. World Patent Inf 62:101983
4. Li H (2011) Tabix: fast retrieval of sequence features from generic tab-delimited files. Bioinformatics 27(5):718–719
5. Li Y, Choi D, Chung J, Kushman N, Schrittwieser J, Leblond R, Eccles T, Keeling J, Gimeno F, Lago AD et al (2022) Competition-level code generation with AlphaCode. arXiv preprint arXiv:2203.07814
6. Nguyen N, Nadi S (2022) An empirical evaluation of GitHub Copilot's code suggestions. In: 2022 IEEE/ACM 19th international conference on mining software repositories (MSR). IEEE, pp 1–5
7. Perez L, Ottens L, Viswanathan S (2021) Automatic code generation using pre-trained language models. arXiv preprint arXiv:2102.10535
8. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9
9. Sentdex: GPyT—generating Python code with transformer models
10. Shin J, Nam J (2021) A survey of automatic code generation from natural language. J Inf Process Syst 17(3):537–555
11. Singh A (2021) Auto-code generation using GPT-2
12. Sinha A. Automation and its impact on the contemporary world
13. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Design and Analysis of Finite Impulse Response Filter Based on Particle Swarm Optimization and Grasshopper Optimization Algorithms Sandeep Kumar and Rajeshwar Singh
Abstract Denoising filters play a vital role in removing unwanted components from real-time audio signals and are used in applications to curtail noise in audio. For transmitting and receiving an unaffected audio signal, the design of denoising filters is mandatory; they are responsible for eliminating distortion/noise from the desired audio signal in an effective manner. Further, no noise/distortion should be introduced into the audio signal by the filters themselves. The significant use of audio filters has led researchers and scientists to devise audio filters that are spontaneous, capable, and advanced. Digital filters are more advantageous than analog filters for processing and extracting the desired signal. Finite impulse response filters are favored over infinite impulse response filters due to phase linearity and frequency stability. An equiripple finite impulse response filter is devised for denoising a real-time audio signal and analyzed. A random audio signal with added casual distortion is processed and analyzed using MATLAB. Various windowing functions exist, i.e., Hamming, Rectangular, Blackman, Hanning, Flat top, etc., but the Kaiser window function is more advantageous due to its variable parameters, which control the width of the main lobe and attenuate side-band ripples. Various optimization algorithms may be utilized, and researchers analyze their advantages and disadvantages. The particle swarm optimization algorithm and the grasshopper optimization algorithm have become known malleable methods based on Swarm Intelligence, relying mainly on a particle population in the search space in particle swarm optimization and on determining the next position in the grasshopper optimization algorithm. In particle swarm optimization, the positions of the remaining particles do not affect the position update of a particular particle, whereas in the grasshopper optimization algorithm, all other grasshoppers are considered when finding the next position of a particular grasshopper. Particle swarm optimization enhances the quality of the result by updating the position and velocity of the swarm. Optimized groups of filter coefficients

S. Kumar (B) I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India e-mail: [email protected] R. Singh Doaba Khalsa Trust Group of Institutions, SBS Nagar, Rahon, Punjab, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_45
595
596
S. Kumar and R. Singh
have been generated using both algorithms. A digital finite impulse response filter has been fabricated in this paper using the Blackman, Flat top, and Kaiser window functions; the FIR filter using the Kaiser window function provides the improved results. Both algorithms have been applied to the designed filter to further optimize the design in MATLAB. The results obtained using both algorithms show that the devised filters are superior to earlier filters with reference to the frequency spectrum. Keywords Finite impulse response (FIR) filter · Particle swarm optimization (PSO) · Grasshopper optimization algorithm (GOA) · Swarm Intelligence (SI) · Flat top · Blackman and Kaiser Window function
1 Introduction Digital signal processing (DSP) is a trending field, as signal generation, transmission, and recovery of the desired signal/audio/image/video are of prime importance. Many noises and interferences are generally mixed with the desired/transmitted signal, and it is of utmost priority to recover the desired signals/images/videos. A filter is normally utilized to eliminate undesired distortion arising during the handling of video, image, and signal at any level, i.e., at the source, during transmission, or at the destination of a communication system. A digital filter is beneficial and capable of efficiently removing distortion from a distorted signal. FIR and IIR filters are the two classes of digital filters [1–4]. Based on the dimension of the information, filters are classified as one-dimensional (1D) for signals and two-dimensional (2D) for image processing. Various windowing functions exist, but the Kaiser window function is more advantageous as it contains variable parameters [5]. To solve these problems and to perform the optimization process efficiently, signal processing requires a systematic and effective algorithm. Differential evolution and genetic algorithms are also popular multi-objective algorithms. Many optimization problems are solved by other algorithms such as Artificial Bee Colony and ant colony optimization [6]. Swarm algorithms are used to solve real-world optimization problems. PSO is among the best-known and most trusted swarm-based algorithms; its easy implementation, simple coding, limited controlling parameters, and flexibility to hybridize with other optimization algorithms are the main reasons behind its popularity. Different performance may occur due to a minor alteration in the controlling parameters. The grasshopper optimization algorithm (GOA) is also a known malleable method based on Swarm Intelligence [7]. The remainder of the paper is organized as follows: Sect. 2 describes the literature review. Section 3 describes materials and methods.
Section 4 presents discussion and results. Section 5 describes conclusions.
Design and Analysis of Finite Impulse Response Filter Based on Particle …
597
2 Literature Review New techniques such as subsection sampling and a proposed FIR filter for the realization of the TDA function are becoming popular nowadays; these help in extracting the required periodic components from a signal containing noise [8]. In [9], real-time audio signal analysis and reduction of the noise present in the information signal are performed using digital filtering techniques; a digital finite impulse response filter, an IIR filter, and the wavelet transform are employed. Various filtering techniques are utilized to minimize noise effects in the ECG signal. An FIR filter as an LPF is presented, which helps to decrease attenuation in the ECG signal; the FIR filter eliminates the high-frequency components and, as an LPF, removes low-level noise from the ECG signal. A moving average filter (MAF) takes the average values of the signal and produces the desired signal [10]. The uses of digital filters in communication systems have been elaborated; it is required to design an efficient digital finite impulse response (FIR) filter with a proper window function. Window functions such as Blackman, Triangular, Rectangular, Kaiser, Hanning, and Hamming are used for designing digital FIR filters [11]. Reviews of research on particle swarm optimization and specific applications have been discussed [8, 12, 13], and PSO parameters have been customized. PSO has emerged as an excellent optimization tool; however, it suffers from premature convergence. Authors have applied PSO to many technical and general problems [14]. A survey of the linear-phase low-pass FIR filter has been described; the PSO algorithm with a customized inertia weight is used to improve the swarm's searching ability. It is asserted there that the PSO simulation performs well in comparison with other methods, and exact coefficients and fast convergence can be achieved with the PSO algorithm [15]. PSO has been utilized to devise a low-pass FIR filter, and excellent filter coefficients have been acquired with it.
In that work, the authors claim that "PSO is an easy swarm optimization method to optimize multidimensional difficult technical challenges" [16]. A filter has been devised and comparisons made between different parameters and coefficients. The grasshopper optimization algorithm (GOA) is used to optimize the performance; an LPFIR filter is devised by GOA, and the design of LP, HP, BP, and BS filters has been described. The authors state that the ripples have been reduced by the use of GOA [7]. PSO has been used for designing FIR filters; the performance of ARPSO and CRPSO has been compared using MATLAB to design FIR filters, and the performance has been matched against the Parks–McClellan (PM) method. The obtained solutions show that PSO-based algorithms perform better when designing linear-phase FIR filters with respect to RMS error and frequency spectrum [17]. A broad analysis of many optimization techniques for devising FIR and IIR filters has been obtained. Nature-inspired methods to optimize difficult engineering problems have
been discussed. The authors explain various optimization algorithms such as Swarm Intelligence (SI), genetic algorithms (GA), Artificial Bee Colony (ABC), particle swarm optimization (PSO), grasshopper optimization algorithm (GOA), Cuckoo Search (CS), ant colony optimization (ACO), biogeography-based optimization (BBO), bacterial foraging optimization (BFO), Bat Algorithm (BA), Krill Herd (KH), Harmony Search (HS), symbiotic organisms search (SOS), social spider optimization (SSO), gravitational search algorithm (GSA), whale optimization algorithm (WOA), teaching-learning-based optimization (TLBO), and Grey wolf optimization (GWO), which may be used to design filters [18]. Hankel integration has been used for the analysis of electromagnetic information, and the digital linear filter method has been used for the signal frequency spectrum. Applications of PSO for optimizing shift and spacing have been described; the authors state that the solution shows an advancement over earlier methods [19]. The literature review confirms that PSO techniques are among the excellent methods used to find optimal filter coefficients for designing digital FIR filters.
3 Materials and Methods Design of Finite Impulse Response Filter: A filter whose impulse response is of finite duration is called a finite impulse response (FIR) filter [20, 21]. The impulse response is finite because no feedback is provided in the filter. A single-dimensional digital FIR filter is described by the transfer function H(z):

H(z) = Σ_{n=0}^{N−1} h(n) z^{−n},   (1)

H(z) = h(0) + h(1) z^{−1} + · · · + h(N − 1) z^{−(N−1)},   (2)

where N is the filter length of the impulse response h(n). The absolute difference between the ideal and designed magnitude responses of the filter is

D(w) = Σ_{∀w} (|A_id(w)| − |A(w)|).   (3)
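Equation (1) corresponds to the direct-form convolution y[n] = Σ_k h(k) x(n − k). A small Python sketch of this filtering step (our own illustration; the paper's implementation is in MATLAB):

```python
def fir_filter(h, x):
    # Direct-form FIR: y[n] = sum_{k=0}^{N-1} h[k] * x[n-k]  (cf. Eq. (1)),
    # treating samples outside x as zero
    N = len(h)
    return [sum(h[k] * x[n - k] for k in range(N) if 0 <= n - k < len(x))
            for n in range(len(x))]
```

For example, the two-tap moving average h = [0.5, 0.5] smooths a signal by averaging each sample with its predecessor.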
where D(w) is the error fitness function in Eq. (3), A_id is the magnitude response of the ideal FIR filter, and A(w) is the designed magnitude response. Blackman Window: The time-domain sequence of the Blackman window is

w(n) = 0.42 − 0.5 cos(2πn/(M − 1)) + 0.08 cos(4πn/(M − 1)),  0 ≤ n ≤ M − 1,   (4)

where M is N/2 when N is even and (N + 1)/2 when N is odd.
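Equation (4) can be evaluated directly; a short Python sketch (our own illustration, with M taken as the window length):

```python
import math

def blackman(M):
    # Eq. (4): w(n) = 0.42 - 0.5*cos(2*pi*n/(M-1)) + 0.08*cos(4*pi*n/(M-1))
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (M - 1))
            + 0.08 * math.cos(4 * math.pi * n / (M - 1)) for n in range(M)]
```

The resulting window is symmetric, tapers to (nearly) zero at the edges, and peaks at 1.0 in the center.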
Flat top Window: The flat top window is a summation of cosines. The coefficients of the flat top window are calculated by the equation:

w(n) = a0 − a1 cos(2πn/(N − 1)) + a2 cos(4πn/(N − 1)) − a3 cos(6πn/(N − 1)) + a4 cos(8πn/(N − 1)),   (5)
where 0 ≤ n ≤ N − 1. Kaiser Window: It approximates the prolate spheroidal window, i.e., pointy instead of squashed; basically, it maximizes the ratio of main-lobe to side-lobe energy. The parameter β manages the relative side-lobe attenuation; for a given window length, the side-lobe attenuation is fixed with respect to β. The Kaiser statement (n, β) takes the length n and parameter β. The coefficients are calculated as

w(n) = I0(β √(1 − ((n − N/2)/(N/2))²)) / I0(β)  for 0 ≤ n ≤ N,   (6)

where I0 is the zeroth-order modified Bessel function. PSO Algorithm in FIR Filter Design: PSO was introduced and used as a continuous real-valued algorithm. A swarm of particles moves in a D-dimensional search space to find an optimal solution. Particle i stores its current velocity vector

V_i = [v_{i1}, v_{i2}, ..., v_{iD}],   (7)

and its current position vector

X_i = [x_{i1}, x_{i2}, ..., x_{iD}].   (8)
PSO starts by taking arbitrary X_i and V_i. After every iteration, each particle records the best position it has attained,

Pbest_i = [Pbest_{i1}, Pbest_{i2}, ..., Pbest_{iD}],   (9)

and the finest position attained by the whole swarm,

Gbest = [Gbest_1, Gbest_2, ..., Gbest_D].   (10)

The updated velocity and position of a particle are given below:

v_{id}(t + 1) = v_{id}(t) + c1 r1 (Pbest_{id}(t) − x_{id}(t)) + c2 r2 (Gbest_d(t) − x_{id}(t)),   (11)
x_{id}(t + 1) = x_{id}(t) + v_{id}(t + 1).   (12)
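Equations (7)-(12) can be sketched as a compact Python implementation. This is our own toy version minimizing a sphere function as a stand-in for the filter's error fitness D(w); the inertia weight `w` is an assumed refinement (Eq. (11) itself uses a unit coefficient on the old velocity), and the search bounds and parameter values are our own choices:

```python
import random

def pso_minimize(f, dim, n_particles=30, iters=150, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal PSO following Eqs. (7)-(12)
    rng = random.Random(seed)
    X = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                       # Eq. (9): best position per particle
    pbest_val = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # Eq. (10): best position of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))   # Eq. (11)
                X[i][d] += V[i][d]                             # Eq. (12)
            val = f(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

# minimize a sphere function as a stand-in for the filter's error fitness
best, val = pso_minimize(lambda x: sum(v * v for v in x), dim=3)
```

In an actual filter design, `f` would evaluate D(w) of Eq. (3) for a candidate coefficient vector.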
Grasshopper Optimization Technique: Grasshoppers are insects which are seen individually in nature, but they join as one of the largest swarms. This swarming behavior can be seen in both nymph and adulthood: in the larval phase, they move slowly in small steps, whereas in adulthood they move abruptly. The factors affecting the movement of grasshoppers while seeking food are the social interaction between grasshoppers, the gravitational force on the grasshopper, and the direction of the wind. Social interaction means the strength of the social force, indicating the intensity of attraction and repulsion between grasshoppers; when there is neither attraction nor repulsion, they enter a comfort zone where they do not move further. The direction of the wind affects the nymphs, as they do not have wings and stay on the ground; it is assumed here that the wind direction is always toward the target. Gravitational force is also not considered in determining the next position of a grasshopper. The basic difference between the grasshopper optimization and particle swarm optimization algorithms is that in GOA the status of all the grasshoppers is considered in determining the next position of a particular grasshopper in the search space, whereas in PSO the positions of the other particles do not affect the position update (X_t) of a particular search agent. The next position (X_{t+1}) of a search agent depends on its previous position, the target position, and the positions of all the remaining grasshoppers. It is given as

X_i^d = c ( Σ_{j=1, j≠i}^{N} c ((ub_d − lb_d)/2) s(|x_j^d − x_i^d|) (x_j − x_i)/d_{ij} ) + T_d.   (13)
Here, ub_d in Eq. (13) is the upper bound of the dth dimension and lb_d is the lower bound of the dth dimension. The two c variables in the equation play different roles: the first c behaves in a manner similar to the inertia weight of PSO and supports exploration, while the second c is a diminishing factor that shrinks the comfort, repulsion, and attraction zones in order to achieve better exploitation. s represents the social force on the grasshopper, and T_d is the target position.

c = c_max − l (c_max − c_min)/L.   (14)
c keeps updating itself according to Eq. (14), depending on the iteration counter l, which changes from one to the maximum iteration L. In effect, c balances exploration and exploitation. The pseudocode for the FIR filter design is mentioned here below.
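The schedule of Eq. (14), together with the social-force function s(r) from the original GOA formulation (s(r) = f·e^{−r/l} − e^{−r} with f = 0.5, l = 1.5 — values taken from the GOA literature, not stated in this paper), can be sketched as:

```python
import math

def goa_coefficient(l, L, c_max=1.0, c_min=0.00001):
    # Eq. (14): c decreases linearly from c_max toward c_min over iterations 1..L
    return c_max - l * (c_max - c_min) / L

def social_force(r, f=0.5, ell=1.5):
    # s(r) = f * exp(-r/ell) - exp(-r): repulsive (negative) at short range,
    # attractive (positive) at larger distances, fading out far away
    return f * math.exp(-r / ell) - math.exp(-r)
```

The sign change of s(r) with distance is what creates the comfort zone described in the text: at the crossover distance, attraction and repulsion cancel and the grasshopper stops moving relative to its neighbor.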
4 Discussions and Results An FIR LPF is designed in MATLAB Simulink using the DSP toolbox. Table 1 illustrates the specified parameters utilized to devise the digital FIR filter, including structure type, response, design method, filter order, and frequency specification. The FIR filter was designed using these parameters for both the Blackman and Flat top windows [22]. The magnitude responses of the designed filter for both windows are revealed in Figs. 1 and 2. Table 2 describes the specified parameters utilized to devise the digital FIR filter using the Kaiser window, including the variable parameter beta, the length of the window function, the maximum and minimum amplitude in the time domain, and the main lobe width and side lobe attenuation in the frequency domain [23]. The magnitude response of the devised filter for the Kaiser window is shown in Fig. 3. The FIR filter utilizing the Kaiser window function is more efficient: the main lobe's width and the side lobes' attenuation can be controlled by the beta parameter and the window length

Table 1 FIR filter parameter specifications for Kaiser Window

Variable          Window    Time domain               Frequency domain
parameter beta    length    Max. amp.    Min. amp.    Main lobe width    Rel. side lobe atten. (dB)
4                 38        1.0          0.01         0.625000           −31.0
Fig. 1 Magnitude response (dB) of FIR filter, Blackman Window
S. Kumar and R. Singh
Fig. 2 Magnitude response of FIR filter, Flat top window
Table 2  Performance analysis of LPF using PSO and GOA

  Algorithm   Max. ripple   Mean    Stop band attenuation   Pass band attenuation   Variance       Standard deviation
  GOA         1.034         1.002   25.672                  0.246                   9.12 × 10−5    0.0095
  PSO         1.023         1.093   23.068                  0.293                   1.19 × 10−4    0.011
Fig. 3 Magnitude response of low pass FIR filter
(n). However, it has a drawback: when the value of beta is increased, the main lobe's width increases and the side lobe level decreases [24] (Fig. 4).
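This trade-off between beta, main lobe width, and side lobe behavior can be illustrated with a short numpy sketch; the tap count, cutoff, and beta values below are illustrative choices, not the paper's design:

```python
import numpy as np

def kaiser_lpf(numtaps, cutoff, beta):
    """Ideal low-pass impulse response truncated by numpy's Kaiser window."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    ideal = np.sinc(cutoff * n) * cutoff          # ideal LPF impulse response
    h = ideal * np.kaiser(numtaps, beta)          # apply Kaiser window
    return h / h.sum()                            # normalize DC gain to 1

def peak_sidelobe_db(h, stop_edge=0.7):
    """Largest magnitude (dB) well past the transition band."""
    w = np.linspace(0, np.pi, 2048)
    H = np.abs(np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h)
    band = w > stop_edge * np.pi
    return 20 * np.log10(H[band].max() + 1e-12)

for beta in (2.0, 5.0, 8.0):
    h = kaiser_lpf(21, 0.3, beta)
    print(f"beta={beta}: peak stopband level {peak_sidelobe_db(h):.1f} dB")
```

Sweeping beta shows deeper stopband attenuation (lower side lobes) at the cost of a wider main lobe and transition band.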
Fig. 4 Kaiser window of length 20 and different beta parameters
When the window size is increased, the main lobe width and the side lobe attenuation decrease, as shown in Fig. 5. The GOA and PSO algorithms are executed on the designed FIR filters using the Kaiser Window function with length n = 20 and β = 8, which is common to the curves in Fig. 5. The performance analysis of the LPF using PSO and GOA is given in Table 2. Figure 6 compares the normalized magnitude of the FIR LPF based on PSO and GOA graphically, and Fig. 7 compares the magnitude response (on a dB scale) of the same filters.
Fig. 5 Kaiser window with beta 8 and different lengths
Fig. 6 Magnitude response of FIR LPF
Fig. 7 Magnitude response (dB) of FIR LPF
5 Conclusion

The digital FIR filter was designed using the Blackman, Flat top, and Kaiser Window functions, and the magnitude response of the LP FIR filter was analyzed for each window function. A good magnitude response was achieved with the Blackman Window; it is observed that the Flat top window function performs better than the Blackman Window function, and the best performance is obtained with the Kaiser Window function. Further, the PSO and GOA methods were employed to design FIR LP filters using the Kaiser Window function. PSO yields lower ripples, whereas GOA provides more attenuation. PSO improves the result due to the dedicated mechanism it offers for updating the position and velocity of the swarm. A set of upgraded coefficients has been obtained with the GOA and PSO algorithms, which provides improved solutions for the LPF. FIR filters have been designed efficiently in MATLAB using the Kaiser Window function, and the designed filter is further improved by the GOA and PSO algorithms. The results show that the designed FIR filter performs better than earlier designed FIR filters. More coefficients may be added to the existing algorithms to further optimize the result.
References

1. Oppenheim AV (1999) Discrete-time signal processing. Prentice-Hall, Englewood Cliffs
2. Antoniou A (2006) Digital signal processing: signals, systems, and filters. McGraw Hill
3. Proakis DM (2007) Digital signal processing. Prentice-Hall International Edition
4. Khan SA (2011) Digital design of signal processing systems: a practical approach. John Wiley and Sons
5. Esakkirajan S, Jayaraman S, Veerakumar T (2015) Digital image processing. Tata McGraw-Hill Education Pvt. Ltd.
6. Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39
7. Yadav S, Yadav R, Kumar A, Kumar M (2020) A novel approach for optimal design of digital FIR filter using grasshopper optimization algorithm. ISA Trans. https://doi.org/10.1016/j.isatra.2020.08.032
8. Magsi H, Sodhro AH, Chachar FA, Abro SAK (2018) Analysis of signal noise reduction by using filters. In: 2018 international conference on computing, mathematics and engineering technologies (iCoMET 2018), pp 1–6. https://doi.org/10.1109/ICOMET.2018.8346412
9. Feng W, Wang C, Chen X, Shi Y, Jiang M, Wang D (2021) Filter realization of the time-domain average denoising method for a mechanical signal. Shock Vib 2021. https://doi.org/10.1155/2021/6629349
10. Jayashree KC, Bharamappa KM, Kumar PM, Santeep GM, Saxena S (2020) Denoising real-time audio signals using MATLAB filter techniques 29(10):4069–4078
11. Mbachu CB, Akaneme SA (2020) Noise reduction in speech signals using recursive least square adaptive algorithm. Int J Eng Adv Technol Stud 8(2):1–17. Published by ECRTD-UK. ISSN: 2053-5783 (print), 2053-5791 (online)
12. Hajihassani M, Armaghani DJ, Kalatehjari R (2018) Applications of particle swarm optimization in geotechnical engineering: a comprehensive review. Geotech Geol Eng 36(2):705–722
13. Elsheikh AH, Elaziz MA (2019) Review on applications of particle swarm optimization in solar energy systems. Int J Environ Sci Technol 16(2):1159–1170
14. Shami TM, El-Saleh AA et al (2022) Particle swarm optimization: a comprehensive survey, pp 10031–10061
15. Kumari N, Jaglan P (2017) Design of FIR low pass filter using particle swarm optimization algorithm 1(3):31–35
16. Kadam SA, Chavan MS (2019) Design of digital filter by genetic algorithm. Int J Eng Adv Technol 8(6):397–400. https://doi.org/10.35940/ijeat.E7808.088619
17. Ali Z, Harijan BL, Memon TD, Nafi N, Memon U (2021) Digital FIR filter design by PSO and its variants attractive and repulsive PSO (ARPSO) and craziness based PSO (CRPSO) 3878(6):136–141. https://doi.org/10.35940/ijrte.F5515.039621
18. Kumar S, Singh R (2021) Review and analysis of optimization algorithms for digital filter design 12(7):1798–1806
19. Zeng L, Li J, Liu J, Guo R, Chen H, Liu R (2021) Efficient filter generation based on particle swarm optimization algorithm. IEEE Access 9:22816–22823. https://doi.org/10.1109/ACCESS.2021.3056464
20. Dwivedi AK, Ghosh S, Londhe ND (2018) Review and analysis of evolutionary optimization based techniques for FIR filter design. Circ Syst Sig Proc 37:4409–4430
21. Ravi RV, Subramaniam K, Roshini TV, Muthusamy SPB, Prasanna Venkatesan GKD (2019) Optimization algorithms, an effective tool for the design of digital filters; a review. J Ambient Intell Human Comput
22. Shehu NM, Gidado AS, Wudil YS, Gora UA (2016) Performance analysis of FIR low pass filter design using Blackman and flat top window methods. IJESC. https://doi.org/10.4010/2016.848
23. Arya R, Jaiswal S (2015) Design of low pass FIR filters using Kaiser window function with variable parameter beta (β). Int J Multidiscip Curr Res 3. ISSN: 2321-3124
24. Kumar S, Singh R (2022) Particle swarm optimization algorithm based design and analysis of digital FIR filter using Kaiser window function. NeuroQuantology 20(11):2316–2326. https://doi.org/10.14704/Nq.2022.20.11.Nq66229
HOG Feature-Based Offline Handwritten Malayalam Word Clustering with Lexicon Reduction A. T. Anju, Binu P. Chacko, and K. P. Mohamed Basheer
Abstract Handwriting recognition is an active research area in the fields of pattern recognition, image processing, and computer vision. A handwriting recognition system can recognize handwritten words extracted from a scanned document image. For a good recognition model, the most indispensable thing is a good dataset for training. Here, clustering is performed on the segmented handwritten words to group the matching words, easing the process of labelled dataset creation. After some pre-processing operations, histogram of oriented gradient features are extracted holistically from the word image as a feature descriptor. For feature reduction, Principal Component Analysis is applied to the extracted feature set, and finally K-means clustering is used to group the matching handwritten words together from the unlabelled dataset. For training the recognition model for keyword searching, the clustering helps to create a lexicon-reduced dataset, and the non-repeated segmented words are discarded after clustering. This makes the most complicated and time-consuming manual task, grouping and labelling the handwritten words for dataset creation, much easier. Keywords Handwritten word recognition · Histogram of oriented gradient · Principal Component Analysis · K-means clustering
1 Introduction Automatic extraction of information from scanned handwritten documents is a challenging area of research under Document Analysis and Pattern Recognition. In some cases, the automatic detection and categorization of scanned documents is needed in accordance with some keywords present in that handwritten document image. A. T. Anju (B) · K. P. Mohamed Basheer Sullamussalam Science College, University of Calicut, Areekode, Kerala, India e-mail: [email protected] B. P. Chacko Prajyoti Niketan College, University of Calicut, Pudukad, Kerala, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_46
The most difficult case is when the model needs to convert the handwriting present in scanned, noisy, and degraded handwritten documents to digital form. A benchmark handwritten dataset is indispensable for a good word recognition system, and one of the most challenging and time-consuming tasks is creating a labelled dataset for training the recognition model. For word recognition, after performing word segmentation on the handwritten documents, we need to group and label the matching words present in the documents. So, we have employed clustering to reduce this manual task to a great extent. By applying clustering to the unlabelled segmented handwritten words from the scanned documents, the repeated keywords were grouped together for labelling. Thus, we have also performed automatic lexicon reduction by neglecting the likely once-off words in the segmented word dataset. As a word dataset is necessary for a good handwritten word recognition model, we have to perform the labelling process on the words segmented from the handwritten document images. The segmented words are handwritten word images drawn from a large lexicon, in which some words are repeated and some occur only once. With keyword searching and categorization of handwritten documents in mind, the repeated words will be the keywords, and we have to create a labelled dataset containing such keywords. An unsupervised learning method, clustering, can be used to group the repeated handwritten words together, and this automatic process makes labelled dataset creation effortless. So, in the present work, the input to our model is the segmented word images, on which clustering is performed to group the matching handwritten word images together. Figure 1 shows the clustering process.
The major contributions of this paper include: • The proposed method reduces the manual task of labelling the dataset for supervised learning to a great extent. • It also reduces the lexicon size of the dataset. • This is a novel approach that helps in the creation of a benchmark dataset of word images. The remaining part of the paper is organized as follows: Sect. 2 gives the literature review. In Sect. 3, the proposed work is described. Section 4 depicts the experimental analysis and discussion. The paper is concluded in Sect. 5.
2 Literature Review

The creation of a labelled dataset is a difficult, time-consuming, and expensive task. For the training of a supervised handwritten word recognition model, a good database is indispensable, and it greatly affects the recognition accuracy of the model. In this work, using a clustering technique, we reduced the manual task by clustering the matching words segmented from the document. To our knowledge, the exact work is not found in the literature; however, there are some works which used HOG
Fig. 1 Clustering process of the word images
feature extractor for feature extraction. Hebbi et al. [1] extracted HOG features for Kannada handwritten word recognition. Hamida et al. [2] segmented the word image into different cells and created a HOG descriptor for each cell. Bouibed et al. [3], Mohammed et al. [4], Guha et al. [5], and Khaissidi et al. [6] also employed HOG for feature extraction. Sharma et al. [7] created a HOG feature descriptor for handwritten Odia numeral recognition. Hmood et al. [8] proposed an enhanced version of the HOG feature descriptor for coin recognition. Jino et al. [9] used the HOG descriptor for the recognition of handwritten characters. Because of the consistently high accuracy rate found while using the HOG descriptor, we employed gradient distribution-based HOG feature extraction for extracting features from scanned handwritten word images.
3 Proposed Work

In the present work, we illustrate how a deterministic algorithm can help to create a handwritten word dataset. The input to the system is scanned noisy handwritten word images segmented from real-time handwritten documents of Police First Information Statements (FIS), and the output is a number of possible clusters of matching words. Figure 2 shows the FIS and some handwritten words present in that document image. Consequently, a number of pre-processing steps need to be performed on the input images: resizing, greyscale conversion, Gaussian blurring, Canny
Fig. 2 Segmented words from the document image
edge detection, and thresholding of the image for better feature extraction. Then, the feature descriptor was created using Histogram of Oriented Gradient (HOG) features applied to the handwritten word images. To improve the performance, Principal Component Analysis (PCA) was applied for feature selection from the extracted features. Here, we have used K-means clustering to group the matching words together, creating the lexicon-reduced dataset for training the recognition model. Figure 3 shows the methodology of the proposed work.
3.1 Pre-processing

In pre-processing, as the first step, all the input images were converted to greyscale images. This reduces the dimensionality and the model complexity as well. To reduce the noise level of the image, we applied Gaussian blurring on the greyscale image. This low-pass filter removes insignificant details from the image by convolving it with the kernel. Equation (1) represents the Gaussian kernel:

$$
G_{2D}(x, y, \sigma) = \frac{1}{2 \pi \sigma^2}\, e^{-\frac{x^2 + y^2}{2 \sigma^2}} \tag{1}
$$
where x and y are the local indices and σ is the standard deviation of the Gaussian distribution. Then, Canny edge detection was performed on the input image before thresholding to obtain an improved binarization result. In this multistage algorithm, after Gaussian blurring with a 3 × 3 kernel, the intensity gradient of the image is calculated. Then, the unwanted pixels in the image are removed using non-maximum suppression, and hysteresis thresholding is applied to the image to retrieve the real edges. The output of the Canny edge detector is the strong edges of the word image, and it also removes some trivial noise. Finally, thresholding was performed using the Otsu thresholding technique, and the image was resized for better feature extraction.
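Equation (1) translates directly into code; the following sketch builds a normalized Gaussian smoothing kernel (the 3 × 3 size and σ = 1.0 are illustrative choices):

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Build a size x size smoothing kernel from Eq. (1)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]      # local indices
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()   # normalize so blurring preserves overall intensity

kernel = gaussian_kernel(3, 1.0)
```

Convolving an image with this kernel performs the Gaussian blurring step; larger σ gives stronger smoothing.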
Fig. 3 Detailed methodology of the proposed work
3.2 Feature Extraction and Selection

Feature extraction is an important stage because the clustering algorithm gets better results depending on the efficiency of the feature descriptor. Here, the HOG feature descriptor [10] is employed to extract the holistic features of the handwritten Malayalam word. This structure-oriented HOG descriptor extracts features by considering the magnitude and angle of the gradient, and it calculates the gradient orientation in each portion of the image. Equations 2 and 3 represent the formulas for calculating the gradients.
$$
G_x(r, c) = I(r, c+1) - I(r, c-1) \tag{2}
$$
$$
G_y(r, c) = I(r-1, c) - I(r+1, c) \tag{3}
$$
where r and c are the row and column indices, respectively. After calculating the gradients, the magnitude and angle of each pixel in the image are calculated. Equations 4 and 5 depict the formulas for calculating the magnitude and angle, respectively.
$$
\text{Magnitude}(\mu) = \sqrt{G_x^2 + G_y^2} \tag{4}
$$
$$
\text{Angle}(\theta) = \left| \tan^{-1}\!\left( G_y / G_x \right) \right| \tag{5}
$$
For the algorithm to function well, the features extracted from the input image must contain the relevant information. PCA [11] is employed for selecting the relevant features from the extracted features. In this, the principal components are constructed by considering the covariance matrix, eigenvectors, and eigenvalues. The covariance matrix measures the association between each pair of variables, and the eigenvectors depict the directions of the data. Each eigenvalue represents the magnitude or importance of its direction, and the directions with smaller eigenvalues are eliminated for dimension reduction. Equation (6) represents the final feature selection using PCA:

$$
\text{FinalData} = \text{FeatureVector}^{T} \times \text{StandardizedOriginalData}^{T} \tag{6}
$$
where FeatureVector is the matrix that contains the eigenvectors with the highest eigenvalues. Accordingly, the output of this stage is the set of relevant features selected from the extracted features.
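Equations (2)-(5) and the PCA step can be sketched as follows; the random "word images", single-cell orientation histogram, and component count are illustrative stand-ins for the full HOG pipeline used in the paper:

```python
import numpy as np
from sklearn.decomposition import PCA

def gradients(I):
    """Gx, Gy per Eqs. (2)-(3); magnitude and angle per Eqs. (4)-(5)."""
    I = I.astype(float)
    Gx = np.zeros_like(I)
    Gy = np.zeros_like(I)
    Gx[:, 1:-1] = I[:, 2:] - I[:, :-2]        # I(r, c+1) - I(r, c-1)
    Gy[1:-1, :] = I[:-2, :] - I[2:, :]        # I(r-1, c) - I(r+1, c)
    mag = np.sqrt(Gx**2 + Gy**2)              # Eq. (4)
    ang = np.abs(np.arctan2(Gy, Gx))          # Eq. (5), folded to [0, pi]
    return mag, ang

def orientation_histogram(mag, ang, bins=9):
    """One 9-bin gradient-orientation histogram (a single HOG cell)."""
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-12)

rng = np.random.default_rng(0)
words = rng.random((20, 50, 150))             # 20 fake 150x50 word images
feats = np.array([orientation_histogram(*gradients(w)) for w in words])

# PCA keeps only the top components (Eq. (6) in matrix form)
reduced = PCA(n_components=5).fit_transform(feats)
```

A real HOG descriptor computes such histograms per cell and normalizes them over blocks; this sketch keeps a single cell to stay compact.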
3.3 Clustering

Clustering is the final stage of this work, grouping homogeneous words together to create the handwritten Malayalam word dataset; here we employed K-means clustering. K-means is a simple clustering algorithm [12] using the squared-error criterion, and its time complexity is O(n). It is convenient to use K-means for large datasets because of this linear time complexity. The algorithm starts with a random initial partition, and the patterns are assigned to different cluster centres by considering the similarity between them. This process is repeated until convergence occurs. Equation 7 represents the main objective of K-means clustering, where (x1, x2, ..., xn) is a set of observations, each a d-dimensional real vector. K-means clustering partitions these n observations into k (≤ n) homogeneous subgroups by minimizing the within-cluster variance.
$$
\underset{S}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{x \in S_i} \left\| x - \mu_i \right\|^2 = \underset{S}{\operatorname{arg\,min}} \sum_{i=1}^{k} |S_i| \operatorname{Var}(S_i) \tag{7}
$$
In this work, we have segmented handwritten words from document images, and some are repeating words, which are the keywords present in the document. We clustered these homogeneous keywords together by applying K-means clustering. It makes the time-consuming and expensive manual task of creating a labelled dataset from the unlabelled segmented words easier.
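A minimal sketch of this clustering step with scikit-learn; the toy feature vectors stand in for the PCA-reduced HOG features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Toy stand-in for PCA-reduced word features: 3 groups of repeated words
features = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 4))
                      for c in (0.0, 1.0, 2.0)])

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(features)

# Words sharing a label are candidate matches for one labelled-dataset entry;
# singleton clusters would correspond to likely once-off words to discard.
sizes = np.bincount(labels)
```

Grouping by label is what turns the unlabelled segmented words into candidate entries of the lexicon-reduced dataset.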
4 Experiment Analysis and Discussion

In the present work, for the experimentation, we applied unsupervised clustering on an unlabelled dataset containing 300 handwritten words segmented from 8 different FIS. Figure 4 shows some samples of handwritten word images. In Fig. 4, (a) and (b) are matching words; in the same way, (c)-(e) and (l) are other groups of matching words. The remaining images are other word samples taken from the dataset. As a first step, each image is resized to 150 × 50, and the input images are pre-processed to remove noise and to make them suitable for efficient feature extraction. Figures 5 and 6 show the images after Canny edge detection and thresholding. Then, the HOG features are extracted from the image, and Fig. 7 shows the HOG feature visualization of the above image. Here, 3061 features were extracted from each image of size 150 × 50, and PCA reduces the feature dimension. Figure 8 shows the HOG features in two-dimensional space. To overcome the problems associated with the selection of initial centroids in K-means clustering, we applied the K-means++ optimization algorithm to improve the quality of clustering. In unsupervised learning, there is no ground truth to evaluate the performance of the model. Here, the value of k is intuited by experimenting with
Fig. 4 Word samples from the dataset
Fig. 5 Image after edge detection
Fig. 6 Image after thresholding
Fig. 7 HOG feature visualization
two methods, namely the Elbow method and Silhouette analysis [13]. Figure 9 shows the curve formed by the Elbow method. In this curve, the x-axis denotes the number of clusters, and the y-axis is the sum of squared errors (SSE), calculated between the data points and the corresponding cluster centroids. We took k as the point where there is a sudden decrease in the value of SSE, forming an elbow-like structure. Here, we took k = 25 by considering the curve. In Silhouette analysis, the degree of separation between the clusters is calculated. The average distance is calculated between each data point and the points in the same cluster, and between the point and the closest neighbouring cluster, and then a coefficient is computed. The coefficient value ranges from −1 to 1, and good clusters are formed when the coefficient is close to 1. From the experiment, the value of the coefficient comes
Fig. 8 HOG features
Fig. 9 Elbow method
close to 1 when the value of k is more than 30. Figure 10 shows some clusters of similar words formed after applying the proposed method. The experiments were performed in Google Colab Pro with a GPU and a maximum of 32 GB RAM. All the image processing and machine learning experiments were implemented in Python 3.
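Both model-selection heuristics can be sketched with scikit-learn (toy data, not the paper's 300-word feature set):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Four well-separated toy blobs standing in for the word feature vectors
X = np.vstack([rng.normal(c, 0.15, size=(25, 5)) for c in range(4)])

sse, sil = {}, {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                random_state=0).fit(X)
    sse[k] = km.inertia_                 # sum of squared distances (elbow curve)
    sil[k] = silhouette_score(X, km.labels_)

best_k = max(sil, key=sil.get)           # k whose coefficient is closest to 1
```

Plotting sse against k reproduces the elbow curve of Fig. 9, while the silhouette coefficient gives a single score per k to compare directly.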
Fig. 10 Clusters of similar words
5 Conclusion

In the present work, image processing and machine learning techniques were applied to an unlabelled dataset to make it a labelled one with a reduced lexicon size. To date, no benchmark offline Malayalam handwritten word database is available, and this work makes the dataset creation task easier with the help of automated subgrouping of homogeneous words. This work finds its application in offline Malayalam handwriting recognition by contributing to the making of a training dataset for the recognition model. In future, we will employ some more feature extraction techniques to enhance the efficiency of the model and apply deterministic algorithms for clustering as well.
References

1. Hebbi C, Sooraj JS, Mamatha HR (2022) Text to speech conversion of handwritten Kannada words using various machine learning models. In: Bhateja V, Tang J, Satapathy SC, Peer P, Das R (eds) Evolution in computational intelligence. Springer, Singapore, pp 21–33
2. Hamida S, El Gannour O, Cherradi B, Ouajji H, Raihani A (2022) Efficient feature descriptor selection for improved Arabic handwritten words recognition. Int J Electric Comput Eng 12(5):5304–5312
3. Bouibed ML, Nemmour H, Chibani Y (2022) SVM-based writer retrieval system in handwritten document images. Multimedia Tools Appl 81(16):22629–22651
4. Mohammed HH, Subramanian N, Al-Madeed S (2021) Learning-free handwritten word spotting method for historical handwritten documents. IET Image Proc 15(10):2332–2341
5. Guha R, Ghosh M, Singh PK, Sarkar R, Nasipuri M (2021) A hybrid swarm and gravitation-based feature selection algorithm for handwritten Indic script classification problem. Complex Intell Syst 7(2):823–839
6. Khaissidi G, Elfakir Y, Mrabti M, Lakhliai Z, Chenouni D, El Yacoubi M (2016) Segmentation-free word spotting for handwritten Arabic documents. Int J Interact Multimedia Artific Intell 4(1):6–10
7. Sharma K, Sarangi PK, Rani L, Singh G, Sahoo AK, Rath BP (2022) Handwritten digit classification using HOG features and SVM classifier. In: Proceedings of the 2nd international conference on advance computing and innovative technologies in engineering (ICACITE), 2022, IEEE, pp 2071–2074
8. Hmood AK, Suen CY, Lam L (2018) An enhanced histogram of oriented gradient descriptor for numismatic applications. Pattern Recognit Image Anal 28(4):569–587
9. Jino PJ, Balakrishnan K (2017) Offline handwritten recognition of Malayalam district name: a holistic approach. Int J Eng Technol 9:987–994
10. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), IEEE, vol 1, pp 886–893
11. Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdiscip Rev: Comput Stat 2(4):433–459
12. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31(3):264–323
13. Kodinariya TM, Makwana PR (2013) Review on determining number of clusters in K-means clustering. Int J Adv Res Comput Sci Manag Stud 1(6):90–95
MathKnowTopic: Creation of a Unified Knowledge Graph-Based Topic Modeling from Mathematical Text Books M. Srivani , Abirami Murugappan, and T. Mala
Abstract Most mathematical text documents are complex to understand, and so it is very difficult to derive meaningful information from the text. To overcome this problem, MathKnowTopic, a unified knowledge graph-based topic modeling approach, is proposed to understand complex mathematical text documents and derive meaningful insights. A knowledge graph is a directed graph-based structure which depicts the semantic relations between nodes. Topic modeling discovers the hidden semantic topics in unstructured texts. The creation of the Unified Knowledge Graph Topic Model (UKGTM) consists of the following steps: text preprocessing, entity and entity-pair extraction, relation or predicate extraction, construction of the directed knowledge graph, topic model generation using Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic, and identification of the optimum number of topics through evaluation. The dataset used in this paper is based on the mathematical text books of class 1 to class 6. The proposed system is evaluated using performance metrics such as topic coherence score, and cognitive metrics such as knowledge capacity and incrementality. The results indicate that KG+BERTopic (all-MiniLM-L6-v2) renders meaningful topics based on topic coherence score (0.6387). In terms of knowledge capacity cognitive score (0.7680), KG+BERTopic (all-MiniLM-L6-v2) performs well, and KG+BERTopic (Universal Sentence Encoder) performs well based on incrementality cognitive score (0.9925). Keywords Topic modeling · Knowledge graph · Entity extraction · Relation extraction · Semantic relations · Knowledge capacity · Cognitive metrics
1 Introduction

NLP techniques play a major role in the extraction of meaningful and useful insights from text data. These NLP techniques convert unstructured text data into structured data by the application of many algorithms. For the construction of a knowledge graph, computers must understand and analyze natural language. Knowledge graphs are increasingly popular in cognition and human intelligence research [1]. A knowledge graph can be depicted as a semantic network with nodes and edges interlinked with each other; it stores information as triplets of subject, predicate, and object. Probabilistic topic modeling automatically discovers semantic information from unstructured text. Topics are the frequently occurring terms or words in the documents. The topics discovered by the different topic modeling approaches are often unclear and ambiguous, so in this paper a knowledge graph is generated from the mathematical textual documents, and then the topics are generated by topic modeling algorithms. Students must learn to interpret each symbol in the context of a variety of concrete situations, visuals, and languages. Mathematical symbols are not merely abbreviations; they enable us to find and articulate links between concepts [2]. So, in this article, the mathematical textual data is preprocessed, and the UKGTM algorithm is used to generate a knowledge graph and an optimum set of mathematical topics. The significant contributions of this paper are • Extraction of intelligible insights from the unstructured mathematical text. • Creation of MathKnowTopic, a unified knowledge graph-based topic modeling system. • Extraction of an optimum number of concepts from the mathematical books. • Evaluation of the proposed system using cognitive metrics such as knowledge capacity and incrementality. The rest of the paper is structured as follows: Sect. 2 explores the background and related work. Section 3 describes the system architecture, and Sect. 4 elucidates the methodology of the proposed system. Section 5 analyzes the discussion and results, and Sect. 6 concludes the work by presenting relevant future directions.

M. Srivani (B) · A. Murugappan · T. Mala
College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_47
2 Background and Related Work

The most prominent real-world applications of NLP are question answering systems, information extraction systems, machine translation, sentiment analysis, Named Entity Recognition (NER), and so on. Knowledge graphs in combination with topic modeling systems are discussed here. A hybrid knowledge graph via topic modeling [3] was implemented to extract evidence from a knowledge base; the LDA technique is used to design the graph, and the graph is enriched from Google via OpenIE. The issue is the lack of argument mining over discrete knowledge using knowledge graphs. SciKGraph [4] was developed for performing semantic analysis of natural language documents to generate scientific knowledge graphs and extract the most important concepts. For optimality, the thresholds and keyphrases can be fine-tuned, and interactive knowledge graph visualization must be investigated. A dynamic and interactive
knowledge graph [5] has been developed from textual learning resources, and important topics are extracted using a topic identification algorithm. This system lacks the ability to handle huge amounts of textual data. A Bayesian Embedded Spherical Topic Model (ESTM) has been proposed [6], which combines a knowledge graph and word embeddings in a hypersphere for high-quality topic extraction and representation. ESTM makes use of low-dimensional representations of words and entities. In future, probability distributions in ESTM and complete knowledge graphs must be explored. A Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph (SCTKG) [7] has been implemented based on the Conditional Variational Autoencoder (CVAE) technique. To generate better knowledge representations, the Topic Graph Attention (TGA) algorithm was proposed. A Topic-Enriched Knowledge Graph Recommendation System (TEKGR) [8] has been implemented to recommend precise news; this system lacks coverage of more conceptual topics. Some challenges overcome by the proposed system are entity disambiguation, effective handling of textual resources, managing dynamic input data, learning from a multitude of sources (differentiating normal and mathematical content), and interactive visualization of knowledge graphs.

3 System Architecture

The proposed system, MathKnowTopic, deals with the creation of a unified knowledge graph-based topic model, as shown in Fig. 1, from mathematical text documents to derive semantic information depicting the concepts to be learned by the students. The proposed system consists of three phases: generation of the directed
3 System Architecture The proposed system MathKnowTopic deals with the creation of a unified knowledge graph-based topic modeling as shown in Fig. 1 from the mathematical text documents to derive semantic information depicting the concepts to be learned by the students. The proposed system consists of three phases such as generation of directed knowledge graph, construction of topic model, and model evaluation. The first phase consists of the following steps such as text preprocessing, entity and entity pairs extraction, relation or predicate extraction, and construction of directed knowledge graph. The second phase consists of creation of dictionary and corpus, extraction of term document frequency, topic model generation using LDA, Top2Vec, and BERTopic. The third phase is the evaluation of the proposed system using performance metrics such as topic coherence score, and cognitive metrics such as knowledge capacity and incrementality. The dataset is the mathematical text books of class 1 to class 6. Text preprocessing techniques analyzes the raw text data from the books and renders cleaned or filtered text. Entity extraction step extracts the entities from the filtered text by analyzing the text. Entity pairs extraction step extracts the subjects as source and objects as target. Relation or predicate extraction step extracts the semantic relations from the text by initializing matcher with vocabulary, defining patterns, adding pattern to matcher, and applying matcher to the documents. The construction of directed knowledge graph step starts with dataframe creation and extracts the subject, object pairs. The next step is to retrieve the nodes and edges from the knowledge graph and produce the dictionary and corpus by transforming the sequence of sentences and giving to corpora object for charting out words and integer ids. The dictionary is then used to design a bag of words model called corpus. Most frequent words are extracted to
622
M. Srivani et al.
Fig. 1 Unified knowledge graph-based topic modeling system
derive semantics from the text and topic modeling algorithms such as LDA, Top2Vec, and BERTopic are applied and analyzed to generate semantic topics or concepts. The proposed system is then evaluated using topic coherence score, knowledge capacity, and incrementality.
4 Methodology 4.1 Dataset Details and Text Data Preprocessing The dataset consists of the mathematical textbooks from class 1 to class 6 published by the Directorate of School Education, Tamil Nadu.1 These books follow the Samacheer Kalvi syllabus framed for the academic year 2022–2023. Textual preprocessing is an approach to prepare, clean, and analyze text data, making it useful for prediction tasks. Sentence segmentation deals with splitting the document into sentences. Case matching matches words using various regular expressions. Spell-checking involves text splitting and a spell-check process. Tokenization splits the raw text into tokens. The stopword removal process removes unnecessary words that do not add semantics to a sentence. The lemmatization task converts tokenized words to their dictionary form. N-grams such as bigrams and trigrams depict sequences of two or three words, and the wordcloud depicts the frequency of important keywords.
1 https://www.tntextbooks.in/p/school-books.html.
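A minimal stdlib sketch of these preprocessing steps, assuming the heavier tasks (spell-checking, lemmatization) are handled by an NLP library in the actual pipeline; the stopword list here is an illustrative subset:

```python
import re
from collections import Counter

# Illustrative subset; real pipelines use a full stopword list.
STOPWORDS = {"the", "is", "a", "an", "of", "to", "and", "in"}

def preprocess(text):
    """Segment into sentences, tokenize, lowercase, and remove stopwords."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [w for s in sentences
              for w in re.findall(r"[a-z]+", s.lower())
              if w not in STOPWORDS]

def ngrams(tokens, n=2):
    """Bigrams/trigrams: sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = preprocess("The perimeter of a square is four times the side. "
                    "Find the perimeter.")
print(tokens)
print(Counter(tokens).most_common(1))  # word frequencies, as in a wordcloud
print(ngrams(tokens, 2)[:3])
```

The token frequencies feed the wordcloud, and the bigrams/trigrams correspond to the n-gram step described above.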
4.2 Entity Pairs and Relations or Predicate Extraction Entities are objects, places, concepts, events, situations, people, and things. Dependency parsing is performed, and parts of speech are tagged. The nouns and proper nouns are extracted as entities. Entity pairs are the subject–object pairs extracted from the text. The entity-pair extraction process consists of four steps, namely dependency tag and previous token extraction, token looping, subject–object extraction, and updation. Because the entity pairs are subject–object pairs, text preprocessing techniques are necessary to modify the text and remove stop words, pronouns, special characters, digits, and other obtrusive elements. Relations are the connections between the nodes. The main verb or root word is the relation which links the subject and object. The relation extraction process consists of the following steps: root word extraction, predicate extraction, dependency parsing, and relation count extraction.
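The matcher-and-dependency description above suggests a spaCy-style pipeline. As a hedged illustration, the sketch below is a simplified stand-in that works on pre-parsed (token, dependency-label) pairs instead of running a real parser; the labels `nsubj`, `dobj`, `ROOT`, and `compound` follow common dependency-tag conventions:

```python
def extract_triplet(parsed):
    """parsed: list of (token, dep_label). Returns (subject, relation, object).

    Simplified stand-in for dependency-based extraction: compound words
    are kept as prefixes of the following entity, punctuation is skipped,
    and the ROOT verb is taken as the relation.
    """
    subj = obj = rel = ""
    prefix = ""
    for token, dep in parsed:
        if dep == "punct":        # skip punctuation
            continue
        if dep == "compound":     # store compound word as prefix
            prefix = token + " "
            continue
        if dep == "nsubj":        # first entity: subject
            subj = prefix + token
        elif dep == "dobj":       # second entity: object
            obj = prefix + token
        elif dep == "ROOT":       # main verb as relation
            rel = token
        prefix = ""
    return subj, rel, obj

parsed = [("scalene", "compound"), ("triangle", "nsubj"),
          ("has", "ROOT"), ("three", "nummod"), ("sides", "dobj")]
print(extract_triplet(parsed))  # ('scalene triangle', 'has', 'sides')
```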
4.3 Construction of Directed Knowledge Graph A knowledge graph is a semantic network which represents real-world entities. This process consists of the following steps, as shown in Fig. 2.
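The construction step can be sketched with a plain adjacency list; the paper's pipeline presumably uses a pandas dataframe plus a graph library for the spring-layout drawing, so this dict-based version only shows the underlying data structure:

```python
from collections import defaultdict

def build_knowledge_graph(triplets):
    """Directed graph as an adjacency list: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subj, rel, obj in triplets:
        graph[subj].append((rel, obj))
    return graph

triplets = [("square", "has", "sides"), ("circle", "has", "radius"),
            ("square", "is", "shape")]
kg = build_knowledge_graph(triplets)

# Nodes and edges extracted here are what the topic models later consume.
nodes = {s for s in kg} | {o for edges in kg.values() for _, o in edges}
edges = sum(len(v) for v in kg.values())
print(sorted(nodes), edges)
```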
4.4 Process of Construction of Topic Models Topic modeling is an unsupervised learning approach which detects semantic correlations among groups of words in a document. The process of topic model construction is depicted in Fig. 3.
Fig. 2 Process of generation of directed knowledge graph
Fig. 3 Process of generation of topic model
Creation of Dictionary and Corpus Creation of the dictionary involves transforming the sentences in a book into a word list and feeding it into a corpora object. The dictionary is used to create bag-of-words representations called the book corpus.
Extraction of Term Document Frequency Term document frequency renders the document-term matrix, which can be used as input for topic modeling. TF estimates the word frequency, and IDF enhances the weight of occasionally occurring words and reduces the weight of frequently recurring words.
Topic Model Analysis Three different algorithms, LDA, Top2Vec, and BERTopic, are employed for performing topic modeling. LDA is used for generating structured topics from a large corpus of text. Top2Vec and BERTopic automatically derive the number of topics from the content. Top2Vec performs unsupervised clustering and generates the embedded topic, document, and word vectors. The BERTopic modeling technique makes use of transformers, term frequencies, and inverse document frequencies to generate dense topics.
Latent Dirichlet Allocation (LDA) LDA [9] is a probabilistic generative model, and its working process is as follows: for every document in the corpus, (i) estimate the number of words, (ii) select the document's topic mixture from a set of pre-determined topics, (iii) select the topic based on the multinomial distribution of the document, and (iv) choose a topical word based on the multinomial distribution of the topic. Some limitations of LDA include: (i) no correlation between topics; (ii) the number of topics to be generated must be fixed beforehand; (iii) it is static in nature, without progression of topics; and (iv) text preprocessing is mandatory to generate meaningful topics.
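The dictionary-and-corpus step mirrors gensim's `corpora.Dictionary` and `doc2bow`; a stdlib sketch of the same word-to-id mapping, assuming already-tokenized documents:

```python
def build_dictionary(docs):
    """Map each unique token to an integer id (gensim Dictionary style)."""
    token2id = {}
    for doc in docs:
        for tok in doc:
            if tok not in token2id:
                token2id[tok] = len(token2id)
    return token2id

def doc2bow(doc, token2id):
    """Bag of words: list of (token_id, count) pairs for one document."""
    counts = {}
    for tok in doc:
        tid = token2id[tok]
        counts[tid] = counts.get(tid, 0) + 1
    return sorted(counts.items())

docs = [["square", "sides", "square"], ["circle", "radius"]]
dictionary = build_dictionary(docs)
corpus = [doc2bow(d, dictionary) for d in docs]
print(corpus)  # [[(0, 2), (1, 1)], [(2, 1), (3, 1)]]
```

The resulting corpus is exactly the document-term input that LDA consumes.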
These limitations can be addressed by using the Top2Vec and BERTopic modeling techniques, which offer the following advantages: (i) the number of topics to be generated is revealed automatically, (ii) text preprocessing tasks are not required, (iii) they contain many built-in functions for easy processing and can be applied to short texts, and (iv) they render more informative topics.
Top2Vec Top2Vec [10], short for distributed representations of topics, is a recent topic modeling algorithm which incorporates the word embedding process, dimensionality reduction techniques such as Uniform Manifold Approximation and Projection (UMAP), and density-based cluster analysis. The steps of Top2Vec are as follows: (i) design a document and word embedding model using sentence encoders, (ii) reduce the dimensionality using UMAP, (iii) generate dense groups of words as clusters which represent the topics, (iv) estimate the centroid of each topic or cluster as the topic vector, and (v) discover the word vectors similar to the topic vector.
BERTopic BERTopic [11] is a dynamic topic modeling technique which is applied to a large corpus to extract semantically rich topics that evolve over time. The working process of BERTopic modeling is to (i) generate textual document embeddings using one of the sentence transformer models, (ii) reduce the dimensionality of the embeddings using UMAP and develop semantic clusters of documents using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and (iii) clip and minimize the topics using term and inverse document frequencies, and also enhance the term coherence employing Maximal Marginal Relevance.
MathKnowTopic: Creation of a Unified Knowledge …
625
Purpose of using BERTopic Its Dynamic Topic Model captures the development of topics in a sequentially ordered corpus of texts. It increases the quality of topics and the speed, and it performs well. It builds on a Pretrained Language Model (PLM), so there is no need for preprocessing.
4.5 Unified Knowledge Graph Topic Modeling Approach (UKGTM) The UKGTM algorithm (Algorithm 1) takes as input the preprocessed and filtered unstructured text and generates the semantic network-based knowledge graph and the optimum number of topics.

Algorithm 1 Unified Knowledge Graph Topic Modeling (UKGTM)
• Input: Preprocessed and filtered unstructured text
• Output: Knowledge graph-based optimum number of topics
1. For each sentence S_i in document D_i, where S_i ∈ {1, 2, 3, ..., n}:
2. Calculate T_p = {SUB_i, OBJ_i, R_i}, where T_p is the triplet, SUB_i the subjects, OBJ_i the objects, and R_i the relations.
Extraction of Relations
3. Configure M_c with V, where M_c is the matcher and V the vocabulary.
4. Define the pattern PAT = {ROOT, prep, agent, ADJ}, where prep is a preposition and ADJ an adjective.
5. Add PAT to M_c, call M_c on D_i, and pass S_i to M_c to extract the matching positions.
6. Repeat steps 3–5 to generate relations for all sentences.
Extraction of Entity Pairs
7. Loop through the tokens T_k in S_i, where T_k ∈ {1, 2, 3, ..., n}.
8. If T_1 is punctuation, pass to the next token T_2.
9. If T_2 is a compound word, store it as a prefix.
10. If T_3 == SUB_i, store it as E_1, the first entity.
11. If T_4 == OBJ_i, store it as E_2, the second entity.
Construction of Directed Knowledge Graph
12. Design a dataframe with E_1, E_2, and R_i, and produce a network.
13. Extract all nodes and edges as tokens T_k and give them as input to the topic modeling algorithms.
Construction and Analysis of Topic Models
14. Design a Dict to map each T_k to a token id, where Dict refers to the dictionary.
15. Generate a corpus, which is a collection of documents D_i.
16. Develop a bow and pass T_k to doc2bow(), where bow indicates bag of words.
17. Build the LDA model and determine the optimal number of topics using the coherence score.
18. Build the Top2Vec model using the distiluse-base-multilingual-cased embedding model.
19. Build the BERTopic model using the all-MiniLM-L6-v2 embedding model.
20. Determine the optimal number of mathematical concepts.
21. Repeat the same procedure for all documents and extract the optimum topics.
4.6 Model Evaluation Topic coherence is a significant factor in determining how semantically similar the high-scoring keywords in a specific topic are. The Cv coherence measure uses cosine similarity and Normalized Pointwise Mutual Information (NPMI). The UMass coherence score, shown in Eq. (1), estimates how often two words appear together in the corpus:

C_UMass(w_i, w_j) = \log \frac{D(w_i, w_j) + 1}{D(w_i)}    (1)

The UCI coherence measure in Eq. (2) determines the words that occur frequently using a sliding window:

C_UCI(w_i, w_j) = \log \frac{P(w_i, w_j) + 1}{P(w_i) \cdot P(w_j)}    (2)

Certain cognitive metrics [12] such as knowledge capacity and incrementality are proposed to evaluate the system. Knowledge capacity, shown in Eq. (3), deals with the correctness or completeness of the knowledge exhibited by the knowledge graph and topic models:

Know\_Capacity = N_e + N_r + N_T    (3)

where N_e indicates the total number of entities, N_r the total number of relations, and N_T the total number of topics generated by the topic modeling algorithms. The knowledge capacity score is normalized to the range 0–1. Incrementality is the amount of information that is actually employed in the performance of a set of tasks; it is a count of the dominant topic or term score.
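The UMass score in Eq. (1) can be computed directly from document co-occurrence counts; a minimal sketch, representing documents as sets of words, with the +1 smoothing taken from the formula:

```python
import math

def umass(w_i, w_j, docs):
    """C_UMass(w_i, w_j) = log((D(w_i, w_j) + 1) / D(w_i)).

    D(w) counts documents containing w; D(w_i, w_j) counts documents
    containing both words.
    """
    d_i = sum(1 for d in docs if w_i in d)
    d_ij = sum(1 for d in docs if w_i in d and w_j in d)
    return math.log((d_ij + 1) / d_i)

docs = [{"square", "sides"}, {"square", "area"}, {"circle", "radius"}]
print(round(umass("square", "sides", docs), 4))  # log((1 + 1) / 2) = 0.0
```

In practice the pairwise scores are averaged over the top words of each topic to give a per-model coherence value.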
5 Discussion and Results In this paper, the UKGTM system is proposed to interpret hidden semantic knowledge from unstructured data. The dataset comprises the three-term mathematical textbooks from class 1 to class 6. The implementation is carried out in the Python programming language using Jupyter Notebook and the Google Colab platform. The raw, initial data from the mathematical textbooks has a vocabulary size of 261 containing 563923 words. The text preprocessing tasks are performed on the raw data to generate processed data with a vocabulary size of 166 containing 532134 words. A sample knowledge graph for the class 6 maths term textbooks has been generated as shown in Fig. 4. The knowledge relation graph has been generated with the source and target for the relation "find" as shown in Fig. 5. Some of the important concepts are radius, square, subtraction, shapes, perimeter, scalene triangle, diameter, rectangle, and ratio. The graph is created as a spring layout with node size 1500 and k = 0.5 (the distance between nodes). The number of entities and relations for class 1 is 278,
MathKnowTopic: Creation of a Unified Knowledge …
627
Fig. 4 Knowledge graph for class 6 maths
157; for class 2, 565 and 356; for class 3, 2437 and 1676; for class 4, 172 and 104; for class 5, 826 and 534; and for class 6, 440 and 251. For the combined knowledge graph, the number of entities and relations is 4420 and 3066, respectively. The extracted nodes and edges from the knowledge graph are given as input to the topic modeling algorithms. The Gensim package is used for performing LDA topic modeling. Table 1 depicts the top 5 topics with 10 topical words each. In total, 40 topics have been rendered by LDA topic modeling. The LDA model whose number of topics has the highest coherence score serves as the optimal model. Figure 6 is used to determine the optimal number of topics. The graph depicts that the optimal number of topics is 20, with a coherence score of 0.6674, over topic counts ranging from 2 to 38 as given in Table 2. Top2Vec generates a semantic space with a meaningful number of topics. The top 5 topics generated by Top2Vec are given in Table 3. Table 4 indicates the top 5 topics generated by BERTopic. BERTopic generates more meaningful topics compared to the other models. From Table 5, it is clear that BERTopic (Universal Sentence Encoder) generates a large number of semantic topics.
628
M. Srivani et al.
Fig. 5 Relation graph for relation “find”

Table 1 Top 5 topics generated using LDA
| Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 |
| Step(0.157) | Multiplication(0.126) | Table(0.153) | Shapes(0.102) | Times(0.089) |
| Place(0.143) | Centimeter(0.092) | Different(0.078) | Square(0.088) | Numbers(0.084) |
| Value(0.141) | Addition(0.055) | Minutes(0.054) | Ratio(0.079) | Pattern(0.066) |
| Thousand(0.045) | Division(0.050) | Number(0.032) | Lines(0.065) | Thousands(0.041) |
| Digit(0.029) | Amount(0.047) | Subtraction(0.029) | Proportion(0.058) | Predecessor(0.035) |
| Numbers(0.027) | Geometry(0.042) | Diagonals(0.025) | Parts(0.048) | Rectangle(0.033) |
| Ones(0.027) | Divisible(0.038) | Fractions(0.024) | Distance(0.034) | Product(0.030) |
| Hundred(0.026) | Quantity(0.030) | Properties(0.023) | Diameter(0.035) | Order(0.028) |
| Tens(0.018) | Base(0.022) | Processing(0.022) | Equal(0.028) | Even(0.022) |
| Units(0.017) | Portions(0.017) | Point(0.017) | Round(0.019) | Successor(0.021) |
Table 2 Topic coherence scores using LDA
| Number of topics | Coherence score |
| 2 | 0.6649 |
| 8 | 0.6647 |
| 14 | 0.6637 |
| 20 | 0.6674 |
| 26 | 0.6627 |
| 32 | 0.6592 |
| 38 | 0.6634 |
Fig. 6 Topic coherence scores using LDA

Table 3 Top 5 topics generated by Top2Vec
| Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 |
| Numbers(0.9067) | Step(0.6365) | cm(0.8692) | ml(0.4436) | Table(0.8269) |
| Multiplication(0.5907) | Place(0.6362) | Square(0.4355) | Fill(0.3186) | Lines(0.7746) |
| Ratios(0.4624) | Unit(0.6286) | Equal(0.3784) | Thousand(0.3177) | Perimeter(0.3577) |
| Fractions(0.4591) | Area(0.6264) | Triangle(0.3419) | Sides(0.3041) | Order(0.2565) |
| Prime(0.4455) | Pattern(0.5872) | Digit(0.3016) | Symmetry(0.2749) | Units(0.2563) |
| Subtraction(0.3781) | Length(0.5659) | Term(0.2750) | Amount(0.2625) | Ratio(0.2551) |
| Addition(0.3673) | Point(0.5630) | Ones(0.1985) | Times(0.2408) | Form(0.1919) |
| Perimeter(0.3113) | Shapes(0.5588) | Parts(0.1975) | Difference(0.2318) | Value(0.1883) |
| Rectangle(0.2999) | Lines(0.5505) | Tree(0.1763) | Ratios(0.2289) | Correct(0.1865) |
Table 4 Top 5 topics generated by BERTopic
| Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 |
| Ratio(0.1440) | Perimeter(0.2565) | Symmetry(0.2543) | Triangle(0.2222) | Parallel(0.7358) |
| Proportion(0.0736) | Area(0.1679) | Rotational(0.1006) | Angled(0.1219) | Circles(0.4789) |
| Simplest(0.0579) | cm(0.0379) | Mirror(0.0904) | Scalene(0.0883) | Combinations(0.1257) |
| Equivalent(0.0153) | Shapes(0.0304) | Reflection(0.0876) | Sides(0.0622) | Segment(0.1000) |
| Simplify(0.0427) | Tangrams(0.0264) | Object(0.0725) | Equilateral(0.0602) | Division(0.0820) |
| Mixture(0.0331) | Square(0.0264) | Symmetry(0.0456) | Cubes(0.0539) | Multiples(0.0659) |
| Concrete(0.0331) | Fractions(0.0264) | Translation(0.0456) | Isosceles(0.0539) | Algebra(0.0590) |
| Unitary(0.0304) | Distributive(0.0240) | Axis(0.0446) | Inequality(0.0539) | Radius(0.0476) |
| Extremes(0.0291) | Collinear(0.0225) | Line(0.0410) | Symbols(0.0468) | Subtraction(0.0336) |
Table 5 Number of topics generated by algorithms
| Topic modeling embedding models | Number of topics generated |
| LDA | 40 |
| Top2Vec | 119 |
| BERTopic | 163 |
| BERTopic (roberta) | 116 |
| BERTopic (universal sentence encoder) | 194 |
| BERTopic (glove) | 145 |
| BERTopic (all-MiniLM-L6-v2) | 170 |

Table 6 Topic coherence scores
| Algorithms | c_v | u_mass | c_uci |
| KG+LDA | 0.6877 | −19.0102 | −13.6899 |
| KG+Top2Vec | 0.6221 | −0.8529 | −6.2405 |
| KG+BERTopic | 0.6228 | −0.7421 | −6.2439 |
| KG+BERTopic (roberta) | 0.6310 | −0.7126 | −6.1482 |
| KG+BERTopic (universal sentence encoder) | 0.6336 | −0.7680 | −6.1870 |
| KG+BERTopic (glove) | 0.6216 | −0.7799 | −6.4705 |
| KG+BERTopic (all-MiniLM-L6-v2) | 0.6387 | −0.7253 | −6.1613 |
Table 6 depicts the topic coherence scores of the different topic modeling algorithms when combined with knowledge graphs. The results indicate that the algorithms KG+LDA and KG+BERTopic (all-MiniLM-L6-v2) render meaningful mathematical topics. For the proposed UKGTM system, Fig. 7 and Table 7 indicate that, based on the cognitive metrics knowledge capacity and incrementality, KG+BERTopic and KG+BERTopic using the Universal Sentence Encoder outperform the other algorithms.
Table 7 Cognitive metrics scores for different algorithms
| Algorithms | Knowledge capacity | Incrementality |
| KG+LDA | 0.7526 | 0.0402 |
| KG+Top2Vec | 0.7605 | 0.7809 |
| KG+BERTopic | 0.7649 | 0.9925 |
| KG+BERTopic (roberta) | 0.7602 | 0.8619 |
| KG+BERTopic (universal sentence encoder) | 0.7680 | 0.8808 |
| KG+BERTopic (glove) | 0.7631 | 0.8784 |
| KG+BERTopic (all-MiniLM-L6-v2) | 0.7656 | 0.7631 |

Fig. 7 Coherence and cognitive metrics
6 Conclusion and Future Work MathKnowTopic is developed to indicate the different mathematical concepts that have to be learned by school students. The proposed framework deals with the creation of the UKGTM system for analyzing mathematical text documents. The system deals with the extraction of semantics and topical structures. The framework explains how text data is analyzed conceptually to yield semantic knowledge. The mathematical textbook data is effectively handled by generating meaningful knowledge graphs and conceptual topics. Cognitive metrics such as knowledge capacity and knowledge utilization are evaluated for the proposed system. One challenge identified in this work is the generation of some special characters after text preprocessing. In the future, the UKGTM system can be further improved by performing semantic cluster analysis for retrieving similar concepts. Acknowledgements The authors gratefully acknowledge DST, New Delhi, for providing financial support to carry out this research work under the DST-INSPIRE Fellowship scheme. One of the authors, Ms. M. Srivani, is thankful to DST for the award of the DST-INSPIRE fellowship.
References
1. Ji S, Pan S, Cambria E, Marttinen P, Philip SY (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3070843
2. Wu M, Jin C, Hu W, Chen Y (2021) Enhanced language model with hybrid knowledge graph for mathematical topic prediction. https://doi.org/10.22541/au.163491250.03226531/v1
3. Li W, Abels P, Ahmadi Z, Burkhardt S, Schiller B, Gurevych I, Kramer S (2021) Topic-guided knowledge graph construction for argument mining. In: 2021 IEEE international conference on big knowledge (ICBK). IEEE, pp 315–322. https://doi.org/10.1109/ICKG52313.2021.00049
4. Tosi MDL, dos Reis JC (2021) SciKGraph: a knowledge graph approach to structure a scientific field. J Inf 15(1):101109. https://doi.org/10.1016/j.joi.2020.101109
5. Badawy A, Fisteus JA, Mahmoud TM, Abd El-Hafeez T (2021) Topic extraction and interactive knowledge graphs for learning resources. Sustainability 14(1):226. https://doi.org/10.3390/su14010226
6. Ennajari H, Bouguila N, Bentahar J (2021) Combining knowledge graph and word embeddings for spherical topic modeling. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3112045
7. Qiao L, Yan J, Meng F, Yang Z, Zhou J (2020) A sentiment-controllable topic-to-essay generator with topic knowledge graph. arXiv preprint arXiv:2010.05511. https://doi.org/10.48550/arXiv.2010.05511
8. Lee D, Oh B, Seo S, Lee KH (2020) News recommendation with topic-enriched knowledge graphs. In: Proceedings of the 29th ACM international conference on information and knowledge management, pp 695–704. https://doi.org/10.1145/3340531.3411932
9. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993–1022. https://doi.org/10.1162/jmlr.2003.3.4-5.993
10. Angelov D (2020) Top2vec: distributed representations of topics. arXiv preprint arXiv:2008.09470. https://doi.org/10.48550/arXiv.2008.09470
11. Grootendorst M (2022) BERTopic: neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794. https://doi.org/10.48550/arXiv.2203.05794
12. Wray RE, Lebiere C (2007) Metrics for cognitive architecture evaluation. In: Proceedings of the AAAI-07 workshop on evaluating architectures for intelligence, pp 60–66
Health Data, Social Media, and Privacy Awareness: A Pilot Study Prachotan Reddy Bathi and G. Deepa
Abstract Despite several disputes and the overall perceived lack of privacy on social media platforms among the general population, they are now widely used around the globe and have become commonplace. Vast volumes of data in various formats are being posted on these platforms. Over the past decade, the widespread adoption and proliferation of online medical forums and social platforms illustrate the variety of information being shared. Usually, when dealing with sensitive medical information, processing and maintaining the data calls for high-quality security and privacy safeguards to ensure that all ethical and legal criteria are met. We were curious whether medical information on such socially available platforms goes through the same quality of privacy safeguards. Our literature study identified a significant lapse in the privacy attitude of users who post medical information online. In this paper, we report on the responses from our survey about how medical information and its privacy on social media are perceived. We look at the drivers and impediments to personal health information being shared online, focusing on privacy-related issues. We then use our findings to aid in developing a workflow for a tool that provides a general social media user with access to state-of-the-art anonymizing and privacy-protecting techniques. Keywords Social media · Health data · Privacy-preserving tools · Privacy awareness
P. R. Bathi (B) · G. Deepa Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576 104, India e-mail: [email protected]; [email protected] G. Deepa e-mail: [email protected] Present Address: P. R. Bathi Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_48
1 Introduction The value and importance of data privacy have grown immensely with the increase in Internet usage over the years. Mobile applications and social media sites, which have come closer to the end users, are acquiring and keeping increasingly more personal information about the users on the pretext of delivering services. Furthermore, certain sites and features may go above and beyond the users' understanding of data gathering and utilization, exposing them to lower privacy than anticipated. Unfortunately, a few other platforms might not put proper controls in place to safeguard the data they have gathered, leading to a data leak that might jeopardize the users' privacy. Various health and medical organizations have accumulated a considerable amount of medical data over the past few decades. The adoption of electronic health records (EHRs) has made patients' health information readily available and accessible from nearly anywhere around the globe. It has also transformed the structure of health records and hence health care [1]. The introduction of the Internet and of inexpensive, efficient, and small computers permitted convenient and rapid access to medical data and increased adoption significantly in the early 1990s, further establishing the groundwork for web-based EHRs [2]. Currently, social media is widespread in our digital society, fast developing, and significantly affecting the healthcare environment. It has drastically changed how information relating to health care is delivered and consumed. With the advent of data-mining applications and pattern-recognizing algorithms, the volume and variety of raw medical information generated on social platforms are being converted into valuable decision-making data. According to statistics from the UK, Facebook is the fourth most popular source of health information [3].
While several laws and ethical norms govern how data in electronic health records are handled, there is little protection for health data on social media. Although health forums and social media have considerably assisted patients and caregivers by providing critical medical assistance and emotional support, we contend that participation in these tools without the requisite understanding may constitute a health data privacy vulnerability for the users. Given the significance of personal understanding and awareness of privacy issues and the criticality of health data, it is essential to evaluate the awareness of users handling health information online. It is also observed that social media is altering the interaction between message providers and message consumers [4]. This implies that there is a need to exert some supervision over online health interaction in order to maintain credibility and reliability. In this paper, we focus on assessing the privacy awareness of social media users interacting with health data and also propose a solution to the privacy and reliability issues concerning health information on social media. We first introduce existing standards, protocols, and research about handling medical information online. Further, we establish our research questions and explain the methods of the study. We then extend our discussion to the results of the survey. Following this, the proposed project is presented from a bird's-eye view. Finally,
the paper discusses how each proposed project component aims to solve the concerns listed and observed in the study.
2 Background 2.1 Social Media and Health Data The Internet ecosystem, particularly the advent of social media, has brought about many valuable interventions. Social media platforms have made it easy for patients, their families, and friends to post personal information relating to their health and that of others. Also, healthcare providers may utilize a social media site to diagnose or treat patients; Facebook has, for instance, aided in the diagnosis of common skin disorders and of uncommon congenital illnesses that need extensive care [5]. It is found that roughly 70% of US healthcare organizations utilize social media for community involvement activities such as fundraising, customer support, information dissemination, patient education, and marketing new services, with Facebook and YouTube being the most extensively used [6]. Patients have greatly benefited from using social media for health-related functions. It is reported that 74% of all Internet users in the USA use social platforms, and browsing for health-related information has grown into a popular Internet activity. Social media provides a place for patient interaction that is not limited to the hospital or local clinic. When patients read about the experiences of other patients, they might feel empowered and uplifted, which is essential for patients suffering from chronic diseases. Sites such as Facebook serve patients for a number of purposes, including education, information gathering, networking, research, support, goal setting, and personal progress monitoring [7]. Studies suggest that the usage of social media-based healthcare interventions has a positive impact; however, owing to the complexity of data shared on social media, which includes a user's relationships and many other personal details, massive volumes of user-generated data risk compromising users' privacy [8].
According to research on the sharing of sensitive health information on Facebook, individuals freely sought and shared behavioral, psychological, and genomic information. An intriguing conclusion drawn from this research was that, while pursuing sensitive health data via online postings, the majority of Facebook users involved in the study openly identified themselves by giving their names, photos, and addresses [9].
2.2 Existing Research A plethora of research has been conducted on privacy-related views concerning organization-initiated healthcare information, especially sharing electronic health
records within data systems. Existing research on safeguarding user privacy in healthcare information systems may be divided into three groups [10]. These categories and the sample situations they deal with are summarized in Table 1. Consumers' support for organization-led initiatives is driven by two factors: (1) possible advantages and (2) security and privacy concerns [11]. According to recent research, individual traits such as demographic considerations and past experience using the Internet to manage one's health care might influence people's views of privacy and security [12]. Anonymizing tools and techniques such as k-anonymity [13] and l-diversity [14] have also been the subject of extensive research. A combination of the various anonymization methods (see Table 2) is essential to protect user privacy robustly. Personal Health Information Sharing on Social Media Less is known about the role of privacy in health information activities on social media. Users of PatientsLikeMe.com have expressed concern about the possible consequences of revealing their sensitive health data, such as discrimination by employers, insurance firms, or family or friends [15]. This concern was heightened in the case of stigmatized conditions such as HIV or mental problems. Participating in an informed consent procedure was associated with users' likelihood of harboring pro-sharing sentiments (e.g., posting medical information can benefit me and also assist similar patients) [16]. The COVID-19 pandemic has further narrowed the gap between healthcare information and social media. User interactions with social media have significantly increased; users requested drugs, hospital availabilities, at-home remedies, and diagnostic suggestions on social media.

Table 1 Existing research categories and example situations addressed
| Category | Example/situation |
| Defending against internal abuse of electronic health data | Confidential information is disclosed for non-medical purposes by hospital workers with access to patients' records |
| Defending against unauthorized access to electronic health data | Attackers breach hospital records or listen in on network communications |
| Defending against re-identification attacks against published electronic health records | Malicious actors with accessibility to de-identified medical data released for research reasons deduce the identities of data owners from a collection of unguarded quasi-identifiers |

Table 2 Various anonymization methods
| Data anonymization | Reduces the link between the user and the user's data |
| User anonymization | Allows the user's identity to be kept private |
| Communication anonymization | Conceals the sender–receiver relationship |
| Unlinkability | Anonymizes the sender–data relationship |

The panic mode
that the pandemic has brought with it diverted users' attention from privacy. This calls for a new understanding of health data-sharing attitudes among social media users. Developing usable privacy-enhancing elements will be crucial in assisting patients in remaining safe while meeting their perceived benefits. Our research aims to present a more comprehensive picture of sharing personal health information (PHI) by studying sharing behavior after the pandemic. This study also demonstrates a novel tool that may be implemented on health data-sharing platforms to assist individuals in weighing the risks and advantages of exposing their personal health information online.
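Section 2.2 mentions k-anonymity [13] among the anonymization techniques such a tool could build on: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. A hedged stdlib sketch (the field names are illustrative, not taken from the study):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in >= k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age_range": "20-29", "zip": "576***", "condition": "asthma"},
    {"age_range": "20-29", "zip": "576***", "condition": "diabetes"},
    {"age_range": "30-39", "zip": "576***", "condition": "asthma"},
]
# One (age_range, zip) group has only a single record, so 2-anonymity fails.
print(is_k_anonymous(records, ["age_range", "zip"], 2))  # False
```

In practice, quasi-identifiers are generalized (age ranges, truncated zip codes) until a check like this passes for the chosen k.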
3 Methods

We employed QuestionPro to develop an online survey with three question segments: (1) Background and Demographics, (2) Social Media Activity and Privacy Perception, and (3) Health Data Privacy Perception. All participants could respond to the first two sections, but only those who reported that they had interacted with personal health data on social media were allowed to complete the last portion. We shared the questionnaire through social media channels to ensure that the respondents actually used social media platforms. It has been reported that incentives do not improve response quality [17], which motivated us not to promise any monetary benefits to the respondents. Further, each respondent could take part in the survey only once. The average response time was 13 min. The survey received responses from 79 participants in total. Regular reviews of the response patterns allowed for the removal of a case if any of the following warning signs appeared: (1) respondents who finished the survey within 4 min, (2) hasty answers in which the respondent hurried through the survey by choosing similar patterns at random, and (3) nonsensical or random text entered in the input fields for open-ended questions. Following data cleaning, 31 responses were eliminated (including incomplete ones), leaving 52 responses for further statistical analysis. The QuestionPro platform offered a wide range of statistical analysis and visualization toolsets, which were used alongside Python notebooks with packages such as pandas. Of the filtered respondents, 40.38% were women, 57.69% were men, and 1.92% identified as gender non-conforming. The majority of participants were young, with 71.15% falling between the ages of 18 and 27; ages 38 to 50 made up 9.62% of the respondents.
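The screening rules above can be sketched with pandas, which the study mentions using. The column names and rows below are hypothetical (not the actual QuestionPro export schema), and the nonsensical-answer rule was applied by manual review; it is represented here only by a missing-value check:

```python
import pandas as pd

# Hypothetical response log; columns are illustrative only.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "minutes_taken": [13, 3, 9, 15],
    "open_ended":    ["useful answer", "asdf", "detailed reply", None],
})

# Rule 1: drop speeders who finished under 4 minutes.
# Rule 3 (partial): drop responses with empty open-ended fields;
# truly nonsensical text still needs a human pass.
clean = df[(df["minutes_taken"] >= 4) & df["open_ended"].notna()]
print(len(clean))  # 2
```

The same boolean-mask pattern extends naturally to the pattern-based rule (2), e.g., by flagging respondents whose Likert answers have zero variance.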
By examining the actual behavior of social media activity, our study aims to give a more comprehensive understanding of views about sharing personal health information. We specifically pose the research questions (RQs) listed below.
640
P. R. Bathi and G. Deepa
• RQ1: To what extent are users aware of privacy settings on social media? How easy is it for them to adopt new practices regarding privacy tools?
• RQ2: Are users aware of the risks associated with privacy breaches due to sharing health information online?
• RQ3: What perceived downsides do privacy-protecting techniques like anonymization come with?
4 Discussion on Survey Responses

RQ1: To what extent are users aware of privacy settings on social media? How easy is it for them to adopt new practices regarding privacy tools?

Regarding Internet privacy and security, we assessed people's technological literacy. Five actions meant to safeguard digital privacy were outlined, such as "Picking secure passwords to safeguard your online profiles" and "Using the Internet without having your online activity monitored." On a scale from 1 (Never successful) to 5 (Always successful), respondents indicated how successful they thought they were at each task. The average score across the tasks was 2.4, suggesting that although users tried techniques that help protect their privacy, they were usually not very successful in their attempts. Participants were also asked to report the actions they take in real life versus those they take on social media to protect their identity, in order to better identify whether users' perception of privacy or risk-to-benefit ratios changes (see Fig. 1).

RQ2: Are users aware of the risks associated with privacy breaches due to sharing health information online?

Individuals' thoughts, views, and experiences have become significant sources of health data due to the widespread usage of social media. As previously stated, these data are being utilized in various ways to improve our knowledge of health disorders and better help persons suffering from illness. Unfortunately, this exponential growth in access to naturalistic data brings a number of crucial ethical problems. In today's digitally connected world, the security, secrecy, and acceptable use of personal digital data is a key concern. This is reflected in recent legal reforms, such as the European Union's adoption of the General Data Protection Regulation (GDPR). Irrespective of laws, regulations, and checks in place, the privacy attitudes of the individual play a crucial role.
The same applies to sharing personal health information. When the survey respondents were asked which information pertaining to them they considered most important, medical information and health records received a mean rank of 4.3 (on a scale of 1 to 5, where 1 is "Most Important"). In contrast, ID information received a mean rank of 2.25, i.e., it was considered considerably more important. This trend reflects the respondents' general lack of awareness regarding health data privacy, which might result from a usually high benefit-to-risk ratio when it comes to sharing health information online.
Fig. 1 Privacy-protecting activities users took up in daily life and on social media
Respondents were then shown a series of social media posts from the second wave of the pandemic. We asked the participants which attributes from those posts they considered sensitive (see Fig. 2). Following up on this question, we then asked how concerned they would be if their personal health information shared on social media were compromised. We observed a mean level of concern of 37.38 (on a scale of 0 to 50, 0 being low and 50 being high). Once the respondents were made aware of the sensitive information such social media posts contain, the balance tipped to the risk side of their attitude. This is a strong indicator of how people's perception of privacy changes when they are given proper awareness and usable tools to protect it.

RQ3: What perceived downsides do privacy-protecting techniques such as anonymization come with?

While it is highly important to comply with ethical and privacy regulations when it comes to health data, it is equally essential to ensure data quality, integrity, and authenticity. While mining social media for health data has proven extremely helpful, quality challenges are unavoidable on social media. Consider a social media user, Bob. Bob knows the risks that posting personal health information on social media carries, but still wishes to share his experiences from his recent thrombolysis anonymously. He removes all his identifiers (such as name, contact details, and hospital details) before posting on a health forum. We wanted to understand how other users on the platform perceive this information. Do they want to trust
Fig. 2 Perceived sensitive attributes from social media posts containing health information
this information and follow the approach that Bob posted? We asked our participants whether posting medical information or medical requests using pseudonymous data, or providing limited information, reduces the genuineness of such posts/requests. Consistent with our argument, a mean score of 3.57 was reported (on a scale of 0 to 5, 5 being "Strongly Agree"). We then asked the participants whether employing some means to authenticate such medical information before posting would help keep away fake information/requests and help genuine posts. A third of the respondents indicated that they would be highly likely to trust such posts (see Fig. 3).
Fig. 3 Likeliness to trust authenticated medical information posts
5 Proposed Tool

Our preliminary study indicates a significant lack of awareness and tools to protect health data privacy while posting on social media. We found an expectedly high benefit-to-risk ratio among the participants. Participants further emphasized how tools that authenticate medical information and protect their identity make the process much safer. Data sharing and accessibility have evolved into essential components of biological and clinical research. Nonetheless, private information regarding one's medical history, diagnosis, and prescriptions is frequently included in patient data [18]. Several rules and laws have been enacted worldwide to preserve people's privacy. The General Data Protection Regulation (GDPR) provides mechanisms for maintaining patient privacy when exchanging medical data and controls access to sensitive data. Data anonymization utilizing readily accessible techniques is a critical means to secure data privacy. For third parties to use patient data without risking privacy, the data must be anonymized in a way that keeps the data's properties while preserving the person's privacy and identity. Although anonymization is a crucial step in maintaining privacy, there are not many tools accessible that allow non-expert users to perform it. As stated previously, most research around health data anonymization and privacy deals with organization-initiated processes. There are no identified tools that individuals can use to share their health data securely. While users can try using pseudonymous data while sharing, it comes at the cost of authenticity. We thus propose a workflow for a tool that brings privacy-preserving techniques such as encryption and anonymization closer to the users while also helping them meet their integrity and authenticity goals.
5.1 Workflow

Sociomizer is designed for non-domain professionals who want to maintain privacy when posting health-related information online. We designed this tool to integrate easily with existing social media platforms as a plugin or to be used as a standalone web application. The workflow can be grouped into three layers (see Fig. 4). In the first layer, the users are authenticated, and a profile with their established details is created. Users can then update their profiles' medical information and other related information. They can then choose to have their information verified by their medical practitioner, who is required to have a practitioner account on the tool with privileges to authorize patients and their data. Users can also consent to share their anonymized data for research purposes. Tool users can now update the information they would like to share, and a unique identifier and URL are generated for easy and verified sharing. The second layer deals with back-end processes such as encryption and anonymization. The authenticated medical information is encrypted and stored in a
Fig. 4 Proposed workflow of the tool
database. When the user wishes to share certain content from their profile, the other attributes are obscured from people viewing it. Users can also connect to their social media groups and thus pick which group of trusted users can view the information. Data collected from users who have consented to share for research purposes will be processed using anonymization algorithms. This results in high-quality, authenticated, and anonymized health data at the third layer.
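The paper does not specify the second layer's cryptographic design, so the sketch below is only one hedged illustration of its two goals: deriving a stable pseudonymous identifier for verified-but-anonymous sharing, and obscuring every attribute the user did not choose to share. The field names, the secret key, and the choice of HMAC-SHA256 are assumptions, not Sociomizer's actual implementation:

```python
import hmac
import hashlib

SECRET_KEY = b"server-side-secret"  # hypothetical key held by the back end

def pseudonym(user_id: str) -> str:
    """Stable pseudonymous identifier: same user always maps to the same
    token, but the mapping cannot be reversed without the server key."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:12]

def prepare_for_sharing(profile: dict, shared_fields: set) -> dict:
    """Obscure every attribute the user did not opt to share."""
    return {k: (v if k in shared_fields else "[hidden]") for k, v in profile.items()}

profile = {"name": "Bob", "condition": "thrombolysis", "hospital": "City Clinic"}
post = prepare_for_sharing(profile, {"condition"})
print(pseudonym("bob@example"), post)
```

A keyed hash rather than a plain hash matters here: an unkeyed digest of a small identifier space (names, phone numbers) could be reversed by brute force, defeating the pseudonymity the tool promises.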
5.2 The Need for Ethically Collected Health Data from Social Media

Social media users can interact, collaborate, and hold discussions on any subject with relative ease due to the nature of these platforms. As a result, there is a sizable amount of user-generated material that, if properly mined and evaluated, might aid in improving quality in both the public and private healthcare sectors. Despite technological advancements in scraping and analytical tools, there are still difficulties in gathering, processing, and interpreting health data in order to turn it into solutions for public health. Offline user behavior and privacy issues regarding the usage of collected data are among these problems [19]. Even though the majority of their online data is accessible on a public website, users can be concerned about how their online and offline data will be utilized. By ensuring that patients comprehend the purpose of the study and giving them a choice to opt out, researchers can ease their concerns. Researchers also need to take measures to protect user anonymity. When we asked our respondents if they'd be
willing to share anonymized health information if it would benefit and drive health research, 70% of the users agreed. A crucial and novel part of our proposed workflow is collecting this user-generated data with consent, anonymizing it using industry-standard models, and then publishing it. Since user data is held behind the tool's interface, methods to prevent web scraping, such as rate limiting and CAPTCHAs, could be employed to protect attributes that users did not wish to disclose openly.
6 Conclusion

This paper focused on understanding privacy attitudes toward sharing personal health information online to aid in designing our planned tool. We defined and contextualized health data privacy on social media and supported it with our findings from the literature study. In addition, we looked at existing research and regulations concerning health data privacy. We established three primary research questions and attempted to answer them with observations from the survey. People are driven by individual benefit-to-risk ratios and turn to online social platforms to share health information. We understand that privacy attitudes change when moving from general social media activity to sharing personal health information. People usually fail to take control of their data due to a lack of privacy awareness and the general unusability of privacy-preserving tools on social media. We then presented the proposed tool's basic workflow and expected components in a three-layered manner. While we introduced a basic workflow of the tool, it is essential to conduct further studies regarding usability; we emphasize that the novelty arises from its usability as a simple plug-and-play tool. It is also important to strategically choose suitable models and algorithms for the various components. We plan to study these and report our findings as part of our further research. We hope that our research encourages researchers and developers to study health data privacy concerns on social media and develop solutions.
References

1. Scott ER (2016) Electronic health records: then, now, and in the future. Yearbook Med Inf 25(01):S48–S61
2. Salenius SA et al (1992) An electronic medical record system with direct data-entry and research capabilities. Int J Rad Oncol Biol Phys 24(2):369–376
3. Dawson J (2010) Doctors join patients in going online for health information. New Media Age 7:596–615
4. Mangold WG, Faulds DJ (2009) Social media: the new hybrid element of the promotion mix. Bus Horizons 52(4):357–365
5. Garcia-Romero MT et al (2011) Teledermatology via a social networking web site: a pilot study between a general hospital and a rural clinic. Telemed e-Health 17(8):652–655
6. Ukoha C (2020) How health care organizations approach social media measurement: qualitative study. JMIR Formative Res 4(8):e18518
7. Modahl M, Tompsett L, Moorhead T (2014) Doctors, patients and social media, vol 2011. QuantiaMD, Waltham, MA
8. Beigi G (2018) Social media and user privacy. arXiv preprint arXiv:1806.09786
9. Mowafa H (2012) Mobile social networking health (MSNet-health): beyond the mHealth frontier. In: Quality of life through quality of information. IOS Press, pp 808–812
10. Li F et al (2010) New privacy threats in healthcare informatics: when medical records join the web. In: BIOKDD workshop, vol 2010
11. Patel VN et al (2012) Consumer support for health information exchange and personal health records: a regional health information organization survey. J Med Syst 36(3):1043–1052
12. Esmaeilzadeh P, Sambasivan M (2017) Patients' support for health information exchange: a literature review and classification of key factors. BMC Med Inform Decis Mak 17(1):1–21
13. Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl-Based Syst 10(05):557–570
14. Machanavajjhala A et al (2007) l-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discovery Data (TKDD) 1(1):3-es
15. Wicks P et al (2010) Sharing health data for better outcomes on PatientsLikeMe. J Med Internet Res 12(2):e1549
16. Liao Y (2019) Sharing personal health information on social media: balancing self-presentation and privacy. In: Proceedings of the 10th international conference on social media and society
17. Sánchez-Fernández J, Muñoz-Leiva F, Montoro-Ríos FJ (2012) Improving retention rate and response quality in web-based surveys. Comput Human Behav 28(2):507–514
18. Kondylakis H et al (2015) Digital patient: personalized and translational data management through the MyHealthAvatar EU project. In: 2015 37th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE
19. Akay A, Dragomir A, Erlandsson B-E (2015) Mining social media big data for health. IEEE Pulse
SSIDs as a Source for Point of Interest Suggestion in Smart Cities

Ajay Prasad and Arjun Singh
Abstract Service Set Identifiers (SSIDs) in WLANs have been invariably present since the adoption of IEEE 802.11 WLANs. Mostly, SSIDs are meant to be just a name that end nodes connect to as Wi-Fi. However, if we bring some smart mechanisms into SSIDs and their semantics, more useful adoptions can be made. If an urban zone is scanned for the available WLAN SSIDs using passive monitoring, we can use Artificial Intelligence techniques like Named Entity Recognition to carve out semantics from these SSIDs. Predictions can be made from an SSID as to whether it relates to a particular entity in an urban zone. Entities can be specific commercial outlets or other very relevant establishments. Getting the SSIDs and their corresponding geolocations can help Point of Interest mapping in an urban zone. The modalities of carrying out this adoption were implemented, and the results were studied. The intention was to test the viability and modalities of this proposed adoption. The results show lots of promise and suggest that this adoption would add value to the smart city vision.

Keywords Named Entity Recognition · Smart cities · SSID · Passive WLAN monitoring · Urban establishments · Points of interest · GPS
1 Introduction

Wi-Fi is now everywhere. In a WLAN based on IEEE 802.11, the connection process basically involves three steps: (1) device discovery, (2) device authentication, and (3) establishment of connection (association). The discovery step involves beacon and probe packets. Regular beacons are transmitted from an available Access Point (AP) to notify nearby devices (STA) of an available connection. Another property

A. Prasad
University of Petroleum and Energy Studies, Dehra Dun, India

A. Singh (B)
Manipal University Jaipur, Jaipur, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_49
647
that an AP has is that it is mostly static. The STAs may roam from one available AP to another within a Basic Service Set (BSS). Apart from much other information, a beacon frame contains the Service Set Identifier (SSID) of the available AP. The SSID is usually readable, interpretable text. With Wi-Fi becoming omnipresent and popular, people now give meaningful names to their SSIDs. In urban zones one can see the widespread presence of Wi-Fi and may thus interpret the type of place/people/establishment a particular SSID is emanating from. Passive monitoring of such locations or zones in an urban setup can collect many of these SSIDs and help maintain a database over which applications can be trained for various purposes. For example, SSIDs can be collected and semantics can be carved out for forensic purposes, as suggested in [1]. Another semantic that can possibly be derived is the type of establishment a particular SSID suggests. Knowing whether an establishment is a grocery store, a pet shop, or a clinic can be very useful to a commuter. The collected database of SSIDs along with locations can equip maps to be more informative. The process of collecting the data, training the application, and embedding it into maps can be automated, and the applications can be used by people on their mobile devices to get Point of Interest (PoI) suggestions while commuting in a city zone. Named Entity Recognition (NER) is one way to train a model and then use it to guess the semantics of an SSID. This paper tests the viability of this idea and puts out the results for further probing. The next section brings forward the literature survey made in the related areas, that is, SSIDs, passive monitoring, Points of Interest, Named Entity Recognition, etc.
In the methodology section, we bring forward the methods of combining the tools and technology to build the study base. The results and discussions follow.
2 Literature Review

The major aspects of the technologies involved in the proposed PoI suggestion are (a) passive packet capturing, (b) Named Entity Recognition (NER), and (c) geolocations. Another aspect that needed to be studied is PoI systems and the work in these areas. Several works were explored to find clues as to how to combine the above three aspects and produce PoIs. Several works were found that explored the use of passive monitoring and monitor-mode capturing of WLAN packets. For example, in [2] the authors used probe request frames from mobile phones to detect a user and hence a person, and in [3] the authors used volunteer participants to detect packets. The performance aspects of passive monitoring were put forward in [4], which led to several insights and the idea of capturing packets on the move. This was discussed extensively in [5]. SSIDs and their semantics relating to a business
entity in an urban area were studied in [6], which used a categorization methodology based on cosine similarity. The authors in [1] extended the work in [6] by applying inspections on a Preferred Network List data set. In certain works, the geolocations of SSIDs were derived using WiGLE [7]. Named Entity Recognition (NER) [8] in natural language processing (NLP) [9] involves predicting the entity through the knowledge and context of a name. Machine learning methods are being devised to incorporate NER. NER models can be developed and tried using supervised, semi-supervised, or unsupervised learning. Supervised learning techniques include Hidden Markov Models (HMM) [10], Decision Trees [11], Maximum Entropy Models (ME) [12], Support Vector Machines (SVM) [13], and Conditional Random Fields (CRF) [14]. Applications like the Garmin POI Loader [15], TomTom Points of Interest [16], etc., establish and suggest Points of Interest in a city, which has many benefits. An architecture was given in [17] that suggested PoIs in real time; in it, the geolocation of a PoI feeds a recommender based on user check-ins. Databases collected from cellular dumps or through social networking sites, etc., can be applied. NER was applied in [18], where cell tower dumps were used to get the geolocations and inputs from users were also considered; it was mainly based on supervised-learning NER. Another application was studied in [18] where social-networking-based data was considered to generate context-aware PoIs.
3 Methodology

A fixed stretch of a city in India was scanned passively for beacon and probe frames. Passive capturing can yield two important packet types containing SSIDs, namely (a) probe requests and (b) beacons. Probe requests are packets emanating from a device, whereas beacons emanate at regular intervals from an Access Point. These beacons were captured using a simple setup of a monitor-mode-enabled WLAN adapter and packet capturing software. An off-the-shelf WLAN adapter that can be set in monitor mode was used to capture packets through the Wireshark [19] tool. The captured packets were stored for further analysis and synched with the GPS information. GPS locations were captured using a mobile GPS sensor and supporting software. The collected data were synched, and a basic Spacy NER model [20] was trained over a part of the SSID data. The training set was tagged manually; that is, the SSIDs and corresponding locations were searched on Google or the place itself was visited, and the corresponding tags were assigned. Table 1 gives an example of the tagging. The tagging was made in two parts: (a) category and (b) sub-category. There were three basic categories: (i) OTHER, (ii) PERSON, and (iii) ESTABLISHMENT. The ESTABLISHMENTs were further classified into sub-categories such as book store, pet shop, and clinic. The Spacy model was trained, and the remaining part of the SSIDs was then input to get the estimated tags. The results and comparison of various NER models are not discussed in this paper and
Table 1 Example of SSIDs and Tags collection

SSID                Category         Sub-category
JioFiber-HiJ1m      OTHER            OTHER
D-Link              OTHER            OTHER
AndroidAPc0ed       PERSON           PERSON
Labu                ESTABLISHMENT    MEDICAL
Nishanil1           PERSON           PERSON
Rajeev              PERSON           PERSON
JioFiber-Aid7a      ESTABLISHMENT    OTHER
JioFiber-ae3Oh      OTHER            OTHER
Airtel 9760005915   OTHER            OTHER
Saloni Book Depot   OTHER            BOOK STORE
will be presented in another work. This work is just to test the viability, and hence the scope does not cover the NER results. The estimated tags and their corresponding geolocations were then fed to QGIS [21] to show the PoIs on maps. The sensed locations synched with those of the captured SSIDs are merged into a satellite-only Google Earth layer using the QGIS open-source tool. Figure 1 gives a view of the whole activity. To expand the data collection, the figures show person (devices in probe requests) as one of the categories; however, in this work we are confined to the SSIDs collected through beacon frames. The following procedures were carried out for collecting the data:

(a) UniqueDevices
(b) AddScanDataDevices
(c) GetFixedAndVariableDevices
(d) UniqueAPs
(e) AddScanDataAPs
The procedures were meant to capture new devices and SSIDs with every new scan of the same location. A scan means the passive capturing of WLAN data on the move. The arrays of devices and APs (SSIDs) were maintained as follows:

1: D: Device address
2: R: Randomized (T/F)
3: L: Location (GPS)
4: f{D/R/L}: for fixed Devices (devices that are static)
5: n{D/R/L}: for newly found unique Devices
6: v{D/R/L}: for variable Devices (devices that are moving)
7: nf{D/R/L}: new and fixed devices
8: nv{D/R/L}: new and variable devices
9: AP: Access Points (SSIDs)
10: APL: Access Point Locations
11: nAP, nAPL: newly found APs
12: nEAP, nEAPL: newly found Establishment APs
13: nPuAP, nPuAPL: newly found Public APs
14: nPeAP, nPeAPL: newly found APs that are people
15: EAP, EAPL: Establishment records
16: PuAP, PuAPL: Public AP records
17: PeAP, PeAPL: Person AP records
The procedures UniqueDevices (Algorithm 1) and UniqueAPs (Algorithm 4) were meant to get the packets' transmitter address (in probe requests) and SSIDs (in beacons) and form the records with synched locations. The collected devices and APs were added to the records using the AddScanDataDevices (Algorithm 2) and AddScanDataAPs (Algorithm 5) procedures. The AddScanDataDevices procedure segregated the devices/people into two forms, fixed and variable. The fixed ones were those records which were present earlier in the data at relatively the same location. The variable ones were newly found and had never been seen at the locations in the scanned area before. The scanned APs were further categorized into Person, Other, Public, or Establishment categories using AI methods like NER.

Algorithm 1 Get Unique Devices
1: procedure UniqueDevices(Packets P, Locations L)
2:   Get Packets that are probe requests into P
3:   Sync time of P with the GPS reading
4:   Get Location of probe requests in nL
5:   Get Unique Transmitter Addresses from P into nD
6:   For i = 1 to Count(nD)
7:     If (Randomized(nD[i]) == TRUE) then
8:       nR[i] = 1
9:     Else
10:      nR[i] = 0
11:  End For
12:  Add nD, nR, nL into Unique Devices Arrays (D, R, L) respectively
13:  End UniqueDevices
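One way to ground the Randomized() test in Algorithm 1 is the locally administered bit of the MAC address (bit 0x02 of the first octet), which randomized probe-request addresses set. The sketch below is an illustrative Python rendering of the uniqueness and randomization bookkeeping, not the authors' implementation; the addresses and coordinates are made up:

```python
def is_randomized(mac: str) -> bool:
    """Locally administered bit (0x02 of the first octet) marks randomized MACs."""
    return bool(int(mac.split(":")[0], 16) & 0x02)

def unique_devices(probe_requests):
    """Deduplicate transmitter addresses, tagging each as randomized or not
    and keeping its first GPS-synched location (cf. Algorithm 1)."""
    seen = {}
    for mac, location in probe_requests:
        if mac not in seen:
            seen[mac] = {"randomized": is_randomized(mac), "location": location}
    return seen

captures = [("da:a1:19:00:11:22", (26.91, 75.78)),   # locally administered -> randomized
            ("3c:5a:b4:10:20:30", (26.92, 75.79)),   # globally unique OUI
            ("da:a1:19:00:11:22", (26.91, 75.78))]   # duplicate, dropped
devices = unique_devices(captures)
print(len(devices))  # 2
```

Tracking the randomized flag matters downstream: a randomized address cannot be trusted to reappear across scans, so it should not anchor the fixed-device records.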
Algorithm 2 Add newly scanned data in records
1: procedure AddScanDataDevices(Packets)
2:   Call UniqueDevices: Get Unique Devices in nD, nR, nL
3:   Call GetFixedAndVariableDevices: Match nD, nR, nL with the stored Devices D, R, and L into nfD, nfR, nfL and nvD, nvR, nvL
4:   Call UpdateFixedDevices: Add fixed devices in fD, fR, fL
5:   Call UpdateVariableDevices: Add variable devices in vD, vR, vL
6:   End AddScanDataDevices
652
A. Prasad and A. Singh
Algorithm 3 Segregate fixed and variable devices
1: procedure GetFixedAndVariableDevices(Packets, Locations)
2:   c1 = Count(D), c2 = Count(nD);
3:   k = 1; l = 1;
4:   For i = 1 to c1
5:     Flag = FALSE;
6:     For j = 1 to c2
7:       If (D[i] == nD[j]) Then
8:         nfD[k] = D[i], nfR[k] = R[i], nfL[k] = L[i];
9:         k = k + 1; Flag = TRUE;
10:    End For
11:    If (Flag == FALSE)
12:      nvD[l] = D[i], nvR[l] = R[i], nvL[l] = L[i];
13:      l = l + 1;
14:    End If
15:  End For
16:  Return nvD, nvR, nvL, nfD, nfR, nfL;
17:  End GetFixedAndVariableDevices
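Algorithm 3's nested loops effectively compute a set intersection between the stored addresses and the newly scanned ones: stored devices seen again are "fixed", the rest "variable". A hedged Python rendering of that logic as written (the record shape and addresses are illustrative):

```python
def split_fixed_variable(stored, newly_scanned):
    """Following Algorithm 3: stored devices whose address reappears in the
    new scan are 'fixed'; stored devices absent from it are 'variable'."""
    new_addrs = {d["addr"] for d in newly_scanned}
    fixed = [d for d in stored if d["addr"] in new_addrs]
    variable = [d for d in stored if d["addr"] not in new_addrs]
    return fixed, variable

stored = [{"addr": "aa:bb", "loc": (26.91, 75.78)},
          {"addr": "cc:dd", "loc": (26.92, 75.79)}]
new = [{"addr": "aa:bb", "loc": (26.91, 75.78)}]

fixed, variable = split_fixed_variable(stored, new)
print([d["addr"] for d in fixed], [d["addr"] for d in variable])
```

Using a set for the membership test turns the O(c1 * c2) double loop of Algorithm 3 into O(c1 + c2), which matters once multiple scans of a city stretch accumulate.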
Fig. 1 Process of segregating SSIDs in categories
Algorithm 4 Get Unique SSIDs
1: procedure UniqueAPs(Packets, Locations)
2:   Get Packets that are Beacons into P
3:   Sync time of P with the GPS reading
4:   Get Location of Beacons in nL
5:   Get Unique SSIDs from P into nAP
6:   Add nAP, nAPL into Unique AP Arrays (AP, APL) respectively
7:   End UniqueAPs
The GPS locations were captured using the smartphone GPS sensor. GPS readings while moving were recorded per second using the ShareGPS [22] app. The recording was made using NMEA [23] GPS data format transfers over USB. We performed a simple experiment to understand and visualize the importance of the channel hopping methodology for maximum beacon capturing. Two scans on the same route were made using channel hops of 0.1 s, first on channels 1–11 (H11) and second on channels 1–13 (H13). The results and analysis are discussed in the next section. The data was validated using heat maps and box plots for normality. The NER models were adopted using the Spacy models [20]. The NER models with sub-category tags and basic tags were formed using the base Spacy training engines.

Algorithm 5 Add the newly found APs (SSIDs) in records
1: procedure AddScanDataAPs(Packets)
2:   Call UniqueAPs: Get Unique APs in nAP, nAPL
3:   Call GetEstablishments: Get Named Establishments and Locations in nEAP, nEAPL
4:   Call GetPublicWiFi: Get Identified Public WiFis and Locations in nPuAP, nPuAPL
5:   Call GetPersonalWiFi: Get Identified Personal WiFis and Locations in nPeAP, nPeAPL
6:   Call UpdateEstablishments: Add Named Establishment WiFis in EAP, EAPL
7:   Call UpdatePublicWiFis: Add Identified Public WiFis in PuAP, PuAPL
8:   Call UpdatePersonalWiFis: Add Identified Personal WiFis in PeAP, PeAPL
9:   End AddScanDataAPs
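The Spacy training engines mentioned above consume examples as (text, annotation) pairs with character offsets for each entity span; since a whole SSID is the entity here, the span covers the full string. A sketch of assembling the training set in that format, using labelled SSIDs echoing Table 1 (spaCy itself is not imported, only its expected data shape is shown):

```python
# Manually tagged SSIDs from the training set (basic categories, cf. Table 1).
tagged = [
    ("D-Link", "OTHER"),
    ("Rajeev", "PERSON"),
    ("Labu", "ESTABLISHMENT"),
]

# spaCy-style NER examples: the whole SSID string is one entity span,
# annotated as (start_offset, end_offset, label).
train_data = [(ssid, {"entities": [(0, len(ssid), label)]})
              for ssid, label in tagged]

print(train_data[1])  # ('Rajeev', {'entities': [(0, 6, 'PERSON')]})
```

The same pairs can then be fed to spaCy's training loop; an analogous list with sub-category labels would train the finer-grained model described above.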
4 Results

The packets captured in both channel strategies and scans were collected, and the procedures were followed to assimilate information, although we needed only probe requests and beacons. The general quantities of these and other packets were quite sufficient to build up the database. Figure 3 shows the packet quantities and reveals that beacons were more prominently present in the air than any other packets. The heat map visualization of the APs discovered during the scans in each channel hopping strategy (Fig. 4) revealed that the scans were properly collected, and hence the passively captured data is easily verified and validated. Both scans show almost similar densities, in line with the roadside population. The devices and APs collected under both channel strategies are shown in Fig. 5. The figure gives a useful insight
A. Prasad and A. Singh
that in every new scan new entities will show up, and on the move one may miss a previously captured SSID or fixed device. Thus, multiple scans of the same stretch of road are needed for a refined collection. It is also observed that channel hopping plays a role in all this. Figure 6 reveals that the maximum number of packets can be collected if the hopping dwells mainly on channels 1, 7, and 11. Figures 7 and 8 put forward the quantities of Establishments, Persons, and Others via manual identification and tagging. These fall along the lines of the heat maps in Fig. 4; hence, the validity holds. Another validation comes from the box plots (Fig. 9) of the basic categories, where we see that the Establishments
Fig. 2 Train and dev set (red) and test set (blue) data collection locations of the urban stretch
Fig. 3 Quantities of various packets
SSIDs as a Source for Point of Interest Suggestion in Smart Cities
Fig. 4 Heat maps of collected packets densities in both the scans
Fig. 5 Access points and devices in both channel strategies
Fig. 6 Packets collected in each channel
and Persons are not normally distributed; hence, the roadside densities play a part in this. The trained NER model produced 64 correct matches out of 177 tags on the test set. Though this might look insignificant, it is actually promising, as the model was built without proper pre-processing of the SSIDs. Even with such low accuracy, we are able to show Points of Interest to a good extent (see Fig. 10).
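The reported figure of 64 out of 177 corresponds to a plain exact-match evaluation of the predicted tags; a minimal sketch (the tag names are illustrative, not the paper's actual tag set):

```python
# Exact-match tag accuracy behind the "64 out of 177 tags correct" figure.
# Tag labels here are invented for illustration.

def exact_match_accuracy(gold, predicted):
    """Fraction of test tags the NER model reproduced exactly."""
    assert len(gold) == len(predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# 64 correct matches out of 177 tags corresponds to roughly 36% accuracy.
```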
Fig. 7 Manually categorized entities, channel hopping 1-11
Fig. 8 Manually categorized entities, channel hopping 1-13
Fig. 9 Box plots of the basic categories
Fig. 10 PoIs with the test data results
5 Conclusion
The overall experiment showed very promising results for the SSID-based PoI suggestion mechanism. The beacons collected while scanning passively in an urban zone yielded enough packets for training the NER model. More effort will, however, be needed to refine the model so that the accuracy increases to a good percentage. Multilevel pre-processing of SSIDs and building a huge corpus could help refine the NER model further. City-wise customizations will be required to build the models. Channel hopping strategies will help hugely in data collection. More accurate and robust instruments and sensors will lead to better results. Overall, SSIDs can be a good source for PoI suggestions in smart cities, and apps can be built for the same. However, a better-trained NER model is needed to realize this practically on the ground.
References
1. Chernyshev M, Valli C, Hannay P (2016) On 802.11 access point locatability and named entity recognition in service set identifiers. IEEE Trans Inform Forens Sec 11(3):584-593
2. Oliveira L, Schneider D, De Souza J, Shen W (2019) Mobile device detection through WiFi probe request analysis. IEEE Access 7:98579-98588. https://doi.org/10.1109/ACCESS.2019.2925406
3. Chon Y, Kim S, Lee S, Kim D, Kim Y, Cha H (2014) Sensing WiFi packets in the air: practicality and implications in urban mobility monitoring. In: Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing, ser. UbiComp '14. Association for Computing Machinery, New York, NY, pp 189-200. https://doi.org/10.1145/2632048.2636066
4. Li Y, Barthelemy J, Sun S, Perez P, Moran B (2020) A case study of WiFi sniffing performance evaluation. IEEE Access 8:129224-129235. https://doi.org/10.1109/ACCESS.2020.3008533
5. Prasad A, Verma SS, Dahiya P, Kumar A (2021) A case study on the monitor mode passive capturing of WLAN packets in an on-the-move setup. IEEE Access 9:152408-152420
6. Seneviratne S, Jiang F, Cunche M, Seneviratne A (2015) SSIDs in the wild: extracting semantic information from WiFi SSIDs. In: 2015 IEEE 40th conference on local computer networks (LCN), pp 494-497
7. WiGLE. Wireless geographic logging engine: all the networks. Found by everyone. https://www.wigle.net
8. Mohit B (2014) Named entity recognition. Springer, Berlin Heidelberg, pp 221-245. https://doi.org/10.1007/978-3-642-45358-87
9. Nadkarni PM, Ohno-Machado L, Chapman WW (2011) Natural language processing: an introduction. J Am Med Inform Assoc 18(5):544-551. https://doi.org/10.1136/amiajnl-2011-000464
10. Morwal S, Jahan N, Chopra D (2012) Named entity recognition using hidden Markov model (HMM). Int J Natural Lang Comput 1(4):15-23
11. Szarvas G, Farkas R, Kocsor A (2006) A multi-lingual named entity recognition system using boosting and C4.5 decision tree learning algorithms. In: Todorovski L, Lavrac N, Jantke KP (eds) Discovery science. Springer, Berlin Heidelberg, pp 267-278
12. Bender O, Och FJ, Ney H (2003) Maximum entropy models for named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, vol 4, ser. CONLL '03. Association for Computational Linguistics, USA, pp 148-151
13. Wu Y-C, Fan T-K, Lee Y-S, Yen S-J (2006) Extracting named entities using support vector machines. In: Bremer EG, Hakenberg J, Han E-HS, Berrar D, Dubitzky W (eds) Knowledge discovery in life science literature. Springer, Berlin Heidelberg, pp 91-103. https://doi.org/10.4236/iim.2014.62004
14. Patil N, Patil A, Pawar B (2020) Named entity recognition using conditional random fields. Proc Comput Sci 167:1181-1188 (international conference on computational intelligence and data science). https://www.sciencedirect.com/science/article/
15. GARMIN. Garmin support center. https://support.garmin.com/en-IN/
16. Babre J. How to use POI data to power location-based apps. https://www.tomtom.com/blog/maps/poi-data/
17. Bao J, Zheng Y, Mokbel MF (2012) Location based and preference-aware recommendation using sparse geo-social networking data. In: Proceedings of the 20th international conference on advances in geographic information systems, ser. SIGSPATIAL '12. Association for Computing Machinery, New York, NY, pp 199-208. https://doi.org/10.1145/2424321.2424348
18. Wang R, Chow C, Lyu Y (2016) Exploring cell tower data dumps for supervised learning-based point-of-interest prediction (industrial paper). GeoInformatica 20:327-349
19. Zhang G, Qi L, Zhang X, Xu X, Dou W (2019) Context-aware point-of-interest recommendation algorithm with interpretability. In: Wang X, Gao H, Iqbal M, Min G (eds) Collaborative computing: networking, applications and worksharing. Springer International Publishing, Cham, pp 745-759
20. spaCy. Trained models & pipelines. https://spacy.io/models
21. QGIS. A free and open source geographic information system. https://www.qgis.org/en/site/
22. ShareGPS app. ShareGPS help. http://www.jillybunch.com/sharegps/mobile/index.html
23. NMEA. National Marine Electronics Association. https://www.nmea.org/content/STANDARDS/
The Psychological Influence of Online Stigmatization in Social Life due to Monkeypox Avik Chatterjee, Soumya Mukherjee, Anirbit Sengupta, and Abhijit Das
Abstract COVID-19 has taught us several lessons and left us with fear, stress, depression, and trauma; the result was an unbalanced equilibrium of life. Much scientific literature supports this fact. In most of that literature, the authors investigated the effect of COVID-19 using Twitter data and found that it has a profound effect on the mental well-being of every group of people. In these studies, the conspicuous approach taken by the researchers is sentiment analysis of various keywords by mining tweets on the related topic. In our paper, we follow the same approach to find the effect of Monkeypox, which has been declared a Global Health Emergency by the World Health Organization. The objective of this study is to find out how this virus is affecting the equilibrium of life. We gathered the data from Twitter, as we found it to be the most common choice of researchers, probably because of its microblogging nature. We explored the tweets using Tweepy and NLTK, and a deep learning-based technique was followed with an existing depression score method. We also used RapidMiner Studio and the Hydrator app to collect further Twitter data for revalidation. In this study, SPSS was used for the statistical analysis of our data, with a special focus on LGBTQ people. We found that Monkeypox has a profound effect on the equilibrium of life.
Keywords Monkeypox · Twitter · Disequilibrium of life
A. Chatterjee (B) Department of MCA, Techno College (Hooghly Campus), Chinsurah, West Bengal, India. e-mail: [email protected]
S. Mukherjee Department of Management Studies, Techno India (Hooghly Campus), Chinsurah, West Bengal, India
A. Sengupta Department of ECE, Dr. Sudhir Chandra Sur Institute of Technology & Sports Complex, 540 Dum Dum Road, Calcutta, West Bengal, India
A. Das Department of IT, RCCIIT, Calcutta, West Bengal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_50
A. Chatterjee et al.
1 Introduction
The disease Monkeypox takes its name from the virus behind it. This virus can spread from animals to humans and also from human to human. The nature of the disease brings the Covid situation back to people's minds, as people suffered a lot during Covid, which was also a viral attack, and in many places people are still suffering from the Corona virus. From earlier studies, we know that COVID-19 affected people physically, economically, and also mentally. Many people fell into depression due to the Corona virus, and many of them committed suicide [1]. The addition of Monkeypox is making the situation more complex and life more difficult. In this study, we try to find out the effect of Monkeypox on the human mind. We focus on teenagers and adolescents, as we found that they suffered significant mental disorders during the Covid pandemic as well [2]. We also focus on LGBTQ people, as we found that they experience exclusion and discrimination in various ways; there are also several online initiatives against bullying. According to the World Health Organization (WHO) [3], 5% of adults worldwide are affected by depression. Major depressive disorder, commonly known as depression, is one of the most ignored causes of disease and suicide. Depression is different from normal mood fluctuations and other temporary emotional responses to the hassles of daily life. The end result of depression is suicide, and more than 700,000 people lose their lives to suicide every year. It is the fourth primary cause of death among people aged 15-29 and the second primary cause of death among teenagers, as per WHO statistics. The severity increases further because in most cases people do not take this disease seriously, or do not want to, in order to avoid the social stigma that accompanies it.
For our work, we selected the social media platform Twitter, as it is a vast source of human feelings and emotions [3]. On Twitter, people can express their opinions, views, and thoughts in very few words, since it is a microblogging site, and many studies have already used it with sentiment analysis tools to assess the depression intensity of human beings for various reasons [4-8]. In our study, we used Twitter as our source of data to assess the impact of Monkeypox on human cognizance.
2 Literature Review
Depression detection and intensity analysis have been popular among researchers for the last few years. The pattern of research has changed from traditional methods of data collection to modern methods in which researchers depend on social media as a primary source of data. Studies related to depression started much earlier than the era of social media [9, 10], but in the past few years the massive popularity of social media has made this data harvesting easier and more precise [11, 12]. In the paper by Beck et al. [10], we find that 21 self-reporting
questions were asked to assess the effect of depression in normal and psychiatric populations. This is known as the Beck Depression Inventory (BDI). But self-reporting can sometimes be biased, and that is why researchers later moved to comparatively unbiased data sources in the form of social media. Various studies have used social media to judge human psychology and feelings on several issues. During the Covid pandemic, several research works analyzed depression through social media due to COVID-19. Pirkis et al. [1] tried to find the suicidal trends due to COVID-19. Ghosh et al. [2] address early-stage depression detection and intensity analysis by exploring social media with their deep learning method. Chahal et al. [13] focused on the necessity of association among pediatricians, teachers, parents, sports coaches, and NGO workers through a single online medium to discuss the development of a child. According to the WHO website [3], depression is a leading cause of suicide, especially in women. Twitter data analysis is a very popular tool for the analysis of emotion and sentiment, through which depression intensity is measured. There are several works in which Twitter data are taken and analyzed to bring out public opinions and emotions on various topics. Sailunaz et al. [4] describe how to collect and analyze the sentiment and emotion expressed by users in the text of their Twitter posts and use them for initiating recommendations. They considered tweets and replies on a set of areas or issues and created a dataset with attributes such as text, user, emotion, and sentiment information. Recently, various works have scanned emotion and sentiment to rate the level and cause of depression in human beings due to COVID-19. In [5], Jain et al.
have tried to build a method that differentiates mental conditions, using machine learning and deep learning techniques. The target is to develop a model that establishes a connection between users and counselors to find out a person's actual mental issues and their solution. Patnaik et al. [6] show that age is a significant factor in depression: young people are much more affected than older people, and most of them are male. They carried out a demographic study with data collected via questionnaires. Zhou et al. [7] present a different method based on multimodal features, using tweets to analyze the mental state of people in Australia. The researchers used the term frequency-inverse document frequency (TF-IDF) algorithm to develop their depression classification model, which assesses the polarity of depression affected by COVID-19 and correlated events. They found that COVID-19 and lockdown have a very substantial negative effect on people's minds. Anishfathima et al. [8] proposed a predictive model for measuring the level of stress with respect to COVID-19 using several machine learning methods: K-Nearest Neighbor, Support Vector Machine, Naive Bayes, and Artificial Neural Networks. They found the Support Vector Machine to be the most optimal among the algorithms considered. In the paper by Chatterjee et al. [11], we find that social, mental, and economic instability can raise high dissatisfaction and mental unwellness in the human mind. For this work, the Delphi technique was used: the data were collected from tweets and later verified with
440 enthusiastic respondents through a set of questionnaires. The paper by Zaldumbide et al. [9] shows that social media such as Twitter have the potential to be key in making people aware of the spread of health diseases such as pandemics. In that work, the researchers compared Twitter data with other sources such as Google Trends and the Australian Institute of Health and Welfare, and found a strong correlation among all sources. Other papers [10, 12, 14, 15] by different researchers also support and indicate the above facts. So, from previous research on this subject, it is well established that Twitter can be a very robust source of data when measuring the sentiment of human beings, and that the measurement can be done with several machine learning methods in a very effective manner. In our paper, we take all these factors into consideration and try to find the impact of, and correlation among, Online Stigmatization, Stress, and Disequilibrium of Life due to Monkeypox by using Twitter data.
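The TF-IDF weighting used by classifiers such as the one in [7] can be sketched in a few lines; the tweets and tokens below are synthetic, and a real pipeline would add tokenization, stop-word removal, and a classifier on top:

```python
import math
from collections import Counter

# Minimal TF-IDF weighting of tweets, the feature-extraction step behind
# depression classifiers such as the one in [7]. Documents are synthetic.

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return out

docs = [["monkeypox", "fear", "stress"],
        ["monkeypox", "vaccine"],
        ["holiday", "fun"]]
weights = tfidf(docs)
# "monkeypox" occurs in two of the three documents, so it is weighted
# lower than the rarer term "fear" in the first document.
```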
3 Our Approach
We consider Online Stigmatization, Stress, and Disequilibrium of Life as three factors in our work and examine their impacts on one another. The main aim of this work is to find the effect of Monkeypox on the mental frame of mind using data collected from Twitter. We also considered the dataset given by Thakur [16] for our Twitter analysis, which covers the period 05/2022 to 07/2022; to get more recent outcomes, we chose an additional period, 08/2022. We take three hypotheses, H1, H2, and H3, which are described in Sect. 3.1.
3.1 Conceptual Model
Online stigmatization has a definite influence on our life, since we are more or less engrossed in the virtual world. The dominance of social media cannot be ignored at any point of time. As a consequence, when an epidemic like Monkeypox breaks out, it creates hype on social media. The exaggeration creates panic, anxiety, and definite depression in our life, and this outcome disturbs the balance of our life. The disequilibrium evoked in our life needs to be reinstated and sorted out to restore its harmony. Considering this reality, the following premises have been adopted for our research (Fig. 1).
H1 Online Stigmatization helps to induce Disequilibrium in Life.
H2 Online Stigmatization helps to develop Stress among individuals.
H3 Stress evokes Disequilibrium in the Life of the individual.
Fig. 1 Proposed conceptual model
3.2 Research Methodology
The present study was empirical and exploratory in nature. It was conducted to identify how far Online Stigmatization plays a definite part in evoking Stress and Disequilibrium in Life in this Monkeypox situation. In this hostile and complicated atmosphere, it is desirable to maintain equilibrium in life. We are entangled in the influences of social media; despite being a boon, it often creates unwanted hazards and is enough to create a mess in our life. Considering these evil consequences, we need to measure the impact of Online Stigmatization on our life and how it creates pandemonium in its equilibrium. This study is significant and relevant in the current perspective, as our life has started revolving around social media; our every decision is dominated by its input, and this has gained momentum since the days of COVID-19. To shed light on the research, an online survey was conducted employing a non-probability purposive sampling method. A structured questionnaire was framed, aptly considering the objectives of the study, to collect data; while designing the questionnaire, a seven-point Likert scale was employed. We made a humble effort to reach 312 respondents who were willing to share their opinions and views on this subject. In Fig. 4a-d, we show a few sample outputs of the sentiment analysis executed on the data collected from Twitter. We collected Twitter data on the Monkeypox situation using TextBlob, Tweepy, etc., and the collected data are represented using the Matplotlib library. We further explored the Twitter data using RapidMiner Studio 9.10.011 and used Hydrator, which takes tweet IDs and returns the corresponding data from Twitter as JSON. To analyze the data, we depended completely on the Smart PLS software, version 3.3.2, for structural equation modeling.
It brings better dynamism and flexibility to the study, since it has the privilege of the multivariate analytical
technique [17-24]. To develop a prerequisite idea of the minimum sample size, we administered G*Power at a 5% level of significance; it categorically pointed out that 132 samples were enough to do justice to the analysis.
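The TextBlob step assigns each tweet a polarity score in [-1, 1]. A minimal stand-in for that step, using a toy lexicon of our own (the word scores below are illustrative assumptions; TextBlob derives its polarity from its much larger pattern lexicon):

```python
# Toy stand-in for the TextBlob polarity scoring applied to the collected
# tweets. The lexicon and its scores are invented for illustration.

LEXICON = {"fear": -0.8, "panic": -0.9, "stress": -0.7,
           "safe": 0.6, "recover": 0.7, "hope": 0.8}

def polarity(text):
    """Mean lexicon score of the known words, in [-1, 1]; 0 if none known."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(text):
    p = polarity(text)
    return "positive" if p > 0 else "negative" if p < 0 else "neutral"
```

For example, `label("monkeypox panic and fear everywhere")` is classified as negative, which is the kind of per-tweet signal the Matplotlib plots in Fig. 4a-d aggregate.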
3.3 Measurement Scale
To extract the desired information on the latent variables for our study, a structured questionnaire was developed systematically, emphasizing two categories of questions: general and specific. The demographic profile highlights the gender, age, family income, and occupation of the respondents. Eighteen specific questions were framed to collect the respondents' opinions and views on our latent variables: (1) Discrimination, (2) Fear, (3) Anxiety, (4) Online Stigmatization, (5) Stress, and (6) Disequilibrium of Life. To be more concise, the specific questions were designed employing measurement scales duly suggested by eminent researchers; those scales were employed to develop the constructs of the study. To make the research more relevant, precise, and scientific, minor amendments were made so the scales fit the research content, and certain indicators were transformed accordingly to make them more acceptable to the readers. Stress was measured with a scale tailored from Lait and Wallace [25]. Finally, Disequilibrium of Life was computed with assistance from a scale of Fisher et al. [26]. These established scales were employed to make the present study more realistic and relevant.
4 Results
4.1 Descriptive Analysis
See Table 1.
Table 1 Sample demographics (N = 312)
Age  Sex     Profession  Sexual orientation  Location
32   Male    Service     Heterosexual        India
43   Female  Homemaker   Homosexual          USA
19   Female  Student     Heterosexual        UK
54   Male    Business    Pansexual           Australia
61   Female  Homemaker   Asexual             India
58   Female  Service     Heterosexual        India
21   Male    Student     Heterosexual        India
4.2 Measurement Model Assessment
The study primarily focuses on the outer model to establish internal reliability and convergent validity before carrying on with the research work. The confirmatory study was administered in partial least squares structural equation modeling [27]. Following Yildirim and Correia [28], Confirmatory Composite Analysis (CCA) was performed to measure Online Stigmatization using a reflective-formative measurement. The latent variable scores of the three dimensions of Online Stigmatization were measured initially in a two-stage reflective-formative assessment; in the later part, the second-order construct of the formative model was considered based on the outputs of the first-order constructs. The second-order composite thus gives the due momentum to recognize the impact of Online Stigmatization on Stress and Disequilibrium of Life. Initially, internal reliability was measured with emphasis on Cronbach's Alpha, Dijkstra and Henseler's rho, and Composite Reliability. The values of Cronbach's Alpha were well above the threshold of 0.70, which according to social science is quite convincing [19]. The values of rho also reflected a positive outcome, being well above the threshold of 0.70 [18]. The Average Variance Extracted must exceed 0.50 [29]; every construct established its reliability, since all are well above 0.50. Composite Reliability is acknowledged as good to satisfactory if the value ranges between 0.7 and 0.9 [30]; in our study it is admirable, remaining between 0.8 and 0.9 in every instance. So we may rightly say that internal reliability and convergent validity are established, as duly reflected in Table 2. Table 3, on the other hand, figures out the uniqueness of each construct, which is good enough to progress with the study.
Besides employing the traditional method of investigating discriminant validity, an innovative criterion, the Heterotrait-Monotrait ratio of correlations (HTMT), has been incorporated in this study. As per the HTMT inference method, all HTMT values should be less than 1; a commonly cited permissible value is 0.85, whereas Gold et al. [31] are specifically inclined to accept any value within 0.9. In this study, the scores of all
Table 2 Quality criteria for reflective model assessments and the composite model
Construct               Cronbach's alpha  rho_A  Composite reliability  Average variance extracted (AVE)
Discrimination          0.776             0.844  0.864                  0.679
Disequilibrium of life  0.803             0.849  0.862                  0.56
Fear                    0.831             0.841  0.898                  0.747
Online stigmatization   1                 1      1                      1
Prejudices              0.831             0.842  0.899                  0.748
Stress                  0.771             0.783  0.852                  0.591
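The reliability figures in Table 2 were produced by Smart PLS; for illustration, Cronbach's alpha can be computed directly from raw item scores. The 7-point Likert responses below are invented:

```python
# Cronbach's alpha, the first reliability criterion in Table 2, computed
# from raw item scores. The responses are invented for illustration.

def variance(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sums
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three internally consistent items give an alpha near 1, well above the
# 0.70 threshold cited in the text:
items = [[7, 6, 2, 5, 3], [6, 6, 1, 5, 2], [7, 5, 2, 6, 3]]
alpha = cronbach_alpha(items)
```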
Table 3 Discriminant validity assessments
Construct               Discrimination  Disequilibrium of life  Fear   Online stigmatization  Prejudices  Stress
Discrimination          0.824
Disequilibrium of life  0.703           0.769
Fear                    0.647           0.648                   0.864
Online stigmatization   0.437           0.46                    0.529  1
Prejudices              0.449           0.467                   0.613  0.69                   0.865
Stress                  0.536           0.548                   0.594  0.376                  0.352       0.769
Table 4 HTMT ratio of correlations for discriminant validity assessments
Construct               Discrimination  Disequilibrium of life  Fear   Online stigmatization  Prejudices
Discrimination
Disequilibrium of life  0.895
Fear                    0.78            0.793
Online stigmatization   0.465           0.511                   0.577
Prejudices              0.532           0.583                   0.737  0.755
Stress                  0.649           0.429                   0.733  0.786                  0.817
the constructs are within the permissible limit. This is enough to be sure of the uniqueness of each construct, as evidently established by Table 4.
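For illustration, the HTMT ratio for one pair of constructs reduces to a ratio of average correlations. The toy inter-item correlations below are our assumption, not study data; Smart PLS derives them from the full indicator correlation matrix:

```python
# HTMT compares how strongly items of different constructs correlate
# (heterotrait) with how strongly items of the same construct correlate
# (monotrait). Correlation values here are invented for illustration.

def mean(xs):
    return sum(xs) / len(xs)

def htmt(within_a, within_b, between):
    """within_a / within_b: correlations among items of one construct;
    between: correlations between items of the two constructs."""
    return mean(between) / (mean(within_a) * mean(within_b)) ** 0.5

# Items cohere at ~0.70 inside each construct but only ~0.40 across
# constructs, so HTMT stays below the 0.85 / 0.90 cut-offs:
value = htmt([0.72, 0.68, 0.70], [0.66, 0.74, 0.70],
             [0.40, 0.42, 0.38, 0.41])
```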
4.3 Structural Model Assessment
In the structural model assessment, it is of utmost importance to compute the relationships between the constructs and their predictive relevance [19]. The bootstrapping process was employed with the recommended 5000 bootstraps to obtain the p values that form the basis for assessing the hypotheses assumed in the study [18]. Initially, each set of predictor constructs of the structural inner model needs to be computed as a formative measurement model [32]. Collinearity issues are to be checked first, and for this purpose it is wise to consider the tolerance and variance inflation factor (VIF). Diamantopoulos et al. [30] suggested that the VIF value should remain below 3.33, and here the values for the constructs Stress and Disequilibrium of Life were well below that threshold; their respective values were 1.568. So we may rightly say that no collinearity issue is involved in
it. After that, it is a prerequisite to understand the importance and significance of the path coefficients. Ideally, the coefficients should lie between − 1 and + 1, and they are only counted when the bootstrapping process is employed with 5000 subsamples in the PLS algorithm. In our study, Online Stigmatization was considered a second-order composite, and the three reflective constructs whose latent variable scores were taken were formally identified as a formative assessment. The outer weights of all constructs were significant at the 1 percent level and, most importantly, different from zero. The structural model assessment is shown in Fig. 2. The coefficient of determination (R2) for each endogenous construct was also thoroughly computed for our study. The variance in each endogenous construct is explained by R2, and the threshold value of R2 is usually determined on the basis of the context; even a low value of R2 is worthy in PLS-SEM analysis [33], and social science even regards a value of 0.20 as high [34]. In our study, the value of R2 for Stress is 0.362 and that of Disequilibrium of Life is 0.699. From the perspective of social science, both values of the endogenous constructs are acceptable [19]. From our study, we can concisely report that Online Stigmatization has a definite degree of impact on Stress and Disequilibrium of Life. The goodness of fit was computed using the standardized root mean square residual (SRMR), which plays a convincing role in evaluating model fit [18]. The threshold value of SRMR is 0.08 [35, 36]; in our study, the SRMR value is 0.072, which indicates that the structural model and hypothesis testing can well be considered. Table 5 firmly states that Online Stigmatization plays a significant role in creating Stress and evoking Disequilibrium of Life. F2 and Q2 were also given due importance to understand predictive importance and relevance.
Cohen [37] pointed out that the proposed limits for understanding the degree of impact of an exogenous construct on endogenous constructs were 0.02 (small effect), 0.15 (moderate effect), and 0.35
Fig. 2 Structural model assessments with control variables
Table 5 Structural model assessment
Hypothesis                                      Original sample (O)  Sample mean (M)  T statistics (|O/STDEV|)  2.50%  97.50%  Supported/not supported
Online stigmatization → Disequilibrium of life  0.442                0.442            10.201                    0.356  0.526   Supported
Online stigmatization → Stress                  0.602                0.604            17.753                    0.535  0.667   Supported
Stress → Disequilibrium of life                 0.492                0.492            12.083                    0.411  0.571   Supported
(large effect). The f2 value of Online Stigmatization on Disequilibrium of Life is 0.413 and that of Stress on Disequilibrium of Life is 0.512, which categorically indicates that Online Stigmatization has a strong effect on Disequilibrium of Life; likewise, Stress too has a strong effect on it. Richter [20] explained that any Q2 value above 0.02 has definite predictive power. The Stone-Geisser Q2 value for Stress is 0.356 and that of Disequilibrium of Life is 0.539, which is enough to justify the impact of the independent constructs in the conceptual model of the study.
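The bootstrapping step behind Table 5 can be sketched as follows: resample the respondents with replacement, recompute the standardized path coefficient each time, and read off the 2.5% / 97.5% percentiles. The paper uses Smart PLS with 5000 bootstraps; the data below are synthetic:

```python
import random

# Percentile-bootstrap sketch of the confidence intervals in Table 5.
# Synthetic data; only the resampling procedure mirrors the PLS step.

def path_coefficient(pairs):
    """Standardized slope of y on x, i.e. the Pearson correlation."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_ci(pairs, n_boot=5000, seed=42):
    """2.5% / 97.5% percentile interval of the resampled coefficient."""
    rng = random.Random(seed)
    coefs = sorted(
        path_coefficient([rng.choice(pairs) for _ in pairs])
        for _ in range(n_boot)
    )
    return coefs[int(0.025 * n_boot)], coefs[int(0.975 * n_boot)]

# Synthetic "stigmatization -> stress" scores with a real positive link:
rng = random.Random(0)
data = [(x, 0.6 * x + rng.gauss(0.0, 1.0)) for x in range(30)]
low, high = bootstrap_ci(data, n_boot=500)
# The interval lies entirely above zero, so the path would be "Supported".
```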
4.4 Importance Performance Map Analysis
To make the study more convincing and acceptable, we have judiciously employed priority map analysis, better known as the Importance Performance Map or Importance Performance Matrix (IPMA) [23]. The main reason behind applying IPMA is to figure out how much both exogenous constructs, Online Stigmatization and Stress, impact Disequilibrium of Life [38]. The computation of IPMA thus makes the study not only more rational and realistic but also acceptable (Fig. 3 and Table 6).
4.5 Results of Sentiment Analysis
Online Stigmatization has a definite role to play in our life. From the sentiment analysis, we can deduce the true image of society by evoking the degree of impact of Online Stigmatization on the equilibrium of our life; the keywords depict its definite influence. In Fig. 4a-d, we show a few outputs of search keywords from Twitter. We used TextBlob and Tweepy to search the keywords on Twitter, and the Matplotlib library to plot the results obtained from the tweets we went through in the specified time period.
The Psychological Influence of Online Stigmatization in Social Life …
Fig. 3 Adjusted importance performance matrix for disequilibrium of life
4.6 The Results of IMPA

The computed performance of Disequilibrium of Life is 62.151. As Fig. 3 shows, increasing Online Stigmatization by one unit, from 63.546 to 64.546, would raise Disequilibrium of Life from 62.151 to 62.889, its total effect being 0.738. Similarly, a one-unit rise of Stress, from 44.896 to 45.896, would raise the disequilibrium from 62.151 to 62.643, with a total effect of 0.492. Thus, Disequilibrium of Life depends on both Online Stigmatization and Stress, which need to be managed judiciously, tactfully, and aptly to lead a happy and peaceful life.
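The IMPA reading is just the linear ceteris-paribus rule: raising the performance of a predictor by one point raises the performance of the target by the predictor's unstandardized total effect. A quick check against the numbers in Table 6, where 62.151 is the baseline performance of Disequilibrium of Life (the helper function is illustrative):

```python
# IMPA ceteris-paribus prediction: target' = target + total_effect * delta
def predicted_target(target_perf: float, total_effect: float,
                     delta: float = 1.0) -> float:
    return target_perf + total_effect * delta

base = 62.151                          # performance of Disequilibrium of Life
print(predicted_target(base, 0.738))   # +1 unit of Online Stigmatization -> 62.889
print(predicted_target(base, 0.492))   # +1 unit of Stress -> 62.643
```

Stress has the lower importance (0.492 vs. 0.738) but also the much lower current performance (44.896 vs. 63.546), which is the kind of trade-off the importance-performance map is designed to expose.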
5 Conclusion and Limitation

As already stated, our study found that Online Stigmatization and Stress both have a substantial effect on Disequilibrium of Life, especially in this environment of Monkeypox. The result seems expected, but the impact needed to be quantified so that psychological practitioners can help people in need avoid the adverse effects of the stress and stigmatization that people face online. In this paper, we considered Stress and Online Stigmatization to measure the equilibrium of life; in future work, we can also consider anxiety as another factor in this measurement.
A. Chatterjee et al.
Fig. 4 Output of few search keywords from Twitter
Table 6 Importance–performance map (construct-wise unstandardized effects)

Constructs | Importance | Performances
Online stigmatization | 0.738 | 63.546
Stress | 0.492 | 44.896
Mean value | 0.615 | 54.221
In future work, we can also consider other factors to judge the determinants of equilibrium of life under health hazards such as COVID-19 and, currently, Monkeypox.
References

1. Pirkis J et al (2022) Suicide trends in the early months of the COVID-19 pandemic: an interrupted time-series analysis of preliminary data from 21 countries. Lancet Psychiat 8(7):579–588
2. Ghosh and Anwar T (2021) Depression intensity estimation via social media: a deep learning approach. IEEE Trans Comput Social Syst 8(6):1465–1474. https://doi.org/10.1109/TCSS.2021.3084154
3. https://www.who.int/news-room/fact-sheets/detail/depression
4. Sailunaz K, Alhajj R (2019) Emotion and sentiment analysis from Twitter text. J Comput Sci 36:101003. https://doi.org/10.1016/j.jocs.2019.05.009
5. Jain S, Jadhav M, Kolambe N, Bhavathankar P (2022) Detection of mental disorder on social platform. In: 7th international conference on computing in engineering & technology (ICCET 2022), pp 171–175. https://doi.org/10.1049/icp.2022.0612
6. Patnaik D, Gupta A, Henriques ML (2022) An empirical analysis of anxiety and depression during COVID-19. Interdisc Res Technol Manage 2022:1–4. https://doi.org/10.1109/IRTM54583.2022.9791544
7. Zhou J, Zogan H, Yang S, Jameel S, Xu G, Chen F (2021) Detecting community depression dynamics due to COVID-19 pandemic in Australia. IEEE Trans Comput Social Syst 8(4):982–991. https://doi.org/10.1109/TCSS.2020.3047604
8. Anishfathima B, Sreenithi B, Trisha S, Swathi J, Sindhu Priya M (2022) The impact of mental health due to COVID-19—a mental health detector using machine learning. In: 2022 second international conference on artificial intelligence and smart energy (ICAIS), pp 248–252. https://doi.org/10.1109/ICAIS53314.2022.9743009
9. Zaldumbide J, Sinnott RO (2015) Identification and validation of real-time health events through social media. IEEE Int Conf Data Sci Data Intensive Syst 2015:9–16. https://doi.org/10.1109/DSDIS.2015.27
10. Beck A, Ward C, Mendelson M, Mock J, Erbaugh J (1961) An inventory for measuring depression. Arch Gen Psychiatry 4(6):561–571. https://doi.org/10.1001/archpsyc.1961.01710120031004
11. Chatterjee A, Sengupta A, Das A (2022) Covid-combat: a sentiment analysis based solution framework to derive the social, mental and financial impact on individuals in post-pandemic situation. NeuroQuantology 20(6):4610–4621. https://doi.org/10.14704/nq.2022.20.6.NQ22033
12. Jackson-Koku G (2016) Beck depression inventory. Occup Med 66(2):174–175. https://doi.org/10.1093/occmed/kqv087
13. Chahal N (2021) Psychological effects of pandemic COVID-19 on children and adolescents worldwide—a case study. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT), pp 1–6. https://doi.org/10.1109/ICCCNT51525.2021.9579832
14. Shen G et al (2017) Depression detection via harvesting social media: a multimodal dictionary learning solution. In: Twenty-sixth international joint conference on artificial intelligence (IJCAI 2017). https://doi.org/10.24963/ijcai.2017/536
15. Rankin S, Weber G, Blumenfeld W, Frazer S (2010) 2010 state of higher education for lesbian, gay, bisexual and transgender people. Campus Pride
16. Thakur N (2022) Monkeypox2022Tweets: the first public Twitter dataset on the 2022 Monkeypox outbreak. Preprints 2022060172. https://doi.org/10.20944/preprints202206.0172.v3
17. Hair JF, Sarstedt M, Ringle CM (2019) Rethinking some of the rethinking of partial least squares. Eur J Mark 53(4):566–584
18. Hair JF, Howard M, Christian N (2020) Assessing measurement model quality in PLS-SEM using confirmatory composite analysis. J Bus Res 109:101–110
19. Hair JF, Hult GTM, Ringle C, Sarstedt M (2017) A primer on partial least squares structural equation modeling (PLS-SEM), 2nd edn. SAGE Publications, Thousand Oaks, CA
20. Richter NF, Cepeda Carrión G, Roldán JL (2016) European management research using partial least squares structural equation modeling (PLS-SEM): editorial. Eur Manag J 34(6):589–597
21. Nitzl C, Roldan JL, Cepeda G (2016) Mediation analysis in partial least squares path modeling: helping researchers discuss more sophisticated models. Ind Manag Data Syst 116(9):1849–1864
22. Rigdon EE (2016) Choosing PLS path modeling as analytical method in European management research: a realist perspective. Eur Manag J 34:598–605
23. Ringle CM, Sarstedt M (2016) Gain more insight from your PLS-SEM results: the importance-performance map analysis. Ind Manag Data Syst 116(9):1865–1886
24. Ringle CM, Sarstedt M, Schlittgen R (2014) Genetic algorithm segmentation in partial least squares structural equation modeling. OR Spectrum 36:251–327
25. Lait J, Jean EW (2002) Stress at work: a study of organizational-professional conflict and unmet expectations. Relations Industrielles/Indus Relat 57:463–490
26. Fisher GG, Bulger CA, Smith CS (2009) Beyond work and family: a measure of work/nonwork interference and enhancement. J Occup Health Psychol 14:441–456
27. Schuberth F, Henseler J, Dijkstra TK (2018) Confirmatory composite analysis. Front Psychol 9:1–14
28. Yildirim C, Correia AP (2015) Exploring the dimensions of nomophobia: development and validation of a self-reported questionnaire. Comput Hum Behav 49:130–137
29. Fornell CG, Larcker DF (1981) Evaluating structural equation models with unobservable variables and measurement error. J Mark Res 18(1):39–50
30. Diamantopoulos A (2008) Formative indicators: introduction to the special issue. J Bus Res 61(12):1201–1202
31. Gold AH, Malhotra A, Segars AH (2001) Knowledge management: an organizational capabilities perspective. J Manag Inf Syst 18(1):185–214
32. Cassel C, Hackl P, Westlund AH (1999) Robustness of partial least-squares method for estimating latent variable quality structures. J Appl Stat 26:435–446
33. Raithel S, Sarstedt M, Scharf S, Schwaiger M (2012) On the value relevance of customer satisfaction: multiple drivers and multiple markets. J Acad Mark Sci 40(4):1–17
34. Rasoolimanesh SM, Jaafar M, Kock N, Ahmad AG (2017) The effects of community factors on residents' perceptions toward world heritage site inscription and sustainable tourism development. J Sustain Tour 25(2):198–216. https://doi.org/10.1080/09669582.2016
35. Henseler J, Hubona GS, Ray PA (2016) Using PLS path modeling in new technology research: updated guidelines. Ind Manag Data Syst 116:1–19
36. Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling 6(1):1–55
37. Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale, NJ
38. Fornell CG, Johnson MD, Anderson EW, Cha J, Bryant BE (1996) The American customer satisfaction index: nature, purpose, and findings. J Mark 60(4):7–18
Reliability Characteristics of Two Units’ Parallel System with Refreshment Facility Subject to Environment Conditions M. S. Barak, Ajay Kumar, Reena Garg, Ashish Kumar, and Monika Sani
Abstract Reliability and maintainability are essential parameters of items and products that satisfy the customer's requirements. This paper highlights the performance of a cold standby system under different environmental conditions. When the operative unit fails, the redundant unit starts working. An expert server is always available to repair the failed unit. Sometimes the server is tired due to working continuously or for other reasons; providing refreshment then generally enhances his efficiency. It is assumed that the repair and refreshment times follow the negative exponential distribution, whereas the failure rate of the unit, the refreshment request rate, and the weather change rates from normal to abnormal and from abnormal to normal follow a general distribution. Tables and graphs are used to explore the nature of some of the required reliability characteristics, such as MTSF, busy period of the server, availability, refreshment, and profit of the defined model. Keywords Cold standby · Environment conditions · Refreshment · Repairable system · Reliability measures
A. Kumar · R. Garg
Department of Mathematics, J.C. Bose University of Science and Technology, YMCA, Faridabad, India
M. S. Barak
Department of Mathematics, Indira Gandhi University, Meerpur, Rewari, India
e-mail: [email protected]
A. Kumar (B) · M. Sani
Department of Mathematics and Statistics, Manipal University Jaipur, Jaipur 303007, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_51

1 Introduction

A cold standby redundant system works at full capacity under certain conditions, where redundancy provides an extra unit that works in parallel with the operative unit. But sometimes the system does not work well due to abnormal weather. Many kinds
of research have been done with various repair strategies using cold standby systems of identical units and stochastic modeling, regardless of server fatigue or feelings affecting performance. In [1], Osaki and Asakura analyzed a repairable system of two units with the provision of preventive maintenance. In [2], Birolini threw light on reliability theory and its applications using the regenerative point technique and stochastic processes. Subramanian [3] examined the preventive maintenance and repair policy of two-unit parallel systems. In [4], Naidu and Gopalan evaluated the economic values of two-unit systems under inspection and repair services. Dhillon [5] evaluated the reliability characteristics of a system using warm standby redundancy under unit failure. Chander [6] suggested reliability models with server arrival facilities, prioritizing operation and repair. Wang and Zhang [7] discussed a stochastic cold standby model with a maximum replacement facility and priority in use. Sachin and Anand [8] evaluated some reliability measures of a three-state repairable system under different climates. Malik and Pawar [9] evaluated the economic aspect of a model with online repair only in a normal environment. Pawar et al. [10] described the assessment of the damaged unit with repair subject to environmental conditions. Deswal and Malik [11] examined the reliability parameters of a system of distinct units under priority of the operative unit and subject to weather conditions. Joshi and Sharma [12] discussed the nature of two identical warm redundant units under varying production demands. Kumar et al. [13] analyzed a stochastic system having an expert repairman subject to maximum repair time. The profit and availability of a cold standby stochastic system using general distributions have been discussed by Kumar and Goel [14]. Malik and Rathee [15] threw light on the reliability of stochastic redundant systems working under repair times and maximum operation time. Neeraj and Barak [15] explored a system's economic and reliability parameters, prioritizing preventive maintenance over repair under different environmental conditions. A cold standby system working in different environments subject to inspection before repair was evaluated by Barak and Kumari [16]. Barak et al. [17] examined a cold standby system under server failure and subject to inspection before repair. The profit of a two-warm-unit redundant system with a server performing in distinct climate conditions under a first come, first served repair policy has been described by Kumar et al. [18, 19]. Uniyal et al. [20] evaluated some nature-inspired optimization techniques to analyze growth in almost every field of mathematics and reliability. Khanduja and Panchal [21] discussed the system's reliability, availability, and maintainability to improve productivity in industries. Kumar et al. [22] analyzed product reliability using the Weibull distribution in industries with critical inspection to improve system availability. Keeping the above studies in mind, in this manuscript it is observed that when the operative unit fails, the redundant unit starts working. To repair the failed unit, an expert server is always available. Sometimes the server is tired due to working continuously or for other reasons; providing refreshment then generally enhances his efficiency. Tables and graphs are used to explore the nature of some of the required reliability characteristics, such as MTSF, busy period of the server, number of visits made by the server, availability, and profit of the defined system.
2 System Assumptions

The following assumptions have been made in the model development:
• It is assumed that the system has two units in good condition—one operative and another cold standby. • When the operative unit fails, then the cold standby unit starts working. • An expert repairman is always available to the system. • Refreshment is provided to the repairman to override the tiredness.
3 Notations

In the model development, the notations suggested by Kumar et al. [23] have been utilized.
4 Transition Probabilities

For the nonzero elements, the transition probabilities are obtained from the expression

pi,j = ∫0∞ qi,j(t) dt.  (1)
In particular cases, using f1(t) = θe−θt and g1(t) = ϕe−ϕt, the transition probabilities of the system are calculated. For example, at state S2:

p2,1;(7,6) = λϕ / [(λ + θ)(ϕ + μ)],  p2,1;7(6,5)n = λμ / [(λ + θ)(ϕ + μ)],
p3,1;(9,10) = β / (ϕ + μ + β).  (2)
It has been verified that the sum of the transition probabilities at each state is one (Fig. 1).
Fig. 1 State transition diagram
5 Mean Sojourn Time

In the cold standby redundant system, let T denote the time to system failure starting from state Si, and let μi be the mean time consumed by the system in the ith state, formulated as

μi = Σj mi,j = ∫0∞ P(T > t) dt,

where mi,j is the unconditional mean time taken by the system to move from any state Si ∈ R to state Sj, counted from the epoch of entry into Si. Then,

μ0 = m0,1 + m0,3 = 1/λ.  (3)
6 Mean Time to System Failure

Let φi(t) be the cumulative distribution function of the first passage time from state Si ∈ R to a failed state. Treating the failed states as absorbing, the recursive relations for φi(t) are

φ0(t) = Q0,1(t) ⊗ φ1(t) + Q0,3(t)
φ1(t) = Q1,8(t) ⊗ φ8(t) + Q1,2(t) ⊗ φ2(t) + Q1,0(t) ⊗ φ0(t) + Q1,4(t)
φ2(t) = Q2,7(t) + Q2,1(t) ⊗ φ1(t);  φ8(t) = Q8,9(t) + Q8,1(t) ⊗ φ1(t).  (4)
Taking the LST of relations (4) and solving for φ0**(s), the MTSF of the system is then obtained by Laplace inverse transformation as

MTSF = {[1 − p1,2 p2,1 − p1,8 p8,1] μ0 + p0,1 μ1 + p0,1 (p1,2 μ2 + p1,8 μ8)} / {1 − p0,1 p1,0 − p1,2 p2,1 − p1,8 p8,1}.  (5)
7 System Availability

Availability is the probability that the system is available for use at a specified time. Suppose Ai(t) denotes the probability that the system is available for use at time t, having entered state Si ∈ R at time t = 0. The recursive relations for Ai(t) are

A0(t) = q0,3(t) ⊕ A3(t) + q0,1(t) ⊕ A1(t) + M0(t)
A1(t) = q1,2(t) ⊕ A2(t) + q1,0(t) ⊕ A0(t) + q1,8(t) ⊕ A8(t) + [q1,1;4(5,6)n(t) + q1,1;4(9,10)(t) + q1,1;4(t)] ⊕ A1(t) + M1(t)
A2(t) = [q2,1(t) + q2,1;(7,6)(t) + q2,1;7(6,5)n(t)] ⊕ A1(t) + M2(t)
A3(t) = [q3,1(t) + q3,1;(5,6)n(t) + q3,1;(9,10)(t)] ⊕ A1(t)
A8(t) = [q8,1(t) + q8,1;(9,10)(t)] ⊕ A1(t) + M8(t).  (6)
Taking the LT of relations (6) and calculating A0*(s), the steady-state availability is

A0(∞) = lim(s→0) s A0*(s) = NA / D1,  (7)

where NA = [μ0 p1,0 + μ1 + μ2 p1,2 + μ8 p1,8] and D1 = [μ0 p1,0 + μ1 + p1,2 μ2 + p1,0 μ3 p0,3 + p1,8 μ8].
8 Busy Period of the Server Due to Repair

Suppose Bi(t) represents the probability that the server is busy repairing the failed unit at time t, given that the system entered state Si ∈ R at time t = 0. The recursive relations for Bi(t) are

B0(t) = q0,1(t) ⊕ B1(t) + q0,3(t) ⊕ B3(t)
B1(t) = q1,0(t) ⊕ B0(t) + q1,2(t) ⊕ B2(t) + q1,8(t) ⊕ B8(t) + [q1,1;4(t) + q1,1;4(5,6)n(t) + q1,1;4(9,10)(t)] ⊕ B1(t) + W1(t)
B2(t) = [q2,1(t) + q2,1;(7,6)(t) + q2,1;7(6,5)n(t)] ⊕ B1(t) + W2(t)
B3(t) = [q3,1(t) + q3,1;(5,6)n(t) + q3,1;(9,10)(t)] ⊕ B1(t) + W3(t)
B8(t) = [q8,1(t) + q8,1;(9,10)n(t)] ⊕ B1(t).  (8)
Taking the LT of relations (8) and solving for B0*(s), the steady-state busy period of the server is B0 = NB / D1, where

NB = lim(s→0) [W1*(s) + W2*(s) p1,2 + W3*(s) p0,3 p1,0] = W1*(0) + W2*(0) p1,2 + W3*(0) p0,3 p1,0,

D1 is the same denominator as in (7), and substituting the transition probabilities and mean sojourn times yields the explicit closed form of B0 as a ratio of polynomials in λ, θ, ϕ, μ, β, β1, and b.
9 Estimated Number of Visits Made by the Server

Assume that Vi(t) is the expected number of visits made by the server to repair the failed unit in (0, t], given that the system entered state Si ∈ R at time t = 0. The recursive relations for Vi(t) are

V0(t) = Q0,1(t) ⊗ [1 + V1(t)] + Q0,3(t) ⊗ [1 + V3(t)]
V1(t) = Q1,0(t) ⊗ V0(t) + Q1,2(t) ⊗ V2(t) + Q1,8(t) ⊗ V8(t) + [Q1,1;4(t) + Q1,1;4(5,6)n(t) + Q1,1;4(9,10)(t)] ⊗ V1(t)
V2(t) = [Q2,1(t) + Q2,1;(7,6)(t) + Q2,1;7(6,5)n(t)] ⊗ V1(t)
V3(t) = [Q3,1(t) + Q3,1;(5,6)n(t) + Q3,1;(9,10)(t)] ⊗ V1(t)
V8(t) = [Q8,1(t) + Q8,1;(9,10)(t)] ⊗ V1(t).  (9)
Using LST of the above relations (9), the expected number of visits by the server V0∗∗ (s) is given by
V0 = λθϕ2 (ϕ + μ + β)(λ + θ)(β1 + λ) / D1,

where the denominator D1 is the same closed-form expression in λ, θ, ϕ, μ, β, β1, and b as in the steady-state availability (7) and busy period (8).
10 Profit Analysis

The profit function, an essential reliability characteristic, is defined as

P = T0 A0 − T1 B0 − T2 V0,  (10)

where
T0 = 15,000 (revenue per unit up-time),
T1 = 1500 (charge per unit time for repair made by the server),
T2 = 500 (charge per visit for repairing).
11 Discussion

The nature of the system's reliability measures with respect to the refreshment rate under different weather conditions is depicted by graphs and tables. Table 1 represents the tendency of the mean time to system failure, which increases smoothly with the refreshment rate (θ) in the range [0.55, 1]; the other parameters, namely the failure rate of the unit (λ = 0.5), the server refreshment request rate (μ = 0.4), the unit repair rate (φ = 0.4), the weather change rate from normal to abnormal (β = 0.4), the change rate from abnormal to normal (β1 = 0.6), and the probabilities that the cold standby unit is working (a = 0.8) or not working (b = 0.2), are kept fixed to analyze the behavior of MTSF. The availability of the redundant system is affected by the refreshment rate (θ), repair rate (φ), unit failure rate (λ), server refreshment request rate (μ), normal-to-abnormal weather change rate (β), and abnormal-to-normal rate (β1). Sometimes the operative unit cannot perform satisfactorily, and then the server repairs the failed unit; the server restores the failed unit more rapidly when refreshment is provided. The availability is explored numerically in Table 2, and its value increases with the refreshment rate (θ) while the other parameters λ = 0.5, μ = 0.4, φ = 0.4, β = 0.4, β1 = 0.6, a = 0.8, b = 0.2 are treated as constant. A redundant system is used to enhance the reliability and
Table 1 MTSF versus refreshment rate (θ). All columns fix a = 0.8, b = 0.2; the first column is the base case λ = 0.5, μ = 0.4, φ = 0.4, β = 0.4, β1 = 0.6, and each subsequent column changes only the parameter shown.

θ | base | λ = 0.6 | μ = 0.5 | φ = 0.5 | β = 0.5 | β1 = 0.7
0.55 | 4.137649 | 3.391906 | 4.112044 | 4.258237 | 4.113155 | 4.146341
0.60 | 4.142582 | 3.395062 | 4.117647 | 4.264151 | 4.117647 | 4.151436
0.65 | 4.147166 | 3.39801 | 4.122871 | 4.269644 | 4.121818 | 4.156171
0.70 | 4.151436 | 3.400771 | 4.127753 | 4.27476 | 4.1257 | 4.160584
0.75 | 4.155424 | 3.403361 | 4.132325 | 4.279537 | 4.129323 | 4.164706
0.80 | 4.159157 | 3.405797 | 4.136617 | 4.284006 | 4.132712 | 4.168565
0.85 | 4.162658 | 3.408091 | 4.140653 | 4.288197 | 4.135889 | 4.172185
0.90 | 4.165949 | 3.410256 | 4.144455 | 4.292135 | 4.138874 | 4.175589
0.95 | 4.169047 | 3.412303 | 4.148044 | 4.295842 | 4.141682 | 4.178794
1.00 | 4.17197 | 3.414239 | 4.151436 | 4.299338 | 4.14433 | 4.181818
Table 2 Availability versus refreshment rate (θ). All columns fix a = 0.8, b = 0.2; the first column is the base case λ = 0.5, μ = 0.4, φ = 0.4, β = 0.4, β1 = 0.6, and each subsequent column changes only the parameter shown.

θ | base | λ = 0.6 | μ = 0.5 | φ = 0.5 | β = 0.5 | β1 = 0.7
0.55 | 0.229992 | 0.20454 | 0.223597 | 0.263645 | 0.225156 | 0.231716
0.60 | 0.239604 | 0.21273 | 0.233279 | 0.274562 | 0.234501 | 0.241508
0.65 | 0.248313 | 0.220151 | 0.242081 | 0.284429 | 0.242979 | 0.250385
0.70 | 0.256233 | 0.226903 | 0.25011 | 0.293379 | 0.250698 | 0.258461
0.75 | 0.263459 | 0.233068 | 0.257457 | 0.301524 | 0.257751 | 0.265832
0.80 | 0.270073 | 0.238717 | 0.264201 | 0.308962 | 0.264215 | 0.272581
0.85 | 0.276145 | 0.243909 | 0.270408 | 0.315773 | 0.27016 | 0.278778
0.90 | 0.281736 | 0.248696 | 0.276136 | 0.32203 | 0.275641 | 0.284486
0.95 | 0.286898 | 0.253122 | 0.281437 | 0.327793 | 0.280709 | 0.289755
1.00 | 0.291675 | 0.257225 | 0.286353 | 0.333116 | 0.285407 | 0.294633
profit of the system. When the operative unit fails, the cold standby unit starts working, and the server is required to repair the failed unit. Sometimes the server needs refreshment to increase his efficiency in handling the unit. It is evident from Table 3 that, with the constant parameters λ = 0.5, μ = 0.4, φ = 0.5, β = 0.4, β1 = 0.6, a = 0.8, b = 0.2, the profit function increases with the refreshment rate (θ).
Table 3 Profit versus refreshment rate (θ). All columns fix a = 0.8, b = 0.2; the first column is the base case λ = 0.5, μ = 0.4, φ = 0.4, β = 0.4, β1 = 0.6, and each subsequent column changes only the parameter shown.

θ | base | λ = 0.6 | μ = 0.5 | φ = 0.5 | β = 0.5 | β1 = 0.7
0.55 | 240.5826 | 162.1836 | 220.0317 | 337.5906 | 237.4784 | 243.3071
0.60 | 261.5652 | 179.7569 | 242.0214 | 361.9994 | 257.6871 | 264.5508
0.65 | 280.8163 | 195.8645 | 262.2311 | 384.3237 | 276.2337 | 284.0204
0.70 | 298.5279 | 210.677 | 280.8545 | 404.7996 | 293.3051 | 301.914
0.75 | 314.8664 | 224.3406 | 298.0598 | 423.6318 | 309.0628 | 318.4036
0.80 | 329.9769 | 236.9804 | 313.9942 | 440.9981 | 323.6465 | 333.6387
0.85 | 343.9857 | 248.7047 | 328.786 | 457.0531 | 337.1777 | 347.7493
0.90 | 357.0034 | 259.6072 | 342.548 | 471.9319 | 349.7622 | 360.8494
0.95 | 369.127 | 269.7696 | 355.3791 | 485.7524 | 361.4929 | 373.0386
1.00 | 380.4416 | 279.2633 | 367.3668 | 498.6184 | 372.4509 | 384.4046
12 Conclusion

The above study shows that the mean time to system failure of the system model, the availability of the system for use, and the profit function all increase with the refreshment rate. It is noticed that if refreshment is provided to the server before the weather turns abnormal, it enhances his efficiency. The concept of refreshment under weather conditions is applicable in industries, cybercafés, universities, etc. Hence, the main finding of this study is that offering refreshment to the serviceman markedly improves his efficiency, keeping the quality and the profit of the system on the higher side.
References

1. Osaki S, Asakura T (1970) A two-unit standby redundant system with repair and preventive maintenance. J Appl Probab 7(3):641–648
2. Birolini A (1974) Some applications of regenerative stochastic processes to reliability theory - part one: tutorial introduction. IEEE Trans Reliab 23(3):186–194
3. Subramanian R (1978) Availability of 2-unit system with preventive maintenance and one repair facility. IEEE Trans Reliab 27(2):171–172
4. Naidu RS, Gopalan MN (1984) Cost-benefit analysis of a one-server two-unit system subject to arbitrary failure, inspection and repair. Reliab Eng 8(1):11–22
5. Dhillon BS (1993) Reliability and availability analysis of a system with warm standby and common cause failures. Microelectron Reliab 33(9):1343–1349
6. Chander S, Bansal RK (2005) Profit analysis of single-unit reliability models with repair at different failure modes. In: Proceedings of INCRESE, IIT Kharagpur, India, pp 577–587
7. Wang GJ, Zhang YL (2007) An optimal replacement policy for repairable cold standby systems with priority in use. Int J Syst Sci 38(12):1021–1027
8. Sachin K, Anand T (2009) Evaluation of some reliability parameters of a three-state repairable system with environmental failure. Int J Res Rev Appl Sci 2(1):96–103
9. Malik SC, Pawar D (2010) Reliability and economic measures of a system with inspection for on-line repair and no repair activity in abnormal weather. Bull Pure Appl Sci 29(2):355–368
10. Pawar D, Malik SC, Bahl S (2010) Steady state analysis of an operating system with repair at different levels of damages subject to inspection and weather conditions. Int J Agric Stat Sci 6(1):225–234
11. Deswal S, Malik SC (2015) Reliability measures of a system of two non-identical units with priority subject to weather conditions. J Reliab Stat Stud 8(1):181–190
12. Joshi PK, Sharma C (2015) Stochastic analysis of two identical unit warm standby system with varying demand of production. J Adv Math Comput Sci 1–16
13. Kumar A, Baweja S, Barak M (2015) Stochastic behavior of a cold standby system with maximum repair time. Decision Sci Lett 4(4):569–578
14. Kumar J, Goel M (2016) Availability and profit analysis of a two-unit cold standby system for general distribution. Cogent Math 3(1):1262937
15. Malik SC, Rathee R (2016) Reliability modelling of a parallel system with maximum operation and repair times. Int J Oper Res 25(1):131–142
16. Barak MS, Kumari S (2018) Profit analysis of a two-unit cold standby system operating under different weather conditions subject to inspection. Appl Appl Math Int J 13(1):5
17. Barak MS, Yadav D, Kumari S (2018) Stochastic analysis of a two-unit system with standby and server failure subject to inspection. Life Cycle Reliab Safety Eng 7(1):23–32
18. Kumar A, Pawar D, Malik SC (2019) Profit analysis of a warm standby non-identical unit system with single server performing in normal/abnormal environment. Life Cycle Reliab Safety Eng 8(3):219–226
19. Kumar A, Pawar D, Malik SC (2020) Reliability analysis of a redundant system with 'FCFS' repair policy subject to weather conditions. Int J Adv Sci Technol 29(3):7568–7578
20. Uniyal N, Pant S, Kumar A (2020) An overview of few nature-inspired optimization techniques and its reliability applications. Int J Math Eng Manage Sci 5(4):732
21. Khanduja D, Panchal D (2021) RAM analysis for improving productivity in process industries: a review. Reliab Risk Model Eng Syst 151–160
22. Kumar J, Bansal SA, Mehta M (2022) Reliability analysis—a critical review. In: Recent trends in industrial and production engineering, pp 205–217
23. Kumar A, Garg R, Barak MS (2021) Reliability measures of a cold standby system subject to refreshment. Int J Syst Assur Eng Manage 14(1):1–9
A Comparison of Security Analysis Methods for Smart Contracts Built on the Ethereum Blockchain Satpal Singh Kushwaha and Sandeep Joshi
Abstract Blockchain technology is one of the superior technologies of recent times. It is a decentralized distributed ledger that records transactions in an immutable way. Smart contracts are digital contracts, i.e., auto-executable small pieces of code written to be deployed on the blockchain. Ethereum is the most widely used blockchain platform for writing and deploying smart contracts. The immutability feature of blockchain poses threats to smart contracts: once a smart contract is uploaded on the blockchain, it becomes immutable. The EVM design and the Solidity programming language pose some security issues, which should be analyzed properly, so smart contracts must be analyzed for vulnerabilities before deployment. Several smart contract security analysis approaches have been proposed in the literature, based on symbolic execution, taint analysis, formal verification, rule-based pattern detection, etc. A comparative study of these techniques is performed in this paper, which will help researchers select an efficient method to analyze Ethereum smart contracts and pave the way for future research directions. Keywords Blockchain · Ethereum · Smart contract · Vulnerability · Security · Analysis · Attacks · EVM
S. S. Kushwaha · S. Joshi (B)
Manipal University Jaipur, Jaipur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_52

1 Introduction

Blockchain-based smart contracts provide a way of trustless communication between two parties without a trusted third party. This is possible due to the features of blockchain technology [1, 2]. Smart contracts are auto-triggered small pieces of code containing all the agreement terms agreed by two parties. Using smart contracts, payments are made in cryptocurrencies; that is why they are a target of attackers. Smart contracts have been attacked several times in history, and the attackers have successfully stolen Ether (the cryptocurrency of the Ethereum blockchain). The most famous attack was the DAO [3]
S. S. Kushwaha and S. Joshi
Table 1 Famous attacks on smart contracts

Real-world attack | Cause | Vulnerability | Loss in Ether/US$ | Year
DAO [3] | External dependence | Re-entrancy | 60 million US$ | 2016
SmartBillions lottery [4] | Malleable randomness | Bad randomness | 400 Ether | 2017
Parity multi-signature wallet attack [5] | Unauthorized access control | Function's default visibility | 5 million Ether | 2017
Proof of Weak Hands Coin (PoWHC) [6] | Improper validation | Integer overflow | 866 Ether | 2018
BadgerDAO [7] | Unauthorized access control | Unprotected Ether withdrawal | 120 million US$ | 2021
attack that happened in 2016, where the attackers stole US$60 million. Smart contracts are agreements between two parties coded in a programming language, and code can have bugs. The Ethereum blockchain records transactions in an immutable way, meaning that once registered or deployed on the blockchain, smart contracts become immutable. They should therefore be thoroughly analyzed for vulnerabilities before being recorded as a transaction on the blockchain. Table 1 shows some of the famous attacks on Ethereum blockchain-based smart contracts.
1.1 Ethereum Blockchain Technology Ethereum is one of the best-known blockchains and the one primarily used to develop smart contracts. The Ethereum blockchain is a public decentralized distributed ledger. Anyone can deploy permanent and immutable decentralized applications onto Ethereum, and users can interact with these applications. The Ethereum blockchain is now rapidly being adopted in several fields such as decentralized financial applications, supply chain management, storage of the bulk data generated by IoT devices, government operations, the health industry, and education. Some financial institutions plan to transfer their databases and processes onto the blockchain; in the same way, Dubai plans to deploy all government operations on the blockchain to bring transparency to its government operations. Blockchain's popularity is due to features like immutability and decentralization, which make it more secure.
1.2 Ethereum Smart Contract A smart contract is a digital agreement [8] between two unknown entities, made without any mediating party in a trustless manner. The trust in such agreements comes from the features of blockchain, like immutability. These smart contracts [9] are coded in programming languages such as Solidity and uploaded to the blockchain through a
A Comparison of Security Analysis Methods for Smart Contracts Built …
transaction. Smart contracts have their own storage and balances. An Ethereum smart contract executes when a predefined or agreed condition is met or triggered; the agreed terms are then automatically fulfilled, and the smart contract state is updated. Smart contracts become insecure if uploaded to the chain without being analyzed for vulnerabilities [10].
1.3 Motivation The literature has not covered performance assessment of Ethereum smart contract analysis methods. The few existing articles only categorize the analysis methods but do not provide a performance analysis to assess them, leaving researchers without guidance in selecting the appropriate method for further research.
1.4 Contribution This paper presents a comparative analysis of security analysis methods for Ethereum blockchain smart contracts. Our results show each method's efficiency in terms of vulnerability detection and its adoption in security analysis tool development. This comparative analysis will help the research community assess suitable methods for smart contract analysis.
1.5 Paper Outline The rest of the paper is organized as follows: Sect. 2 reviews related work, Sect. 3 presents the key security issues of Ethereum smart contracts, Sect. 4 describes Ethereum smart contract security analysis methods, Sect. 5 presents the comparison results of the analysis methods, and Sect. 6 concludes the paper.
2 Related Work Several research articles have been published on security analysis methods and tools for Ethereum smart contracts. Praitheeshan et al. [11] present a survey of security analysis methods and provide a detailed overview of security vulnerabilities; Bartoletti et al. [4] performed an empirical analysis of smart contract security; Rouhani et al. [12] conducted a systematic survey on the security and performance of smart contracts; Daojing He et al. [13] performed vulnerability analysis and summarized security analysis methods for smart contracts; Almakhour et al. [14] surveyed the
verification of Ethereum smart contracts; Pinna et al. [15] performed an empirical study on Ethereum smart contract analysis; Atzei et al. [5] surveyed attacks on Ethereum smart contracts and analyzed security vulnerabilities; and Liu et al. [16] surveyed the verification of smart contracts. However, the existing work on smart contract analysis is either incomplete or insufficient to assess the performance of the analysis methods.
3 Ethereum Smart Contract Key Vulnerabilities Several research articles in the literature address smart contract vulnerabilities. These vulnerabilities fall into three categories according to their root cause: the design of the Ethereum blockchain, the Solidity programming language, or the Ethereum virtual machine (EVM). These leading root causes can be further divided into sub-causes such as immutability, uncontrolled resource consumption, malleable entropy sources, malleable miners, malleable users, fallback function invocation, insufficient authorization, etc. (Fig. 1). A brief description of some smart contract vulnerabilities follows.
3.1 Re-entrancy Vulnerability This vulnerability arises from re-entrant functions and was the leading cause of the famous "The DAO" attack. Because a re-entrant function can be called again before its first invocation completes, the same code can be executed repeatedly, which an attacker can exploit maliciously.
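The control flow at the heart of the attack can be illustrated with a small, self-contained simulation. This is a hedged Python sketch, not real contract code: `VulnerableVault` and `Attacker` are hypothetical names, and the callback stands in for Solidity's fallback function.

```python
class VulnerableVault:
    """Pays out *before* updating the balance -- the re-entrancy flaw."""
    def __init__(self, funds):
        self.funds = funds            # total Ether held by the contract
        self.balances = {}            # per-user credited balances

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.funds += amount

    def withdraw(self, user, callback):
        amount = self.balances.get(user, 0)
        if amount > 0 and self.funds >= amount:
            self.funds -= amount
            callback()                # external call happens FIRST ...
            self.balances[user] = 0   # ... state update happens LAST

class Attacker:
    def __init__(self, vault):
        self.vault = vault
        self.stolen = 0

    def on_receive(self):             # stands in for the fallback function
        self.stolen += 1
        if self.vault.funds >= 1:     # re-enter while still credited
            self.vault.withdraw("attacker", self.on_receive)

vault = VulnerableVault(funds=3)
vault.deposit("attacker", 1)
attacker = Attacker(vault)
vault.withdraw("attacker", attacker.on_receive)
print(attacker.stolen)  # -> 4: paid out four times for a single deposit
```

Moving the state update (`balances[user] = 0`) before the external call, as the checks-effects-interactions pattern prescribes, removes the flaw.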
3.2 Integer Overflow and Underflow This vulnerability occurs during arithmetic calculations. Each data type in Solidity has a specified range. If a calculated value stored in a variable of a specific type crosses the upper bound of that range, it overflows; if it crosses the lower bound, it underflows. Integer overflow and underflow lead to wrong calculations and can cause severe problems.
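Since Python integers are unbounded, a sketch has to emulate the EVM's fixed-width uint256 arithmetic explicitly by reducing modulo 2**256; the helper names below are illustrative.

```python
UINT256_MAX = 2**256 - 1

def uint256_add(a, b):
    """Addition as the EVM performs it: wraps around on overflow."""
    return (a + b) % (2**256)

def uint256_sub(a, b):
    """Subtraction as the EVM performs it: wraps around on underflow."""
    return (a - b) % (2**256)

# Overflow: the maximum value plus one wraps to zero.
print(uint256_add(UINT256_MAX, 1))       # -> 0
# Underflow: 0 - 1 wraps to the maximum representable value.
print(uint256_sub(0, 1) == UINT256_MAX)  # -> True
```

A balance check written as `balance - amount >= 0` is useless under this arithmetic, since the subtraction can never be negative; it silently wraps to a huge value instead.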
3.3 Timestamp Dependency This vulnerability occurs when a block timestamp is used as a source of randomness. The miners mine the blockchain blocks, and a malleable miner can manipulate the block timestamp as per his need, creating security threats to the smart contract.
Fig. 1 Ethereum blockchain-based smart contract vulnerabilities
3.4 Transaction Ordering Dependency This vulnerability occurs when two dependent transactions are executed in the wrong order. If transaction "A" depends on transaction "B," but the miner executes transaction "A" before transaction "B," the results of the transactions will be undesired. A malleable miner can manipulate the transaction order, leading to disastrous results.
3.5 Gas-Related Vulnerabilities Gas is the transactional cost of each instruction on the Ethereum blockchain. The execution cost of an addition/subtraction operation differs from that of a multiplication/division operation. When an operation is supplied with
less than the required gas, it fails. To manage gas-related issues, sufficient gas should be supplied to each operation, and gas-costly loops should be avoided.
4 Ethereum Smart Contract Analysis Methods Ethereum smart contract analysis techniques are mainly divided into three types: dynamic analysis, static analysis, and formal verification, which are further divided into subcategories.
4.1 Static Analysis In static analysis, the smart contract is analyzed without executing the code or using any run-time environment. Static analysis generally examines the code directly: by recognizing vulnerable patterns through pattern detection, by analyzing the control flow or data flow of the smart contract, by rule-based logic analysis, by taint tracking (tracking changes in critical variable values), or by analyzing the abstract syntax tree for vulnerable patterns. Static analysis covers approximately all execution paths, in contrast to dynamic analysis.
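A minimal sketch of the rule-based pattern-detection style of static analysis might look as follows. The rule names, regex patterns, and the sample contract are all illustrative assumptions, not taken from any of the surveyed tools.

```python
import re

# Hypothetical rule set: regex patterns for a few well-known risky constructs.
RULES = {
    "timestamp-dependency": re.compile(r"\bblock\.timestamp\b|\bnow\b"),
    "tx-origin-auth": re.compile(r"\btx\.origin\b"),
    "low-level-call": re.compile(r"\.call\s*\("),
}

def scan(source: str):
    """Return the rule names whose pattern occurs in the contract source."""
    return sorted(name for name, pat in RULES.items() if pat.search(source))

contract = """
contract Lottery {
    function spin() public {
        uint seed = block.timestamp;   // entropy from the miner-set clock
        if (tx.origin == owner) { payout(seed); }
    }
}
"""
print(scan(contract))  # -> ['timestamp-dependency', 'tx-origin-auth']
```

Real static analyzers work on the AST or bytecode rather than raw text, which lets them also track data flow, but the detect-by-known-pattern principle is the same.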
4.2 Dynamic Analysis In dynamic analysis, the smart contract is executed in a run-time environment. Vulnerable patterns are identified by simulating real-time attacks, feeding the contract malicious code or malformed inputs. Static analysis cannot predict the user's behavior or intention behind a particular input; that is why static analysis may produce more false positives than dynamic analysis.
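The dynamic approach can be sketched as a toy fuzzer: run the code under test on many random inputs and report any input that violates a safety invariant. The `transfer` function below is a deliberately buggy, made-up stand-in for contract logic, not real contract code.

```python
import random

def transfer(balance, amount):
    """Buggy: no check that amount <= balance, so balances can go negative."""
    return balance - amount

def fuzz(runs=1000, seed=0):
    """Feed random inputs and return the first invariant-violating pair."""
    rng = random.Random(seed)
    for _ in range(runs):
        balance = rng.randint(0, 100)
        amount = rng.randint(0, 200)
        if transfer(balance, amount) < 0:   # invariant: never negative
            return (balance, amount)        # counterexample found
    return None

counterexample = fuzz()
print(counterexample is not None)  # -> True: the fuzzer finds a violating input
```

Unlike the static scan, this reports concrete failing inputs (no false positives), but it can only find bugs on the paths the random inputs happen to exercise.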
4.3 Formal Analysis Formal analysis methods attempt to prove specific properties, such as correctness and reliability, with the help of theorem provers or other mathematically rigorous formal methods. Because formal verification techniques only verify the code's correctness or validity, they are used in only a limited number of tools (Fig. 3). Figure 2 depicts the three main analysis methods with their respective subcategories.
Fig. 2 Categorization of Ethereum smart contract analysis methods
Fig. 3 Share of analysis methods in smart contract analysis tools
5 Comparison of Main Analysis Methods Several tools have been developed using static and dynamic analysis methods, while formal verification techniques are used in only a few tools; therefore, we selected only static and dynamic analysis for the comparative performance analysis. We surveyed 86 security analysis tools in [17]. Based on that survey, 77% of the tools are based on static analysis, 22% on dynamic analysis, and only 1% on formal analysis methods. Figure 3 depicts the share of the three analysis methods in analysis tool development.
Fig. 4 Performance comparison of tools based on static analysis and dynamic analysis methods
We created a dataset of 57 Ethereum smart contracts downloaded from Etherscan [18], tagged with 89 vulnerabilities covering re-entrancy, gas-related issues, integer overflow/underflow, and timestamp dependency. For the performance comparison, we selected some well-known Ethereum smart contract analysis tools based on static and dynamic analysis methods. The tools based on static analysis are Slither [19], Oyente [20], Securify [21], and Mythril [22]; those based on dynamic analysis are MAIAN [23], Manticore [24], and sFuzz [25]. In this performance comparison (Fig. 4), we found that static analysis tools cover more types of vulnerabilities than dynamic analysis tools. We also found that the vulnerability detection capabilities of static analysis tools are better than those of dynamic analysis tools.
6 Conclusion We surveyed the flaws already present in Ethereum smart contracts and arranged the various security analysis techniques into three categories: static, dynamic, and formal verification, which we further divided into subcategories. We then evaluated the approaches based on how well they perform, how thoroughly they search for vulnerabilities, and how accurately they do so. Both static and dynamic analysis methods have been implemented in automation tools, which are very convenient to use and make it easier to analyze vulnerable contracts; however, each tool can only detect its own uniquely defined vulnerable patterns. Static analysis methods cover more vulnerabilities than dynamic analysis, and their detection efficiency is also better. Formal verification methods use theorem provers to validate the correctness properties of smart contracts by using the interpreted versions of their respective proofs.
References

1. Averin A, Averina O (2019) Review of blockchain technology vulnerabilities and blockchain system attacks. https://doi.org/10.1109/FarEastCon.2019.8934243
2. Wright CS (2008) Bitcoin: a peer-to-peer electronic cash system. https://doi.org/10.2139/ssrn.3440802
3. Ghaleb B, Al-Dubai A, Ekonomou E, Qasem M, Romdhani I, Mackenzie L (2019) Addressing the DAO insider attack in RPL's Internet of Things networks. https://doi.org/10.1109/LCOMM.2018.2878151
4. Bartoletti M, Pompianu L (2017) An empirical analysis of smart contracts: platforms, applications, and design patterns. https://doi.org/10.1007/978-3-319-70278-0_31
5. Atzei N, Bartoletti M, Cimoli T (2017) A survey of attacks on Ethereum smart contracts (SoK). https://doi.org/10.1007/978-3-662-54455-6_8
6. Min T, Wang H, Guo Y, Cai W (2019) Blockchain games: a survey. https://doi.org/10.1109/CIG.2019.8848111
7. Lawler R (2021) Someone stole $120 million in crypto by hacking a DeFi website. Accessed 11 Dec 2021. [Online]. Available: https://www.theverge.com
8. Szabo N (1997) Formalizing and securing relationships on public networks. https://doi.org/10.5210/fm.v2i9.548
9. Kushwaha SS, Joshi S (2021) An overview of blockchain-based smart contract. https://doi.org/10.1007/978-981-15-9647-6_70
10. Kushwaha SS, Joshi S, Singh D, Kaur M, Lee H-N (2022) Systematic review of security vulnerabilities in Ethereum blockchain smart contract. https://doi.org/10.1109/ACCESS.2021.3140091
11. Praitheeshan P, Pan L, Yu J, Liu J, Doss R (2019) Security analysis methods on Ethereum smart contract vulnerabilities: a survey. https://arxiv.org/abs/1908.08605
12. Rouhani S, Deters R (2019) Security, performance, and applications of smart contracts: a systematic survey. https://doi.org/10.1109/ACCESS.2019.2911031
13. He D, Deng Z, Zhang Y, Chan S, Cheng Y, Guizani N (2020) Smart contract vulnerability analysis and security audit. https://doi.org/10.1109/MNET.001.1900656
14. Almakhour M, Sliman L, Samhat AE, Mellouk A (2020) Verification of smart contracts: a survey. https://doi.org/10.1016/j.pmcj.2020.101227
15. Pinna A, Ibba S, Baralla G, Tonelli R, Marchesi M (2019) A massive analysis of Ethereum smart contracts: empirical study and code metrics. https://doi.org/10.1109/ACCESS.2019.2921936
16. Liu J, Liu Z (2019) A survey on security verification of blockchain smart contracts. IEEE Access 7:77894–77904. https://doi.org/10.1109/access.2019.2921624
17. Kushwaha SS, Joshi S, Singh D, Kaur M, Lee H-N (2022) Ethereum smart contract analysis tools: a systematic review. https://doi.org/10.1109/ACCESS.2022.3169902
18. Etherscan: the Ethereum blockchain explorer. https://etherscan.io
19. Feist J, Grieco G, Groce A (2019) Slither: a static analysis framework for smart contracts. https://doi.org/10.1109/WETSEB.2019.00008
20. Luu L, Chu D-H, Olickel H, Saxena P, Hobor A (2016) Making smart contracts smarter. https://doi.org/10.1145/2976749.2978309
21. Tsankov P, Dan A, Drachsler-Cohen D, Gervais A, Bünzli F, Vechev M (2018) Securify: practical security analysis of smart contracts. In: Proceedings of the ACM conference on computer and communications security. https://doi.org/10.1145/3243734.3243780
22. Mythril (2022). [Online]. Available: https://github.com/ConsenSys/mythril
23. Nikolić I, Kolluri A, Sergey I, Saxena P, Hobor A (2018) Finding the greedy, prodigal, and suicidal contracts at scale. https://doi.org/10.1145/3274694.3274743
24. Mossberg M, Manzano F, Hennenfent E, Groce A, Grieco G, Feist J, Brunson T, Dinaburg A (2019) Manticore: a user-friendly symbolic execution framework for binaries and smart contracts. https://doi.org/10.1109/ASE.2019.00133
25. Nguyen TD, Pham LH, Sun J, Lin Y, Minh QT (2020) sFuzz. https://doi.org/10.1145/3377811.3380334
IoT-Based Real-Time Crop and Fertilizer Prediction for Precision Farming 4.0 Shailendra Tiwari, Ravit Garg, N. Kushal, and Manju Khurana
Abstract The Internet of things (IoT) is influencing everybody's life in the present era through smart applications. It is a collection of various gadgets that together form a self-deciding system. New advancements in precision farming through the utilization of the Internet of things are changing the face of traditional farming strategies by making them efficient and cost-effective for farmers, increasing crop yield, and preventing the decrease in soil nutrient content. This paper presents a prototype of a system that can record and analyse soil features such as moisture, pH, temperature, humidity, nitrogen, phosphorous, and potassium levels by acquiring real-time agricultural information, and can predict the suitability of the soil for a crop as well as the fertilizer application required for that crop. The system developed in this article uses an Arduino Uno, a temperature-humidity sensor, a breadboard, an ESP32s NodeMCU for Wi-Fi connectivity, a pressure sensor, a soil moisture sensor, a soil NPK sensor, and jumper wires. The approach is based on a simple architecture that collects and analyses data from the agricultural field in real time and transmits it through Wi-Fi to the cloud server. Based on the computation of the input parameters, the system also suggests the most suitable crop and the fertilizer for that predicted crop. Keywords Crop selection · Machine learning · Arduino · IoT device · Precision farming
1 Introduction The IoT is involved ever more productively in our daily routine in the current world. Agriculture is the main source of food, cereals, and other raw materials for human existence. Until now, farmers have been following conventional methods taught by their ancestors. But the methods and information gathered from S. Tiwari (B) · R. Garg · N. Kushal · M. Khurana CSED, Thapar Institute of Engineering and Technology, Patiala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_53
the ancestors will not fulfil our needs in the future. Moreover, modern people are searching for innovative procedures and smart gadgets for their everyday activities. Technology has an impressive ability to increase efficiency and reduce the extra human resources needed in agriculture, and the Internet of things has introduced new ways for precision farming [1]. In the smart farming concept with the IoT, gathering data from the sensors in the agriculture domain is the principal challenge. Moreover, precise examination of agro-meteorological and environmental conditions can increase crop yield [2] and provide information on the best-fit crop and a suitable fertilizer for that crop. IoT in agriculture is presently one of the most researched areas, as this sector is crucial to ensuring food security for the ever-growing global human population [3]. However, previously developed systems have low accuracy, are costly, do not provide a user-friendly interface, and have not been portable. Therefore, in the present study, a consolidated and dedicated system is proposed for the collection of soil parameters using an IoT kit, an Android application, and a cloud server. The created framework gathers and analyses the soil parameters at the agricultural field in real time with the help of sensors, using our own Arduino-based portable IoT kit for soil data collection. The goals of this paper are: (1) to design and develop an affordable, portable IoT kit for the collection of soil data; (2) to predict, based on the collected data, the suitable crop and the fertilizer for that crop;
(3) to develop an easy-to-use mobile application for displaying soil properties and predicting the best suitable crop and fertilizer; and (4) to develop a cloud server to collect and analyse the collected data. The details of this article are presented in six sections. Section 1 contains a brief introduction to the proposed system. Section 2 covers related work in this field. Section 3 contains the architecture, components used, and flowchart of the system. Section 4 gives a detailed description of the mathematical models used in the development of the system. Section 5 contains results and discussion. The last section contains the conclusion and future aspects of the study.
2 Related Work Shafi et al. [4] proposed an integrated methodology for monitoring crop health by utilizing AI, IoT, and drone technology. Remote sensors give real-time data on the surroundings influencing the yield, and the drone-based platform provides the multispectral data used to compute vegetation indices (VIs), for example, the Normalized Difference Vegetation Index (NDVI), for examining crop health. Deep learning algorithms were applied to the gathered data, wherein a deep neural network
with two hidden layers was found to be the best model among all the selected models [5]. The paper by Gaikwad et al. [6] describes the design and development of a technique that keeps track of soil parameters such as soil temperature and humidity, air temperature, and air moisture at the agricultural field in real time. Their framework was developed using open-source software. The IoT-based device measured the parameters and promptly transferred them to a cloud server for computation. This low-cost system can conveniently and efficiently be used in the agribusiness sector for farming management and production. The paper by Rodríguez et al. [7] implements a three-layered architecture for smart farming. They assessed Omicron, Libelium, and Intel technologies under criteria such as price, the number of inputs for sensor connection, communication protocols, portability, battery life, and an energy-harvesting photovoltaic panel. They assessed edge-based management mechanisms in the Edge Layer to provide data reliability, focusing on outlier identification and treatment using machine learning and interpolation algorithms. In the data analytics layer, they assessed different AI algorithms to estimate coffee production. The paper by Jaybhaye et al. [8] describes the application 'Farming Guru', which helps farmers cultivate successfully by making them smarter. Seasons, markets, and the environment influence the production of crops, and changes in these variables can cause major losses to a farmer. These losses can be reduced by this application, which includes five significant areas: crop analysis, accurate weather forecasting, a machine-learning-based approach working with data on humidity, zones of development, and air conditions along with pressure, and finding a suitable market for crops in advance.
Knowing the climate before cultivation can safeguard farms as well as farmers from enormous losses. That paper proposes a model and framework design to analyse crops and weather conditions, and to help farmers manage funds by giving them a chance to trade their tools for additional income as well as manage their income through a budget. The paper by Bandara and Mudiyanselage [1] proposed a framework that utilizes IoT sensors to collect soil data and performs real-time monitoring of the agricultural field. It gives a more reliable and adaptable solution for farmers: a straightforward design containing sensors that collect information from the agricultural field and pass it on to a central database, from which the server allocates work to the specific devices based on the received data.
3 The Proposed System This section of the paper provides the technical details of the system.
3.1 The Architecture of the System In this article, an IoT-based system is developed that captures soil data. The system architecture consists of the following significant components: an Arduino kit, IoT sensors, a mobile-based application, and a cloud server for computation of the data. The IoT kit is portable and solar-powered. With the help of a Wi-Fi microchip, an Internet connection is provided to the system via a mobile hotspot. Given the portability of the system, Wi-Fi is the feasible approach; a GSM module could also be used for Internet connectivity, but it is expensive as its usage charges are high. The proposed IoT kit can capture soil moisture, temperature, humidity, nitrogen, phosphorous, and potassium. The cloud server then performs computation based on the sensor values, provides a suitable crop for that field, and suggests the NPK ratio for the fertilizer. IoT kit and sensors: The system design includes an Arduino kit and probes/sensors (a soil moisture sensor with a resistance probe, a temperature sensor, an NPK sensor, an air temperature sensor, and humidity and pH sensors), in addition to a solar panel as the power source and a Wi-Fi module for Internet connectivity. The NodeMCU, based on the ESP8266, is an open-source platform that can establish communication between objects and transfer data through the Wi-Fi protocol, and it is used in the proposed system. The entire circuit along with its components has been placed in a box. An Internet connection was provided to the system through a mobile hotspot; the connection details are included in the Arduino code. Air humidity (range 20–90% with 5% accuracy) and temperature are acquired through a DHT11. An ESP32 is used with the other components to provide Wi-Fi, and a thermistor and an IC chip are attached on the back side of the Arduino kit. Jumper wires connect two points in a circuit. The open-source Arduino platform is used for programming the software and hardware.
In this article, an IoT kit with a portable design is proposed; for the initial stages of circuit design, a breadboard was used for building and testing. Mobile-based application: The mobile application was developed to consolidate user data and data from the IoT kit. Android Studio version 3.2 was used as the platform, with the Flutter framework and the Dart programming language, for programming the mobile application. The application displays soil nutrients such as nitrogen, phosphorus, and potassium along with soil moisture, temperature, and pH from the IoT kit. The smartphone's inbuilt GPS has been used to reduce the cost of the application, and the user's location coordinates are fetched using the Geolocator package [9]. Efforts have been made to log the user in securely using
email verification and Google's ReCAPTCHA. The IoT kit sends data to the cloud server, which is fetched by the application for further processing. Application and cloud server: Finally, the data from the smartphone and the IoT kit are sent to the cloud server for computation. The backend for the application was designed using the Django REST framework. Furthermore, the cloud server is used to store the soil data collected with the help of the IoT kit. An API for serial communication between the IoT kit and the mobile application has been included in the mobile application. Figure 1 describes the flowchart of the system.

Fig. 1 Flowchart of the system
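As a rough illustration of the data flow just described, one round of sensor readings could be packed into a JSON payload for upload to the cloud server. This is a sketch only: the field names and units are assumptions, not the actual firmware or API format of the kit.

```python
import json

def build_payload(moisture, temperature, humidity, ph, n, p, k):
    """Pack one round of (hypothetical) sensor readings for upload."""
    return json.dumps({
        "soil_moisture": moisture,        # percent
        "temperature": temperature,       # degrees Celsius
        "humidity": humidity,             # percent
        "ph": ph,
        "npk": {"n": n, "p": p, "k": k},  # mg/kg
    }, sort_keys=True)

payload = build_payload(moisture=32.5, temperature=26.1, humidity=71.0,
                        ph=6.4, n=90, p=42, k=43)
print(json.loads(payload)["npk"]["n"])  # -> 90
```

A compact, self-describing payload like this keeps the serial link between kit, phone, and server simple to parse on every side.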
4 Mathematical Models This section describes the various mathematical models used in this paper. After testing the system on various models, KNN, Logistic Regression, and Random Forest were selected for multi-class classification:
4.1 K-Nearest Neighbour (KNN) KNN, a machine learning algorithm, is used for classification [10]. It is a supervised learning algorithm in which classification is based on neighbours: it keeps track of stored training cases and classifies new cases based on their similarity to the stored ones. Here K represents the number of neighbours whose distances are compared. Based on Eq. 1, we find the distance between stored cases and new cases.
$$D(X_1, X_2) = \left[\sum_{i=1}^{K} |X_{1i} - X_{2i}|^{p}\right]^{1/p} \qquad (1)$$

Here p = 2 (for the Euclidean distance), $X_{1i}$ represents the training data, and $X_{2i}$ represents the test data. After calculating the distances to all stored points, the result is decided from the K nearest of them.
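A minimal sketch of Eq. (1) with p = 2 and a majority vote over the K nearest points. The toy soil samples and crop labels below are made up for illustration, not data from the paper's experiments.

```python
from collections import Counter

def minkowski(x1, x2, p=2):
    """Distance of Eq. (1); p = 2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)

def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: minkowski(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy samples: (pH, moisture %) -> suitable crop (labels are illustrative).
train_x = [(6.0, 30), (6.2, 35), (6.1, 32), (7.5, 60), (7.6, 65), (7.4, 62)]
train_y = ["maize", "maize", "maize", "rice", "rice", "rice"]
print(knn_predict(train_x, train_y, (7.5, 61), k=3))  # -> 'rice'
```

In practice the features should be scaled to comparable ranges first, since the raw distance otherwise lets the largest-valued feature dominate.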
4.2 Random Forest Classifier Random Forest [11] is a popular ML algorithm based on supervised learning, adopted for both regression and classification problems. It uses the idea of ensemble learning: improving the accuracy of a model on a complex problem by combining multiple classifiers. Random Forest builds decision trees on bootstrap samples of the given dataset; the final output is based on the majority vote of the trees' predictions for classification, or their average for regression. The greater the number of trees, the greater the accuracy, which also mitigates the issue of overfitting. Concretely, the algorithm chooses random records from the dataset, builds an individual decision tree for each sample, generates an output from each tree, and determines the final output by majority voting (classification) or averaging (regression).
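The bootstrap-and-vote idea can be sketched with depth-1 trees ("stumps") standing in for full decision trees, which keeps the sketch short; the data and labels are again illustrative, and a real Random Forest would also sample a random feature subset at each split.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Pick the single-feature threshold split with the fewest errors."""
    best = None
    for f in range(len(xs[0])):
        for t in {x[f] for x in xs}:
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errs = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errs < best[0]:
                best = (errs, f, t, l_lab, r_lab)
    if best is None:  # degenerate bootstrap sample: predict its majority class
        lab = Counter(ys).most_common(1)[0][0]
        return lambda x: lab
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def random_forest(xs, ys, n_trees=7, seed=1):
    """Train n_trees stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]   # bootstrap sample
        stumps.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]

xs = [(6.0, 30), (6.2, 35), (6.1, 32), (7.5, 60), (7.6, 65), (7.4, 62)]
ys = ["maize", "maize", "maize", "rice", "rice", "rice"]
predict = random_forest(xs, ys)
print(predict((7.5, 61)))
```

Because each stump sees a different resample of the data, individual mistakes tend to cancel out in the vote, which is the overfitting-reduction effect described above.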
4.3 Logistic Regression Logistic Regression [12], a machine learning algorithm, is a statistical approach to classification problems based on probability. It is used when the target value is categorical, most commonly for binary classification problems: for example, marking an email as spam (1) or not (0). The algorithm uses the sigmoid function to return the probability of a label. There are three types of logistic regression: ordinal, multinomial, and binomial. We use multinomial logistic
regression as we want a suggestion of crop and fertilizer as an output. The Logistic Regression formula is represented in Eq. 2.
$$\log\left(\frac{p_j(x)}{p_J(x)}\right) = \beta_{0j} + \beta_{1j} X_1 + \cdots + \beta_{pj} X_p \qquad (2)$$

Here j = 1, …, J − 1, $\beta_{ij}$ is the regression coefficient for the above values of j, and $X_i$ (i = 1, …, p) are the independent variables.
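A sketch of multinomial scoring in the spirit of Eq. (2): each class gets a linear score, and a softmax turns the scores into class probabilities. The coefficients, feature meanings, and class count below are made-up illustrative values, not fitted parameters from the paper.

```python
import math

def softmax(scores):
    """Turn per-class linear scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, betas):
    """betas[j] = (intercept, coefficients...) for class j."""
    scores = [b[0] + sum(w * xi for w, xi in zip(b[1:], x)) for b in betas]
    return softmax(scores)

# Two hypothetical features (say pH, moisture), three hypothetical classes.
betas = [(0.5, 1.0, -0.2), (-0.3, 0.4, 0.1), (0.0, -0.5, 0.2)]
probs = predict_proba((6.5, 30.0), betas)
print(abs(sum(probs) - 1.0) < 1e-9)  # probabilities sum to 1 -> True
```

The predicted crop is simply the class with the highest probability; the probabilities themselves let the application report how confident the suggestion is.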
5 Results and Discussion Different soil properties were measured using the IoT sensors described above. The cloud server performs the computation using the information provided by the sensors, and the results, which include a suitable crop and the NPK ratio for a suitable fertilizer, are conveyed through the mobile-based application. The information is collected from the field by inserting the IoT kit into the soil; from there, it is passed on to the cloud server to analyse the elements required by the soil. The nutrient content is estimated regularly to examine the improvement in the soil and to predict the NPK ratio of fertilizers.
5.1 pH Sensor Observations The acidity/alkalinity of the soil can be evaluated with the pH sensor. Different crops need different pH conditions, so the best decision can be taken based on the pH value. A pH of 7 is neutral; values below 7 are acidic and values above 7 are alkaline. Farmers' demands can be fulfilled by blending a variety of fertilizers to amend the pH value as necessary. The right pH range ensures the high solubility of most additives fundamental for crop development and improvement [13]. The pH of the samples collected from the college campus and nearby fields lies between 6 and 7, and the literature shows that a large portion of crops need a pH in the range 5.5–7.0 for typical germination and development. The appropriate pH value for each crop can be seen in Fig. 2.
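The pH-based suitability check described above can be sketched as a range lookup. The per-crop ranges here are assumptions for the sketch, not the measured values behind Fig. 2.

```python
# Illustrative per-crop pH ranges (assumed values, within the paper's
# overall 5.5-7.0 germination window).
PH_RANGES = {
    "rice": (5.5, 6.5),
    "maize": (5.8, 7.0),
    "wheat": (6.0, 7.0),
}

def suitable_crops(ph):
    """Return the crops whose preferred pH range contains the measured pH."""
    return sorted(c for c, (lo, hi) in PH_RANGES.items() if lo <= ph <= hi)

print(suitable_crops(6.2))  # -> ['maize', 'rice', 'wheat']
```

A measured pH that matches several crops is then disambiguated by the other sensor readings (moisture, temperature, NPK) through the classifiers of Sect. 4.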
S. Tiwari et al.
Fig. 2 pH versus crop bar graph
5.2 Moisture Content
Water present in the soil plays a critical part in determining soil characteristics such as nutrient content, electrical resistance, and innumerable other sub-characteristics. A soil moisture sensor is used to measure the moisture of the soil.
5.3 Temperature and Humidity Sensor
Basic temperature and humidity sensors are placed in the soil, and they communicate a constant flow of information to the cloud. Figure 3 shows the suitable humidity value for crops. Below-ground activities such as root growth, respiration, decomposition, and nitrogen mineralization are significantly influenced by soil temperature. Figure 4 shows the suitable temperature value for each crop. According to our observation, harvest yield was not ideal where daylight was unsuitable and temperature was low.
Fig. 3 Humidity versus crop bar graph
Fig. 4 Temperature versus crop bar graph
Fig. 5 Nitrogen versus crop bar graph
5.4 Nutrient Content
The content of the nutrients nitrogen, phosphorus, and potassium (NPK) directly impacts the yield cycles that follow. These readings are particularly helpful at the beginning of the cycle, when the crop is chosen and the farmer decides which fertilizers to use according to the essential requirements of the soil. Figures 5, 6 and 7 show the nitrogen, phosphorus, and potassium contents required for a better yield of each crop. The NPK contents present in the soil are determined with the help of an NPK sensor. Slight differences in NPK estimates have been observed between soils, and re-evaluation after the utilization of organic composts like vermicompost and panchagavya has shown an increased nutrient level in the soil. The tests have been conducted on the college campus and nearby fields, which have been divided into distinct regions; samples have been collected from these regions, which have different soil parameters. The model that executes on a cloud server receives the humidity, temperature, pH, moisture, and NPK values through field sensors. In this paper, we focus on predicting the most suitable crop to be cultivated and the NPK ratio for fertilizer, so that a farmer can achieve the best possible yield. The crop suggestion is forecast using different machine learning algorithms: K-Nearest Neighbour, Random Forest, and Logistic Regression. After that, a numerical assessment of the precision of the predictions is done. To determine the suitable crop, the parameters NPK values, temperature, moisture, humidity, pH, and rainfall are taken into consideration. The
Fig. 6 Phosphorous versus crop bar graph
Fig. 7 Potassium versus crop bar graph
parameters are used to suggest a suitable crop. The mobile application shown in Fig. 8 is developed to display the results. The chosen parameters are called the input features, and the suggested output, consisting of the suggested crop and fertilizer, is called the Y (output) feature.
Fig. 8 Mobile-based application to collect the soil data and predict crop and NPK ratio for fertilizer
Table 1 Accuracy analysis of models

  Model                 Accuracy in %
  Random forest         99.348
  KNN                   98.478
  Logistic regression   98.043
The dataset used to develop the system is complete, holding almost no empty fields; therefore, there is no need for pre-processing such as one-hot encoding to fill empty fields. The dataset considers 23 unique varieties of crops. It has been manually split into 80% training and 20% testing segments. The dataset consists of both numerical and categorical values. Standard scaler pre-processing has been applied to the numerical attributes to normalize the values and maintain comparability. A Label Encoder is used on the categorical attributes to convert labels into a numeric, machine-comprehensible form. After testing various ML algorithms, Random Forest, KNN, and Logistic Regression were selected as giving the maximum accuracy. Table 1 shows the accuracy of Random Forest, KNN, and Logistic Regression. The soil parameters in the dataset mostly follow the same trends and change in the same way, so Random Forest, Logistic Regression, and KNN are well suited to this dataset. KNN works on stored instances: if a new instance matches these examples, it is a neighbour according to KNN. Figure 9 shows the confusion matrix for KNN. Random Forest trains on numerous bootstrap samples, and averaging or majority voting gives the output decision. Figure 10 shows the confusion matrix for Random Forest. For categorical dependent variables with a provided set of independent variables, Logistic Regression is used. Figure 11 shows the confusion matrix for Logistic Regression. Here, we have analysed the dataset using the Random Forest Classifier, Logistic Regression, and K-Nearest Neighbour. It can be noticed that the Random Forest Classifier gives finer precision compared to the remaining two models (Logistic Regression and KNN).
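The pre-processing steps described here (standard scaling, label encoding, 80/20 split) can be sketched in pure Python; the function bodies mirror what scikit-learn's StandardScaler and LabelEncoder do, under simplified assumptions:

```python
def standard_scale(column):
    """Zero-mean, unit-variance scaling of one numerical column."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5 or 1.0  # guard against a constant column
    return [(x - mean) / std for x in column]

def label_encode(labels):
    """Map each distinct categorical label to an integer code."""
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [mapping[lab] for lab in labels], mapping

def train_test_split(rows, train_frac=0.8):
    """Deterministic 80/20 split; rows would normally be shuffled first."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]
```

With 23 crop labels, `label_encode` produces codes 0–22, which is the machine-comprehensible form the classifiers are trained on.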
KNN could not perform as well as Random Forest, even though both rely upon a majority-vote strategy. KNN depends heavily on the distance between previously stored instances and the new instance provided to it, and hence on the neighbours: with a change in K, the results also change, so its accuracy is unstable. KNN is better when a large amount of information has to be tracked. For Logistic Regression, the features are first fed to the classifier as input; after scaling, processing is done on the features, and based on the output of the system model we receive the result in the form of a suitable crop and NPK ratio for fertilizer. The primary benefit of Random Forest is its sampling procedure. It tracks down the relationships in the sample dataset between the features and the target, and models multiple decision trees. Sampling covers both columns as well as
Fig. 9 Confusion matrix for KNN
element (row) testing. Because of this sampling there is a high probability of varied sub-datasets, and thus different results for the various decision trees in light of the correlations of the distinct trees. The cross-validation method is used for the tested models: this technique evaluates ML models by checking model performance on a held-out testing dataset. We applied the KNN classifier with N-neighbours as 2 and finally applied a classifier to each model. On those outcomes, applying a majority-vote approach yields the most precise combined result. As multiple decision trees are simulated, high precision is obtained. That is why Random Forest is preferable, although it is computationally expensive compared to KNN. Random Forest and Logistic Regression have low bias and low variance.
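The two voting mechanisms compared here can be sketched side by side: KNN votes among the k nearest stored instances, while a Random Forest votes across its trees. The training tuples and labels below are made-up examples:

```python
from collections import Counter

def knn_predict(train, query, k=2):
    """KNN: the label is the majority vote of the k closest stored rows.
    `train` is a list of ((feature, ...), label) tuples."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def forest_predict(tree_outputs):
    """Random-Forest-style aggregation: majority vote over the
    predictions of the individual decision trees."""
    return Counter(tree_outputs).most_common(1)[0][0]
```

Changing `k` changes which neighbours vote, which is exactly why KNN's accuracy is unstable with K, whereas the forest's vote is spread over many trees.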
6 Conclusion and Future Scope
Agriculture is one of the sectors that contributes significantly to the economic growth of any nation, including India, but it lags in adopting new machine learning technologies. Our farmers should therefore be made aware of, and able to use, newer technologies to identify suitable crops and fertilizer inputs, through the use of machine
Fig. 10 Confusion matrix for random forest
learning and other modern approaches. The IoT-enabled agricultural system helped us in achieving modern scientific solutions. The proposed system uses parameters like soil moistness, humidity, temperature, NPK, pH, and rainfall values. This study, thus, focused on the selection of crops and fertilizers based on soil and air parameters using ML techniques. The Random Forest algorithm, implemented as a regressor and classifier, gave better results than KNN and Logistic Regression. In the future, the precision of the system can be increased by using more parameters. In addition, crop health can be monitored at regular intervals while making predictions in the future for more accuracy.
Fig. 11 Confusion matrix for logistic regression
References
1. Bandara TM, Mudiyanselage W, Raza M (2020) Smart farm and monitoring system for measuring the environmental condition using wireless sensor network—IOT technology in farming. In: 2020 5th international conference on innovative technologies in intelligent systems and industrial applications (CITISIA), pp 1–7. https://doi.org/10.1109/CITISIA50690.2020.9371830
2. Jangam AR, Kale KV, Gaikwad S, Vibhute AD (2018) Design and development of IoT based system for retrieval of agrometeorological parameters. In: 2018 international conference on recent innovations in electrical, electronics & communication engineering (ICRIEECE), pp 804–809. https://doi.org/10.1109/ICRIEECE44171.2018.9008636
3. Doshi J, Patel T, Bharti SK (2019) Smart farming using IoT, a solution for optimally monitoring farming conditions. Proc Comput Sci 160:746–751
4. Shafi U et al (2020) A multi-modal approach for crop health mapping using low altitude remote sensing, internet of things (IoT) and machine learning. IEEE Access 8:112708–112724. https://doi.org/10.1109/ACCESS.2020.3002948
5. Rudd JD, Roberson GT, Classen JJ (2017) Application of satellite, unmanned aircraft system, and ground-based sensor data for precision agriculture: a review. In: 2017 Spokane, Washington July 16–July 19, 2017, p 1
6. Gaikwad SV, Vibhute AD, Kale KV, Mehrotra SC (2021) An innovative IoT based system for precision farming. Comput Electron Agric 187:106291
7. Rodríguez JP, Montoya-Munoz AI, Rodriguez-Pabon C, Hoyos J, Corrales JC (2021) IoT-Agro: a smart farming system to Colombian coffee farms. Comput Electron Agric 190:106442
8. Jaybhaye N, Tatiya P, Joshi A, Kothari S, Tapkir J (2022) Farming Guru: machine learning based innovation for smart farming. In: 2022 4th international conference on smart systems and inventive technology (ICSSIT), pp 848–851. https://doi.org/10.1109/ICSSIT53264.2022.9716287
9. "Geolocator," Dart packages. https://pub.dev/packages/geolocator. Accessed 11 Jul 2022
10. "What is the k-nearest neighbors algorithm?" Ibm.com. https://www.ibm.com/in-en/topics/knn. Accessed 11 Jul 2022
11. IBM Cloud Education (2020) What is random forest? Ibm.com. https://www.ibm.com/cloud/learn/random-forest. Accessed 11 Jul 2022
12. Rajeshwari T, Harsha Vardhini PA, Manoj Kumar Reddy K, Priya KK, Sreeja K (2021) Smart agriculture implementation using IoT and leaf disease detection using logistic regression. In: 2021 4th international conference on recent developments in control, automation & power engineering (RDCAPE), pp 619–623. https://doi.org/10.1109/RDCAPE52977.2021.9633608
13. Reshma R, Sathiyavathi V, Sindhu T, Selvakumar K, SaiRamesh L (2020) IoT based classification techniques for soil content analysis and crop yield prediction. In: 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), pp 156–160. https://doi.org/10.1109/I-SMAC49090.2020.9243600
Pied Piper: Meta Search for Music
Pulak Malhotra and Ashwin Rao
Abstract Internet search engines have become an integral part of life, but for pop music, people still rely on textual search engines like Google. We propose Pied Piper, a meta search engine for music. It can search for music lyrics, song metadata and song audio or a combination of any of these as the input query and efficiently return the relevant results. Keywords Metadata search · Audio search · Audio fingerprint · Information retrieval · Software architecture
1 Introduction We propose Pied Piper, a new search engine solely for pop music. Pied Piper works on 3 axes: lyrics, metadata, and audio. Traditional text-based search engines can only identify songs from the title or from lyrics. This is not ideal. For example, it could be hard to specify if we want a given word to be in the title, genre, artist name, or lyrics. Another case could be that if we have an audio snippet of an unknown song which we know was released before 2000, we cannot use both the audio snippet and the fact that it was released before 2000 in the same search. Pied Piper solves this by using the concept of meta search along the 3 aforementioned axes to deliver the best search results in an efficient manner. We detail Pied Piper’s methodology to search on lyrics, audio and metadata, and how to combine rankings on each of these axes to obtain a final ranking of songs based on the user’s input query. We also address error handling in audio matching and explain Pied Piper’s software architecture and efficient parallel pipeline for music search. Finally, we suggest ways to evaluate performance and propose additional features that can be integrated into Pied Piper’s existing design. P. Malhotra (B) · A. Rao International Institute of Information Technology, Hyderabad, India e-mail: [email protected] A. Rao e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_54
2 Related Work
A survey on audio fingerprinting [3] characterized all techniques of audio fingerprinting using a general framework. Improvements to resilience made systems such as Shazam1 [9], Philips [6], and Microsoft [2] popular, especially for the music information retrieval task. Google has released popular audio recognition features in recent years. Now Playing [5] is a Shazam-like functionality on Pixel phones that recognizes songs from audio using an on-phone deep neural network. Sound Search [5] expanded this functionality to support over 100 million songs. The new hum-to-search feature on Google Assistant allows a user to hum the tune of a song and matches this against Google's database of songs using audio fingerprints. Báez-Suárez [1] proposed SAMAF, which uses a sequence-to-sequence autoencoder model consisting of long short-term memory (LSTM) [7] layers to generate audio fingerprints. SAMAF was shown in the same paper to outperform other audio fingerprinting techniques.
3 Problem Formulation
3.1 Lyrics and Metadata Search
Lyrics search refers to the retrieval of all songs which contain a given string in their lyrics, ranked by relevance. Similarly, we define metadata search as the retrieval of songs which have specific fields in their metadata matching fields in the given query. Lyrics and metadata search about music can be tackled as a text-based domain-specific search problem. Metadata parameters can include information like music name, album name, artist name, and release year.
3.2 Audio Search
Audio search refers to the retrieval of all songs which contain exact or perceptually similar audio to the given audio query, ranked by relevance. Audio search is an entirely different problem from text search because audio is high dimensional. Perceptually similar audio samples can have substantial variance in their data streams. As a result, comparing one audio sample against all other audio samples is computationally very expensive and neither efficient nor effective. An audio fingerprint is a content-based compact signature that summarizes essential information about the audio sample. Instead of using the entire audio file for comparison, we use the audio fingerprint.

1 https://www.shazam.com/.
Traditionally, a meta search engine is defined as a search engine which collects results from other search engines (such as Google and Bing) and then produces a new ranking. We use this term in a slightly different way, as the independent search engines are internal parts of Pied Piper (we discuss this in detail in Sect. 4). We run the searches for the different fields in the input independently. This allows us to run them concurrently and deliver results as quickly as possible. An alternative we considered was using a transformer [8] and embedding a multi-dimensional query (for example, a query with song snippet, song name, and artist name) in a contextualized feature space, but this would be too slow, as audio information is itself high dimensional and difficult to encode.
4 System Architecture
Pied Piper has a frontend which accepts the user query. The query can be divided along the 3 axes mentioned earlier. Each sub-query is parsed and processed to generate a ranking and score for each document. All this information is passed into a merging step to generate the combined ranking, which, after filtering, gives us the final ranking. The system architecture is shown in Fig. 1.
4.1 Frontend
The frontend of Pied Piper has an easy-to-use user interface. There is one input bar for lyrics, one button to record audio or upload an audio file, and a few more inputs for the different kinds of metadata the user knows. It is not compulsory for the user to fill in all the fields; the more fields the user fills, the more information is available to Pied Piper, leading to better results (Fig. 2).
Fig. 1 System architecture
Fig. 2 A sample UI for the frontend of Pied Piper. User can enter lyrics, record audio, search by artist name, and can get songs released before or after a given date
4.2 Backend
4.2.1 Database
The backend of the system is assumed to have a database of all songs along with all metadata of every song. This information can be easily scraped from a website like Youtube2 or Soundcloud.3 MusicBrainz4 is an open source music encyclopedia which can be used to collect metadata about songs.
4.2.2 Lyrics Search
For lyrics search, we use a text search engine with an inverted index (posting list) to allow for fast searches. The index contains the tokens and the N-grams of the lyrics as keys. Stop words in lyrics are removed since they play no role in song identification.

2 https://youtube.com.
3 https://soundcloud.com/.
4 https://musicbrainz.org/.
Fig. 3 Sample inverted index for lyrics
Each posting list will contain the song ID of the songs that contain those tokens. The value of n can be tuned according to the space requirements. If space is a constraint, the value of n can be reduced, at the expense of a slight decrease in accuracy. This is a trade-off between memory and processing (Fig. 3). A term frequency inverse document frequency (tf-idf) ranking system would work well in the case of lyrics. As the catchy parts of the songs are often repeated many times in a given song (like the chorus), the term frequency would be higher and would boost the ranking of relevant songs when they are searched for using these repeated portions of lyrics.
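The inverted index and tf-idf ranking described above can be sketched as follows. The stop-word list and songs are made-up examples, and for brevity the sketch indexes single tokens rather than N-grams:

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "of", "and"}  # illustrative stop list

def tokenize(text):
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_index(songs):
    """Map each token to its posting list: the set of song IDs
    whose lyrics contain that token."""
    index = defaultdict(set)
    for song_id, lyrics in songs.items():
        for token in tokenize(lyrics):
            index[token].add(song_id)
    return index

def tf_idf_rank(query, songs, index):
    """Score candidate songs by summed tf-idf over the query tokens;
    repeated chorus words get a high tf and boost relevant songs."""
    n_docs = len(songs)
    scores = Counter()
    for token in tokenize(query):
        postings = index.get(token, set())
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for song_id in postings:
            tf = tokenize(songs[song_id]).count(token)
            scores[song_id] += tf * idf
    return [sid for sid, _ in scores.most_common()]
```

A production index would precompute term frequencies inside the posting lists instead of re-tokenizing lyrics at query time; this sketch keeps the data flow visible.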
4.2.3 Metadata Search
Metadata search for textual features like track title is implemented in a similar fashion to lyrics search, but with minor modifications. For example, in case of title search, stop words are not removed since titles are usually short and stop words could have important meaning. For features like the release date, metadata search can act as a filter on the final results. If the input for date is before 2008, all the songs released after 2008 in the search results can simply be removed. For each type of metadata, a different ranking and document score is returned. Hence, search on each metadata type is treated as an independent process. The rankings for each metadata type will be merged in the merging step.
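The release-date filter described above can be sketched as a post-processing step over the ranked results; the song IDs and years below are made-up examples:

```python
def apply_date_filter(results, metadata, before=None, after=None):
    """Keep only songs inside the requested release window, preserving
    the ranking order. Following the paper's example, before=2008
    drops every song released after 2008."""
    kept = []
    for song_id in results:
        year = metadata[song_id]["year"]
        if before is not None and year > before:
            continue
        if after is not None and year < after:
            continue
        kept.append(song_id)
    return kept
```

Because the filter only removes entries, it composes cleanly with the merging step: rank first, filter last.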
Fig. 4 Sample inverted index for audio fingerprints
4.2.4 Audio Search
We propose the use of SAMAF [1] to generate a set of 32-bit sub-fingerprints for a song (after breaking the song into many small intervals). We store all these sub-fingerprints in the database and need an effective way to index them. Cha [4] proposed an effective way to index audio fingerprints; we adopt a similar search methodology and use an inverted index. Each sub-fingerprint is mapped to a list of songs that contain the given sub-fingerprint. A sample inverted list is shown in Fig. 4. For each sub-fingerprint, we also generate \sum_{i=1}^{n} \binom{32}{i} sub-fingerprints to account for up to n toggled bits. Generating these fingerprints with toggled bits avoids situations where matching fails due to erroneous bits in the query fingerprint. Figure 5 shows how the inverted index looks after adding the generated sub-fingerprints. Adding the toggled bits increases the index size; however, we can achieve O(1) lookup time with B+Trees. The search with this index is done in a two-fold manner. First, there is a coarse search in which all the songs associated with the sub-fingerprints of the query are retrieved. Then, for the songs which have a large number of matching sub-fingerprints, a fine search is performed. In the fine search, the full fingerprint of the song is retrieved (a heavy operation) and the similarity between the retrieved and query fingerprints is calculated using the Hamming distance between them. If the similarity is above a particular threshold, the song is reported as a result. The algorithm to search for audio fingerprints is shown in Fig. 6.
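The two building blocks of this scheme, toggled-bit variant generation for the index and Hamming similarity for the fine search, can be sketched directly on 32-bit integers:

```python
from itertools import combinations

def toggled_fingerprints(fp, max_bits, width=32):
    """All variants of a `width`-bit sub-fingerprint with up to
    `max_bits` bits flipped: sum over i of C(width, i) variants."""
    variants = set()
    for i in range(1, max_bits + 1):
        for positions in combinations(range(width), i):
            v = fp
            for p in positions:
                v ^= 1 << p
            variants.add(v)
    return variants

def hamming_similarity(fp_a, fp_b, width=32):
    """Fraction of matching bits, used as the fine-search score."""
    differing = bin(fp_a ^ fp_b).count("1")
    return 1 - differing / width
```

For n = 1 each 32-bit sub-fingerprint gains 32 extra index entries, and for n = 2 it gains 32 + 496 = 528, which illustrates the index-size cost of tolerating erroneous bits.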
Fig. 5 The fingerprints generated with up to n toggled bits point to the same posting list as that for fingerprint A
Fig. 6 Search algorithm for audio fingerprints [4]
4.3 Merging and Creating the Final Ranking
Once all 3 types of searches are complete, each of them returns a set of results along with a normalized score for each song in the result. We can run the aforementioned searches in parallel as they are independent of each other, which boosts the speed of the search engine significantly. If the user only enters one kind of input query, then the result of that particular search can be shown as the final result. If the user has specified more than one type of field (such as both lyrics and audio), then we need to merge the results. We propose the use of a weighted sum for this purpose. Suppose the score for a song r in the ith field is given by r_i; then
\mathrm{FinalScore}(r) = \sum_{i=1}^{n} c_i \, r_i \qquad (1)

where \sum_{i=1}^{n} c_i = 1.
The value of n depends on the number of features in the user's input. For example, if the user gives an audio sample, artist name, track name, and lyrics, then the value of n will be 4. The values of c_i act as hyperparameters which we can tune. In general, a feature like track name should have a higher weight than lyrics, since we are less likely to find false positive matches in a song's title than in its lyrics. The last step of creating the final ranking is to remove those results which do not fit a metadata filter (as discussed in Sect. 4.2.3). After filtering, we sort the results by their final scores in descending order and return the results to the user on the frontend.
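The weighted-sum merge of Eq. (1) can be sketched as follows; the field names, song IDs, scores, and weights are hypothetical tuning values, not the paper's:

```python
def merge_rankings(field_scores, weights):
    """Weighted sum of per-field normalized scores (Eq. 1); the
    weights over the fields the user filled must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    final = {}
    for field, weight in weights.items():
        for song_id, score in field_scores.get(field, {}).items():
            final[song_id] = final.get(song_id, 0.0) + weight * score
    return sorted(final, key=final.get, reverse=True)

# Hypothetical normalized scores from two independent field searches
field_scores = {
    "lyrics": {"songA": 0.9, "songB": 0.4},
    "title":  {"songB": 1.0},
}
ranking = merge_rankings(field_scores, {"lyrics": 0.4, "title": 0.6})
```

Here the title match lifts songB (0.4·0.4 + 0.6·1.0 = 0.76) above songA (0.4·0.9 = 0.36), reflecting the higher weight given to the title field.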
5 Evaluation
Each of the field searches can be individually evaluated by providing inputs only for that field. This can be used to tune the value of N in the N-gram index that we build for textual searches. For audio search, the thresholds in both fine and coarse search can be treated as hyperparameters and tuned to get better results. We can add white noise in various amounts to song recordings to test what level of noise starts creating erroneous results. Further, since the SAMAF framework uses deep learning, we could try to generate adversarial examples using the weights of the model.
6 Conclusion and Future Work In this paper, we propose a meta search engine for music which can effectively handle multi-dimensional input, be it lyrics, track metadata or track audio. There is always scope for improvement. The robustness of SAMAF to certain audio distortions such as pitch shifting and speed change is still weak, and other audio fingerprinting techniques which are more robust to these factors can be utilized. Oftentimes, people remember sentiments of the song. For example, whether the song was energetic or sad. It could be interesting to classify songs into different sentiments and themes using machine learning techniques and provide an option for such sentiment matching in the search engine as well. Another aspect to work on could be the personalization of the search engine based on users’ song listening history.
References
1. Báez-Suárez A, Shah N, Nolazco-Flores JA, Huang SHS, Gnawali O, Shi W (2020) SAMAF: sequence-to-sequence autoencoder model for audio fingerprinting. ACM Trans Multimedia Comput Commun Appl 16(2). https://doi.org/10.1145/3380828
2. Burges C, Platt J, Jana S (2003) Distortion discriminant analysis for audio fingerprinting. IEEE Trans Speech Audio Process 11:165–174. https://doi.org/10.1109/TSA.2003.811538
3. Cano P, Batlle E (2005) A review of audio fingerprinting. J VLSI Sig Process 41:271–284. https://doi.org/10.1007/s11265-005-4151-3
4. Cha GH (2012) An effective and efficient indexing scheme for audio fingerprinting. Int J Inform Technol Commun Convergence 2(3):268–280. https://doi.org/10.1504/IJITCC.2012.050414
5. Frank C (2020) The machine learning behind hum to search. https://ai.googleblog.com/2020/11/the-machine-learning-behind-hum-to.html
6. Haitsma J, Kalker T (2002) A highly robust audio fingerprinting system. In: ISMIR
7. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
8. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems. NIPS'17, Curran Associates Inc., Red Hook, NY, USA, pp 6000–6010
9. Wang A (2003) An industrial strength audio search algorithm
Securing a SaaS Application on AWS Cloud Siddhartha Sourav Panda, Nishanth Kumar Pathi, and Shinu Abhi
Abstract In the last few years, we have seen tremendous growth in cloud adoption and migration, especially among new emerging companies and startups in multiple domains embracing cloud technologies to avoid the on-premises costs of maintaining servers. As organizations grow and invest in digital transformation every year, the cloud is becoming an ever more crucial part of the organization and is getting integrated with multiple core services. It is highly recommended that organizations review their cloud security components and make them secure and robust to avoid a cyber-attack. Year after year, the IT world has witnessed a series of news headlines and data leaks caused by cloud architecture misconfigurations. In this article, the authors demonstrate a few secured architectures on Amazon Web Services (AWS), one of the top cloud service providers in the world. This paper's target is to educate and set up a guideline for a secured architecture baseline on AWS for new or existing customers, to encourage them to review their architecture and deploy the security components on AWS. This paper provides a brief overview of the various architectures proposed and implemented that can act as a solution for handling the various issues related to cloud computing, especially cloud security. Keywords Cloud computing · Cloud security · Misconfiguration · Amazon web services (AWS) · Secured architecture
S. S. Panda (B) · N. K. Pathi · S. Abhi RACE, REVA University, Bangalore, India e-mail: [email protected] N. K. Pathi e-mail: [email protected] S. Abhi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_55
1 Introduction
1.1 Cloud Computing
Cloud computing is the on-demand delivery of computing power, database, storage, applications, and other IT resources via the Internet with pay-as-you-go subscription pricing. These resources run on server computers in large data centers owned by cloud service providers in different locations around the world, known as regions. In the case of AWS, these regions contain multiple availability zones connected through AWS's high-speed intranet for higher reliability and availability. When a user uses a cloud service provider like AWS, that service provider owns the computers and servers the user is using. These resources can be used to build solutions that help meet business goals and satisfy technology requirements. There are 3 major types of cloud computing service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS offers essential storage, network, and compute resources on demand on a pay-as-you-go basis. PaaS is a complete development and deployment environment in the cloud. Services in the SaaS category provide a user with a completed product or application for direct use that the service provider runs and manages; in most cases, software as a service refers to end-user applications. With a SaaS offering, users do not have to consider how the service and servers are maintained in data centers or how the underlying infrastructure is managed by the cloud service provider [1]. A cloud-based application is fully deployed in the cloud, and all parts of the application run in the cloud. However, multiple problems persist while using cloud computing services during migration.
Many organizations are unsure about the trustworthiness of CSPs, since all company data remains online in the cloud and can be accessed from anywhere, potentially leading to loss of sensitive and consumer data [1]. The cloud integrates multiple settings, policies, assets, and interconnected services and resources, making it a highly sophisticated environment that is complicated to fully understand and properly set up. This is especially true for organizations that were pushed to migrate quickly to the cloud to remain competitive, and because of the recent pandemic situation with most staff working remotely. With multiple advanced technologies being introduced in the cloud, architectures are getting more complex, which affects multiple security aspects. Unfortunately, when organizations start using a new technology too quickly without fully understanding its features, security misconfigurations can occur, opening vulnerable backdoors for attackers. Research from a Vectra analysis indicated that every company surveyed recently had at least one cloud security issue in its cloud environment. Client misconfiguration of cloud services is the fundamental cause of over 99% of recently reported cloud breaches [2]. About 90% of Amazon S3 buckets are vulnerable to ransomware attacks, an analysis by Ermetic has revealed [3]. Similar are
the cases with Elastic Compute Cloud instances, which are exposed to the Internet on admin ports with data stored unencrypted [4]. Databases deployed on cloud-offered relational database services are often placed in public subnets and exposed to the Internet for ease of admin operations, with no data encryption [5]. Cloud computing also has other issues like privacy concerns, compliance, security concerns, sustainability, higher costs, and lack of reliability in providing services. Data at rest and data in transit are two forms of data that can expose a security risk in the cloud. The term "data at rest" refers to data that is stored in the cloud or that may be accessed via the Internet; this includes both backup and production data. Data stored in AWS S3, EBS, and EFS is considered data at rest. The term "data in transit" refers to data that is being moved in and out of the cloud. This data can be in the form of a cloud-based file or database that can be requested for use at a different location. When data is uploaded to the cloud, it is referred to as data in transit. Data in transit is considered riskier than data at rest, since it moves across the wide Internet and around enterprise networks and is more likely to be exposed to third parties and vendors if not handled carefully [6]. In this paper, we deployed a couple of common web application architectures on AWS and integrated multiple advanced AWS security services to enhance their security posture. The implementations were tested with common cyber-attack mechanisms under lab conditions, and the behaviour of the AWS services in response was recorded. The services were able to either detect or prevent the attacks, hence providing an optimized secure solution on AWS to leverage.
2 AWS Cloud Architecture

2.1 Working on AWS

Being a major cloud service provider, AWS provides a highly secured infrastructure and offers a variety of security services. These AWS-offered services are kept up to date with current security trends, and multiple user-recommended functionalities are introduced in subsequent versions. Moreover, users do not need to maintain or manage data centers, as AWS manages them across multiple regions and availability zones. AWS also provides a range of services such as computing, storage, networking, database, data processing, and analytics to host an end-to-end application based on user requirements. Users can easily trade capital expenses for variable expenses: services can be subscription-based or pay-as-you-go on-demand. Due to these benefits, AWS is trusted by startups, entrepreneurs, and small and medium companies for their architecture and application hosting. Hosting their infrastructure on the cloud saves them large monetary expenses, because with cloud computing they can rent only the necessary computing power, storage space, and communication capacity from a large cloud computing provider that has all these assets connected to the Internet.

S. S. Panda et al.

The AWS security services can be integrated with their architecture at pay-per-use or reserved cost. Auto-scaling also removes the need to guess capacity when the load on servers spikes over short periods. Additionally, organizations can go global in minutes without spending money on running and maintaining data centers. The AWS Cloud Adoption Framework (CAF) provides guidance and best practices to help organizations identify gaps and processes in their current architecture. It also helps organizations build a comprehensive and secure approach to cloud computing, both across the organization and throughout the IT lifecycle, to accelerate successful cloud adoption [1].
2.2 Security on AWS

AWS is a secure cloud platform that offers a broad set of security-oriented products. These services are AWS-managed, which means AWS takes responsibility for updating and patching them to current standards. AWS integrates security into compute, storage, network, database, and other IT resources, and recommends exploring these options to improve the security posture of an architecture. One main concern with using the cloud is data privacy and security, especially for users with sensitive financial or customer data whose compromise would be devastating and destructive to the client. Cloud computing also attracts the attention of attackers and raises many security concerns, as the complex architectures deployed on it can open potential backdoors and vulnerabilities. AWS cloud security is an emerging sub-domain of network security, computer security, and, more broadly, information security. It refers to a broad set of technologies, policies, and controls deployed to protect data, applications, and the associated infrastructure of cloud computing. Confidentiality, integrity, and availability are three key concepts in information security, and they are especially important in AWS because applications are deployed in a shared network environment. Confidentiality assures customers that information stored offsite on AWS storage (EBS, EFS, S3, RDS) can only be accessed by authorized persons. Integrity ensures that data transferred to the AWS cloud environment has not been corrupted or tampered with during transit. Finally, availability ensures that data and services are available whenever they are needed. Security and compliance are a shared responsibility between AWS and the customer.
This shared model can help ease the customer’s operational burden as AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which the services and servers operate. The customer assumes responsibility and management of the guest operating system (including updates and security patches), other associated application software as well as the configuration of the AWS-provided security group firewall [7].
In this paper, the authors discuss the security controls deployed on two architectures when migrated to or created in an AWS cloud environment. Various AWS-offered services are integrated to mitigate security concerns on the cloud. Security measures and mechanisms can be developed and customized according to a client's organizational needs, and they must be upgraded frequently to prevent cyber-attacks. The architectures proposed here integrate AWS-offered security services that improve the security posture of SaaS applications and protect them from various cyber-attacks.
2.3 Literature Survey on AWS Architectures Configuration Issues and Development

The literature review describes how an unsecured cloud configuration can lead to cyber-attacks, causing confidential data leaks and reputation damage. Several authors encourage securing each AWS component so that the complete architecture retains an overall secure posture.

• AWS architecture is designed to be safe and secure, but it is up to users to secure their respective cloud environments and applications. The top data leaks of 2021 show how a simple AWS S3 misconfiguration can lead to leaks of sensitive and critical data, causing huge data loss and reputation damage. Fixing misconfigurations takes time, given the complexity of the architecture, but it should be of the highest priority and organizations must act fast [2]. Multiple AWS services appear on the top AWS misconfiguration lists in the Trend Micro 2021 report: for example, non-encrypted EBS data storage and publicly exposed S3 bucket storage made the top three misconfigured rules [8].
• Data theft is not the only potential threat to S3 buckets. Attackers can also mount a ransomware attack, encrypting valuable files and demanding payment illegally. Many S3 buckets are publicly exposed, with no object scanning, antivirus, or sensitive-data scan mechanisms deployed when new, possibly malicious files are uploaded [9].
• Databases deployed on AWS RDS can also be vulnerable to attacks when publicly exposed. DB admins leave port 3306 open on RDS DB endpoints so they can connect over the Internet for admin tasks, but that also opens the attack surface for attackers to take advantage of. The DB port should never be exposed on the Internet, and all data in the DB should be stored encrypted with keys [5].
• AWS recommends configuring proper IAM policies, bucket policies, MFA, delete protection, bucket versioning, data encryption, logging and monitoring, and regular backups for sensitive and critical data. It is therefore highly recommended to audit and monitor bucket configurations frequently to ensure they remain secure [10].
• EC2 instances that should only be privately available, especially web servers, software deployment instances, and installed databases, should not be exposed to the public Internet, and admin maintenance ports should never be Internet-facing. Security groups and access control lists should be reviewed thoroughly and frequently. Data storage disks should remain encrypted with keys, and applications should be scanned for any pending patches [4].
• AWS provides strong security features and guidelines on how to utilize them, along with best practices for configuration on the AWS cloud. Security is a very important pillar of the AWS Well-Architected Framework, which clearly defines the security foundations and design principles for adopting the cloud [11].
• The Cloud Security Alliance has captured the top 12 cloud security threats: data leakage, hacked interfaces and APIs, permanent data loss, lack of awareness, abuse of cloud services, system vulnerabilities, account theft, malicious insiders, targeted cyber-attacks, compromised accounts and authentication bypass, DDoS attacks, and shared-technology risks [12].
• Misconfigured Amazon S3 buckets can lead to sensitive data leaks and ransomware attacks. When an S3 bucket and its objects are accessible from the public Internet, an attacker can gain control of the bucket and upload malicious files. DDoS attacks can also be performed against bucket endpoints exposed to the Internet [10].
• The cloud comprises multiple complicated settings, policies, assets, and interconnected services and resources, making it a sophisticated environment to fully understand and properly set up. Organizations migrating to the cloud without fully understanding each integrated component may misconfigure some of them [8].
• This research paper explains how big data processing is done using advanced AWS services and how data can be kept secure on those services. AWS services can keep huge volumes of data encrypted and process them efficiently [13].
• Cloud service providers spend considerable effort and cost to enhance the security of the data flowing through them. This paper analyses storage encryption and data integrity concepts along with the performance of cloud services [14].
• The cloud provides multiple options: keys can be generated, stored, and used for encryption with native solutions, or users can encrypt data with any advanced algorithm before sending it to the cloud. This paper explains how data can be encrypted at the user's end before being stored in the cloud [15].
• Along with a secured architecture and data security, secure authentication is required for access to cloud resources. In the architectures, the authors propose using an identity provider service such as Cognito, which handles user registration and authentication. This paper briefly explains advanced Kerberos authentication with LDAP role-based access control, which can be integrated with cloud computing applications [16].
• This review paper describes various benefits of migrating to the cloud, as well as a few security issues with it. If the advantages of the cloud suit an organization's requirements, it can plan its migration accordingly [17].
• In the proposed architectures, the authors have integrated the AWS Guard Duty threat intelligence service for alerting on and capturing malicious attempts. Organizations may also integrate an additional IDS/IPS with the secured architectures if desired [18].
• This paper describes a file transfer application and architecture that can be deployed on the cloud. For disaster recovery, or in case of on-premises device damage or loss, keeping a data backup on the cloud is strongly recommended, as cloud storage services are reliable, secured, and encrypted [19].
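The RDS exposure pattern flagged in the survey (database ports such as 3306 left open to the whole Internet) can be checked mechanically. The sketch below is a minimal, illustrative audit over security-group records shaped like simplified `describe_security_groups` output; the group IDs and rules are made up for the example:

```python
# Database ports we never want open to 0.0.0.0/0 (MySQL, PostgreSQL,
# SQL Server, MongoDB). Extend the set for other engines as needed.
DB_PORTS = {3306, 5432, 1433, 27017}

def find_exposed_db_rules(security_groups):
    """Flag ingress rules that open a database port to the whole Internet."""
    findings = []
    for sg in security_groups:
        for rule in sg.get("IpPermissions", []):
            port = rule.get("FromPort")
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
            )
            if port in DB_PORTS and open_to_world:
                findings.append((sg["GroupId"], port))
    return findings

# Hypothetical records: a web tier (HTTPS open is fine) and a DB tier
# that has MySQL mistakenly exposed to the Internet.
groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"FromPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"FromPort": 3306, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
]
print(find_exposed_db_rules(groups))  # [('sg-db', 3306)]
```

In practice a managed rule set (e.g. AWS Config) covers this class of check, but the logic is essentially the filter above.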
3 Architectures Overview

3.1 Architectures Deployment Methods Proposals

This paper proposes two different architectures for hosting a web application or web server on the AWS cloud. The architectures explain how an organization can migrate its web application to the cloud and how to integrate the services offered by AWS to secure it. The authors explain the architecture diagrams and the components involved; organizations can refer to them when hosting their own applications on AWS.

The first architecture, in Fig. 1, describes a web application deployed on an EC2 instance that is publicly accessible through an Application Load Balancer and the CloudFront CDN. The instances are spread across availability zones and use auto-scaling. The EC2 web server interacts with the RDS database over a private connection for the web application's user data and server configuration. Users connect to the CloudFront URL over a TLS connection, with DNS hosted by Route 53 and the certificate approved in ACM. The CloudFront distribution is protected by WAF against web-targeted attacks. Other AWS security components, such as Guard Duty and Inspector, have been integrated with the instance for added security. All data stored on EBS, EFS, S3, and RDS is encrypted with keys stored in KMS. Admins manage the EC2 instances by connecting through bastion hosts. Based on organizational requirements, a site-to-site VPN with an IPsec tunnel can also be configured for VPN connection from on-premises. The following AWS services are integrated into the web application:

• EC2 instances: The server is deployed on them.
• RDS: The database service that stores application and user data.
• S3: Bucket service to store files, objects, etc.
• Auto-scaling: Enabled on the instances to make sure the application does not slow down under load.
• ALB: Application Load Balancer is a fully managed layer 7 load balancing service that balances incoming traffic across the defined targets.
• CloudFront: It is a CDN service that speeds up the distribution of web content across multiple regions.
Fig. 1 Web application architecture on EC2 web server
• CloudWatch: It is a monitoring and management service that provides data and logs insights for AWS. • CloudTrail: It enables auditing, security monitoring, and operational troubleshooting by tracking user activity and API usage. • Guard Duty: It is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect AWS accounts. • Inspector: It is an automated security assessment service that helps identify the vulnerabilities and common misconfigurations in instances. • Security Hub: It is a centralized security management service that performs security best practice checks and aggregates logs from multiple services. • Detective: It automatically collects log data from AWS resources and uses machine learning and statistical analysis to build a linked set of data. • WAF: It is a web application firewall that helps protect web applications against common web exploits and thus enhances security. • Shield: It is a DDoS protection service that safeguards applications running on AWS. Shield Advanced has advanced features to control DDoS. • Route 53: It is a highly available and scalable cloud domain name system service in AWS. • ACM: It handles the process of creating, storing, and renewing public and private SSL/TLS certificates and keys that protect AWS resources. • KMS: It is a secure service that uses hardware security modules to protect keys. It can be used to encrypt and decrypt data. • Cognito: It helps in easily adding user sign-up and authentication to web applications.
Fig. 2 Web application deployed on S3 bucket
• VPC Endpoint: It enables private connections from a VPC to eligible AWS services without routing over the Internet.
• Bastion: It is a server through which admins access a private network from an external network.
• Backup: It is a fully managed backup service that helps to centralize and automate the backup of data across AWS services.

The second architecture, shown in Fig. 2, describes a web application hosted on an S3 bucket and exposed to the public Internet with the help of CloudFront. The connection from users to the CloudFront DNS is secured with TLS; the domain is hosted on AWS Route 53, with the certificate signed by the Amazon CA via ACM. The data on the S3 bucket is stored encrypted with keys kept in KMS. The S3 bucket is integrated with an antivirus solution, built with Lambda and API Gateway, that scans every uploaded object and uses SNS to notify admins if an infected file is found. Other AWS security components have been closely integrated with the architecture to strengthen its security posture. The following AWS services are integrated into the web application:

• S3: Bucket service, which stores the web application code objects.
• CloudFront: It is a CDN service that speeds up the distribution of web content.
• CloudTrail: It enables auditing, security monitoring, and operational troubleshooting by tracking user activity and API usage.
• Macie: It is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect sensitive data.
• API Gateway: It is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs.
• Lambda: It is a serverless compute service that runs code in response to events and automatically manages the underlying compute resources.
• Bucket Antivirus: It is a service integrated with S3 buckets that scans every uploaded object for malware as soon as the upload event is generated.
• SNS: SNS service is enabled to send notifications via email as soon as an infected file is found on the S3 bucket. • Route 53: It is a highly available and scalable cloud domain name system service in AWS.
• ACM: It handles the process of creating, storing, and renewing public and private SSL/TLS certificates and keys that protect AWS resources.
• WAF: It is a web application firewall that helps protect web applications against common web exploits and thus enhances security.
• Cognito: It helps in easily adding user sign-up and authentication to mobile or web applications.
• Backup: It is a fully managed backup service that helps to centralize and automate the backup of data across AWS services.
• Glacier: It is a low-cost cloud storage service for data archiving and backup of cloud data.
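The object-scan flow of the second architecture (an event-driven function scans each newly uploaded S3 object and notifies admins via SNS on a hit) can be sketched as below. The toy signature, the stand-in fetch/notify callables, and the bucket/key names are simplified placeholders; a real deployment would wrap an AV engine and an `sns.publish` call:

```python
INFECTED_SIGNATURE = b"EICAR"  # placeholder detection rule for this sketch

def scan_object(data: bytes) -> bool:
    """Return True if the object looks infected (toy signature match)."""
    return INFECTED_SIGNATURE in data

def handle_s3_upload(event, fetch_object, notify):
    """Process an S3 PutObject event; notify admins on infected uploads.

    `event` follows the standard S3 notification shape
    (Records[].s3.bucket.name / Records[].s3.object.key)."""
    alerts = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if scan_object(fetch_object(bucket, key)):
            message = f"Infected object detected: s3://{bucket}/{key}"
            notify(message)  # in production: sns.publish(TopicArn=..., Message=...)
            alerts.append(message)
    return alerts

# Simulated invocation with in-memory stand-ins for S3 and SNS.
store = {("web-assets", "upload.bin"): b"xxEICARxx"}
sent = []
event = {"Records": [{"s3": {"bucket": {"name": "web-assets"},
                             "object": {"key": "upload.bin"}}}]}
alerts = handle_s3_upload(event, lambda b, k: store[(b, k)], sent.append)
```

Keeping the scanner behind plain callables makes the decision logic testable without touching AWS at all.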
4 Results and Conclusion

Architectures on AWS face several risks, such as data leakage, component misconfiguration, account hijacking, and insecure interfaces and APIs, but by using AWS effectively and deploying secure configurations, users can mitigate many of these problems. Secure integration of AWS services can help in securing infrastructure, logging, auditing, service monitoring, etc. Although there is no single solution for all issues, from a user perspective one needs to know the commonly occurring misconfigurations and mitigate them before malicious attackers exploit them and cause more significant harm. A lack of awareness of cloud technology and cloud-provided services can cause these security misconfigurations, and with new services being developed and released frequently, it is recommended to understand them before use. In addition, AWS offers a variety of other services so that users do not need to manage processing, data storage, security, and integrity themselves; AWS strongly recommends utilizing them to safeguard resources from ever-improving cyber-attacks.

Cloud security should be the highest priority for organizations and cloud service providers. It includes the ability to protect data, systems, and assets while improving user security. Cloud service providers build security into the core of their cloud infrastructure and offer multiple advanced services to help organizations meet their unique security requirements in the cloud. The architectures demonstrated in this paper have been tested under lab conditions and simulated scenarios to mitigate multiple cloud attack threats and protect the architecture from various cyber-attacks. All the AWS-provided security components have been leveraged to achieve a strong security posture, which can be referred to when protecting new or existing infrastructures.

The proposed logging and monitoring mechanisms can capture abnormal behavior, which helps in detecting incidents and attacks before extended damage occurs. The architectures can serve as a reference for multiple applications that wish to leverage these secured AWS services. Table 1 describes the attack simulations performed under lab conditions on the secured architectures and the AWS services that were able to detect or prevent them.
Table 1 Lab simulation on AWS-secured architectures

Simulation scenarios | AWS security service detected/prevented
Port scan | Detective, Guard Duty
Root privilege escalation | Detective, Guard Duty
Malware file upload | Bucket AV
Instance vulnerabilities | Inspector
Brute force attack | Detective, Guard Duty
SQL, XSS, and web app attacks | WAF
Sensitive data upload | Macie
DDoS | Shield
Malicious IP access | WAF
Unauthorized config change | AWS Config
Monitoring, user actions logging | CloudWatch, CloudTrail
In cybersecurity, the main weakness is the human element. Since cloud technologies are advanced and new products and security features are published at short intervals, it is of utmost priority that users understand the solutions and services before deploying them to production. After all, customer and financial data are very sensitive and critical, and everyone would like to keep their private data safe and secure. Critical data loss can lead to business slowdown and reputation damage, so safeguarding it should be the organization's utmost priority.
References

1. Singh T (2021) The effect of Amazon web services (AWS) on cloud-computing. Int J Eng Res Technol 10(11):480–482. https://www.ijert.org/research/the-effect-of-amazon-web-services-aws-on-cloud-computing-IJERTV10IS110188.pdf
2. Nath O (2021) Top 5 AWS misconfigurations that led to data leaks in 2021. https://www.spiceworks.com/it-security/cyber-risk-management/articles/aws-misconfigurations-2021/
3. Nath O (2021) What makes AWS buckets vulnerable to ransomware and how to mitigate the threat. https://www.spiceworks.com/it-security/cyber-risk-management/news/aws-vulnerable-to-ransomware-attacks/
4. Mahajan A (2021) 4 most common misconfigurations in AWS EC2 instances. https://kloudle.com/blog/4-most-common-misconfigurations-in-aws-ec2-instances
5. Cloudanix (2021) 15 top AWS RDS misconfigurations to avoid in 2022. https://blog.cloudanix.com/top-15-aws-rds-misconfigurations-2022/
6. Lord N (2019) Data protection: data in transit vs. data at rest. https://digitalguardian.com/blog/data-protection-data-in-transit-vs-data-at-rest
7. AWS (2021) Shared responsibility model. https://aws.amazon.com/compliance/shared-responsibility-model/
8. Trend Micro (2021) Top 10 AWS security misconfigurations. https://www.trendmicro.com/en_us/devops/21/k/top-10-aws-security-misconfigurations.html
9. Votiro (2021) How misconfigured Amazon S3 buckets can lead to a ransomware attack. https://securityboulevard.com/2021/04/how-misconfigured-amazon-s3-buckets-can-lead-to-a-ransomware-attack/
10. Gietzen S (2021) S3 ransomware part 2: attack vector. https://rhinosecuritylabs.com/aws/s3-ransomware-part-2-prevention-and-defense/
11. AWS (2020) Security pillar, AWS Well-Architected Framework
12. Malik S (2021) Top 12 cloud security threats according to Cloud Security Alliance. https://bitbytes.io/cloud-security-threats/
13. Anand Mishra GK (2021) Big data analytics options on AWS. Int J Eng Res Technol 10(April):29
14. Patil NN, Mapari RB (2014) A comprehensive survey on data integrity proving schemes in cloud storage. IJARCCE 8163–8166. https://doi.org/10.17148/ijarcce.2014.31019
15. Phapale A (2016) A novel approach for securing cloud data using cryptographic approach, pp 296–299
16. Eltayb NI, Rayis OA (2018) Cloud computing security framework privacy security. Recent Innov Trends Comput. http://www.academia.edu/download/56698026/1519625123_26-02-2018.pdf
17. Ujloomwale MN, Badre MR (2014) Data storage security in cloud. IOSR J Comput Eng 16(6):50–56. https://doi.org/10.9790/0661-16635056
18. Thomas G, Janardhanan P (2012) Intrusion tolerance: enhancement of safety in cloud computing. IJARCCE 1(4):238–242. http://ijarcce.com/upload/june/8-IntrusionToleranceEnhancement.pdf
19. Akash BV, Murugan R (2022) Authenticated transfer of files with storage and backup within a cloud environment. Int J Eng Res Technol 11(02):259–260
Optimized Energy Efficient Task Scheduling in Fog Computing Shilpa Dinesh Vispute and Priyanka Vashisht
Abstract Recently, the continuous growth in the use of the Internet of Things (IoT) has produced a huge amount of data during processing, which increases the load on Cloud Computing networks. Cloud Computing does not support the low latency required by real-time applications. To overcome such drawbacks and obtain low latency and faster response times, IoT-based applications are moving towards Fog Computing. Fog and Cloud Computing together provide the ideal solution for these IoT applications, enabling data to be processed near the end user. The Fog environment also supports lower latency and response times than the Cloud environment. Fog Computing allows application tasks to be scheduled on the various Fog nodes available in the network. Because Fog nodes have more limited resources than Cloud Computing, scheduling tasks among Fog nodes becomes essential. In this paper, a scheduling algorithm, namely Leveraging Energy for Task Scheduling in Fog Computing using optimization (LETSO), has been proposed. To check the efficiency of the proposed algorithm, two vital parameters, i.e. makespan and energy, are considered. The simulated results of LETSO show improvement, minimizing makespan and reducing energy consumption as compared to the Bee Life Algorithm (BLA) and Modified Particle Swarm Optimization (MPSO).

Keywords Task scheduling · Fog computing · Cloud computing · Particle swarm optimization · Internet of Things (IoT)
S. D. Vispute Department of Computer Science and Engineering, The NorthCap University, Gurugram, Haryana, India e-mail: [email protected] P. Vashisht (B) Department of Computer Science and Engineering, Amity University, Gurugram, Haryana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Lecture Notes in Networks and Systems 680, https://doi.org/10.1007/978-981-99-2602-2_56
1 Introduction

IoT development has increased data processing demands, which poses a challenge to Cloud Computing. Even though Cloud Computing has a lot of storage and processing power, it does not suit Internet of Things (IoT) applications, since IoT needs fast response times and low latency [1]. Utilizing both Cloud Computing and Fog Computing helps achieve IoT application requirements. Fog Computing is a distributed network of Fog nodes with limited resources, and effective use of those resources can be achieved by proper task scheduling. IoT applications can be divided into tasks, and tasks are scheduled on suitable Fog nodes [2]. While scheduling, parameters such as makespan and energy consumption must be considered. It is therefore crucial to concentrate on task scheduling algorithms that assess energy usage and makespan [3].

This study evaluates and compares three task scheduling algorithms, i.e. LETSO, the Bee Life Algorithm (BLA), and Modified Particle Swarm Optimization (MPSO), to optimize makespan and energy.

The remainder of the paper is organized as follows: an overview of the literature is given in Sect. 2. In Sect. 3, the proposed Leveraging Energy for Task Scheduling in Fog Computing using optimization (LETSO) is elaborated. In Sect. 4, results are discussed to check the effectiveness of the proposed algorithm. The last section contains the conclusion and some future directions.
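The two evaluation parameters used throughout the paper, makespan and energy, can be illustrated with a small sketch. This assumes standard textbook definitions (sequential execution per node, an active/idle power model), not the paper's exact formulation:

```python
def makespan_and_energy(tasks, nodes, assignment):
    """Compute makespan and total energy for a task-to-node assignment.

    tasks: list of task lengths in million instructions (MI)
    nodes: list of (mips, active_power_w, idle_power_w) per Fog node
    assignment: assignment[i] = index of the node running task i

    Makespan is the largest per-node finish time; energy charges active
    power while a node computes and idle power for the remainder of the
    schedule (a common simplification)."""
    finish = [0.0] * len(nodes)
    for length, node in zip(tasks, assignment):
        mips, _, _ = nodes[node]
        finish[node] += length / mips  # tasks run sequentially on each node
    makespan = max(finish)
    energy = sum(
        busy * p_active + (makespan - busy) * p_idle
        for busy, (_, p_active, p_idle) in zip(finish, nodes)
    )
    return makespan, energy

tasks = [500, 300, 200]                      # task lengths in MI
nodes = [(100, 20.0, 5.0), (50, 10.0, 2.0)]  # (MIPS, W active, W idle)
ms, e = makespan_and_energy(tasks, nodes, [0, 1, 0])
# node 0 is busy 7 s, node 1 is busy 6 s -> makespan 7.0 s, energy 202.0 J
```

An optimizer such as PSO then searches over `assignment` vectors to jointly minimize these two values.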
2 Literature Review

Fog Computing combines with Cloud Computing to offer computing, storage, and networking capabilities at the network edge. Fog Computing has a three-tier architecture, as depicted in Fig. 1. The lower layer consists of various IoT devices, which produce huge volumes of data that need to be processed on a distributed network. The middle layer, also called the Fog network, is formed by Fog nodes with computing resources [4]. The Fog network is a distributed network of Fog nodes that functions much like a Cloud server but with less storage, compute, and networking power. The important role of a Fog node is to provide services at the edge of the network. Though Fog nodes have limited storage and processing ability, they serve real-time application requests with low latency. The topmost layer is the Cloud network, whose long-term data storage and high computational capacity are leveraged [5].

Fig. 1 Fog computing architecture (three tiers: Cloud layer, Fog layer, IoT layer)

Recently, many researchers have proposed novel techniques for task scheduling in Fog Computing; Table 1 summarizes the studied research papers. Basette et al. [6] introduced a two-phase improved elitism genetic algorithm (IEGA) for Fog Computing. The first phase finds suitable resources; the second phase uses certain probabilities to search for a solution. The IEGA algorithm, when evaluated against
similar existing algorithms shows better result. Evaluation parameters makespan, energy, CO2 emission are used. Kalantary et al. [7] introduced hidden Markov Chain Learning method. Markov Chain Learning method predicts future need of resources. Sensors and status matrix useful for knowledge of workload and resource. Resource allocation based on Markov table. Resource efficiency, data locality, task completion, energy consumption of physical machine is considered for evaluation. Experiment is carried out in CloudSim Simulator. Gu et al. [8] proposed resource allocation technique which based on reputation mechanism. The Fog server determines reputation and find out suitable resources for fog layered community. Fog node reputation is calculated by historical behaviour of node. Fog server records the internal reputation and indirect reputation of Fog nodes. The internal reputation of Fog nodes is depend on characteristics like as Fog node processing power, memory capacity, bandwidth capacity etc. This reputation is useful for task allocation. Fadahunsi [9] introduced virtual Fog resolver for Fog Computing resource management. The authors proposed probing-based request distribution algorithm based on responses for requests. If home Fog is overloaded or fail, it determines if backup Fog is preferable. If a Fog network fails due to congestion, the Cloud is used to serve application requests. To solve the allocation optimization model, the authors use the IBM CPLEX optimization solver. The proposed VFR performs better than other various distribution techniques. Bhatia et al. [10] presented QCI neural network model to choose a certain node with the capacity to carry out heterogeneous tasks. The processing power of the Fog node explains the Fog node’s ability. In a Fog environment, the quantomized
738
S. D. Vispute and P. Vashisht
Table 1 Summary of various task scheduling algorithms

| Research work | Research idea | Research findings | Evaluation parameters used | Research gaps |
|---|---|---|---|---|
| [6] | An improved elitism genetic algorithm (IEGA) | Considers near local minima to find a solution | Makespan, energy, carbon dioxide emission rate, flow time | Considers only Fog node energy consumption |
| [7] | Resource discovery using Markov learning model | Probability of resources needed in future is calculated | Resource efficiency, task completion, energy consumption of physical machine | Performance could improve if makespan were considered |
| [8] | Reputation-based resource allocation | Reputation value of Fog node used, which depends on historical behaviour | Success rate, service delay, reputation value | User reputation and energy not considered |
| [9] | Heuristic-based virtual Fog resolver | Probing-based request distribution used | Resource utilization, response time | Home Fog change based on location not considered |
| [10] | QCI neural network model | Quantumized strategy schedules diverse tasks | Temporal delay, prediction efficiency | Energy not considered |
| [11] | Probe-based algorithm | Depends on ad hoc neighbour status | Dropping probability, control overhead | More evaluation parameters could give a more reliable solution |
| [12] | Heuristic-based load balancing scheduling algorithm | Data centres chosen based on the smallest load and the closest location | Response time, network usage, task load, delay factor | Energy consumption of Fog node not considered |
| [13] | A novel algorithm based on Tabu search | Optimum scheduling solution provided using Tabu search | Execution time, latency, allocated memory, and cost | Energy not considered |
| [14] | Interval division genetic scheduling algorithm with penalty factor | Penalty factor included for Cloud-Fog scenario | Cost, delay, balance factor for Fog cluster | Energy consumption of Fog node not considered while scheduling |
strategy schedules diverse tasks. The proposed algorithm is simulated in the iFogSim simulator, with delay as the evaluation parameter. Beraldi et al. [11] suggested a load balancing technique for Fog Computing under loosely correlated state information. The approach is based on ad hoc neighbour status probing, and the paper focuses on the effects of delays during the probing phase. The authors constructed three algorithms: a probe-based algorithm, a local processing method, and a sequential processing algorithm. The authors also
Optimized Energy Efficient Task Scheduling in Fog Computing
739
suggested a model to support stale information, which is used in the probe-based algorithm. Performance is evaluated using a mathematical model and the Omnet++ simulator. Singh and Auluck [12] suggested heuristic scheduling algorithms, namely minimum distance, minimum load, and minimum hop distance. Jobs are placed on data centres based on their priority; data centres are chosen based on the smallest load and the closest location. When simulating the suggested algorithms in iFogSim, evaluation criteria such as response time, network usage, task load, and delay factor are used. The authors explain the suggested algorithms using a smart-society scenario. Memari et al. [13] suggested a meta-heuristic algorithm that considers latency and cost for a Fog-Cloud architecture. The proposed algorithm uses the Tabu search engine; the optimum scheduling solution is obtained using Tabu search in conjunction with the Appropriate Nearest Neighbour (ANN) and Fruit Fly Optimization (FOA) algorithms. Execution time, latency, memory, and cost are taken into account as evaluation factors, and real-time experiments are conducted to test the proposed algorithm. Zhou et al. [14] suggested a Fog Computing-based framework for a smart factory. The authors proposed two scheduling techniques, one for Fog-only scenarios and one for combined Fog and Cloud scenarios. The interval division genetic scheduling algorithm (IDSGA) is utilized in Fog-only scenarios to optimize tasks. For the hybrid Cloud and Fog scenario, the authors suggested the interval division genetic scheduling algorithm with penalty factor (IDSGA-P), which takes the penalty factor into account to improve resource allocation. Most of the literature proposes task scheduling methods for Cloud-Fog platforms without considering the significant differences between Cloud servers and Fog nodes: Cloud and Fog servers are distinct from one another in characteristics such as computational power and energy consumption.
Task scheduling algorithms used in Cloud Computing may therefore not be applicable in a Fog environment for achieving minimum makespan and energy. Moreover, most task scheduling algorithms do not take energy into account as an evaluation factor when scheduling tasks on a Fog network. The proposed algorithm uses both makespan and energy consumption to schedule tasks efficiently.
3 Proposed System Model

In Fog Computing, computation is done on Fog nodes to fulfil user requests. Assume that the proposed system has n Fog nodes. These Fog nodes are located close to the IoT layer and are consequently directly connected to IoT devices. All queries from IoT devices are handled by the Fog broker, which evaluates each request before sending it to the relevant Fog node for processing. A novel algorithm, Leveraging Energy for Task Scheduling in Fog Computing using Optimization (LETSO), is put forth in this study. The Fog broker includes LETSO, which
Fig. 2 Working of proposed algorithm for Fog computing
offers a superior work scheduling solution based on timeliness and energy usage. The working of the proposed LETSO is illustrated in Fig. 2. Requests are made by IoT devices or end-user devices to be processed. These requests are forwarded to the nearest Fog nodes, which in turn send them to the Fog broker, as shown in step 2. The Fog broker is responsible for converting jobs into tasks by splitting big jobs into more manageable tasks, as illustrated in step 3. The broker analyses the smaller tasks to estimate their computational requirements, and the tasks are then placed in a pool of tasks. Based on their computational requirements, the tasks are scheduled to various cloud nodes: as shown in step 6 of Fig. 2, the Fog broker assigns tasks to the available cloud nodes. The smaller tasks are executed in the distributed cloud environment, and the results of the executed tasks are sent back to the Fog broker. Once all tasks have been completed, the Fog broker aggregates their results. The results are then delivered to the user through the connected Fog node, as shown in step 10. The time spent on this communication between the Fog broker and the nodes is so small that it is almost negligible.
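The broker workflow above (splitting jobs into tasks, dispatching them to nodes, and aggregating results) can be sketched as follows. This is a hypothetical illustration only: the names (`split_job`, `broker_dispatch`), the fixed chunk size, and the least-loaded dispatch rule are assumptions, not details from the paper, and execution is simulated by accumulating load.

```python
# Hypothetical sketch of the Fog-broker workflow (steps 1-10 above).
# Names, chunk size, and the least-loaded rule are illustrative assumptions.

def split_job(job_size, chunk):
    """Step 3: split a big job into smaller, more manageable tasks."""
    tasks = []
    while job_size > 0:
        tasks.append(min(chunk, job_size))
        job_size -= chunk
    return tasks

def broker_dispatch(jobs, nodes):
    """Dispatch each task to the least-loaded node and aggregate results."""
    load = {n: 0 for n in nodes}           # simulated per-node workload
    results = []
    for job in jobs:
        partials = []
        for task in split_job(job, chunk=10):   # step 3: job -> tasks
            node = min(load, key=load.get)      # step 6: pick a node
            load[node] += task                  # simulated execution
            partials.append(task)               # step 8: result returned
        results.append(sum(partials))           # step 9: broker aggregates
    return results, load

done, load = broker_dispatch([25, 7], ["n1", "n2", "n3"])
```

A 25-unit job is split into tasks of 10, 10, and 5, spread over the three nodes, and the 7-unit job then lands on the least-loaded node.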
3.1 Leveraging Energy for Task Scheduling in Fog Computing Using Optimization (LETSO)

The proposed Leveraging Energy for Task Scheduling in Fog Computing using Optimization algorithm is based on Particle Swarm Optimization (PSO), a popular meta-heuristic algorithm. PSO assigns a position, a velocity, and a fitness value to each particle; the specifics of the PSO variant used in this work follow [15]. Requests sent by IoT devices to the Fog layer are broken down into tasks and handled on any available Fog node. Each task is identified by characteristics such as required computational power, memory usage, and I/O size. These tasks can be represented as shown in Eq. (1):

$T = \{t_1, t_2, t_3, t_4, \ldots, t_k\}$   (1)
where $t_k$ represents the kth task at a given instant of time. The infrastructure consists of Fog and Cloud nodes with attributes such as processor, memory usage, and bandwidth usage. These nodes are the processing units for all the tasks and can be represented as:

$N = \{n_1, n_2, n_3, \ldots, n_m\}$   (2)
One or more tasks can be executed on one processor, as shown in Eq. (3):

$N_i^T = \{t_1^i, t_2^i, \ldots, t_n^i\}$   (3)
So tasks $t_1, t_2, \ldots, t_n$ can be executed on node $n_i$, depending on the lengths of the tasks and the processing rate of the node. The representation stage, which seeks to establish a suitable mapping between the problem solution and the PSO particles, is crucial to an effective PSO design. This work represents a particle as a matrix of Fog nodes and tasks: if a task is executed on a node, the corresponding element is 1, otherwise 0, as shown in Table 2. Suppose 5 tasks are executed in a Fog environment; a solution can then be represented as shown in Eq. (4):

$\text{TasksOnNodes} = \{t_1^3, t_2^1, t_3^2, t_4^1, t_5^2\}$   (4)
The velocity of a particle depends on the best position of the particle both locally and globally. Personal Best (pBest) is the best position achieved by the particle itself, while Global Best (gBest) is the best position found by any particle since the start of the algorithm. Thus, pBest and gBest capture the particle's own best position and the global best position, and both are updated at each step. Depending on pBest and gBest, the particle velocities and positions are updated using Eqs. (5), (6), and (7) [15].
Table 2 Solution for position matrix for the above example

|    | t1 | t2 | t3 | t4 | t5 |
|----|----|----|----|----|----|
| n1 | 0  | 1  | 0  | 1  | 0  |
| n2 | 0  | 0  | 1  | 0  | 1  |
| n3 | 1  | 0  | 0  | 0  | 0  |
$V_k^{i+1} = V_k^i + c_1 r_1 (pBest_k^i - X_k^i) + c_2 r_2 (gBest_k^i - X_k^i)$   (5)

$V_k^{t+1}(i,j) = V_k^t(i,j) + c_1 r_1 (pBest_k^t(i,j) - X_k^t(i,j)) + c_2 r_2 (gBest_k^t(i,j) - X_k^t(i,j))$   (6)

where $c_1$ and $c_2$ are constants and $r_1$, $r_2$ are random numbers in the range [0, 1]. Based on a particle's pBest and gBest, the particle velocities are updated regularly; depending on the velocity, the particle's position in the position matrix is updated:

$X_k^{i+1} = X_k^i + V_k^{i+1}$   (7)
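A minimal sketch of the update rules in Eqs. (5) and (7) for a single particle dimension follows; continuous positions are shown for clarity (the paper encodes positions as a binary matrix), and the function name and default coefficients are illustrative assumptions.

```python
import random

# Sketch of the PSO updates in Eqs. (5) and (7) for one particle dimension.
# Continuous positions shown for clarity; names and c1/c2 defaults assumed.

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0):
    """Return the updated (position, velocity) of one particle dimension."""
    r1, r2 = random.random(), random.random()   # r1, r2 in [0, 1]
    v_new = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (5)
    x_new = x + v_new                                           # Eq. (7)
    return x_new, v_new

x, v = pso_step(x=0.5, v=0.1, pbest=0.8, gbest=1.0)
```

When the particle already sits at both pBest and gBest, the attraction terms vanish and the particle simply drifts with its current velocity.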
The fitness value indicates the quality of a particle. In this work, the fitness value is measured with the utility function shown in Eq. (10); a higher fitness value corresponds to a better solution. The proposed algorithm is shown in Algorithm 1.

Algorithm 1 Leveraging Energy for Task Scheduling in Fog Computing using Optimization (LETSO)

Input: Set of Fog nodes $N = \{N_1, N_2, \ldots, N_i\}$; set of tasks $T = \{T_1, T_2, \ldots, T_j\}$
Output: Tasks allocated to Fog nodes $\{N_1^i, N_2^i, \ldots, N_j^i\}$
Start
  For each task $T_i \in T$
    Calculate minimum makespan using Eq. (9)
    Calculate minimum energy using Eq. (12)
    Calculate FitVal using Eq. (10)
  End for
  For each particle $P \in (N, T)$ do
    For each iteration $I$ do
      Calculate personal best of particle using Eq. (7)
      If (FitVal > pBest) then pBest = FitVal end if
      Calculate global best value of particle using Eq. (5)
      If (FitVal > gBest) then gBest = FitVal end if
      Update particle velocity using Eq. (6)
      Update particle position using Eq. (7)
    End for
  End for
  Store optimal solution
End
4 Simulation Environment and Performance Evaluation

The experimental setup was done using the MATLAB simulator. In the current scenario, the total number of nodes is 15. The Fog layer has 10 devices with limited processing capacity: gateways, routers, or personal computers that have limited capacity but are closer to the end user. The Cloud layer consists of five high-capacity data centres or virtual machines. Each Fog node has its own CPU rate, memory, and bandwidth attributes, and every task has attributes such as memory, input size, output size, and number of instructions. The simulation scenario is shown in Table 3.

The evaluation parameters for the LETSO algorithm are makespan and energy consumption. Makespan is the total time required to complete a request, so it depends on the start and completion times of the tasks:

$Mspan = \max_{1 \le i \le m} Exe(n_i)$   (8)

$MinMspan = \sum_{1 \le x \le n} TaskLength(t_x) \Big/ \sum_{1 \le i \le n} ProcessingRate(n_i)$   (9)

Table 3 Simulation scenario

| Item | Specification |
|---|---|
| System | Intel(R) Core™ i3-1005G1 CPU @ 1.20 GHz, 1.19 GHz |
| Memory | 8 GB |
| Simulator | MATLAB |
| Operating system | Windows 11 |
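Eqs. (8) and (9) can be sketched directly; the function names and the sample values are illustrative assumptions.

```python
# Sketch of Eqs. (8)-(9): makespan is the longest node finish time, and
# MinMspan is total task length over total processing rate (a lower bound).
# Function names and sample values are illustrative assumptions.

def makespan(node_exec_times):
    """Eq. (8): Mspan = max execution time over all nodes."""
    return max(node_exec_times)

def min_makespan(task_lengths, processing_rates):
    """Eq. (9): sum of task lengths / sum of node processing rates."""
    return sum(task_lengths) / sum(processing_rates)

mspan = makespan([4.0, 7.5, 6.0])                  # slowest node finishes at 7.5
lower = min_makespan([100, 200, 300], [50, 100])   # 600 / 150 = 4.0
```

The ratio MinMspan/Mspan later feeds the utility function of Eq. (10), so a schedule closer to the lower bound scores higher.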
Energy is consumed by a Fog node in both the static and the dynamic state. Static-state consumption is mainly the energy used in idle mode, when the Fog node has no workload; in this research work, idle-mode energy is considered negligible. Energy consumed in the dynamic mode is the energy required to execute a task.

$U = M_{coe} \cdot (MinMspan / Mspan) + E_{coe} \cdot (MinEnergy / Energy)$   (10)

$Energy = Pow(N_i) \cdot (Exe(T_x^i) \cdot T_s \cdot T_r)$   (11)

where $Pow(N_i)$ is the power the Fog node needs to execute the task, $T_s$ is the time needed to send the task, and $T_r$ is the time needed to receive the data.

$MinEnergy = \sum_{1 \le i} MinEnergy(t_x^i)$   (12)
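The utility fitness of Eq. (10) and the dynamic energy of Eq. (11) can be sketched as follows. The coefficient defaults (Mcoe = Ecoe = 0.5) and all numeric values are illustrative assumptions, and the energy formula follows Eq. (11) exactly as printed above.

```python
# Sketch of Eq. (11) (dynamic energy) and Eq. (10) (utility fitness).
# Mcoe/Ecoe defaults and the sample values are illustrative assumptions.

def energy(power, exe_time, t_send, t_recv):
    """Eq. (11), as printed: Pow(Ni) * (Exe * Ts * Tr)."""
    return power * (exe_time * t_send * t_recv)

def utility(min_mspan, mspan, min_energy, e, mcoe=0.5, ecoe=0.5):
    """Eq. (10): weighted closeness to the makespan and energy lower
    bounds; a higher utility means a better schedule (PSO fitness)."""
    return mcoe * (min_mspan / mspan) + ecoe * (min_energy / e)

e = energy(power=2.0, exe_time=3.0, t_send=1.0, t_recv=1.0)        # 6.0
u = utility(min_mspan=4.0, mspan=5.0, min_energy=8.0, e=10.0)      # 0.8
```

A schedule that attains both lower bounds would score the maximum utility of Mcoe + Ecoe = 1.0.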