Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Praveen Kumar Shukla Krishna Pratap Singh Ashish Kumar Tripathi Andries Engelbrecht Editors
Computer Vision and Robotics Proceedings of CVR 2022
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
Editors

Praveen Kumar Shukla, Department of Computer Science and Engineering, Babu Banarasi Das University, Lucknow, Uttar Pradesh, India

Krishna Pratap Singh, Department of Information Technology, Indian Institute of Information Technology, Allahabad, Prayagraj, Uttar Pradesh, India

Ashish Kumar Tripathi, Department of Computer Science and Engineering, Malviya National Institute of Technology (MNIT), Jaipur, India

Andries Engelbrecht, Department of Industrial Engineering and Computer Science Division, University of Stellenbosch, Matieland, South Africa
ISSN 2524-7565  ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-19-7891-3  ISBN 978-981-19-7892-0 (eBook)
https://doi.org/10.1007/978-981-19-7892-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains the outstanding research papers presented at the International Conference on Computer Vision and Robotics (CVR 2022). CVR 2022 was organized by Babu Banarasi Das University, Lucknow, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, with the goal of developing a comprehensive understanding of the challenges in the area of computer vision and robotics; it will also help strengthen amiable networking between academia and industry. The conference focused on computer vision, robotics, pattern recognition, and real-time systems. We have tried our best to enrich the quality of CVR 2022 through a stringent and careful peer-review process. CVR 2022 received 210 research submissions from 13 different countries, viz., Bangladesh, Bulgaria, Greece, India, Kenya, Malaysia, Morocco, Nigeria, Poland, Turkey, United Kingdom, United States, and Vietnam. After a very stringent peer-review process, only 44 high-quality papers were accepted for presentation and the final proceedings. This book presents novel contributions to communication and computational technologies and serves as reference material for advanced research.

Praveen Kumar Shukla, Lucknow, India
Krishna Pratap Singh, Prayagraj, India
Ashish Kumar Tripathi, Jaipur, India
Andries Engelbrecht, Pretoria, South Africa
About This Book
This book gathers outstanding research papers presented at the International Conference on Computer Vision and Robotics (CVR 2022), held on May 21–22, 2022, at Babu Banarasi Das University, Lucknow, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, with the goal of developing a comprehensive understanding of the challenges of intelligence advancements from a computational viewpoint; it will also help strengthen congenial networking between academia and industry. We have tried our best to enrich the quality of CVR 2022 through a stringent and careful peer-review process. This book presents novel contributions to computer vision and robotics for advanced research. The topics covered include: artificial intelligence for computer vision, imaging sensor technology, deep neural networks, biometric recognition, biomedical imaging, image/video classification, soft computing for computer vision, robotic devices and systems, autonomous vehicles, intelligent control systems, cooperating robots for manufacturing and assembly, intelligent transportation systems, human–machine interaction, human motor control, game playing, Internet of things, intelligent decision-making systems, and intelligent information processing.

Praveen Kumar Shukla, Lucknow, India
Krishna Pratap Singh, Prayagraj, India
Ashish Kumar Tripathi, Jaipur, India
Andries Engelbrecht, Pretoria, South Africa
Contents

1. Story Telling: Learning to Visualize Sentences Through Generated Scenes (S. Yashaswini and S. S. Shylaja) 1
2. A Novel Processing of Scalable Web Log Data Using Map Reduce Framework (Yeturu Jahnavi, Y. Pavan Kumar Reddy, V. S. K. Sindhura, Vidisha Tiwari, and Shaswat Srivastava) 15
3. A Review of Disease Detection Emerging Technologies of Pre and Post harvest Plant Diseases: Recent Developments and Future Prospects (Sakshi Pandey, Kuldeep Kumar Yogi, and Ayush Ranjan) 27
4. A Comparative Study Based on Lung Cancer with Deep Learning and Machine Learning Models (Yalamkur Nuzhat Afreen and P. V. Bhaskar Reddy) 41
5. Low-Cost Data Acquisition System for Electric Vehicles (Vinay Gupta, Toushit Lohani, and Karan Singh Shekhawat) 51
6. Machine Learning Based Robotic-Assisted Upper Limb Rehabilitation Therapies: A Review (Shymala Gowri Selvaganapathy, N. Hema Priya, P. D. Rathika, and M. Mohana Lakshmi) 59
7. Performance Analysis of Classic LEACH Versus CC-LEACH (Lakshmi Bhaskar and C. R. Yamuna Devi) 75
8. Anomaly Detection in the Course Evaluation Process (Vanishree Pabalkar, Ruby Chanda, and Anagha Vaidya) 85
9. Self-attention-Based Efficient U-Net for Crack Segmentation (Shreyansh Gupta, Shivam Shrivastwa, Sunny Kumar, and Ashutosh Trivedi) 103
10. Lung Carcinoma Detection from CT Images Using Image Segmentation (C. Karthika Pragadeeswari, R. Durga, G. Dhevanandhini, and P. Vimala) 115
11. A Deep Learning Based Human Activity Recognition System for Monitoring the Elderly People (V. Gokula Krishnan, A. Kishore Kumar, G. Bhagya Sri, T. A. Mohana Prakash, P. A. Abdul Saleem, and V. Divya) 127
12. Tweet Classification on the Base of Sentiments Using Deep Learning (Firas Fadhil Shihab and Dursun Ekmekci) 139
13. A Comparison of Top-Rated Open-Source CMS—Joomla, Drupal, and WordPress for E-Commerce Website (Savan K. Patel, Falguni Suthar, Swati Patel, and Jigna Prajapati) 157
14. Exploiting Video Classification Using Deep Learning Models for Human Activity Recognition (Upasna Singh and Nihit Singhal) 169
15. 6 Degree of Freedom Based Arm Robotic System with 3D Printing (Shrutika Arun Langde, Neetu Gyanchandani, and Shyam Bawankar) 181
16. Earlier Heart Disease Prediction System Using Machine Learning (Tiwari Kritika, Sajid Ansari, and Govind Kushwaha) 191
17. Heart Disease Prediction Using Machine Learning and Neural Networks (Vijay Mane, Yash Tobre, Swapnil Bonde, Arya Patil, and Parth Sakhare) 205
18. Mathematical Approaches in the Study of Diabetes Mellitus (S. V. K. R. Rajeswari and P. Vijayakumar) 229
19. Path Planning of Autonomous Underwater Vehicle Under Malicious Node Effect in Underwater Acoustic Sensor Networks (Arnav Hari, Prateek, and Rajeev Arya) 249
20. A Study on Attribute Selection Methodologies in Microarray Data to Classify the Cancer Type (S. Logeswari, D. Santhakumar, and D. Lakshmi) 261
21. Fine Tuning the Pre-trained Convolutional Neural Network Models for Hyperspectral Image Classification Using Transfer Learning (Manoj Kumar Singh and Brajesh Kumar) 271
22. Computational Prediction of Plastic Degrading Microbes Using Random Forest (N. Hemalatha, W. Akhil, Raji Vinod, and T. Akhil) 285
23. Computational Yield Prediction of Rice Using KNN Regression (N. Hemalatha, W. Akhil, and Raji Vinod) 295
24. Exploiting Deep Learning for Overlapping Chromosome Segmentation (Alexander Nikolaou and George A. Papakostas) 309
25. Deep Transfer Modeling for Classification and Identification of Tomato Plant Leaf Disease (Rajeev Kumar Singh, Akhilesh Tiwari, and Rajendra Kumar Gupta) 331
26. Designing Real-Time Frame Modification Functions Using Hand Gesture Recognition (Shivalika Goyal, Himani, and Amit Laddi) 341
27. Benchmarking of Novel Convolutional Neural Network Models for Automatic Butterfly Identification (Manjunath Chikkamath, Dwijendra Nath Dwivedi, R. B. Hirekurubar, and Raj Thimmappa) 351
28. Machine Learning Techniques in Intrusion Detection System: A Survey (Roshni Khandait, Uday Chourasia, and Priyanka Dixit) 365
29. A Machine Learning Approach for Honey Adulteration Detection Using Mineral Element Profiles (Mokhtar A. Al-Awadhi and Ratnadeep R. Deshmukh) 379
30. Change Detection on Earth's Surface Using Machine Learning: A Survey (Pathan Misbah, Jhummarwala Abdul, and Dave Dhruv) 389
31. Social Distancing Detector Framework Using Deep Learning and Computer Vision Principles (R. Vishnu Vasan and Muthuswamy Vijayalakshmi) 401
32. Hands-Free Eye Gesture Authentication Using Deep Learning and Computer Vision Principles (S. Rohith and Muthuswamy Vijayalakshmi) 411
33. An Overview of Ad Hoc Networks Routing Protocols and Its Design Effectiveness (Sarbjit Kaur and Ramesh Kait) 421
34. Real-Time Vehicle Detection and Tracking System Using Cascade Classifier and Background Subtractor (Priyanshi Verma, Kajal Verma, Aprajita Singh, Arya Kumar Sundaram, and Vijendra Singh Bramhe) 431
35. Multi-input MLP and LSTM-Based Neural Network Model for SQL Injection Detection (Vishal Sharma and Sachin Kumar) 443
36. Comparative Study of Machine Learning Algorithms for Prediction of SQL Injections (Vishal Sharma and Sachin Kumar) 455
37. Detecting Fake Reviews Using Multiple Machine Learning Models: A Comparative Study (Akhandpratap Manoj Singh and Sachin Kumar) 467
38. Using GitHub and Grafana Tools: Data Visualization (DATA VIZ) in Big Data (E Geetha Rani and D T Chetana) 477
39. Group Fairness in Outlier Detection Ensembles (Gargi Mishra and Rajeev Kumar) 493
40. Extended Chebyshev Chaotic Map Based Message Verification Protocol for Wireless Surveillance Systems (Vincent Omollo Nyangaresi) 503
41. Single Multiplicative Neuron Model in Predicting Crude Oil Prices and Analyzing Lag Effects (Shobhit Nigam and Vidhi Bhatt) 517
42. A Hybridized Teaching–Learning-Based Optimization Algorithm to Solve Capacitated Vehicle Routing Problem (Sakshi Bhatia, Nirmala Sharma, and Harish Sharma) 527
43. Data Mining: An Incipient Approach to World Security (Syed Anas Ansar, Swati Arya, Sujit Kumar Dwivedi, Nupur Soni, Amitabha Yadav, and Prabhash Chandra Pathak) 541
44. Security in IoT Layers: Emerging Challenges with Countermeasures (Syed Anas Ansar, Swati Arya, Shruti Aggrawal, Surabhi Saxena, Arun Kushwaha, and Prabhash Chandra Pathak) 551

Author Index 565
About the Editors
Prof. Praveen Kumar Shukla is presently working as Professor and Head of the Department of Computer Science and Engineering, Babu Banarasi Das University, Lucknow. He received his Ph.D. in Computer Science and Engineering from Dr. A. P. J. Abdul Kalam Technical University, Lucknow, his B.Tech. in Information Technology, and his M.Tech. in Computer Science and Engineering. His research areas include fuzzy systems (interval type-2 fuzzy systems and type-2 fuzzy systems), evolutionary algorithms (genetic algorithms), genetic fuzzy systems, multi-objective optimization using evolutionary algorithms, big data analytics, and the Internet of things. He has published many papers in national conferences, international conferences, and international journals. He also published the book "Introduction to Information Security and Cyber Laws" in 2014 with Dreamtech Publishers and a patent on a "Social Media based Framework for Community Mobilisation during and post Pandemic" in 2021. He completed a research project on "Particle Swarm Optimization Based Electric Load Forecasting" sponsored by Dr. A. P. J. Abdul Kalam Technical University under the TEQIP-III scheme. He is a Member of the IEEE (Computational Intelligence Society), USA, the International Association of Computer Science and Information Technology (IACSIT), Singapore, the International Association of Engineers (IAENG), Hong Kong, the Society of Digital Information and Wireless Communications (SDIWC), the Institution of Engineers, India, and the Soft Computing Research Society (SCRS), India.

Dr. Krishna Pratap Singh is Associate Professor in the Department of Information Technology, Indian Institute of Information Technology Allahabad. Prior to this, he worked as Assistant Professor (2013–2018) and Lecturer (2009–2012) at IIIT Allahabad. He received his Ph.D. and Master's degrees from IIT Roorkee in 2009 and 2004, respectively. He has published more than 60 research articles in various international journals and conferences. He heads the Machine Learning and Optimization Lab at IIITA. His teaching and research interests are machine learning, representation learning, transfer learning, and optimization.
Dr. Ashish Kumar Tripathi (Member, IEEE) received his M.Tech. and Ph.D. degrees in Computer Science and Engineering from Delhi Technological University, Delhi, India, in 2013 and 2019, respectively. He is currently working as Assistant Professor in the Department of Computer Science and Engineering, Malviya National Institute of Technology (MNIT), Jaipur, India. His research interests include big data analytics, social media analytics, soft computing, image analysis, and natural language processing. Dr. Tripathi has published several papers in international journals and conferences, including IEEE Transactions, and is an active reviewer for several journals of repute.

Prof. Andries Engelbrecht received his Master's and Ph.D. degrees in Computer Science from the University of Stellenbosch, South Africa, in 1994 and 1999, respectively. He is currently appointed as the Voigt Chair in Data Science in the Department of Industrial Engineering, with a joint appointment as Professor in the Computer Science Division, Stellenbosch University. Prior to his appointment at Stellenbosch University, he was at the Department of Computer Science, University of Pretoria (1998–2018), where he was appointed as South African Research Chair in Artificial Intelligence (2007–2018), Head of the Department of Computer Science (2008–2017), and Director of the Institute for Big Data and Data Science (2017–2018).
Chapter 1
Story Telling: Learning to Visualize Sentences Through Generated Scenes S. Yashaswini and S. S. Shylaja
1 Introduction

Visual learning is one of the best ways to develop visual thinking: the learner understands and retains information better by combining ideas, words, and concepts with images. Designing a model with such insight is, however, not an easy task. Over the years, CNNs have become a significant component of many computer vision applications and are broadly used to analyze visual imagery. A CNN takes an image as input, assigns learnable weights and biases to the various objects in the image, and thereby gains the ability to differentiate one object from another; its working is discussed in the algorithm section. Natural Language Processing (NLP) makes computers understand human language. Behind the scenes, NLP analyzes the grammatical structure of a given sentence and the individual meanings of its words, then uses the NLTK library to extract the meaning and deliver the output. Here, NLP is used for scene generation: the parsing process POS-tags the input sentence to find its nouns, and these nouns become the objects displayed in the scene. Blender is one of the most popular open-source tools among animators and visual-effects artists and supports the rendering of images, VFX, and more. From the survey, the final stage of the process displays the output as a scene using the Blender framework, since this makes the scene easy to visualize and to modify further.
S. Yashaswini (B)
Department of CSE, Cambridge Institute of Technology, Bengaluru, India
e-mail: [email protected]

S. S. Shylaja
Department of CSE, PES University, BSK-III Stage, Bengaluru, India
e-mail: [email protected]
2 Literature Survey

WordsEye is a software system that converts input text into a 3D scene without requiring technical knowledge from the user. It is a web-based application that generates a scene from input typed into the browser. Input sentences are parsed into a phrase structure that enforces subject–verb agreement and other syntactic and semantic constraints [1]. The parse tree is then converted to a dependency structure in which individual words are represented by nodes and dependencies by arcs. Syntactic dependencies are converted to semantic relations using frames, and semantic roles are in turn converted to the final set of graphical objects. The final output scene is rendered in OpenGL and displayed in the user's browser as a JPEG image. A face library was built using the FaceGen 3D graphics package, which provides attributes such as age and gender as well as highly specific shape, color, and asymmetry controls for the output scene [2].

First and foremost, the sentences have to be treated with natural language processing techniques such as tokenization, lemmatization, Part-of-Speech (POS) tagging, etc. The most important part of natural language processing is to extract the meaningful elements from the input; the second step is scene generation, in which the semantic contents are mapped to database objects and a scene corresponding to the sentence is generated. In the text-to-scene conversion system, linguistic analysis uses the Stanford CoreNLP library to perform NLP tasks, semantic analysis extracts meaningful elements from the input sentence by converting the text into a dependency-structure representation, and the final step, scene generation, converts the semantic elements into a corresponding visual representation [3].

The inference drawn from the survey is a proposed text-to-3D system that takes natural language text as input and constructs 3D scenes as output based on spatial relationships. The user provides natural language text to the system, and the system identifies explicit constraints on the objects that should appear in the scene and how they should be arranged. Five steps produce the output. First, template parsing parses the textual description of a scene into a set of constraints on the objects present and the spatial relations between them [4], using extraction algorithms such as Rapid Automatic Keyword Extraction (RAKE) and tools such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. Template parsing is followed by inference, which expands the constraints by accounting for implicit constraints not specified in the text. Grounding then provides the constraints and priors on the spatial relations of objects. The next step, scene layout, arranges the objects and optimizes their placement based on spatial knowledge. The last step, interaction and learning, provides an option for the user to adjust the scene through direct manipulation and commands [5].
The semantic information of abstract images is created from collections of clip art. Abstract images convey high-level semantic information, since they remove the reliance on noisy low-level object, attribute, and relation detectors and on the tedious hand-labeling of images. Importantly, abstract images also make it possible to generate sets of semantically similar scenes [6]. Generating (retrieving) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences, shows significant improvements over baseline approaches, yet is still at an early stage [7]. Semantic information retrieval from a scene dataset has advantages over real images: noise removal is easier for low-level object, attribute, and relation detectors, and hand-labeling or manual annotation is easier than for real images [8]. Finding analogous sets of real images that are semantically similar would be nearly impossible. The relation between the saliency and memorability of objects and their semantic importance enables the generation of quality scenes [9]. Generating novel scenes depicting the sentences involves linguistic and semantic analysis of visual meaning by sampling from a CRF; the CRF is also used to score a set of scenes for a text-based image-retrieval task.
2.1 Dataset

A new dataset, storydb, was developed by web scraping images for the subclasses bird, boy, flower, girl, and tree. Each category has 30–50 images. The images are unsupervised, without labels, and have different shapes, sizes, colors, and textures, as shown in Fig. 1. The images are trained using a CNN; an accuracy of 70% is obtained after training for 10 epochs, after which the dataset is labeled. The training set consists of images from the subclasses. Random images of the existing subclasses are web scraped and used as the test dataset. The images in the test set are analyzed and compared with the training dataset, and their labels are predicted. These labels are then POS-tagged and the dependency graph is passed for further processing, as shown in Fig. 2. The test dataset consists of random pictures; after some epochs, each test image is checked against all the training images and, depending on the match, its label is predicted, as shown in Fig. 3.

The Convolutional Neural Network is used for image labeling. The architecture employed is as follows: a random image is given as input to the CNN model. The image is preprocessed to 148 × 148, and 32 filters of dimension 3 × 3 are used for feature extraction. Batch normalization is applied with an epsilon value of 0.00001, followed by a ReLU activation layer. The output is max-pooled to retain the prominent pixels, and dropout eliminates the insignificant ones. The image size is further reduced, and the process continues for four convolution layers. Finally, dense layers are used with a softmax activation function and a learning rate of 0.08. The output layer consists of five nominal values, one for each of the categories bird, boy, flower, girl, and tree.
Fig. 1 Storydb with subclasses
Fig. 2 Accuracy of CNN with 10 subclasses
Fig. 3 Label prediction after CNN
The training set consists of 30–50 images in each of the five categories. A new image that is not part of the training set is used in the test set to predict the label.
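The architecture described above can be sketched in Keras as follows. This is an illustrative sketch only: the pool size, dropout rate, filter growth across the four convolution blocks, the dense-layer width, and the choice of optimizer are assumptions not stated in the text.

from tensorflow.keras import layers, models, optimizers

# Sketch of the described network: four convolution blocks (3 x 3 filters,
# batch normalization with epsilon 0.00001, ReLU, max pooling, dropout),
# followed by dense layers with a softmax over the five classes.
def build_model(num_classes=5):
    model = models.Sequential()
    for i, filters in enumerate((32, 32, 64, 64)):  # later filter counts assumed
        if i == 0:
            model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                    input_shape=(148, 148, 3)))
        else:
            model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.BatchNormalization(epsilon=1e-5))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D((2, 2)))   # keep the prominent pixels
        model.add(layers.Dropout(0.25))          # drop insignificant activations
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer=optimizers.SGD(learning_rate=0.08),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model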
3 Implementation

CNN is a widely used machine-learning method for image processing; it works by extracting features from the image to absorb the required patterns in the dataset and therefore helps in predicting the image. Figure 4 shows the detailed process of the CNN method. The input images from the dataset are divided into training and testing sets. The images are trained using a CNN for 10 epochs with a learning rate of 0.9, and the cross-validation loss is minimized to a value nearing 1. Labels are assigned to the training set with an accuracy of around 70%. A separate CSV file, training.csv, stores each image's label, accuracy, and path. The input text is tokenized and POS-tagged, the noun tags are extracted and compared with the labels in training.csv, and the image path with the highest accuracy is retrieved, printed onto the Blender background, and stored as output.
Fig. 4 Image labeling and scene generation using Blender
1: Import the required packages (Keras, CV2, OS, NumPy)
2: Set up default values for the width and height of the images
3: Define the paths for the training and testing images
4: Define the classes for the training dataset
5: Image preprocessing
   5.1: rescale, rotate, zoom, and flip the images
6: CNN model creation
   6.1: add the convolutional layers with activation='relu'
   6.2: apply batch normalization
   6.3: max-pool the data
   6.4: add a dropout layer
   6.5: flatten the 2D image into a 1D vector
   6.6: add the dense layer with activation='softmax'
7: Compile the model
8: Fit the model
   8.1: input the training dataset
   8.2: set epochs=10
   8.3: input the validation dataset
9: Output
   9.1: read the images in the directory
   9.2: set the default width and height
   9.3: predict the image

CNN algorithm for Image Prediction
In an ML model, it is important to identify attributes, as the attributes directly influence the results. Hence it is important to choose appropriate attributes for training. Here, we use label, path, and score as the attributes for training, as stated in the CNN algorithm.
1: Create a CSV file
2: Assign field names (label, path, score)
3: Store data under the respective fields
   3.1: the objects predicted are stored in the label field
   3.2: the path of the predicted image is stored in the path field
   3.3: the probability of the image is stored in the score field
4: Repeat step 3 until all images in the test directory are read
5: The output CSV is generated

Algorithm to identify attributes after training
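A sketch of this attribute-extraction step, pairing the trained model's prediction for each test image with its path and probability score and writing the rows to training.csv. The field names follow the algorithm above; the image-loading and normalization details are assumptions.

import csv
import os
import numpy as np
from tensorflow.keras.preprocessing import image

CLASSES = ["bird", "boy", "flower", "girl", "tree"]

def write_attributes(model, test_dir, out_csv="training.csv"):
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "path", "score"])
        for name in os.listdir(test_dir):
            path = os.path.join(test_dir, name)
            img = image.load_img(path, target_size=(148, 148))
            x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
            probs = model.predict(x)[0]        # softmax over the 5 classes
            best = int(np.argmax(probs))
            # one row per test image: predicted label, file path, probability
            writer.writerow([CLASSES[best], path, float(probs[best])])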
In natural language processing, POS tagging is one of the important methods used for text processing: it takes a sentence and converts it into forms such as words and tuples. The steps below describe the text preprocessing performed with tokenization and POS tagging. After text processing, the extracted nouns are stored in a CSV file under their respective fields (label, object, score), which the Blender programs later use for rendering. The following algorithm retrieves the appropriate images from the textual inference learned.

1: Extract nouns from the tuple
2: For each noun in the tuple,
   2.1: if the noun is present in training.csv then
      2.1.1: store the object in the label field
      2.1.2: store the path of the object in the path field
      2.1.3: store the score under the score field
   2.2: else re-run the CNN model
3: Repeat step 2.1 until all the objects in the tuple are traversed
4: Display the CSV file with the respective fields and data (label, path, score)
5: Using a subprocess, call the Blender program

Algorithm to retrieve appropriate image from textual inference learned
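A sketch of the noun-extraction and lookup steps using NLTK (it assumes the punkt tokenizer and averaged-perceptron tagger data have been downloaded; the helper names are illustrative, not from the paper):

import csv
import nltk  # requires nltk.download("punkt") and
             # nltk.download("averaged_perceptron_tagger") once

def nouns_from_sentence(sentence):
    # Tokenize and POS-tag the input; keep the noun tags (NN, NNS, NNP, NNPS)
    tokens = nltk.word_tokenize(sentence)
    return {w.lower() for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN")}

def retrieve_image_paths(sentence, train_csv="training.csv"):
    nouns = nouns_from_sentence(sentence)
    best = {}  # noun -> (score, path), keeping the highest-scoring image
    with open(train_csv) as f:
        for row in csv.DictReader(f):
            label, score = row["label"].lower(), float(row["score"])
            if label in nouns and score > best.get(label, (-1.0, ""))[0]:
                best[label] = (score, row["path"])
    return {noun: path for noun, (score, path) in best.items()}

# e.g. retrieve_image_paths("The bird is sitting on the tree")
# -> {"bird": "<path>", "tree": "<path>"}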
In Blender, rendering is the process of generating a two-dimensional image from a model. Once the Blender program reads the CSV file, it retrieves the images listed in it; the further steps are stated in the algorithm for creating a user interface that incorporates resizing, relocation, and removal of images from the scene. The user is given options for resizing, relocating, and removing objects through a dialog box and can apply these changes to the generated scene. Bound checking ensures that all the objects remain within the scene, using the algorithm given below; the scene generated after bound checking is shown in Fig. 5.
Fig. 5 Ensuring bound checking of the objects in the scene
1: Create the dialog box
2: Provide a dropdown list consisting of the names of the inserted images
3: Provide resizing and relocating options in the dialog box
4: Give the user the choice of removing an image
   4.1: if the user selects option='yes' then the image is removed, else the image is not removed

Algorithm to create a user interface to incorporate resizing, relocation and removal of an image from the scene
1: Set a default scaling value
2: Provide a dropdown list of small, medium, and large
3: If default scaling='yes' then select one option from the dropdown list
4: The user selects the objects
5: The selected objects are declared 'active objects'
6: Apply the option selected by the user to the active object

Algorithm for bound checking
Fig. 6 Epochs of CNN model
4 Result

The experimentation analyzes the unsupervised images, which are labeled using the CNN model; the test dataset is compared with the training set and the matching path is copied. The input text is tokenized and POS-tagged, and the retrieved image is placed in Blender against a park background. A deep learning model such as a CNN serves as the baseline approach for labeling an unsupervised dataset. Image features such as shape, color, and texture are learned for identification; prior work relied on abstract scenes and manual annotation for image identification.
4.1 CNN Model Result

In the CNN model the dataset is split into two parts, a training set and a validation set. As the epochs run, we obtain the accuracy and loss for the training set and the validation set, respectively. The current model runs 10 epochs with an average accuracy of 70% on the training set and 60% on the validation set. After the epochs complete, the model predicts labels for the test images and stores the results in a CSV file, as shown in Fig. 6.
4.2 NLP Model Result

The user gives an input sentence to the NLP module; the sentence is tokenized into words, and the words are given their respective POS tags, which are stored in a CSV file for further use. Tokenizing and POS tagging identify the nouns, which drive the basic object retrieval. The scene generated using the Blender framework is shown in Fig. 7.
Fig. 7 System generated blender output
Fig. 8 User interface for dynamic modification of objects
4.3 Blender Generated Scene

The model takes input from the previously generated CSV file and displays the objects in the Blender application with default scaling and location. The default scaling value is (4, 4, 1) and the default location is (0, 0, 0), with the X-axis value incremented by 5 units for every image insertion. The image is further rendered manually, as explained in Fig. 8.
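A minimal sketch of this placement logic, assuming it runs inside Blender's Python environment with the bundled "Import Images as Planes" add-on enabled (the operator name below belongs to that add-on and may differ between Blender versions):

import csv
import bpy

# Assumes Blender with the "Import Images as Planes" add-on enabled.
def place_images(csv_path):
    x = 0
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            directory, name = row["path"].rsplit("/", 1)
            # Import each retrieved image as a textured plane
            bpy.ops.import_image.to_plane(files=[{"name": name}],
                                          directory=directory)
            obj = bpy.context.active_object
            obj.scale = (4, 4, 1)     # default scaling from the text
            obj.location = (x, 0, 0)  # default location; X advances 5 units
            x += 5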
4.4 Scaling, Relocation, and Removal

The user can select any image from the dropdown list and modify its scaling and location values until satisfied with the result. The user can also perform bound checking by selecting the default size from the dropdown list and typing 'yes' in the 'Use default' field.
4.5 Final Result

After the scaling and relocation required by the user, the image is rendered and saved; the final result is displayed in Fig. 9. The generated scene initially had the bird and the tree at the same size, so dynamic modification was applied to match human intuition by rescaling and repositioning the objects in the scene.
5 Empirical Evaluation

Comparing the manually generated image and the Blender-generated image using the Euclidean distance algorithm lets us quantify the similarity between the two scenes, as shown in Fig. 10. The higher the Euclidean score, the greater the difference between the images. In the figure, the first scene is judged more accurate than the second, since the score of the second scene is higher according to the Euclidean measure; according to human observation, however, the second scene seems more accurate than the first. This is because the Euclidean distance algorithm considers features such as the color, shape, and size of the image. Figure 11 illustrates the comparison of test images with trained images using the Euclidean score.
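A sketch of this comparison: both images are resized to a common resolution (an assumed size) so that the flattened pixel vectors align, and the L2 norm of their difference is taken.

import cv2
import numpy as np

def euclidean_score(path_a, path_b, size=(256, 256)):
    # Resize both images to a common size so the pixel vectors align,
    # then take the Euclidean (L2) distance between the flattened arrays.
    a = cv2.resize(cv2.imread(path_a), size).astype(np.float32).ravel()
    b = cv2.resize(cv2.imread(path_b), size).astype(np.float32).ravel()
    return float(np.linalg.norm(a - b))

# A lower score means the manually generated and Blender-generated
# scenes are closer in raw pixel terms (color, shape, position).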
Fig. 9 Rendered image after modifications
Fig. 10 Manually generated image versus blender generated image
Fig. 11 Test image comparison train using Euclidean distance
5.1 CIDEr Metrics

Automatically describing an image with a sentence is a long-standing challenge of computer vision and NLP. Recently, Convolutional Neural Networks and Recurrent Neural Networks have been used for image captioning. Previous work focuses on grammaticality, saliency, truthfulness, etc. Hence there is a need for an evaluation protocol for image captioning based on "human-likeness". A consensus-based protocol measures the similarity of a candidate or test sentence to the majority, or consensus, of how people describe the image. CIDEr (Consensus-based Image Description Evaluation) measures this consensus.
Fig. 12 CIDEr score for “The bird is sitting on the tree”
The CIDEr metric matches human consensus better than existing metrics used for evaluating image captions, such as BLEU [3], ROUGE-L [4], and METEOR [1], as shown in Fig. 12.
5.2 Inception Score

The inception score is calculated by first using a pre-trained Inception v3 model to predict the class probabilities for each generated image. These are conditional probabilities, i.e., probabilities of class labels conditional on the generated image. Images that are classified strongly as one class over all other classes indicate high quality; as such, the conditional probability distribution of each generated image in the collection should have low entropy. The per-label scores are listed in Table 1. Here the inception score varies from 0 to 1: if the image is close to the description, the value is near 1, while a score near 0 implies that the image diverges from the description. For example, if testing is done on a cat image and the predicted label is cat, the score will be near 1; if the predicted label is dog, the score will be near 0. In Fig. 13, the X-axis indicates the class label and the Y-axis indicates the inception score.

Table 1 Inception score for the determined label
Inception Score    Label
0.47775            Boy
0.27727            Girl
0.38048            Cake
0.26304            Tree
0.95102            Bird
0.62618            Flower

Fig. 13 The inception score for the dataset
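A sketch of the class-probability step described above, using the pre-trained Inception v3 bundled with Keras. How those probabilities were mapped onto the 0–1 scores of Table 1 is not fully specified in the text, so only the per-image top-class probability is shown.

import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet")  # pre-trained Inception v3

def top_class_probability(path):
    # Predict class probabilities for one generated image; a sharp
    # (low-entropy) distribution indicates a higher-quality image.
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    probs = model.predict(x)
    _, class_name, score = decode_predictions(probs, top=1)[0][0]
    return class_name, float(score)   # (predicted label, probability)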
6 Conclusion

When we ran the experimental setup several times, the output kept changing. The scores of manually generated and Blender-generated scenes were compared and found to be almost similar. For the Euclidean scores, however, the machine failed to generate the expected output due to features such as the color, shape, and position of the image. In the future, better algorithms must be provided for feature comparison, and the size of the dataset must be increased. We worked with a limited number of image classes; this number should be increased, and each class should contain a greater number of images. Our model also failed to identify semantics, which must be taken into consideration in the future.
7 Future Enhancement

Better algorithms involving deep learning concepts can be used for feature comparison. The dataset is limited to the park scene, but increasing the number of classes in the dataset can increase its usability for learning and visualizing different object classes. Identifying semantic information using shape, color, and texture can significantly improve the evaluation results.
References

1. Chang A, Monroe W, Savva M, Potts C, Manning CD (2015) Text to 3D scene generation with rich lexical grounding. arXiv
2. Coyne B, Schudel C, Bitz M, Hirschberg J (2011) Evaluating a text-to-scene generation system as an aid to literacy. SLaTE 2011
3. Rugma R, Sreeram S (2016) Text-to-scene conversion system for assisting the education of children with intellectual challenges. IJRSET 5(8)
4. Dessai S, Dhanaraj R (2016) Text to 3D scene generation. IJLTET 6(3):255–258
5. Zitnick CL (2013) Bringing semantics into focus using visual abstraction. In: IEEE conference on computer vision and pattern recognition (CVPR), 2013 (Oral)
6. Parikh D, Vanderwende L (2013) Learning the visual interpretation of sentences. In: IEEE international conference on computer vision (ICCV)
7. Vedantam R, Parikh D, Zitnick CL (2015) Adopting abstract images for semantic scene understanding. IEEE Trans Pattern Anal Mach Intell (PAMI)
8. Zitnick CL, Parikh D (2013) Bringing semantics into focus using visual abstraction. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3009–3016
9. Zitnick CL, Parikh D, Vanderwende L (2013) Learning the visual interpretation of sentences. In: IEEE international conference on computer vision (ICCV), pp 1681–1688
Chapter 2
A Novel Processing of Scalable Web Log Data Using Map Reduce Framework Yeturu Jahnavi, Y. Pavan Kumar Reddy, V. S. K. Sindhura, Vidisha Tiwari, and Shaswat Srivastava
1 Introduction and Preliminaries

Big Data framework-based algorithms are implemented to handle enormous datasets. Owing to the implicit properties of applications, data are at times already distributed over several systems; researchers are therefore interested in developing distributed data processing and computing algorithms [1–4]. To diminish the complexity and hide the lower-level details of parallelization, an abstract programming paradigm known as Map Reduce has been investigated. Map Reduce is a programming paradigm for handling Big Data applications, and many Map Reduce algorithms have recently been developed to analyze and process big data [5–8]. The principal emphasis of this work is to implement a parallel algorithm using the Map Reduce framework to tackle the problem of term weighting on huge datasets. Processing data with Hadoop, the Map Reduce framework, and Pig scripting are described in the following subsections.
Y. Jahnavi (B)
Department of Computer Science, Dr. V. S. Krishna Govt Degree and PG College (Autonomous), Visakhapatnam, Andhra Pradesh, India
e-mail: [email protected]

Y. Pavan Kumar Reddy
Department of Computer Science and Engineering, Narayana Engineering College, Nellore, Andhra Pradesh, India

V. S. K. Sindhura
Oracle Health Insurance (OHI), Hyderabad, Telangana, India

V. Tiwari · S. Srivastava
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, India
1.1 Processing Data with Hadoop

Hadoop, one of the most prominent Map Reduce implementations, runs on clusters where the Hadoop distributed file system (HDFS) stores data to provide high aggregate I/O bandwidth. At the core of HDFS is a single Name Node, a master server that manages the file system namespace and controls access to files. Hadoop is a Java-based framework built on the ideas of the Google File System and Google's Map Reduce technologies. Hadoop has two main parts:

Data storage framework (HDFS): HDFS, the Hadoop Distributed File System, is a schema-less, general-purpose file system that distributes data across several nodes.

Data processing framework (Map Reduce): Map Reduce is a functional programming model introduced by Google. To process data, this model uses two functions, MAP and REDUCE. It is a computational framework that splits a task across multiple nodes and processes data in parallel.

There are various Hadoop ecosystem projects, such as HIVE, PIG, SQOOP, HBASE, FLUME, OOZIE, and MAHOUT, that enhance the functionality of the Hadoop core components.
1.2 Map Reduce Framework

Map Reduce is a software framework that supports scalable, parallel programming for scientific analysis and data-intensive applications. Google's Map Reduce and the Google File System (GFS) inspired the development of Hadoop, an Apache project. Hadoop also provides a distributed file system (HDFS) that stores data on the nodes, furnishing extremely high aggregate bandwidth across the cluster. Map Reduce and the distributed file system are designed together so that node failures can be detected and handled automatically by the framework.
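As an illustration of the two functions, the canonical word-count job can be written for Hadoop Streaming as a pair of small Python steps; the framework sorts and shuffles the mapper's output by key before it reaches the reducer. This is a generic sketch of the paradigm, not code from this paper.

import sys
from itertools import groupby

def mapper():
    # MAP: emit (word, 1) for every word on every input line
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # REDUCE: input arrives sorted by key after the shuffle phase;
    # sum the counts for each distinct word
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()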
1.3 Pig Scripting

Pig is a scripting language, officially called Pig Latin, and a platform for analyzing large data sets. Pig Latin queries are converted into Map Reduce jobs. Developed at Yahoo!, it avoids writing the large number of lines of Java code needed to build even a simple application; the same job can be completed within a few lines.
2 Literature Survey

The role of Big Data remains crucial in several industries today, such as the IT sector, engineering, automobiles, finance, banking, purchasing, and online healthcare, and is constantly growing. Some of the greatest challenges in the current context are the optimal and efficient utilization of time, energy, and the available resources. The existing applications are inadequate for negotiating the broad range of very complex and intricate datasets present in Big Data, which grow exponentially every single day. Online transactions, social media, and the like generate huge quantities of raw data regularly. Data processing in Big Data becomes very challenging, inducing many problems in the context of a steady and enormous increase in the variety, volume, velocity, and complexity of the data. Hence, Big Data processing has turned out to be complex with respect to correctness, changes, and match relations, and raises many issues, such as Big Data analysis, handling of huge volumes of data, privacy and security of data, adequate measures of storage, visualization, task scheduling, optimization of energy, and fault tolerance.

Abrahams et al., 2000, and Lopes et al., 2016, have observed that the generated data is largely heterogeneous and incomplete in nature, making Big Data analysis complex and severe [9]. The gathered data is available in various formats and structures. Scheduling the jobs dynamically employing the method of distributed computing necessitates even resource scheduling among diverse geographical locations [10].

Jalalian et al., 2018, have noted that the major concerns in grid-based architectures are the generation of workflows, scheduling, and resource management. There is absolutely no need for any human intervention for the execution of the tasks. It furnishes flexibility in execution while reducing complexity in the generation of workflows, mainly saving cost and time [11].

Wang et al., 2019, have put forward the idea that grid scheduling is an indispensable component for gathering the power of distributed resources and solving user-related issues. The two principal factors, scheduling of resources and of applications, play a prominent role in grid computing and are evaluated using real-time environments and simulation approaches [12].

Veronica et al., 2017, have considered several real-time multiprocessor scheduling algorithms with various scheduling methods and performance metrics, such as global and hybrid algorithms, for comparative analysis and for distribution and usage [13].

Ramesh and Katheria, 2019, are of the opinion that Big Data needs to transform the data into a suitable format for interaction in order to arrive at better decisions. The same data is represented in diverse forms in data visualization, to resolve perception-based and screen-based issues [14].

Chen and Jiang, 2019, found that selecting apt tools for Big Data on the basis of requirements and data analytics is a complex process. The choice of tools and the analysis of results are vital aspects of the Big Data framework. The right tools
assist the users in performing various tasks effectively and as per the requirement [15].

Mana and Cherukullapurath, 2018, cited the example of Facebook, which employs the tool with a job tracker. Facebook invested huge amounts in Hadoop and restructured the Hadoop distribution mechanism to make it more user-friendly and provide customizable services; it plays a significant role in the optimization of time and other applications for Facebook. They described how Map Reduce predicts the possible completion time of the map tasks depending on the CPU and disk requirements, by computing the Mean Value Analysis (MVA). MVA puts forth an analytical model capable of handling multiple types of jobs and compares the performance of single-node and multiple-node environments in Hadoop [16].

Bello-Orgaz et al., 2016, surveyed the way Hadoop++ enhances the general performance of the Hadoop framework without altering the basic environment. It introduces the technology through user-defined functions at the exact locations required and proves quite helpful in jobs such as indexing and subsequent join processing [17].

Hashem et al., 2020, proposed iMapReduce, another user-friendly environment that allows the user to define iterative computations with the Map Reduce functions. iMapReduce reduces the overheads, eliminates the shuffling of static data, and permits asynchronous tasks, as a result of which performance improves in iterative implementations; iMapReduce is at least five times faster than Hadoop [18].

Seera et al., 2018, proposed a hybrid database system, Llama, where correlation groups are defined in columns, providing support for data manipulation in vertically divided tables. Map Reduce, supported by Llama and a query engine, furnishes a novel join algorithm for a speedy joining process. Experimental results on EC2 demonstrated enhanced load and query performance; a row-wise storage method is followed in Map Reduce [19].

Beame et al., 2017, proposed the SkewTune method, which can be adopted for all user-specified programs and has the major advantage of extending the Hadoop framework and consequently reducing the run time of the tasks. Applying SkewTune makes several applications very efficient [20].

Teng and Ching, 2017, proposed the Starfish approach, which makes use of an auto-tuning system. Cost-based program optimization in Map Reduce is quite complex in Starfish: the user needs to identify, from Starfish, how the programs behave during task execution and how they respond to varying parameters, such as input data and resources, in order to optimize the program efficiently. It has also been observed that Map Reduce workflows require a broad planning space; Stubby, a cost-based optimizer, was proposed for exactly this purpose, with the capability to search a specific subspace of the full plan space and to transform the plan using an efficient, well-defined search algorithm [21].
Arora et al., 2020, found another procedure, Twister, to be compatible with the iterative operations of Map Reduce and to execute the relevant computations efficiently [22].

AntoPraveena and Dr. B. Bharathi, 2017, described difficulties, one of which is that current storage frameworks cannot support such gigantic amounts of data. Hence, a guideline that makes the data life cycle management system effective is needed; data life cycle management decides which data will be stored and which data will be discarded during the analytical process [23].

Dr. Urmila R. Pol, 2016, observed that Pig and Hive provide a higher level of abstraction while Hadoop Map Reduce provides a low level of abstraction. The drawback of using Hive is that Hadoop developers have to compromise on optimizing the queries, as this depends on the Hive optimizer, and they need to train the Hive analyzer for effective query optimization. Apache Pig offers more optimization and control over the data flow than Hive [24].

Pappas, I. O., Mikalef, P., Giannakos, M. N. et al., 2018, propose that data-driven sustainable development is a current practice and strategy to build upon data-driven methods; we therefore need a deeper understanding of how they can coexist and co-evolve in the digital society. Various challenges exist before such a transformation can be achieved, and thus we need to change the current process of how we design information technology and digital practices in our research [25].

Hariri et al., 2019, state that different types of uncertainty exist in big data and big data analytics that may negatively affect the effectiveness and accuracy of the results. Uncertainty is a circumstance involving unknown or imperfect information. The number of missing links between data points in social networks is approximately 80–90%, and the number of missing attribute values within patient reports transcribed from doctor diagnoses is over 90% [26].

Al-Zobbi et al., 2017, state that data analytics is prone to privacy violations and data disclosures, which can be partly attributed to the multi-user characteristics of big data environments. Data anonymization can address some of these concerns by providing mechanisms to mask and help hide the vulnerable data. A novel framework that runs in SQL-like Hadoop ecosystems, combining Pig Latin with additional splitting of data, is suitable to provide fine-grained masking and concealing based on the access-level privileges of the user [27].

Jahnavi, Y, states that various term weighting algorithms exist for extracting the seminal features from documents [28–35].
3 Methodology

Weblog data is dynamic and large in volume, and a traditional RDBMS is not sufficient to manage highly scalable web log data. The Hadoop framework can overcome the problems raised in traditional systems: it is a sophisticated framework for handling huge amounts of scalable data, and it contains the Map Reduce framework, which helps in writing applications that process huge amounts of data in parallel, on large clusters of commodity hardware, in a reliable manner. Weblog data needs to be processed in a distributed environment because of its huge volume and the generation of online streaming data. The Hadoop framework with Pig scripting is used to extract the dynamic patterns from weblog data in a distributed environment. Various situations in the data set are represented by status codes, and the frequency of the status codes is also analyzed. Managing a huge volume of data using distributed processing greatly reduces the execution time.

Analysis of weblog data using Hadoop can be performed with various tools and approaches, such as the Hadoop tools, the Hadoop Distributed File System, and Pig, which are used for storing and processing large data sets in a parallel manner. Before applying the proposed algorithm, the data needs to be pre-processed. Weblog data analysis involves pre-processing, uploading of data, Hadoop processing, and analysis.

Data pre-processing: pre-process the data sets using techniques such as data cleaning, data reduction, data integration, and data transformation.

Data uploading: upload the data sets to the Hadoop Distributed File System for processing.

Hadoop processing: after uploading the data sets to HDFS, process them according to the type of analysis.

Analysis: in the analysis phase, create analysis reports along various dimensions based on the results.
ALGORITHM FOR ANALYSIS OF STATUS CODES
Algorithm: Pig Servlet (Data, STATUS_CODE)
Input: NASA's web log data and STATUS_CODE. The weblog data contains log files which may include the username, IP address, timestamp, URL, file location, type of HTTP request, status code, number of bytes downloaded, etc. STATUS_CODE is an HTTP server response code.
Output: Analysis report on the different status codes in the web log files.
Procedure:
for each record in the data
    Identify the status codes in the record
    for each value in the record
        if value equals STATUS_CODE then
            Filter out the particular record
        end if
    end for
    for each filtered record
        Group the records based on either host name or IP address
    end for
end for
for each grouped record
    Count the number of entries in the group
    Filter those records by using a threshold value
end for
Return the final filtered records.
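The same filter–group–count–threshold pipeline can be expressed compactly in Python for a single machine; in the paper this logic runs as a Pig script over HDFS, and the threshold value used below is an assumed illustration, not a value reported by the authors.

```python
from collections import Counter

def analyze_status_code(records, status_code, threshold=10):
    """Single-machine sketch of the Pig pipeline: filter records by
    STATUS_CODE, group them by host, count entries per group, and keep
    only the groups whose count exceeds a threshold."""
    filtered = (r for r in records if r["status"] == status_code)
    counts = Counter(r["host"] for r in filtered)      # group and count
    return {host: n for host, n in counts.items() if n > threshold}

# Example: hosts that triggered more than ten 404 (Not Found) responses,
# using the records produced by the cleaning sketch above.
# report = analyze_status_code(records, "404", threshold=10)
```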
The architecture of the weblog data analysis system is represented in Fig. 1. A user collects the weblog data from different web servers and then sends the collected data to the Hadoop Distributed File System using the Pig scripting language. Analysis is performed based on the user's type of analysis, and the generated analysis reports are sent to a large-scale data processing system.
Fig. 1 Architecture of the weblog data analysis system
4 Results
In this paper, the analysis reports of three web log data sets, analyzed based on their status codes, are shown. The NASA-HTTP, CLARKNET-HTTP, and SASKATCHEWAN-HTTP data sets are considered in our application. Bar and pie charts represent the number of hits for a particular status code in those data sets. The number of records for each status code in the three data sets (NASA, ClarkNet, and Saskatchewan) is represented in Fig. 2. Figure 3 represents the number of records for each status code in the CLARKNET-HTTP data set. The number of records for each status code in the SASKATCHEWAN-HTTP data set is represented in Fig. 4. Figure 5 represents the number of records for each status code of the NASA-HTTP data set. The status code 401 represents Unauthorized, 404 represents Not Found, 203 represents Non-Authoritative Information, 204 represents No Content, 400 represents Bad Request, 406 represents Not Acceptable, 409 represents Conflict, 500 represents Internal Server Error, 502 represents Bad Gateway, and 503 represents Service Unavailable.
Fig. 2 Number of records in each status code
Fig. 3 Number of records in each status code in CLARKNET-HTTP Data set
Fig. 4 Number of records in each status code in SASKATCHEWAN-HTTP Data set
Fig. 5 Number of records in each status code of NASA-HTTP Data set
The analysis has been performed using the Pig scripting language. Such analyses are useful in real-time applications such as tracking advertising, monitoring search engine traffic, detecting cracks and hacks, spotting trends, identifying gold referrers, and profiling browsers.
5 Conclusion
An enormous amount of data is generated daily from various sources. A functional approach is a prerequisite for examining, analyzing, and reporting on the originated data. A distributed processing framework is required for handling large datasets and for decreasing the execution time. Dynamic patterns of web log files have to be processed, analyzed, and managed. The status codes in the web log file data have been analyzed, and the relevant data have been extracted through the experimentation.
References
1. Janev V, Pujić D, Jelić M, Vidal ME (2020) Survey on big data applications. In: Janev V, Graux D, Jabeen H, Sallinger E (eds) Knowledge graphs and big data processing. Lecture notes in computer science, vol 12072. Springer, Cham. https://doi.org/10.1007/978-3-030-53199-7_9
2. Durand T, Hattingh M (2020) Data mining and artificial intelligence techniques used to extract big data patterns. In: 2020 2nd international multidisciplinary information technology and engineering conference (IMITEC), pp 1–8. https://doi.org/10.1109/IMITEC50163.2020.9334069
3. Hassan AO, Hasan AA (2021) Simplified data processing for large cluster: a Map Reduce and Hadoop based study. Adv Appl Sci 6(3):43–48. https://doi.org/10.11648/j.aas.20210603.11
4. Li L (2021) Efficient distributed database clustering algorithm for big data processing. In: 2021 6th international conference on smart grid and electrical automation (ICSGEA), pp 495–498. https://doi.org/10.1109/ICSGEA53208.2021.00118
5. Fernandez-Basso C, Dolores Ruiz M, Martin-Bautista MJ (2021) Spark solutions for discovering fuzzy association rules in big data. Int J Approx Reason 137:94–112. https://doi.org/10.1016/j.ijar.2021.07.004
6. Gao W, Wu J (2022) Multi-relational graph convolution network for service recommendation in mashup development. Appl Sci 12:924. https://doi.org/10.3390/app12020924
7. Arulmozhi P, Murugappan A (2021) DSHPoolF: deep supervised hashing based on selective pool feature map for image retrieval. Vis Comput 37:1–15. https://doi.org/10.1007/s00371-020-01993-4
8. Sunitha T, Sivarani TS (2021) An efficient content-based satellite image retrieval system for big data utilizing threshold-based checking method. Earth Sci Inform 14. https://doi.org/10.1007/s12145-021-00629-y
9. Abrahams et al (2000) Explorations in Hubble space: a quantitative tuning fork. Astron J 2835–2842
10. Lopes R, Menascé D (2016) A taxonomy of job scheduling on distributed computing systems. IEEE Trans Parallel Distrib Syst 27:1
11. Jalalian Z, Sharifi M (2018) Autonomous task scheduling for fast big data processing. Big data and HPC: ecosystem and convergence. IOS Press, pp 137–154
12. Wang Z et al (2019) Evaluation of methane production and energy conversion from corn stalk using furfural wastewater pre-treatment for whole slurry anaerobic co-digestion. Bioresour Technol 293
13. Scuotto V et al (2017) The performance implications of leveraging internal innovation through social media networks: an empirical verification of the smart fashion industry. Technol Forecast Soc Chang 120:184–194
14. Ramesh D, Katheria YS (2019) Ensemble method based predictive model for analysing disease datasets: a predictive analysis approach. Heal Technol 9(4):533–545
15. Chen H, Jiang B (2019) A review of fault detection and diagnosis for the traction system in high-speed trains. IEEE Trans Intell Transp Syst 21(2):450–465
16. Mana SC (2018) A feature-based comparison study of big data scheduling algorithms. In: 2018 international conference on computer, communication, and signal processing (ICCCSP). IEEE, pp 1–3
17. Bello-Orgaz G, Jung JJ, Camacho D (2016) Social big data: recent achievements and new challenges. Inf Fusion 28:45–59
18. Hashem IAT, Anuar NB, Marjani M, Ahmed E, Chiroma H, Firdaus A, Gani A (2020) Map Reduce scheduling algorithms: a review. J Supercomput 76(7):4915–4945
19. Seera NK, Taruna S (2018) Leveraging map reduce with column-oriented stores: study of solutions and benefits. Big data analytics. Springer, Singapore, pp 39–46
20. Beame P, Koutris P, Suciu D (2017) Communication steps for parallel query processing. J ACM (JACM) 64(6):1–58
21. Teng C-I (2017) Strengthening loyalty of online gamers: goal gradient perspective. Int J Electron Commer 21(1):128–147
22. Arora A, Rakhyani S (2020) Investigating the impact of exchange rate volatility, inflation and economic output on international trade of India. Indian Econ J
23. Anto Praveen MD, Bharath B (2017) A survey paper on big data analytics. In: IEEE international conference on information, communication & embedded systems (ICICCES)
24. Pol UR (2016) Big data analysis: comparison of Hadoop Map Reduce, Pig and Hive. Int J Innov Res Sci Eng Technol 5(6)
25. Pappas IO, Mikalef P, Giannakos MN et al (2018) Big data and business analytics ecosystems: paving the way towards digital transformation and sustainable societies. Inf Syst E-Bus Manage 16:479–491
26. Hariri RH, Fredericks EM, Bowers KM (2019) Uncertainty in big data analytics: survey, opportunities, and challenges. J Big Data 6:44
27. Al-Zobbi M, Shahrestani S, Ruan C (2017) Improving Map Reduce privacy by implementing multi-dimensional sensitivity-based anonymization. J Big Data 4:45
28. Jahnavi Y (2015) FPST: a new term weighting algorithm for long running and short-lived events. Int J Data Anal Tech Strat (Inderscience Publishers) 7(4)
29. Jahnavi Y (2012) A cogitate study on text mining. Int J Eng Adv Technol 1(6):189–196
30. Jahnavi Y (2019) Analysis of weather data using various regression algorithms. Int J Data Sci (Inderscience Publishers) 4(2)
31. Jahnavi Y, Elango P, Raja SP et al (2022) A new algorithm for time series prediction using machine learning models. Evol Intel. https://doi.org/10.1007/s12065-022-00710-5
32. Jahnavi Y (2019) Statistical data mining technique for salient feature extraction. Int J Intell Syst Technol Appl (Inderscience Publishers) 18(4)
33. Jahnavi Y, Radhika Y (2013) Hot topic extraction based on frequency, position, scattering and topical weight for time sliced news documents. In: 15th international conference on advanced computing technologies, ICACT 2013
34. Yeturu J et al (2021) A novel ensemble stacking classification of genetic variations using machine learning algorithms. Int J Image Graph 2350015
35. Bhargav K, Asiff SK, Jahnavi Y (2019) An extensive study for the development of web pages. Indian J Public Health Res Dev 10(5)
Chapter 3
A Review of Disease Detection Emerging Technologies of Pre and Post harvest Plant Diseases: Recent Developments and Future Prospects Sakshi Pandey, Kuldeep Kumar Yogi, and Ayush Ranjan
1 Introduction
Agriculture has a prime role in each individual's life, directly or indirectly. It is the methodology of producing crops that provides food, the building block for humans [1]. In the agricultural sector, plant disease (PD) is a major cause of economic losses, and the quality of the crops produced is directly influenced by these PDs. Plant disease detection (PDD) and plant disease classification (PDC) are the major tasks for advancing the quality of crop production and thereby developing the economy [2]. Once in India, around 2 million people died as a consequence of the population's heavy dependence on a specific crop, rice, which was affected by the fungus Cochliobolus miyabeanus. Similarly, in the USA, the maize harvest was entirely damaged by a fungus of the same genus, Cochliobolus heterostrophus, rigorously influencing the economy [3]. Pathogenic micro-organisms appear everywhere in nature. On account of the vulnerability of crops to the influence of pathogens, the pathogens produce the symptoms seen in PDs [4]. Several detection methodologies have been established to spot PDs. Pre-harvest along with post-harvest PDD is extremely important. Since imaging methodologies are low-cost and harmless, these methods are generally utilized in disease detection (DD) devices, and in all computational methods the flow of the DD procedure is almost identical [5]. The Deep Learning (DL) methodologies introduced are becoming popular in recent days. DL is an enhanced version of the Machine Learning (ML) methodology that utilizes Artificial Neural Networks (ANN) functioning like a human brain [6]. If an Artificial Intelligence (AI) device can identify disease from a picture of an infected leaf, it will be S. Pandey (B) · K. K. Yogi Department of Computer Science, Banasthali Vidyapith, Tonk, Rajasthan 304022, India e-mail: [email protected] A. Ranjan Department of Engineering and Technology, Rajasthan University, Jaipur 302004, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_3
Fig. 1 Basic architecture of plant disease detection
enormously supportive of the agriculture industry [7]. Taking into account the drawbacks of the prevailing surveys, a detailed nomenclature of citrus diseases covering segmentation, feature extraction (FE), preprocessing, classification, and feature selection, together with their challenges, pros, and cons, is presented in this research. The PDD of pre-harvest along with post-harvest crops is reviewed in this paper, and the pros and cons of these effective methodologies are compared. The PDD's basic structure is displayed in Fig. 1.
2 Materials and Methods
PD is an issue that appears as a consequence of an anomaly in the plant's physiology, form, or behavior. The image acquisition methodologies are discussed in Sect. 2.1. The segmentation methodologies utilized in PDD are explicated in Sect. 2.2. The FE approaches meant for PDD are discussed in Sect. 2.3. In Sect. 2.4, the ML-based PDC is explicated. The DL techniques intended for PDC are illustrated in Sect. 2.5.
2.1 Image Acquisition
This is the initial step of a computer vision approach. It involves the extraction of images from specific datasets in conjunction with the capture of images directly from the field. The DD device's accuracy depends on the integrity of the images extracted for training. Various types of images employed in PDD are explicated in this section. Barbedo et al. [8] examined the usage of Near-Infrared (NIR) hyperspectral (HS) imaging (HSI) for finding sprout damage in wheat kernels (WK). Experimentations were
conducted for determining the spectral bands that had a better potency for differentiating between sound and sprouted kernels. Two wavelengths were chosen and amalgamated into an index that was utilized to spot the existence or non-existence of sprouting. Experimentations comprising three WK models exhibited that the methodology was proficient in detecting kernels for which the sprouting procedure had initiated, attaining 100% accuracy for the models utilized in the study. However, an efficient estimation of the seriousness of the sprouting was impossible. Wu et al. [9] recommended Sentinel-2 multispectral imagery for proficiently finding wheat Fusarium Head Blight (FHB). To identify FHB at a provincial scale, the Red-Edge Head Blight Index (REHBI) was produced. HS data at the canopy scale was incorporated to reproduce the Sentinel-2 multispectral reflectance utilizing the sensor's Relative Spectral Response (RSR) function. After that, several difference and ratio amalgamations of Sentinel-2 bands that were sensitive to FHB severity were chosen. In general, REHBI performed better in DD than the Optimized Soil Adjusted Vegetation Index (OSAVI) and the Renormalized Difference Vegetation Index (RDVI). The kappa coefficient was 0.51, accompanied by an accuracy of 78.6%. The experiential outcomes exhibited that REHBI could be deployed to observe FHB. However, the HS wheat kernel data, cultivation procedures, crop cultivars, and management practices of the wheat fields utilized in the research were uniform. Hamuda et al. [10] developed a Hue Saturation Value (HSV) color space approach accompanied by morphological procedures intended for detecting crops automatically under field conditions. The technique, centered on color features together with morphological erosion and dilation, was illustrated. The crop, soil, and weeds were differentiated by the algorithm utilizing the HSV color space. The ROI was obtained by thresholding every single HSV channel between specific values. The method's efficiency was appraised by comparing the outcomes attained with those of manual annotation. It obtained a sensitivity and precision of 98.91 and 99.04%, respectively. However, sometimes the misclassification rate was augmented by the method.
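To make the HSV thresholding plus morphology pipeline concrete, here is a minimal OpenCV sketch in the spirit of Hamuda et al. [10]; the green hue bounds and kernel size are illustrative assumptions, since the paper tunes each HSV channel empirically.

```python
import cv2
import numpy as np

def segment_vegetation(bgr_image):
    """HSV colour-space thresholding followed by morphological erosion
    and dilation, sketching the approach of Hamuda et al. [10]."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = np.array([35, 40, 40])    # assumed lower bound of the green range
    upper = np.array([85, 255, 255])  # assumed upper bound of the green range
    mask = cv2.inRange(hsv, lower, upper)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel)    # erosion removes small noise
    mask = cv2.dilate(mask, kernel)   # dilation restores the crop regions
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask)
```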
2.2 Image Segmentation
Segmentation is utilized for sectioning the picture to detect the ROI. It intends to partition the region possessing anomalies. The segmentation methodologies meant for PDD, together with their limitations, are discussed in this section. Hayat et al. [11] produced an unsupervised Bayesian learning (BL) approach for rice panicle segmentation employing unmanned aerial vehicle (UAV) images. In the BL approach, the distributions of pixel intensities are modeled with a multivariate Gaussian mixture model (GMM), with the different components of the GMM corresponding to several groups, namely leaves, panicles, or background. The prevalence of each group is described by the weights associated with each component of the GMM. The parameters were iteratively updated by utilizing the Markov chain Monte Carlo (MCMC)
methodology with Gibbs sampling, without the need for labeled training data. Implementing the unsupervised BL methodology on varied UAV images obtained a mean precision, recall, and F1-score of 72.31, 96.49, and 82.10%, respectively. By contrast, a great number of background pixels encircling the panicles were miscategorized as panicles by the Panicle-SEG methodology. Xiao et al. [12] produced a principal component analysis (PCA) in addition to a backpropagation neural network (PCA-BP) for rice blast identification. Initially, the cropped lesion images were processed, and 10 morphological features, 6 color features, and 5 texture features of every single lesion were extracted. Next, to appraise the association between the parameters, stepwise regression evaluation was utilized; the outcomes displayed a linear correlation. After that, the PCA methodology was utilized to reduce the dimension linearly, mapping the 21 factors into 6 extensive features as input factors. The experiential outcomes displayed that the Average Recognition Rate (ARR) of rice blast centered on PCA along with the BP neural network was 95.83%, which was 7.5% more than the ARR utilizing the BP neural network alone. However, the methodology had higher data loss. Xiong [13] introduced an Automatic Image Segmentation Algorithm (AISA) centered on the grab-cut approach for the detection of cash crop diseases. The MobileNet CNN technique was chosen as the DL method, and abundant crop images from the Internet along with practical planting bases were included to extend the public database PlantVillage for the purpose of enhancing the capability of MobileNet. However, the various advanced stages of a disease weren't detected by the methodology.
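The GMM-based pixel clustering described at the start of this subsection can be sketched as follows. This is a simplified stand-in that fits the mixture by expectation-maximization (scikit-learn's GaussianMixture) rather than the MCMC Gibbs sampler of Hayat et al. [11]; the three-class setup (leaves, panicles, background) follows the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_segment(rgb_image, n_classes=3, seed=0):
    """Unsupervised pixel clustering with a multivariate Gaussian
    mixture; each component plays the role of one group (leaves,
    panicles, or background)."""
    h, w, c = rgb_image.shape
    pixels = rgb_image.reshape(-1, c).astype(np.float64)
    gmm = GaussianMixture(n_components=n_classes, covariance_type="full",
                          random_state=seed).fit(pixels)
    labels = gmm.predict(pixels)
    # The mixture weights describe the prevalence of each group,
    # mirroring the component weights in the Bayesian formulation.
    return labels.reshape(h, w), gmm.weights_
```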
2.3 Feature Extraction
Features represent relevant data related to objects, differentiating one object from other objects. Features are useful for detecting objects and for allocating a class label to an object. The extracted features are supplied to classification. The FE methodologies, together with their limitations, are briefly explicated in this section. Guo et al. [14] introduced a Gaussian model for the features extracted from the images of infected WKs. A signal processing methodology centered on Gaussian modeling, along with an enhanced Extreme Learning Machine (ELM) methodology, was employed for detecting the insect- as well as sprout-affected WKs. Discriminative Gaussian-model-derived features were supplied to an ELM centered on a C-matrix fixed optimization approximation solution. By utilizing this methodology, 96.0% of insect-damaged, 92.0% of undamaged, and 95.0% of sprout-damaged WKs were categorized correctly. However, this methodology consumed more time. Huang et al. [15] produced Optimized Spectral Indices (OSI) for the detection and observation of winter wheat diseases. The novel OSI was derived as a weighted amalgamation of a single band together with a standardized wavelength difference of 2 bands. The most and least relevant wavelengths for several diseases were initially extracted from leaf spectral information utilizing the
RELIEF-F methodology. The classification accuracies of these novel indices for the normal leaves and the leaves affected with powdery mildew, aphids, and yellow rust were 86.5, 85.2, 93.5, and 91.6%, respectively. However, for HS data, the method possessed lower sensitivity. Luo et al. [16] introduced HS measurements for detecting the density of aphids in winter wheat leaves. Four methodologies were utilized to investigate the Spectral Features (SF) for recognizing the aphid density of wheat leaves; along with that, a methodology was established to estimate the aphid density utilizing Partial Least Square Regression (PLSR). A total of 48 SFs was attained via independent t-tests through the spectral derivative methodology, correlation analysis, Continuous Wavelet Analysis (CWA), the continuum removal method, and commonly utilized vegetation indices meant for detecting stress. The outcome displayed that the methodology had a better potency in identifying aphid density, with a Root Mean Square Error (RMSE) of 15 together with a coefficient of estimation of 0.77. But there was no efficient examination of the spectral reactions influenced by aphid infestation. Zhang et al. [17] introduced high-resolution color and multispectral imaging meant for the detection of rice sheath blight (ShB). It was established that the transformation in color could identify the affected regions of ShB in the field plots via color FE together with the space transformation of images. These outcomes demonstrated that a consumer-grade UAV incorporating digital along with multispectral cameras could be a valuable instrument to identify the ShB disease at a field scale. However, the methodology was not appropriate for fungicide utilization. Bakhshipour and Jafari [18] developed an ANN for the finding of weeds utilizing shape factors. A Support Vector Machine (SVM) along with the ANN was deployed to assist the vision methodology in weed detection. In this, 4 species of usual weeds in sugar beet fields were reviewed. Fourier descriptors along with moment invariant aspects were contained in the shape feature sets. The outcomes displayed that the ANN's classification accuracy was 92.92%, where 92.50% of weeds were categorized perfectly. While utilizing SVM as the classifier, a higher accuracy of 95% was attained, where 93.33% of weeds were classified perfectly. However, the methodology had lesser convergence. Anjnaa et al. [19] developed a Gray Level Co-occurrence Matrix (GLCM) approach meant for PDD together with PDC. This methodology detected capsicum diseases automatically and categorized whether the capsicum or its leaf was infected or normal; the capsicum's affected region was extracted by the k-means clustering methodology, following which SVM was utilized for training along with classification. Among the classifiers compared, KNN along with SVM provided enhanced outcomes for these applications. This procedure was executed on 62 pictures of normal or infected capsicum together with its leaves. Using SVM, these images were categorized into healthy along with infected ones with 100% accuracy. However, the methodology couldn't detect crop diseases precisely in the earlier stages (see the sketch at the end of this subsection). Chowdhury et al. [20] recommended a CNN intended for the recognition together with identification of rice diseases. The novel models, namely VGG16 along with InceptionV3, had been implemented for the detection in conjunction with the identification of rice diseases and pests. The experiential outcomes displayed
the efficacy of these architectures on actual databases. Hence, a two-stage small CNN approach was proffered and contrasted with novel memory-efficient CNN models like NASNet Mobile, MobileNet, and SqueezeNet. The experiential outcomes displayed that the model could attain the needed accuracy of 93.3% while possessing a minimized model size. However, the methodology had lower precision.
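A minimal sketch of the GLCM texture-feature plus SVM pipeline of the kind described for Anjnaa et al. [19]; the chosen distances, angles, and texture properties are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(gray_image):
    """Texture descriptors from a grey-level co-occurrence matrix;
    gray_image is expected to be an 8-bit (uint8) image."""
    glcm = graycomatrix(gray_image, distances=[1],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical training loop over pre-segmented leaf patches:
# X = np.array([glcm_features(img) for img in patches])
# clf = SVC(kernel="rbf").fit(X, labels)   # healthy vs. infected
```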
2.4 Machine Learning Approaches for Plant Disease Detection
The ML methodologies for DD of pre-harvest and post-harvest wheat and rice plants are illustrated in this section. Numerous ML methodologies have been deployed for PDD together with PDC. The pre-harvest PDD utilizing ML methodologies is summarized in Table 1.
2.5 Deep Learning Approaches
DL is an AI function that replicates the functioning of the human brain in processing data and producing patterns intended for making decisions. DL has a key role in PDD at the earlier stages. The DD of pre-harvest and post-harvest wheat and rice plants utilizing DL methodologies is discussed in this section. Pre-harvest plant disease detection. Indirect methodologies can be employed to discover diseases in pre-harvest together with post-harvest plants. The pre-harvest PDD utilizing DL methodologies is summarized in Table 2. Post-harvest plant disease detection. Post-harvest diseases are those that occur following the harvest. The post-harvest PDD utilizing DL methodologies is explicated in Table 3. Various ML together with DL classifiers are utilized to recognize the diseases in wheat and rice plants. The methodologies are appraised using certain performance metrics like sensitivity, accuracy, and specificity. The performance of the DL and ML classifiers is illustrated in the figures given below. The PDC accuracy of the ML methodologies is displayed in Fig. 2. The spectral vegetation indices-based kernel discriminant methodology (SVIKDA) [39] obtains 88.2% accuracy. Similarly, the Fuzzy Relevance Vector Machine (FRVM) [40], SVM [41], Spectral Angle Mapper (SAM) with SVM [21], SLIC-SVM [23], and SVM together with KNN [24] obtain an accuracy of 99.87, 93, 87, 80, and 88.15%, respectively. The accuracy of the DL methodologies for PDD is illustrated in Fig. 3.
Table 1 Disease detection of pre-harvest plants using ML techniques

Author | Classifiers | Dataset | Results | Drawbacks
Mewes et al. [21] | SVM | HyMap | Accuracy—95% | Computationally intensive and thus time-consuming
Jiang et al. [22] | SVM | Rice leaf dataset | Accuracy—96.8% | Relied on larger scale databases
Mia et al. [23] | SVM | Benchmark datasets | Accuracy—88%, Precision—54%, Recall—99%, F1-measure—70% | Texture-based features were not utilized
Sun et al. [24] | SLIC with SVM | Publicly available datasets | Accuracy—80% | This methodology offered certain unrelated features
Phadikar et al. [25] | Bayes' classifier and SVM | Self-captured from agriculture fields utilizing Nikon COOLPIX P4 | Accuracy—79.5%, 68.1% | Similar kinds of features were regarded
Kusumo et al. [26] | SVM, DT, RF, Naïve Bayes | Gray leaf spot, Common rust, Northern blight, Healthy | Accuracy—87% | However, the device utilized lower level
Shah et al. [27] | NN with various distance metrics, SVM with several kernel functions | Self-captured utilizing Samsung digital camera PL200, from Agriculture University Dharwad | K-NN: 85%, SVM: 88% | Focused on particular diseases
Chung et al. [28] | SVM | PlantVillage | Accuracy—87.9% | Lower recognition rate
Table 2 Disease detection of pre-harvest plants

Researchers | Diseases | Classifiers | Limitation
Bai et al. [29] | Rice spike | CNN | Higher false positive rate
Lu et al. [30] | Wheat disease | Fully convolutional network | To structure the handcrafted feature extractors, expert knowledge was required
Wu et al. [9] | Wheat grain disease | Faster R-CNN | The execution time might be increased in this structure
Table 3 Post-harvest plant disease detection

Researchers | Diseases | Classifiers | Limitation
Atila et al. [31] | Plant leaf disease | EfficientNet | The methodology would possess lower accuracy in complicated regions
Lee et al. [32] | Multi-crop diseases | CNN | Not appropriate for pictures having low contrast
Nagasubramanian et al. [33] | Charcoal rot | 3D DCNN | This technique utilized a smaller database
Asfarian et al. [34] | Leaf blast, leaf blight, brown spot, and tungro | Probabilistic neural network (PNN) | Possessed identical features
Kai et al. [35] | Gray speck | EBPN | A lesser number of pictures was employed for features
Ramesh and Vydeki [36] | Paddy leaf diseases | Deep neural network with Jaya algorithm | Higher misclassification rate
Sethy et al. [37] | Bacterial blight | CNN | It needed extremely high training time
Geetharamani and Arun Pandian [38] | Yellow leaf curl disease | DCNN | Lower convergence
Fig. 2 Plant disease classification accuracy of ML techniques
DCNN [42, 43], transfer learning [6], DCNN-LSTM [44], and FCM-KM [14] attain an accuracy of 91.83, 75, 97.5, and 97.2%, respectively. Similarly, ANN [36], Generative Adversarial Network (GAN) [45], CNN [2, 46], FCNN [30], R-CNN [9], and BPNN [47] obtain an accuracy of 90, 98.7, 84.54, 93.27, 91, and 88.9%, respectively. Finally, PNN [48] and DCGAN [49] have 98.5% and 96.3% accuracy, respectively.
Fig. 3 Plant disease classification accuracy of DL techniques
The sensitivity of the DL methodologies is given in Fig. 4. DCNN [43, 50], DCNN-LSTM [44], ANN [36], PNN [1], DNN [38], DCNN [51, 52], F-CNN [27, 53], 3D CNN [33], DCNN [43], and EBPN [33] achieve a sensitivity of 85.71, 71.6, 90, 87.7, 86.2, 86.2, 79, 90.4, 97.61, and 70%, respectively. The specificity of the DL classifiers is depicted in Fig. 5. DCNN [49] has 98.11%, and DCNN-LSTM [44] obtains 90.5% specificity. ANN [36] and PNN [52] have 89 and 89.6% specificity, respectively. DNN [42] and DCNN [51] attain 87.7 and 88%, and 3D CNN [33] has 87.3% specificity. Lastly, DCNN [43] and CNN [54] have 96.87 and 92% specificity, respectively.
Fig. 4 Plant disease classification sensitivity of DL techniques
Fig. 5 Plant disease classification specificity of DL techniques
3 Results and Discussion
A range of classification methods employed for PDD and PDC has been examined. SVM and ANN have been the most frequently employed for this purpose. The most commonly used classifiers are feature-based, whose performance is superior to other methods such as Naïve Bayes, BP neural net, KNN, decision tree, and probabilistic neural net classifiers. It is noticeable that numerous researchers have attempted to automate the PDD methodology. In certain studies, the efficiency is improved with highly satisfactory outcomes even with small databases for several crops. Additionally, CNNs have been examined across numerous crops to make PDD and PDC highly accurate, and their capacity could be employed in systems structured for single crops. However, the over-fitting of CNNs is observed as a foremost problem in PDD methods based on deep neural nets. Various methodologies have been mentioned that could enhance the proficiency of CNN-based classification systems for single or multiple crops.
4 Conclusion
In this paper, numerous related works that computerize PDC and PDD employing ML and DL methodologies were summarized. In India, to overcome the problem of deprivation in agriculture, an effective automatic device for PDD is urgently needed. The review illustrates various proficient methodologies utilized for the preprocessing module, schemes for the segmentation of diseases, FE, and classification, and it summarizes several issues in the FE part. Additionally, the drawbacks of the prevailing methodologies have also been discussed with the intention of enhancing proficiency without undermining the recent approaches.
Numerous computer vision approaches adopted by various researchers have also been surveyed, together with an exhibition of recent research. In the literature, several researchers intended to make DD completely automated. However, certain outcomes should still be validated by an expert to ensure consistency. Accordingly, partial automation with expert involvement is preferred these days over completely automatic devices; for upcoming researchers, this may appear as another concern. Additionally, the development of real-time appliances might be a promising research problem in this sector.
References
1. Singh V, Sharma N, Singh S (2020) A review of imaging techniques for plant disease detection. Artif Intell Agric 4:229–242
2. Iqbal Z, Khan MA, Sharif M, Shah JH, HabiburRehman M, Javed K (2018) An automated detection and classification of citrus plant diseases using image processing techniques: a review. Comput Electron Agric 153:12–32
3. Ray M, Ray A, Dash S, Mishra A, Gopinath Achary K, Nayak S, Singh S (2017) Fungal disease detection in plants: traditional assays, novel diagnostic techniques and biosensors. Biosens Bioelectron 87:708–723
4. Golhani K, Balasundram SK, Vadamalai G, Pradhan B (2018) A review of neural networks in plant disease detection using hyperspectral data. Inf Process Agric 5:354–371
5. Radhakrishanan M (2020) Automatic identification of diseases in grains crops through computational approaches: a review. Comput Electron Agric 178:1–24
6. Garcia J, Barbedo A (2019) Plant disease identification from individual lesions and spots using deep learning. Biosys Eng 180:96–107
7. Rahman MdA, Islam MdM, Mahdee GMS, WasiUlKabir Md. Improved segmentation approach for plant disease detection. In: 1st international conference on advances in science, engineering and robotics technology, 3–5 May, Dhaka, Bangladesh
8. Barbedo JGA, Guarienti EM, Tibola CS (2018) Detection of sprout damage in wheat kernels using NIR hyperspectral imaging. Biosyst Eng 175:124–132
9. Wu W, Yang T, Li R, Chen C, Liu T, Zhou K, Sun C, Li C, Zhu X, Guo W (2020) Detection and enumeration of wheat grains based on a deep learning method under various scenarios and scales. J Integr Agric 19:1998–2008
10. Hamuda E, McGinley B, Glavin M, Jones E (2017) Automatic crop detection under field conditions using the HSV colour space and morphological operations. Comput Electron Agric 133:97–107
11. Hayat MdA, Wu J, Cao Y (2020) Unsupervised Bayesian learning for rice panicle segmentation with UAV images. Plant Methods 16:1–13
12. Xiao M, Ma Y, Feng Z, Deng Z, Hou S, Shu L, Lu Z (2018) Rice blast recognition based on principal component analysis and neural network. Comput Electron Agric 154:482–490
13. Xiong Y, Liang L, Wang L, She J, Wu M (2020) Identification of cash crop diseases using automatic image segmentation algorithm and deep learning with expanded dataset. Comput Electron Agric 177:105712
14. Guo M, Ma Y, Yang X, Mankin RW (2019) Detection of damaged wheat kernels using an impact acoustic signal processing technique based on Gaussian modelling and an improved extreme learning machine algorithm. Biosyst Eng 184:37–44
15. Huang W, Guan Q, Luo J, Zhang J, Zhao J, Liang D, Huang L, Zhang D (2014) New optimized spectral indices for identifying and monitoring winter wheat diseases. IEEE J Sel Topics Appl Earth Observ Remote Sens 7:2516–2524
16. Luo J, Huang W, Zhao J, Zhang J, Zhao C, Ma R (2013) Detecting aphid density of winter wheat leaf using hyperspectral measurements. IEEE J Sel Topics Appl Earth Observ Remote Sens 6:690–698
17. Zhang D, Zhou X, Zhang J, Lan Y, Xu C, Liang D (2018) Detection of rice sheath blight using an unmanned aerial system with high-resolution color and multispectral imaging. PLoS ONE 23:1–14
18. Bakhshipour A, Jafari A (2018) Evaluation of support vector machine and artificial neural networks in weed detection using shape features. Comput Electron Agric 145:153–160
19. Anjnaa, Sood M, Singh PK (2020) Hybrid system for detection and classification of plant disease using qualitative texture features analysis. Procedia Comput Sci 167:1056–1065
20. Chowdhury RR, Arko PS, Ali ME, Khan MAI, Apon SH, Nowrin F, Wasif A (2020) Identification and recognition of rice diseases and pests using convolutional neural networks. Biosyst Eng 194:112–120
21. Mewes T, Franke J, Menz G (2011) Spectral requirements on airborne hyperspectral remote sensing data for wheat disease detection. Precis Agric 12:795–812
22. Jiang F, Lu Y, Chen Y, Cai D, Li G (2020) Image recognition of four rice leaf diseases based on deep learning and support vector machine. Comput Electron Agric 179:105824
23. Mia MdR, Roy S, Das SK, Atikur Rahman Md (2020) Mango leaf disease recognition using neural network and support vector machine. Iran J Comput Sci 3:1–9
24. Sun Y, Jiang Z, Zhang L, Dong W, Rao Y (2019) SLIC_SVM based leaf diseases saliency map extraction of tea plant. Comput Electron Agric 157:102–109
25. Phadikar S, Sil J, Das AK (2012) Classification of rice leaf diseases based on morphological changes. Int J Inf Electron Eng 2:460–463
26. Kusumo BS, Heryana A, Mahendra O, Pardede HF (2018) Machine learning-based for automatic detection of corn-plant diseases using image processing. In: 2018 international conference on computer, control, informatics and its applications (IC3INA), pp 93–97
27. Shah NB, Thakkar TC, Raval SM, Trivedi H (2019) Adaptive live task migration in cloud environment for significant disaster prevention and cost reduction. Inf Commun Technol Intell Syst Smart Innov Syst Technol 106:639–654
28. Chung C-L, Huang KJ, Chen SY, Lai MH, Chen YC, Kuo YF (2016) Detecting bakanae disease in rice seedlings by machine vision. Comput Electron Agric 121:404–411
29. Bai X, Cao Z, Zhao L, Zhang J, Lv C, Li C, Xie J (2018) Rice heading stage automatic observation by multi-classifier cascade based rice spike detection method. Agric For Meteorol 259:260–270
30. Lu J, Hu J, Zhao G, Mei F, Zhang C (2017) An in-field automatic wheat disease diagnosis system. Comput Electron Agric 142:369–379
31. Atila Ü, Uçar M, Akyol K, Uçar E (2020) Plant leaf disease classification using EfficientNet deep learning model. Ecol Inform. https://doi.org/10.1016/j.ecoinf.2020.101182
32. Lee SH, Goëau H, Bonnet P, Joly A (2020) New perspectives on plant disease characterization based on deep learning. Comput Electron Agric 170:1–12
33. Nagasubramanian K, Jones S, Singh AK, Sarkar S, Singh A, Ganapathysubramanian B (2019) Plant disease identification using explainable 3D deep learning on hyperspectral images. Plant Methods 15:1–10
34. Asfarian A, Herdiyeni Y, Rauf A, Mutaqin KH (2013) Paddy diseases identification with texture analysis using fractal descriptors based on Fourier spectrum. In: International conference on computer, control, informatics and its applications, 19–21 Nov, Jakarta, Indonesia
35. Kai S, Zhikun L, Hang S, Chunhong G (2011) A research of maize disease image recognition of corn based on BP networks. In: Third international conference on measuring technology and mechatronics automation, pp 6–7
36. Ramesh S, Vydeki D (2020) Recognition and classification of paddy leaf diseases using optimized deep neural network with Jaya algorithm. Inf Process Agric 7(2):249–260
37. Sethy PK, Barpanda NK, Rath AK, Behera SK (2020) Deep feature based rice leaf disease identification using support vector machine. Comput Electron Agric 175:1–9
38. Geetharamani G, Arun Pandian J (2019) Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Comput Electr Eng 76:323–338
39. Shi Y, Huang W, Luo J, Huang L, Zhou X (2017) Detection and discrimination of pests and diseases in winter wheat based on spectral indices and kernel discriminant analysis. Comput Electron Agric 141:171–180
40. Vijaya Lakshmi B, Mohan V (2016) Kernel-based PSO and FRVM: an automatic plant leaf type detection using texture, shape and color features. Comput Electron Agric 125:99–112
41. Römer C, Bürling K, Hunsche M, Rumpf T, Noga Georg J, Plümer L (2011) Robust fitting of fluorescence spectra for pre-symptomatic wheat leaf rust detection with support vector machines. Comput Electron Agric 79:180–188
42. Cristin R, Kumar BS, Priya C, Karthick K (2020) Deep neural network based rider-cuckoo search algorithm for plant disease detection. Artif Intell Rev 53:1–26
43. Chena J, Chena J, Zhanga D, Sun Y, Nanehkaran YA (2020) Using deep transfer learning for image-based plant disease identification. Comput Electron Agric 173:1–11
44. Raja Reddy Goluguri NV, Suganya Devi K, Srinivasan P (2021) Rice-net: an efficient artificial fish swarm optimization applied deep convolutional neural network model for identifying the Oryza sativa diseases. Neural Comput Appl 33:1–16
45. Kathiresan GS, Anirudh M, Nagharjun M, Karthik RJ (2021) Disease detection in rice leaves using transfer learning techniques. J Phys Conf Ser 1911
46. Hussain A, Ahmad M, Mughal IA, Ali H (2018) Automatic disease detection in wheat crop using convolution neural network. In: The 4th international conference on next generation computing. https://doi.org/10.13140/RG.2.2.14191.46244
47. Wang X, Zhang X, Zhou G (2016) Automatic detection of rice disease using near infrared spectra technologies. J Indian Soc Remote Sens. https://doi.org/10.1007/s12524-016-0638-6
48. Picon A, Seitz M, Alvarez-Gila A, Mohnke P, Ortiz-Barredo A, Echazarra J (2019) Crop conditional convolutional neural networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions. Comput Electron Agric. https://doi.org/10.1016/j.compag.2019.105093
49. Mahmoud MAB, Guo P, Wang K (2020) Pseudoinverse learning autoencoder with DCGAN for plant diseases classification. Multimed Tools Appl 79:26245–26263
50. Kai S, Zhikun L, Hang S, Chunhong G (2011) A research of maize disease image recognition of corn based on BP networks. In: 2011 third international conference on measuring technology and mechatronics automation, vol 1, pp 246–249
51. Orillo JW, Cruz JD, Agapito L, Satimbre PJ, Valenzuela I (2014) Identification of diseases in rice plant (Oryza sativa) using back propagation artificial neural network. In: International conference on humanoid, nanotechnology, information technology, communication and control, environment and management (HNICEM), 12–16 Nov, Palawan, Philippines
52. Mustafa MS, Husin Z, Tan WK, Mavi MF, Farook RSM (2019) Development of automated hybrid intelligent system for herbs plant classification and early herbs plant disease detection. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04634-7
53. Sharma P, Singh YPB, Ghai W (2019) Performance analysis of deep learning CNN models for disease detection in plants using image segmentation. Inf Process Agric. https://doi.org/10.1016/j.inpa.2019.11.001
54. Sambasivam G, Opiyo GD (2021) A predictive machine learning application in agriculture: cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt Inform J 22(1):27–34
55. Ramesh S, Vydeki D (2019) Application of machine learning in detection of blast disease in South Indian rice crops. J Phytol 11:31–37
Chapter 4
A Comparative Study Based on Lung Cancer with Deep Learning and Machine Learning Models Yalamkur Nuzhat Afreen
and P. V. Bhaskar Reddy
1 Introduction
Lung disorders, often known as respiratory diseases, are illnesses that affect the respiratory tract and other lung components [1]. Pneumonia, tuberculosis (TB), and coronavirus disease (COVID-19) are all lung illnesses. According to the Forum of International Respiratory Societies [2], 334 million people have asthma, 1.4 million people die each year from tuberculosis, 1.6 million people die from lung cancer, and 1.6 million people die from pneumonia; hundreds of thousands more have died as a result of these diseases [2]. The COVID-19 epidemic has affected the entire world [3, 4], infecting millions of individuals and putting a strain on healthcare systems. Lung illness is one of the world's most common causes of death and disability. Early identification is critical for enhancing long-term survival and boosting the chances of recovery [5, 6]. Skin tests, blood tests, sputum samples [7], chest radiography, and computed tomography (CT) [8] have all been used in the past to diagnose lung disease. Deep learning has recently shown considerable promise when applied in medical imaging to diagnose disorders, especially lung diseases. Deep learning is a type of machine learning that focuses on processes inspired by the brain's function and structure. Improvements in machine learning, particularly deep learning, have made identifying, measuring, and classifying patterns in medical images easier [9]. This progress has been made possible by the ability of deep learning to learn features entirely from data rather than from manually engineered features based on field-specific expertise. Deep learning is gradually gaining traction as the gold standard, resulting in improved implementation in Y. N. Afreen (B) · P. V. Bhaskar Reddy School of Computer Science and Engineering, REVA University, Bengaluru, India e-mail: [email protected] P. V. Bhaskar Reddy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_4
a variety of medical settings. As a result, these advancements make it easier for clinicians to diagnose and categorize certain illness conditions [10, 11]. The goals of this paper are to: (1) construct a catalog of advanced deep learning-based lung disease detection systems; (2) depict patterns in contemporary work in the field; and (3) recognize remaining challenges and probable future approaches in this area. The following is a breakdown of how this article is structured. The approach for conducting this investigation is presented in Sect. 2. The general procedures for employing deep learning to detect lung disease in medical imaging are depicted in Sect. 3. Section 4 lays out the taxonomy and provides thorough explanations for each subtopic.
2 Background Knowledge
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes Coronavirus Disease 2019 (COVID-19), a novel and highly contagious respiratory disease. The present COVID-19 epidemic has developed into a global pandemic. As of June 7, 2020, there were 6,663,304 confirmed cases and 392,802 confirmed deaths worldwide. The disease can spread quickly, and a variety of individuals develop respiratory failure very early on. COVID-19's clinical symptoms, etiology, and even treatment are all based on study and examination during the critical illness period. In China, most patients have been successfully discharged. To date, no studies have given an early prognosis for the extent of COVID-19 patients' lung damage and rehabilitation. When patients were discharged, many had imaging abnormalities, and some even developed lung fibrosis, according to a retrospective study. The decrease in lung function in COVID-19 patients during the early stages of recovery demands special attention. We performed a retrospective study of 57 discharged and recovered COVID-19 patients to get a better grasp of the likely clinical outcomes. Thirty days after discharge, serial lung function, lung imaging, and cardiopulmonary exercise capacity were assessed. Furthermore, based on outcome indicators, we compared critically ill and non-critically ill individuals.
2.1 Methodology
This section explains the approach used to survey recent studies that employed deep learning to diagnose lung illness; the methodology is summarized in a block diagram. First, an appropriate database is used as the primary source of article links. Due to the disease's recent emergence, several pre-printed papers on COVID-19 have also been included. Only recently published publications (2016–2020) are included in this study to guarantee that it covers the most recent works; however, a few older yet important pieces are included as well. The relevant search phrases
were utilized to find all possible papers on deep learning-based lung disease detection. "Deep learning," "detection," "classification," "CNN," "lung illness," "tuberculosis," "pneumonia," "lung cancer," "COVID-19," and "coronavirus" were among the terms utilized. Only publications written in English were included in the research. We found 366 items at the end of this phase. Second, a screening was carried out to choose only the relevant works; only the title and abstract were considered during the screening. The primary selection criterion for this study is work in which deep learning methods were used to detect the relevant disorders. Irrelevant articles were excluded. Only 98 articles were shortlisted after the inspection.
2.1.1 Lung Disease Detection with Domains
The purpose of lung disease diagnosis is to categorize a picture into two groups: healthy lungs and diseased lungs. Training is required to obtain the lung disease classifier, also known as a model. Training is the process by which a neural network learns to recognize a set of images. Using deep learning, it is possible to create a model that can identify photographs based on their appropriate class labels. As a result, the first step in applying deep learning to identify lung diseases is to collect photos of lungs with the condition to be recognized. In the second step, the neural network is trained until it can distinguish diseases. The final step is to sort new photos into categories. The first phase involves acquiring photos. The computer must learn by example in order to develop a classification pattern: to recognize an object, the computer needs to examine a large number of photos. Deep learning models can be trained with other forms of data, such as time series and audio data; in the context of the task considered in this research, photographs are the data needed to detect lung disease. Chest X-rays, CT scans, sputum smear microscopy, and histopathology images are all examples of images that could be employed. This stage generates the photographs that will subsequently be used to train the model. The next stage is pre-processing, where the image may be modified or edited to improve image quality. Contrast Limited Adaptive Histogram Equalization (CLAHE) could be utilized to improve the contrast of the photographs [12]. Image manipulation techniques like lung segmentation [13] and bone elimination [14] could be used to isolate the region of interest (ROI), which could then be used to detect lung pathology. Edge detection [15] could be used to create a new data representation. To expand the quantity of data accessible, data augmentation could be performed on the photographs. Feature extraction might also be employed to assist the deep learning model in uncovering relevant features that would aid in the identification of a specific object or class. This procedure results in a series of images with improved image quality and with unwanted elements removed; the images enhanced or edited in this stage will be used in training (a pre-processing sketch follows this paragraph).
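As a concrete illustration of the pre-processing stage, the hedged OpenCV sketch below applies CLAHE and resizing to a chest X-ray; the clip limit, tile size, target resolution, and file name are illustrative assumptions rather than values from the surveyed works.

```python
import cv2

def preprocess_xray(path, size=(224, 224)):
    """Pre-processing sketch: CLAHE contrast enhancement followed by
    resizing, one possible step before training a classifier."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)  # equalize contrast locally, tile by tile
    return cv2.resize(enhanced, size)

# Simple augmentation by horizontal flipping doubles the available data.
# "xray.png" is a hypothetical file name.
# augmented = cv2.flip(preprocess_xray("xray.png"), 1)
```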
In the third step, training, three things could be addressed: the use of transfer learning, the employment of an ensemble, and the selection of a deep learning algorithm. Deep learning approaches include the deep belief network (DBN), multilayer perceptron neural network (MPNN), recurrent neural network (RNN), and CNN. Algorithms can learn in a number of different ways, and certain algorithms are better suited to specific data formats; CNN excels at image data. Transfer learning and ensemble learning are two ways of minimizing training time, improving classification accuracy, and reducing overfitting [16, 17]. The trained model predicts which class an image belongs to in the fourth and final stage, classification. For example, if a model has been taught to distinguish between healthy lungs and tuberculosis-infected lungs in X-ray images, it should accurately categorize new images (images that the model has never seen before) as healthy lungs or tuberculosis-infected lungs. The model assigns a likelihood score to the image; the probability score indicates how likely an image is to belong to a specific category [18, 19]. The image will be categorized at the end of this phase based on the likelihood score assigned by the model (see the sketch following this passage). This unit presents a catalog of current work on lung disease detection making use of deep learning, which is the paper's initial contribution. The taxonomy is designed to summarize and clarify the important themes and areas of concentration in the previous work. A total of seven attributes were chosen to be included in the grouping. These characteristics were preferred because they are common and could be found in all the articles studied [20]. Ensemble classification is the process of combining many classifiers to create a prediction. An ensemble reduces prediction variance, resulting in forecasts that are more accurate than those of any one model. The ensemble approaches utilized, according to the literature, are majority voting, probability score averaging, and stacking. For COVID-19, thirteen datasets are publicly available. As a result of the COVID-19 outbreak, several datasets have been made public, and several of these collections are still growing in terms of the number of photos they contain; as a result, the number of photos in the datasets may differ from the number mentioned in this paper. It is worth noticing that some of the images can be found in multiple databases, so researchers should check for duplicate photos in the future. An X-ray is a diagnostic and therapy tool that helps professionals diagnose and treat medical problems. A chest X-ray produces images of the blood vessels, lungs, airways, heart, spine, and chest bones, and is the most common medical X-ray technique. Previously, medical X-ray images were exposed on photographic films, which had to be processed before being examined; digital X-rays solve this difficulty. Figure 1 shows a collection of chest X-rays from several datasets with various lung illnesses [21].
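A minimal transfer-learning sketch of the training and classification steps described above, assuming a Keras/TensorFlow setup with a MobileNetV2 backbone; the backbone choice, input size, and two-class head are illustrative, and the final comment shows probability score averaging, one of the ensemble options named in the text.

```python
import tensorflow as tf

def build_classifier(num_classes=2):
    """Transfer-learning sketch: an ImageNet-pretrained MobileNetV2
    backbone, frozen, with a new classification head for lung images."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # reuse learned features, train only the head
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Classification: the softmax output is the likelihood score per class.
# probs = model.predict(images)        # shape (n, num_classes)
# Probability score averaging over two such models is a simple ensemble:
# ensemble_probs = (probs_a + probs_b) / 2
```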
Fig. 1 Chest X-rays datasets-case-I [21]
A CT scan is a type of radiography that uses computer processing to create sectional images at various depth planes from images taken at different angles around the patient's body. The image slices can be stacked or presented separately to reveal the tissues, organs, bones, and any anomalies in a three-dimensional perspective of the patient. CT scan images show more detail than X-rays [22]. Figure 2 depicts a collection of CT scan pictures taken from several datasets. In various investigations, CT scans were utilized to diagnose lung disease, including tuberculosis diagnosis, lung cancer detection, and COVID-19 detection. The taxonomy summarizes the works surveyed, so that readers can quickly find items that are relevant to their interests. The allocation of works based on the taxonomy's recognized attributes is examined in the next section.
3 Issues and Future Direction
This subsection, which contains the paper's last contributions, discusses the open issues and future directions for lung disease diagnosis using deep learning. The research evaluated shows that the field of state-of-the-art lung disease detection is plagued with issues, and some of the future work is aimed at resolving the issues that have been uncovered [23]. This section discusses the issues of lung disease diagnosis
Fig. 2 CT scan image datasets [22]
using deep learning that have been discovered in the literature. Four major issues were identified: (i) data imbalance; (ii) handling of large image sizes; (iii) limited available datasets; and (iv) considerable error correlation when using ensemble approaches. This segment also discusses future research that could be done to improve the performance of deep learning-based lung disease detection.
Datasets available: some studies used data from private hospitals. To gather larger datasets, activities such as de-identification of personal patient information might be undertaken to make the data available to the public.
Usage of cloud computing: the use of cloud computing for training may be able to solve the difficulty of dealing with large image sizes.
Additional features: many other aspects, such as quadtrees and picture histograms, have yet to be investigated. Feature engineering allows more information to be extracted from existing data; fresh data is extracted in the form of new features [24].
Limitation of the survey: the survey has a limitation because the majority of the work reviewed came from articles indexed in the Scopus database, as explained in Sect. 2. COVID-19-related publications were given an exception because the majority of those articles were still in the preprint stage when this survey was done. In terms of publication years, the most recent studies analyzed were those published before October 2020. As a result, the results presented in this survey do not include contributions from works that are not Scopus-indexed or that were published in October 2020 or later [25].
4 Conclusions

To encapsulate and organize the essential models and the motivation of active work in lung disease detection, a more current categorization of deep learning-based lung disease detection was established from the works evaluated. Recent research on this issue also includes trend analyses based on the taxonomy's defined properties. A review of the distribution of works shows that the use of CNN and transfer learning is substantial. Except for the joint characteristic, all the characteristics in the taxonomy have shown an approximately linear rise across the years. Finally, studying how deep learning has been utilized to detect lung disease is necessary to guarantee that future research is directed properly, resulting in improved disease detection performance. Other scholars might use the taxonomy to design their own inquiries and activities. The proposed future path could increase the number of applications for deep learning-based lung disease detection while also improving efficiency. Existing studies detect the effect of COVID-19 on lung cancer using machine learning and deep learning algorithms with 2D and 3D X-ray images of the lungs, but the algorithms addressed have produced less promising results. This is a challenging situation for researchers, scientists, and healthcare professionals, and researchers are constantly working on possible solutions to deal with this pandemic in their respective fields. In our system, we propose distributed Hadoop algorithms for deep learning training; each dataset must be trained individually to make the prediction and to test the effectiveness of the solution, and CT images of each person must be used to achieve maximum precision. Other concerns, as well as the future direction of lung disease screening with deep learning, were identified and explained above. Four issues with deep learning-based lung disease detection were revealed: data imbalance, large image processing, restricted datasets, and significant error correlation when employing ensemble techniques. Providing datasets to the community, utilizing cloud computing, using additional features, and using ensembles are four prospective lung disease screening tasks that address the detected concerns. The proposed methods will significantly impact the country as a warning to society; such a hypothesis for innovative research will significantly reduce lung cancer deaths affected by COVID-19 and can serve as prior public advice.
References 1. Rahaman MM, Li C, Yao Y, Kulwa F, Rahman MA, Wang Q, Qi S, Kong F, Zhu X, Zhao X (2020) Identification of COVID-19 samples from chest X-ray images using deep learning: a comparison of transfer learning approaches. J X-Ray Sci Technol 28:821–839 2. Ma J, Song Y, Tian X, Hua Y, Zhang R, Wu J (2019) Survey on deep learning for pulmonary medical imaging. Front Med 14:450–469
3. Ardila D, Kiraly AP, Bharadwaj S, Choi B, Reicher JJ, Peng L, Tse D, Etemadi M, Ye W, Corrado G et al (2019) End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 25:954–961 4. Gordienko Y, Gang P, Hui J, Zeng W, Kochura Y, Alienin O, Rokovyi O, Stirenko S (2019) Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer. Adv Intell Syst Comput 638–647 5. Kieu STH, Hijazi MHA, Bade A, Yaakob R, Jeffree S (2019) Ensemble deep learning for tuberculosis detection using chest X-ray and canny edge detected images. IAES Int J Artif Intell 8:429–435 6. Ayan E, Ünver HM (2019) Diagnosis of pneumonia from chest X-ray images using deep learning. Sci Meet Electr Electron Biomed Eng Comput Sci 1–5 7. Salman FM, Abu-naser SS, Alajrami E, Abu-nasser BS, Ashqar BAM (2020) COVID-19 detection using artificial intelligence. Int J Acad Eng Res 4:18–25 8. Gao XW, James-Reynolds C, Currie E (2019) Analysis of tuberculosis severity levels from CT pulmonary images based on enhanced residual deep learning architecture. Neurocomputing 392:233–244 9. Gozes O, Frid M, Greenspan H, Patrick D (2020) Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection & patient monitoring using deep learning CT image analysis article. arXiv. arXiv:2003.05037 10. Mithra KS, Emmanuel WRS (2019) Automated identification of mycobacterium bacillus from sputum images for tuberculosis diagnosis. Signal Image Video Process 11. Samuel RDJ, Kanna BR (2019) Tuberculosis (TB) detection system using deep neural networks. Neural Comput Appl 31:1533–1545 12. O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan D, Walsh J (2020) Deep learning vs. traditional computer vision. Adv Intell Syst Comput 128–144 13. Mikołajczyk A, Grochowski M (2018) Data augmentation for improving deep learning in image classification problem. In: Proceedings of the 2018 international interdisciplinary PhD workshop, Swinoujscie, Poland, 9–12 May 2018, pp 117–122 14. Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6 15. Ker J, Wang L (2018) Deep learning applications in medical image analysis. IEEE Access 6:9375–9389 16. Wang C, Chen D, Hao L, Liu X, Zeng Y, Chen J, Zhang G (2019) Pulmonary image classification based on inception-v3 transfer learning model. IEEE Access 7:146533–146541 17. Kabari LG, Onwuka U (2019) Comparison of bagging and voting ensemble machine learning algorithm as a classifier. Int J Adv Res Comput Sci Softw Eng 9:1–6 18. Chouhan V, Singh SK, Khamparia A, Gupta D, Albuquerque VHCD (2020) A novel transfer learning based approach for pneumonia detection in chest X-ray images. Appl Sci 10:559 19. Heo SJ, Kim Y, Yun S, Lim SS, Kim J, Nam CM, Park EC, Jung I, Yoon JH (2019) Deep learning algorithms with demographic information help to detect tuberculosis in chest radiographs in annual workers’ health examination data. Int J Environ Res Public Health 16:250 20. Pasa F, Golkov V, Pfeiffer F, Cremers D, Pfeiffer D (2019) Efficient deep network architectures for fast chest X-ray tuberculosis screening and visualization. Sci Rep 9:2–10 21. Liu J, Liu Y, Wang C, Li A, Meng B (2018) An original neural network for pulmonary tuberculosis diagnosis in radiographs. In: Lecture notes in computer science. Proceedings of the international conference on artificial neural networks, Rhodes, Greece, 4–7 Oct 2018. 
Springer, Berlin/Heidelberg, Germany, pp 158–166 22. Stirenko S, Kochura Y, Alienin O (2018) Chest X-ray analysis of tuberculosis by deep learning with segmentation and augmentation. In: Proceedings of the 2018 IEEE 38th international conference on electronics and nanotechnology (ELNANO), Kiev, Ukraine, 24–26 Apr 2018, pp 422–428 23. Andika LA, Pratiwi H, Sulistijowati Handajani S (2020) Convolutional neural network modeling for classification of pulmonary tuberculosis disease. J Phys Conf Ser 1490
24. Ul Abideen Z, Ghafoor M, Munir K, Saqib M, Ullah A, Zia T, Tariq SA, Ahmed G, Zahra A (2020) Uncertainty assisted robust tuberculosis identification with bayesian convolutional neural networks. IEEE Access 8:22812–22825 25. Hwang EJ, Park S, Jin KN, Kim JI, Choi SY, Lee JH, Goo JM, Aum J, Yim JJ, Park CM (2019) Development and validation of a deep learning-based automatic detection algorithm for active pulmonary tuberculosis on chest radiographs. Clin Infect Dis 69:739–747
Chapter 5
Low-Cost Data Acquisition System for Electric Vehicles Vinay Gupta, Toushit Lohani, and Karan Singh Shekhawat
1 Introduction The discharge of greenhouse gases such as carbon dioxide and chlorofluorocarbons is an undesirable result typically connected with the burning of petroleum products for energy. The severity of ecological change as a result of greenhouse gas emissions has reached critical levels, as evidenced by present global warming and the disappearance of large icebergs. The deteriorating environmental situation is expected to prompt immediate defensive efforts and climate strategies. The International Energy Agency (IEA) has described a future energy system scenario that requires limiting warming to two degrees Celsius by 2050 [1]. If no steps are taken to address the current situation, greenhouse gas production will almost certainly double by 2050 [2]. The transportation industry accounts for about 28% of overall CO2 emissions, with road transport accounting for more than 70% of those emissions [3]. Various initiatives are being adopted to reduce emissions from the transportation sector. The main aim is to produce new fuels and introduce clean technology for electric vehicles, with the goal of reducing greenhouse gas emissions while also improving vehicle performance. Electrification of transportation is an excellent practice with numerous advantages. Electric vehicles (EVs) have the potential to improve energy security by diversifying fuel sources, encourage financial development by launching new sophisticated ventures, and, most importantly, protect the environment by reducing greenhouse gas emissions [4]. EVs show better performance than internal combustion engine (ICE) vehicles because of more efficient power trains and electric motors [5]. The development of electric vehicles is at a very early stage in India, which is why motor and battery parameters are needed to improve the performance of electric vehicles. Consumers in a competitive market are looking for EVs with greater and longer ranges, often around 400 km on a single charge [6]. V. Gupta (B) · T. Lohani · K. S. Shekhawat Department of Electrical Engineering, Manipal University Jaipur, Jaipur 303007, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_5
This technique helps to increase the range of electric vehicles while also maintaining the safety of crucial components like batteries and electric motors. The battery pack is the most expensive component in any electric vehicle. Battery management, battery health, and battery lifespan forecasting are all critical to extending battery life. Recent breakthroughs in Big Data analytics and how they may be utilized to assess battery health from data are discussed by Li et al. [7], who classify methods according to their viability and cost-effectiveness, as well as their advantages and disadvantages. Liu et al. [8] go further, presenting a machine learning-enabled technique for predicting lithium-ion battery aging based on Gaussian process regression (GPR). Finally, because battery faults may impair performance, additional approaches investigate improved fault identification techniques [9]. Data collection during the runtime of electric vehicles is difficult because of the high current ratings of the BLDC motors driven by rechargeable batteries; measurement and instrumentation for such high-current motors must remain non-invasive in the wake of automation and to ensure human safety [10]. The availability of data is crucial for supervisory control, traffic navigation, and vehicle health monitoring. Data gathering is central to the development and examination of electric vehicles. The ability to manage, improve, and supervise the position, steering, and health of vehicles is crucial when looking to limit long-term operating expenses and increase productivity in the transportation industry [11]. Since every electric vehicle contains several battery packs, each with separate measurements of cell voltages and temperatures, acquiring such measurements manually would require intrusive, large, and complicated circuits, and data acquisition systems for high currents are scarce [12, 13]. This paper presents the design and performance analysis of a data acquisition system for monitoring the performance of electric vehicles.
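To illustrate the GPR-based aging prediction cited above [8], the following sketch fits a Gaussian process regressor to hypothetical capacity-fade data with scikit-learn; the synthetic data and the kernel choice are assumptions for demonstration, not the published method.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical aging data: charge cycles vs. remaining capacity fraction.
cycles = np.arange(0, 500, 25).reshape(-1, 1)
capacity = 1.0 - 0.0004 * cycles.ravel() + 0.005 * np.random.randn(len(cycles))

# RBF kernel models the smooth fade trend; WhiteKernel absorbs sensor noise.
kernel = RBF(length_scale=100.0) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(cycles, capacity)

# Predict capacity (with uncertainty) out to 800 cycles.
future = np.arange(0, 800, 50).reshape(-1, 1)
mean, std = gpr.predict(future, return_std=True)
print(mean[-1], std[-1])  # predicted end-of-window capacity and its std dev
```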
2 Block Diagram of Electric Vehicle The three main systems of an electric vehicle are the electric propulsion system, the energy storage system, and the auxiliary system. Figure 1 shows the block diagram of an electric vehicle. The electric propulsion system involves an electric motor, a power electronic converter, a vehicle controller, a mechanical transmission, and driving wheels. BLDC, induction, direct current, synchronous, and switched reluctance motors can be used; generally, brushless DC motors are used in electric vehicles. The power converters used in electric vehicles are DC/AC converters, DC/DC converters, and AC/DC converters, and they should match the required voltage, current rating, and switching frequency. The energy source system comprises the energy storage, the energy management system, and the battery charging unit. Electric vehicles need on-board energy storage that holds energy in a form easily transformed into electrical energy in an economical and effective way. Batteries are currently the most widely accepted energy storage devices.
Fig. 1 Block diagram of electric vehicle
Batteries may be of the lead acid or lithium-ion type; lithium-ion batteries are the more suitable choice for electric vehicles. The energy management system also delivers signals to activate relays for cell balancing and offers cut-off protection in case of severe failure. In lithium-ion batteries, thermal monitoring is crucial because of the restricted operating thermal range. The battery pack is vulnerable to high temperatures and must be kept between 0 and 45 °C, depending on the configuration of the battery [14]; temperatures near 25 °C are ideal, so efficient cooling and heating techniques are required. The auxiliary system incorporates the inner climate control unit, the power steering unit, and the auxiliary power supply. Based on the control inputs from the accelerator and brake pedals, the vehicle controller provides suitable control signals to the power converter, which regulates the power flow between the electric motor and the energy source. The reverse power flow is a direct outcome of the electric vehicle's regenerative braking, and the energy recovered can be returned to the energy source if the source is receptive. The energy management unit coordinates with the vehicle controller to control the regenerative braking system and its energy recovery; it likewise works with the energy refueling unit to control refueling and monitor the usability of the energy source. The auxiliary power supply provides the necessary power at various voltage levels for all EV auxiliaries, particularly the climate control and power steering units [15].
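A battery-management routine built on the thermal window just described might gate heating and cooling as in the minimal sketch below; the 0–45 °C limits and the 25 °C ideal come from the text, while the function and thresholding logic are illustrative assumptions.

```python
# Illustrative thermal-window check based on the limits described above.
SAFE_MIN_C, SAFE_MAX_C, IDEAL_C = 0.0, 45.0, 25.0

def thermal_action(temp_c: float) -> str:
    """Map a pack temperature to a coarse battery-management action."""
    if temp_c < SAFE_MIN_C:
        return "heat"        # below the safe window: enable pack heating
    if temp_c > SAFE_MAX_C:
        return "cool"        # above the safe window: enable cooling, cut charge
    return "nominal" if abs(temp_c - IDEAL_C) <= 10 else "monitor"

for t in (-5.0, 24.0, 38.0, 48.0):
    print(t, thermal_action(t))
```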
3 Design and Authentication of Data Acquisition System

This DAQ is an ideal approach to analyzing an electric vehicle continuously with less effort and labor, yielding a productive result. Figure 2 shows the connection diagram of the developed data acquisition system. The proposed system comprises a current sensor, a temperature sensor, a voltage divider, and an Arduino UNO to obtain the BLDC motor current, battery voltage, and battery temperature at various motor speeds. The suggested data acquisition system is built in the Arduino IDE on the Arduino UNO board and stores the data in an Excel sheet, which helps produce quality plots easily and precisely with automatic logging at an appropriate time interval. Specifications of all required components are listed below with their ratings:

i. Current Sensor
• Make: LANTIAN RC
• Current rating: 100 A
• Pin power supply: 5 V

ii. Temperature and Humidity Sensor DHT11
• Pin supply voltage: 5 V
• Supply current: 2.5 mA max
• Operating range for humidity: 20–80% with 5% accuracy
• Operating range for temperature: 0–50 °C with ±2 °C accuracy

iii. Arduino UNO
• Microcontroller: ATmega328
• Supply voltage: 5 V
• DC current: 40 mA
• Digital I/O pins: 14
• Analog pins: 6
Fig. 2 Block diagram of proposed DAQ system
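Because the Arduino UNO's 10-bit ADC accepts only 0–5 V, the voltage divider in Fig. 2 must scale the battery voltage down before sampling. The sketch below shows the conversion arithmetic; the 100 kΩ/10 kΩ resistor pair, which maps 55 V down to 5 V, is an illustrative choice rather than the authors' actual values.

```python
# Illustrative voltage-divider arithmetic for reading up to 55 V
# with a 5 V, 10-bit ADC (Arduino UNO). Resistor values are assumptions.
R_TOP, R_BOTTOM = 100_000.0, 10_000.0   # 100k/10k divider: 55 V -> 5 V
V_REF, ADC_MAX = 5.0, 1023

def adc_to_battery_volts(adc_count: int) -> float:
    """Convert a raw ADC reading back to the battery-side voltage."""
    v_pin = adc_count * V_REF / ADC_MAX            # voltage at the ADC pin
    return v_pin * (R_TOP + R_BOTTOM) / R_BOTTOM   # undo the divider ratio

print(adc_to_battery_volts(896))  # ~48.2 V, close to a mid-table reading
```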
Table 1 Description of main components of electric vehicle

Description/Components | Type | Power | Voltage (V) | Current (A) | Speed | No
DC motor | Brushless | 1000 W | 48 | 20 | 3000 RPM | 1
Battery | Lead acid | – | 12 | 80 | – | 4
Controller | – | – | 48 | 40 | – | 1
We connected the developed data acquisition system to a brushless DC motor so that various parameters at different speeds could be observed and stored in an Excel sheet [16]. The developed data logger can measure and record DC voltage and current up to 55 V and 100 A, respectively. To verify the precision of the suggested data acquisition system, we performed a small test on a 1 kW, 48 V brushless DC motor. The specifications of the brushless DC motor and associated components are given in Table 1. Figure 3 shows the experimental setup of the suggested data acquisition system. This setup consists of one BLDC motor and four lead acid batteries, connected in series to supply the required 48 V.
Fig. 3 Experimental setup of proposed system
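A host-side logger consistent with the described workflow might look like the sketch below: it reads comma-separated samples from the Arduino over a serial port (using the pyserial package) and appends them to a CSV file that Excel can open. The port name and the "speed,current,voltage,temperature" line format are assumptions, not the authors' published firmware protocol.

```python
import csv
import serial  # pyserial; assumed line format: "speed,current,voltage,temp"

PORT, BAUD = "/dev/ttyACM0", 9600  # adjust to the local Arduino port

with serial.Serial(PORT, BAUD, timeout=2) as link, \
        open("ev_daq_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["speed_rpm", "motor_current_a", "battery_v", "battery_temp_c"])
    for _ in range(100):  # log 100 samples, then stop
        line = link.readline().decode(errors="ignore").strip()
        if not line:
            continue  # skip timeouts / partial lines
        try:
            speed, current, volts, temp = (float(x) for x in line.split(","))
        except ValueError:
            continue  # skip malformed samples
        writer.writerow([speed, current, volts, temp])
```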
Table 2 Examination between developed data logger and standard multimeter

Speed (RPM) | Developed system: motor current (A) | battery voltage (V) | battery temp. (°C) | Standard multimeter: motor current (A) | battery voltage (V) | battery temp. (°C) | Error in current (%) | Error in voltage (%)
192 | 2.21 | 48.13 | 24 | 2.26 | 48.1 | 24 | 2.2 | 0.06
298 | 2.71 | 48.11 | 24 | 2.82 | 47.9 | 24 | 3.9 | 0.04
376 | 3.62 | 48.05 | 24 | 3.74 | 47.8 | 24 | 3.2 | 0.5
475 | 4.41 | 47.96 | 24 | 4.51 | 47.6 | 24 | 2.2 | 0.75
505 | 4.72 | 47.65 | 24 | 4.8 | 47.5 | 24 | 1.6 | 0.31
612 | 5.08 | 47.51 | 24 | 5.11 | 47.3 | 24 | 0.5 | 0.44
612 | 5.38 | 47.38 | 24 | 5.55 | 47.2 | 24 | 3 | 0.38
805 | 6.12 | 47.27 | 24 | 6.40 | 47.1 | 24 | 4.3 | 0.36
933 | 6.85 | 47.15 | 25 | 7.15 | 47.0 | 25 | 4.1 | 0.31
1120 | 7.34 | 47.05 | 25 | 7.58 | 46.9 | 25 | 3.1 | 0.31
1475 | 9.04 | 46.78 | 25 | 9.32 | 46.7 | 25 | 3 | 0.17
1680 | 10.75 | 46.62 | 25 | 11.02 | 46.5 | 25 | 2.4 | 0.25
1865 | 11.14 | 46.53 | 25 | 11.36 | 46.4 | 25 | 1.9 | 0.28
1940 | 12.05 | 46.41 | 25 | 12.15 | 46.3 | 25 | 0.9 | 0.23
2085 | 13.45 | 46.28 | 25 | 13.58 | 46.1 | 25 | 0.9 | 0.39
2198 | 14.51 | 46.11 | 25 | 14.69 | 45.9 | 25 | 1.2 | 0.45
2315 | 16.12 | 45.92 | 25 | 16.28 | 45.7 | 25 | 0.9 | 0.48
2480 | 17.24 | 45.73 | 25 | 17.41 | 45.5 | 25 | 0.9 | 0.50
The suggested data acquisition system offers continuous recording of the motor current, battery voltage, and battery temperature. We used a standard measurement device to record readings of motor current, battery voltage, and battery temperature at various speeds; this procedure allowed us to determine the accuracy of the suggested data logger. Readings from the suggested data logger and the standard measuring instrument are almost identical, and the recorded information can be saved in an Excel sheet. Test results show that the suggested data logger is reliable and accurate. Table 2 shows the comparison between the proposed data acquisition system and the standard measuring instrument, and Table 3 compares the suggested data logger with a recent study.
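The error columns in Table 2 follow the usual relative-error definition, treating the multimeter as the reference; the short sketch below reproduces the current error of the first row.

```python
def percent_error(measured: float, reference: float) -> float:
    """Relative error of the data logger against the reference meter."""
    return abs(measured - reference) / reference * 100.0

# First row of Table 2: motor current at 192 RPM.
print(round(percent_error(2.21, 2.26), 1))  # -> 2.2 (%), matching Table 2
```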
4 Conclusion

Climate security and energy emergencies have encouraged the development of electric vehicles, and data acquisition is a focal point of much current electric vehicle research.
Table 3 Comparison of the proposed DAQ system with recent study

Author | Year | DC voltage range (V) | DC current range (A)
Pachauri et al. [16] | 2021 | 0–120 | 0–20
Proposed system | 2022 | 0–150 | 0–100
A cost-effective, precise, and affordable data collection system for electric vehicle monitoring is required by researchers and academia. Existing data acquisition systems are costly and only suitable for low-current applications, while electric vehicles generally use high-current brushless DC motors. A reliable high-current data acquisition system is required for the uninterrupted development of the electric vehicle powertrain, charging methods, and battery. The proposed data acquisition system continuously records the required information for academic and industrial applications, e.g., research into electric vehicle driving patterns, current and voltage of electric vehicles, and battery charging and discharging patterns. The results show that the developed data acquisition system is accurate and affordable.
References 1. International Energy Agency [Internet] Energy technology perspectives 2012: the wider benefits of the 2 °C scenario 2. Chan M. CBT online [Internet]. EV charging station launched at Bangsar Shopping Centre 3. European Commission (2011) Transport in figures—statistical pocketbook. https://ec.europa.eu/transport/facts-fundings/statistics/pocketbook-2011_en/. Accessed 21 Feb 2021 4. Darabi Z, Ferdowsi M (2011) Aggregated impact of plug-in hybrid electric vehicles on electricity demand profile. IEEE Trans Sustain Energy 2(4):501–508 5. Green Car Reports. Lithium-ion battery packs now $209 per kWh, will fall to $100 by 2025: Bloomberg analysis. https://www.greencarreports.com/news/1114245_lithium-ion-battery-packs-now-209-per-kwh-will-fall-to-100-by-2025-bloomberg-analysis. Accessed 18 Feb 2021 6. Yong JY, Ramachandara Murthy VK, Tan KM, Mithulananthan N (2015) A review on the state-of-the-art technologies of electric vehicle, its impacts and prospects. Renew Sustain Energy Rev 49:365–385 7. Li Y, Liu K, Foley AM, Zülke A, Berecibar M, Nanini-Maury E, Van Mierlo J, Hoster HE (2019) Data-driven health estimation and lifetime prediction of lithium-ion batteries: a review. Renew Sustain Energy Rev 113:109254 8. Liu K, Li Y, Hu X, Lucu M, Widanage WD (2020) Gaussian process regression with automatic relevance determination kernel for calendar aging prediction of lithium-ion batteries. IEEE Trans Ind Inform 16:3767–3777 9. Hu X, Zhang K, Liu K, Lin X, Dey S, Onori S (2020) Advanced fault diagnosis for lithium-ion battery systems: a review of fault mechanisms, fault features, and diagnosis procedures. IEEE Ind Electron Mag 14:65–91 10. Kothandabhany SKM (2011) Electric vehicle roadmap for Malaysia: proceedings of the sustainable mobility: 1st Malaysian-German sustainable automotive mobility conference, Oct 18
11. Chan M (2013) CBT online [Internet]. E V charging station launched at Bangsar Shopping Centre 12. Wong D (2014) CBT online [Internet]. Electric cars and buses available to the public next year 13. Benghanem M (2009) Measurement of meteorological data based on wireless data acquisition system monitoring. Appl Energy 86:2651–2660 14. Vinay G, Nishant S, Deepesh M, Himanshu P (2020) IOT enabled data acquisition system for electric vehicle. AIP Publishing 15. Bhatti AR, Salam Z, Aziz MJBA, Yee KP, Ashique RH (2016) Electric vehicles charging using photovoltaic: status and technological review. Renew Sustain Energy Rev 54:34–47 16. Pachauri RK, Mahela OP, Khan B, Kumar A, Agarwal S, Alhelou HH, Bai J (2021) Development of Arduino assisted data acquisition system for solar photovoltaic array characterization under partial shading conditions. Comput Electr Eng 92:107175
Chapter 6
Machine Learning Based Robotic-Assisted Upper Limb Rehabilitation Therapies: A Review Shymala Gowri Selvaganapathy, N. Hema Priya, P. D. Rathika, and M. Mohana Lakshmi
1 Introduction Stroke is among the leading causes of death and impairment in adults worldwide, with the prevalence of stroke rising every decade after the age of 55. One of the most typical outcomes of a stroke is the loss or impairment of physical functioning. It can result in neurological and mobility impairments, including loss of balance, strength, and capability, diminished fractionated motion capacity, abnormal muscle development, and paralysis, among other things. Physical rehabilitation exercise evaluation is a critical step in determining the best therapeutic diagnosis for a patient suffering from musculoskeletal and neurological illnesses such as stroke. However, this process relies on the therapist's experience and is executed infrequently. Researchers have looked into the prospect of using sensor and machine learning technology to create computer-assisted decision support systems for assisting rehabilitation. As a result, existing health care services are structured so that a preliminary level of rehabilitation is completed in a hospital under the close supervision of a healthcare professional, followed by a second level in an outpatient setting, in which patients complete a set of recommended exercises at home. Robotic stroke rehabilitation services have the potential to enhance stroke patients' upper limb functional recovery. Although machines are not meant to take the place of clinicians, they can be a cost-effective supplement to treatment support S. G. Selvaganapathy (B) · N. Hema Priya · M. Mohana Lakshmi Department of Information Technology, PSG College of Technology, Coimbatore, India e-mail: [email protected] N. Hema Priya e-mail: [email protected] P. D. Rathika Department of Robotics & Automation Engineering, PSG College of Technology, Coimbatore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_6
and can shorten therapeutic time. For instance, instead of supervising one patient at a time, a single clinician might monitor the exercise programs of numerous people at the same time using robotic rehabilitation gear, which has the potential to dramatically improve the efficacy of treatment. Robots, combined with machine learning techniques, can be used to observe intrinsic movement patterns and estimate human motion intentions. Intelligent approaches can learn from previous control processes and adapt to a changing environment model more effectively. In addition, using benchmark verification and pattern recognition, the evaluation process can become more scientific. This has necessitated the development of an assistive robotic tool for stroke rehabilitation patients to guarantee that they do their recommended exercises correctly, by utilizing the targeted joints and muscles rather than compensating with other, stronger muscles. The assistive robotic tool has to enable stroke patients to finish their therapies without the continuous surveillance of a physiotherapist. Compensatory upper-limb movements should be detected by the assistive robotics, and patients can then be given feedback through the software package to help them adjust their postures. Figure 1 shows the block diagram for machine learning based assistive robotics for upper limb rehabilitation. This survey article provides a detailed review of recent studies of available robotic stroke rehabilitation services and could serve as a stepping stone for future research on this topic.
Fig. 1 Machine learning based assistive robots for upper limb rehabilitation [1]
The rest of the article is outlined as follows: Sect. 2 provides preliminaries on available rehabilitation treatments, Sect. 3 discusses rehabilitation robots currently deployed for assisting patients, and Sect. 4 presents the conclusion and future work.
2 Background on Rehabilitation Treatments Once a limb’s motor control has been damaged, a therapist must determine the best treatment option for the affected arm. Selecting the appropriate treatment process is a critical decision that has a direct impact on the treatment’s efficacy. Rehabilitation has seen a significant increase in the use of robot intervention. Robots can offer repetitive behaviors for limbs with disabilities.
2.1 Passive Therapy Passive therapy involves no work from the individual and is typically used in the early phases of post-stroke complaints when the damaged limb is not responding. Patients with hemiplegia who have one-sided paralysis are frequently administered passive treatment. During the session, the injured limb is moved in a precise trajectory for repetitions, which is commonly done by a rehabilitation robot as shown in Fig. 2 adapted from [2]. The movement’s path is carefully orchestrated to prevent possible harm to the patient. Fig. 2 Passive therapy adapted from [2]
Fig. 3 Active therapy adapted from [2]
2.2 Active Therapy This form of therapy is recommended for individuals who can use their damaged limb to a certain extent; the term active denotes the capacity to move the damaged limb, albeit inefficiently. In active-assistive therapy, an external force is applied by a therapist or a robot to aid the patient in completing the assigned task; it is used to help with range-of-motion issues. For example, a patient with damage to the back of the upper arm can be given active-assistive treatment in which the patient is directed to reach a certain target while the attached robot assists in completing the task, as shown in Fig. 3 adapted from [2].
2.3 Bilateral Therapy Bilateral therapy refers to the idea of duplication in rehabilitation: the impaired limb mimics the movement of the functional limb, giving the patient complete control of the affected limb, as shown in Fig. 4 adapted from [2].
3 State-of-the-Art Rehabilitation Robots The following sections examine multiple recent studies utilizing assistive robotics for rehabilitation support. The Mirror Image Movement Enabler (MIME) [3] is CPM automated equipment that comprises a wheelchair and a height-adjustable seat on which the user sits and places his or her impaired limb. The injured limb is placed into a forearm cast, which inhibits wrist and hand movements.
Fig. 4 Bilateral therapy adapted from [2]
The operator is a Puma 560 (Programmable Universal Machine for Assembly) robot, which is linked to the prosthesis and applies pressure to the wounded area throughout its motions. The Puma 560 robot has six degrees of freedom (DOF), allowing a wide range of motion postures in three dimensions. The three therapy modalities provided by MIME are passive, active-assistive/active-resistive, and bilateral. ADAPT [4] stands for ADaptable and Automatic Presentation of Tasks, and it is intended to aid the rehabilitation of upper-limb capabilities in stroke victims. ADAPT trains patients in a range of genuine cognitive activities that demand grasping and handling, in an extensive, dynamic, and adaptive manner. The mechanics of the functional tasks are simulated by a general-purpose robot, which then presents these operational challenges to the patient. ADAPT can automatically switch between tools that relate to the operational duties thanks to a novel resource mechanism. Despite the significance of rehabilitation evaluation in improving outcomes for patients and lowering medical costs, current methodologies are limited in their variety, sturdiness, and practicality. A deep learning-based methodology is used in [5] for assessing the quality of the health rehabilitation process, covering metrics for assessing movement performance, grading algorithms for translating motion quality measures into numerical values, and neural network architectures for producing scores for incoming motions. The suggested evaluation method encapsulates the low-dimensional modeling process obtained using a deep autoencoder network and is built on the log-likelihood of a Gaussian mixture model. The relative effectiveness of treatments must be understood in order to identify a successful rehabilitation therapy; however, there is currently no complete summary of observational studies in this field. A Cochrane analysis combining systematic literature reviews of therapies for improving upper limb function following a stroke is done in [6]. Cochrane and non-Cochrane evaluations of RCTs contrasting upper limb procedures with no treatment, standard care, or alternative therapies in chronic stroke patients are considered.
Upper limb mobility was the primary outcome of concern; additional outcomes comprised movement disability and competence in activities of daily living. ARMin [7] is an upper limb rehabilitation robot with seven DOF, enabling three-dimensional shoulder movements, elbow flexion and extension, forearm movements, and wrist bending, as well as hand closing and opening. The seated person links his or her hand to the mechanical arm, whose segment lengths are adjusted to fit, using ARMin, which comprises a seat and a mechanical arm. ARMin features three layers of safety, including several monitors that work together as a system to detect any malfunctions that may occur. The robot-assisted treatment in [8] is designed for people who have upper-extremity nerve damage as a result of a stroke or other central or peripheral nervous system diseases. The developed robot-assisted treatment initiates elbow stretching, as well as forearm supination and pronation. For upper limb recovery, the robotic system employed Kinect's skeleton mapping, and a wireless access point is used to communicate position control. The system's maximal stability was established using a Lyapunov-based technique. The designed system is a two-DOF prosthesis which can be utilized for assessment, physical therapy, and outcome assessment. Internet-of-Things, computer vision, and intelligent system capabilities are used in [9] to create smartphone-based information systems that can assist stroke patients with upper-limb recovery. Users' exercise activity data can be gathered using the smartphone's built-in multi-modal sensors and then transferred to a server over the Internet. A combined DTW-KNN method is presented in this study to recognize the correctness of therapy actions and categorize them into several training completion categories. An experimental platform with an adaptable upper-limb exoskeleton driven by low-impedance torque-controlled series elastic actuators is shown in [10]. This enables the testing of new algorithms and equipment for further unsupervised treatment of moderate to severe brain impairments. To best simulate the responsive and precise haptic interaction of doctors, the design is tuned to accomplish a high range of motion (ROM) and robust interaction force management. The given robot supports postures near the chest, head, and behind the back, as well as the essential ROM required for activities of daily living (ADL). The kinematics has been designed for excellent manipulability and low friction during ADL. The MERLIN [11] method has the goal of bringing neuro-rehabilitation to the homes of post-stroke sufferers, with the intention of providing regular, comprehensive, motivational, and patient-tailored remediation under the indirect observation of a therapist. The system consists of ArmAssist (AA), a cost-effective autonomous system based on realistic games created by TECNALIA, as well as the Antari Home Care system, created by GMV to electronically monitor, manage, and adapt the patients' regular training. The AA system is a flexible technology that incorporates low-cost, transportable mechanical equipment for full upper-limb treatment, as well as a software system that supports serious games to motivate users and analyze their progress.
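A minimal version of the DTW-KNN idea in [9] is sketched below: dynamic time warping provides a distance between variable-length motion sequences, and a nearest-neighbour rule assigns the training-completion category. The toy traces and labels are assumptions; the published system's features and parameters are not reproduced here.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def knn_dtw_predict(query, train_seqs, train_labels, k=1):
    """k-NN with DTW distance over 1-D motion sequences."""
    dists = [dtw_distance(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    labels = [train_labels[i] for i in nearest]
    return max(set(labels), key=labels.count)  # majority label among neighbours

# Toy 1-D acceleration traces for two hypothetical exercise categories.
train = [np.sin(np.linspace(0, 3, 40)), np.cos(np.linspace(0, 3, 55))]
labels = ["correct", "incomplete"]
print(knn_dtw_predict(np.sin(np.linspace(0, 3, 47)), train, labels))
```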
A home-based rehabilitation (HBR) [12] system was developed with the capability to recognize and document the quantity and timing of physical rehabilitation performed by the user, using a wearable device and mobile phone software with a machine learning (ML) technique; the efficacy of the HBR system was evaluated through a prospective comparative study with post-stroke survivors. Wearable sensors allow real-time assessment of upper limb (UL) functioning [13], tracking the post-stroke trajectory of UL effectiveness and its interactions with other parameters. Individuals (n = 67) with UL impairment were assessed at 2, 4, 6, 8, 12, 16, 20, and 24 weeks after their first incident. UL impairments (Fugl-Meyer), capacity for movement (Action Research Arm Test), and activity performance in everyday life, as well as other possible modifying factors, were all investigated. Individual change trajectories for each data level were modeled, as well as the mediating effect on UL performance paths. Because of their intrinsic elasticity and high power-to-weight ratio, pneumatic artificial muscles (PAMs) have been extensively used in rehabilitation devices. However, because of the highly nonlinear and time-varying characteristics of PAMs, maintaining high-performance tracking control is challenging. To facilitate quick generalization, [14] proposes a high-order pseudo-partial derivative-based model-free adaptive iterative learning controller (HOPPD-MFAILC). Across iterations, the dynamics of the PAM are turned into a dynamic linearization model, and a high-order estimation method is used to predict the pseudo-partial derivative element of the model using only data from past iterations. Serious games and virtual reality (VR) are increasingly being examined as substitutes for standard rehabilitation treatments. The potential of a framework that uses serious games as a source of amusement for physiotherapy patients is described in [15]. Using a low-cost tracking gadget and the Unity engine, a series of open-source serious games for rehabilitation has been produced. These serious games record 3D human body information and save it in a database for therapists to analyze afterwards. In transitioning from Industry 3.0 to Industry 4.0, smart robotics will be a key element, and the capacity to learn is one of the most important characteristics of a smart system. In the context of smart assembly, this entails introducing learning capabilities into current fixed, repetitive, task-oriented industrial manipulators, thereby making them 'smart.' The authors provide two reinforcement learning (RL)-based compensation approaches in [16]. With the goal of improving controller performance, the learnt correction signal, which compensates for unmodeled abnormalities, is added to the existing baseline input. On a 6-DoF industrial robotic manipulator arm, the suggested training algorithms are applied to follow various types of standard paths, such as rectangular or circular trajectories, or to trace a path in a three-dimensional environment. A neural-fuzzy adaptive controller (NFAC) [17] based on the radial basis function network is designed for a therapeutic prosthesis to provide human arm mobility support. The rehabilitation robot's system prototype and electrical real-time control system, which has 7 actuated degrees of freedom and
can perform reasonable upper-limb actions, have been thoroughly studied. To aid disabled patients in undertaking routine daily rehabilitation procedures, the RBFN-based NFAC algorithm is utilized to achieve trajectory tracking accuracy under parametric uncertainties and environmental disturbances. Since medical studies indicate good motor gains in only a small proportion of participants, the necessity for patient-specific therapies becomes clear. This finding lays the groundwork for "precision rehabilitation." In this environment, tracking and predicting the recovery trajectory is critical, and data obtained by sensing devices allows physicians to do so with minimal effort on the part of both clinicians and patients. The method [18] described in this research uses machine learning-based algorithms to predict clinical scores from mobile sensing data obtained during functional movement tasks. Sensor-based score estimates were found to be quite similar to clinician-generated scores. Rehab-Net [19] is a supervised neural framework for efficiently identifying three upper-limb motions of the human arm, namely extension, flexion, and rotation of the elbow, which may be used to track rehabilitation progress along the way. The conceptual methodology, Rehab-Net, is developed with a customizable convolutional neural network (CNN) model that uses two layers of CNN interleaved with max-pooling layers, followed by a fully connected layer that categorizes the three movements from tri-axial acceleration input data collected from the arm; it is personalized, lightweight, and of low complexity. To tackle the assessment problem, the authors created an automated system [1] based on ubiquitous sensors and machine learning approaches that can determine the assessment score objectively. They identified two kinds of technological characteristics to accomplish this, including one that can encapsulate rehabilitation information from both the paralyzed and non-paralyzed sides while suppressing high-level noise such as irrelevant daily activities. Motion capture has been used in a broad array of applications over the previous few years, including cinema special effects, controlling games and robotics, rehabilitation systems, simulations, and more. Markers [20], a structured environment, and high-resolution cameras in a dedicated setting are used in current human motion capture approaches. The most challenging difficulty in a human motion capture system is estimating the elbow angle during fast movement. The authors take elbow angle detection as the major study topic in this work, and they offer a novel, markerless, and cost-effective system that uses an RGB camera to estimate the elbow angle instantaneously using a part affinity field. In clinical practice, existing machines have to become more intelligent and trustworthy. Machine learning algorithms (MLAs) can learn from information and forecast future unexpected circumstances, which can help robot-assisted rehabilitation be more productive. The authors present a comprehensive overview of machine learning-based strategies [21] for robot-assisted limb rehabilitation in this work. The present incarnation of upper-limb rehabilitation robots is first discussed. Following that, the designs and applications of machine learning algorithms for limb mobility intention interpretation, human–robot interface monitoring, and motor coordination evaluation are explained and analyzed.
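The Rehab-Net architecture described above (two convolutional layers interleaved with max pooling, followed by a fully connected classifier over tri-axial acceleration windows) can be sketched in Keras roughly as follows; the window length, filter counts, and kernel sizes are assumptions, since the paper's exact hyperparameters are not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 128  # assumed samples per tri-axial accelerometer window

# Rough Rehab-Net-style CNN: two conv layers interleaved with max pooling,
# then a fully connected layer over three movement classes
# (elbow extension, flexion, rotation).
model = keras.Sequential([
    layers.Input(shape=(WINDOW, 3)),          # tri-axial acceleration input
    layers.Conv1D(16, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(32, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),    # three arm-movement classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```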
Table 1 provides the comparison of existing robotic rehabilitation models for upper arms. The comparison is based on the problem addressed, working principle of the model, the machine learning algorithm used and the result obtained.
4 Conclusion and Future Work Upper limb rehabilitation robots and machine learning technologies have shown significant improvements and efficiencies in therapeutic treatment over the last few decades. This paper investigates robot-assisted upper-limb training techniques and how machine learning algorithms can be used for movement intention detection, human–robot interactive control, and statistical muscle control evaluation in recovery. Robotic physiotherapy has progressively highlighted the importance of patients' constructive participation, which can assist in enhancing the central nervous system and accelerate healing, by understanding patients' goals, relying on machine learning to support the training of basic skills, and establishing the patient's enthusiasm to exercise deliberately. This paper provides a review of state-of-the-art robotic rehabilitation models, which can serve as a useful basis for future research in this domain. As a result of rapid technological advancements, self-driving vehicles have been successfully tested. Future advancement in this technology will merge machine learning with rehabilitation robots and improved feedback mechanisms, which would help make thousands of lives better.
Table 1 Comparison of state-of-the-art robotic rehabilitation for the upper arm (each entry lists the problem addressed, the working, the machine learning/deep learning used, and the result)

1. Problem addressed: Examine impact of a robot-assisted approach for upper-limb treatment of muscle control post-stroke. Working: A robotic manipulator connected to a prosthesis provides pressure to the injured limb while moving. ML/DL used: –. Result: After one month of therapy, the robot group outperformed the control group in the proximal mobility component of the Fugl-Meyer exam.

2. Problem addressed: Assisting rehabilitation of upper-limb capabilities in stroke patients. Working: The robotic component ADAPT supports patients in a range of practical functional activities like stretching in an intensive, active, and adaptive manner. ML/DL used: –. Result: The mechanics of the functional activities are simulated by a general-purpose robot, which displays those challenges to the patients; the effectiveness of ADAPT for treatment has not been validated with a healthy subject.

3. Problem addressed: Patient progress assessment in the assigned rehabilitation process, referred to as computer-aided evaluation of treatment and rehabilitation. Working: New framework for evaluating home-based restoration, including development of metrics for measuring movement performance. ML/DL used: Neural network; deep learning-based modeling for mapping between acquired data and scorecards. Result: This method provides an average absolute deviation per exercise of 0.02527.

4. Problem addressed: Cochrane evaluation combining systematic literature reviews of therapies for upper limb function. Working: Reviews, both Cochrane and non-Cochrane, of the efficiency of therapies to enhance limb function post-stroke. ML/DL used: –. Result: Provided specific suggestions for future study by combining all relevant review information.

5. Problem addressed: Better diagnostic framework for assessing limb movement deficits following brain trauma. Working: The ARM guide can be used to assess a variety of neurological dysfunctions, such as aberrant tone, loss of coordination, and slowness. ML/DL used: –. Result: Early findings with stroke patients show that such treatment can yield quantitative effects.

6. Problem addressed: The purpose is to bring automated electrical stimulation depending on muscle condition. Working: Combines two types of rehabilitation, movement therapy and electrotherapy, to obtain the benefits of both methods. ML/DL used: –. Result: The newly created wearable low-cost robotic system was found to be very effective.

7. Problem addressed: Evaluating stroke patients' upper limb motor control and properly monitoring their rehabilitation status. Working: A desktop upper limb treatment robot was developed, along with a quantifiable assessment method for upper limb motor performance. ML/DL used: Back propagation neural network (BPNN), K-nearest neighbors (KNN), and support vector regression (SVR) methods were used to create three alternative statistical assessment methods. Result: Among the three models, the BPNN approach has the best evaluation performance, with a scoring accuracy of 87.1%.

8. Problem addressed: Adaptable upper-limb prosthesis based on low-impedance torque-controlled elastic motors. Working: The requisite power and torque control capability is provided by customized modular series elastic actuators. ML/DL used: –. Result: No other system gives comparable flexibility for a wide variety of tests in one piece of equipment.

9. Problem addressed: Evaluate the usefulness of the MERLIN robotic system for upper limb therapy. Working: The system consists of ArmAssist, a system based on realistic games created by TECNALIA, and the Antari Home Care system, to electronically monitor, manage, and adapt the patients' regular training. ML/DL used: –. Result: The ArmAssist assessment yielded an average score of 6 out of 7.

10. Problem addressed: Development of a home rehabilitation system with an off-the-shelf wristwatch and a convolutional neural network (CNN). Working: Physiotherapists see the timing information of a patient's home exercise and determine the most precise activity for the patient. ML/DL used: Convolutional neural networks; the algorithm determines the rehab exercise. Result: The accuracy of motion detection was highest for accelerometer data paired with gyroscope data (99.9%).

11. Problem addressed: Examinations looked for UL impairments, movement capability, activity performance in regular living, and other possible modifying factors. Working: Tracing the post-stroke evolution of the upper limb (UL) and its interactions with other elements; patients were examined at different weeks after their first stroke. ML/DL used: –. Result: A 3-parameter logistic model fitted better, reflecting the rapid development following stroke within the extended data collection process.

12. Problem addressed: A high-order pseudo-partial derivative-based model-free adaptive iterative learning controller (HOPPD-MFAILC) is provided. Working: The dynamics of the PAM are converted into a dynamic linearization model, and a high-order method is used to estimate the pseudo-partial derivative element. ML/DL used: –. Result: HOPPD-MFAILC can track the intended path with improved convergence and tracking efficiency.

13. Problem addressed: Development of a platform based on game elements as a source of motivation for physiotherapy patients. Working: Serious games record 3D human physical data and save it in the patient's record; a VR-based method for computerized motor control. ML/DL used: –. Result: An important therapeutic instrument for tele-rehabilitation, minimizing patients' time spent in clinics and their costs.

14. Problem addressed: The learnt correction output, which adjusts for nonlinear abnormalities, is added to the nominal input. Working: A 6-DOF commercial robotic manipulator arm traces a route in a three-dimensional area and tracks several types of standard paths. ML/DL used: Two control approaches, RL-based input compensation and RL-based reference compensation, based on reinforcement learning (RL). Result: When compared to PD, MPC, and ILC, the proposed RL-based techniques provide a significant performance boost.

15. Problem addressed: A neural-fuzzy adaptive controller for a medical prosthesis that supports human muscle control using a radial basis function network. Working: The therapeutic robot's structural architecture and electronic control system, which has 7 operational DOF. ML/DL used: RBFN-based NFAC algorithm presented to ensure path tracking precision. Result: The RBFN-based NFAC methodology accomplishes reduced motion tracking error and good frequency response characteristics.

16. Problem addressed: Machine learning-based system to determine clinical scores from wearable sensor data obtained during functional activities. Working: Data obtained via wearable sensors enables doctors to track the rehabilitation trajectory with minimal effort on their part and that of their patients. ML/DL used: Linear regression and a random forest classifier are used to estimate the score. Result: The coefficients of determination for upper-limb disability and movement quality were 0.86 and 0.79.

17. Problem addressed: "Rehab-Net," a deep learning framework for identifying three upper limb motions of the person's arm: extension, flexion, and rotation of the elbow. Working: Rehab-Net is developed with a convolutional neural network (CNN) model that categorizes the three movements from tri-axial acceleration data collected from the arm. ML/DL used: A convolutional neural network categorizes the three movements. Result: Using semi-naturalistic input, it reached an overall accuracy of 97.89%.

18. Problem addressed: Efficient algorithm to forecast the assessment score realistically using wearable sensors and machine learning techniques. Working: Accelerometer data was collected from patients in unrestricted contexts for an eight-week period using sensors, and the model translates the accelerometer data into the evaluation. ML/DL used: A longitudinal mixed-effects model with Gaussian process prior (LMGP) was used to model the unpredictable effects. Result: The model gives a root mean square error of 3.12.

19. Problem addressed: Elbow angle assessment, and a novel, marker-less, and cost-effective scheme that requires only an RGB camera to estimate the elbow angle in real time. Working: A cup-to-mouth motion was conducted while simultaneously measuring the angle with an RGB camera and a Microsoft sensor. ML/DL used: –. Result: In the coronal and sagittal planes, the marker-less and cost-effective RGB camera shows average RMS errors of 3.06° and 0.95°.

20. Problem addressed: Assessment of machine learning-based strategies for robot-assisted limb therapy. Working: The present state of upper-limb robots is first discussed; the design and implementations of MLAs are then evaluated. ML/DL used: Includes algorithms such as convolutional neural networks, support vector machines, neural networks, recurrent neural networks, etc. Result: The article summarizes MLAs for robotic limb recovery.
References 1. Chen X, Guan Y, Shi JQ, Du XL, Eyre J (2020) Automated stroke rehabilitation assessment using wearable accelerometers in free-living environments. arXiv:2009.08798 2. Qassim HM, Wan Hasan WZ (2020) A review on upper limb rehabilitation robots. Appl Sci 10(19):6976 3. Lum PS, Burgar CG, Shor PC, Majmundar M, Van der Loos M (2002) Robot-assisted movement training compared with conventional therapy techniques for the rehabilitation of upper-limb motor function after stroke. Arch Phys Med Rehabil 83(7):952–959 4. Choi Y, Gordon J, Kim D, Schweighofer N (2009) An adaptive automated robotic task-practice system for rehabilitation of arm functions after stroke. IEEE Trans Rob 25(3):556–568 5. Liao Y, Vakanski A, Xian M (2020) A deep learning framework for assessing physical rehabilitation exercises. IEEE Trans Neural Syst Rehabil Eng 28(2):468–477 6. Pollock A, Farmer SE, Brady MC, Langhorne P, Mead GE, Mehrholz J, van Wijck F (2014) Interventions for improving upper limb function after stroke. Cochrane Database Syst Rev (11) 7. Reinkensmeyer DJ, Kahn LE, Averbuch M, McKenna-Cole A, Schmit BD, Rymer WZ (2014) Understanding and treating arm movement impairment after chronic brain injury: progress with the ARM guide. J Rehabil Res Dev 37(6):653–662 8. Bouteraa Y, Abdallah IB (2016) Exoskeleton robots for upper-limb rehabilitation. In: 2016 13th International multi-conference on systems, signals & devices (SSD). IEEE, pp 1–6 9. Miao S, Shen C, Feng X, Zhu Q, Shorfuzzaman M, Lv Z (2021) Upper limb rehabilitation system for stroke survivors based on multi-modal sensors and machine learning. IEEE Access 9:30283–30291 10. Zimmermann Y, Forino A, Riener R, Hutter M (2019) ANYexo: a versatile and dynamic upper-limb rehabilitation robot. IEEE Robot Autom Lett 4(4):3649–3656 11. Guillén-Climent S, Garzo A, Muñoz-Alcaraz MN, Casado-Adam P, Arcas-Ruiz-Ruano J, Mejías-Ruiz M, Mayordomo-Riera FJ (2021) A usability study in patients with stroke using MERLIN, a robotic system based on serious games for upper limb rehabilitation in the home setting. J Neuroeng Rehabil 18(1):1–16 12. Chae SH, Kim Y, Lee KS, Park HS (2020) Development and clinical evaluation of a web-based upper limb home rehabilitation system using a smartwatch and machine learning model for chronic stroke survivors: prospective comparative study. JMIR Mhealth Uhealth 8(7):e17216 13. Lang CE, Waddell KJ, Barth J, Holleran CL, Strube MJ, Bland MD (2021) Upper limb performance in daily life approaches plateau around three to six weeks post-stroke. Neurorehabil Neural Repair 35(10):903–914 14. Ai Q, Ke D, Zuo J, Meng W, Liu Q, Zhang Z, Xie SQ (2019) High-order model-free adaptive iterative learning control of pneumatic artificial muscle with enhanced convergence. IEEE Trans Industr Electron 67(11):9548–9559
15. Oña ED, Balaguer C, Jardón A (2018) Towards a framework for rehabilitation and assessment of upper limb motor function based on serious games. In: 2018 IEEE 6th International conference on serious games and applications for health (SeGAH). IEEE, pp 1–7 16. Pane YP, Nageshrao SP, Kober J, Babuška R (2019) Reinforcement learning based compensation methods for robot manipulators. Eng Appl Artif Intell 78:236–247 17. Wu Q, Wang X, Chen B, Wu H (2018) Development of an RBFN-based neural-fuzzy adaptive control strategy for an upper limb rehabilitation exoskeleton. Mechatronics 53:85–94 18. Adans-Dester C, Hankov N, O’Brien A, Vergara-Diaz G, Black-Schaffer R, Zafonte R, Bonato P (2020) Enabling precision rehabilitation interventions using wearable sensors and machine learning to track motor recovery. NPJ Digit Med 3(1):1–10 19. Panwar M, Biswas D, Bajaj H, Jöbges M, Turk R, Maharatna K, Acharyya A (2019) Rehab-net: deep learning framework for arm movement classification using wearable sensors for stroke rehabilitation. IEEE Trans Biomed Eng 66(11):3026–3037 20. Yahya M, Shah JA, Warsi A, Kadir K, Khan S, Izani M (2018) Real time elbow angle estimation using single RGB camera. arXiv:1808.07017 21. Ai Q, Liu Z, Meng W, Liu Q, Xie SQ (2021) Machine learning in robot assisted upper limb rehabilitation: a focused review. IEEE Trans Cogn Dev Syst
Chapter 7
Performance Analysis of Classic LEACH Versus CC-LEACH
Lakshmi Bhaskar and C. R. Yamuna Devi
L. Bhaskar (B) · C. R. Yamuna Devi
Department of Telecommunication Engineering, Dr. AIT, Bangalore, India
e-mail: [email protected]; [email protected]

1 Introduction

The main design goals in a WSN concern how, where, and when data is handled, corresponding to the aggregation function, the routing scheme, and the aggregation schedule, respectively; these can be further categorized by the requirements under each heading. In data aggregation, the algorithm must be designed around the characteristics of data flow in the network so that packets are forwarded appropriately and reach their destination or sink. Data aggregation consistently supports a higher quality of service in networks. Networks can be classified as homogeneous or heterogeneous; heterogeneous networks are suggested for a wide range of applications, and for such networks the routing algorithms can be tree-based, cluster-based, or centralized. A cluster-based network is selected here for aggregation and routing of data packets, since it contributes to efficient energy usage, node lifespan, throughput, and network lifetime. In any WSN application that performs data aggregation with cluster-based algorithms, cluster head (CH) election is the crucial task: if the characteristics of the sensor nodes are not defined properly, the election drastically affects network lifetime. Hence, clustering methods can be revisited to divide and organize sensor nodes into several clusters. A cluster head is defined for each cluster so that the other active nodes can rely on it for transmitting data to a suitable destination. The advantages of clustering are its scalability and functionality. Data aggregation through clustering reduces the energy consumed during transitional states of nodes; this effective utilization of the energy available at sensor nodes further boosts
network lifetime. The node-characteristic parameters are necessary to select an eligible CH. The proposed algorithm is tested with 4-kbit data packets and random initial energy in order to predict the lifetime of the active sensor nodes in the network.
2 Literature Work

Algorithms such as the Cluster-based Data Aggregation Scheme (CDAS), the Bandwidth Efficient Cluster-based Data Aggregation (BECDA) scheme, the energy-aware clustering algorithm EADC, Cluster-based Compressed Data Aggregation and Routing (CCDAR), LEACH, etc. can be considered for improving network lifetime and throughput, and AI-based algorithms can be used alongside them; examples include Particle Swarm Optimization, neural networks, genetic algorithms, and Ant Colony Optimization. Artificial intelligence-based techniques help improve network lifetime and throughput: Ant Colony Optimization can be used for improved routing in wireless sensor networks, and PSO is a good option for clustering to select the best cluster head in the network [1]. Such methods can be applied at various stages of a WSN for data processing. LEACH solves most of the problems faced by traditional protocols. Seedha Devi et al. [2] suggested a cluster-based data aggregation scheme for reducing packet loss and latency in WSN. The method has two phases. The first, the aggregation tree construction phase, applies a compressive function that aggregates the data packets received from clusters: a minimum spanning tree (MST) is produced by the base station based on information from each cluster, and the base station allocates the data transmission intervals for each CH. The second phase is the slot scheduling algorithm: the aggregated data is classified by the CH into high-priority and low-priority data; higher-priority data is assigned a time slot based on priority, while lower-priority data is queued initially and allocated time slots after the higher-priority data has been transmitted. This process reduces packet loss and ensures reliable packet delivery. The method minimizes end-to-end delay along with transmission overhead, and it also optimizes the energy consumed by the nodes, because packet retransmission and superfluous waiting can be reduced or nullified, while improving the network lifetime. Subedi et al. [3] analyzed a 100-node network in which nodes were randomly distributed over a 100 * 100 network area with the base station at a central location. A channel bandwidth of 1 Mbps was considered, with each data message 500 B long and a 25-B packet header for each type of packet. The simulation starts with the setup phase, where the cluster areas are defined randomly depending on the density of nodes within the network, and each node is tested against the condition below:
$$T(i) = \begin{cases} \dfrac{p}{1 - p\left(r \bmod \dfrac{1}{p}\right)}, & i \in G \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
If a node satisfies Eq. (1), in which T(i) is the threshold value, p is the probability of the node being cluster head, and r is the iteration number, it is considered a cluster head. Network nodes then communicate through their cluster heads rather than individually; this reduces the energy consumed in the network and also smooths the packet flow. A centralized K-Means algorithm was added, which has high stability in the setup phase and aids a longer network lifetime once a cluster head reverts to a normal node. A few methods, such as the multi-hop route discovery algorithm [4], aim to reduce delay and enhance lifetime. Algorithms such as DELCA and LCA [5] and GPSR [6] are unsuitable for heterogeneous wireless sensor networks, though they are efficient at optimizing consumed energy, network lifetime, duplication, etc., and a broadcasting problem exists in some wireless sensor networks. This work studies the behavior of the CC-LEACH algorithm for the aggregation of data in heterogeneous networks. The focus of this study is to implement and run a set of simulation tests on the CC-LEACH protocol using a Python simulator, so as to improve the energy consumed and the lifetime of the network in a WSN, and to compare its performance with classic LEACH. The energy efficiency of the LEACH [7] protocol is commonly obtained by assigning the cluster-head role on a rotation basis consistently among the available nodes; however, classic LEACH neglects the energy remaining in a sensor node when a CH is selected. This may result in picking CHs with the least available energy, causing the early death of the cluster head and deterioration of the network. The LEACH protocol separates the lifecycle of a network into several iterations, and each iteration has two phases, a setup phase and a steady phase, used for cluster formation and then packet transmission from each CH to the sink. The setup phase starts by partitioning the network field into a number of clusters; CHs are then allocated to each cluster through random selection. A node can become a cluster head only when its random number is less than the threshold value T(i), and it then serves as the CH for that iteration. The just-selected CHs transmit an advertisement signal to the other nodes to announce themselves as the CHs for the current round, after which ordinary nodes join the cluster with the strongest signal. The last stage of the iteration is the steady phase, in which every CH uses time-division multiple access (TDMA) to assign an interval to each of its nodes for transmitting their sensed data to the cluster head. Hence, the energy required for transmitting packets in each round is variable. Nodes that are not cluster heads are allotted a lower energy amplification level, while a higher amplification level is allotted to the CH for the current iteration, since it requires more energy than the regular nodes for receiving packets from its nodes and other clusters, accumulating the data, removing unwanted repeated data,
and forwarding data packets to the intended recipient. In consecutive iterations, if a previous CH becomes a regular node, the classic LEACH algorithm switches that node from the higher amplification energy level back to the lower one. This is illustrated in Fig. 1.

[Fig. 1 Flow diagram of the LEACH algorithm for WSN data aggregation: initialise the network parameters; create a random network and configure the sensors; initialise node energy levels; send the advertisement message and broadcast the CHs to all sensors; start data packet transmission from CH to sink after aggregation; data sent to BS]
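As an illustration of the threshold-based election just described, here is a minimal Python sketch implementing Eq. (1) and the per-round CH draw. The node representation and function names are illustrative, not taken from the authors' simulator.

```python
import random

def leach_threshold(p: float, r: int) -> float:
    # Eq. (1): T(i) = p / (1 - p * (r mod (1/p))) for nodes still eligible (i in G)
    return p / (1 - p * (r % round(1 / p)))

def elect_cluster_heads(nodes, p, r):
    """A node in G becomes CH for round r when its uniform random draw
    falls below T(i); it then sits out until the epoch of 1/p rounds ends."""
    t = leach_threshold(p, r)
    heads = []
    for node in nodes:
        if node["eligible"] and random.random() < t:
            node["eligible"] = False   # a node serves as CH once per epoch
            heads.append(node["id"])
    return heads

nodes = [{"id": i, "eligible": True} for i in range(100)]
print(elect_cluster_heads(nodes, p=0.10, r=3))
```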
3 Proposed Algorithm

To improve the energy efficiency and network lifetime of the LEACH algorithm, a new approach is proposed that divides the network area into circular clusters to extend the life of the WSN compared with classic LEACH. The process flow is illustrated in Fig. 2. Node initialization, the setup phase, and the steady phase remain the same as in the LEACH protocol; in addition, cluster formation and the election of cluster heads are performed for circular clusters, and finally data is transmitted by connecting the two nearest nodes so as to reach the destined sink or base station. The circular grouping is performed based on the condition in Eq. (2):

$$d = \frac{W}{\sqrt{p \cdot n}} \qquad (2)$$
[Fig. 2 Process flow for circular clustering of nodes in CC-LEACH: initialisation of nodes; election of cluster heads; formation of circular clusters; grouping of nodes; data transmission]
The diameter 'd' of each circular cluster is evaluated as the ratio of the network width 'W' to the number of segments on each side, √(p·n), where 'p' is the probability of a node being cluster head and 'n' is the total number of nodes. The selection of cluster heads in the first round is the same as in the LEACH protocol: every node picks a random number and compares it with the threshold in Eq. (1). The later clustering steps follow the process flow in Fig. 2, and a short sketch of the grouping is given below. The results obtained for this data aggregation process are consolidated and depicted graphically, and the parameters considered are tabulated. The results confirm that CC-LEACH lowers the energy consumed and extends the network lifetime, as illustrated in Figs. 3 and 4 and Figs. 5 and 6, respectively. The main flow of the data aggregation process resembles LEACH, but the cluster formation and the cluster head election differ. The proposed work was simulated on the Python platform; the parameters considered for the work are listed in Table 1.
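Under the same illustrative conventions, the sketch below evaluates the Eq. (2) diameter and groups nodes into circular clusters. Placing the circles on a regular grid of cells of side d is an assumption made for the example; the paper does not spell out the exact placement procedure.

```python
import math

def circular_cluster_diameter(width: float, p: float, n: int) -> float:
    # Eq. (2): d = W / sqrt(p * n)
    return width / math.sqrt(p * n)

def group_into_circular_clusters(positions, width, p, n):
    """Assign each node (x, y) to the grid cell of side d that contains it;
    each cell stands in for one circular cluster of diameter d."""
    d = circular_cluster_diameter(width, p, n)
    clusters = {}
    for node_id, (x, y) in enumerate(positions):
        clusters.setdefault((int(x // d), int(y // d)), []).append(node_id)
    return d, clusters

# Paper's parameters: W = 1000, p = 0.10, n = 100 nodes
positions = [(i * 97 % 1000, i * 389 % 1000) for i in range(100)]
d, clusters = group_into_circular_clusters(positions, 1000, 0.10, 100)
print(f"cluster diameter = {d:.1f}, occupied clusters = {len(clusters)}")
```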
4 Results

4.1 Results for Energy Consumed

The existing LEACH algorithm keeps the network alive with sufficient energy for 30–75 iterations given an initial energy of 2 J. A comparative analysis against the classic LEACH algorithm is generated with the above parameters set for the network design
Fig. 3 Energy consumed by classic LEACH and CC-LEACH for 50 nodes
Fig. 4 Energy consumed by classic LEACH and CC-LEACH for 100 nodes
and simulated on the Python platform; the comparison is performed at 2 J with 50 and 100 nodes in the network. Comparative results for the parameters assumed in Table 1 are shown in the graphs below. The energy consumption graph in Fig. 3, simulated for LEACH and CC-LEACH with 50 nodes in a network, shows that in the LEACH algorithm the energy fades away within 70 iterations, whereas in CC-LEACH it lasts beyond 80
Fig. 5 Network lifetime of classic LEACH at 5th and 20th iteration
Fig. 6 Network lifetime of classic LEACH and CC-LEACH at 5th and 20th iteration
Table 1 Parameters for the data aggregation process used in LEACH and CC-LEACH

| Name of the parameter | Parameter value |
|---|---|
| Area of the network | 1000 * 1000 sqm |
| Initial energy | 2 J |
| Transmitter energy | 100 nJ |
| Receiver energy | 100 nJ |
| Probability of a node to be a CH | 10% |
| Data aggregation energy | 10 nJ |
| Data packet size | 4000 bits |
| Number of packets sent in steady-state phase | 10 |
iterations, an improvement in network behavior of around 12%. The energy consumption graph in Fig. 4, simulated for LEACH and CC-LEACH with 100 nodes in a network, shows that in the LEACH algorithm the energy fades away within 40 iterations, whereas in CC-LEACH it lasts up to 60
iterations, an improvement in network behavior of around 15% when the network is populated by 100 nodes.
4.2 Results for Network Lifetime

Figure 5 illustrates the network lifetime of classic LEACH applied to a network; the behavior is shown for a varied number of nodes (50 through 100 in steps of 10) at the 5th and 20th iterations. In the initial iterations the nodes fade away gradually, but by the 20th iteration they fade away drastically. These illustrations were simulated on the Python platform. In Fig. 6 a comparative analysis between classic LEACH and CC-LEACH is illustrated in two graphs for a network with 50 and 100 nodes at the 5th and 20th iterations, respectively. It shows an improvement of about 10–15% in the number of alive nodes available for transmission, i.e., an enhanced network lifetime with CC-LEACH.
5 Conclusions

A juxtaposition of classic LEACH and the proposed method is depicted in the above graphs. The analysis of these graphs shows that the amount of energy consumed by the proposed technique is less than that of classic LEACH, i.e., the method retains energy for a longer duration, conserving energy and increasing the network lifetime. The network-lifetime analysis (Figs. 5 and 6) likewise shows the network staying alive longer, an increase of about 15–20% compared to the existing LEACH algorithm. This process can be further improved to enhance throughput and other metrics by modifying the network parameters; in this way the quality-of-service parameters of a WSN can be further improved through appropriate data aggregation processes. The simulation was performed on the Python platform and can be progressed toward real-time implementation.
References

1. Kumar H, Singh PK (2018) Comparison and analysis on artificial intelligence based data aggregation techniques in wireless sensor networks. In: International conference on computational intelligence and data science (ICCIDS 2018). Elsevier Ltd. https://doi.org/10.1016/j.procs.2018.05.002
2. Seedha Devi V, Ravi T, Baghavathi Priya S. Cluster based data aggregation scheme for latency and packet loss reduction in WSN. Comput Commun. https://doi.org/10.1016/j.comcom.2019.10.003
3. Subedi S, Lee S, Lee J (2018) A new LEACH algorithm for the data aggregation to improve the energy efficiency in WSN. Int J Internet Broadcast Commun 10(2):68–73. https://doi.org/10.7236/IJIBC.2018.10.2.11
4. Yamuna Devi CR, Sunder D, Manjula SH, Venugopal KR, Patnaik LM (2014) Multi-hop routing algorithm for delay reduction and lifetime maximization in wireless sensor networks. IJREAT Int J Res Eng Adv Technol 2(3). ISSN: 2320-8791
5. Min J-K, Ng RT, Shim K (2019) Efficient aggregation processing in the presence of duplicately detected objects in WSNs. Hindawi J Sens, Article ID 1496208, 15 pp. https://doi.org/10.1155/2019/1496208
6. Johny Elma K, Meenakshi S (2018) Energy efficient clustering for lifetime maximization and routing in WSN. Int J Appl Eng Res 13(1):337–343. ISSN 0973-4562
7. Heinzelman W, Chandrakasan A, Balakrishnan H (2000) Energy-efficient communication protocol for wireless sensor networks. In: Proceedings of the Hawaii international conference on system sciences, Hawaii, January 2000
Chapter 8
Anomaly Detection in the Course Evaluation Process
Vanishree Pabalkar, Ruby Chanda, and Anagha Vaidya
1 Introduction

The use of analytics in the traditional educational setup has grown steadily in recent years: advances in e-technology have increased data collection, and digitization of the admission and evaluation processes has become inevitable. The most critical use of analytics, however, is studying the effects of the various pedagogical enhancements introduced in student learning environments. In a traditional setup, student performance is measured by the marks scored in the final written examination rather than by the level of skills imparted as part of the teaching and learning process. Owing to qualitative changes in the educational environment, continuous evaluation systems have been adopted in most universities. This paper proposes an analytical model to address challenges in the education evaluation system such as identifying gaps or defects/discrepancies in the evaluation system, profiling student performance based on skills, assessing the impact of a previous degree/experience on student performance, and identifying relational patterns between internal and external evaluation. The different techniques for anomaly detection in subject evaluation are demonstrated on student performance data.
V. Pabalkar (B) · R. Chanda · A. Vaidya
Symbiosis Institute of Management Studies, Symbiosis International (Deemed University), Pune, India
e-mail: [email protected]
2 Literature Review

Anomaly detection has long been an area of interest for researchers. There is great demand for it in compliance processes, security, health and medical risk assessment, etc., and it plays a major role in data mining and machine learning as well. In recent times, concepts like deep learning, NLP, AI, and IoT have exhibited abundant potential: complex data can be represented expressively and made simple to understand through these approaches. In recent years, the educational data mining (EDM) and learning analytics (LA) communities have developed approaches for working with educational data. Learning analytics refers to the measurement, collection, analysis, and reporting of data about the progress of learners and the contexts in which learning takes place [1]. It is used to predict student success by examining how and what students learn and how success is supported by academic programs and institutions [2]. These analyses are supported by educational data mining. EDM uses multiple analytical techniques to understand relationships, structures, patterns, and causal pathways within complex educational datasets [3]. It is used for predicting students' academic progression, learning style, performance growth, and skill evaluation. EDM data is hierarchical and non-independent in nature, hence it uses traditional DM algorithms, unsupervised machine learning algorithms, and statistical techniques [4, 5]. Different data mining techniques, such as clustering [6], classification [7], regression, neural networks, and decision trees [8, 9], are also used for finding learning patterns. With these techniques, EDM aims to develop models that improve the learning experience and institutional effectiveness. Various methods have been proposed, applied, and tested in the field of EDM [10], and anomalies may exist while developing these models. Anomalies refer to non-conforming behavior patterns, termed differently depending on the application as outliers, exceptions, discrepancies, abnormalities, etc. [11]. Anomaly detection is the discovery of previously unknown abnormal behavior in a given data set; this behavior may or may not be harmful, and detecting it provides significant and critical information in various applications. It is performed using different data mining techniques, machine learning, and various hybrid approaches [12]. It serves two purposes: first, to remove unwanted objects before any data analysis is performed on the data; second, to detect unobserved patterns [13]. With respect to this purpose, anomaly detection is classified into three types: point anomalies, contextual anomalies, and collective anomalies. In point anomalies, individual data instances are considered abnormal with respect to the rest of the data. Contextual anomalies are measured with respect to contextual and behavioral attributes [14]. Collective anomalies are a subset of instances that occur together as a collection and whose occurrence is not regular with respect to normal behavior [15]. Anomaly detection is a significant problem that has been researched within diverse research areas and application domains, and many techniques have been developed specifically for it. It is used extensively for identifying frauds in the banking sector [16], credit cards [17], mobile phone fraud detection, and insurance claim fraud detection
[18], as well as for segregating defective components in a manufacturing environment [19], finding intrusions in network services [20], and locating malignant tumors in MRI images [21–23]. The literature survey shows that anomaly detection has been applied only sparingly to educational data and has greater scope there, as demonstrated in this paper. Artificial intelligence is an emerging technology used in varied fields and domains, and AI in education is still in its nascent stage. AI in education is characterized by two major data-centric fields that apply machine learning to education and research: EDM (educational data mining) and LA (learning analytics). When educational data is explored and extracted, it is called educational data mining; pattern-based analysis of educational/instructional data is called learning analytics. Previous studies have conducted anomaly detection through unsupervised techniques, and during the last decade unsupervised detection has received major attention. One such study proposed a new anomaly detection method that tunes and regulates the parameters without predefining them. Similar studies were conducted recently to detect anomalies when evaluating in-class learning processes using distance- and density-based machine learning approaches; that study used an experimentation method on a class evaluation score data set, and the results showed that the density approach outperformed the accuracy of the traditional k-Nearest Neighbor (k-NN) in detecting anomalous data.
3 Data and Methodology

The dataset comprises student marks from the last four semesters of a master's programme in business management at a university. Students can opt for different specializations such as marketing, finance, human resource, etc., as explained in Table 2. The subjects are divided into 'core' and 'general' subjects, and a student can take a major specialization in one stream and a minor specialization in another. Each subject is evaluated out of 60 marks, composed of different test skills listed in Table 1. The students' previous grade 10 and grade 12 scores are also considered for data-labeling purposes. The data coding is explained in Fig. 2. The proposed framework, a framework model for the evaluation system, is explained below. Educational data mining has a technological focus on adapting, enhancing, and devising data mining techniques to support actionable analytics on the data available in the educational field. An analytical model spells out the complete systematic process, from data collection through analytical activities to actions. Figure 1 shows the four components of the proposed framework model. The proposed model uses the data collected through existing data collection processes and thus requires thorough pre-processing before it can support analytics. The analytical activities then identify patterns and trends in the data. One of the significant steps in the analysis is detecting anomalies in the evaluation process. A second significant activity is profiling students based on different skill sets: this can be used to identify weaknesses so that appropriate remedial actions can be taken.
Table 1 Coding for different test styles

| Test type | Test ID |
|---|---|
| Class test | CT |
| Project | PR |
| Viva/oral | VV |
| Multiple choice questions | MQ |
| Case studies | CS |
| Assignments | AS |
| Research papers | RP |
| Book reviews | BR |
| Presentations | PT |
| Quiz | QT |
| Lab tests | LT |
| Class participation | CP |
| Total | TO |

Table 2 Subject specialization

| Specialization code | Specialization |
|---|---|
| HR | Human resource management |
| MK | Marketing management |
| FN | Financial management |
| SY | Computer management |
| OM | Operational management |
| GE | General subject |

Categories of subject: C, Core/Compulsory subject; O, Optional subject.
The strengths of the students in various subjects and skills can be used for recommending suitable streams or jobs/internships. Predictive analytics can be used to establish relationships between previous degree marks and the current performance of a student, and to measure and control learning interactions such as the course evaluation process. Some of the steps of the model highlighted in Fig. 1 are demonstrated below on actual data collected at a management institute.
3.1 Data Encoding and Cleaning

The input to the process is the data collected as part of the admission and evaluation processes at the management institute, covering student performance in various tests
Fig. 1 Proposed analytical model
conducted for varied subjects over the four semesters, previous student history, and information about faculty, including their teaching responsibilities. The major activity is encoding and cleaning the data from the available sources and making it amenable to analytics. The students are assessed in varied ways apart from class tests, such as presentations of projects undertaken, books and papers reviewed, and research proposals prepared. Table 1 shows the coding used for the different test types. As students progress through the four semesters, a large set of subjects is covered, categorized according to streams; some subjects are mandatory and some are optional. Table 2 represents the encoding used for the specializations (streams) and Table 3 that for the subjects, while the encoding used for each subject is presented in Fig. 2. The data set used for experimental analysis consists of the marks of 314 students in the first semester across different tests in 15 subjects for the 2015–17 batch. A hedged sketch of the encoding step is given after this overview.

Methodology used for the identification of discrepancies in data. Several activities follow once the data is ready for analysis; the R tool along with its packages was used for the analysis. Several techniques help in identifying irregularities, discrepancies, or anomalies in the data. Most techniques attach anomaly scores so that the anomalies can be ranked, while others label entities as anomalous or provide a visual representation for human experts to decide the label. The techniques can be descriptive statistics, or supervised, semi-supervised, or unsupervised learning. In the next section, experimental results on anomaly detection in the course evaluation process are presented: an analytical study of student performance in different subjects can be used to identify discrepancies or anomalies in the evaluation process. Table 4 summarizes the different techniques used; each is discussed in further detail below.
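As a hedged illustration of the encoding step, the following pandas sketch maps raw test-type and subject labels onto the codes of Tables 1 and 3. The column names and raw rows are hypothetical, and only codes actually listed in the tables are used.

```python
import pandas as pd

# Hypothetical raw export: one row per (student, subject, test type)
raw = pd.DataFrame({
    "student_id": [1, 1, 2],
    "subject": ["Organizational behavior", "Human resource management",
                "Organizational behavior"],
    "test_type": ["Class test", "Project", "Presentations"],
    "marks": [18, 15, 12],
})

TEST_CODES = {"Class test": "CT", "Project": "PR", "Presentations": "PT"}   # Table 1
SUBJECT_CODES = {"Organizational behavior": 101,
                 "Human resource management": 107}                          # Table 3

encoded = raw.assign(
    test_id=raw["test_type"].map(TEST_CODES),
    sub_code=raw["subject"].map(SUBJECT_CODES),
)
print(encoded)
```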
Table 3 Subject code (intermediate rows elided in the original)

| Subject name | Sub code | Core/Optional | Group code |
|---|---|---|---|
| Organizational behavior | 101 | Core | GEN |
| … | … | … | … |
| Human resource management | 107 | Core | HRM |
| … | … | … | … |
| Advanced statistics | 201 | Core | GEN |
| … | … | … | … |
| Financial engineering and analytics | 220 | Optional | Finance |
| … | … | … | … |
| Customer relationship management | 310 | Core | MKT |
| … | … | … | … |
| Corporate governance and ethics | 401 | Core | GEN |
[Fig. 2 Subject encoding]
Table 4 Different techniques used for anomaly detection in the course evaluation process

| Sr. No. | Instrument used for detection | Measurement parameters | Type of learning technique | Measurement type |
|---|---|---|---|---|
| 1 | Violin plot | Visual representation | Statistical | To be labeled by a human expert |
| 2 | Statistical summary | Five-point summary and other descriptive statistical measures | Statistical | To be labeled by a human expert |
| 3 | Isolation tree | Path length/anomaly score | Unsupervised | Ranked list |
| 4 | K-nearest neighbor | Distance matrices | Unsupervised | Ranked list |
| 5 | Cluster analysis and confusion matrix | Accuracy and error rate | Supervised | Labeled based on thresholds |
| 6 | Probability density function | P-value measurement | Semi-supervised | Labeled based on thresholds |
The descriptive statistics mainly represent the spread, shape, and unusual features of the data. A statistical summary in the form of a violin plot can prove to be a very effective tool for visualizing and comparing evaluations carried out on different
Fig. 3 Student performance in different subjects using violin plots
subjects. Violin plots present a lot of summary information as well as the data distribution in an appealing visual form. A violin plot includes within it a thin box plot together with a mirrored kernel density plot that shows the peaks in the data. It brings to the fore deviations or discrepancies such as bimodality, outliers, and skewness, and it shows whether the data values are clustered around the median (shown by a thin white dot) or around the minimum and maximum values. The violin plot in Fig. 3 presents the students' performance in the different subjects of semester I for one batch. Most plots show a roughly normal distribution of the data; data massed at one end, or bimodality, is treated as anomalous. Among the violin plots in Fig. 3, the distribution for the subjects "RM" and "HRM" is bimodal, meaning most students score very high marks, many score very low marks, and very few are in the middle range. The marks for "OR" and "AE" are normally distributed in the interval [30, 70], as are the "FA" and "BS" marks with a slightly wider range. The peaks of "BE", "OM", "MM", "ME", "HRM", and "BS" are flatter, indicating that many students score close to the maximum, while for "BC" and "LAB" the base of the graph is flat and the peak is sharp, indicating that many students scored low marks while few scored high marks in the subject. Since performance in a subject depends on the several tests conducted for it, the graph can be drilled down further to view a test-wise summary in each subject, as in Fig. 4. The descriptive statistics of Fig. 4 can be used for anomaly detection: the mean and median of all three criteria are close, indicating a similar central tendency and a roughly normal distribution, while the distribution of marks in the "Class test" is concentrated in the upper tail, in the "Assignment" in the lower tail, and in the "Presentation" in the middle. This indicates that the presentation marks may be judgmental. In Fig. 5, the plots of a single test type, "Class Test", across all subjects can be compared.
The class test data is left-skewed, meaning most of the scores fall toward the maximum; the flat top indicates that many students have high scores. The violin plots or the statistical summary can be presented to human experts to allow them to label anomalies, and the visual representation can also be used by the faculty to understand the flaws and take appropriate actions. A hedged plotting sketch follows.
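The paper's plots were produced in R; an equivalent Python/seaborn sketch for drawing per-subject violin plots is shown below. The subject labels and marks here are placeholder values, not the study's data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Long-format marks: one row per (subject, student total)
marks = pd.DataFrame({
    "subject": ["RM"] * 4 + ["OR"] * 4 + ["FA"] * 4,
    "total":   [58, 12, 55, 10, 40, 45, 50, 38, 35, 48, 52, 44],
})

ax = sns.violinplot(data=marks, x="subject", y="total", inner="box")
ax.set_title("Student performance per subject")
plt.show()
```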
Fig. 4 Violin plots for class test, assignment, and presentation
[Table 5 Quantitative data classification: Types, Data collection, Data analysis, Examples, Advantages; cell contents not recoverable]
Fig. 5 Violin plot description
3.2 Anomaly Detection Using Isolation Trees

Anomalies, being rare occurrences, can be easily isolated, as they differ in their characteristics from normal data. In the Isolation Forest algorithm, binary search trees are constructed for each feature, and anomalies are expected to sit closer to the root of these isolation trees than normal data. The Isolation Forest algorithm has linear time complexity and low memory requirements. An isolation-tree package in R is used, which provides both a path length and a normalized anomaly score. The plot suggests that the evaluation in some subjects is very different. To get an anomaly score for each subject, the features of the subject must be considered. Since only the marks obtained by around 315 students in each subject are available, it was necessary to extract a small set of features that collectively represent this knowledge. Evidently not all elements are responsible for anomalous behavior, so the features must be chosen appropriately; several options for the feature set are therefore considered. One is the five-point summary consisting of the minimum, maximum, median, and the first and third quartiles. R provides several packages that give a broader set of characteristic statistical features. Three
different packages ‘Hmisc’, ‘Pastecs,’ and ‘Psych’ are used for features and run the isolation forest algorithm on it. Table 6 shows the path length and anomaly scores for a different set of features. A threshold value of anomaly score is used for labeling anomalies, and the identified anomalies are highlighted. Quantitative data that had the quantification of data collected. It is through a deductive approach. In this more focus is on testing the theory. Subject ‘102’ is detected as anomalous by all the three approaches while subjects ‘101, ‘103, ‘107’, ‘110’. ‘113’ and ‘115’ appear as normal in all of them.
3.3 Anomaly Detection Using Nearest Neighbor

Since anomalies are rare and isolated occurrences, the nearest-neighbor approach can be used to identify them. The anomaly score can be the average distance to the k nearest neighbors or, in the case of k = 1, the distance to the nearest neighbor. As the distance is computed over the feature set, a large value indicates that the data point is far removed from the others, while similar data receive low anomaly scores. For the dataset of marks in the different subjects, the five-point summary was used as the feature set, and the anomaly scores were computed as nearest-neighbor distances. Table 7 shows the resulting distance matrix between subjects, with the anomalies highlighted. Figure 6 is a dendrogram based on the distance matrix, showing the grouping between subjects and the isolated subjects. The results indicate subjects '102', '104', '106', and '109' as anomalous. A minimal sketch of this scoring follows.
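A minimal sketch of this nearest-neighbor scoring, with an illustrative feature matrix:

```python
import numpy as np

def knn_anomaly_scores(X: np.ndarray, k: int = 1) -> np.ndarray:
    """Anomaly score = mean distance to the k nearest neighbors
    (the distance to the single nearest neighbor when k = 1)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # exclude self-distance
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)

X = np.array([[12, 30, 41, 52, 60],
              [ 2,  8, 15, 55, 60],
              [15, 32, 40, 50, 58]], dtype=float)
print(knn_anomaly_scores(X))    # a large score marks an isolated subject
```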
3.4 Anomaly Detection Using Classification

Semi-supervised methods require normal data for building the model. This normal data is used to tag students into three classes: good, average, and poor. For each of the remaining subjects, the classification of students into good and poor (leaving out the average students) is compared, and the accuracy and error rates are then used to identify anomalies. The normal data can be chosen using domain knowledge or, as here, by treating the five subjects that uniformly appear as normal in the isolation tree method as normal data. This data is used to form three clusters, which are treated as the actual classification. This is then compared with the classification given by an individual subject, giving rise to a confusion matrix as shown in Fig. 7; the respective clusters are shown in Figs. 8 and 9, and Table 8 reports the confusion matrix for subject 102. The average students in the middle (second) cluster, marked red, are ignored. The precision is calculated by considering the correctly classified data in the two extreme classes, one and three (poor and good students). The error rate is similarly computed by considering misclassifications between class one
[Table 6 Anomaly scores using the isolation forest algorithm for different feature sets. For each subject Sub101–Sub115 the table lists a path value and an anomaly score under three feature sets: (i) minimum, first quartile, median, third quartile, maximum; (ii) median, mean, mean standard error, confidence interval of the mean, variance, standard deviation, coefficient of variation; (iii) mean, standard deviation, median, truncated mean, median absolute deviation, minimum, maximum, range, skew, kurtosis, standard error. The individual numeric cells are not recoverable from the extraction; subjects exceeding the anomaly-score threshold are highlighted in the original.]
[Table 7 Distance metrics: a symmetric 15 × 15 matrix of pairwise distances between subjects Sub101–Sub115, with a final row of per-column minima; the anomalous subjects are highlighted in the original. The numeric cells are not recoverable from the extraction.]
Fig. 6 Dendrogram showing the isolation of anomalous subject
and class 3. So, for the confusion matrix of subject 102 in Table 8, precision = (40 + 35)/(40 + 35 + 33 + 5) and error rate = (33 + 5)/(40 + 35 + 33 + 5) (Fig. 7). The accuracy and error rate for the different subjects are listed below, with anomalies highlighted by considering a precision below the threshold of 0.7 and an error rate above the threshold of 0.2.

Fig. 7 Confusion matrix
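A quick numeric check of these formulas on the Table 8 confusion matrix for subject 102:

```python
# Rows/columns 1 and 3 are the extreme (poor/good) classes; class 2 is ignored
cm = [[40, 23, 5],
      [56, 52, 18],
      [33, 52, 35]]

correct = cm[0][0] + cm[2][2]      # correctly classified poor and good students
wrong   = cm[2][0] + cm[0][2]      # misclassifications between classes 1 and 3
precision  = correct / (correct + wrong)
error_rate = wrong / (correct + wrong)
print(f"precision = {precision:.3f}, error rate = {error_rate:.3f}")
# precision ≈ 0.664, error rate ≈ 0.336
```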
Accuracy and error rate per subject:

| Subject name | Accuracy | Error rate |
|---|---|---|
| sub101 | 0.855421686746988 | 0.144578313253012 |
| sub102 | 0.742268041237113 | 0.257731958762887 |
| sub103 | 0.958677685950413 | 0.0413223140495868 |
| sub104 | 0.862068965517241 | 0.137931034482759 |
| sub105 | 0.675 | 0.325 |
| sub106 | 0.702970297029703 | 0.297029702970297 |
| sub107 | 0.907216494845361 | 0.0927835051546392 |
| sub108 | 0.898876404494382 | 0.101123595505618 |
| sub109 | 0.853932584269663 | 0.146067415730337 |
| sub110 | 0.938144329896907 | 0.0618556701030928 |
| sub111 | 0.558441558441558 | 0.441558441558442 |
| sub112 | 0.777777777777778 | 0.222222222222222 |
| sub113 | 0.936363636363636 | 0.0636363636363636 |
| sub114 | 0.87962962962963 | 0.12037037037037 |
| sub115 | 0.879120879120879 | 0.120879120879121 |
Fig. 8 Cluster diagram—cluster on training data set
Fig. 9 Anomaly detection
Table 8 Confusion matrix (error rate) for subject 102

| Cluster No. | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 40 | 23 | 5 |
| 2 | 56 | 52 | 18 |
| 3 | 33 | 52 | 35 |
3.5 Anomaly Detection Using Probability Density Value

For anomaly detection using the probabilistic method, normal data is used as a reference point to build the probabilistic model. It does not matter much if a few values in this normal data are anomalous, as the computations average out the discrepancies. In this example, the data of the five normal subjects was used. The next step is to choose a set of features; here the five-point summary (minimum, lower quartile, median, upper quartile, maximum) was used as the feature set. For each feature, the mean and standard deviation are computed from the normal data. The probability density value (P-value) is then computed using the formula in Fig. 10. The minimum P-value over the subjects in the normal data is used as the threshold value ε. For the remaining data, the P-value is computed, and a subject is anomalous if its P-value is less than ε. Tables 9 and 10 give the P-values for the data, with the anomalies highlighted in red in the original.

Fig. 10 Formula for P-value calculation
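The formula itself is not reproduced in the extracted text. The standard choice for this kind of semi-supervised scheme, and presumably what Fig. 10 shows, is the product of per-feature Gaussian densities; this reconstruction is an assumption, not a quotation from the figure:

```latex
p(x) \;=\; \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi\sigma_j^{2}}}\,
\exp\!\left(-\frac{(x_j-\mu_j)^{2}}{2\sigma_j^{2}}\right)
```

where x is a subject's feature vector (here the five-point summary, m = 5) and μ_j, σ_j are the mean and standard deviation of feature j estimated from the normal data.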
Table 9 P-values on the training data set

| Sr. No. | Subject name | P_val |
|---|---|---|
| 1 | Sub101 | 2.063E−06 |
| 2 | Sub103 | 3.456E−07 |
| 3 | Sub107 | 1.3E−07 |
| 4 | Sub110 | 8.284E−07 |
| 5 | Sub113 | 6.813E−07 |
| 6 | Sub115 | 6.703E−07 |
Table 10 P-values on the test data set

| Subject name | P_val |
|---|---|
| Sub102 | 3.095E−12 |
| Sub104 | 2.942E−10 |
| Sub105 | 6.5E−07 |
| Sub108 | 1.836E−08 |
| Sub109 | 7.038E−15 |
| Sub106 | 2.277E−08 |
| Sub114 | 8.465E−08 |
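Applying the stated rule directly to the tabulated values, the sketch below takes ε as the minimum training P-value and flags any test subject whose P-value falls below it:

```python
# P-values copied from Tables 9 and 10
train = {"Sub101": 2.063e-06, "Sub103": 3.456e-07, "Sub107": 1.3e-07,
         "Sub110": 8.284e-07, "Sub113": 6.813e-07, "Sub115": 6.703e-07}
test  = {"Sub102": 3.095e-12, "Sub104": 2.942e-10, "Sub105": 6.5e-07,
         "Sub108": 1.836e-08, "Sub109": 7.038e-15, "Sub106": 2.277e-08,
         "Sub114": 8.465e-08}

epsilon = min(train.values())                      # threshold from normal data
anomalous = sorted(s for s, p in test.items() if p < epsilon)
print(f"epsilon = {epsilon:.3e}, anomalous: {anomalous}")
```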
Summary of anomaly detection across all techniques (√ = anomalous subject, 0 = non-anomalous subject):

| Sr. No. | Technique used | sub101 | sub102 | sub103 | sub104 | sub105 | sub106 | sub107 | sub108 | sub109 | sub110 | sub111 | sub112 | sub113 | sub114 | sub115 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Violin plot | 0 | √ | 0 | 0 | 0 | √ | 0 | 0 | 0 | 0 | √ | 0 | √ | 0 | 0 |
| 2 | Statistical anomaly detection | 0 | √ | 0 | 0 | 0 | √ | 0 | 0 | 0 | 0 | √ | 0 | 0 | 0 | 0 |
| 3 | Isolation forest | 0 | √ | 0 | 0 | √ | √ | 0 | 0 | 0 | 0 | 0 | √ | 0 | 0 | 0 |
| 4 | K-NN | 0 | √ | 0 | √ | 0 | √ | 0 | 0 | √ | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Confusion metrics | 0 | √ | 0 | √ | 0 | √ | 0 | √ | 0 | 0 | √ | 0 | 0 | 0 | 0 |
| 6 | Probability density function | 0 | √ | 0 | 0 | 0 | 0 | 0 | 0 | √ | 0 | 0 | 0 | 0 | 0 | 0 |

[Fig. 11 All criteria marks for sub 102]
3.6 Actionable Analytics

The different techniques help in identifying anomalies, and there are variations in their results. It would be unfair to declare an evaluation anomalous without strong supporting evidence, but the evaluation for subject '102' appears consistently anomalous across multiple techniques. One can drill down and look at the violin plots for the individual evaluation criteria used for the subject: the violin plot in Fig. 11 clearly indicates discrepancies in the scores for the class test, case study, and presentation. Summary information across all techniques is given in the table above, where sub102 is highlighted as an anomalous subject evaluation.
4 Discussion and Conclusion

Although a number of varied tools were used in this paper, the choice of a specific tool depends on the type of data; tools also serve to determine and explain the algorithm used, which in turn depends on the type of algorithm developed for the data under study. EDM provides a set of techniques that can be conveniently applied to the large data collected as part of the evaluation process so that remedial actions can be taken.
References

1. Ali L, Hatala M, Gašević D, Jovanović J (2012) A qualitative evaluation of evolution of a learning analytics tool. Comput Educ 58(1):470–489
2. Mattingly KD, Rice MC, Berge ZL (2014) Learning analytics as a tool for closing the assessment loop in higher education. Knowl Manag E-Learn: Int J 4(3):236–247
3. Romero C, Ventura S (2010) Educational data mining: a review of the state of the art. IEEE Trans Syst Man Cybern C Appl Rev 40(6):601–618
4. Baker RS, Yacef K (2009) The state of educational data mining in 2009: a review and future visions. JEDM-J Educ Data Min 1(1):3–17
5. Mohsin MFM, Norwawi NM, Hibadullah CF, Wahab MHA. Mining the student programming performance using rough set; Detecting anomalous user behavior using an extended isolation forest algorithm: an enterprise case study. arXiv:1609.06676
6. Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Futur Gener Comput Syst 55:278–288
7. Bolton RJ, Hand DJ (2001) Unsupervised profiling methods for fraud detection. Credit Scoring Credit Control VII:235–255
8. Fawcett T, Provost F (1999) Activity monitoring: noticing interesting changes in behavior. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 53–62
9. Keogh E, Lin J, Lee SH, Van Herle H (2007) Finding the most unusual time series subsequence: algorithms and applications. Knowl Inf Syst 11(1):1–27
10. Ni X, He D, Ahmad F (2016) Practical network anomaly detection using data mining techniques. VFAST Trans Softw Eng 9(2):1–6
11. Rodger JA (2015) Discovery of medical big data analytics: improving the prediction of traumatic brain injury survival rates by data mining patient informatics processing software hybrid hadoop hive. Inform Med Unlocked 1:17–26
12. Pang G, Shen C, Cao L, Hengel A (2020) Deep learning for anomaly detection: a review
13. Baker RSJ, Siemens G (2014) Educational data mining and learning analytics. In: Sawyer K (ed) Cambridge handbook of the learning sciences, 2nd edn. Cambridge University Press, New York, NY, pp 253–274
14. Romero C, Ventura S (2010) Educational data mining: a review of the state of the art. IEEE Trans Syst Man Cybern Part C Appl Rev 40:601–618
15. Agrawal S, Agrawal J (2015) Survey on anomaly detection using data mining techniques. Procedia Comput Sci 60:708–713
16. Gogoi P, Borah B, Bhattacharyya DK (2010) Anomaly detection analysis of intrusion data using supervised & unsupervised approach. J Converg Inf Technol 5(1):95–110
17. Patcha A, Park JM (2007) An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput Netw 51(12):3448–3470
18. Buczak AL, Guven E (2016) A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tutor 18(2):1153–1176
19. Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126
20. Goldstein M, Dengel A (2012) Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. KI-2012: Poster Demo Track 59–63
21. Géryk J, Popelínský L, Triščík J (2016) Visual anomaly detection in educational data. In: International conference on artificial intelligence: methodology, systems, and applications. Springer International Publishing, pp 99–108
22. Baek C, Doleck T (2020) A bibliometric analysis of the papers published in the journal of artificial intelligence in education from 2015–2019. Int J Learn Anal Artif Intell Educ
23. Sahin M, Yurdugül H (2019) Educational data mining and learning analytics: past, present and future. Eğitsel veri madenciliği ve öğrenme analitikleri: dünü, bugünü ve geleceği. Bartin Univ J Fac Educ 9(1):121–131. https://doi.org/10.14686/buefad.606077
Chapter 9
Self-attention-Based Efficient U-Net for Crack Segmentation
Shreyansh Gupta, Shivam Shrivastwa, Sunny Kumar, and Ashutosh Trivedi
1 Introduction

Cracks provide essential information about the health, safety, and serviceability of a structure [1]. All types of buildings, including concrete walls, beams, slabs, road pavements, and brick walls, are prone to cracking [2]. In concrete structures, cracks can allow corrosive and harmful chemicals to penetrate the structure, which may internally damage or corrode it. Therefore, to reduce the workload of facilitators and experts, an automated crack detection method is direly needed. Nowadays, artificial intelligence (AI) is widely used to solve varied problems [3]: from mathematical and engineering problems [4] to problems in health care [5] and finance [6], AI is used to automate tasks and solve problems everywhere. Accordingly, a deep learning-based automatic crack segmentation technique is proposed in this paper. The proposed method detects and segments the cracks present in a surface accurately, making the detection and distress quantification of cracks more efficient and economical. The proposed architecture was inspired by EfficientNet [7], self-attention [8], and U-Nets [9]; it improves upon previous state-of-the-art methods by improving the encoding process with EfficientNet and introducing a self-attention-based module into the encoder-decoder architecture. The following are the contributions of the present study.
S. Gupta (B) · S. Shrivastwa · S. Kumar · A. Trivedi
Department of Civil Engineering, Delhi Technological University, Delhi, India
e-mail: [email protected]; [email protected]
1. A self-attention module is introduced for crack detection and segmentation. The self-attention mechanism gives the architecture the ability to detect fine-grained features in an image, which helps to increase the efficiency of the overall architecture.
2. A scalable and efficient pre-trained encoder based on EfficientNet is used to extract essential features from the image and convert the image into a latent, feature-rich vector.
3. A U-Net-inspired decoder with skip connections is used to increase the retentive power of the proposed model and convert the latent vectors obtained from the encoder into a pixel-wise probability map.

The remainder of the paper is organized as follows. The literature review section describes the work done prior to the current study in the fields of crack segmentation and crack detection. The methodology section explains the architecture used in the present study. The experimentation and results section describes the experimental setup, the dataset used, and the results obtained after training the proposed model on the dataset, and compares its performance with other state-of-the-art models. The last section presents the conclusion and future work.
2 Literature Review

With the advancement of computer programming, machine learning, and many other technologies, recent decades have witnessed a great deal of research on image-based crack detection and segmentation, leading to fully automated solutions. Before most of the work toward automated crack detection, [10] utilized a three-step method for crack detection in underground pipelines: first the contrast of the image is enhanced and the image is transformed to grayscale (pre-processing); then, to separate cracks from the background, a cross-section curvature evaluation is performed; and finally the output of the curvature evaluation is passed through a linear filter to obtain a segmentation map. The main shortcomings of this work were that the technique failed for images shot in dimly lit areas and produced false detections in images where background paths and cracks cross each other. Oliveira and Correia [11] proposed passing the image through a morphological filter for pre-processing to reduce the variance in pixel intensity and applying a dynamic threshold to identify dark regions that might correspond to cracks; for calculating the entropy, the thresholded images are divided into non-overlapping blocks. Oliveira and Correia [11] achieved a recall of 94.8%. Fujita and Hamamoto [12] used a combination of shading correction and binarization, wavelet transforms, and global thresholding to detect cracks in noisy images. Beyond such deterministic systems, which suffer from high computational cost and limited scalability, researchers around the world have proposed more sophisticated, accuracy- and precision-oriented crack detection architectures using DCNNs, such as [9] and [13]. A DCNN crack archi-
tecture based on just 500 images of pavement cracks was proposed by Zhang and Cheng [14]; the images were collected using a smartphone and were of size 3264 * 2488. Wang and Hu [15] developed a model to effectively analyze numerous kinds of cracks. The grid-based deep network model proposed by Zhang and Cheng [14], which could detect cracks in pavement, gave an accuracy of only 90.1% for alligator cracks, which is very low compared with the accuracies for transverse cracks (97.6%) and longitudinal cracks (97.2%). Another deep convolutional neural network that can detect pavement cracks automatically, trained by Gopalakrishnan et al. [16], is transferred from the ImageNet pre-trained VGG16 model, although this single-layer-headed model may need more discussion regarding its generalization capability. A pixel-wise pavement crack detection architecture based on FCN, proposed by Park et al. [17], can perform crack detection on patches; nevertheless, few experiments have been conducted validating the reliability of this method. A VGG19-based [12] encoder-decoder network has been proposed by Yang et al. [18] to produce crack maps on pavements and concrete walls. Even with these state-of-the-art crack detection architectures performing well, some researchers have delved deeper to produce networks that outperform DCNNs such as [9] and [13] on crack segmentation. Zou et al. [19] proposed DeepCrack, a SegNet [13]-based encoder-decoder network with cross-entropy losses across multiple scales. Bang et al. [20] proposed a ResNet-based encoder-decoder network for crack segmentation of roadways in images containing objects generally not present on roads. For concrete surface crack segmentation, an encoder-decoder network with a VGG16-inspired encoder was proposed by Dung and Anh [21]. Zhang et al. [22, 23] and Fei et al. [24] developed the DCNN-trained CrackNet and its variants; these were tested on laser-scanned range images for roadway crack segmentation, in contrast to the intensity images used by the DCNNs mentioned above.
3 Proposed Methodology The architecture of the proposed network is inspired by the U-Net [9] and self-attention [8] architectures. As shown in Fig. 1, the proposed model consists of three parts: an encoder based on EfficientNet [7], which converts the input image into latent vectors; a U-Net-inspired latent feature decoder, which upsamples the latent vectors from the encoder and converts them into pixel-wise probability distributions; and a self-attention module used in the skip connections to the decoder.
Fig. 1 Proposed model architecture: the input is a preprocessed image and the output is a segmentation map. The illustration shows the EfficientNet-b2 encoder with a convolutional decoder and SAL (self-attention layer)
3.1 EfficientNet Encoder Module EfficientNet was proposed to solve the problem of model scaling, i.e., to explore ways to increase accuracy by scaling a model. To solve this problem, [7] came up with a simple architecture based on MobileNet V2 [25] and a scaling technique called compound scaling, which increases the number of parameters in the model to improve accuracy. The compound scaling technique can also be extended to existing convolutional models. Still, the critical part is selecting an appropriate baseline model that achieves good results, because the compound scaling technique increases the predictive capability of the baseline model by replicating the model's cardinal convolutional operations and network architecture. Therefore, [7] used neural architecture search to build EfficientNet-B0, shown in Fig. 2. The primary building block of the EfficientNet network is the MBConv layer with squeeze-and-excitation optimization. The MBConv layer is similar to MobileNet V2's inverted residual blocks: 1 × 1 convolutions are applied to the input activation maps to increase the number of channels in the feature maps, 3 × 3 depth-wise convolutions then filter each channel spatially, and lastly another 1 × 1 convolutional layer brings the number of channels back to its initial value. This architecture helps reduce the number of operations required and the parameters of the model. In this study, we use EfficientNet-B2. The encoder takes an image img_i ∈ R^(C×H×W) as input and gives a latent vector vec_i ∈ R^(320×(H/32)×(W/32)) as output.
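A minimal sketch of this encoder stage is shown below, assuming the timm library is used to obtain the EfficientNet-B2 backbone; the paper does not state which implementation was used, so the model name and API here are illustrative.

```python
# Sketch of the encoder stage: timm's features_only mode returns the
# intermediate feature maps at strides 2, 4, 8, 16 and 32 -- the
# multi-resolution outputs fed to the decoder via the skip connections.
import timm
import torch

encoder = timm.create_model("efficientnet_b2", pretrained=True, features_only=True)

x = torch.randn(1, 3, 256, 256)   # img_i in R^(C x H x W)
feats = encoder(x)                # list of 5 feature maps
for f in feats:
    print(f.shape)                # deepest map has spatial size H/32 x W/32
```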
Fig. 2 EfficientNet-B2 encoder architecture with MBConv layers
3.2 Decoder Module The decoder receives the latent vector and upsamples it to generate a pixel-wise probability map. The decoder contains an upsampling module, composed of upsampling layers and convolutional layers, shown in Fig. 3. To increase the retention power of the proposed architecture, skip (short) connections are added between the encoder and decoder, i.e., outputs of various resolutions from the encoder are fused with the same-resolution outputs of the decoder, as shown in Fig. 3. A sketch of one such decoder stage follows.
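Below is a minimal PyTorch sketch of one decoder stage, assuming bilinear upsampling followed by two 3 × 3 convolutions; the paper does not give exact layer counts or channel widths, so those choices are illustrative.

```python
# One decoder stage: upsample the deeper features, fuse the same-resolution
# encoder output through the skip connection, then refine with convolutions.
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                     # double the spatial resolution
        x = torch.cat([x, skip], dim=1)    # fuse same-resolution encoder output
        return self.conv(x)
```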
3.3 Self-attention Module The self-attention module is inspired by Cordonnier et al. [8]. Its purpose is to enhance sensitivity toward crack boundaries, since self-attention provides a global view of the data. As shown in Fig. 4, the self-attention module projects the input into three vectors: Key (K), Query (Q), and Value (V). The input vector vec_i ∈ R^(C×h×w) is first reshaped into a 2D vector of resolution
Fig. 3 Decoder module with upsampling module and skip connections
Fig. 4 Self-attention module with key (K), query (Q), value (V), and attention matrices and self-attention output
vec_i ∈ R^(C×N), where C is the number of channels in the input vector and N is the product of all other dimensions. The two-dimensional vector is then passed through convolutional layers with 1 × 1 kernels to obtain the K, Q, and V vectors, where K, Q, V ∈ R^(C'×N). The K and Q vectors are multiplied and passed through the softmax activation function (Eq. 1) to obtain the Self-Attention Matrix (SAM). The SAM is then multiplied with the value vector (Eq. 2), passed through another convolutional layer with a 1 × 1 kernel to obtain h ∈ R^(C×N), and reshaped back to the original shape. Figure 4 explains this procedure graphically.

β = Softmax(QK^T)  (1)

h = βV  (2)
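The module can be sketched in PyTorch as follows, implementing Eqs. (1)-(2) literally: 1 × 1 convolutions produce K, Q, and V with C' channels, β = Softmax(QK^T), and h = βV is projected back to C channels. The reduction factor C' = C/8 is an assumption; the paper does not state it.

```python
# Self-attention over a flattened feature map, following Eqs. (1)-(2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        c = max(channels // reduction, 1)            # C' (assumed C/8)
        self.q = nn.Conv1d(channels, c, kernel_size=1)
        self.k = nn.Conv1d(channels, c, kernel_size=1)
        self.v = nn.Conv1d(channels, c, kernel_size=1)
        self.out = nn.Conv1d(c, channels, kernel_size=1)

    def forward(self, x):
        b, ch, h, w = x.shape
        flat = x.view(b, ch, h * w)                  # reshape C x h x w -> C x N
        q, k, v = self.q(flat), self.k(flat), self.v(flat)
        beta = F.softmax(torch.bmm(q, k.transpose(1, 2)), dim=-1)  # Eq. (1)
        att = torch.bmm(beta, v)                     # Eq. (2): h = beta V
        return self.out(att).view(b, ch, h, w)       # 1x1 conv back to C, reshape
```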
4 Experimentation and Analysis In this section, we describe the dataset utilized to test and verify the proposed architecture's efficiency, the metrics used to evaluate the proposed architecture quantitatively, and the training configuration, and we compare the results obtained from
the proposed architecture with other state-of-the-art methods. The dataset utilized for experimentation and comparison with state-of-the-art models is briefly introduced in the first sub-section. The measures used to evaluate the proposed architecture are described in the second sub-section. The third sub-section provides an overview of the training and configuration processes. The fourth sub-section presents the quantitative and qualitative results from the CRACK500 dataset in order to evaluate the suggested architecture.
4.1 Dataset Description Crack500 Dataset: This dataset includes 500 images of pavement cracks with a resolution of 2000 × 1500 pixels, collected at the Temple University campus using a smartphone by Yang et al. [26]. Each image is annotated at the pixel level. Images this large do not fit in GPU memory; therefore, [26] divided each image into 16 smaller patches. The final resolution of each image is 640 × 320, and the dataset has 1896 images in the training set, 348 in the validation set, and 1124 in the testing set. The comparisons with state-of-the-art models were made on the results from the testing set.
4.2 Evaluation Metrics Accuracy (A), F1 Score (F1), Recall (R), Intersection-over-Union (IoU), and Precision (P) metrics [27] are used to quantify the performance of the model proposed in the current study and compare it with state-of-the-art models. 1. Accuracy: It is the percentage of pixels in the output that match the expected pixel class in the ground truth mask. It is the simplest and most primitive metric and is prone to class imbalance. The mathematical representation of accuracy is given below:

Accuracy (A) = (X_TP + X_TN) / (X_TP + X_TN + X_FP + X_FN) × 100  (3)

2. Precision: It is the fraction of pixels classified as crack that actually belong to the crack class, i.e., the number of pixels correctly classified as crack divided by the total number of pixels predicted as crack. Mathematically, precision is defined as shown below:

Precision (P) = X_TP / (X_TP + X_FP)  (4)
3. Recall: The fraction of crack-class pixels that are accurately classified as crack is known as recall. It is defined as the proportion of pixels correctly categorized as crack to the total number of pixels in the crack class. The mathematical definition of recall is shown here:

Recall (R) = X_TP / (X_TP + X_FN)  (5)

4. F1 score: the harmonic mean of recall (R) and precision (P). It gives an overall view of model performance; a high F1 score usually implies both high precision and high recall. It can be mathematically defined as shown below:

F1 score (F1) = 2 × P × R / (P + R)  (6)

5. IoU: Intersection-over-Union is a widely accepted metric for measuring the performance of a segmentation algorithm. It calculates the overlap between the ground truth and predicted masks by dividing the overlap area by the union area:

IoU = (N_P ∩ N_GT) / (N_P ∪ N_GT)  (7)
where X_TP, X_FP, X_TN, X_FN are the numbers of pixels correctly classified as crack, incorrectly classified as crack, correctly classified as non-crack, and incorrectly classified as non-crack, respectively; P and R are precision and recall; and N_P, N_GT, N_P ∩ N_GT, N_P ∪ N_GT are the predicted mask, the ground truth mask, the overlap area between them, and the union area between them, respectively.
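A minimal NumPy sketch of Eqs. (3)-(7) for binary crack masks is given below; it assumes the predicted and ground-truth masks are 0/1 arrays of identical shape, and the function name is illustrative.

```python
# Pixel-wise segmentation metrics from a predicted mask and a ground truth mask.
import numpy as np

def segmentation_metrics(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)        # crack pixels predicted as crack
    tn = np.sum(~pred & ~gt)      # non-crack pixels predicted as non-crack
    fp = np.sum(pred & ~gt)       # non-crack pixels predicted as crack
    fn = np.sum(~pred & gt)       # crack pixels predicted as non-crack
    eps = 1e-8                    # avoids division by zero on empty masks
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100          # Eq. (3)
    precision = tp / (tp + fp + eps)                           # Eq. (4)
    recall = tp / (tp + fn + eps)                              # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall + eps)   # Eq. (6)
    iou = tp / (tp + fp + fn + eps)                            # Eq. (7)
    return dict(A=accuracy, P=precision, R=recall, F1=f1, IoU=iou)
```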
4.3 Training Configuration All the experiments were conducted in the PyTorch framework [28] on Google Colaboratory Pro, using a Tesla P100 GPU. The model was trained using the Stochastic Gradient Descent (SGD) optimizer with an initial learning rate of 0.01, a one-cycle learning rate scheduler, and model hyperparameters chosen according to the available resources. The learning rate of 0.01 ensured that the loss does not overshoot the global minimum during backpropagation and that the proposed model is trained effectively in fewer training iterations. The data was preprocessed with a combination of primary image augmentations (random rotate, random flip, and random equalize) applied using the Torchvision library. All experiments were conducted under major resource constraints. The resource constraints limited
us to a lighter version of the EfficientNet encoder and limited the number of training iterations of the model. Techniques such as automatic mixed precision and gradient accumulation were used to train the proposed architecture, which affected the results. Improvements might be obtained by tuning these variables given better resources, such as faster GPUs, more disk space, and more RAM.
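A hedged sketch of this training setup is shown below: SGD with an initial learning rate of 0.01, a one-cycle schedule, Torchvision augmentations, automatic mixed precision, and gradient accumulation. The loss, epoch count, and accumulation steps are assumptions, and `model` and `train_loader` are placeholders for the network and the CRACK500 data loader.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentations named in the text (parameters are assumed values).
augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomHorizontalFlip(),
    transforms.RandomEqualize(),
])

criterion = nn.BCEWithLogitsLoss()            # assumed segmentation loss
epochs, accum_steps = 50, 4                   # assumed values
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01,
    total_steps=epochs * len(train_loader) // accum_steps)
scaler = torch.cuda.amp.GradScaler()

for epoch in range(epochs):
    for step, (images, masks) in enumerate(train_loader):
        with torch.cuda.amp.autocast():       # automatic mixed precision
            loss = criterion(model(images.cuda()), masks.cuda()) / accum_steps
        scaler.scale(loss).backward()         # scaled gradients accumulate
        if (step + 1) % accum_steps == 0:     # gradient accumulation
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            scheduler.step()
```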
4.4 Results and Comparative Analysis The proposed model was trained on the training split of the CRACK500 dataset containing 1896 images provided by Yang et al. [26] under the conditions described in Sect. 4.3. The evaluation and comparative analysis for the CRACK500 dataset were done on the test split provided by Yang et al. [26], containing 1124 images, and are presented in Table 1. From Table 1, it is clear that the proposed architecture performs best in terms of F1 score (0.7749) and gives comparable results on precision, recall, and IoU (0.8178, 0.74619, and 0.6159, respectively). We also observe that, unlike the other compared models, where precision and recall differ widely (i.e., one of the two has a very high value, indicating a bias toward the non-crack class due to class imbalance in the datasets), the proposed model achieves comparably high precision and recall (indicated by the high F1 score), therefore indicating no bias toward any particular class. Figure 5 illustrates our results on a few samples from the test set and shows how our model predicts the presence of a crack (model predictions are at the right, while the ground truths are at the center).
Table 1 Results on CRACK500 dataset compared with state-of-the-art models

Paper             Accuracy   Precision   Recall   F1 score   IoU
[29]              −          0.592       0.512    0.517      0.370
[30]              −          0.581       0.582    0.556      0.399
[31]              −          0.695       0.674    0.689      −
[32]              −          0.713       0.695    0.705      −
[33]              −          0.736       0.716    0.729      −
[34]              −          0.742       0.728    0.732      −
[26]              −          0.589       0.771    0.657      0.497
[35]              −          0.851       0.718    0.756      0.656
Proposed method   97.4       0.763       0.790    0.775      0.663
Fig. 5 Qualitative results on Crack500 dataset. From left RGB image, ground truth map, model output (from proposed architecture)
5 Conclusion and Future Work In the present research study, we propose an effective and efficient deep-learning-based solution for surface crack segmentation, capable of aiding surveyors and construction professionals around the world and of helping automate the process of crack detection and repair. The authors showed the merits of using the self-attention module and EfficientNet encoding. The efficacy of the proposed architecture was shown by evaluating it on the Crack500 benchmark dataset, where it beats all other state-of-the-art models by achieving an F1 score of 0.775 and an IoU of 0.663. In the future, experiments can be conducted on incorporating multi-headed self-attention and a patching strategy.
References 1. Mohan A, Poobal S (2018) Crack detection using image processing: a critical review and analysis. Alexandria Eng J 57(2):787–798 (2018). ISSN: 1110-0168 2. Velumani P et al (2020) Analysis of cracks in structures and buildings recent citations analysis of cracks in structures and buildings. J Phys 12:116. https://doi.org/10.1088/1742-6596/1706/ 1/012116 3. Kumar A et al (2020) Crack detection of structures using deep learning frame-work. In: Proceedings of the 3rd international conference on intelligent sustainable systems, ICISS 2020, pp 526–533. https://doi.org/10.1109/ICISS49785.2020.9315949
4. Papadrakakis M et al (2016) Optimization of large-scale 3-d trusses using evolution strategies and neural networks. 14:211–223. https://doi.org/10.1260/0266351991494830. ISSN: 02663511 5. Gupta S et al (2020) Instacovnet-19: a deep learning classification model for the detection of covid-19 patients using chest X-ray. Appl Soft Comput 106–859. https://doi.org/10.1016/j. asoc.2020.106859. ISSN: 15684946 6. Goodell JW et al (2021) Artificial intelligence and machine learning in finance: identifying foundations, themes, and research clusters from bibliometric analysis. J Behav Exp Finance 32:100–577. https://doi.org/10.1016/j.jbef.2021.100577. ISSN: 2214-6350 7. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning. Proceedings of machine learning research, vol 97, PMLR, pp 6105–6114 8. Cordonnier J-B et al (2019) On the relationship between self-attention and convolutional layers 9. Ronneberger O et al (2015) U-net: convolutional networks for biomedical image segmentation. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 9351, pp 234–241. ISSN: 16113349 10. Iyer S, Sinha SK (2005) A robust approach for automatic detection and segmentation of cracks in underground pipeline images. Image Vis Comput 23:921–933. https://doi.org/10.1016/j. imavis.2005.05.017. ISSN: 02628856 11. Oliveira H, Correia P (2009) Automatic road crack segmentation using entropy and image dynamic thresholding. IEEE. ISBN: 978-161-7388-76-7 12. Fujita Y, Hamamoto Y (2011) A robust automatic crack detection method from noisy concrete surfaces. Mach Vis Appl 22:245–254. https://doi.org/10.1007/s00138-009-0244-5. ISSN: 09328092 13. Badrinarayanan V et al (2015) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39:2481–2495. https://doi.org/10. 1109/TPAMI.2016.2644615. ISSN: 01628828 14. Zhang K, Cheng H (2017) A novel pavement crack detection approach using pre-selection based on transfer learning. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics). LNCS, vol 10666, pp 273–283. https://doi.org/10.1007/978-3-319-71607-7_24. ISSN: 16113349 15. Wang X, Hu Z (2017) Grid-based pavement crack analysis using deep learning. In: 2017 4th International conference on transportation information and safety (ICTIS), pp 917–924. https:// doi.org/10.1109/ICTIS.2017.8047878 16. Gopalakrishnan K et al (2017) Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr Build Mater 157:322– 330. https://doi.org/10.1016/J.CONBUILDMAT.2017.09.110. ISSN: 0950-0618 17. Park S et al (2019) Patch-based crack detection in black box images using convolutional neural networks. J Comput Civ Eng 33:04 019 017. https://doi.org/10.1061/(ASCE)CP.1943-5487. 0000831. ISSN: 0887-3801 18. Yang X et al (2018) Automatic pixel-level crack detection and measurement using fully convolutional network. Comput-Aided Civ Infrastruct Eng 33:1090–1109. https://doi.org/10.1111/ MICE.12412. ISSN: 1467-8667 19. Zou Q et al (2019) Deepcrack: learning hierarchical convolutional features for crack detection. IEEE Trans Image Process 28(3):1498–1512 20. 
Bang S et al (2019) Encoder-decoder network for pixel-level road crack detection in blackbox images. Comput-Aided Civ Infrast Eng 34:713–727. https://doi.org/10.1111/MICE.12440. ISSN: 1467-8667 21. Dung CV, Anh LD (2019) Autonomous concrete crack detection using deep fully convolutional neural network. Autom Constr 99:52–58. https://doi.org/10.1016/J.AUTCON.2018.11.028. ISSN: 0926-5805 22. Zhang A et al (2017) Automated pixel-level pavement crack detection on 3d asphalt surfaces using a deep-learning network. Comput-Aided Civ Infrastruct Eng 32:805–819. https://doi. org/10.1111/MICE.12297. ISSN: 1467-8667
23. Zhang A et al (2018) Deep learning-based fully automated pavement crack detection on 3d asphalt surfaces with an improved cracknet. J Comput Civ Eng 32:04 018 041. ISSN: 0887-3801 24. Fei Y et al (2020) Pixel-level cracking detection on 3d asphalt pavement images through deeplearning-based cracknet-v. IEEE Trans Intell Transp Syst 21(1):273–284 25. Sandler M et al (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), June 2018 26. Yang F et al (2020) Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans Intell Transp Syst 21(4):1525–1535 27. Zhou S, Song W (2021) Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Autom Constr 125. ISSN: 09265805 28. Paszke A et al (2019) Pytorch: an imperative style, high-performance deep learning library 29. Xie S, Tu Z (2015) Holistically-nested edge detection. Int J Comput Vis 125:3–18. ISSN: 15731405 30. Liu Y et al (2019) Richer convolutional features for edge detection. IEEE Trans Pattern Anal Mach Intell 41:1939–1946. ISSN: 19393539 31. Nguyen NTH et al (2018) Pavement crack detection using convolutional neural network. In: ACM International conference proceeding series, pp 251–256 32. Fan Z et al (2018) Automatic pavement crack detection based on structured prediction with the convolutional neural network. ISSN: 2331-8422 33. Zhang H et al (2020) Resnest: split-attention networks 34. Lau SL et al (2020) Automated pavement crack segmentation using U-Net-based convolutional neural network. IEEE Access 8:114 892–114 899. ISSN: 21693536 35. Abdellatif M et al (2021) Combining block-based and pixel-based approaches to improve crack detection and localization. Autom Constr 122:103–492. ISSN: 0926-5805
Chapter 10
Lung Carcinoma Detection from CT Images Using Image Segmentation C. Karthika Pragadeeswari , R. Durga , G. Dhevanandhini , and P. Vimala
1 Introduction Cells in the body divide and grow rapidly in cancer. When cancer starts in the lungs, it is referred to as lung carcinoma or lung cancer. According to a WHO survey, it is the second most common cause of cancer death in the world. The most common cause of lung cancer is cigarette smoking; lung cancer is most prevalent in smokers, accounting for 85% of all cases. In recent years, various Computer-Aided Diagnosis (CAD) systems have been developed. Early detection of lung cancer is critical for preventing deaths and increasing survival rates. Lung lumps are small tissues that can be cancerous or non-cancerous, typically called malignant or benign. Lakshmanaprabu et al. [1] proposed an automatic lung cancer diagnosis system based on computed tomography (CT) images of the lungs. They examined CT scans of lung images based on linear discriminant analysis (LDA) and an optimal deep neural network (ODNN). The dimensions of the features retrieved from the CT lung images were subsequently reduced using LDA. The algorithm used binary classification to determine whether the data was benign or cancerous. The ODNN was then applied to CT scans and improved with MGSA to produce a more accurate
technique. The majority of benign tissues are non-cancerous and do not develop rapidly, whereas malignant tissues grow rapidly, can affect other internal organs, and are risky to one's health. Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the two types of lung cancer. The tumor size reveals how the malignant growth is arranged in the lungs. There are four distinct stages of lung cancer: recovery is possible if the tumor is in the first or second stage, it is difficult if the tumor is in the third stage, and the fourth stage is the most dangerous, as there is no chance of removing the lung tumor because the malignant cells have disseminated throughout the body, making recovery extremely difficult. To reduce the death rate from lung cancer, screening with chest radiographs (X-ray), MRI (Magnetic Resonance Imaging), and CT (Computed Tomography) scans should be used. In this research, CT scans are favored since they have less noise. In CT scans, variations in image intensity and misinterpretation by practitioners and radiologists may make it difficult to identify cancerous cells. Many detection techniques have been created, such as the lung cancer detection developed by Furat et al. [2], and several works have been presented in this category. However, some systems do not achieve adequate detection accuracy and need to reach the maximum accuracy level. We examined the most novel cancer detection methods. The most important aspects of the current study are: (1) introducing a new way of detecting lung cancer, utilizing the best available lung computed tomography images for the analysis, and proposing a new model; (2) proposing a new convolutional neural network structure that can be used to diagnose cancer.
2 Literature Review Several researchers have suggested and achieved lung cancer detection utilizing various images and segmentation methodologies. Chen et al. [3] developed an adaptive threshold algorithm, mathematical morphology, and a watershed algorithm to solve challenges encountered when detecting lung illness with a CAD system. The authors presented several steps for image segmentation: first, the noise in the original computed tomography (CT) image was reduced using a Gaussian filter and a gradient enhancement technique; second, the images were segmented using the OTSU method; third, the primary lobe and trachea were removed from the CT lung images; finally, the lung parenchyma region was segmented with an improved watershed transform, with testing done on CT lung images. As a result, these researchers successfully segmented lung cancer. Aggarwal et al. [4] presented a model for distinguishing nodules from normal lung anatomical structures. Geometrical, statistical, and gray-level characteristics
are extracted using this approach. LDA is utilized as a predictor for segmentation, and effective thresholding is applied. The method has an efficiency of nearly 84%, a sensitivity of roughly 98%, and a specificity of 53%, so it identifies the cancer lump with poor precision. Standard segmentation techniques were used for categorization; in conclusion, combining any of its processes into the model does not increase the chances of improvement. Sangamithraa et al. [5] use the K-means unsupervised learning method for segmentation and clustering; it categorizes the pixel dataset based on specified attributes. A back-propagation network is used in this model for classification. Features such as entropy, correlation, homogeneity, PSNR, and SSIM are retrieved using the Gray-Level Co-occurrence Matrix (GLCM) approach. The system is accurate about 90.7% of the time. For noise reduction, a median filter is applied, which can help our new model eliminate noise and enhance accuracy. Using an active contour and fuzzy system model, Roy et al. [6] implemented a model to identify tumor nodules. For image enhancement, this method uses a gray transformation. The image is binarized prior to segmentation, and the results are segmented using the active contour model. The fuzzy inference method is used to classify cancer. The classifier is trained by extracting features such as area, mean, entropy, and major and minor axis length. The method has an overall accuracy of 94.12%. Among its limitations, it does not discriminate between normal and malignant cancers, which is the suggested model's future scope. Several efforts have already been presented to effectively diagnose lung cancer, as evidenced by the literature; however, each has its own set of pros and cons. In this investigation, we seek to employ a deep learning technique to develop a much more efficient method for lung cancer diagnosis. Based on the literature, Sect. 2 examines multiple methods for predicting lung cancer. Section 3 explains the proposed lung cancer prediction method using modified watershed segmentation and CNN. Section 4 describes the effectiveness of the cancer prediction technique, and the research is concluded in Sect. 5.
2.1 Techniques Used for Detection

- Manju et al. [7], 2021: The target classes were classified using the PCA (Principal Component Analysis) algorithm, with an SVM (Support Vector Machine) applied for further classification. Findings: the classification approach provides precision, recall, and F1 score; the confusion matrix gave accuracy and error rates of 0.87 and 0.3, respectively.
- Priyadharshini et al. [8], 2021: A BAT optimization approach with a Convolutional Neural Network was utilized to improve Fuzzy c-means segmentation and provide improved classification results. Findings: the accuracy was 96.43%, with minimal error.
- Kannan et al. [9], 2020: Otsu's thresholding and k-means clustering segmentation methods were used together with a median filter. Findings: compared to X-ray images, Otsu's segmentation algorithm generates strong results on CT and MRI images.
- Shakeel et al. [10], 2019: An instantaneously trained deep learning neural network and an improved profuse clustering method. Findings: a weighted mean function, which replaces pixels using probability and cumulative distribution processes, is computed to improve quality; it provides 98.42% accuracy with a 0.038 error.
3 Methodology Lung CT scans were collected from The Cancer Imaging Archive (CIA) dataset, which included 5043 images in DICOM (Digital Imaging and Communications in Medicine) format, separated into 3000 training and 2043 testing images. The lung CT images went through pre-processing; the following sections discuss the pre-processing, segmentation, feature extraction, and classification processes shown in the Fig. 1 flow diagram. Convolutional Neural Networks (CNNs) have recently become well known in the field of medical imaging technology, and the majority of deep learning applications in cancer detection are built upon CNNs. A convolutional neural network is a deep learning algorithm that takes an input image and assigns importance to each item or component of the image so that it can distinguish them from one another. The CNN algorithm requires less pre-processing than other classification algorithms: whereas in basic methods the filters are hand-engineered, the CNN acquires the capacity to learn these filters/specifications with adequate training. The CNN architecture is similar to the connection design of neurons in the human brain and is inspired by the organization of the visual cortex. Each neuron responds to stimuli only in a restricted region of the visual field known as the receptive field; a set of these regions overlaps to cover the entire visual area. In this research, the application of a modified watershed algorithm and CNN to lung cancer detection has been examined.
Fig. 1 Proposed model flow diagram for lung carcinoma detection: input CT image → image pre-processing (median filter) → modified watershed segmentation → segmented output → feature extraction → CNN with prediction model → classification result
3.1 Image Pre-processing First, the grayscale image of a CT scan is processed using a median filter as part of image pre-processing. During the image capture procedure for CT scans, some disturbances are introduced into the images, which hinder the detection of nodules; hence, these noises have to be removed for exact localization of the cancer. The noise reduction stage should preserve the image's clarity, and especially its edges, as much as practicable. Zhou et al. [11] describe the popular median low-pass filter, which determines each output pixel from the brightness values of nearby pixels: in median filtering, the value of a pixel is given by the median of the neighboring pixels. The median filter has low sensitivity to outlier values, so it can remove them without lowering image resolution. This filter also diminishes light-intensity variation while keeping the shape and position of edges, and it removes salt-and-pepper noise from CT images, as presented by Zhang and Hong [12]. The filter sorts the m × n neighborhood in ascending order, selects the central element of the sorted values, and replaces the middle pixel with it. Likewise, the median filter can effectively eliminate salt-and-pepper noise, as presented in Sharifrazi et al. [13]. Consequently, in this research, we used this filter as the pre-processing phase for the input CT images. In median filtering, each pixel is replaced by the median value of its neighbors:

y(m, n) = median{x(i, j) : (i, j) ∈ τ}  (1)

where τ defines the neighborhood of (m, n).
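A minimal sketch of this pre-processing step, assuming OpenCV is used; cv2.medianBlur replaces each pixel with the median of its neighborhood, which matches Eq. (1). The file path and window size are illustrative.

```python
# Median filtering of a grayscale CT slice to suppress salt-and-pepper noise.
import cv2

ct = cv2.imread("ct_slice.png", cv2.IMREAD_GRAYSCALE)  # illustrative path
denoised = cv2.medianBlur(ct, 3)  # 3x3 window: y(m, n) = median of neighbours
```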
3.2 Modified Watershed Segmentation The proposed modified watershed segmentation technique is used to discover objects or boundaries that aid in detecting a cancer nodule in a CT scan image by locating "catchment basins" or "watershed ridgelines". It treats the image as a surface, with light pixels denoting high elevations and dark pixels denoting low altitudes. Its fundamental strength is the ability to separate and recognize touching objects in an image; according to Song et al. [14], this trait helps when cancer nodules touch other false nodules, which otherwise causes improper segmentation. An image size of 256 × 256 is used. The input RGB image is first transformed to a grayscale image. Then an edge detection mask is used to compute the gradient magnitude of the image; the gradient of an object is low in its interior and high at its border. The foreground objects are then chosen as the dim areas of the object. Next, the morphological procedures "opening-by-reconstruction" and "closing-by-reconstruction" are employed to enhance the maxima inside each image, with an erosion operation used as part of the closing procedure. Opening-by-reconstruction and closing-by-reconstruction do not affect the overall shapes of the objects in the image. A code sketch of this pipeline follows.
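The sketch below outlines a marker-controlled watershed pipeline of the kind described above, using scikit-image; the structuring-element size and the Otsu-based marker selection are assumptions, not the authors' exact parameters.

```python
# Marker-controlled watershed: gradient magnitude as the relief, cleaned
# foreground maxima (via opening-by-reconstruction) as the markers.
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, morphology, segmentation

def watershed_segment(gray):
    # Gradient magnitude: low inside objects, high at their borders.
    gradient = filters.sobel(gray)
    # Opening-by-reconstruction: erode, then reconstruct under the original,
    # which cleans small maxima without changing object shapes.
    eroded = morphology.erosion(gray, morphology.disk(5))   # disk size assumed
    opened = morphology.reconstruction(eroded, gray)
    # Foreground markers from an Otsu threshold on the reconstructed image.
    markers, _ = ndi.label(opened > filters.threshold_otsu(opened))
    # Flood the gradient relief from the markers ("catchment basins").
    return segmentation.watershed(gradient, markers)
```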
3.3 Features Extraction Every precisely articulated object has characteristics or traits that allow us to recognize it quickly, and some features are required to categorize items automatically; the collection of these features is the feature vector. To diagnose lung cancer, we must be able to locate it through such aspects of the image. In the field of image processing, there are three categories of features: structural features, statistical texture features, and spectral texture features. Structural components, also referred to as binary characteristics, include area, centroid, perimeter, orientation, projection, aspect ratio, and Euler number. Statistical features are further divided into first- and second-order statistical texture features: first-order features are extracted directly from the gray-level histogram, whereas second-order texture features are obtained by first determining the co-occurrence matrix and then computing the mean, entropy, and co-variance. Gabor and wavelet features are highly prevalent among spectral characteristics. Area, perimeter, centroid, and diameter characteristics are calculated in this stage and utilized as training features to create a classifier.
3.4 Convolutional Neural Networks Convolutional neural networks can identify and categorize lung cancer stages more accurately and in less time, which is critical for predicting a patient's therapeutic interventions and survival rate. A CNN is a feedforward neural network inspired by biological visual system models, as explained in Khobragade et al. [15]. Hattikatti [16] presented a detailed study of how specific neurons are aligned so that they respond to overlapping regions in their receptive fields, which remains consistent with the structure of modern imaging systems. When neurons with the same parameters are applied at different positions over overlapping sections of the previous layer, translational invariance is established. This makes it possible for CNNs to identify objects in their receptive area regardless of size, position, orientation, or other visual features. In addition, compared to fully connected neural networks, CNNs have less constrained connectivity, which reduces the computing needs of training. Two stages are used to detect lung cancer with a CNN: the first conducts pre-processing functions suited for training and processing images in the CNN, enabling feature extraction, while the second performs classification of the input CT images, determining whether a nodule is benign or cancerous. The CNN is trained on CT images using back-propagation. The process consists of a training phase and a testing phase. In the first phase, the CNN is trained on CT scans, with 1000 images utilized to train the network to categorize the lung as malignant or non-cancerous. During the testing phase, the network is given an unknown image to evaluate as malignant or non-cancerous. Images are trained and evaluated in the DICOM format itself, for minimal loss of features, by altering the network settings so that it can accept DICOM images. In the overall procedure for lung carcinoma detection using watershed segmentation with a CNN, features such as area, perimeter, centroid, and pixel diameter are retrieved from the discovered cancer nodes; the CNN is trained using the extracted features, and a trained model is developed.
4 Result and Discussion This section describes the results of the proposed lung cancer segmentation. The proposed system is implemented in MATLAB 2015R. Lung CT images were collected from internet resources as experimental images from The Cancer Imaging Archive (CIA) dataset, consisting of 5043 DICOM images. For the examination of lung cancer, 5043 images from various patients with 48 series were used. The 5043 images were divided into 2043 testing images and 3000 training images for the evaluation of lung-cancer-related modalities. After splitting the images, the pre-processing, segmentation, feature extraction, and classification processes were carried out using the MATLAB tool and the aforementioned algorithm. MATLAB is used for detection and feature extraction, and the machine learning toolbox is used for classification. The
classification learner toolbox makes it simple and quick to create a trained prediction model from the retrieved characteristics. To avoid overfitting during the training procedure, 5-fold cross-validation was performed. The classifier is trained using 22 different DICOM images, and the result is validated using 6 images containing a total of 15 nodules. The accuracy of segmented lung region images was compared with a number of representative lung cancer images using the dataset described in Table 1.

Table 1 Sample lung cancer segmented images: for each sample, the original image, the median-filtered image, the segmented image, and the classification are shown (Image 1: benign; Images 2-6: malignant)
4.1 Evaluation Criteria The proposed method makes use of three types of performance analysis: accuracy, sensitivity, and specificity, as presented in Pathan and Saptalkar [17]. Accuracy is defined as the percentage of correctly identified segments, i.e., the ratio of correctly predicted segments to the total segments, given by Eq. (2):

Accuracy = (TP + TN) / (TP + FP + FN + TN) × 100  (2)

Sensitivity measures how many positive instances are correctly classified as positive, expressed in Eq. (3), whereas specificity measures the proportion of negatives that are correctly identified, calculated by Eq. (4):

Sensitivity = TP / (TP + FN) × 100  (3)

Specificity = TN / (TN + FP) × 100  (4)
where TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive. The confusion matrix given in Fig. 2 depicts the system's performance. A total of 22 samples are analyzed; the matrix shows that 17 samples are classified correctly and 5 samples are classified incorrectly. In Table 1, the results on CT scans are tabulated: the first column shows the sample grayscale images, and the next column shows the median-filtered image. Fig. 2 Confusion matrix of the classification results
Table 2 The efficiency of the segmentation method for different lung images

Sample image   Accuracy   Sensitivity   Specificity
Image 1        99.6       98.30         98.25
Image 2        98.5       98.01         97.90
Image 3        98.9       98.50         98.40
Image 4        97.3       97.03         97.01
Image 5        98.1       98.02         97.25
Image 6        97.9       97.02         97.05
Fig. 3 The efficiency of the segmentation method for different lung images
The third and fourth columns show the watershed-segmented image and the classification of the image, respectively. The values obtained for the metrics are shown in Table 2; the results depict the system parameters accuracy, specificity, and sensitivity. In this way, the system is implemented, and the efficiency of the segmentation method is shown in Fig. 3. Compared to other classification techniques, deep learning with instantaneously trained neural networks detects lung cancer with a maximum accuracy of 98.45%.
5 Conclusion Lower living standards, including sedentary lifestyles, bad eating habits, and increasing smoking, have caused cancer rates to climb dramatically in recent decades,
prompting researchers and scientists to take action to cure this dangerous disease. Scientists have found that early detection of this condition makes it easier to cure and reduces the risk of death from the disease. In this research, various algorithms were evaluated to detect lung cancer. For the best diagnosis of CT-based lung imaging in medical images, an automatic method based on convolutional neural networks with watershed segmentation was developed. Medical images often need pre-processing before being subjected to statistical analysis; the median filter gives better results than the mean filter because it decreases the light-intensity variance and also removes salt-and-pepper noise easily. The proposed approach uses watershed segmentation for detection and a CNN for classifying a nodule as malignant or benign, to identify the cancerous nodule from the lung CT scan image. Classification accuracy was high because high-level CNN-based characteristics were extracted. Furthermore, in the initial steps, the feature vector size was reduced and its precision was increased, resulting in lower storage requirements and an increase in speed and accuracy.
6 Future Work Future studies will focus on creating image processing techniques for X-ray and PET images to increase accuracy. Clustering with KNN, or fuzzy K-means and C-means, may be used for classification. Furthermore, the proposed technique can also be employed for various cancer kinds, such as breast cancer and skin cancer.
References 1. Lakshmanaprabu SK, Mohanty SN, Shankar K, Arunkumar N, Ramirez G (2019) Optimal deep learning model for classification of lung cancer on CT images. Futur Gener Comput Syst 92:374–382 2. Furat O, Wang M, Neumann M, Petrich L, Weber M, Krill CE III, Schmidt V (2019) Machine learning techniques for the segmentation of tomographic image data of functional materials. Front Mater 6:145 3. Chen X, Feng S, Pan D (2015) An improved approach of lung image segmentation based on watershed algorithm. In: Proceedings of the 7th international conference on internet multimedia computing and service, pp 1–5 4. Aggarwal T, Furqan A, Kalra K (2015) Feature extraction and LDA based classification of lung nodules in chest CT scan images, In: 2015 international conference on advances in computing, communications and informatics (ICACCI), Kochi, India, pp 1189–1193. https://doi.org/10. 1109/ICACCI.2015.7275773 5. Sangamithraa PB, Govindaraju S (2016) Lung tumour detection and classification using EK-Mean clustering. In: 2016 International conference on wireless communications, signal processing and networking (WiSPNET). IEEE, pp 2201–2206 6. Roy TS, Sirohi N, Patle A (2015) Classification of lung image and nodule detection using fuzzy inference system. In: International conference on computing, communication & automation. IEEE, pp 1204–1207
7. Manju BR, Athira V, Rajendran A (2021) Efficient multi-level lung cancer prediction model using support vector machine classifier. IOP Conf Ser: Mater Sci Eng 1012(1):012034. IOP Publishing 8. Priyadharshini P, Zoraida BSE (2021) Bat-inspired metaheuristic convolutional neural network algorithms for CAD-based lung cancer prediction. J Appl Sci Eng 24(1):65–71 9. Kannan V, Naveen VJ (2020) Detection of lung cancer using image segmentation. Int J Electr Eng Technol (IJEET) 2(11):7–16 10. Shakeel PM, Burhanuddin MA, Desa MI (2019) Lung cancer detection from CT image using improved profuse clustering and deep learning instantaneously trained neural networks. Measurement 145:702–712 11. Zhou Y, Lu Y, Pei Z (2021) Accurate diagnosis of early lung cancer based on the convolutional neural network model of the embedded medical system. Microprocess Microsyst 81:103754 12. Zhang Z, Hong WC (2019) Electric load forecasting by complete ensemble empirical mode decomposition adaptive noise and support vector regression with quantum-based dragonfly algorithm. Nonlinear Dyn 98(2):1107–1136 13. Sharifrazi D, Alizadehsani R, Roshanzamir M, Joloudari JH, Shoeibi A, Jafari M, Acharya UR (2021) Fusion of convolution neural network, support vector machine and Sobel filter for accurate detection of COVID-19 patients using X-ray images. Biomed Signal Process Control 68:102622 14. Song S, Jia H, Ma J (2019) A chaotic electromagnetic field optimization algorithm based on fuzzy entropy for multilevel thresholding color image segmentation. Entropy 21(4):398 15. Khobragade S, Tiwari A, Patil CY, Narke V (2016) Automatic detection of major lung diseases using Chest Radiographs and classification by feed-forward artificial neural network. In: 2016 IEEE 1st International conference on power electronics, intelligent control and energy systems (ICPEICES). IEEE, pp 1–5 16. Hattikatti P (2017) Texture based interstitial lung disease detection using convolutional neural network. In: 2017 International conference on big data, IoT and data science (Bid). IEEE, pp 18–22 17. Pathan A, Saptalkar BK (2012) Detection and classification of lung cancer using artificial neural network. Int J Adv Comput Eng Commun Technol 1(1)
Chapter 11
A Deep Learning Based Human Activity Recognition System for Monitoring the Elderly People V. Gokula Krishnan, A. Kishore Kumar, G. Bhagya Sri, T. A. Mohana Prakash, P. A. Abdul Saleem, and V. Divya
1 Introduction Context-aware computing is the method behind the future of ubiquitous and pervasive computing, and it is the way to get there. It helps a ubiquitous application in a self-driving setting make an emergency call or turn the lights on or off when someone enters or leaves an area [1, 2]. Activity recognition helps systems figure out what is going on in the world around them by analyzing data from a variety of monitoring sources and sensors; services are then tailored to common apps. Healthcare applications include things like monitoring the activities of an elderly person and recognizing their activities so that their mental and physical state can be assessed [3]. Many people are interested because HAR is important in fields like pervasive computing and human-computer interaction [4], as well as computer vision and health care. A typical HAR system is made up of different parts, like feature selection, feature extraction, segmentation, and recognition. Feature selection is the most important one because it helps
reduce dimensionality and makes HAR systems more likely to recognize activities correctly. A high-dimensional activity recognition system not only takes a lot of time but also suffers from the well-known curse of dimensionality. So, an effective feature selection method eases the dimensionality problem and speeds up the recognition process. Feature selection has been examined in great detail in the literature, and it helps us understand the data [5]; however, most methods have limitations and can only be used for specific applications. HAR, a widely utilized technique in video processing, is a current hot topic. Smart city management, traffic surveillance, hospital administration, and security systems are just a few of the ways in which the technology may be used to help a city run smoothly. There are a number of factors to consider when selecting key frames for video processing, such as the length and frame dimensions of the training video, which might vary greatly. According to a review of video-based human action identification algorithms, the majority of these systems use random frame selection from the input video [6, 7] or a combination of all processed frames [8]. This reduces the efficiency of the training process and the system accuracy, and it increases the amount of time it takes to train the system. Training the system on the best frames of a frame sequence improves it [9]. In a typical scene, there are many frames that carry little information because there is little movement in them; these frames can be discarded during training. HAR systems [10], on the other hand, usually need a lot of samples to train. There are also problems in most HAR systems because of the different resolutions of the cameras used to acquire the videos: slowing down the system by shrinking or reducing frames to fit different camera resolutions is a common practice in video-based HAR systems. The primary objective of this research is to apply a CNN model [15] to images from two distinct datasets in order to determine what a person is doing. To this end, the video is shortened, and the images are resized using adaptive frame cropping. The rest of the paper is organized as follows: Sect. 2 looks at related work, and Sect. 3 gives an overview of the proposed model. Section 4 presents the experiments validating the proposed model, and Sect. 5 concludes the research.
2 Related Works Zayed and Rivaz used ultrasonic images of a pressurized mechanical object in elastography experiments. An MLP classifier was first used to sort the images, and any two consecutive frames that carried no meaningful information were culled from the pool of potential training frames. A linear combination of weighted principal components was used as input for the MLP algorithm, which decomposed the displacement. Key frames for an automated driving system were selected by Lin et al. using the Kanade-Lucas-Tomasi (KLT) algorithm: the main features of the frames, such as the edges of the frame, the road
lines, and similar structures, are extracted, and the difference between these features is used to decide whether to keep or discard a frame. Chen et al. [11] came up with a way to re-identify people in videos, in which key frames are chosen based on both their location and their time. None of the research above was designed for HAR systems, and none of it can be used directly, but it offers features and ideas that can be used in the development of an effective method. Jeyabharathi and Djey [12] used a cut set to extract the video background: they chose key frames from a long sequence of frames in which the shapes of successive forms looked the same, cutting out the parts with less information and keeping the important parts. When Wang et al. [13] wanted to pick the best frames in a HAR video, they looked at which parts of the human body move in each frame; frames with little body movement were not used in the training process. Zhou et al. [14] showed how to use deep learning to separate objects in a video, which can be used to find people in HAR systems. The suggested system worked well in different situations, but it is not clear how well it would work in more complicated scenes where many people appear at the same time. Jagtap et al. came up with two ways to speed up the convergence of deep learning methods; for example, Jagtap et al. [16] demonstrated that adaptive activation functions can be used in video processing. With key frame selection and adaptive activation algorithms, deep-learning-based HAR systems can be made to work better. When it comes to HAR systems, hybrid techniques typically have a narrow range of applications and action types.
3 Proposed System Figure 1 shows the workflow of the proposed model. The key frames themselves provide the relevant information, so searching for the best sequences of human actions is unnecessary. In the HAR system, the input images are resized or cropped during adaptive frame cropping and then given directly to the CNN for the prediction process. In this study, video shortening is utilized to discover key frames and select relevant sequences in two datasets. The selected frames are resized using adaptive frame cropping, and each resulting image is given as input to the proposed Convolutional Neural Network (CNN). Experiments were conducted on various activities in terms of classification rate; the results show that the CNN achieved a 98.22% classification rate on the Weizmann action dataset.
3.1 Dataset Weizmann and KTH public action datasets are used in the testing of the proposed MNF-HAR framework.
Fig. 1 Proposed system architecture
KTH Action Dataset (KTH-AD) In KTH-AD, 25 participants performed six different activities (hand-clapping, hand-waving, boxing, running, jogging, and walking) in four different indoor and outdoor settings, as seen in Fig. 2. With a still camera and a homogeneous background, there are 2391 observations. The frame size is 160 × 120 pixels. Weizmann Action Dataset (Weizmann-AD) In Weizmann-AD, 9 people performed 10 different actions, as seen in Fig. 3. In total, 90 video clips are used in Weizmann-AD, with an average of 15 frames per clip. The frame size is 144 × 180 pixels.
Fig. 2 Sample KTH action dataset images
Fig. 3 Sample Weizmann action dataset images
Fig. 4 Different gradient masks: a Sobel, b Prewitt, c Roberts, d central difference, and e middle difference
3.2 Video Shortening Prior research in this area has examined various approaches to finding key frames and selecting the sequence that is most effective for the task at hand. There is no need to look for the single best sequence, but rather for the shots that best represent how people move. Because many HAR systems operate online, the proposed method's speed is critical. The method uses a gradient operator to extract each frame's edges, and the movement, represented by the difference between the edges of consecutive frames, is taken as an approximation of the action information. Gradient differences between frames reveal how much movement has occurred in key locations. Using the gradient operator instead of optical flow for motion detection is faster because it requires fewer calculations. To begin, the video is separated into frames (Fi), and the gradient operator then extracts the edges of each grayscale frame. The gradient operator may apply a wide variety of masks. The Sobel and Prewitt masks assign each pixel a 3-by-3 neighborhood, whereas the Roberts mask uses a 2-by-2 neighborhood; the "central difference" and "middle difference" masks are column vectors. Figure 4 depicts the different vertical gradient masks; the corresponding horizontal masks are obtained by rotating them. A code sketch of this key-frame selection step follows.
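A minimal OpenCV sketch of this step is shown below: Sobel edges are extracted per frame, the motion between consecutive frames is measured as the mean absolute difference of their edge maps, and only frames whose motion exceeds a threshold are kept. The threshold value is an assumption.

```python
# Gradient-difference key-frame selection: drop low-motion frames.
import cv2
import numpy as np

def select_key_frames(path, thresh=4.0):   # threshold is an assumed value
    cap = cv2.VideoCapture(path)
    keep, prev_edges = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)   # horizontal gradient
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)   # vertical gradient
        edges = cv2.magnitude(gx, gy)            # edge map of the frame
        if prev_edges is not None:
            motion = np.mean(np.abs(edges - prev_edges))  # gradient difference
            if motion > thresh:                  # low-motion frames are dropped
                keep.append(frame)
        prev_edges = edges
    cap.release()
    return keep
```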
3.3 Adaptive Frame Cropping This step has two purposes. First, the frames chosen in the preceding step need to be resized so that they fit the HAR scheme; in some ways, the video is resized using the same methods used to resize images, but these methods may perform poorly here because the human figure occupies only a small part of the resulting frames. The proposed process was therefore designed to find the motion region, i.e., the part of the video frame that shows a person performing an action. Choosing the cropping area at random is a poor and overly simple approach. The ideal approach would be to detect where the humans are in the frame
Fig. 5 Sample pre-processed image
and then choose the cropped area based on where the humans are and what they are doing; however, this takes a lot of time. Instead, all of the differences between the normalized frames calculated in the previous step are carried over to this step, which reduces the number of calculations needed. They are employed to create a visual representation of the video's energy: in this map, each pixel's value represents the sum of all the information regarding the movement of that pixel through time. The energy map is then examined with an average filter whose window size equals the input size of the HAR system. Finally, the pixel with the highest value in the filtered image is chosen as the center of the cropping area. It is safe to say that this window contains more important information about the video's motion than any other window that can be found in the video, as seen in Fig. 5, because the gradient difference between frames was used to find this information.
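A minimal sketch of this energy-map cropping step, assuming the per-frame gradient differences from the previous step are stacked in a NumPy array; the 111 × 111 crop size follows the input size used later in the paper, and all names are illustrative.

```python
# Minimal sketch: accumulate gradient differences into an energy map,
# smooth it with an average filter, and centre the crop on the peak.
import cv2
import numpy as np

def crop_window(diffs, crop=111):
    # diffs: array of shape (T-1, H, W) of frame-to-frame gradient differences.
    # Energy map: each pixel accumulates all motion information over time.
    energy = diffs.sum(axis=0).astype(np.float32)
    # Average filter whose window matches the HAR input size.
    smoothed = cv2.blur(energy, (crop, crop))
    # The strongest pixel becomes the centre of the cropping area.
    cy, cx = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    y0 = int(np.clip(cy - crop // 2, 0, energy.shape[0] - crop))
    x0 = int(np.clip(cx - crop // 2, 0, energy.shape[1] - crop))
    return y0, x0  # crop each frame as frame[y0:y0+crop, x0:x0+crop]
```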
3.4 CNN CNNs are one of the most common types of deep learning architectures that mimic the way the brain works, like Artificial Neural Networks (ANNs). CNNs have been used, for example, for classification of sleep stage scoring based on single-channel Electroencephalography (EEG) by learning task-specific filters. Popular CNN designs include AlexNet, GoogleNet, SqueezeNet, and ResNet [17]. Local connections instead of full connections are a big advantage of CNNs over ANNs; CNNs use such local connections in all but the last layer (Fig. 6). Each layer connects to the previous layer using kernels or filter banks. Furthermore, the structure of a CNN consists of a series of common layers. The convolutional layer is the very first layer of the neural network. Weights known as "kernel" weights (filter banks) are used in this layer to link feature maps of one layer to feature maps of the next. The sum of all local weights is passed through a non-linearity function such as ReLU. The kernel (filter) size and feature map are set as in (1), controlled by F and m. The bias is referred to as B, while the weight of the jth filter is W_j; F_{i,j}^l denotes the ith
Fig. 6 General CNN architecture
feature map of layer l that has W_j as its weight:

Y_i^l = B_i^l + Σ_{j=1}^{m^(l−1)} F_{i,j}^l ∗ W_j^(l−1)    (1)
The pooling layer is the second common layer in CNNs: it reduces the spatial dimension of the convolution layer's output without changing its depth. To avoid overfitting in training, pooling layers may be used to reduce computing operations. The min, max, and average operations are all found in the pooling layer; max pooling has given reasonable results in most fields. A pooling layer's output width and height can be determined using (2) and (3), where W1, H1, and D1 are the width, height, and depth of the pooling layer's input, and F and S are the kernel size and stride of shifting, respectively.

W2 = (W1 − F)/S + 1    (2)

H2 = (H1 − F)/S + 1    (3)
The fully connected layer is the third common layer. At this layer, each neuron is linked to the preceding layer's neurons, and the scores for each of the dataset's classes are calculated. Additionally, in the final layer, the softmax function is commonly used to estimate the distribution of class labels. This is the formula for the softmax function, in which f(z) maps each value to between 0 and 1 and K is the dimension of the input variable:

f(z)_j = e^(x_j) / Σ_{k=1}^{K} e^(x_k),  for j = 1, …, K    (4)
In this study, instead of utilizing ordinary CNNs to categorize human activity frames, a customized CNN architecture was suggested that includes five convolutional layers, four pooling layers, and three fully connected layers. Softmax was taken
Table 1 Summary of the proposed CNN architecture

Layer type | Number of kernels | Kernel size | Output size
Convolutional | 96 | 3 × 3 | 96 × 111 × 111
Max pooling | – | 2 × 2 | 96 × 110 × 110
Convolutional | 128 | 3 × 3 | 128 × 110 × 110
Convolutional | 256 | 3 × 3 | 256 × 54 × 54
Max pooling | – | 2 × 2 | 256 × 27 × 27
Convolutional | 384 | 3 × 3 | 384 × 14 × 14
Max pooling | – | 2 × 2 | 384 × 13 × 13
Convolutional | 512 | 3 × 3 | 512 × 11 × 11
Max pooling | – | 2 × 2 | 512 × 6 × 6
Fully connected | – | – | 2048 × 1 × 1
Fully connected | – | – | 1024 × 1 × 1
Fully connected with softmax | – | – | 12 × 1 × 1
into account in the final fully connected layer for determining the likelihood of the dataset's 12 different activity classes. Non-linearity in the output of the convolutional and fully connected layers was achieved using the ReLU activation function. A total of 96 filters of 3 × 3 kernel size were used in the first convolutional layer, and a ReLU activation function was applied to its output. A 2 × 2 kernel was then used to down-sample the previous convolutional layer's output without changing its depth. The second and third convolutional layers were applied after the first pooling layer, and the same design was carried through to the softmax layer. Table 1 depicts the proposed CNN's layers in detail. Additionally, the batch size (the number of input samples per training cycle) and the learning rate (the pace at which the system learns) were set to 32 and 0.01, respectively.
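For illustration, the proposed architecture can be sketched in Keras as below. The paper does not report strides or padding, so the intermediate shapes of this sketch will not reproduce Table 1 exactly; the hyperparameters (ReLU, softmax over 12 classes, batch size 32, learning rate 0.01) follow the text, and everything else is an assumption.

```python
# A hedged Keras approximation of the CNN in Table 1 (not the authors'
# MATLAB implementation): 5 conv layers, 4 pooling layers, 3 FC layers.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(111, 111, 3)),          # cropped 111 x 111 frames
    layers.Conv2D(96, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu"),
    layers.Conv2D(256, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(384, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(512, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(2048, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(12, activation="softmax"),     # 12 activity classes
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=500)
```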
4 Results and Discussion Chance frames and video cropping with static or blind cropping zones were used to train all HAR techniques [18–21]. All of these methods were implemented on a system with the following hardware specs: an Nvidia GeForce RTX 2070 Super graphics card, an i7-9700K processor, 16 GB of RAM, and MATLAB 2020 software. In each dataset, 70% of the videos were used for training and 30% for testing; all comparisons used the same training and testing data. There were always 20 input frames with video dimensions of 111 by 111 in all implementations. The remaining components are from prior approaches and could not be applied to the proposed one due to its optimal frame and cropping-region choice. Table 2 represents the confusion matrix for the proposed method, as the proposed HAR scheme showed important performance in classification rate (unit: %). Different techniques were used
Table 2 Confusion matrix for the proposed method, as the proposed HAR scheme showed important performance in classification rate (unit: %)

Activities | Wave2 | Wave1 | Walk | Skip | Side | Run | Pjump | Jack | Bend
Wave2 | 96 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0
Wave1 | 0 | 98 | 0 | 0 | 2 | 0 | 0 | 0 | 0
Walk | 0 | 0 | 99 | 0 | 0 | 0 | 0 | 0 | 1
Skip | 0 | 1 | 0 | 99 | 0 | 0 | 0 | 0 | 0
Side | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0
Run | 0 | 0 | 1 | 0 | 0 | 97 | 0 | 0 | 0
Pjump | 0 | 0 | 0 | 1 | 0 | 0 | 99 | 0 | 0
Jack | 0 | 0 | 1 | 0 | 2 | 0 | 1 | 96 | 0
Bend | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100
Average | 98.22
to measure the performance across the activities Wave2, Wave1, Walk, Skip, Side, Run, Pjump, Jack, and Bend. Over these activities, the model reaches an average value of 98.22%. Table 3 shows the confusion matrix for the proposed CNN-HAR classification rate on the different activities; the average for this method reaches 99.00%. On the KTH dataset, classification losses are shown in Fig. 7 for different training epochs. Increasing the number of training epochs reduces both the training and validation losses. As the epochs climb to 500, the validation loss stabilizes and the network is close to convergence; beyond that, the disparity between the two losses widens. As a result, the training epoch number is set to [500, 750]: if the validation loss does not reduce after 500 epochs, the training procedure is terminated. Figure 8 represents the impact of the number of training epochs on the Weizmann action dataset; in this graphical representation, the loss values are determined at different epochs.

Table 3 Confusion matrix for proposed CNN-HAR classification rate (unit: %)

Activities | Hand-clap | Hand-wave | Boxing | Running | Jogging | Walking
Hand-clap | 100 | 0 | 0 | 0 | 0 | 0
Hand-wave | 0 | 99 | 1 | 0 | 0 | 0
Boxing | 0 | 0 | 98 | 1 | 1 | 0
Running | 0 | 1 | 0 | 98 | 1 | 0
Jogging | 0 | 0 | 0 | 0 | 100 | 0
Walking | 0 | 0 | 0 | 1 | 0 | 99
Average | 99.00
Fig. 7 Impact of number of training epochs
Fig. 8 Impact of number of training epochs on Weizmann action dataset
5 Conclusion Experiments were conducted on a variety of activities to evaluate classification performance. From the results, it can be seen that the CNN achieved a 98.22% success rate on the Weizmann action dataset. The frames pass through a series of steps: removing noise and distortions, finding features, choosing features, and classifying them. The selected frames are resized using adaptive frame cropping, and these images are given as input to the proposed Convolutional Neural Network (CNN) for Human Activity Recognition (HAR). Recently, many cutting-edge techniques have proposed ways to obtain and choose features from data, and machine learning has been used to classify with these techniques. However, although many techniques use simple feature extraction processes, they cannot recognize complicated actions.
References
1. Ramanujam E, Perumal T, Padmavathi S (2021) Human activity recognition with smartphone and wearable sensors using deep learning techniques: a review. IEEE Sens J
2. Mekruksavanich S, Jitpattanakul A (2021) Biometric user identification based on human activity recognition using wearable sensors: an experiment using deep learning models. Electronics 10(3):308
3. Siddiqi MH, Khan AM, Lee SW (2013) Active contours level set based still human body segmentation from depth images for video-based activity recognition. KSII Trans Internet Inf Syst 7(11):2839–2852
4. Elmezain M, Al-Hamadi A (2018) Vision-based human activity recognition using LDCRFs. Int Arab J Inf Technol 15(3):1683–3198
5. Kamimura R (2011) Structural enhanced information and its application to improved visualization of self-organizing maps. Appl Intell 34(1):102–115
6. Song X, Lan C, Zeng W, Xing J, Sun X, Yang J (2019) Temporal–spatial mapping for action recognition. IEEE Trans Circ Syst Video Technol 30:748–759
7. Hajihashemi V, Pakizeh E (2016) Human activity recognition in videos based on a two levels K-means and Hierarchical Codebooks. Int J Mechatron Electr Comput Technol 6:3152–3159
8. Deshpnande A, Warhade KK (2021) An improved model for human activity recognition by integrated feature approach and optimized SVM. In: Proceedings of the 2021 international conference on emerging smart computing and informatics (ESCI). IEEE, Pune, India, pp 571–576
9. Zayed A, Rivaz H (2020) Fast strain estimation and frame selection in ultrasound elastography using machine learning. IEEE Trans Ultrason Ferroelectr Freq Control 68:406–415
10. Lin X, Wang F, Guo L, Zhang W (2019) An automatic key-frame selection method for monocular visual odometry of ground vehicle. IEEE Access 7:70742–70754
11. Chen Y, Huang T, Niu Y, Ke X, Lin Y (2019) Pose-guided spatial alignment and key frame selection for one-shot video-based person re-identification. IEEE Access 7:78991–79004
12. Jeyabharathi D, Dejey (2018) Cut set-based dynamic key frame selection and adaptive layer-based background modeling for background subtraction. J Vis Commun Image Represent 55:434–446
13. Wang H, Yuan C, Shen J, Yang W, Ling H (2018) Action unit detection and key frame selection for human activity prediction. Neurocomputing 318:109–119
14. Zhou T, Li J, Wang S, Tao R, Shen J (2020) Matnet: motion-attentive transition network for zero-shot video object segmentation. IEEE Trans Image Process 29:8326–8338
15. Jagtap AD, Kawaguchi K, Em Karniadakis G (2020) Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks. Proc R Soc A 476:20200334
16. Jagtap AD, Kawaguchi K, Karniadakis GE (2020) Adaptive activation functions accelerate convergence in deep and physics informed neural networks. J Comput Phys 404:109136
17. Targ S, Almeida D, Lyman K (2016) Resnet in resnet: generalizing residual architectures. arXiv preprint arXiv:1603.08029
18. Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, pp 6299–6308
19. Zheng Z, An G, Ruan Q (2020) Motion guided feature-augmented network for action recognition. In: Proceedings of the 2020 15th IEEE international conference on signal processing (ICSP), Beijing, China, vol 1, pp 391–394
20. Chen E, Bai X, Gao L, Tinega HC, Ding Y (2019) A spatiotemporal heterogeneous two-stream network for action recognition. IEEE Access 7:57267–57275
21. Yudistira N, Kurita T (2020) Correlation net: Spatiotemporal multimodal deep learning for action recognition. Signal Process Image Commun 82:115731
Chapter 12
Tweet Classification on the Base of Sentiments Using Deep Learning Firas Fadhil Shihab and Dursun Ekmekci
1 Introduction Experts estimate that unstructured data accounts for 80% of all the data in existence (images, videos, text, etc.). This information can be gleaned from a variety of sources, including social media posts and tweets, phone calls, surveys, and interviews, as well as blogs, forums, and other online publications. It is difficult to browse through all the web's content and discover patterns, so firms must study this data to take better action. Sentiment Analysis is an example of a procedure that uses textual data to draw conclusions. Web scraping methods can gather the data needed for sentiment analysis directly from pages on the Internet [1]. The first and most important step is to understand the problem statement, which gives an idea of what to expect. In this task, tweets containing hate speech must be identified. For convenience, a tweet is considered hate speech if it carries a racist or sexist sentiment, so racist or sexist tweets must be categorized separately from the others. Given a labeled training dataset, the goal is to predict the labels of the tweets in the test dataset. The Twitter Sentiment Analysis Practice Problem (TSAPP) is addressed in this research [2]. To investigate how to solve a sentiment analysis challenge: We began by preprocessing and analyzing the available data. We next used Bag-of-Words and TF-IDF to extract characteristics from the cleaned text. In the end, we were able to categorize the tweets using two different feature sets.
The Random Forest classifier in [3] aims for both accuracy (90.25%) and generalizability. Derivational characteristics are tested against some of the most essential aspects of the fundamental features to demonstrate their superiority. An accurate and generalizable classifier is constructed using Random Forest, a machine learning technique that can eliminate most overfitting and build a generic model that can be deployed for correct usage immediately after training. The most recent ratio features that were developed outperform the fundamental account attributes. Data analysts now have a set of rules to follow with the identification of ID3 as a strong data classification approach. The fast fuzzy classification technique established in [4, 5] produced better classification results; the ID3 decision tree and SVM algorithms have also been used to increase classification accuracy and speed, with an updated ID3 version (dubbed AFI-ID3) based on the original ID3. A feature selection technique takes the place of traditional data collection methods: in this novel technique, the significance of an attribute is determined using the association function and the sum of the values of each of its attributes. Ding et al. [6] produced an enhanced ID3 in which rough set theory was utilized as a splitting metric instead of the information gain measure of conventional ID3, although a lack of data renders this rough-set version less useful. By averaging information gains over a variety of characteristics, Zhu et al. [7] improved ID3 to address the problem of multivalued attribute bias. As part of the study [8], researchers removed a few examples from the conventional ID3 technique to lessen the algorithm's rigor. UCI training datasets were used in the development of the ID3 technique by Rajeshkanna [9], evaluated using a range of statistical criteria. Classifying reviews with two classifiers, DT and NB, and seeing which one works better is the unique contribution of [10]. Each explicit suggestion was examined with Guerreiro et al. [11]'s new text mining technology, which was developed just for that study [12]. The convolutional neural network beat the other techniques in the research, with an accuracy rate of 79%. Zhang came up with a novel technique using the TF-IDF feature extractor and the Chi-Square and Mutual Information feature selectors; a sentiment classification technique, such as Logistic Regression, Linear SVC, or Multinomial Naive-Bayes, is then used to process the obtained information. According to a study published in [13], the earthquake data from September 19, 2017, was evaluated using OM methods and supervised machine learning; the authors developed three separate classifiers and established that SVM and NB were the most accurate in classifying emotions. The author of [14] has presented a unique technique for multi-class hierarchical opinion mining in Arabic text that depends on a three-level binary tree structure. SVM has been used as a classifier to extract the tweet's emotions after NLP operations such as word tokenization and lemmatization. Based on the data, it was found that NB has the highest level of accuracy, at 88.17% [15]. A new paradigm for aspect-level opinion mining is presented in the publication [16].
A lot of research has been done on how to use deep neural networks to automatically classify health-related tweets
in Arabic, French, and English. One example is [17], where the authors came up with a unique way to automatically classify tweets in those languages. Here, we cover the basics of solving a common sentiment analysis challenge: we begin by preprocessing and sanitizing the tweets' raw content; we then go over the cleaned text to gain a sense of the tweets' context; and as the last step, we utilize our feature sets to build models that can better understand and interpret the emotions expressed in tweets.
1.1 Sentiment Analysis (SA) In SA, emotional states and subjective information are systematically identified by using natural language, text analysis, machine learning, and biometrics, and then extracted, quantified, and researched [18–20]. Sentiment analysis is commonly used to better understand client attitudes in a wide range of industries, from marketing to customer service to clinical care. Sentiment analysis (sometimes referred to as opinion mining or emotion AI) is a branch of natural language processing (NLP) that analyzes text to determine how people feel about certain ideas (positive, negative, or neutral). Machine learning and rule-based approaches may be used to analyze sentiment. This is best seen in Fig. 1. Here are some applications of sentiment analysis:
• Market analysis
• Social media monitoring
• Customer feedback analysis—brand sentiment or reputation analysis
Fig. 1 Sentiment analysis
1.2 Natural Language Processing (NLP) Humans use natural language—speech or text—to communicate with one another. Processing natural language with software, or NLP [19], is a growing field. Natural Language Processing (NLP) is a higher-level term that includes Natural Language Understanding (NLU) and Natural Language Generation (NLG): NLP = NLU + NLG
(1)
Sentiment analysis is one of the most frequent uses of NLP in data science today. This domain has transformed the way organizations operate, which is why every data scientist has to be conversant with this domain. Sentiment analysis can be performed on thousands of text documents in seconds, rather than the hours it would take a team of people to conduct the same thing manually.
2 Methodology Machine learning techniques all have their strengths and weaknesses. For instance, you might be tempted to use a categorical dependent variable in linear regression: do not bother, as it will not work and the adjusted R² values will be low. In such circumstances, algorithms such as Logistic Regression, Decision Trees, SVM, and Random Forest should be used instead; the essentials of these methods, starting with Logistic Regression (LR), are a good place to begin learning about them. Using a variety of feature extraction approaches, a similar conclusion was reached. There are a variety of methods for extracting feature information from text, but Term Frequency (TF) and its Inverse Document Frequency (IDF), as well as Word2Vec and Doc2Vec, are among the most prominent. The authors discovered that using TF, IDF, and TF-IDF with linear classifiers like SVM, LR, and perceptron increased the accuracy of a native language recognition system. Ten different languages are used in cross-validation trials. TF-IDF is used to weight n-gram words, characters, and parts-of-speech tags; TF-IDF weighting on features is better than other techniques when dealing with words. Similarly, the authors of [26] looked at the performance of a neural network combined with three feature extraction algorithms for text analysis: TF-IDF and its two derivatives, Latent Semantic Analysis (LSA) and Linear Discriminant Analysis (LDA), are used to assess the performance of each feature analysis technique. The findings show that the model's accuracy increases when using a large dataset with TF-IDF; for smaller datasets, combining TF-IDF with LSA gives equivalent accuracy. The process of identifying customer complaints about a product or service from user comments posted on the Twitter platform is critical to the success of the
Fig. 2 Basic steps of opinion mining on social network platforms
company's goods and services [21, 22]. Microblog data may be analyzed with OM to determine the polarity (negative, neutral, or positive) of user perceptions of a service or product. Analyzing comments left by users on various social media platforms serves as the basis for the sub-operations shown in Fig. 2. There may be hidden information in the daily stream of social media posts that machine learning can help find [23]. Given a set of independent parameters, the LR approach may be used to determine a binary outcome (1/0, Yes/No, True/False). Dummy variables are employed to indicate binary/categorical outcomes [24]. You may alternatively think of logistic regression as a kind of linear regression in which the outcome variable is categorical and the dependent variable is the log of odds. In other words, data is used to figure out how likely it is that an event will happen [25].
2.1 Derivation of Logistic Regression Equation Regression techniques of this kind form a subclass of the Generalized Linear Model (GLM). Nelder and Wedderburn developed this model in 1972 as a way to apply linear regression to problems that were not naturally suited to it [26], and logistic regression was included as a particular instance in this class of models. The linear regression model has two parts [27, 28]:

g(E(y)) = α + βx1 + γx2    (2)
Here, g() is the link function, E(y) is the expectation of the target variable, and α + βx1 + γx2 is the linear predictor (α, β, γ are to be estimated). The role of the link function is to "link" the expectation of y to the linear predictor. Important points: GLM does not assume a linear relationship between the dependent and independent variables; the logit model assumes a linear relationship between the link function and the independent variables. The dependent variable need not be normally distributed. GLM does not employ Ordinary Least Squares (OLS) for parameter estimation; Maximum Likelihood Estimation (MLE) is used instead. Errors need not be normally distributed. To begin with logistic regression, here is a basic linear regression equation with the dependent variable wrapped in the link function:

g(y) = β0 + β(Age)    (3)
For ease of understanding, we consider "Age" as the independent variable. Logistic regression is all about probabilities (success or failure). g() is the link function, as mentioned above, and is expressed in terms of the probability of success (p) and the probability of failure (1 − p). p must satisfy two requirements: it must always be positive (since p >= 0), and it can never be more than 1 (since p <= 1). Meeting these two conditions takes us to the heart of logistic regression. We begin by denoting the link function g() in terms of p and then derive it. Since probability must always be positive, the exponential form of the linear equation is used; the exponent of this equation will never be negative for any combination of slope and dependent variable:

p = exp(β0 + β(Age)) = e^(β0 + β(Age))    (4)
To make the probability less than 1, we must divide p by a number greater than p. This can simply be done by

p = exp(β0 + β(Age)) / (exp(β0 + β(Age)) + 1) = e^(β0 + β(Age)) / (e^(β0 + β(Age)) + 1)    (5)
Using the expressions above, we can redefine the probability as

p = e^y / (1 + e^y)    (6)

where p is the probability of success. This is the logit function. If p is the probability of success, 1 − p will be the probability of failure, which can be written as

q = 1 − p = 1 − e^y / (1 + e^y)    (7)

where q is the probability of failure.
On dividing (6) by (7), we get

p / (1 − p) = e^y    (8)
After taking the log on both sides, we get

log(p / (1 − p)) = y    (9)
log(p/(1 − p)) is the link function. The logarithmic transformation of the outcome variable allows us to model a non-linear association in a linear way. After substituting the value of y, we get

log(p / (1 − p)) = β0 + β(Age)    (10)
This is the formula for logistic regression. Here, p/(1 − p) is the odds ratio: whenever the log of the odds ratio is positive, the probability of success is greater than 50% [29].
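A short numerical check of Eq. (10), with illustrative coefficient values (β0 = −6.0, β = 0.1) that are not fitted from any real data:

```python
# Minimal sketch: convert the log-odds of Eq. (10) back to a probability.
import math

def prob_success(age, beta0=-6.0, beta=0.1):
    log_odds = beta0 + beta * age          # Eq. (10): log(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))

print(prob_success(40))  # log-odds = -2.0 -> p ~ 0.12
print(prob_success(80))  # log-odds =  2.0 -> p ~ 0.88, i.e. > 50% success
```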
2.2 Evaluation Performance When evaluating the efficiency of a logistic regression model, there are a few indicators to examine. The Akaike Information Criterion (AIC) is the analog of adjusted R² for logistic regression: it measures model fit while taking the number of model coefficients into account, so we always prefer a model with a low AIC score. Null deviance is the response predicted by a model with just an intercept; residual deviance is the response predicted by a model that includes the independent variables. In both cases, the lower the value, the better the model. Finally, the confusion matrix depicts the difference between actual and predicted values; we may use it to determine the model's accuracy and prevent overfitting, as shown in Fig. 3.
Fig. 3 Evaluation performance
To calculate the accuracy of the model:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)    (11)

From the confusion matrix, with A = true negatives, B = false positives, C = false negatives, and D = true positives, specificity and sensitivity can be derived as illustrated below:

Specificity (True Negative Rate, TNR) = A / (A + B);  False Positive Rate (FPR) = 1 − Specificity = B / (A + B);  the two sum to 1    (12)

Sensitivity (True Positive Rate, TPR) = D / (C + D);  False Negative Rate (FNR) = 1 − Sensitivity = C / (C + D);  the two sum to 1    (13)
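A minimal sketch of Eqs. (11)–(13) applied to a 2 × 2 confusion matrix; the counts below are made-up numbers for illustration only.

```python
# Minimal sketch: accuracy, specificity, and sensitivity from raw counts.
def rates(tn, fp, fn, tp):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (11)
    specificity = tn / (tn + fp)                     # TNR; FPR = 1 - TNR
    sensitivity = tp / (tp + fn)                     # TPR; FNR = 1 - TPR
    return accuracy, specificity, sensitivity

print(rates(tn=850, fp=50, fn=40, tp=60))  # -> (0.91, ~0.944, 0.6)
```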
3 Tweet Preprocessing and Cleaning As an example, the pictures below depict two scenarios of office space—one untidy and the other clean and organized (see Fig. 4). These two images illustrate the value of pre-processing: in which scenario will you have no trouble locating a document? The less cluttered one, of course, since everything has a designated place. The data-cleaning process is very similar: finding the correct information is much simpler when data is organized systematically. Preparing the text data is a necessary step to make it simpler to extract information from the text and apply machine learning algorithms; if this step is skipped, the risk of dealing with unreliable and inconsistent data increases. In this phase, noise such as punctuation, special characters, numerals, and phrases that do not carry much weight in the content should be removed from the tweets. We want to use our Twitter text data to extract quantitative characteristics later; this feature space is generated from all terms that are unique in the entire dataset, so its quality will improve if we preprocess our data adequately. Here is a sample of the dataset utilized in this study: ID, label, and tweet make up the data's three columns. The label is the target variable,
Fig. 4 Example scenarios of office space
and the tweet column includes the tweets that need to be cleaned and preprocessed to make them ready for use. Clean is a function that takes in text and, as the name implies, outputs text free of any punctuation and numeric symbols. We applied it to the review column and added the cleaned text to a new column called "Cleaned Reviews".
3.1 Removing Twitter Handles (@User) The tweets contain various Twitter handles (@user), which is how Twitter users are identified. We delete all of these Twitter handles from the data. For convenience, the train and test sets are first combined, which saves us from repeating the same operations twice.
3.2 Stop-Word Removal A stop-word is a term that conveys little or no meaningful information in English, and text preparation necessitates its removal. NLTK lists stop-words for every language; here we use the English stop-words.
3.3 Removing Punctuations, Numbers, and Special Characters Punctuation, numerals, and other special characters carry little useful information, so it is wiser to delete them, as we did with the Twitter handles. Here we replace everything except letters and hashtags with spaces.
3.4 Removing Short Words We must exercise caution when picking the length of the words to be removed. Here, all words of up to three letters are removed; for example, terms such as "hmm" and "oh" have little practical use.
3.5 Tokenization Tokenization is the process of breaking a piece of text into fragments. It may be done at the sentence level (sentence tokenization) or at the word level (word tokenization). At this step, all of the dataset's cleansed tweets are tokenized, i.e., the text is broken down into tokens, which are individual words or phrases.
3.6 Stemming Stemming removes a word's suffixes (such as "ing," "ly," "es," and "s") using a set of rules. There are several distinct forms of "play," such as "player," "played," "plays," and "playing"; the stem is the component that conveys a word's meaning. Stemming and lemmatization are two typical methods for locating root words: the result of stemming is a stem, and the output of lemmatization is a lemma. Because stemming merely removes certain letters at the end of a word, it often produces root words with no grammatical significance: stemming "glanced" yields a stem with no meaning, whereas its lemma "glance" is a proper word. With steps 3.1–3.6 now clear (a condensed sketch follows below), let us bring the conversation back to the problem at hand.
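Steps 3.1–3.6 can be condensed into a short sketch. This assumes NLTK is installed with its English stop-word list downloaded; the function name and the simple whitespace tokenizer are illustrative choices, not the exact pipeline used in the study.

```python
# Minimal sketch of the preprocessing pipeline (steps 3.1-3.6).
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"@\w+", " ", text)            # 3.1 remove @user handles
    text = re.sub(r"[^a-zA-Z#]", " ", text)      # 3.3 punctuation, numbers, specials
    tokens = [t for t in text.lower().split()    # 3.5 simple word tokenization
              if t not in stops and len(t) > 3]  # 3.2 stop-words, 3.4 short words
    return " ".join(stemmer.stem(t) for t in tokens)  # 3.6 stemming

print(clean_tweet("@user Loving these amazing players!!! 123"))
# -> "love amaz player"
```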
4 Story Generation and Visualization from Tweets In this part, we explore and visualize the cleaned tweet text. Data exploration and visualization, no matter what kind of data it is, is a critical step in gaining insight, and there is no reason to be restricted to only the techniques described in this lesson. Our investigation will be hindered if we do not first think about and ask questions about the data at our disposal. Some of the most common questions are: Are there any terms that appear more often than others in the dataset? What are the most frequently used terms in the dataset for negative and positive tweets? How many hashtags appear in a tweet? What patterns can be deduced from the collected data? Which trends are related to each of the two sentiments, and how well do they mesh with the underlying feelings?
4.1 Understanding the Common Words Used in the Tweets: Word-Cloud In this phase, we examine the training dataset to determine how evenly distributed the supplied emotions are. Plotting word clouds helps to identify the most frequently used terms: in a word cloud, the most frequently used terms are shown in the largest font, while the least frequently used words appear in the smallest font (see Figs. 5, 6, and 7). As can be seen, most of the terms are either positive or neutral, with happiness and love the most common; this does not give us insight into the language of racist or sexist tweets. As a result, we plot separate word clouds for both categories in our training data. For the non-racist/sexist tweets (Fig. 6), most of the frequently used terms—love and happiness again among the most common—are consistent with their sentiment. Figure 7 shows the word cloud for the other sentiment, where negative, racist, and sexist phrases are to be expected; indeed, most of the terms have negative meanings. In other words, it seems that the data we have is of good quality. After that, we study the most popular Twitter hashtags and trends. Figure 8 displays the 30 most frequently occurring words.
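A word cloud in the spirit of Figs. 5–7 can be generated with a few lines, assuming the `wordcloud` package and a pandas data frame with a label column and a cleaned-tweet column (names are illustrative):

```python
# Minimal sketch: plot a word cloud for a subset of cleaned tweets.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def show_cloud(texts):
    words = " ".join(texts)
    wc = WordCloud(width=800, height=500, random_state=0).generate(words)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()

# e.g. show_cloud(df[df.label == 1]["clean_tweet"])  # racist/sexist subset
```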
Fig. 5 The natural words
Fig. 6 Words in non-racist/sexist tweets
Fig. 7 Racist/sexist tweets
Fig. 8 Most frequently occurring words—Top 30
4.2 Understanding the Impact of Hashtags on Tweets Sentiment Twitter's trending hashtags reflect what is being discussed at any given period in time. We need to see whether these hashtags can help us categorize tweets into distinct feelings, or whether they add nothing of value. A tweet from our dataset is shown below; in our opinion, the tweet is sexist, and so are the hashtags used in it (see Fig. 9). It seems reasonable that all of the hashtags from the non-racist/sexist list are positive, while the plot of the second list is expected to include negative words. Figure 10 shows the most frequently used hashtags in racist and sexist tweets.
Fig. 9 Non-racist/sexist tweets
Fig. 10 Hashtags racist/sexist tweets
Most of the words are negative, although a handful are neutral. We should keep these hashtags in our database since they provide important information. Consequently, once the tweets have been tokenized, we can begin extracting characteristics from them.
5 Extracting Features from Cleaned Tweets To examine preprocessed data, features must be created. Bag-of-Words, TF-IDF, and Word embedding are just a few of the approaches that may be used to build text features. Only Bag-of-Words and TF-IDF are covered in this work [30].
5.1 Term Frequency—Inverse Document Frequency (TF-IDF) Features When it comes to analyzing a corpus, the bag-of-words technique may be a good starting point, but it does not take into consideration the total number of times a term appears in a given text or tweet [31]. The TF-IDF algorithm penalizes frequently occurring terms by giving them lower weights, while words that are uncommon across the corpus but appear frequently in a small subset of texts benefit. In the context of TF-IDF, TF (term frequency) is the number of times a term appears in a document. Feature extraction was applied to both the training and the testing data: the training data was used to train the chosen models, while the testing data was used for classification. The TF-IDF score is often utilized in information retrieval and summarization; according to its creators, the metric is meant to show how important an expression is within a certain text. TF-IDF extraction combines the TF and IDF features. IDF rewards the tokens that are most scarce in a dataset: a rare term that does occur in a document is more significant to the interpretation of that text. For each term, IDF is equal to the log
(N/n), where N is the number of documents and n is the number of documents in which the term appears [32]:

TF-IDF = TF ∗ IDF    (14)
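A minimal TF-IDF extraction sketch with scikit-learn; setting max_features to 2500 matches Table 1, while the tiny corpus is made up for illustration:

```python
# Minimal sketch: turn cleaned tweets into a sparse TF-IDF feature matrix.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["love amaz player", "hate rude driver", "happi sunni day"]
vectorizer = TfidfVectorizer(max_features=2500)  # max features as in Table 1
X = vectorizer.fit_transform(corpus)
print(X.shape)  # (3, n_terms) sparse matrix of TF * IDF weights
```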
6 Model Building: Sentiment Analysis We have completed all the pre-modeling steps needed to get the data into a usable form. TF-IDF and W2V can be used to develop prediction models on the dataset. To create the models, we employed logistic regression, which estimates the chance of an event occurring using the logistic function:

p(t) = 1 / (1 + e^(−t))

where the variable P denotes a population, e is Euler's number (≈2.718), and the variable t may be thought of as time. We used the W2V and TF-IDF features to train the machine learning algorithms (Logistic Regression, Random Forest Classifier, Decision Tree Classifier, SVM, and XGB Classifier), which returned a training accuracy, a validation accuracy, and an F1-score on the validation set. Now that we have a model, we can use it to predict test results. Table 1 shows the performance evaluation of classification across the different machine learning approaches. As statistically shown in Table 1, LR is more accurate than the other machine learning classifiers, with a validation accuracy of 0.9482 and an F1-score of 0.6521 without feature extraction (TF-IDF), and a validation accuracy of 0.9616 and an F1-score of 0.7633 with TF-IDF feature extraction. Figure 11 represents the dataset size and the execution time of each data preprocessing task before and after the implementation of the Hadoop framework on the Sentiment 150 dataset.
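A minimal sketch of this modelling step, assuming the TF-IDF feature matrix X and binary labels y have already been prepared; the 70/30 split and the accuracy/F1 metrics mirror the paper's evaluation, while all other choices are illustrative:

```python
# Minimal sketch: train logistic regression on TF-IDF features and report
# validation accuracy and F1-score, as in Table 1.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_val)
print("validation accuracy:", accuracy_score(y_val, pred))
print("F1-score:", f1_score(y_val, pred))
```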
7 Conclusions A voting classifier based on logistic regression is proposed in this study. Soft voting is used to integrate the likelihoods of LR with TF-IDF features. Sentiment analysis may also be performed using other machine learning-based text categorization approaches. Users from all around the world contributed to the Twitter dataset that was used
Table 1 Evaluation performance between different machine learning approaches

Methods | Range | Max features | Training accuracy | Validation accuracy | F1-score

Without feature extraction:
Random forest classifier | 31,962 | 2500 | 0.9751 | 0.9362 | 0.6171
Decision tree classifier | 31,962 | 2500 | 0.9652 | 0.9158 | 0.5344
SVM | 31,962 | 2500 | 0.9521 | 0.9358 | 0.4986
XGB classifier | 31,962 | 2500 | 0.9457 | 0.9458 | 0.5748
Logistic regression | 31,962 | 2500 | 0.9637 | 0.9482 | 0.6521

With feature extraction:
Random forest classifier | 31,962 | 2500 | 0.9991 | 0.9529 | 0.6246
Decision tree classifier | 31,962 | 2500 | 0.9991 | 0.9315 | 0.5825
SVC | 31,962 | 2500 | 0.9781 | 0.9521 | 0.5632
XGB classifier | 31,962 | 2500 | 0.9603 | 0.9555 | 0.6248
Logistic regression | 31,962 | 2500 | 0.9851 | 0.9616 | 0.7633
Fig. 11 Dataset size and data execution time of each data preprocessing task applied to the Sentiment 150 dataset
for the research. The influence of feature extraction strategies such as TF-IDF and Word2Vec on model classification accuracy was examined. Positive, negative, and neutral tweets were categorized using the specified classifiers. In addition to accuracy, validation accuracy and the F1-score were utilized as performance indicators. The findings show that TF-IDF feature extraction is better for tweet categorization: using TF-IDF, the presented voting classifier achieves an F1-score of 0.7633 and a validation accuracy of 0.9616. When compared to non-ensemble classifiers, ensemble classifiers have better accuracy. Other machine learning models were also implemented with TF-IDF feature extraction, but the findings show that they do not fare as well on the chosen dataset. Thus, future work will focus on experimenting with other deep learning algorithms on the specified datasets and on new datasets as they become available. The analysis of neutral tweets will be a major focus of future study, since some tweets do not contain a positive or negative attitude. The suggested research focuses on Twitter data, but the implications of this paper's scope suggest that it might as well be applied to other social media platforms' data.
References
1. Khalid M, Ashraf I, Mehmood A et al (2020) GBSVM: sentiment classification from unstructured reviews using ensemble classifier. Appl Sci 10:2788. https://doi.org/10.3390/app10082788
2. Shoumy N, Ang L, Seng K et al (2020) Multimodal big data affective analytics: a comprehensive survey using text, audio, visual and physiological signals. J Netw Comput Appl 149:102447. https://doi.org/10.1016/j.jnca.2019.102447
3. Schnebly J, Sengupta S (2019) Random forest twitter bot classifier. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). IEEE, pp 0506–0512
4. Charbuty B, Abdulazeez A (2021) Classification based on decision tree algorithm for machine learning. J Appl Sci Technol Trends 2(01):20–28
5. Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 138(2):221–254
6. Rathi M, Malik A, Varshney D, Sharma R, Mendiratta S (2018) Sentiment analysis of tweets using machine learning approach. In: 2018 eleventh international conference on contemporary computing (IC3). IEEE, pp 1–3
7. Zhu L, Yang Y (2016) Improvement of decision tree ID3 algorithm. In: International conference on collaborative computing: networking, applications and worksharing. Springer, Cham, pp 595–600
8. Kaewrod N, Jearanaitanakij K (2018) Improving ID3 algorithm by ignoring minor instances. In: 2018 22nd international computer science and engineering conference (ICSEC). IEEE, pp 1–5
9. Hamad Y, Mohammed OKJ, Simonov K (2019) Evaluating of tissue germination and growth rate of ROI on implants of electron scanning microscopy images. In: Proceedings of the 9th international conference on information systems and technologies, pp 1–7
10. Devi BL, Bai VV, Ramasubbareddy S, Govinda K (2020) Sentiment analysis on movie reviews. In: Emerging research in data engineering systems and computer communications. Springer, Singapore, pp 321–328
11. Guerreiro J, Rita P (2020) How to predict explicit recommendations in online reviews using text mining and sentiment analysis. J Hosp Tour Manag 43:269–272
12. Mehta RP, Sanghvi MA, Shah DK, Singh A (2020) Sentiment analysis of tweets using supervised learning algorithms. In: First international conference on sustainable technologies for computational intelligence. Springer, Singapore, pp 323–338
13. Zhang J (2020) Sentiment analysis of movie reviews in Chinese
14. López-Chau A, Valle-Cruz D, Sandoval-Almazán R (2020) Sentiment analysis of Twitter data through machine learning techniques. In: Software engineering in the era of cloud computing. Springer, Cham, pp 185–209
15. Addi HA, Ezzahir R, Mahmoudi A (2020) Three-level binary tree structure for sentiment classification in Arabic text. In: Proceedings of the 3rd international conference on networking, information systems & security, pp 1–8
16. Patel R, Passi K (2020) Sentiment analysis on Twitter data of world cup soccer tournament using machine learning. IoT 1(2):218–239
17. Wang Y, Chen Q, Shen J, Hou B, Ahmed M, Li Z (2021) Aspect-level sentiment analysis based on gradual machine learning. Knowl-Based Syst 212:106509
18. Baccouche A, Garcia-Zapirain B, Elmaghraby A (2018) Annotation technique for health-related tweets sentiment analysis. In: 2018 IEEE international symposium on signal processing and information technology (ISSPIT). IEEE, pp 382–387
19. Hameed Z, Garcia-Zapirain B (2020) Sentiment classification using a single-layered BiLSTM model. IEEE Access 8:73992–74001
20. Zhang M (2020) E-commerce comment sentiment classification based on deep learning. In: 2020 IEEE 5th international conference on cloud computing and big data analytics (ICCCBDA). IEEE, pp 184–187
21. Mandloi L, Patel R (2020) Twitter sentiments analysis using machine learning methods. In: 2020 international conference for emerging technology (INCET). IEEE, pp 1–5
22. Misopoulos F, Mitic M, Kapoulas A, Karapiperis C (2014) Uncovering customer service experiences with Twitter: the case of airline industry. Manage Decis
23. Hamad YA, Simonov K, Naeem MB (2019) Lung boundary detection and classification in chest X-rays images based on neural network. In: International conference on applied computing to support industry: innovation and technology. Springer, Cham, pp 3–16
24. Kirasich K, Smith T, Sadler B (2018) Random forest vs logistic regression: binary classification for heterogeneous datasets. SMU Data Sci Rev 1(3):9
25. Meier L, Van De Geer S, Bühlmann P (2008) The group lasso for logistic regression. J Roy Stat Soc Ser B (Stat Methodol) 70(1):53–71
26. Nelder JA, Wedderburn RW (1972) Generalized linear models. J Roy Stat Soc Ser A (Gen) 135(3):370–384
27. Kabaev E, Hamad Y, Simonov K, Zotin A (2020) Visualization and analysis of the shoulder joint biomechanics in postoperative rehabilitation. In: SibDATA, pp 34–41
28. Cameron AC, Windmeijer FA (1997) An R-squared measure of goodness of fit for some common nonlinear regression models. J Econometrics 77(2):329–342
29. Ayer T, Chhatwal J, Alagoz O, Kahn CE Jr, Woods RW, Burnside ES (2010) Comparison of logistic regression and artificial neural network models in breast cancer risk estimation. Radiographics 30(1):13–22
30. Cummins N, Amiriparian S, Ottl S, Gerczuk M, Schmitt M, Schuller B (2018) Multimodal bag-of-words for cross domains sentiment analysis. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4954–4958
31. Kadhim AI (2019) Term weighting for feature extraction on Twitter: a comparison between BM25 and TF-IDF. In: 2019 international conference on advanced science and engineering (ICOASE). IEEE, pp 124–128
32. Soares ER, Barrére E (2019) An optimization model for temporal video lecture segmentation using word2vec and acoustic features. In: Proceedings of the 25th Brazillian symposium on multimedia and the web, pp 513–520
Chapter 13
A Comparison of Top-Rated Open-Source CMS—Joomla, Drupal, and WordPress for E-Commerce Website Savan K. Patel, Falguni Suthar, Swati Patel, and Jigna Prajapati
1 Introduction E-commerce is a word that might not be unknown to anyone in today's world. The e-commerce market is touching new highs every day in terms of revenue. Every businessman wants to go online to increase sales and to meet customers' requirements, and going online requires web content that is easily deliverable. Online shopping is an easy and convenient option for customers if you, as a web owner, manage it well. A CMS is the better option for a non-technical person to manage web content in an easy manner. In general, every CMS fulfills the common publishing process, but a given CMS may serve better than others in terms of documentation, user query support, plug-ins, robust security, etc. In this paper, the researchers have tried to shed light on the question of which CMS is best and why. To get this answer, the researchers created common content in each of the mentioned CMSs and took readings for different parameters; these readings and their explanation help the reader to understand each CMS and its selection criteria. We have gone through Google Insights and GTmetrix reports to get the first insights.
2 Review of Literature First, Wahyudi Agustiono compared five different CMSs for e-commerce purposes: OpenCart, VirtueMart, WordPress, Woocommerce, and Prestashop, using one evaluation model containing quality requirements, characteristics, and testing strategies. In his study, he found that no CMS is perfect for e-commerce, but each one has its own pros and cons with reference to e-commerce [1]. Jose-Manuel Martinez-Caro et al. compared Joomla, Drupal, and WordPress by creating common content and evaluating requirements, functionality, and visuals. Through comparative analysis, they tried to find the advantages, disadvantages, and different hurdles associated with each; they also compared these CMSs on security parameters, examining vulnerabilities and security enhancements to find the safer CMS [2]. Nick Schäferh compared WordPress, Joomla, and Drupal for CMS selection across criteria such as cost and expense, expertise level required, design, popularity, customization, security, and community support, and suggested when to use each CMS [3]. The editorial staff at WPBeginner compared 15 CMSs on criteria such as ease of use, design, popularity, add-ons, help and support options, and cost, giving the pros and cons of each; they found WordPress to be the best among them on several criteria [4]. Amal A. Alrasheed et al. compared three content management systems, Joomla, Drupal, and Magento, for e-commerce purposes, based on parameters such as functionality, community support, add-ons, performance, hosting and installation, etc. They aimed to find a good CMS for e-commerce and also suggested some changes to make it better [5].
3 Joomla When we list powerful open-source Content Management Systems, how can we forget Joomla? It is one of the most popular open-source CMSs in its class, used to build influential websites and online applications. We can create any type of website using Joomla, such as blogs, e-commerce sites, different portals, community sites, etc. It offers easy customization as per user requirements. WordPress may be a blogger's first choice, but that does not mean users cannot blog with Joomla. Joomla offers different templates that meet users' requirements, and users can add different functionalities using thousands of add-ons. The researcher has used Joomla version 3.9.22 with the J2store plug-in to create an e-commerce site. Joomla Core Features: Below are some of the core features of Joomla:
1. Great design
2. Easy content editing and publishing
3. Basic functionalities
4. Easy content management and professional support [6]
4 Drupal Users can build a strong and user-friendly website using Drupal, and can modify their website without technical skills to fulfill their needs. Drupal can expand as per users' changing requirements. Drupal has a multilingual facility for both admin and user to view the same site in their own language. It is easily customizable to fit your needs and make your site attractive, and it has good community support to solve your issues. Whenever there is a new development in web design, Drupal will be at your doorstep [7]. The researcher has used Drupal core 9.0.8 with the Drupal Commerce 8.x-2.21 module to create an e-commerce site. Drupal Core Features:
• Large community
• Better user experience
• High scalability
• Secure
• Easily accessible [7]
5 WordPress Web users initially saw WordPress as a blogging platform, but it has long since emerged as a full-fledged CMS. WordPress's main benefit is the numerous plug-ins from the developer community: all CMS tasks, such as creation, editing, and publishing of content—and, yes, SEO—are well managed by WordPress plug-ins nowadays. It has different add-ons to enhance the user interface, and a large community to help and solve user doubts, even in selecting particular plug-ins [8]. The researcher has used WordPress version 5.5.3 with the WooCommerce plug-in to create an e-commerce site. WordPress Core Features:
• Simple
• Flexible
• Easy publication process
• Advanced tools for publication
• Easy user management and media management
• Lots of themes
• Good community support
• Well-managed SEO [9]
6 Case Studies As discussed earlier, we have developed three different websites with similar content using Joomla, Drupal, and WordPress. As the content is similar, a researcher has gone for statistical comparison using the different parameters. A researcher has used GTmetrix and Google PageSpeed Insight online tools to evaluate the website’s performance.
6.1 Server and Client Machine Configuration In this case, three different sites were hosted on a remote server, and the user requested them from a client machine. The researcher has considered different performance criteria; the readings depend on the hardware configuration of the client machine and mainly on the speed of the Internet. Using these case studies, the researcher has tried to shed light on the question of which CMS to select and why. Tables 1 and 2 show the server and client machine configurations, respectively. The experimental results below are needed to understand the study properly. Table 3 shows the GTmetrix overall comparison; its different parameters are discussed below. GTmetrix grade: This is the first criterion, giving an overall evaluation grade. We can see that WordPress and Joomla have the same grade, and Drupal is well ahead in this comparison.
Table 1 Server configuration

O.S. | Linux
Hardware configuration | Intel Xeon Quad Core 3470
RAM | 6 GB—DDR3
Hard drive | 1 Terabyte SATA
Motherboard | Intel
Server | Apache Server 2.4.35
Front end | PHP 7.4
Back end | MySQL 5.7.32 Server

Table 2 Client configuration

1. Internet connection—broadband 2 MB
2. Hard drive—500 GB
3. OS—Windows 7 Professional
4. Processor—Core i3
5. RAM—4 GB
Table 3 GTmetrix overall comparison

Parameter | WordPress | Joomla | Drupal | Remarks
URL | drsavanpatel.tk | joomla.drsavanpatel.tk | drupalshop.drsavanpatel.tk | –
GTmetrix grade | D | D | A | Higher is better
Performance score | 56% | 54% | 99% | Higher is better
Structure score | 79% | 81% | 95% | Higher is better
Largest contentful paint | 3.1 s | 3.2 s | 0.7 s | Lower is better
Total blocking time | 0 ms | 0 ms | 0 ms | Lower is better
Cumulative layout shift | 0.01 | 0.02 | 0 | Lower is better
Performance score: As with the GTmetrix grade, Drupal has cleared the ground with a 99% performance score, while Joomla and WordPress obtained almost similar scores. Structure score: WordPress secured 79% and Joomla 81%, while Drupal comes first with 95% [10]. Largest contentful paint: Put simply, LCP measures the time taken for the largest content element on the webpage to become visible on the visitor's screen. WordPress and Joomla have almost the same number, but Drupal is on top, as a lower number is better for this parameter. Total blocking time: This is the total amount of time the webpage does not allow users to interact, in other words the total time the page was blocked. Cumulative layout shift: Put simply, CLS measures the sudden shifting of web elements while the page is being rendered; it is an aggregate score of layout shifting across the entire page. On this parameter, all CMSs got an almost equal rating. Looking at the results for all parameters in Table 3, Drupal is performing well compared to WordPress and Joomla: it leads in GTmetrix grade, performance score, structure score, and LCP. Table 4 shows the GTmetrix detailed comparison; like the overall comparison, Drupal is well ahead in all parameters of the detailed comparison as well. In the following section, all parameters listed in Table 4 are explained in detail with their results.
Table 4 GTmetrix detailed comparison
6.1.1 First Contentful Paint

FCP records the time at which the visitor's browser renders the first piece of page content, which can be background color, text, an image, or a heading. It is significant because, until the page shows some change, the user feels that the page is still loading. However, it is not by itself a good measure of page performance, because in many cases the first paint is the page background color or some graphic that does not provide much value to the visitor. Joomla takes almost 3 s on this parameter and WordPress stands second with 2.7 s, but Drupal comes first with only 0.7 s.
6.1.2 Speed Index

Joomla and WordPress obtained almost similar Speed Index values, while Drupal has cleared the ground and comes first in this race.
6.1.3 Time to Interactive

This metric measures a page's load responsiveness and helps identify situations in which a page looks responsive but is not. In simple terms, it measures the time at which the page becomes fully interactive and able to take the user's input. The result here is similar: the Joomla and WordPress pages take almost the same time to become fully interactive, while Drupal takes about one-fourth of that time and is well ahead in this race.
6.1.4 Connect Duration

This is the time taken by the client machine to establish a connection to the server. Technically speaking, it is the sum of connect time, request sending time, DNS time, and blocked time. Looking at the values of this parameter, WordPress performs worst with 446 ms and Joomla stands in second position with 298 ms, while Drupal outperforms both by taking only 37 ms, securing the first position.
6.1.5 Backend Duration

After the connection is established, the client machine generates a request to which the server responds with the page; the time taken in this process is known as the backend duration. On this parameter, WordPress has taken 1 s and Joomla 2 s, while Drupal has taken just 264 ms.
6.1.6 First Paint

This is the point at which the browser does any kind of rendering on the page. Depending on the page structure, the first paint can be the background color (even white) or the main body of the page. It is important because, until the first paint, the browser shows a default or blank page; after this point, the user gets an indication that the page is loading. However, this point does not tell us how long the page takes to be fully rendered, nor is it an indication of fast page loading. On this parameter, Joomla and WordPress take about 3.0 s, while Drupal takes only 329 ms.
6.1.7 DOM Loaded

This is the time at which the DOM is interactive and no JavaScript is blocking. If there is no JavaScript blocking or any issue in the execution of JavaScript, there is no difference between the DOM loaded and DOM interactive times. It measures the execution time of JavaScript triggered by the DOM loaded event, and it also marks the starting point for many JS frameworks, whose delay counts as rendering delay. On this parameter, all three CMSs obtained the same values as they secured in first paint, with Drupal coming first.
6.1.8 DOM Interactive

This is the point at which the browser has finished parsing and loading the HTML and has processed it into its internal structure, before the final rendering. Its timing is very close to the DOM loaded timing. On this parameter, all three CMSs have the same timing as on the DOM loaded parameter.
6.1.9 Onload Time

This indicates the completion of all page processes, i.e., when images, CSS, and other resources finish loading. On this parameter, WordPress took 3.0 s and Joomla 3.2 s, while Drupal is the winner with 1.2 s.
6.1.10 Fully Loaded Time

This is the total time taken by the following events:
• Onload time.
• Last request captured.
• Largest contentful paint.
• Total time to interactive.
• First paint.
• First contentful paint.
Fully loaded time counts everything up to the point when the page stops downloading or executing any kind of script, such as JavaScript, so the counted fully loaded time may be longer than expected; that is why it is not the best indicator for judging performance [10]. Table 5 lists the page details; from this comparison, users can see that the WordPress page size is 971 KB and the Joomla page is 1.25 MB, while Drupal is on top with a page size of only 245 KB.
Table 5 Page details

Parameter | WordPress | Joomla | Drupal
URL | drsavanpatel.tk | joomla.drsavanpatel.tk | drupalshop.drsavanpatel.tk
Total page size | 971 KB | 1.25 MB | 245 KB
Total page requests | 36 | 40 | 11
Fully loaded time | 6.2 s | 2.8 s | 1.2 s
In total page requests, WordPress took 36 requests to download the page and Joomla took 40, while Drupal took just 11. Fully loaded time was already discussed in the previous section; as the table shows, Drupal took much less time to load the full page.
6.2 Second Case Study: Google PageSpeed Insights

Google PageSpeed Insights (PSI) works for both desktop and mobile devices. PSI is used to evaluate the performance of a page and also provides suggestions for improving it. PSI reports two types of data: lab data and field data. Users can get suggestions on performance issues from lab data, though it might not capture real-world problems; field data represents real user experience but has its own limitations.
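For readers who wish to reproduce such measurements programmatically, PSI also exposes a public REST API. The following is a minimal Python sketch, not taken from the paper, assuming the v5 runPagespeed endpoint and its lighthouseResult response structure as publicly documented; the case-study URLs are reused purely as illustration, and heavy use normally also requires an API key parameter.

import requests

# Minimal sketch: query the PageSpeed Insights v5 REST API for a
# Lighthouse performance score (reported in the range 0-1).
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_performance_score(page_url, strategy="desktop"):
    resp = requests.get(PSI_ENDPOINT,
                        params={"url": page_url, "strategy": strategy},
                        timeout=60)
    resp.raise_for_status()
    data = resp.json()
    return data["lighthouseResult"]["categories"]["performance"]["score"]

for site in ("http://drsavanpatel.tk",
             "http://joomla.drsavanpatel.tk",
             "http://drupalshop.drsavanpatel.tk"):
    print(site, psi_performance_score(site))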
6.2.1 Performance Score

PSI provides a summarized page performance score, using lab data to analyze the performance of the page.
6.2.2 Classification of Score

A PSI score above 90 is classified as good, a score between 50 and 90 as needs improvement, and a score below 50 as poor.
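As a small illustration of this banding, the following helper (our own sketch, not from the paper) maps a PSI performance score to its qualitative category:

def classify_psi_score(score):
    # Thresholds follow the classification above:
    # above 90 -> good, 50-90 -> needs improvement, below 50 -> poor.
    if score > 90:
        return "good"
    if score >= 50:
        return "needs improvement"
    return "poor"

print(classify_psi_score(99))  # good
print(classify_psi_score(56))  # needs improvement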
6 | Fbs | Fasting blood sugar > 120 mg/dl (0 = False, 1 = True)
7 | Restecg | Resting ECG result (0: Normal, 1: ST-T wave abnormality, 2: LV hypertrophy)
8 | Thalach | Maximum heart rate achieved [71, 202]
9 | Exang | Exercise-induced angina (0: No, 1: Yes)
10 | Oldpeak | ST depression induced by exercise relative to rest [0.0, 62.0]
11 | Slope | The slope of the peak exercise ST segment (1: up-sloping, 2: flat, 3: down-sloping)
12 | Ca | Number of major vessels colored by fluoroscopy (values 0–3)
13 | Thal | Defect types (3: normal, 6: fixed defect, 7: irreversible defect)
14 | Class | Diagnosis of heart disease (1: unhealthy, 0: healthy)
3.5 Classifications

Logistic Regression, Support Vector Machine, Random Forest (Gini), Random Forest (entropy), K-NN, and Naïve Bayes are used to classify cardiovascular disease.

Simple Logistic Regression (LR). Logistic Regression is a nonlinear function used for binary classification of statistical data; it maps any value to the range between 0 and 1. The LR-linear variant is used here.

Support Vector Machine. SVM is used for both linear and nonlinear datasets. It applies a nonlinear mapping to transform the data into a higher dimension; the SVM-linear and SVM-RBF variants are used.

K-Nearest Neighbors (K-NN). K-Nearest Neighbors classifies supervised data. It is one of the simplest and most widely used classification algorithms, in which a new data point is classified based on its similarity to a specific group of neighboring data points.

Naïve Bayes. Naïve Bayes assumes each input variable is independent. It uses conditional independence, meaning an attribute's value in a given class is independent of the values of the other attributes. Nashif et al. [19] show their work on this learning model.

Random Forest. Random Forest uses ensemble learning, which provides higher accuracy through cross-validation; an output is selected only when multiple decision trees give the same result. In our study of heart disease prediction, we used RF-Entropy and RF-Gini, and both give a high accuracy of about 98.49% in our training and testing model. The slight difference between RF-Entropy and RF-Gini is the index range: the RF-Entropy index lies in the interval (0, 1), while the RF-Gini index lies in the interval (0, 0.5). A brief scikit-learn sketch of these classifiers follows.
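The sketch below shows the classifier configurations named above in scikit-learn; the synthetic placeholder data (sized like the 303-record Cleveland dataset), the train/test split, and hyperparameters such as k = 5 are illustrative assumptions rather than the paper's exact settings.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the 13-attribute heart dataset.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LR-linear": LogisticRegression(max_iter=1000),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "RF-Gini": RandomForestClassifier(criterion="gini", n_estimators=100),
    "RF-Entropy": RandomForestClassifier(criterion="entropy", n_estimators=100),
}
for name, model in models.items():
    model.fit(X_train, y_train)                     # train each classifier
    print(name, round(model.score(X_test, y_test), 3))  # held-out accuracy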
3.6 Data Analysis

In our study, we used various statistical metrics: Accuracy, Precision, Sensitivity, Specificity, log loss, ROC-AUC score, F1-score, and recall score, which are shown in Table 2.

Table 2 Comparative study of various classifiers before feature selection

Model | Accuracy | Precision | Sensitivity | Specificity | F1_Score | ROC | Log_Loss | Matthews corr. coef.
K-NN | 0.804 | 0.798 | 0.896 | 0.768 | 0.816 | 0.802 | 6.768 | 0.607
SVM_lnr | 0.804 | 0.773 | 0.885 | 0.716 | 0.825 | 0.800 | 6.769 | 0.612
SVM_rbf | 0.804 | 0.778 | 0.875 | 0.726 | 0.824 | 0.801 | 6.769 | 0.610
RF_Gini | 0.985 | 0.972 | 1.000 | 0.968 | 0.986 | 0.984 | 0.521 | 0.970
RF_Ent | 0.985 | 0.972 | 1.000 | 0.968 | 0.986 | 0.984 | 0.521 | 0.970
The comparative study shows that RF-Gini has an accuracy of about 98.50%, whereas K-NN and SVM-linear each have 80.40% accuracy, as shown in Table 2. The ROC curves for the different classifiers, i.e., RF-Gini, RF-Entropy, SVM, and K-NN, are shown in Fig. 2; RF-Entropy has good area coverage of about 0.998 in the curve drawn between true-positive and false-positive rates. The precision and recall curves for the different classifiers, i.e., RF-Gini, RF-Entropy, SVM, and K-NN, are shown in Fig. 3; RF-Gini has a good trade-off between precision and recall across thresholds, showing both high recall and high precision, as reported in Table 3. The confusion matrix in Fig. 4a evaluates the performance of the RF-Gini classification model: the accuracy of the machine predictions against the actual targets is about 98.50% with n-estimators = 100. The confusion matrix in Fig. 4b evaluates the performance of the RF-Entropy classification model.

Fig. 2 ROC curves without using soft voting classifier
Fig. 3 Precision recall curves without using soft voting classifier
Table 3 Comparative study of the various classifiers

Model | Accuracy | Precision | Sensitivity | Specificity | F1_Score | ROC | Log_Loss | Matthews corr. coef.
RF_Gini | 0.985 | 0.972 | 1.0 | 0.968 | 0.986 | 0.984 | 0.521 | 0.970
RF_Entropy | 0.985 | 0.972 | 1.0 | 0.968 | 0.986 | 0.984 | 0.521 | 0.970
The accuracy of the machine predictions against the actual targets is about 98.495% with n-estimators = 100. The comparative study of the various machine learning algorithms on the training datasets before feature selection, taking different n-estimators, is shown in Table 4. After selecting 9 estimated features using a soft voting function, the readings of the various ML algorithms in the Jupyter notebook are shown in Table 5. The comparative study of the above models via ROC-AUC curves and precision-recall curves is given below: the ROC curve plots the true-positive rate against the false-positive rate, and the AUC indicates the measure of separability, i.e., how well the model classifies; the higher the AUC, the better the prediction.
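A sketch of such a soft voting ensemble in scikit-learn is given below; the particular choice of base estimators is an illustrative assumption, and probability=True is needed on the SVC so that class probabilities can be averaged.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Soft voting averages the predicted class probabilities of the base
# models and picks the class with the highest mean probability.
soft_vote = VotingClassifier(
    estimators=[
        ("rf_gini", RandomForestClassifier(criterion="gini", n_estimators=100)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm_rbf", SVC(kernel="rbf", probability=True)),
    ],
    voting="soft",
)
# soft_vote.fit(X_train, y_train); soft_vote.score(X_test, y_test)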
Fig. 4 Confusion matrix for (a) RF-Gini and (b) RF-Entropy
Table 4 Comparative study of various ML algorithms

Training model | Accuracy, n-estimators = 100 (%) | Accuracy, n-estimators = 50 (%) | Accuracy, n-estimators = 10 (%)
LR-linear | 86.66 | 86.66 | 84.17
SVM-linear | 86.41 | 86.41 | 82.73
SVM-RBF | 89.05 | 89.05 | 88.92
Naïve Bayes | 85.78 | 85.78 | 82.74
K-NN | 88.55 | 88.55 | 81.28
RF-Ent | 98.49 | 98.49 | 96.83
RF-Gini | 98.49 | 98.11 | 97.40
Table 5 Comparative study of various ML algorithms after feature selection

Model | Accuracy | Precision | Sensitivity | Specificity | F1_Score | ROC | Log_Loss | Matthews corr. coef.
Soft voting | 0.890 | 0.836 | 0.980 | 0.792 | 0.903 | 0.886 | 3.799 | 0.791
K-NN | 0.820 | 0.793 | 0.885 | 0.750 | 0.836 | 0.817 | 6.217 | 0.642
SVM_lnr | 0.840 | 0.790 | 0.942 | 0.729 | 0.860 | 0.836 | 5.526 | 0.691
SVM_rbf | 0.880 | 0.845 | 0.942 | 0.813 | 0.891 | 0.877 | 4.145 | 0.764
RF_Gini | 0.970 | 0.945 | 1.000 | 0.938 | 0.972 | 0.969 | 1.036 | 0.941
LR | 0.830 | 0.787 | 0.923 | 0.729 | 0.850 | 0.826 | 5.872 | 0.668
RF_Ent | 0.970 | 0.945 | 1.000 | 0.938 | 0.972 | 0.969 | 1.036 | 0.941
After feature selection, the ROC curves for the different classifiers, i.e., RF-Gini, RF-Entropy, SVM, and soft voting, are shown in Fig. 5. The precision-recall curves indicate the trade-off between the true-positive rate and the positive predictive value; a high area under the curve represents both high recall and high precision, as shown in Fig. 6. The comparative study of the soft voting algorithm on the training datasets after feature selection, taking different n-estimators, shows that this testing model is best suited for the prediction of heart disease, as shown in Table 6. The confusion matrices for n-estimators of 100, 70, and 10, shown in Fig. 7, are identical, matching Table 6.
Fig. 5 ROC curves with soft voting classifier
Fig. 6 Precision recall curves with soft voting classifier
Table 6 Comparative study of soft voting algorithm

Model | Accuracy (n-estimators) | Precision | Specificity | F1-score | ROC-AUC
Soft voting | 89% (100) | 0.836 | 0.791 | 0.902 | 0.886
Soft voting | 89% (70) | 0.836 | 0.791 | 0.902 | 0.886
Soft voting | 89% (10) | 0.836 | 0.791 | 0.902 | 0.886
Fig. 7 Confusion matrix of the soft voting algorithm for (a) n-estimators = 100, (b) n-estimators = 70, and (c) n-estimators = 10
4 Result Analysis

We verify the results of the proposed model by selecting different numbers of features, starting from 13, then 9 selected features, and then 7 selected features. Figure 8a, b represents the gender- and age-wise distributions of heart patients in our datasets. The conclusion is that 60–65% of the male population aged 50–60 has a major chance of heart disease, and the pie chart in Fig. 8a shows that the probability of heart disease in males is 70%, whereas in females it is only 30%. Figure 9a, b represents the age distribution and the gender distribution of normal patients, and concludes that normal patients aged 55–65 are more prone to
Fig. 8 Distribution for (a) gender and (b) age
Fig. 9 Distribution of normal patients for (a) age and (b) gender
heart disease, whereas the blue bars in Fig. 10 show that males have proportions leaning more toward heart disease. Figure 10a, b shows the age-wise distribution of heart patients, which indicates that the chance of heart disease rises from age 30–35 onward and the graph then increases rapidly, representing a concern in that age range; Fig. 10 also shows that the proportion of female patients with heart disease is better than that of males. Figure 11a, b shows the resting ECG of normal patients versus that of heart patients, which can be a point of consideration. Figure 12 shows our seaborn distribution of numerical features such as age, resting ECG, and cholesterol against the predicted target variable, where 1 represents male and 0 female. After that, outlier detection and removal are done using exploratory data analysis with Gaussian multivariate and normal distributions; 8 machine learning
Fig. 10 Distribution of heart patients for (a) age and (b) gender
Fig. 11 Comparison of resting ECG for (a) normal patients and (b) heart patients
algorithms are applied, among which only 5 achieve more than 80% accuracy, with 10-fold cross-validation used to measure the performance of each algorithm. The evaluation metrics are defined as:

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1-score = 2 * (Precision * Recall)/(Precision + Recall)
Accuracy = (TP + TN)/N
Fig. 12 Seaborn distribution of different attributes
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)

(where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative, and N = total number of samples). The performance of the various machine learning models was measured on the dataset split into 80% train data and 20% test data, with cross-validation of the training accuracy. Taking n-estimators = 100 we got RF-Ent: 98.3% and RF-Gini: 98.36%; taking n-estimators = 70, RF-Ent: 98.11% and RF-Gini: 98.49%; taking n-estimators = 50, RF-Ent: 98.49% and RF-Gini: 98.4937%; taking n-estimators = 10, RF-Ent: 98.11% and RF-Gini: 98%. The train-test model accuracy on the dataset with 80% train data and 20% test data is RF-Ent: 98.36%, RF-Gini: 98.61%. With 75% train data and 25% test data, the performance is RF-Ent: 98.79%, RF-Gini: 98.52%.
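A small Python helper expressing the metric formulas above over raw confusion-matrix counts is shown below; the example counts are illustrative only and are not taken from the paper's confusion matrices.

def classification_metrics(tp, tn, fp, fn):
    # Direct translation of the formulas listed above.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # identical to sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}

print(classification_metrics(tp=95, tn=101, fp=3, fn=0))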
Table 7 Comparative analysis of various ML algorithms

S. No. | Model | Accuracy | F1-score
1 | RF_Entropy | 98.492 | 0.985
2 | RF_Gini | 98.492 | 0.985
3 | Soft voting | 89.0 | 0.902
The train-test model accuracy on the dataset with 90% train data and 10% test data is RF-Ent: 99.66%, RF-Gini: 99.44%.
5 Conclusion

The work presented in this paper proposes a prototype model for the earlier prediction of heart disease in any individual, after performing analyses over various machine learning algorithms, as stated in Table 7. The best proven accuracy is achieved by Random Forest Gini, with 98.49% at n-estimators = 100, and Random Forest entropy likewise reaches 98.49% at n-estimators = 100, as shown in Table 7. The best 7, best 9, and best 13 features were selected and the accuracy was calculated over each selection; the 13 selected features were found to give the best accuracy, in the range of 97–98.5%, on the testing model. The soft voting function can be used to smooth out variations arising from differing numbers of data points. All analyses took place in a Jupyter notebook, and the dataset is of multivariate type, meaning it provides separate mathematical or statistical variables. The prototype of the working model, in collaboration with IoT, used ECG and heartbeat sensors to read real-time data, connected through an Arduino Uno and an ESP8266. Future work lies in using real-time readings for all 13 attributes used in the analysis; the parameters sent to a cloud API would then give accurate results on a person's health and deliver the report via email. The proposed work can serve as a very useful tool for doctors and the public.
References

1. Hazra A, Mandal S, Gupta A, Mukharzee A (2017) Heart disease diagnosis and prediction using machine learning and data mining techniques: a review. Adv Comput Sci Technol 10:2137–2159
2. Patel J, Upadhyay P, Patel D (2016) Heart disease prediction using machine learning and data mining techniques. Int J Comput Sci Commun (IJCSC) 7:129–137. http://doi.org/10.090592/IJCSC.2016.018
3. Chavan Patil AB, Sonawane P (2017) To predict heart disease risk and medications using data mining techniques with an IOT based monitoring system for post-operative heart disease patients. Int J Emerg Trends Technol (IJETT) 4:8274–8281
4. Zhao W, Wang C, Nakahira Y (2011) Medical applications on internet of things. In: IET international conference on communication technology and application (ICCTA 2011), Beijing, 14–16 October 2011, pp 660–665
5. Soni J, Ansari U, Sharma D (2011) Intelligent and effective heart disease prediction systems using weighted associative classifiers. Int J Comput Sci Eng (IJCSE) 3(6):2385–2392
6. Yuce MR, Redout J-M, Wu T, Wu F (2017) An autonomous wireless body area network implementation towards IoT connected healthcare applications. Australian Research Council future fellowship under grant FT130100430. IEEE Access 162:116–129. https://doi.org/10.1109/ACCESS.2017.2716344
7. Singh M, Martins LM, Joanis P, Mago VK (2016) Building a cardiovascular disease predictive model using structural equations model and fuzzy cognitive map. In: IEEE international conference on fuzzy systems (FUZZ), Vancouver, 24–29 July 2016, pp 1377–1382
8. Ghadge P, Girme V, Kokane K, Deshmukh P (2016) Intelligent heart attack prediction system using big data. Int J Recent Res Math Comput Sci Inf Technol 2(2):73–77
9. Shouman M, Turner T, Stocker R (2012) Using data mining techniques in heart disease diagnosis and treatment. In: 2012 Japan-Egypt conference on electronics, communications and computers, pp 173–177. https://doi.org/10.1109/JEC-ECC.2012.6186978
10. Long NC, Meesad P, Unger H (2015) A highly accurate firefly based algorithm for heart disease prediction. Expert Syst Appl 42:8221–8231. https://doi.org/10.1016/j.eswa.2015.06.024
11. Sharma H, Rizvi MA (2017) Prediction of heart disease using machine learning algorithms: a survey. Int J Recent Innov Trends Comput Commun 5(8):99–104. https://doi.org/10.17762/ijritcc.v5i8.1175
12. Das R, Turkoglu I, Sengur A (2009) Effective diagnosis of heart disease through neural networks ensembles. Expert Syst Appl 36(4):7675–7680. https://doi.org/10.1016/j.eswa.2008.09.013
13. Albahr A, Albahar M, Thanoon M, Binsawad M (2021) Computational learning model for prediction of heart disease using machine learning based on a new regularizer. Comput Intell Neurosci 10. Article ID 8628335. https://doi.org/10.1155/2021/8628335
14. Pal A, Srivastva R, Singh YN (2021) CardioNet: an efficient ECG arrhythmia classification system using transfer learning. Big Data Res 26:100271. https://doi.org/10.1016/j.bdr.2021.100271
15. Reddy NSC, Nee SS, Min LZ, Ying CX (2019) Classification and feature selection approaches by machine learning techniques: heart disease prediction. Int J Innov Comp 9(1)
16. Singh AJ, Kumar M (2020) Comparative analysis on prediction of software effort estimation using machine learning techniques. In: 1st international conference on intelligent communication and computational research (ICICCR-2020)
17. Wahyuni R, Irawan Y (2019) Web-based heart disease diagnosis system with forward chaining method. J Appl Eng Technol Sci (JAETS) 1(1)
18. Janosi A (Hungarian Institute of Cardiology, Budapest), Steinbrunn W (University Hospital, Zurich), Pfisterer M (University Hospital, Basel), Detrano R (V.A. Medical Center, Long Beach and Cleveland Clinic Foundation): the Cleveland Heart Disease Dataset (CHDD). http://archive.ics.uci.edu/ml/datasets/Heart+Disease
19. Nashif S, Raihan R, Islam MdR, Imam MH (2018) Heart disease detection by using machine learning algorithm and a real-time cardiovascular health monitoring system. World J Eng Technol 6(4)
Chapter 17
Heart Disease Prediction Using Machine Learning and Neural Networks Vijay Mane, Yash Tobre, Swapnil Bonde, Arya Patil, and Parth Sakhare
1 Introduction

Over the years, the risk of heart disease has grown rapidly, affecting millions. According to research published by the World Health Organization, more than 18 million individuals have died due to heart disease. The heart is a vital organ that ensures the appropriate functioning of the human body; if it is harmed, human health is significantly harmed. Several indications point to underlying cardiovascular disease, involving symptoms such as an erratic heartbeat, discomfort in the chest, swollen legs, and sleep disturbances. Cardiovascular disease diagnosis and treatment are highly difficult, especially in underdeveloped nations, where proper prognosis and management may be hampered by the limited availability of diagnostic devices, medical practitioners, and other services. The traditional methods employed by doctors are considered extremely costly and computationally complex. Among the traditional procedures, angiography is considered the solution that provides exact results for the prognosis of cardiac diseases; however, angiography has several limitations, is exceedingly expensive, and has potential adverse effects. As a result, traditional techniques often yield inaccurate results, making heart disease prediction challenging [1]. The use of technology and various machine learning approaches became widespread in the search for less invasive methods [2]. There are a number of open sources of patient data, and research may be performed to see whether various innovative techniques can be utilized to accurately diagnose people and detect this disease before it becomes fatal. In this paper, machine learning-based classification algorithms, as well as neural networks, are implemented to diagnose cardiovascular diseases. Logistic regression, Decision trees, Random Forest, Gradient Boosting, Support vector machine,

V. Mane (B) · Y. Tobre · S. Bonde · A. Patil · P. Sakhare Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_17
Naive Bayes, and K-Nearest Neighbor are some of the classification algorithms implemented in this paper. All classification models are trained to predict the occurrence of heart disease, and their performance is compared using evaluation metrics such as sensitivity and accuracy to determine which classification model is best for predicting the occurrence of heart disease. Moreover, the same data is also tested with two categories of neural networks: Dense Neural Networks and Convolutional Neural Networks. The analysis for both is given.
2 Literature Review

In paper [3], the authors use various classifier techniques: Support Vector Machine, neural networks, and K-Nearest Neighbors. Decision tree and Naive Bayes are used with the data to identify heart disease with better accuracy than other classifiers. In paper [4], different machine learning methods are implemented for heart disease, for example, Gradient boosting, Logistic regression, Random Forest, and SVM; multiple heart-related cases can also be discovered using Naive Bayes, and these algorithms can improve data collection for feasible and fair purposes. The research in paper [5] involves using state-of-the-art data mining methods to give effective results; it uses algorithms such as K-Nearest Neighbor, Decision Tree, Random Forest Classifier, and Support Vector Machine, and obtained an accuracy of 90.32% using SVM and Random Forest to detect heart disease risk levels. Paper [6] depicts the significance of Support Vector Machines and shows how to handle comprehensive data collections accordingly; an influential element of Naive Bayes can also be mentioned here for disease detection. This paper supported locating the adequate performance of the algorithms [7]. Cooper et al. [8] show that, by using fewer factors, the framework can anticipate coronary illness with state-of-the-art improvements employing diverse machine learning techniques. In paper [2], the prognosis of error rates and survival rates is demonstrated. Kukar et al. [9] display the possible outcomes of different machine learning methods for heart disease prediction. Paper [10] illustrates identifying the casualty rate through estimations of the death rate. Williams et al. [11] use different machine learning techniques, such as Decision trees, Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machines, and XGBoost, to predict heart diseases, achieving an accuracy of 95% using Random Forest. In paper [12], a diagnosis system has been built for heart disease with a statistical measuring system using an artificial neural network model; the accuracy obtained is 89.01%, with specificity around 96% and 80% sensitivity. In [13], a three-phase technique based on an artificial neural network is formed for coronary disease prediction in angina, gaining 88.89% accuracy. Sharma et al. [14] aim to build an ML model for heart disease prediction based on the related parameters. The benchmarked UCI heart disease prediction
dataset is used here, which consists of 14 different parameters related to heart disease. ML algorithms such as Random Forest, Support Vector Machine (SVM), Naive Bayes, and Decision tree were used. The results demonstrate that, compared to other ML techniques, Random Forest gives higher accuracy in less time for the prediction. An adept medical diagnosis procedure for heart disease forecasting has been presented in [15], where different machine learning models were utilized for prediction, for example, Decision tree, Naive Bayes, and Artificial Neural Network; the accuracy attained by the artificial neural network was 88.12%, the decision tree accuracy was 80.4%, and Naive Bayes acquired 86.12% precision. In [16], a machine learning-based heart disease system was introduced: an artificial neural network dynamic branch prediction algorithm with a feature selection algorithm was implemented, and the model's results were satisfactory. In another paper [17], a heart disease prediction system is prepared to predict whether a patient is likely to be diagnosed with heart disease using the patient's medical history. Different machine learning algorithms, such as logistic regression and KNN, are used to predict and classify heart disease patients. The results depict that the proposed model was quite satisfying; in particular, KNN and logistic regression showed good accuracy compared to previously used classifiers such as Naive Bayes.
3 Dataset and Description

The dataset [18] used was donated to the UCI Machine Learning Repository by David W. Aha and was created by the Hungarian Institute of Cardiology, Budapest; University Hospital, Zurich; University Hospital, Basel; and the V.A. Medical Center, Long Beach, and Cleveland Clinic Foundation. The dataset initially consisted of 76 different attributes, but all of the published experiments refer to a subset of 14 attributes; moreover, the Cleveland dataset is the only one mainly used by ML researchers. This dataset does not expose any sensitive data, as the names and social security numbers of the patients were removed and replaced with dummy values. The following are the 13 attributes used in the raw dataset; the 14th attribute is the target variable, where 0 represents no heart disease and 1 represents heart disease.
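As a minimal sketch, the processed Cleveland file can be loaded as follows; the UCI path, the column names, and the collapsing of the raw 0–4 target into a binary label are assumptions consistent with how the dataset is commonly distributed, not code from the paper.

import pandas as pd

COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "heart-disease/processed.cleveland.data")
df = pd.read_csv(url, names=COLUMNS, na_values="?")  # '?' marks missing values

# The raw target takes values 0-4; collapsing 1-4 to 1 yields the binary
# label used here (0 = no heart disease, 1 = heart disease).
df["target"] = (df["target"] > 0).astype(int)
print(df.shape, df["target"].value_counts().to_dict())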
3.1 Age

Age is one of the most well-known risk factors for heart disease. Aging causes various changes in the body, which may or may not directly increase the risk of heart disease. It is observed that over a person's life the heart gradually does not beat as fast as it used to, and fatty deposits build up on the walls of the arteries over the course of many years.
3.2 Sex

Sex refers to the gender of the person, male or female. It is observed that men are likely to develop heart disease earlier than women.
3.3 Cp (Chest Pain)

Type 1 (Typical Angina): Defined as substernal chest pain precipitated by physical exertion or emotional stress and relieved with rest or nitroglycerin. It occurs in women, elderly patients, and diabetics, and is not a severe condition.

Type 2 (Atypical Angina): A feeling described almost like a muscle pull or pain. This condition is not relieved by rest or nitroglycerin. It may also present like indigestion or heartburn and can mimic gastrointestinal issues. It is common in women.

Type 3 (Non-Anginal Pain): Chest pain lasting for more than 10 min is non-anginal pain.

Type 4 (Asymptomatic): Also known as silent myocardial infarction (SMI). Incidences of SMI among middle-aged people are twice as common in men as in women.
3.4 Trestbps Resting Blood pressure is measured in mmHg. It is believed that the blood pressure should be lower than 120 mmHg. If it is higher than that, there may be a chance of heart disease.
3.5 Chol High-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides in the blood are the three components. When a person’s low-density lipoprotein (LDL) level is high, it causes artery narrowing. Furthermore, a high triglyceride level increases the chance of a heart attack. On the other hand, a high amount of high-density lipoprotein (HDL) is considered “good” cholesterol, minimizing the risk of a heart attack.
3.6 Fbs

Fbs stands for Fasting Blood Sugar. It has been observed that excessively high blood sugar levels over many years can damage the blood vessels and nerves that control the cardiovascular system.
3.7 Resting ECG It is an electrocardiogram of the patient while resting. The Electrocardiogram test assists in providing heart rate and rhythm.
3.8 Thalach

Thalach stands for the maximum heart rate achieved. It is observed that for each increase of ten beats per minute in heart rate, the risk of cardiac death rises by more than 20%.
3.9 Exang

Exang stands for exercise-induced angina: chest pain that occurs during strenuous physical activity such as exercise.
3.10 Old Peak ST

The peak exercise ST segment is considered down-sloping or horizontal when there is 1.0 mm of ST depression at 60–80 ms after the J point. A normal ST segment slopes sharply upwards. ST depression, calculated in METs, indicates a higher chance of the patient suffering from multi-vessel disease.
3.11 Slope It signifies peak exercise slope. The shape of this slope around various points helps analyze the heart condition.
3.12 Ca

These are the major vessels colored using fluoroscopy, which are considered significant to the heart's overall functioning.
3.13 Thal This represents a heart condition called Thalassemia. Here, 1 stands for fixed defect, 2 stands for normal, and 3 stands for a reversible defect.
4 Methodology

In building a heart disease prediction model, or any model for that matter, accuracy is one of the most critical factors, and this research aims to provide a compilation of such models. Before implementing these models, we perform extensive feature selection by using different methods and selecting the best method for each algorithm while varying the number of features to be selected. This approach contributes to increased insight into the data and applies algorithms so as to maximize accuracy on the target variable.
4.1 Schema

The schema representing the knowledge that we gained involves two different sections, with data pre-processing as a common step between them, as shown in Fig. 1. Machine Learning: for the machine learning algorithms, the data is engineered appropriately for all the algorithms. Neural Networks: while building the neural networks, the data was engineered to provide precision over repeated epochs.
Fig. 1 Machine learning algorithms
4.2 Exploratory Data Analysis

Exploratory Data Analysis (EDA) is used to get insights into data; often, data can be biased or unsuitable for the application. Various tools in Python, like matplotlib, seaborn, and pandas, offer ways to perform such an EDA.
4.3 Feature Selection

We performed the following feature selection methods and, compiling the results from all of them, selected our final features.

Information Gain: Information gain is a filter method that finds the mutual information between the target variable and each attribute.

Fisher Score: The Fisher score ranks each feature by how well it discriminates the target variable, based on a maximum-likelihood criterion.

Correlation Coefficient: This coefficient describes the relationship between all the variables and their interdependence.
Recursive Feature Elimination: This machine learning method relies on an underlying algorithm to rank the features in order of importance. We have implemented RFE using the logistic regression, decision tree, random forest, and extreme gradient boost methods. A short sketch of two of these selection methods follows.
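The sketch below applies scikit-learn's mutual_info_classif (information gain) and RFE, continuing from the loading sketch in Sect. 3 (df as built there); the choice of logistic regression as the RFE ranker and of keeping 10 features are illustrative assumptions.

import pandas as pd
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# df as loaded in the earlier sketch; drop rows with missing ca/thal values.
data = df.dropna()
X, y = data.drop(columns="target"), data["target"]

# Information gain: mutual information between each attribute and the target.
mi = pd.Series(mutual_info_classif(X, y), index=X.columns)
print(mi.sort_values(ascending=False))

# Recursive feature elimination ranked by a logistic-regression model,
# keeping 10 of the 13 features (an illustrative count).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
print("eliminated:", list(X.columns[~rfe.support_]))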
4.4 Machine Learning Techniques

Following are the algorithms used:

Logistic Regression: Logistic Regression is renowned for binary classification and widely used across the machine learning community. It provides appropriate solutions due to its relatively easy application to a broad range of problems, and it operates on a categorical dependent variable. Logistic Regression proves concise and effective where the outcomes can be categorized on a binary scale of 0 and 1; where the variable has more than two outcomes, multinomial logistic regression is used. The logistic function can be described as:

P(y = 1 | X) = 1 / (1 + e^(−z))

Naïve Bayes: The Naive Bayes classifier uses the Bayesian rule. This classification algorithm is intensely scalable, as it requires a number of parameters linear in the number of predictor variables in a problem. It works in a relatively similar manner to SVM in that it covers both classification and regression, and it recognizes the unique traits of the data points relevant to the model. The predicted state depicts the probability of each input attribute and provides the likelihood of an event; the model produces results through the conditional probability, which can be given as:

P(A | B) = P(B | A) P(A) / P(B)
Random Forest Classifier: Random Forest is a meta-estimator that fits various decision tree classifiers with the aim of improving accuracy and controlling overfitting. It constructs decision trees over the attributes and is a type of ensemble machine learning method that builds a combined model into which even weak models are incorporated.
Extreme Gradient Boost: Gradient boosting is mainly helpful for reducing a loss function, which measures the difference between the original values and the predicted values. Gradient boost is a greedy algorithm that can overfit the training dataset rapidly, which makes regularization important to its performance; it optimizes, and hence enhances, arbitrary differentiable loss functions, as stated earlier.

K-Nearest Neighbor: KNN is one of the prime examples of instance-based, or non-generalizing, learning, meaning that it does not aim to build a model like many machine learning algorithms do. It predicts an unknown data point from the nearest known data points. The number of neighbors on which the KNN classifier bases its results is a parameter that we can tune to get the best results.

Decision Tree Classifier: Decision tree algorithms form a network of distinct decision possibilities in the form of a tree. The basic intuition is to traverse all of the possible decision paths and select the best one to ensure maximum efficiency; decision trees are also easy to interpret while handling multiple outcomes.

Support Vector Machine: SVM works similarly to logistic regression, the difference being that it is driven by a linear function rather than the logistic function. Unlike logistic regression, SVM gives the class identity of an outcome rather than a probability, where the number of distinct values of the target variable determines the class identities. SVM uses the kernel technique, which entails computing similarity measures expressed purely as dot products between data elements.
4.5 Neural Networks Techniques

Neural networks aim to imitate the working of neurons in the human brain. They use a learning approach across repetitions (known as epochs), which proves very effective in predictive algorithms; this paper demonstrates a multi-layer perceptron [18].

Dense Neural Networks: Dense neural networks are networks in which each neuron in a dense layer is connected to all the inputs from the preceding layer. Mathematically, the network performs a vector-matrix multiplication with the parameter values, which are updated with the help of backpropagation.

Convolutional Neural Networks: These specialized networks are used for processing data with a known grid-like topology. Examples include time-series data, which can be thought of as a 1-D grid of samples taken at regular time intervals, and image data, which is a two-dimensional grid of pixels. The name CNN indicates that the network uses the mathematical operation of convolution. For continuous time:

S(t) = ∫ x(a) w(t − a) da

For discrete time:

S[n] = Σ_{a = −∞}^{∞} x[a] w[n − a]
where S[n] is the result of convolution between x and w.
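A tiny numerical check of the discrete form, using NumPy's full convolution (our own illustration, not from the paper):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.5])      # a simple averaging kernel
print(np.convolve(x, w))      # [0.5 1.5 2.5 1.5]
# CNN layers slide learned kernels w over grid-structured inputs such as
# 1-D time series or 2-D images in exactly this fashion.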
5 Experiments

5.1 Exploratory Data Analysis

Age: see Fig. 2. Sex: see Fig. 3. Chest pain: see Fig. 4. TrestBps: see Fig. 5. Cholesterol: see Fig. 6. Fbs: see Fig. 7. Restecg: see Fig. 8. Thalach: see Fig. 9. Exang: see Fig. 10. Slope: see Fig. 11. Ca: see Fig. 12. Thal: see Fig. 13. Old peak: see Fig. 14.
This exploratory data analysis gives an insightful perspective and helps us to understand data from a statistical standpoint.
Fig. 2 Age distribution
Fig. 3 Gender comparison by health condition
Fig. 4 Chest pain type
5.2 Feature Selection

Following are the results obtained from the different feature selection methods; these graphs give us a defined ranking of the features.
Fig. 5 Resting blood pressure distribution
Information gain: see Fig. 15. Fisher score: see Fig. 16. Correlation coefficient: see Fig. 17. Recursive feature elimination: the results are presented in Table 1.
5.3 Machine Learning Techniques

We implemented the machine learning algorithms and obtained the results represented in Table 2.
Fig. 6 Cholesterol distribution
Fig. 7 Fasting blood sugar
Fig. 8 Resting ECG comparison
Fig. 9 Maximum heart rate distribution
Fig. 10 Exercise induced angina comparison
Fig. 11 Peak exercise slope comparison
Fig. 12 A number of major vessels versus count
Fig. 13 Thalassemia count
Fig. 14 Old peak distribution
Fig. 15 Information gain feature selection
Fig. 16 Fisher score feature selection
Fig. 17 Correlation coefficient feature selection

Table 1 Recursive feature elimination

Sr. No. | Algorithm name | Features eliminated
1 | Logistic regression | Chol, Age, fbs
2 | Decision tree classifier | Slope, restecg, fbs
3 | Random forest classifier | Sex, restecg, fbs
4 | Extreme gradient boost | Cp, restecg, fbs

Table 2 Machine learning techniques

Sr. No. | Algorithm used | Accuracy (%)
1 | Logistic regression | 86
2 | Naïve Bayes | 85.3
3 | Random forest classifier | 93.65
4 | Extreme gradient boost | 94.63
5 | K-nearest neighbor | 87.8
6 | Decision tree classifier | 94.63
7 | Support vector machine | 98.04
Fig. 18 Model accuracy before fixing overfitting
5.4 Neural Network Techniques

Metric | Before fixing overfitting | After fixing overfitting
Dense neural network model accuracy | Fig. 18 | Fig. 19
Dense neural network model loss | Fig. 20 | Fig. 21
Convolutional neural network model accuracy | Fig. 22 | Fig. 23
Convolutional neural network model loss | Fig. 24 | Fig. 25
Even with the CNN, it can be observed that the model fits its training data almost perfectly, defeating its purpose, and overfitting occurs. Such a case is avoided by adding dropout layers to the neural network, as in the sketch below.
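A minimal Keras sketch of a dense network with such dropout layers is given here; the layer sizes, the 0.3 dropout rate, and the training settings are assumptions for illustration, not the paper's exact architecture (the 13-dimensional input matches the dataset's attributes).

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(13,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),   # randomly silences units each step to curb overfitting
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # binary heart-disease output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.2, epochs=100)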
6 Conclusion

Through exploratory data analysis, the experimentation provides feature selection based on different methods and helps determine the validity of each attribute in determining heart disease. One of the most prominent conclusions drawn is that fasting blood sugar has comparatively very little impact on determining whether a person has heart disease; a little more significant, yet still not enough, is the resting ECG attribute. Following through the analysis, it was observed that
Fig. 19 Model accuracy after fixing overfitting for dense neural networks
Fig. 20 Model loss before fixing overfitting for dense neural networks
Age, Sex, Cholesterol, Chest Pain, and Peak Exercise Slope are not the most prominent features. Afterward, we implemented machine learning and neural network models based on modification of the primary data, which led to accuracies above 85% on average. However, the practicality of this model is weakened by the relatively small size of the dataset, which causes overfitting unless fixing mechanisms like dropout layers are used; even then, the model performs accurately on the given data. Realizing the practical potential of this model would require a vast and suitable
Fig. 21 Model loss after fixing overfitting for dense neural networks
Fig. 22 Model accuracy before fixing overfitting for convolutional neural networks
data set. This can further be developed into a web application or a mobile application to improve its accessibility.
Fig. 23 Model accuracy after fixing overfitting for convolutional neural networks
Fig. 24 Model loss before fixing overfitting for convolutional neural networks
Fig. 25 Model loss after fixing overfitting for convolutional neural networks
References

1. Muhammad Y, Tahir M, Hayat M et al (2020) Early and accurate detection and diagnosis of heart disease using an intelligent computational model. Sci Rep 10:19747
2. Bharti S, Singh SN (2015) Analytical study of heart disease comparing with different algorithms. In: 2015 international conference on computing, communication & automation (ICCCA)
3. Parthiban G, Srivasta SK (2012) Applying machine learning methods in diagnosing heart disease for diabetic patients. Int J Appl Inf Syst (IJAIS) 3(7) (ISSN: 2249-0868, Foundation of Computer Science FCS, New York, USA)
4. Patel J, Upadhyay T, Patel S (2016) Heart disease prediction using machine learning and data mining technique. Int J Comput Sci Commun 7(1) (Sept 2015–March 2016)
5. Bhunia PK, Debnath A, Mondal P, Monalisa DE, Ganguly K, Rakshit P (2021) Heart disease prediction using machine learning. Int J Eng Res Technol (IJERT) NCETER 9(11)
6. Meyfroidt G, Guiza F, Ramon J, Bruynooghe M (2009) Machine learning techniques to examine large patient databases. Best Pract Res Clin Anaesthesiol 23(1) (Elsevier)
7. Kanchan BD, Kishore MM (2016) Study of machine learning algorithms for special disease prediction using principal component analysis. In: 2016 international conference on global trends in signal processing, information computing and communication (ICGTSPICC)
8. Cooper GF, Aliferis CF, Ambrosino R, Aronis J, Buchanan BG, Caruana R, Fine MJ, Glymour C, Gordon G, Hanusa BH, Janosky JE, Meek C, Mitchell T, Richardson T, Spirtes P (1997) An evaluation of machine-learning methods for predicting pneumonia mortality. Elsevier
9. Kukar M, Kononenko I, Groselj C, Kralj K, Fettich J (1999) Analysing and improving the diagnosis of ischaemic heart disease with machine learning, vol 23. Elsevier, Artificial Intelligence in Medicine
10. Das R, Turkoglu I, Sengur A (2009) Effective diagnosis of heart disease through neural networks ensembles. Expert Syst Appl 36(4):7675–7680
11. Williams R, Shongwe T, Hasan AN, Rameshar V (2021) Heart disease prediction using machine learning techniques. In: 2021 international conference on data analytics for business and industry (ICDABI), pp 118–123. https://doi.org/10.1109/ICDABI53623.2021.9655783
12. Olaniyi EO, Oyedotun OK, Adnan K (2015) Heart diseases diagnosis using neural networks arbitration. Int J Intell Syst Appl 7(12):72
13. Palaniappan S, Awang R (2008) Intelligent heart disease prediction system using data mining techniques. In: Proceedings ACS/IEEE international conference on computer systems and applications, March 2008, pp 108–115
14. Sharma V, Yadav S, Gupta M (2020) Heart disease prediction using machine learning techniques. In: 2020 2nd international conference on advances in computing, communication control and networking (ICACCCN), pp 177–181. https://doi.org/10.1109/ICACCCN51052.2020.9362842
15. Jabbar MA, Deekshatulu B, Chandra P (2013) Classification of heart disease using artificial neural network and feature subset selection. Global J Comput Sci Technol Neural Artif Intell 13(3):4–8
16. Janosi A, Steinbrunn W, Pfisterer M, Detrano R (1988) Heart disease. UCI machine learning repository
17. Jindal H et al (2021) IOP Conf Ser Mater Sci Eng 1022:012072
18. Khemphila A, Boonjing V (2011) Heart disease classification using neural network and feature selection. In: 2011 21st international conference on systems engineering, pp 406–409. https://doi.org/10.1109/ICSEng.2011.80
Chapter 18
Mathematical Approaches in the Study of Diabetes Mellitus S. V. K. R. Rajeswari and P. Vijayakumar
1 Introduction

Diabetes Mellitus (DM) is a pathological disorder spread worldwide. According to WHO statistics, DM ranked ninth among causes of death, with 1.5 million deaths in 2019; the same survey showed that the diabetic population increased from 108 million in 1980 to 422 million in 2014 [1]. Type-I DM, of juvenile onset, is due to a decrease in the concentration of Insulin, while Type-II DM, of maturity onset, is due to a decreased response of the peripheral tissues to Insulin. The third type, Type-III DM, is gestational diabetes, which arrives during pregnancy and leaves after delivery [2]. Consumed food is the source of glucose: complex carbohydrates (polysaccharides) are broken down into glucose. Glucose is measured in whole blood, plasma, or serum samples [3] and is distributed in plasma water and intracellular erythrocyte water [4]; the glucose concentration is 15% less in whole blood than in plasma or serum. For glucose regulation, Insulin and glucagon are the driving hormones [5]. The glucose concentration in blood is between 80 and 120 mg/dl [3]. Insulin is a protein hormone secreted by the β cells of the Pancreas. It helps maintain blood glucose levels by facilitating cellular glucose uptake and regulating carbohydrate, lipid, and protein metabolism, which nurtures cell development and growth [6]. Due to its impermeability, Insulin resides in the plasma volume of the whole blood [4]. Glucagon, generated by the α cells of the Pancreas, increases blood glucose concentration by decreasing glycogen synthesis, stimulating the breakdown of stored glycogen, and increasing gluconeogenesis in the liver [2].

S. V. K. R. Rajeswari · P. Vijayakumar (B) SRM Institute of Science and Technology, Kattankulathur 603203, Tamil Nadu, India e-mail: [email protected] S. V. K. R. Rajeswari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_18
Glucagon regulates plasma glucose and is secreted in reciprocation to Insulin, which stimulates catabolic glucose utilization [7]. Glucagon is also responsible for stimulating the secretion of Insulin. These cells of the Pancreas control glucose production, triacylglycerol deposition, and protein synthesis [8, 9]. Glucagon is higher in diabetic individuals [9]: it is released when glucose levels are low and helps raise blood glucose levels by converting stored glycogen back into glucose, while elevated glucose levels stimulate insulin release [10].
1.1 Need for a Mathematical Model (MM)

Mathematical models (MM) represent the glucose-insulin oscillations [2, 10], and the accuracy of these models determines the optimal therapy for diabetes [2]. Many mathematical models have been developed since 1960 [10]. A simple approximation model is used for clinical testing procedures and for identifying the parameters of interest from a small dataset [2, 11], while complex representations handle different aspects of glycemic control, including meals, Insulin and other hormones, and compartment distribution [2]. MM helps analyze the simulated performance of glucose-insulin reactions in normal and diabetic individuals [10] and can analyze the homeostatic control of the human body [2, 11].
1.2 Contribution of the Current Work

In the current work, we present different mathematical approaches for deriving glucose dynamics with respect to Insulin, glucose, and glucagon; the mathematical models depict the linear and nonlinear relationships between glucose, Insulin, and glucagon across the literature [10]. The human body, divided into organs, is treated as a set of compartments, and the Sorensen model describes the mathematical model for each compartment. The Hovorka model elaborates the MM of Type-I DM. DM has a strong association with obesity; pancreatic response and insulin sensitivity to glucose tolerance in lean versus obese individuals are explored in the Bergman model. The configuration of parameters that affect blood glucose and Insulin is developed by computer simulation, and a dynamic model is proposed by introducing a time delay into the differential equations. The study ends with the FDA-approved UVA/Padova simulator for the treatment of T1DM. Enhancements to the existing models and their limitations are discussed for each model considered, toward future research work.
2 Mathematical Models in Diabetes Mellitus

Mathematical models comprise different mechanisms to illustrate physiological diseases. Beyond the traditional methods described in theory, MM provides value in refining hypotheses, estimating parameters, determining sensitivity, simulating simple and complex phenomena, and providing predictions [11]. The MM models proposed for insulin sensitivity, glucagon, and pancreatic responsivity in the regulation of glucose in the body are discussed in this section.
2.1 Sorensen Model The MM proposed by Sorensen is a famous global six compartment mathematical model with 22 differential equations in 1985. It labels the compartment for each body organ. The model describes the time course of glucose concentration in different body organs, i.e., lungs, heart, brain, gut, liver, kidney, peripheral tissue and glucagon and Insulin on pancreatic secretion. An individual mathematical function was to analyze the metabolic rate for different stimuli on the clinical data. Physiological MM for each was produced mathematically [4]. The schematic block diagram of the Sorensen Glucose and Insulin model is illustrated in Figs. 1 and 2 [4]. The arrows represent the flow between the compartments. The glucose and Insulin model is derived from the brain, heart and lungs, gut, liver, kidney, and periphery. Mass balance equations to analyze chemical reactors and alternative processes in the body and Metabolic Services and sinks for analyzing biochemical processes are derived for the Glucagon model. (i) Glucose model equation In the Sorensen glucose model, an individual compartment for each organ is illustrated. Effect of glucose and its flow in all of the six compartments i.e., brain, heart and lungs, gut, liver, kidney, and the periphery is depicted in Fig. 1. (a) Brain VbvG
$$V_{bv}^{G}\frac{dG_{bv}}{dt} = Q_{b}^{G}(G_{hl} - G_{bv}) - \frac{V_{bn}}{T_{bn}}(G_{bv} - G_{bn}) \quad (1)$$

$$v_{bn}\frac{dG_{bn}}{dt} = \frac{V_{bn}}{T_{bn}}(G_{bv} - G_{bn}) - r_{bgun} \quad (2)$$
where $V_{bv}^{G}$ = volume of glucose in brain; $Q_{b}^{G}$ = vascular blood water flow rate in brain glucose (dl/min); $G_{hl}$ = glucose in heart and lungs; $G_{bv}$ = glucose in brain vascular space; $G_{bn}$ = glucose in brain interstitial fluid space; $v_{bn}$ = volume of interstitial fluid in brain; $T_{bn}$ = transcapillary diffusion time constant (min) in brain; $r_{bgun}$ = metabolic source or sink rate (mg/min) during brain glucose uptake.
Fig. 1 Schematic representation of Sorensen glucose model
Fig. 2 Schematic representation of Sorensen insulin model
(b) Heart and Lungs

$$V_{hl}^{G}\frac{dG_{hl}}{dt} = Q_{b}^{G}G_{bv} + Q_{l}^{G}G_{l} + Q_{k}^{G}G_{k} + Q_{p}^{G}G_{pv} - Q_{hl}^{G}G_{hl} - r_{rbcu} \quad (3)$$

where $V_{hl}^{G}$ = volume of glucose in heart and lungs; $Q_{l}$, $Q_{k}$, $Q_{p}$ = vascular blood water flow rates (dl/min) in liver, kidney and periphery; $G_{k}$ = glucose concentration in kidney; $G_{pv}$ = glucose in periphery vascular space; $r_{rbcu}$ = metabolic source or sink rate (mg/min) during red blood cell glucose uptake.

(c) Gut

$$V_{G}^{G}\frac{dG_{G}}{dt} = Q_{G}^{G}(G_{hl} - G_{G}) - r_{GGU} \quad (4)$$

where $r_{GGU}$ = metabolic source or sink rate (mg/min) during gut glucose utilization.

(d) Liver

$$V_{L}^{G}\frac{dG_{L}}{dt} = Q_{A}^{G}G_{hl} + Q_{G}^{G}G_{G} - Q_{l}^{G}G_{l} + r_{hgpl} - r_{hgul} \quad (5)$$

where $r_{hgpl}$ = metabolic source or sink rate (mg/min) during hepatic glucose production (a source term); $r_{hgul}$ = metabolic source or sink rate (mg/min) during hepatic glucose uptake.

(e) Kidney

$$V_{K}^{G}\frac{dG_{K}}{dt} = Q_{K}^{G}(G_{hl} - G_{K}) - r_{KGE} \quad (6)$$

where $V_{K}^{G}$ = volume of glucose in kidney; $Q_{K}^{G}$ = vascular blood water flow rate (dl/min) in kidney glucose; $G_{K}$ = glucose in kidney; $r_{KGE}$ = metabolic source or sink rate (mg/min) during kidney glucose excretion.

(f) Periphery

$$V_{PI}^{G}\frac{dG_{PI}}{dt} = \frac{V_{PI}}{T_{GP}}\left(G_{pv} - G_{pi}\right) - r_{PGU} \quad (7)$$

where $V_{PI}$ = volume of periphery interstitial fluid; $T_{GP}$ = transcapillary diffusion time constant (min) in periphery; $G_{pv}$ = glucose in periphery vascular space; $G_{pi}$ = glucose in periphery interstitial space; $r_{PGU}$ = peripheral glucose uptake rate.
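Before moving to the insulin model, the following minimal sketch shows how one such compartment can be simulated numerically. It integrates only the brain equations (1)-(2) and holds the heart/lungs concentration fixed instead of coupling it through Eq. (3); all parameter values are our assumptions for demonstration, not Sorensen's fitted constants.

```python
# Minimal sketch of the Sorensen brain-glucose compartment, Eqs. (1)-(2).
# All parameter values are illustrative assumptions, not fitted constants.
import numpy as np
from scipy.integrate import solve_ivp

Q_b = 5.9     # brain vascular blood-water flow rate (dl/min), assumed
V_bv = 3.5    # brain vascular volume (dl), assumed
V_bn = 4.5    # brain interstitial fluid volume (dl), assumed
T_bn = 2.1    # transcapillary diffusion time constant (min), assumed
r_bgu = 70.0  # brain glucose uptake (mg/min), assumed constant
G_hl = 100.0  # heart/lungs glucose concentration (mg/dl), held fixed here

def brain(t, y):
    G_bv, G_bn = y
    dG_bv = (Q_b * (G_hl - G_bv) - (V_bn / T_bn) * (G_bv - G_bn)) / V_bv  # Eq. (1)
    dG_bn = ((V_bn / T_bn) * (G_bv - G_bn) - r_bgu) / V_bn                # Eq. (2)
    return [dG_bv, dG_bn]

sol = solve_ivp(brain, (0, 60), [90.0, 85.0])
print(sol.y[:, -1])  # vascular and interstitial glucose after 60 min
```

The full model couples six such compartments through the shared heart/lungs state.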
(ii) Insulin model equation In the Sorensen insulin model, an individual compartment for each organ is illustrated. The effect of insulin and its flow in all six compartments, i.e., brain, heart and lungs, gut, liver, kidney, and periphery, is depicted in Fig. 2.

(a) Brain

$$V_{bn}^{i}\frac{dI_{bv}}{dt} = Q_{bn}^{i}(I_{hl} - I_{bn}) \quad (8)$$

where $V_{bn}^{i}$ = volume of insulin in brain; $Q_{bn}^{i}$ = vascular blood water flow rate (dl/min) in brain insulin; $I_{hl}$ = insulin in heart and lungs; $I_{bn}$ = insulin in brain.

(b) Heart and Lungs

$$V_{hl}^{i}\frac{di_{hl}}{dt} = Q_{bn}^{i}i_{bn} + Q_{l}^{i}i_{l} + Q_{k}^{i}i_{k} + Q_{p}^{i}i_{pv} - Q_{hl}^{i}i_{hl} \quad (9)$$

where $V_{hl}^{i}$ = volume of insulin in heart and lungs; $i_{bn}$ = insulin from brain; $i_{l}$ = insulin from liver; $i_{hl}$ = insulin in heart and lungs; $i_{k}$ = insulin from kidney; $i_{pv}$ = insulin in periphery vascular space.

(c) Gut

$$V_{G}^{i}\frac{di_{G}}{dt} = Q_{G}^{i}(i_{hl} - i_{G}) \quad (10)$$

where $V_{G}^{i}$ = volume of insulin in gut; $Q_{G}^{i}$ = vascular blood water flow rate (dl/min) in gut insulin; $i_{G}$ = insulin in gut.

(d) Liver

$$V_{L}^{i}\frac{di_{L}}{dt} = Q_{A}^{i}i_{hl} + Q_{G}^{i}i_{G} - Q_{l}^{i}i_{l} + r_{pirl} - r_{licl} \quad (11)$$

where $V_{L}^{i}$ = volume of insulin in liver; $Q_{A}^{i}$ = vascular blood water flow rate (dl/min) in hepatic artery insulin; $r_{pirl}$ = metabolic source or sink rate (mg/min) of pancreatic insulin release; $r_{licl}$ = metabolic source or sink rate (mg/min) of liver insulin clearance.

(e) Kidney

$$Y_{K}^{i}\frac{di_{K}}{dt} = Q_{K}^{i}(i_{hl} - i_{K}) - r_{KicK} \quad (12)$$
where $Y_{K}^{i}$ = vascular plasma space of kidney insulin; $r_{KicK}$ = metabolic source or sink rate (mg/min) of kidney insulin clearance.

(f) Periphery

$$V_{pv}^{G}\frac{dG_{pv}}{dt} = Q_{p}^{G}(G_{hl} - G_{pv}) - \frac{V_{pi}}{T_{Gi}}(G_{pv} - G_{pi}) \quad (13)$$

$$V_{PI}\frac{di_{pi}}{dt} = \frac{V_{PI}}{T_{Gi}}(i_{pv} - i_{pi}) - r_{PicP} \quad (14)$$

where $V_{pv}^{G}$ = volume of glucose in periphery vascular space; $Q_{p}^{G}$ = vascular blood water flow rate (dl/min) in peripheral glucose; $V_{PI}$ = volume of periphery insulin; $r_{PicP}$ = metabolic source or sink rate (mg/min) of peripheral insulin clearance.

(iii) Glucagon model equation

(a) Mass Balance

$$V\frac{dr}{dt} = r_{PrrG} - r_{PrcG} \quad (15)$$

where $r_{PrrG}$ = metabolic source or sink rate (mg/min) of pancreatic glucagon release; $r_{PrcG}$ = metabolic source or sink rate (mg/min) of plasma glucagon clearance.

(b) Metabolic Sources and Sinks

$$r_{PrcG} = r_{MrcM} \quad (16)$$

where $r_{MrcM}$ = metabolic source or sink rate (mg/min) of metabolic glucagon clearance.
2.1.1 Limitation of the Model
The model raises complexity, with 22 differential equations and around 135 parameters. The treatment of the pancreas, which is responsible for the secretion of insulin, is not satisfactory in the original model: it could have modeled the secretion of incretin hormones, which enable the appropriate secretion of insulin under a glucose load in the pancreas. The glucose-insulin dynamics of the pancreas are thus missing, which is a limitation. The oral administration of glucose and its appearance in plasma are also lacking; only intravenous administration of insulin is considered. The gut glucose absorption was adjusted to fit the observed glucose concentration. The model does not consider gastric emptying, post-glucose-consumption analysis or the insulin effect.
2.1.2 Improvements of the Sorensen Model
To overcome the limitation of the gastric emptying process, two subsystems were added to model the effect of insulin due to incretin hormones. This was achieved by adding two differential equations for the mass balance and time variation of blood glucose and insulin in the pancreas [12]. The work was enhanced by considering the effect of gastric emptying and of insulin due to incretin hormones after oral glucose intake [13]. A nonlinear system was developed in which disturbances such as meals and exercise are considered by utilizing compartmental theory [14]. Similar work considered food habits, observing the variation of insulin concentration against time; this model can be further developed by considering other parameters, such as the effect of exercise [15]. A simplified four-compartment model with liver, pancreas, and heart and lung tissues has been proposed, in which the Sampling Importance Resampling (SIR) method is implemented to identify the abnormalities present in Type-2 diabetic patients [16].
2.1.3 Advantage of the Sorensen Model and Future Work
Though the model is complex, it allows simulation of normal volunteers, T1DM, and T2DM. In the original work, insulin administration was considered through infusions; the work anticipates subcutaneous layers, inhalers or pumps for insulin administration. Owing to the many differential equations, computational complexity is high in many models. It can be decreased by considering a few compartments and developing differential equations for each individual compartment. This can be achieved by developing a nonlinear system with disturbances, considering parameters such as meal intake, physical activity, and drugs that take effect in real time. The model can be further verified by observing the difference between a diabetic individual and a healthy human.
2.2 Hovorka Model Hovorka proposed a nonlinear two-compartment model of glucose and insulin kinetics for Type-1 diabetes, intended for the development of an artificial pancreas. The artificial pancreas, which has a monitoring system, is responsible for monitoring glucose and releasing insulin from the insulin pump using the control algorithm. The model includes a glucoregulatory system with a glucose subsystem covering glucose absorption, distribution and disposal. Another subsystem is the insulin subsystem, which includes insulin absorption kinetics, distribution and disposal dynamics, together with an insulin action subsystem. A Bayesian parameter estimator is used as a controller to determine time-varying parameters. Nonlinear functions represent insulin-independent glucose fluxes and renal clearance [17]. The schematic block diagram of the Hovorka glucose-insulin subsystem is illustrated in Fig. 3 [17].
Fig. 3 Compartmental model of Hovorka glucose-insulin system
(i) Glucose subsystem equation

$$\frac{dq_a(t)}{dt} = -\left[\frac{g_{01}^{c}}{V_g\,g(t)} + x_a(t)\right]q_a(t) + k_{ab}q_b(t) - g_r + u_g(t) + egp_d\left[1 - x_c(t)\right] \quad (17)$$

$$\frac{dq_b(t)}{dt} = x_a(t)q_a(t) - \left[k_{ab} + x_b(t)\right]q_b(t), \qquad y(t) = g(t) = \frac{q_a(t)}{V_g} \quad (18)$$
where $q_a$, $q_b$ = masses of glucose in the accessible and non-accessible compartments; $g_{01}^{c}$ = non-insulin-dependent glucose flux; $k_{ab}$ = rate constant from the non-accessible to the accessible compartment; $g_r$ = renal glucose clearance; $y$, $g$ = measurable glucose concentration; $V_g$ = distribution volume of the accessible compartment; $egp_d$ = endogenous glucose production; $u_g(t)$ = administration of glucose.

(ii) Insulin subsystem equation

$$\frac{ds_a(t)}{dt} = u(t) - \frac{s_a(t)}{t_{max,i}} \quad (19)$$

$$\frac{ds_b(t)}{dt} = \frac{s_a(t)}{t_{max,i}} - \frac{s_b(t)}{t_{max,i}} \quad (20)$$
Plasma insulin concentration i(t),
$$\frac{di(t)}{dt} = \frac{u_i(t)}{V_I} - k_e i(t) \quad (21)$$

where $s_a$, $s_b$ = absorption of subcutaneously administered short-acting insulin; $u(t)$ = administration of insulin; $t_{max,i}$ = time to maximum insulin absorption; $k_e$ = fractional elimination rate; and

$$u_i(t) = \frac{s_b(t)}{t_{max,i}}$$
(iii) Insulin action subsystem equation

$$\frac{dx_{i1}}{dt} = -k_{ai1}x_{i1} + k_{bi1}I(t) \quad (22)$$

$$\frac{dx_{i2}}{dt} = -k_{ai2}x_{i2} + k_{bi2}I(t) \quad (23)$$

$$\frac{dx_{i3}}{dt} = -k_{ai3}x_{i3} + k_{bi3}I(t) \quad (24)$$
where $x_{i1}$, $x_{i2}$, $x_{i3}$ = effects of insulin on glucose distribution, glucose disposal and endogenous glucose production, respectively; $k_{ai}$, $k_{bi}$ ($i = 1, 2, 3$) = deactivation and activation rate constants of insulin action.
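A minimal simulation of the three subsystems, Eqs. (17)-(24), is sketched below for a constant insulin infusion. The renal clearance and meal input terms of Eq. (17) are omitted for brevity, and all parameter values are our assumptions for illustration, not the published constants.

```python
# Minimal sketch of the Hovorka subsystems, Eqs. (17)-(24), with a constant
# insulin infusion. Parameter values are illustrative assumptions only;
# renal clearance g_r and meal input u_g are omitted for brevity.
from scipy.integrate import solve_ivp

V_g, V_I = 10.5, 7.0            # glucose / insulin distribution volumes (l)
g01, k_ab, EGP0 = 0.6, 0.066, 1.0
k_e, t_max = 0.138, 55.0        # insulin elimination rate, absorption time (min)
k_a = (0.006, 0.06, 0.03)       # deactivation rates k_ai1..k_ai3, assumed
k_b = (3e-5, 6e-4, 3e-4)        # activation rates  k_bi1..k_bi3, assumed
u_ins = 0.1                     # constant insulin infusion (U/min), assumed

def hovorka(t, y):
    qa, qb, sa, sb, i, x1, x2, x3 = y
    g = qa / V_g
    dqa = -(g01 / (V_g * g) + x1) * qa + k_ab * qb + EGP0 * (1 - x3)  # Eq. (17)
    dqb = x1 * qa - (k_ab + x2) * qb                                  # Eq. (18)
    dsa = u_ins - sa / t_max                                          # Eq. (19)
    dsb = sa / t_max - sb / t_max                                     # Eq. (20)
    di = (sb / t_max) / V_I - k_e * i                                 # Eq. (21)
    dx = [-k_a[j] * (x1, x2, x3)[j] + k_b[j] * i for j in range(3)]   # (22)-(24)
    return [dqa, dqb, dsa, dsb, di, *dx]

y0 = [97.0, 19.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sol = solve_ivp(hovorka, (0, 300), y0, max_step=1.0)
print(sol.y[0, -1] / V_g)  # glucose concentration after 5 h
```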
2.2.1 Limitation of the Model
The model focuses on maintaining homeostasis in T1DM and does not consider samples from normal or T2DM individuals. The model handles time delays by implementing model predictive control (MPC); however, the speed of execution for the prediction is low.
2.2.2 Improvements of the Hovorka Model
Hovorka's model has been extended to capture the effect of meals and insulin inputs, through insulin absorption, on the blood glucose concentration. This model derives the relationship between insulin and blood glucose concentration. It is observed that when a meal is consumed, the blood glucose concentration remains high, as the pancreas is not able to produce insulin in sufficient amounts. For diabetic individuals, it is mathematically derived that the blood glucose level fluctuates depending on mealtime and insulin bolus. It is also shown that if insulin is given thirty minutes before meal consumption, the body will maintain a steady glucose concentration [18]. A linear closed-loop control system has been proposed for the deployment of the artificial pancreas. Infused insulin
is taken as input, and glucose concentration is taken as output. It was observed that feedback control is not possible for the artificial pancreas and that the system is observable but not controllable [19]. The Hovorka model has a limited representation of the interaction between the parameters and variables in the glucose-insulin dynamic system. This limitation is handled by a recent study that enhanced Hovorka's model: a system identification technique enabled the interaction of parameters that enhanced the existing model. The glucose subsystem model is developed by introducing insulin action variables into the subsystem. The plasma insulin concentration rises with time and then remains constant, and the blood glucose level is controlled better during a steady state [20]. Real-time scenarios can be considered with disturbances such as meal consumption, physical activity and drug intake, and their effect on glucose and insulin for Type-1 diabetic and non-diabetic individuals.
2.2.3 Advantage of the Hovorka Model and Future Work
The Hovorka model uses model predictive control (MPC), which enables control of the nonlinear system and can handle time delays. Hovorka's model is based on "closing the patient loop" in the advancement of automatic control laws. Compared with other models, it is more straightforward than the Sorensen, Bergman and UVA/Padova simulator models. A real-time model can be created for monitoring the regulation of glucose and insulin, with insulin as the input and glucose as the output. A nonlinear model that is both controllable and observable can be taken up as research work. An app to predict insulin and glucose regulation to maintain homeostasis in the body can also be taken as future work.
2.3 Bergman Model The Bergman model depicts the relationship between pancreatic response and insulin sensitivity to glucose tolerance in lean and obese individuals. Insulin responsiveness to glucose (ϕ1), the time course of plasma insulin (ϕ2) and insulin sensitivity (SI) are the three parameters employed to characterize pancreatic insulin release and distribution. Insulin resistance is observed in low-tolerance obese individuals [21]. Glucose disappearance and insulin kinetics in the Bergman model are depicted in Fig. 4 [21].

$$\frac{dI_{scb}(t)}{dt} = -\frac{1}{\tau_{1b}}I_{sc}(t) + \frac{1}{\tau_{1b}}\frac{ID(t)}{C_{1b}} \quad (25)$$
where $t$ = time; $I_{scb}$ = subcutaneous insulin concentration; $\tau_{1b}$ = time constant; $ID$ = insulin delivery; $C_{1b}$ = clearance rate.
Fig. 4 Bergman glucose-insulin system
$$\frac{dI_{pb}(t)}{dt} = -\frac{1}{\tau_{2b}}I_{p}(t) + \frac{1}{\tau_{2b}}I_{scb} \quad (26)$$
where $I_{pb}$ = plasma insulin concentration; $\tau_{2b}$ = time constant; $I_p$ = plasma insulin.

$$\frac{dI_{beff}(t)}{dt} = -p_{2b}I_{beff}(t) + p_{2b}S_{I}I_{p}(t) \quad (27)$$
where $I_{beff}$ = insulin effectiveness; $p_{2b}$ = insulin motility; $S_I$ = insulin sensitivity; $I_p$ = plasma insulin concentration.

$$\frac{dpg(t)}{dt} = -pg(t)\left(I_{beff} + G_{g}\right) + r_{2g}(t) + E_{g} \quad (28)$$
where $pg$ = plasma glucose concentration; $G_g$ = glucose effectiveness; $r_{2g}$ = glucose absorption from meals; $E_g$ = endogenous glucose production.
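The chain from insulin delivery to plasma glucose, Eqs. (25)-(28), can be simulated directly; the sketch below is our illustration with assumed parameter values (not Bergman's published estimates) and no meal input.

```python
# Minimal sketch of the Bergman-type chain, Eqs. (25)-(28).
# All parameter values are illustrative assumptions, not fitted values.
from scipy.integrate import solve_ivp

tau1, tau2, C1 = 55.0, 70.0, 20.1   # time constants (min) and clearance, assumed
p2, SI = 0.025, 7e-4                # insulin motility and sensitivity, assumed
Gg, Eg = 0.02, 1.4                  # glucose effectiveness, endogenous production
ID = 0.1                            # constant insulin delivery (U/min), assumed
r2g = lambda t: 0.0                 # meal absorption, zero in this sketch

def bergman(t, y):
    I_sc, I_p, I_eff, g = y
    dI_sc = -I_sc / tau1 + ID / (tau1 * C1)          # Eq. (25)
    dI_p = -I_p / tau2 + I_sc / tau2                 # Eq. (26)
    dI_eff = -p2 * I_eff + p2 * SI * I_p             # Eq. (27)
    dg = -g * (I_eff + Gg) + r2g(t) + Eg             # Eq. (28)
    return [dI_sc, dI_p, dI_eff, dg]

sol = solve_ivp(bergman, (0, 240), [0.0, 0.0, 0.0, 90.0], max_step=1.0)
print(sol.y[3, -1])  # plasma glucose after 4 h
```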
2.3.1 Limitation of the Model
Bergman's model does not consider other parametric disturbances such as meal intake, the effect of physical activity, or counter-regulatory hormones.
2.3.2 Improvements of the Bergman Model
The Bergman model has limitations for certain physiological parameters in real time. The work has been modified by introducing two negative feedback loops in pancreatic insulin release and glucose concentration [22]. The time delay between insulin generation and glucose supply in blood has also been considered [23]. In the former models, the nonlinear condition of the body is not considered. A closed-loop system that acts as an artificial pancreas was developed with an insulin pump from the existing Bergman model; it treats meal intake as a disturbance that is controlled by insulin infusion into the blood, and the derived equations are used to analyze the basal insulin infusion rate [24]. Recent work extends the Bergman model by adding an external energy input and pancreatic cells that help secrete more insulin during high glucose levels in the body. The case with β cells present and α cells absent is considered. It was observed that the nonlinearity increases the insulin action and the insulin-glucose concentration, and that the presence of β cells decreases glucose below the level observed when only α cells are absent. Insulin action and insulin concentration are found to be more effective in the presence of β cells. The smaller the kinetic values of the glucose parameters, i.e., insulin concentration, insulin sensitivity, and plasma insulin decay rate, the lower the glucose concentrations [25]. A closed-loop system designed to work as an artificial pancreas has also been presented, with control parameters tuned by considering internal model control (IMC) and a proportional derivative (PD) controller [26]. In another recent work, it was observed that insulin action and insulin concentration help reduce blood glucose [27].
2.3.3 Advantage of the Bergman Model and Future Work
Bergman's model considers glucose-insulin regulation in lean and obese individuals, which has not been handled in other models. Along with meal disturbances, other factors such as exercise can also be considered in research. The model can be further enhanced by considering normal individuals and noting the difference. A predictive algorithm for controlling calories, meal intake and physical activity on a particular day can also be developed.
2.4 Time Delay Model Introducing delays into differential models finds applications in the medical and biological fields [28]. The minimal model states that the intravenous glucose tolerance test (IVGTT) can be described with a minimum set of identifiable parameters. This famous 'minimal model' is challenged by the more sensible 'dynamic model.' One limitation of the minimal model is its parameter fitting: the insulin concentration and glucose concentration are fitted in separate steps, whereas in reality they should be considered
as a whole. The minimal model does not admit an equilibrium, and thus the mathematical results produced by the minimal model are not realistic and are unbounded. An auxiliary variable with a time delay, X(t), is introduced to counter these limitations, and the resulting model is called the 'dynamic model' [29]. The delay relates to the secretion of insulin from the pancreas: it is assumed that the insulin secretion is proportional to the average value of the glucose concentration in the $b_5$ minutes before time $t$.

$$\frac{dG_p(t)}{dt} = -b_1 G_p(t) - b_4 I_p(t)G_p(t) + b_7 \quad (29)$$
where $t$ = time; $G_p$ = plasma glucose concentration; $I_p$ = plasma insulin concentration; $b_1$ = first-order glucose concentration parameter; $b_4$ = insulin-dependent glucose disappearance rate; $b_7$ = increase in plasma glucose concentration.

$$G_p(t) = G_{bp} \;\; \forall\, t \in [-b_5, 0), \qquad G_p(0) = G_{bp} + b_0 \quad (30)$$
where $b_5$ = length of the past period of plasma glucose concentration considered; $G_{bp}$ = basal plasma glucose concentration.

$$\frac{dI_p(t)}{dt} = -b_2 I_p(t) + \frac{b_6}{b_5}\int_{t-b_5}^{t} G_p(s)\,ds, \qquad I(0) = I_b + b_3 b_0 \quad (31)$$
where $b_2$ = first-order disappearance rate constant for insulin; $b_3$ = first-phase insulin concentration parameter; $b_6$ = second-phase insulin release rate.
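Because Eq. (31) involves the glucose history over $[t - b_5, t]$, a standard ODE solver does not apply directly; a simple fixed-step scheme with a sliding history buffer suffices for illustration. The sketch below is our own, with assumed parameter values rather than fitted ones.

```python
# Minimal sketch of the delay model, Eqs. (29)-(31): forward Euler with a
# history buffer for the distributed-delay integral over [t-b5, t].
# All parameter values are illustrative assumptions.
import numpy as np

b1, b2, b3, b4 = 0.02, 0.1, 0.2, 1e-4
b5, b6, b7 = 15.0, 0.05, 1.0        # b5 = delay window length (min)
Gb, Ib, b0 = 90.0, 10.0, 200.0      # basal glucose/insulin, glucose bolus
dt = 0.1
lag = int(b5 / dt)                  # number of steps in the delay window

steps = int(180 / dt)
G = np.empty(steps + 1); I = np.empty(steps + 1)
hist = np.full(lag, Gb)             # G_p(t) = G_bp on [-b5, 0), Eq. (30)
G[0] = Gb + b0                      # G_p(0) = G_bp + b0
I[0] = Ib + b3 * b0                 # I(0) = I_b + b3*b0, Eq. (31)

for k in range(steps):
    avg_G = hist.mean()             # (1/b5) * integral of G over [t-b5, t]
    dG = -b1 * G[k] - b4 * I[k] * G[k] + b7          # Eq. (29)
    dI = -b2 * I[k] + b6 * avg_G                     # Eq. (31)
    G[k + 1] = G[k] + dt * dG
    I[k + 1] = I[k] + dt * dI
    hist = np.roll(hist, -1); hist[-1] = G[k + 1]    # slide the delay window

print(G[-1], I[-1])  # glucose and insulin after 180 min
```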
2.4.1 Limitation of the Model
Introducing time delays makes system analysis, controller design and state estimation challenging. The stability of the system, which is a limitation of the time delay approach, can be achieved by considering an interval time-varying delay. This model cannot be applied to real-time monitoring and analysis of glucose-insulin dynamics because of its instability compared with other models.
2.4.2 Improvements of the Time Delay Model
The assumptions made by the model are restrictive: in a unit of time, a unit of insulin can process a unit of glucose, which restricts the delay introduced
[30]. Further studies on introducing delay have been incorporated. In this model, the functions that admit an asymptotic steady state and the delay introduced are verified against alternate ways of introducing delays. In theory, unstable positive steady states produce oscillations, but not realistically. This study focused on obtaining the factors that yield an asymptotic steady state; it was observed that a positive steady state can become unstable when the delay is large [31]. There is a limitation in this model, as the solution comes into effect only 40 min after meal consumption, which may affect its overall real-time clinical application. A Hopf bifurcation is a condition in which system stability changes its state. When time delays T1 and T2 are introduced, it is observed that a Hopf bifurcation surfaces when the time delay crosses a threshold [28]. In an advanced study of Hopf bifurcation, introducing a time delay in insulin secretion with respect to blood glucose, and in the glucose drop due to increased insulin concentration, gives rise to complexities such as periodic oscillations consistent with biological findings, period-doubling cascades and chaotic states. These complexities help in predicting diabetes mellitus (DM) [32].
2.4.3 Advantage of the Time Delay Model and Future Work
Oscillations of insulin and glucose can be observed, and correlation analysis can be performed using the time delay. In future work, a self-regulatory analysis of the insulin-glucose model can be designed that provides a close approximation to observed data. Research can also focus on developing a time delay model with stability, positivity of solutions and boundedness of solutions.
2.5 UVA/Padova Simulator The UVA/Padova simulator is FDA-approved and available in Python for the analysis of treatment protocols for T1DM. The first study to propose its glucose and insulin kinetics observed that when glucose decreases beyond a limit, insulin-dependent utilization increases nonlinearly. The simulator is validated using real-time diabetic patient data. Complying with the clinical definitions, the insulin-to-carbohydrate ratio (CR) and the correction factor (CF) are implemented. Along with the measurement of impact on diabetes management and prediction, it provides a framework for in silico trials to test glucose sensors, insulin augmentation pump prediction, and closed-loop single- or dual-hormone controller designs [33]. The parameters of the MM proposed by UVA/Padova are described below.

(i) Glucagon secretion and kinetics equation:

$$\dot{H}_p(t) = -cH_p(t) + GS(t), \qquad H_p(0) = H_{pb} \quad (32)$$

where
$t$ = time; $H_p(t)$ = plasma hormone concentration; $c$ = clearance rate; $GS(t)$ = glucagon secretion; $H_{pb}$ = basal value.

(ii) Glucagon action equation:

$$EGP(t) = r_{p1} - r_{p2}P_g(t) - r_{p3}X_{id}(t) + kX_{gd}(t) \quad (33)$$
where $EGP(t)$ = endogenous glucose production; $r_{p1}$, $r_{p2}$, $r_{p3}$ = rate parameters; $P_g(t)$ = glucose in plasma; $X_{id}(t)$ = delayed insulin action; $k$ = liver responsivity to glucagon; $X_{gd}(t)$ = delayed glucose action.

(iii) Glucose utilization in hypoglycemia equation:

$$\dot{G}_p(t) = -\left[SG_p + X_p(t)(1 + r_1\,\mathrm{risk})\right]G_p(t) + SG_p G_b + \frac{GXI(t)}{V}, \quad G(0) = G_b, \quad \text{if } G_p < G_v \quad (34, 35)$$

$$\dot{G}_p(t) = -\left[SG_p + X_p(t)\right]G_p(t) + SG_p G_b + \frac{GXI(t)}{V}, \quad G(0) = G_b \quad \text{otherwise} \quad (36, 37)$$

$$\dot{X}_p(t) = -p_2 X_p(t) + p_2 SI_p\left[I(t) - I_b\right], \quad X_p(0) = 0 \quad (38)$$
where $G_p(t)$ = plasma glucose concentration; $SG_p$, $SI_p$, $p_2$ = model parameters; $X_p(t)$ = insulin action; $G_b$ = basal glucose; $GXI(t)/V$ = exogenous glucose infusion rate.

(iv) Subcutaneous glucagon transport equation

$$\dot{H}_{sg1}(t) = -\left(k_{gh1} + k_{gh2}\right)H_{sg1}(t) + H_{infg}(t) \quad (39)$$

$$H_{sg1}(0) = H_{sg1b} \quad (40)$$

$$\dot{H}_{sg2}(t) = k_{gh1}H_{sg1}(t) - k_{gh3}H_{sg2}(t) \quad (41)$$

$$H_{sg2}(0) = H_{sg2b} \quad (42)$$

$$Ra_H(t) = k_{gh3}H_{sg2}(t) \quad (43)$$
where $H_{sg1}(t)$, $H_{sg2}(t)$ = glucagon concentrations in the subcutaneous space; $k_{gh1}$, $k_{gh2}$, $k_{gh3}$ = rate parameters; $H_{infg}(t)$ = glucagon infusion rate.

(v) Carbohydrate Ratio:

$$CR = \frac{\text{Carbohydrate}}{\text{Optimal bolus}} \quad (44)$$

(vi) Correction Factor:

$$CF = \frac{1700}{\text{Total daily insulin}} \quad (45)$$
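As a quick illustration of Eqs. (44) and (45), the sketch below computes both clinical quantities; the function names and example values are ours, not part of the simulator.

```python
# Clinical dosing quantities from Eqs. (44)-(45).
# Function names and example values are illustrative, not from the simulator.
def carbohydrate_ratio(carbohydrate_g: float, optimal_bolus_u: float) -> float:
    """Grams of carbohydrate covered by one unit of insulin, Eq. (44)."""
    return carbohydrate_g / optimal_bolus_u

def correction_factor(total_daily_insulin_u: float) -> float:
    """Expected glucose drop per unit of insulin, Eq. (45)."""
    return 1700.0 / total_daily_insulin_u

print(carbohydrate_ratio(60.0, 6.0))   # 10 g of carbohydrate per unit
print(correction_factor(50.0))         # 34 mg/dl drop per unit
```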
2.5.1 Limitation of the Model
The UVA/Padova simulator is designed mainly for T1DM and lacks a representation of the insulin secretion mechanism. Hence the model cannot be used for simulating normal human volunteers or T2DM individuals. Compared with other models used in developing mobile applications for glucose monitoring, this model is valid only for T1DM.
2.5.2 Improvements of the UVA/Padova Model
The study was revised with modifications in a newer version. In the updated model, the size of meals was taken into account in the gastric emptying rate of the oral absorption model. For glucagon secretion, a continuous state-space model was also proposed. The FDA approved the model after it was applied to real patient data, where the error obtained was less than 2.5% [33]. The simulator was upgraded by modeling the intraday variability of insulin sensitivity (S1). Insulin infusion and the insulin-to-carbohydrate ratio are measured with time-varying distributions, and subcutaneous insulin delivery is complemented with both intradermal and inhaled insulin pharmacokinetics [34]. An upgraded study determined the value of the mealtime difference duration: linear-effect parameters, i.e., carbohydrate, initial blood glucose and mealtime difference, are considered for hyperglycemia or hypoglycemia. The algorithm also proposed an open-loop method in which the parameter affecting hypoglycemia was detected and prevented, and hyperglycemia was reduced [35]. The carbohydrate intake derived from a meal can vary from patient to patient, i.e., with gender, age, weight, meal preferences, body type and the amount of meal consumed. The simulator provides a traditional method for research, but it can produce under- or over-performing results, which may lead to erroneous treatment. A recent study proposed a realistic approach to generating glycemic outcomes in which the parameters are optimized by considering disturbances, i.e., exercise [36]. The limitation of this approach is that it does not consider other parameters affecting blood glucose levels. Future work considering stress, chronic complications, and gestational diabetes should
be undertaken to enhance the model in real-time scenarios. Future work may also include designing a simulator for T2DM by considering the factors affecting blood glucose levels and real-time disturbances.
2.5.3 Advantage of the UVA/Padova Model and Future Work
This model is approved by the FDA for T1DM. Existing work can be enhanced by including stress, physical activity and meal intake as disturbances. A mobile application that can suggest meal intake and calorie reduction on a particular day can be designed by considering the former factors. The influence of intraperitoneal insulin kinetics can be considered in the model for future work in T1DM.
3 Conclusion Differential equations help in modeling the behavior of complex systems. This paper has presented existing studies on different approaches to mathematical models and their adaptations. The six-compartment model of Sorensen and its advanced studies, including gastric emptying, are presented, and improved studies of the Sorensen model that include nonlinear parameters are discussed. The two-compartment Hovorka model is thoroughly presented along with advanced studies that consider nonlinearities. Bergman models, which are formulated on pancreatic response and insulin sensitivity, are extensively presented with respect to time delay and parameter disturbances. Insulin and glucose dynamics are derived by introducing a time-delay dynamic model. The FDA-approved UVA/Padova simulator for T1DM is studied. Every model is discussed with its advantages and limitations along with future work. Researchers may consider the limitations discussed in this paper as future work for the treatment of DM.
References
1. World Health Organization (2021) Diabetes, November 10. https://www.who.int/news-room/fact-sheets/detail/diabetes
2. Galasko G (2015) Insulin, oral hypoglycemics, and glucagon. https://doi.org/10.1016/B978-0-323-39307-2.00031-X
3. McMillin JM (1990) Blood glucose. In: Walker HK, Hall WD, Hurst JW (eds) Clinical methods: the history, physical, and laboratory examinations, 3rd edn. Butterworths, Boston, Chapter 141. https://www.ncbi.nlm.nih.gov/books/NBK248/
4. Thomas SJ (1985) A physiologic model of glucose metabolism in man and its use to design and assess improved insulin therapies for diabetes. PhD thesis, Massachusetts Institute of Technology, Cambridge
5. Marie Ntaganda J et al (2018) Simplified mathematical model of glucose-insulin system. Am J Comput Math 8(3):233–244. https://doi.org/10.4236/ajcm.2018.83019
6. Wilcox G (2005) Insulin and insulin resistance. Clin Biochem Rev 26(2):19–39
7. Kelly RA et al (2019) Modelling the effects of glucagon during glucose tolerance testing. Theor Biol Med Model 16(1). https://doi.org/10.1186/s12976-019-0115-3
8. Kalra S, Gupta Y (2016) The insulin:glucagon ratio and the choice of glucose-lowering drugs. Diab Ther Res Treat Educ Diab Relat Disord 7(1):1–9. https://doi.org/10.1007/s13300-016-0160-4
9. Samols E et al (1966) Interrelationship of glucagon, insulin and glucose: the insulinogenic effect of glucagon. Diabetes 15(12):855–866. https://doi.org/10.2337/diab.15.12.855
10. Hussain J, Zadeng D (2014) A mathematical model of glucose-insulin interaction. https://www.semanticscholar.org/paper/A-mathematical-model-of-glucose-insulin-interaction-Hussain-Zadeng/541aefa67e17368d3cba3ce4550d3d56da07d976
11. Boutayeb A, Chetouani A (2006) A critical review of mathematical models and data used in diabetology. BioMed Eng OnLine 5(1). https://doi.org/10.1186/1475-925x-5-43
12. Alvehag K, Martin C (2006) The feedback control of glucose: on the road to type II diabetes. IEEE Xplore. https://ieeexplore.ieee.org/document/4177084
13. López-Palau NE, Olais-Govea JM (2020) Mathematical model of blood glucose dynamics by emulating the pathophysiology of glucose metabolism in type 2 diabetes mellitus. Sci Rep 10(1):12697. https://doi.org/10.1038/s41598-020-69629-0
14. Sudhakar S (2018) Mathematical model using MATLAB tool for glucose-insulin regulatory system of diabetes mellitus
15. Banzi W, Kambutse I, Dusabejambo V, Rutaganda E, Minani F, Niyobuhungiro J, Ntaganda JM (2021) Mathematical modelling of glucose-insulin system and test of abnormalities of type 2 diabetic patients. Int J Math Math Sci 2021:e6660177. https://doi.org/10.1155/2021/6660177
16. Banzi W et al (2021) Mathematical modelling of glucose-insulin system and test of abnormalities of type 2 diabetic patients. Int J Math Math Sci 2021:e6660177. https://www.hindawi.com/journals/ijmms/2021/6660177/, https://doi.org/10.1155/2021/6660177
17. Hovorka R, Canonico V, Chassin LJ, Haueter U, Massi-Benedetti M, Orsini Federici M, Pieber TR, Schaller HC, Schaupp L, Vering T, Wilinska ME (2004) Nonlinear model predictive control of glucose concentration in subjects with type 1 diabetes. Physiol Meas 25(4):905–920. https://doi.org/10.1088/0967-3334/25/4/010
18. Daud NaM et al (2015) Meal simulation in glucose-insulin reaction analysis using Hovorka model towards system-on-chip implementation. https://www.semanticscholar.org/paper/Meal-simulation-in-glucose-insulin-reaction-using-Daud-Mahmud/83467ee1a93dc09cf0d6f79fa577cabbee5bbe87
19. Saleem M et al (2016) A linear control of Hovorka model. Sci Int (Lahore) 28(1):15–18. https://www.sci-int.com/pdf/636911094643734086.pdf
20. Binti Mohd Yusof NF et al (2015) Simulation work for the control of blood glucose level in type 1 diabetes using Hovorka equations. Adv Mater Res 1113:739–744. https://doi.org/10.4028/www.scientific.net/amr.1113.739
21. Bergman RN et al (1981) Physiologic evaluation of factors controlling glucose tolerance in man: measurement of insulin sensitivity and beta-cell glucose sensitivity from the response to intravenous glucose. J Clin Investig 68(6):1456–1467. https://doi.org/10.1172/jci110398
22. Sturis J et al (1991) Computer model for mechanisms underlying ultradian oscillations of insulin and glucose. Am J Physiol 260(5 Pt 1):E801–E809. https://doi.org/10.1152/ajpendo.1991.260.5.E801
23. Engelborghs K et al (2001) Numerical bifurcation analysis of delay differential equations arising from physiological modeling. J Math Biol 42(4):361–385. https://doi.org/10.1007/s002850000072
24. Sivaramakrishnan N, Lakshmanaprabu SK, Muvvala MV (2017) Optimal model based control for blood glucose insulin system using continuous glucose monitoring. J Pharm Sci Res 9(4):465
25. Urbina G et al (2020) Mathematical modeling of nonlinear blood glucose-insulin dynamics with beta cells effect. Appl Appl Math Int J (AAM) 15(1). https://digitalcommons.pvamu.edu/aam/vol15/iss1/10/
26. Nani F, Jin M (2015) Mathematical modeling and simulations of the pathophysiology of type-2 diabetes mellitus. Math and computer science faculty working papers, 1 Oct. https://digitalcommons.uncfsu.edu/macsc_wp/25/
27. Satama-Bermeo G et al (2021) Simulation and comparison of glucose-insulin models for type 1 diabetes virtual patient. IEEE Xplore. https://ieeexplore.ieee.org/document/9590773
28. Alzahrani S (2020) A glucose-insulin model with two time delays. Int J Differ Equ 15(1):1–10. http://ripublication.com/ijde20/v15n1p01.pdf, https://doi.org/10.37622/ijde/15.1.2020.1-10
29. De Gaetano A, Arino O (2000) Mathematical modelling of the intravenous glucose tolerance test. J Math Biol 40(2):136–168. https://doi.org/10.1007/s002850050007
30. Li J et al (2001) Analysis of IVGTT glucose-insulin interaction models with time delay. Discrete Continuous Dyn Syst B 1(1):103–124. https://doi.org/10.3934/dcdsb.2001.1.103
31. Man CD et al (2014) The UVA/PADOVA type 1 diabetes simulator: new features. J Diab Sci Technol 8(1):26–34. https://doi.org/10.1177/1932296813514502
32. Al-Hussein A-BA et al (2020) Hopf bifurcation and chaos in time-delay model of glucose-insulin regulatory system. Chaos Solitons Fractals 137(C). https://ideas.repec.org/a/eee/chsofr/v137y2020ics0960077920302459.html
33. Molano-Jimenez A, León-Vargas F (2017) UVa/Padova T1DMS dynamic model revision: for embedded model control. In: 2017 IEEE 3rd Colombian conference on automatic control (CCAC), pp 1–6
34. Visentin R et al (2018) The UVA/Padova type 1 diabetes simulator goes from single meal to single day. J Diab Sci Technol 12(2):273–281. https://doi.org/10.1177/1932296818757747
35. Çankaya N, Aydoğdu Ö (2020) Three parameter control algorithm for obtaining ideal postprandial blood glucose in type 1 diabetes mellitus. IEEE Access 8:152305–152315. https://doi.org/10.1109/ACCESS.2020.3015454
36. Ahmad S et al (2021) Generation of virtual patient populations that represent real type 1 diabetes cohorts. Mathematics 9(11):1200. https://doi.org/10.3390/math9111200
Chapter 19
Path Planning of Autonomous Underwater Vehicle Under Malicious Node Effect in Underwater Acoustic Sensor Networks Arnav Hari, Prateek, and Rajeev Arya
1 Introduction Underwater exploration has been made possible by the advent of remotely operated vehicles (ROVs) [1], also called autonomous underwater vehicles (AUVs). In recent times, AUVs have enabled the support and maintenance of underwater wireless sensor networks. However, water, being fluid in nature, undergoes constant movement in water bodies such as seas, oceans, lakes, and rivers. This leads to several natural disturbances such as water currents, wind, oceanic waves, etc. Man-made disturbances such as maritime shipping routes and underwater submarines further introduce noise and disturbances which affect the normal functioning of AUVs. AUVs are used as a tool for collecting data and forwarding it, but the low speed of an AUV prevents it from communicating data on time [2]. A hybrid data collection scheme (HDCS) has been proposed to balance the time-delay requirement of a UWSN collecting data from deployed AUVs. The AUVs may face various issues during deployment and sensor network formation, which must be studied and remedied accordingly, e.g., by using a command-filtered backstepping approach with a trajectory tracking error tolerance band [3]. One such issue is the problem of malicious behavior of AUVs due to deflection of their trajectory by sea waves. The context of the current work encapsulates path planning of AUVs under the malicious node effect in underwater acoustic sensor networks (UASNs).

Notations
R²: Two-dimensional space
V: Vector field
A. Hari · Prateek (B) · R. Arya National Institute of Technology Patna, Ashok Rajpath, Patna 800005, Bihar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_19
Vn: Tangent vector
p1, p2, ..., pn: Boundary points of the region
c1, ..., cm: Collection of communication ranges
sn1, ..., snn: Collection of sensing ranges
A: Area of the specified region
Sn1, ..., Sn8: Areas of the sensing circles
V(x): Sound speed profile
S1, ..., Sn: Sensor nodes
H: Hilbert scale
λ: Eigenvalue
N: Outward normal vector
Ω: Set of a collection of points
∑: Set of circles
1.1 Related Work Prior works in UWSN path planning have dealt with some common issues, such as the role of AUVs in UWSNs [4] and the effect of ocean disturbances in the hydrodynamic modeling of AUVs. Underactuated AUVs with input quantization have been treated with an adaptive prescribed performance tracking control [5]. Underactuated AUVs have also been treated in the presence of bounded disturbances through a geometric control mechanism [6]. Acoustic communication in an underwater environment is subject to acoustic echo channels and sparse channel behavior. A sparse sensing and estimation technique [7] based on a soft-parameter-functioned set-membership normalized least mean square algorithm alleviates the issues of multipath and channel echo. AUVs are faced with the problem of underwater glider inaccuracy due to water current uncertainty. The development of a feedback control strategy [8] to compensate for the positional, directional, intensity, and action depth inaccuracies ensures motion accuracy for the AUVs. Path planning in UWSNs must remain collision-free. A pseudospectral method in a static environment allows for a two-staged cooperative path planner [9] in a dynamic multi-AUV environment. Collisions with dynamic obstacles are evaded by the development of online path re-planners. Obstacles between control nodes on offline paths pose a hurdle to a multi-AUV path, degrading the localization accuracy of UWSN targets. The insertion of adaptive knots eliminates such hurdles, enabling cooperative path planning in underwater scenarios. Malicious behavior due to umbilical lag detracts from the maneuverability of the AUV. A set of nonlinear Kuhn–Tucker (KT) equations has been proposed to counter the lag in the response model [10]. AUV stabilization requires online identification of
the time delay in the AUV, as reported in [11]. Underactuated AUVs are susceptible to ocean drifts. A backstepping method, when combined with a Lyapunov-based analysis, overcomes the dynamic model errors of the underactuated AUV characteristics [12]. A different backstepping approach is taken in a Lagrange system for computing the source of error [13], where a second moment process is considered for performance evaluation of global noise-to-state stability. Lagrange systems once again come into the field of view when researchers try to develop a structure-independent framework for uncertainty prediction [14]. On a different tangent, a nonlinear Lyapunov-based potential field approach has been proposed for the correction of erroneous AUV trajectories [15]; the authors consider the drag of the fluid flow to avoid obstacles in the path of the AUV. Nonlinearity has also been used to treat uncertain hydrodynamic coefficients for a marine surface ship. It gives an insight into the planar motion mechanism of AUVs which would otherwise be impossible without some sort of malice compensation [16]. Malicious behavior [17] of AUVs may arise due to the added mass and inertia of an AUV. When subject to hydrodynamic damping, buoyancy, or propeller forces, the conventional path of AUVs distorts considerably. By developing a quaternion-based line-of-sight (LOS) guidance algorithm for AUV motion, a dynamic AUV model is formulated in [18]; additionally, AUV stability is addressed using a Baumgarte–Lagrange-based method. In a similar approach, wind, wave, and ocean current effects have been identified in the path control of an underactuated AUV [19]. However, instead of an LOS algorithm, the authors propose a second-order, open-loop dynamic model with error control using a neural network for efficient target tracking of an underwater WSN system. Malicious AUV nodes could deviate from the designated path and velocity, leading to errors in launching attitude and control parameters. A grouping-based dynamic trajectory plan [20] provides direction to the AUV node to collect data. One approach suggested in the literature, based on analyzing sensitivity, is the optimization of the response surface [21]; another uses Sobol's method [22]. Node malice can further be mitigated by compensating the effect of gravity and correcting the trajectory using an integral sliding mode method [23]. A method based on a generalized Voronoi diagram has been proposed to reduce redundancy in the obstacle-free region [24]. In [25], the novel system is proved to be able to plan a collision-free path, and the computing efficiency of path planning is also improved. A bio-inspired self-organizing method is proposed in [26], in which each node scans only its nearby environment and identifies obstacles, generating new sensing and communication regions, which reduces the obstacle effect on the network. This method formulates an averaged sub-gradient to address the extremum seeking problem of the AUVs.
1.2 Major Contributions On the basis of the literature survey, three major problems of underwater sensor networks are identified as follows:
• Appearance of malicious effects which disturb the UWSN topology. • Effect of malicious AUVs on their trajectory, which reduces the coverage area and the coverage accuracy. • Improper path planning due to the lack of robust malicious node compensation techniques in underwater scenarios. Underwater path planning in a specified region is difficult because various factors such as water movement displace the deployed nodes underwater. This paper addresses the identified problems by proposing an algorithm based on the partition of a large underwater region into small specified regions, which reduces the energy consumption of the sensor nodes and increases the lifetime of the AUVs. The key contributions of this paper are as follows: • An area-based specified region for localization is used by implementing a Voronoi partition for the decomposition of the whole region into small specified regions. • For the AUV nodes deployed outside the specified region, a path planning model called the Voronoi Area Path Planning (VAP2) algorithm is presented to move the AUV nodes into the domain of the specified region. • A boundary condition is set on the specified region so that the AUV nodes recently entered under the VAP2 algorithm do not stray outside the specified region. • An illustrative scenario is generated by incorporating the identified problems. An area coverage model is proposed which is based on the principle of the Voronoi partition. The location of the AUV node is estimated and the localization error is computed. The performance comparison and discussion of results verify the effectiveness of the proposed algorithm. The rest of the paper is organized as follows: Sect. 2 discusses the path planning model of the scenario. Section 3 proposes a Voronoi area path planning (VAP2) approach to compensate for malicious AUV nodes in underwater acoustic sensor networks. Simulation results and discussions are mentioned in Sect. 4. Findings are concluded in Sect. 5.
2 Methods 2.1 Path Planning Model The main motive of the proposed method is to obtain a short and safe path from the initial point of an AUV node to its final position; hence a path planning model is presented to represent the AUV deployment scenario. Let S denote a set of n AUV nodes {S1, S2, ..., Sn}. Let us assume that the AUV nodes know the boundaries of the environment and also have information about their initial position, but not the final position, in the marine environment. Every AUV node can communicate its path to another AUV if and only if it is within a given acoustic range. We assume that
• for each AUV, the acoustic range (for communication) is greater than the sensing range (for information);
• the AUV operates within a convex polygonal underwater environment denoted by E ∈ R².

The objective of each AUV node is to follow a maximally informative path, minimizing redundancy in information collection subject to containing its path length within a given boundary:

$$\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} dx \\ dy \end{pmatrix} \quad (1)$$
where X, Y denote the final position of the AUV node; x, y are the initial position of the AUV node; and dx and dy are the displacements in the position of the AUV node.

Definition: Define $S^n$ as the region generated by the points $\{p_1, p_2, \ldots, p_n\}$, $X_s^n$ by $H^1(0, s; S^n)$, and $X_{s,0}^n = \left\{k \in H^1(0, s; S^n) : k|_{\Gamma_s} = 0\right\}$. Find $X_s^n$ and $X_{s,0}^n$ from the following equation:

$$\|k\|_n^2 = \int_0^s \sum_{i=1}^n \left( \lambda_i k_i^2(x) + \left|\frac{dv_i(x)}{dx}\right|^2 \right) dx \quad (2)$$

for all

$$k(x, y) = \sum_{i=1}^n k_i(x)\, p_i(y) \in X_s^n \quad (3)$$
where H is the Hilbert scale, p is the number of points, s is the sub-region of space, and λ is the eigenvalue. After finding the region for AUV node deployment, the boundary condition is applied. Let $\Omega_0$ and $\Omega_1$ be open sets in $R^2$ whose boundaries $\Gamma_0$ and $\Gamma_1$ are of class $C^1$, with $\Omega_0 \subset \Omega_1$. Let $\{\Gamma_s\}_{s=0}^{s=1}$ be a family of lines of class $C^1$, with $C^1$ depending upon $s$, where $\Gamma_s$ is the set of boundaries of the open sets of the family $\Omega_s$ with

$$\Omega_s \subset \Omega_{s^1} \quad (4)$$

for $0 \le s < s^1 \le 1$. After applying the boundary condition, the recently entered AUV nodes due to the path planning model do not stray outside the specified region.
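A minimal sketch of how the displacement update of Eq. (1) and the boundary condition can be enforced in practice is given below; the rectangular bounds and the clamping strategy are our simplifying assumptions for illustration, not part of the original formulation.

```python
# Sketch of the position update, Eq. (1), with a clamped boundary condition.
# The rectangular environment bounds are assumptions for illustration.
import numpy as np

def update_position(pos, disp, lo=(0.0, 0.0), hi=(30.0, 25.0)):
    """(X, Y) = (x, y) + (dx, dy), then confine the node to the region."""
    new = np.asarray(pos, dtype=float) + np.asarray(disp, dtype=float)
    return np.clip(new, lo, hi)

print(update_position((29.0, 3.0), (2.5, -4.0)))  # -> [30. 0.], held at boundary
```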
3 Algorithmic Formulation
Pseudo Code for VAP2 (Voronoi Area Path Planning) Algorithm
1. Initialization: initialize R, Rc, K, dt, interlim, thr, node count, point count
2. Bounded Voronoi: find how many points are in communication range other than its centroid: for i = 1 : point count
3. Calculate the planes for each other point: for i = 1 : point count
4. Calculate the intersection of all planes for each node: for i = 2 : other point count
5. Limit the Voronoi cells to the specified environment
6. Polygon: check whether the input sizes are the same; find the number of vertices; shift the data to the mean of the vertices: x = x − x_mean, y = y − y_mean
7. Find δx and δy; sum the CW boundary interval; check the CCW and CW boundaries
8. Find the centroidal moments: j = i_uu + i_vv
9. Find the principal moments and orientation: i1, i2, ang1, ang2
10. Control Law: find the centroid of all the intersecting sensing areas; find the distance of the node from the centroid: d = (Cn, Sn)
11. Find the output velocity v_out = G × d for gain G
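The steps above map naturally onto a Lloyd-style iteration toward cell centroids. The following minimal runnable sketch of the control law in steps 10-11 is our illustration, not the authors' implementation: bounded Voronoi cells are approximated by nearest-node assignment over a sampled grid, and the gain, time step and environment size are assumed values.

```python
# Minimal sketch of the VAP2 centroid control law (not the authors' code):
# each AUV moves toward the centroid of its bounded Voronoi cell, v_out = G*d.
# Voronoi cells are approximated by nearest-node assignment on a sampled grid.
import numpy as np

rng = np.random.default_rng(1)
nodes = rng.uniform([0, 0], [30, 25], size=(8, 2))   # 8 AUVs in a 30 m x 25 m area
G = 0.5                                              # control gain, assumed
dt = 1.0                                             # time step (s), assumed

# Sample the bounded environment; each sample belongs to its nearest node.
xs, ys = np.meshgrid(np.linspace(0, 30, 120), np.linspace(0, 25, 100))
grid = np.column_stack([xs.ravel(), ys.ravel()])

for step in range(50):
    owner = np.argmin(
        ((grid[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2), axis=1)
    for i in range(len(nodes)):
        cell = grid[owner == i]
        if len(cell) == 0:
            continue
        centroid = cell.mean(axis=0)          # centroid of bounded Voronoi cell
        d = centroid - nodes[i]               # distance vector to centroid
        nodes[i] += dt * G * d                # v_out = G * d (control law)
    nodes = np.clip(nodes, [0, 0], [30, 25])  # boundary condition: stay inside

print(np.round(nodes, 2))  # near-uniform coverage positions after 50 steps
```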
The flowchart of the proposed Voronoi area path planning (VAP2) algorithm is shown in Fig. 1. Initially, the AUV node is deployed underwater, and the deployment is checked for whether it belongs to the specified region or not. If the AUV strays outside the specified region, then a path model is used to redirect the AUV inside the specified region. Next, two types of range are addressed, namely, the sensing range and the communication range. In the case of the sensing range, one must check whether the whole specified area is used for sensing or not; if only a part of some region is used, it leads to the formation of what is known as a hole. To reduce the extent of holes, the proposed algorithm is used. In the case of the communication range, one must check the position of the AUV node; if the AUV node lies on the boundary of the sensing range, then the VAP2 algorithm is recommended to reduce the communication error. In the last block of the flowchart, path planning is achieved by providing the path to the AUV nodes to confine them to a bounded area. For the path model, a control law is used, whereas, for the bounded area, range approximation and plane formation mathematics are used.
4 Performance and Discussion In this section, simulation results are used to verify the proposed algorithm. The platform used is MATLAB. In the existing deployment method, all the AUV nodes are active irrespective of their initial position being inside or outside the specified acoustic area boundary. Conventionally, the smaller the number of AUV nodes at the boundary of the specified area, the higher the coverage accuracy of the area.

4.1 Node Deployment AUV nodes are deployed in specified underwater acoustic areas with the help of submarines or ships. Due to the movement of water, some of the AUV nodes stay outside the specified area. A scenario is illustrated in Fig. 2 to represent the intended UWSN.
4.2 UWSN Coverage Initially, the sensing region does not cover the whole area because some of the AUV nodes are outside the specified acoustic environment. Some coverage holes remain in the specified acoustic environment and hamper the coverage accuracy of the system. After implementing the proposed VAP2 algorithm, these coverage holes are reduced and the coverage accuracy of the system is enhanced, as shown in Fig. 3.
Fig. 1 Schematic flowchart of the proposed VAP2 algorithm

Fig. 2 UWSN scenario with 8 AUV nodes in an aquamarine environment
Fig. 3 Coverage accuracy of the AUV node comparing VAP2 algorithm to the conventional k-coverage approach
Table 1 VAP2 parameters
Number of AUV nodes: 8
Radius of sensing circle: 10 m
Radius of communication circle: 20 m
Simulation area: 30 m × 25 m
4.3 Communication Range Initially, the communication range of the AUV nodes is not uniform because of the random deployment of the AUV nodes, which causes communication errors. After applying the VAP2 method, the communication range is uniformly spread over the specified area. The parameter specifications used in the path planning model are outlined in Table 1. Figure 4 shows the trajectory of the transmitted signal for different numbers of iterations. The predicted results are compared with the observed trajectory, and it has been found that the predicted and observed trajectories are approximately the same. The improvement in coverage accuracy is graphically represented in Fig. 5 for every AUV individually; from this figure we can see that the area coverage accuracy is improved for most of the AUV nodes. Figure 6 shows the variation in signal amplitude due to the malicious effect of the AUV nodes, which is larger underwater owing to the different types of perturbation present there.
5 Conclusion To improve the coverage accuracy of specified regions, various area coverage algorithms are used. In this work, the VAP2 algorithm was proposed for path planning of malicious AUV nodes, helping improve the coverage accuracy of a specified region by up to 19% compared to the conventional technique. It enabled alleviation
Fig. 4 Trajectory of the signal transmission
Fig. 5 Comparison of coverage accuracy of the AUV node after applying the VAP2 algorithm
Fig. 6 Variation in signal amplitude due to malicious AUV node behavior
of communication error with a standard deviation of 9.2%, thereby reducing the extent of coverage holes. Based on the partition of the whole region into different specified areas, future applicability lies in the detailed study and analysis of localization error due to malicious AUV nodes that are path-planning-aware. Acknowledgements This work was supported by the Ministry of Electronics and Information Technology, Government of India, under Grant 13(29)/2020-CC&BT.
References 1. Tiwari BK, Sharma R (2018) A computing model for design of a flexible buoyancy system for autonomous underwater vehicles and gliders. Def Sci J 68:589–596 2. Liu Z, Meng X, Liu Y, Yang Y, Wang Y (2021) AUV-aided hybrid data collection scheme based on value of information for internet of underwater things. IEEE Internet Things J XX:1–12 3. Li J, Du J, Chen CLP (2021) Command-filtered robust adaptive NN control with the prescribed performance for the 3-D trajectory tracking of underactuated AUVs. IEEE Trans Neural Netw Learn Syst 1–13 4. Sahoo A, Dwivedy SK, Robi PS (2019) Advancements in the field of autonomous underwater vehicle. Ocean Eng 181:145–160 5. Huang B, Zhou B, Zhang S, Zhu C (2021) Adaptive prescribed performance tracking control for underactuated autonomous underwater vehicles with input quantization. Ocean Eng 221:108549 6. Henninger HC, von Ellenrieder KD, Biggs JD (2019) Trajectory generation and tracking on SE(3) for an underactuated AUV with disturbances. IFAC-PapersOnLine. 52:242–247 7. Li Y, Wang Y, Sun L (2019) A flexible sparse set-membership NLMS algorithm for multi-path and acoustic echo channel estimations. Appl Acoust 148:390–398 8. Wu H, Niu W, Wang S, Yan S (2021) An analysis method and a compensation strategy of motion accuracy for underwater glider considering uncertain current. Ocean Eng 226:108877 9. Zhuang Y, Huang H, Sharma S, Xu D, Zhang Q (2019) Cooperative path planning of multiple autonomous underwater vehicles operating in dynamic ocean environment. ISA Trans 94:174– 186 10. Yiming L, Ye L, Shuo P (2021) Double-body coupled heading manoeuvrability response model considering umbilical lag of unmanned wave glider. Appl Ocean Res 113:102640 11. Pedersen S, Liniger J, Sørensen FF, Schmidt K, Von Benzon M, Klemmensen SS (2019) Stabilization of a ROV in three-dimensional space using an underwater acoustic positioning system. IFAC-PapersOnLine 52:117–122 12. Cho GR, Park DG, Kang H, Lee MJ, Li JH (2019) Horizontal trajectory tracking of underactuated AUV using backstepping approach. IFAC-PapersOnLine 52:174–179 13. Wu Z, Karimi HR, Shi P (2019) Practical trajectory tracking of random Lagrange systems. Automatica 105:314–322 14. Roy S, Baldi S (2020) Towards structure-independent stabilization for uncertain underactuated Euler-Lagrange systems. Automatica 113:108775 15. Keymasi Khalaji A, Tourajizadeh H (2020) Nonlinear Lyapounov based control of an underwater vehicle in presence of uncertainties and obstacles. Ocean Eng 198:106998 16. Xu H, Hassani V, Guedes Soares C (2019) Uncertainty analysis of the hydrodynamic coefficients estimation of a nonlinear manoeuvring model based on planar motion mechanism tests. Ocean Eng 173:450–459 17. Arya R (2021) Perturbation propagation models for underwater sensor localisation using semidefinite programming. Def Sci J 71:807–815
18. Rodriguez J, Castañeda H, Gordillo JL (2020) Lagrange modeling and navigation based on quaternion for controlling a micro AUV under perturbations. Rob Auton Syst 124:103408 19. Elhaki O, Shojaei K (2018) Neural network-based target tracking control of underactuated autonomous underwater vehicles with a prescribed performance. Ocean Eng 167:239–256 20. Cheng M, Guan Q, Ji F, Cheng J, Chen Y (2022) Dynamic detection based trajectory planning for autonomous underwater vehicle to collect data from underwater sensors. IEEE Internet Things J 4662:1–12 21. Wu H, Niu W, Wang S, Yan S, Liu T (2020) Sensitivity analysis of input errors to motion deviations of underwater glider based on optimized response surface methodology. Ocean Eng 209:107400 22. Wu H, Niu W, Wang S, Yan S (2021) Sensitivity analysis of control parameters errors and current parameters to motion accuracy of underwater glider using Sobol' method. Appl Ocean Res 110:102625 23. Hernández-Sánchez A, Poznyak A, Chairez I, Andrianova O (2021) Robust 3-D autonomous navigation of submersible ship using averaged sub-gradient version of integral sliding mode. Mech Syst Signal Process 149:107169 24. Chi W, Wang J, Ding Z, Chen G, Sun L (2021) A reusable generalized Voronoi diagram based feature tree for fast robot motion planning in trapped environments. IEEE Sens J 25. Hu J, Hu Y, Lu C, Gong J, Chen H (2021) Integrated path planning for unmanned differential steering vehicles in off-road environment with 3D terrains and obstacles. IEEE Trans Intell Transp Syst 1–11 26. Eledlebi K, Ruta D, Hildmann H, Saffre F, Alhammadi Y, Isakovic AF (2020) Coverage and energy analysis of mobile sensor nodes in obstructed noisy indoor environment: a Voronoi approach. IEEE Trans Mob Comput 1233:1–17
Chapter 20
A Study on Attribute Selection Methodologies in Microarray Data to Classify the Cancer Type S. Logeswari, D. Santhakumar, and D. Lakshmi
1 Introduction The DNA-enabled microarray technology serves as a dynamic platform for researchers to analyze gene expression levels simultaneously. Microarray data analysis can help to overcome the gene expression profiling dilemma [1]. The most common use of microarray data in real time is to classify cancer types. Although cancer is a hereditary condition, its cause must be determined by finding the gene that is responsible for the cancer and, as a result, for the alterations in gene expression levels that have occurred. Classifying the acquired samples into affected and unaffected categories is a vital process in bioengineering. This necessitates the use of computational approaches for implementation, in which the classification phase is carried out using labeled data samples from the complete sample data, and the prediction model is used in the testing phase for unknown data. Genetic data is of high dimensionality, which makes interpretation difficult, while the sample size is quite small. In such instances, selecting the relevant genes is critical. For most microarray data, the number of training samples is typically much smaller than the number of features, which has an impact on the performance of well-known classifiers on certain types of data. As a result, the identification of specific genes that can distinguish between the two categories could facilitate the construction of successful classifiers [2, 3]. S. Logeswari (B) Karpagam College of Engineering, Coimbatore, Tamil Nadu, India e-mail: [email protected] D. Santhakumar CSE, CK College of Engineering and Technology, Cuddalore, Tamil Nadu, India D. Lakshmi CSE, VIT, Bhopal University, Bhopal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_20
Humanity's fear of cancer has become unavoidable. Cancer diagnosis techniques are based on the pathological structure of the disease. A multitude of factors, such as particular chemical interactions and smoking, can cause gene mutations, which can lead to cancer. Cancer can therefore be diagnosed by selecting the relevant gene that acts as a basic cause of its occurrence; multiple unknown causes may also be identified in the same way. This entire procedure aids in the detection and prediction of a certain cancer type. Microarray data allows clinicians to recommend a better health chart to their patients, thereby improving their quality of life [4-6]. According to data scientists, the increase in the dimensionality of data leads to a concrete increase in data size, which must be steady and trustworthy for analysis; this is known as the "Curse of Dimensionality" (CoD). Attribute selection offers a solution by condensing a large amount of data into a small number of features or properties. Based on the number of tests conducted on genetic data, it may be concluded that a large proportion of genetic data is irrelevant, rendering it useless for categorizing cancer types [2]; a minimal set of genes could be sufficient [1, 6, 7]. As a result, suitable strategies for cancer classification as well as gene selection are required [4, 6, 8, 9], and attribute selection serves as a computational task reducer during the classification step [10, 11].
2 Related Works

Microarray data analysis is extensively used in cancer categorization. Despite this, due to the curse of sparsity and dimensionality, categorization of gene expression profiles is a difficult endeavor. It is observed from the literature that feature selection is a highly effective strategy for overcoming these challenges [12, 13]. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns for cancer prediction was given by Liu et al. [4]. They employed six feature selection methods based on entropy theory, χ²-statistics, and t-statistics. Following the selection of features from the expression profiles by these methods, their efficacy was evaluated by comparing the error rates of four classic classification algorithms. Based on gene expression data, Abusamra [10] conducted a comparative assessment of state-of-the-art feature selection approaches, classification methods, and their combinations. They compared the efficiency of three classification methods (support vector machines, k-nearest neighbour, and random forest) with eight feature selection methods such as information gain, twoing rule, sum minority, etc. The categorization performance was assessed using five-fold cross-validation on two publicly available glioma gene expression data sets. The findings revealed that feature selection plays an important role: classification using only the selected features outperforms classification using all features.
Wang et al. [14] introduced mRMR-ICA, a hybrid feature selection method for cancer classification that combines minimum redundancy maximum relevance (mRMR) with the imperialist competition algorithm (ICA). The proposed approach uses mRMR [15] as a preprocessing step to remove redundant genes and provide compact datasets for feature selection using ICA, and employs a support vector machine (SVM) to assess the accuracy of feature gene classification. The accuracy of classification and the quantity of selected genes are both factors in the fitness function. The results show that mRMR-ICA can successfully eliminate redundant genes, allowing the algorithm to select fewer, more informative genes for higher classification results; it can also reduce computing time and increase efficiency. Using Bacterial Colony Optimization with a multi-dimensional population (BCOMDP), Wang et al. (2019) suggested a feature selection method for the classification of microarray gene expression malignancies. In comparison to feature selection algorithms with given desired subset sizes, experimental results show that the proposed method, constructed with a population comprising multi-dimensionality variables, is more effective in selecting feature subsets while being less computationally demanding. Jain et al. [16] suggested a binary hybrid strategy for classifying cancer types by combining correlation-based feature selection (CFS) with improved binary PSO (iBPSO), and Gao et al. [17] proposed a hybrid filter and wrapper method for diagnosis models to attain both high accuracy and efficiency; to obtain globally optimal parameters, it employs Particle Swarm Optimization (PSO) and the Fruit Fly Optimization Algorithm (FOA). The proposed method outperforms standard methods in terms of test quality and feature dimension, according to the findings of the experiments.
3 Methodology

Machine learning approaches have aided in the prediction and treatment of numerous genetic and biological disorders to a considerable extent [5, 8]. The ability to determine the type of disease from a given input dataset has become a central expectation in biomedical research, and machine learning techniques are required to analyze and predict the disease type as genetic data grow in size and complexity [8]. From the large amount of microarray genetic data, a small number of genes is chosen to classify the phenotype. In this work, the following algorithms are used for comparative analysis in attribute selection.
3.1 Ant Colony Optimization (ACO) for Attribute Selection

Dorigo et al. presented Ant Colony Optimization [18] at the end of the 1990s as a meta-heuristic technique that incorporates the
functioning principles of a natural ant system. The implementation of this technique is mostly targeted at certain optimization problems. ACO not only produces superior results but also adheres to a predetermined computing time that can be extended depending on the importance of time. The amount of pheromone dropped by the ants on their paths determines the shortest path. The problem that ACO solves should be expressed as a graph with edges and nodes, and each edge should carry a heuristic desirability (η). During the transition phase, probabilistic rules must be framed to update the edges using pheromone levels. In the graphical depiction of ACO for attribute selection, an ant starts at node 'a' and seeks a variable to include in the traversal to the next node; attributes b, c, and d are chosen as needed. The journey terminates once the final state d is reached, and the traversed attribute subset is declared a candidate. The probability of an ant k travelling from attribute i to j is calculated using Eq. (1):

p_ij^k(t) = [τ_ij(t)]^α · [η_ij]^β / Σ_{u ∈ J_i^k} [τ_iu(t)]^α · [η_iu]^β   if j ∈ J_i^k; 0 otherwise    (1)

where
k = number of ants;
η_ij = heuristic desirability of selecting attribute j from i;
J_i^k = set of neighboring attributes of i not yet visited by ant k;
α, β = factors regulating the relative influence of pheromone and heuristic information;
τ_ij(t) = amount of pheromone on the (virtual) edge between i and j.
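A minimal sketch of the transition rule in Eq. (1), assuming NumPy arrays for the pheromone and heuristic matrices; the matrix sizes, α, β, and candidate set below are illustrative and not taken from the chapter.

```python
import numpy as np

def transition_probabilities(tau, eta, current, candidates, alpha=1.0, beta=1.0):
    """Probability of an ant moving from `current` to each candidate attribute,
    following Eq. (1): p is proportional to tau^alpha * eta^beta."""
    weights = (tau[current, candidates] ** alpha) * (eta[current, candidates] ** beta)
    return weights / weights.sum()

# Example: 5 attributes, uniform pheromone, random heuristic desirability
rng = np.random.default_rng(0)
tau = np.ones((5, 5))              # pheromone on each (virtual) edge
eta = rng.random((5, 5))           # heuristic desirability of each edge
candidates = np.array([1, 2, 3])   # unvisited attributes reachable from node 0

p = transition_probabilities(tau, eta, current=0, candidates=candidates)
next_attr = rng.choice(candidates, p=p)  # probabilistic selection of the next attribute
```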
3.2 Ant Lion Optimization for Attribute Selection

ALO is a nature-inspired algorithm that mimics the foraging behavior of antlion larvae. An antlion's life can be divided into two stages, larva and adult, and trapping is the key element of the early stage. By performing a circular movement that works as a trap, an antlion builds a cone-shaped pit. Roulette Wheel Selection (RWS) is used to select antlions based on their fitness values. The size of the pit is directly proportional to the intensity of hunger and the phase of the moon. When the demand is met, the antlion captures the prey in the pit generated by the circular motion and throws it out. The ants' random walk is determined by updating positions around the search region at each iteration [19], as shown in Eq. (2):

X(t) = [0, cumsum(2r(t1) − 1), cumsum(2r(t2) − 1), …, cumsum(2r(tn) − 1)]    (2)

where cumsum denotes the cumulative sum, n denotes the maximum number of iterations, t denotes the current iteration, and r(t) is a stochastic function that takes the value 0 when a uniform random value is at most 0.5 and 1 otherwise. According to Fig. 1, the fittest antlion that changes
Fig. 1 Flowchart—steps in ALO attribute selection
itself during each repetition is designated as the elite antlion, and it is returned as the result of the entire trapping phase.
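A small sketch of the cumulative-sum random walk of Eq. (2), assuming the common ALO convention that r(t) is 1 when a uniform random draw exceeds 0.5 and 0 otherwise.

```python
import numpy as np

def antlion_random_walk(n_iter, rng):
    """Random walk of an ant per Eq. (2): X(t) = [0, cumsum(2r(t1)-1), ...]."""
    r = (rng.random(n_iter) > 0.5).astype(float)  # stochastic function r(t)
    return np.concatenate(([0.0], np.cumsum(2 * r - 1)))

walk = antlion_random_walk(100, np.random.default_rng(1))
```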
3.3 Support Vector Machines for Attribute Classification

Vapnik et al. [20] proposed a support vector machine-based estimation and classification methodology that has attracted a lot of attention; it classifies a given data set into two classes based on data samples tagged as support vectors. In other words, SVM can correctly classify linearly separable data samples. Even for non-linear data samples, the dimension of the data can be
Fig. 2 Linear classification using SVM
raised to resolve the non-linearity and allow for SVM-based classification. SVM builds a hyperplane from the given training data samples using the Lagrange theory of optimization. If a hyperplane properly separates the input data samples, it is considered optimal. The decision function for linearly separable classes can be denoted as follows:

Y = sgn(W^T X + b) = +1 if W^T X + b > 0; −1 if W^T X + b ≤ 0    (3)

The margin is maximized through the use of support vectors (see Fig. 2). A data sample falls into one of the two categories, +1 or −1, and the center line of the margin is given by W^T X + b = 0.
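A brief illustration of the decision rule in Eq. (3) using scikit-learn's linear SVM; the two-dimensional synthetic data is a stand-in for labeled gene expression samples.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)    # the two categories in Eq. (3)

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
pred = np.sign(X @ w + b)              # sgn(W^T X + b) as in Eq. (3)
```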
3.4 Correlation Based Attribute Selection (CAS)

Instead of analyzing each attribute individually based on its value and ranking, "correlation based selection" was offered as an analytical technique for evaluating whole subsets. Because the number of attributes in microarray genetic data is enormous, it serves as an initial-level search tool for identifying useful features. CAS [21] chooses a batch of attributes with the highest correlation to the class and the lowest correlation among themselves. CAS computes a correlation matrix from the training data set, and Eq. (4) is used to compute the score of a batch of variables:

Merit_S = k · r̄_cf / √(k + k(k − 1) · r̄_ff)    (4)

where Merit_S represents the analytical merit of a batch S that has k features, r̄_cf denotes the mean correlation between the features and the class, and r̄_ff is the mean correlation among the features. The most important benefit of using CAS for selecting genes is early prediction on the microarray dataset through the correlation of batches of genes.
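A compact sketch of the merit score in Eq. (4), assuming Pearson correlation as the measure of feature-class and feature-feature association.

```python
import numpy as np

def cfs_merit(X, y):
    """Merit_S = k * r_cf / sqrt(k + k*(k-1)*r_ff) for a batch of k features (Eq. 4)."""
    k = X.shape[1]
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)])
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i in range(k) for j in range(i + 1, k)]) if k > 1 else 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```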
4 Results and Discussion

A leukaemia dataset was selected for experimentation in order to evaluate the performance of the considered approaches. The experiment first identified the attributes, which were assessed against standard performance metrics. The selected attributes were then passed to the SVM classifier and the results were obtained. For each of the attribute selection approaches, performance measures such as accuracy, sensitivity, specificity, and F-measure were calculated; the results are shown in Table 1.

Table 1 Performance measures

Measures     No attribute selection   CAS    ACO-attribute selection   ALO-attribute selection
Accuracy     79                       83     85                        91
Sensitivity  0.86                     0.90   0.86                      0.95
Specificity  0.91                     0.89   0.92                      0.95
F-measure    0.86                     0.86   0.86                      0.93

Figure 3 shows that the technique without attribute selection has 79% accuracy, CAS has 83%, ACO-based attribute selection has 85%, and ALO-based attribute selection has 91%.

Fig. 3 Comparison based on accuracy

The comparison based on the performance measures (sensitivity, specificity, and F-measure) is depicted in Fig. 4 for the proposed methods.

Fig. 4 Comparison of performance measures

It is observed from the experimental results that the ALO-based feature selection gives better results than the other methods.
5 Conclusion

This research investigates and presents various techniques for identifying relevant genes to classify cancer types. Methods for classifying microarray gene data have also been discussed with the aim of diagnosing susceptibility as early as possible. Experiments were carried out comparing classification with and without gene selection in order to observe the efficiency of the selection approaches. The findings highlight the importance of selecting relevant genes, which matters even more when genetic data is used to determine cancer types. The overall evaluation revealed that the ALO-based attribute selection methodology made a significant contribution to diagnosis; it may be further enhanced in the future with a fuzzy-based ranking system to rank vulnerable types from genetic data.
References

1. Osareh A, Shadgar B (2010) Microarray data analysis for cancer classification. In: 2010 5th international symposium on health informatics and bioinformatics (HIBIT), pp 125–132
2. Aziz R, Verma CK, Srivastava N (2017) Dimension reduction methods for microarray data: a review. AIMS Bioeng 4(1):179–197
3. Wall ME, Rechtsteiner A, Rocha LM (2003) A practical approach to microarray data analysis. Kluwer, Norwell, MA, Chapter 5, pp 91–109
4. Liu H, Li J, Wong L (2002) A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Inform 13:51–60
5. Bhola A, Tiwari AK (2015) Machine learning based approaches for cancer classification using gene expression data. Mach Learn Appl Int J MLAIJ 2(3/4):1–12
6. Zhang J, Deng H (2007) Gene selection for classification of microarray data based on the Bayes error. BMC Bioinform 8:370
7. Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinform. https://doi.org/10.1155/2015/198363
8. Pirooznia M, Yang J, Yang M, Deng Y (2008) A comparative study of different machine learning methods on microarray gene expression data. BMC Genom 9(S1):S13
9. Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(15):2429–2437
10. Abusamra H (2013) A comparative study of feature selection and classification methods for gene expression data of glioma. In: 4th international conference on computational systems biology and bioinformatics, pp 5–14
11. Xing EP, Jordan MI, Karp RM (2001) Feature selection for high-dimensional genomic microarray data. In: Proceedings of the 18th international conference on machine learning
12. Abusamra H (2013) A comparative study of feature selection and classification methods for gene expression data of glioma. Procedia Comput Sci 23:5–14
13. Almugren N, Alshamlan H (2019) A survey on hybrid feature selection methods in microarray gene expression data for cancer classification. IEEE Access 7:78533–78548
14. Wang S, Kong W, Deng J, Gao S, Zeng W (2018) Hybrid feature selection algorithm mRMR-ICA for cancer classification from microarray gene expression data. Comb Chem High Throughput Screen 21(6):420–430
15. Alshamlan H, Badr G, Alohali Y (2015) mRMR-ABC: a hybrid gene selection algorithm for cancer classification using microarray gene expression profiling. BioMed Res Int 2015, Article ID 604910, 15 pp. https://doi.org/10.1155/2015/604910
16. Jain I, Jain VK, Jain R (2018) Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Appl Soft Comput 62:203–215
17. Gao X, Liu X (2018) A novel effective diagnosis model based on optimized least squares support machine for gene microarray. Appl Soft Comput 66:50–59
18. Dorigo M, Di Caro G (1999) Ant colony optimization: a new metaheuristic. In: Proceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. no. 99TH8406), vol 2. IEEE, pp 1470–1477
19. Mafarja MM, Mirjalili S (2019) Hybrid binary ant lion optimizer with rough set and approximate entropy reducts for feature selection. Soft Comput 23:6249–6265
20. Vapnik V (1998) Statistical learning theory. Wiley-Interscience, NY, USA
21. Alshamlan HM (2018) Co-ABC: correlation artificial bee colony algorithm for biomarker gene discovery using gene expression profile. Saudi J Biol Sci 25:895–903
Chapter 21
Fine Tuning the Pre-trained Convolutional Neural Network Models for Hyperspectral Image Classification Using Transfer Learning Manoj Kumar Singh and Brajesh Kumar
1 Introduction

In recent years, hyperspectral image (HSI) processing has become a rapidly evolving field in remote sensing and other applications. Cao et al. [1] presented evolving areas of remote sensing. Hyperspectral imaging is concerned with the analysis and interpretation of spectra obtained from a specific scene by an airborne or satellite sensor at a short, medium, or long distance. With hundreds of spectral channels and a spectral resolution of the order of 10 nm, this system can cover the wavelength range from 0.4 to 2.5 µm. As all materials have some chemical properties, Kumar et al. [2] utilize the hyperspectral signature to recognize the individual absorption properties of different materials. It is, therefore, useful for identifying fine changes in the reflectance of vegetation, soil, water, minerals, etc. For more precise and detailed information extraction, hyperspectral images provide adequate spectral information to identify and discriminate spectrally similar materials. HSI classification is a technique for constructing thematic maps from remotely sensed hyperspectral images. A thematic map depicts the ground surface materials, e.g., plants, soil, roof, road, water, and concrete, with distinguishable themes. Kumar [3] and Kumar et al. [4] explained a wide range of effective HSI classification schemes based on spectral and spatial information. Paoletti et al. [5] and Li et al. [6] presented that deep learning-based classification models have recently been brought to the hyperspectral community, and they appear to hold a lot of promise in the field of remote sensing classification. Convolutional neural networks (CNNs) have been proved to be an effective technique for hyperspectral image classification. Wu and
Prasad [7] introduced a hybrid hyperspectral image classification model in which a few initial convolutional layers extract location-invariant middle-level features, followed by recurrent layers that extract spectral-contextual information. Ding et al. [8] cropped patches from 2D input images to train a 2D-CNN architecture that learns data-adaptive kernels on its own. Chen et al. [9] integrated the Gabor filtering approach with 2D-CNN to tackle the overfitting problem caused by limited training data. The Gabor filtering substantially reduces the overfitting problem by extracting spatial features such as edges and textures. Ran et al. [10] introduced a spatial pixel pair feature as an improved pixel pair feature. Gao et al. [11] proposed a CNN architecture for HSI classification. They reduced the high correlation among HSI bands by transforming the 1D spectral vector to a 2D feature matrix and cascading composite layers. Li et al. [12] extracted the first principal component with enhanced spatial information using PCA and then passed it to a full CNN framework for classification. Zhong et al. [13] proposed a supervised spectral-spatial residual network, which extracts a discriminative joint representation using a sequence of 3D convolutions in the corresponding spectral and spatial residual blocks. Instead of assigning fixed weights to incorporate spatial features, Li et al. [14] used an adaptive weight learning technique to reflect the variations in spatial context in diverse hyperspectral patches. Roy et al. [15] investigated a new architectural design that can adaptively find adjustable receptive fields, and then an upgraded spectral-spatial residual network for joint feature extraction to make the convolutional kernel more flexible. The considerable improvement in the performance achieved on benchmark data sets has contributed to their demand. Singh et al. [16] used a Markov random field to improve the classification map produced by a 3D CNN. However, the need for large training sets and high-end computing resources is the major concern of the CNN-based methods. The concept of transfer learning can help to greatly reduce these requirements. Marmanis et al. [17] and Romero et al. [18] fine-tuned a previously trained CNN model for a new purpose. Such previously trained networks are known as pre-trained models. Yosinski et al. [19] showed that pre-trained models alleviate the problem of the large training samples required by deep learning-based techniques. The performance of a model increases with the use of the transfer learning technique, as it makes use of previous knowledge. The information from a genuine/reliable source is shifted to the targeted domain to acquire knowledge of data that is unseen or unlabeled. This makes transfer learning most effective in areas where training samples are not abundant. Both the source and target domains are expected to be related but not identical in most cases. They may, however, have different distributions, and the data in the two domains may differ due to different collection circumstances.
The goal of this work is to assess the efficacy of some pre-trained CNN models for HSI classification. Several pre-trained CNN networks, including EfficientNetB0, EfficientNetB7, ResNet50, VGG19, DenseNet121, and DenseNet201, are fine-tuned and evaluated on two hyperspectral images, Houston and KSC. The findings of this work will make it easier for remote sensing researchers to use powerful pre-trained CNN classifiers.
2 Methodology

Among the various deep learning methods, CNN has been the most popular in the fields of computer vision and remote sensing. The CNN method is highly effective for processing data represented in a grid structure, such as photographs. CNN is based on the anatomy of an animal's visual cortex and is designed to automatically and adaptively learn spatial hierarchies of information, from low-level to high-level patterns. Convolutional neural networks (CNNs) have mainly three different types of layers: (1) convolution layers, (2) pooling layers, and (3) fully connected layers. The first two extract features, while the third maps those features to the final output, such as a classification. The convolution layer, which is made up of a sequence of mathematical operations like convolution, is a key aspect of CNN. Convolution is a feature extraction process that applies a tiny array of numbers called a kernel to the input, which is a tensor of numbers. At each location of the tensor, an element-wise product between each element of the kernel and the input tensor is computed and summed to obtain the output value at the corresponding position of the output tensor, producing a feature map. The convolution operation on a 2-D image I with a 2-D kernel K is performed as

S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)    (1)
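A minimal NumPy sketch of the operation in Eq. (1), implemented in the cross-correlation form that CNN layers actually compute, with a stride parameter and ReLU applied to the resulting feature map; the 8 × 8 input and averaging kernel are illustrative.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution producing one feature map (cross-correlation form)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return np.maximum(out, 0)  # ReLU activation on the feature map

feature_map = conv2d(np.random.rand(8, 8), np.ones((3, 3)) / 9.0)
```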
Using several kernels, this process is repeated to generate an arbitrary number of feature maps that reflect various aspects of the input tensors. As a result, different kernels might be viewed as separate feature extractors. A stride is the distance between two consecutive kernel positions, and it also specifies the convolution procedure. A stride of 1 is the most usual choice; however, a stride greater than 1 is occasionally used to accomplish feature map downsampling. A nonlinear activation function is applied to the output of the convolution, which helps a model adapt to a variety of data. The most frequently used activation function is the rectified linear unit (ReLU). To reduce the spatial size of the input, a pooling layer is typically employed after a convolution layer. It is applied separately to each depth slice of the input volume, using a filter of a given kernel size. The pooling layer reduces the size of feature maps according to the formula shown below:
((n_h − f)/s + 1) × ((n_w − f)/s + 1) × n_c    (2)
where n_h, n_w, and n_c are the height, width, and number of channels of the feature map, respectively; the filter size is represented by f, and s is the stride length. Pooling is also advantageous for obtaining rotation- and position-invariant dominant features, which aids the model's training process. Two typical pooling operations are maximum (max) pooling and average pooling. The most frequently used, max-pooling, selects the maximum value from the portion of the picture overlapped by the kernel. On the other hand, average pooling calculates the average of all the values from the section of the image covered by the kernel. The number of such layers can be increased depending on the image complexity, but at the cost of more processing resources. The output feature maps of the last convolution or pooling layer are usually flattened into a one-dimensional (1D) array. Flattened features are projected onto one or more dense layers, which are fully connected; a learnable weight connects each input to each output in a dense layer. In the final fully connected layer, the number of output nodes is usually equal to the number of classes, producing the final output. Transfer learning is a machine learning technique that seeks to apply knowledge from one task (source task) to a different but similar task (target task). It is the process of improving learning in a new task by transferring knowledge from a previously learned related task. With the help of transfer learning, the need for a large number of training samples and high-end computing resources in CNN models can be reduced. Various pre-trained CNN models can be fine-tuned for hyperspectral image classification. VGG19 is a 19-layer deep convolutional neural network built and trained by Karen Simonyan and Andrew Zisserman [20] at the University of Oxford in 2014. The VGG19 network was trained using over 1 million pictures from the ImageNet collection to classify up to 1000 objects; color images with a resolution of 224 × 224 pixels were used to train the network, and it has 143 million trainable parameters in total. In 2019, Google trained and released EfficientNet, a cutting-edge convolutional neural network, to the general public. There are eight different implementations of EfficientNet (B0 to B7); the number of layers in EfficientNetB0 is 237, and in EfficientNetB7 the total comes to 813. EfficientNetB0 has about 5.3 million trainable parameters, while EfficientNetB7 has 66 million. ResNet50 is a CNN with 50 layers of depth, designed and trained by Microsoft in 2015. DenseNet connects each layer to every other layer in a feed-forward fashion: each dense layer receives the feature maps of all preceding layers and passes its own on to all subsequent layers. DenseNet121 has 121 layers and 8 million trainable parameters, while DenseNet201 has 201 layers and 20 million trainable parameters. In this paper, a pre-trained CNN model-based classification scheme is proposed for hyperspectral imagery, as shown in Fig. 1. The pre-trained networks accept only three-band images; therefore, the input hyperspectral image is first reduced using principal component analysis. The reduced image is then sent to the CNN model,
Fig. 1 Proposed classification scheme
which is fine-tuned with the training samples from the ground reference image. Once the CNN model is fine-tuned, it can classify the input image. The results are evaluated on the test set.
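A rough sketch of the scheme in Fig. 1 with Keras and scikit-learn, assuming an EfficientNetB0 backbone; the image size, band count, patch size, and class count below are placeholders, and the extraction of training patches around labeled pixels is omitted for brevity.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# Placeholder sizes: image height/width, number of bands, classes, patch size
H, W, BANDS, N_CLASSES, PATCH = 64, 64, 176, 13, 32
cube = np.random.rand(H, W, BANDS).astype("float32")  # stand-in hyperspectral cube

# Step 1: reduce the spectral dimension to the 3 bands the backbone expects
flat = cube.reshape(-1, BANDS)
reduced = PCA(n_components=3).fit_transform(flat).reshape(H, W, 3)

# Step 2: load a pre-trained backbone without its ImageNet classification head
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(PATCH, PATCH, 3))

# Step 3: attach a new head for the hyperspectral classes and fine-tune
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-2),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fine-tuning would then use PATCH x PATCH windows of `reduced` centered on
# labeled ground-reference pixels (patch extraction omitted here):
# model.fit(train_patches, train_labels, batch_size=32, epochs=10)
```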
3 Experiments and Results The experiments are performed to evaluate the performance of the six different pre-trained CNN models: EfficientNetB0, EfficientNetB7, ResNet50, VGG19, DenseNet121, and DenseNet201. All tests are performed on a computer with a Xeon 4.5 GHz processor, 64 GB RAM, and an 8 GB Quadro P4000 GPU. The code is written in Python and uses a variety of frameworks including Keras and TensorFlow.
3.1 Data Sets Two well-known hyperspectral images, Houston University and KSC are used in the experiments. An aerial ITRES-Compact Airborne Spectrographic Imager 1500 hyperspectral imager was used to capture the image of the University of Houston campus and the nearby urban area. The image has spatial dimensions of 1905 × 349 pixels with a spatial resolution of 2.5 m per pixel. It consists of a total of 15 classes. The FCC and ground reference of the Houston University image is given in Fig. 2. There are two standard sample sets for Houston University. Both these sets are combined in this work to form the ground reference as shown in Fig. 2b. The second image was acquired using the AVIRIS instrument over the KSC in Florida, USA, in 1996. After removing noisy bands, the final image contains 176 bands with a 512 × 614 size, ranging from 400 to 2500 nm, and 20 m spatial resolution. The ground truth available contains 13 classes. Figure 3 shows the FCC and ground reference of the KSC image. The class wise number of training and test samples for both images are shown in Table 1.
Fig. 2 Houston University (a) FCC (b) ground reference
Fig. 3 KSC (a) FCC (b) ground reference
3.2 Results and Discussion

The classification accuracy and other parameters are examined in the findings. Classification accuracy is measured in terms of the overall kappa coefficient and overall accuracy (OA). Training pixels are selected randomly using the ground reference. For both images, 10% of the samples are used for training, and the remaining samples are retained for testing.
Table 1 Details of information classes in hyperspectral images

Houston University                          Kennedy Space Center
Class            Training   Testing        Class             Training   Testing
Healthy grass    125        1126           Scrub             76         685
Stressed grass   125        1129           Willow swamp      24         219
Synthetic grass  70         627            CP hammock        26         230
Trees            124        1120           Slash pine        25         227
Soil             124        1118           Oak/Broadleaf     16         145
Water            33         292            Hardwood          23         206
Residential      127        1141           Swamp             11         94
Commercial       124        1120           Graminoid marsh   43         388
Road             125        1127           Spartina marsh    52         468
Highway          123        1104           Cattail marsh     40         364
Railway          124        1111           Salt marsh        42         377
Parking Lot 1    123        1110           Mud flats         50         453
Parking Lot 2    47         422            Water             93         834
Tennis Court     43         385
Running Track    66         594
The evolution of training and validation accuracy as a function of the number of epochs is depicted in Figs. 4 and 5 for all the tested models for the Houston University and KSC images, respectively. It can be clearly observed that the training and validation accuracy curves are very similar, indicating that the pre-trained models fit the problem considered in this work well.

Classification performance. The classification accuracy of all six pre-trained models for the Houston University image is presented in Table 2. Both class-wise and overall accuracy values are reported for all six pre-trained models. It can be seen that EfficientNetB0 gives the best results; its overall kappa and OA values are 0.9004 and 90.79%, respectively. Most of the classes except Trees, Water, and Running Track provide good classification accuracy. EfficientNetB7 and DenseNet121 are other models that exhibited similar performance. VGG19 is the model with the lowest accuracy of 83.14%; it produces its maximum accuracy of 96.03% for the Tennis Court class. For class Water, all models except EfficientNetB0 give an accuracy of less than 70%. Based on the results, EfficientNetB0 is highly recommended for classifying images containing large areas of water when the training samples of water are very few. Similarly, for class Trees, DenseNet121 and EfficientNetB7 should be selected for classifying images with large tree-covered areas, as they give the highest accuracies of 88.34% and 87.54%, respectively. Similar observations can be made for the other classes, which assists in selecting an effective model for classifying the targeted image. The classification maps for Houston University are shown in Fig. 6 for visual inspection.
Fig. 4 Training accuracy and validation accuracy for Houston image. a Densenet121, b Densenet201, c EfficientnetB0, d EfficientnetB7, e Resnet50, f VGG19
Fig. 5 Training accuracy and validation accuracy for KSC image. a Densenet121, b Densenet201, c EfficientnetB0, d EfficientnetB7, e Resnet50, f VGG19
The classification results for the KSC image are reported in Table 3. For the KSC image, VGG19 produces the highest OA of 95.77%, with a global kappa value of 0.9530. Interestingly, DenseNet121, which is one of the best performing models for the Houston image, does not perform well for KSC; for Graminoid marsh, it gives a very poor accuracy of 35.50%.
Table 2 Classification accuracy (OA in % and kappa) for Houston University

                  Pre-trained CNN models
Class             ENB0     ENB7     RN50     DN121    DN201    VGG19
Healthy grass     89.37    87.93    77.94    88.81    86.57    88.73
Stressed grass    88.12    79.59    87.72    87.48    77.35    78.39
Synthetic grass   95.12    96.41    95.55    97.13    97.85    92.97
Trees             81.75    87.54    74.28    88.34    77.09    69.37
Soil              95.73    95.81    93.80    94.69    96.62    93.48
Water             83.38    67.69    60.31    59.69    57.23    60.62
Residential       94.56    88.41    86.99    84.38    86.36    68.38
Commercial        86.17    85.29    84.65    83.12    83.04    79.82
Road              88.26    92.49    87.54    83.47    91.69    84.98
Highway           99.51    95.68    96.58    97.80    97.88    89.81
Railway           99.43    94.09    99.27    97.89    97.73    92.39
Parking Lot 1     87.75    86.78    91.08    95.13    94.00    86.29
Parking Lot 2     91.26    87.42    82.94    89.34    92.54    66.31
Tennis Court      100      98.13    99.77    99.77    100      96.03
Running Track     78.64    92.88    67.88    87.73    82.42    87.88
Overall OA (%)    90.79    89.54    87.00    89.89    88.69    83.14
Overall Kappa     0.9004   0.8869   0.8593   0.8907   0.8776   0.8175
Fig. 6 Houston University classification maps: a DenseNet121, b DenseNet201, c EfficientNetB0, d EfficientNetB7, e ResNet50, f VGG19
All six models perform well for the Scrub, Salt marsh, Mud flats, and Water classes. For class Cattail marsh, EfficientNetB0, EfficientNetB7, and DenseNet201 give 100% accuracy. Figure 7 shows the classification maps for the KSC image. Pre-trained models perform well for hyperspectral images, but different models perform differently for different images.

Table 3 Classification accuracy (OA in % and kappa) for Kennedy Space Center

                  Pre-trained CNN models
Class             ENB0     ENB7     RN50     DN121    DN201    VGG19
Scrub             98.16    95.27    97.90    92.51    98.82    97.77
Willow swamp      78.19    82.72    97.94    73.66    64.20    92.18
CP hammock        71.48    85.94    66.80    38.67    79.30    93.75
Slash pine        88.89    91.67    84.92    89.29    88.89    81.75
Oak/Broadleaf     97.52    96.89    87.58    93.79    100      100
Hardwood          53.28    100      96.07    74.24    76.86    92.58
Swamp             77.14    100      95.24    63.81    76.19    98.10
Graminoid marsh   50.12    59.16    87.24    35.50    93.50    87.70
Spartina marsh    50.96    98.85    99.62    97.69    94.62    96.92
Cattail marsh     100      100      99.50    85.40    100      95.05
Salt marsh        99.76    100      98.57    95.47    99.28    99.05
Mud flats         98.41    98.61    98.81    96.42    100      99.20
Water             99.14    98.38    99.46    99.89    99.78    99.35
Overall OA (%)    84.84    93.39    95.12    84.66    93.93    95.77
Overall Kappa     0.8299   0.9265   0.9456   0.8288   0.9324   0.9530
Fig. 7 KSC classification maps: a DenseNet121, b DenseNet201, c EfficientNetB0, d EfficientNetB7, e ResNet50, f VGG19
The classification accuracy is plotted against the number of epochs in Figs. 4 and 5. It is observed from the graphs that classification accuracy becomes stable after a small number of epochs. In most cases, only ten epochs are good enough for fine-tuning a pre-trained CNN model for hyperspectral image classification; this also alleviates the need for high-end computing resources.

Effect of learning rate. The effect of the learning rate on classification accuracy can be observed in Fig. 8, where classification accuracy is plotted against the learning rate. The learning rate varies from 0.0001 to 0.1. It is observed from the figure that most of the models give their best accuracy at 0.1 or 0.01. For both images, accuracy sharply decreases at a learning rate of 0.0001.

Impact of training samples. Experiments are performed to observe the effect of training samples on classification accuracy. The training set size varies from 1 to 40%. The impact of the training set size on classification accuracy is shown in Fig. 9. Accuracy increases with the training sample size: it increases sharply between 1 and 4% training size, the rate of increase is moderate between 5 and 10%, and the increase becomes very slow between 10 and 40%. Overall, it can be concluded from this experiment that pre-trained CNN models can give good classification accuracy with a smaller training set.

Impact of batch size. Figure 10 shows the impact of batch size on accuracy. Experiments are carried out to observe how classification accuracy is affected by a change in batch size. The batch size varies from 16 to 512, selected as powers of 2. The experiments show that all models produce their highest accuracy at batch size 32. As batch size increases, accuracy starts decreasing, and the lowest accuracy is obtained at batch size 512. Therefore, it is observed that a smaller batch size gives better results for classification.
Fig. 8 Effect of learning rate a Houston, b KSC
Fig. 9 Effect of training samples a Houston, b KSC
Fig. 10 Effect of batch size a Houston, b KSC
4 Conclusion

In this paper, six pre-trained CNN models were fine-tuned for hyperspectral image classification. The experiments were carried out on the Houston University and KSC images. It was observed from the experimental results that EfficientNetB0 produced the highest accuracy of 90.79% for the Houston image and VGG19 yielded the highest accuracy of 95.77% for the KSC image. It was found that only a small number of epochs is needed to fine-tune a pre-trained model for hyperspectral classification to get good accuracy. The pre-trained models can perform well with a smaller training set and give effective results using transfer learning. No single model can be attributed as the best model applicable to all types of images; instead, a model can be selected based on the image content. The experimental results shown in this paper will be useful for researchers in selecting pre-trained networks according to the content of the image to be classified. For example, the pre-trained network ENB7 should be selected instead of ENB0 for classifying an image containing mostly hardwood material: as Table 3 shows, ENB0 gives only 53.28% accuracy while ENB7 produces 100% accuracy for the Hardwood class.
References 1. Cao X, Yao J, Xu Z, Meng D (2020) Hyperspectral image classification with convolutional neural network and active learning. IEEE Trans Geosci Remote Sens 58(7):4604–4616 2. Kumar B, Dikshit O (2017) Spectral contextual classification for hyperspectral imagery with probabilistic relaxation labeling. IEEE Trans Cybern 47(12):4380–4391 3. Kumar B (2020) Hyperspectral image classification using three-dimensional geometric moments. IET Image Proc 14(10):2175–2186 4. Kumar B, Dikshit O, Gupta A, Singh MK (2020) Feature extraction for hyperspectral image classification: a review. Int J Remote Sens 41(16):6248–6287 5. Paoletti M, Haut JM, Plaza J, Plaza A (2019) Deep learning classifiers for hyperspectral imaging: a review. ISPRS J Photogramm Remote Sens 158:279–317 6. Li S, Song W, Fang L, Chen Y, Ghamisi P, Benediktsson JA (2019) Deep learning for hyperspectral image classification: an overview. IEEE Trans Geosci Remote Sens 57(9):6690–6709 7. Wu H, Prasad S (2017) Convolutional recurrent neural networks for hyperspectral data classification. Remote Sens 9(3):298 8. Ding C, Li Y, Xia Y, Wei W, Zhang L, Zhang Y (2017) Convolutional neural networks based hyperspectral image classification method with adaptive kernels. Remote Sens 9(6):618 9. Chen Y, Zhu L, Ghamisi P, Jia X, Li G, Tang L (2017) Hyperspectral images classification with gabor filtering and convolutional neural network. IEEE Geosci Remote Sens Lett 14(12):2355– 2359 10. Ran L, Zhang Y, Wei W, Zhang Q (2017) A hyperspectral image classification framework with spatial pixel pair features. Sensors 17(10):2421 11. Gao H, Yang Y, Li C, Zhou H, Qu X (2018) Joint alternate small convolution and feature reuse for hyperspectral image classification. ISPRS Int J Geo Inf 7(9):349 12. Li J, Zhao X, Li Y, Du Q, Xi B, Hu J (2018) Classification of hyperspectral imagery using a new fully convolutional neural network. IEEE Geosci Remote Sens Lett 15(2):292–296 13. Zhong Z, Li J, Luo Z, Chapman M (2018) Spectral–spatial residual network for hyperspectral image classification: a 3-d deep learning framework. IEEE Trans Geosci Remote Sens 56(2):847–858 14. Li S, Zhu X, Liu Y, Bao J (2019) Adaptive spatial-spectral feature learning for hyperspectral image classification. IEEE Access 7:61534–61547 15. Roy SK, Manna S, Song T, Bruzzone L (2021) Attention-based adaptive spectral-spatial kernel resnet for hyperspectral image classification. IEEE Trans Geosci Remote Sens 59(9):7831– 7843 16. Singh MK, Mohan S, Kumar B (2021) Hyperspectral image classification using deep convolutional neural network and stochastic relaxation labeling. J Appl Remote Sens 15(4):042612 17. Marmanis D, Datcu M, Esch T, Stilla U (2016) Deep learning earth observation classification using ImageNet pretrained networks. IEEE Geosci Remote Sens Lett 13(1):105–109 18. Romero A, Gatta C, Camps-Valls G (2016) Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans Geosci Remote Sens 54(3):1349–1362 19. Yosinski J, Clune J, Bengio Y, Lipson H (2014). How transferable are features in deep neural networks? In: Proceedings of the advances in neural information processing systems, pp 3320– 3328 20. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations. dblp, San Diego, CA, USA, pp 1–14
Chapter 22
Computational Prediction of Plastic Degrading Microbes Using Random Forest N. Hemalatha, W. Akhil, Raji Vinod, and T. Akhil
1 Introduction

Plastic is a polymeric material. Plastic pollution has become one of the most pressing environmental issues, as the rapidly increasing production of disposable plastic products overwhelms the world's ability to deal with them. We cannot imagine life without plastic: plastics revolutionized medicine with lifesaving devices, made space travel possible, and lightened cars and jets, saving fuel and reducing pollution; they also save lives through helmets, incubators, and equipment for clean drinking water. On the other side, plastic creates a great deal of environmental pollution [1]. Practically, it is difficult to avoid plastic completely in our daily life. The only way to control plastic pollution is to degrade plastic products rather than discard them into the surroundings; with proper management, the pollution that plastic creates in the environment can be greatly reduced. Streptococcus, Micrococcus, Staphylococcus, Moraxella, and Pseudomonas are some of the plastic-degrading microbes found in Indian mangrove soil [2]. In 2016, Japanese scientists found that a bacterium can easily break down the plastic polyethylene terephthalate (PET) [3]; this bacterium, Ideonella sakaiensis, of the genus Ideonella and the family Comamonadaceae, degrades PET [4]. Once the bacterium acts, PET gets broken down
into two products, ethylene glycol and DMT, which can be used to create other materials [3]. In this paper, we have worked on developing a computational prediction tool for plastic-degrading protein sequences [5, 6]. We have also developed a database of protein sequences, collected from previous works, that can degrade plastics. The rest of this paper is organized as follows: Sect. 2 describes the materials and the methodology used to carry out this work, Sect. 3 presents the results and discussion, and Sect. 4 concludes the paper.
2 Methodology This section describes the dataset used, different algorithms, and features used for generating the computational tool.
2.1 Dataset

Plastic-degrading protein sequences belonging to the alkB and CYP153 genes were collected from databases such as NCBI and UniProtKB. Around nine thousand positive protein sequences and six thousand negative sequences were obtained. For the training data, we used a 60:40 ratio of the positive and negative sequences; this ratio was used for holdout validation.
2.2 Features Used

Several features, namely amino acid counts, dipeptide counts, amino acid ratios, hydrophobicity, hydrophilicity, acidity, and basicity, extracted from microbial protein sequences were used for developing the prediction. The features are explained in the sections below.

Amino acid count. For the amino acid count, we took the count of each of the 20 amino acids in each microbial protein sequence, making the dimension size 20 (Table 1) [7]. The equation used is

AAcount(i) = N(a_i), where i = 1 to 20    (1)
Amino acid ratio. Here, the count of each amino acid in a sequence was divided by the length of the protein sequence [8]. For this feature, the dimension was 20 (Table 1). The equation used is

AAratio(i) = N(a_i) / length, where i = 1 to 20    (2)

Table 1 Details of the amino acids contained in different features

Feature           Amino acids
Amino acid count  A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V
Dipeptide         (A,A), (A,R), (A,N), (A,D), etc.
Hydrophobicity    A, G, T, L, M, F, P, V, W
Hydrophilicity    S, T, C, N, Q, Y
Acidity           D, E
Basicity          K, H, R
Dipeptide count. For the dipeptide count, the number of occurrences of every dipeptide in the protein sequence was counted, giving a dimension of 400 (Table 1) [7].

DipepCount(i, j) = N(a_i a_j), where i, j = 1 to 20    (3)
Physicochemical properties. In this feature group, hydrophobicity, hydrophilicity, acidity, and basicity were considered (Table 1) [9]. Hydrophobicity covers 9 amino acids and hence has a dimension of nine, whereas hydrophilicity has a dimension of six, acidity two, and basicity three.
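A compact sketch of the feature extraction described in Eqs. (1)-(3) and Table 1 for a single protein sequence; the example sequence at the end is arbitrary.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
HYDROPHOBIC, HYDROPHILIC = set("AGTLMFPVW"), set("STCNQY")
ACIDIC, BASIC = set("DE"), set("KHR")

def extract_features(seq):
    """Counts (Eq. 1), ratios (Eq. 2), dipeptide counts (Eq. 3), and
    physicochemical-group counts (Table 1) for one protein sequence."""
    counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    ratios = {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}
    dipeptides = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}  # 400 dims
    for i in range(len(seq) - 1):
        dipeptides[seq[i:i + 2]] += 1
    groups = {
        "hydrophobic": sum(aa in HYDROPHOBIC for aa in seq),
        "hydrophilic": sum(aa in HYDROPHILIC for aa in seq),
        "acidic": sum(aa in ACIDIC for aa in seq),
        "basic": sum(aa in BASIC for aa in seq),
    }
    return counts, ratios, dipeptides, groups

features = extract_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```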
2.3 Data Pre-processing

Handling missing values is very important during pre-processing, as many machine learning algorithms do not support missing data. Missing data can degrade the performance of a model by creating bias in the dataset. Since our dataset did not have missing values, this pre-processing step was not required.
2.4 Algorithms Used

To classify the plastic-degrading microbes, we used supervised machine learning classification algorithms. The following subsections explain each classifier in detail.

Decision Tree. A decision tree is a classification algorithm that has nodes and leaves [10]. Nodes are split based on certain conditions, and outcomes are obtained at the leaves. It is one of the simplest and most commonly used classification algorithms.
Random Forest. It is a classification algorithm that generates multiple decision trees based on the dataset [11]. Each of the decision trees predicts the output and then finally it combines the outputs by voting. Support Vector Machine. This classifier creates a hyperplane on the dataset and then classifies the data points into classes [12]. KNN. It is known as a lazy learner algorithm [13]. This algorithm classifies the input values based on their similarity.
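A short sketch of training and comparing the four classifiers with scikit-learn, as described above; synthetic data from make_classification stands in for the real feature vectors, and the 60:40 split mirrors the holdout validation mentioned in Sect. 2.1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix (rows = sequences,
# columns = features) with binary degrading / non-degrading labels
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=42)  # mirrors the 60:40 holdout

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```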
2.5 Performance Measures

For measuring the performance of the computational prediction tool, the measures used were accuracy, the confusion matrix, ROC-AUC, and a heat map, which are described below.

Accuracy. Accuracy can be defined as the ratio of the number of correct predictions to the total number of input values [9]:

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (4)
Confusion Matrix. The confusion matrix (Fig. 1) gives the output in matrix form and summarizes the evaluation measures of the model that was created [14]. The F1 score is defined as a measure of the model's accuracy: the harmonic mean of precision and recall [9]. It ranges from 0 to 1 and indicates how many instances the model predicts correctly. Precision and recall can be calculated using the following equations:

Precision = TP/(TP + FP)    (5)

Recall = TP/(TP + FN)    (6)
ROC-AUC. AUC can be expanded as Area Under Curve [15]. It is one of the metrics used for evaluation. As the value of the AUC rises the model performance for classification also rises. Before studying the AUC, we need to understand the Fig. 1 Confusion matrix
following measurements. The True Positive Rate (TPR) is defined as TP/(FN + TP); it measures how many of all actual positive samples are correctly classified. The True Negative Rate (TNR) is defined as TN/(FP + TN). The False Positive Rate (FPR) is defined as FP/(FP + TN); it measures how many of all actual negative samples are incorrectly classified. The False Negative Rate (FNR) is defined as FN/(FN + TP).

Heat Map. A heat map is a data visualization technique that helps to detect the correlation between features [16].

Scikit-learn. Scikit-learn is a Python library that provides many supervised and unsupervised learning algorithms [17]. We used it together with NumPy, pandas, and matplotlib for this research work.
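A brief sketch computing the measures of Eqs. (4)-(6) and the ROC-AUC with scikit-learn; the synthetic data and Random Forest settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=42)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print("Accuracy:", accuracy_score(y_te, y_pred))       # Eq. (4)
print("Confusion matrix:\n", confusion_matrix(y_te, y_pred))
print("F1 score:", f1_score(y_te, y_pred))             # harmonic mean of Eqs. (5) and (6)
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```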
3 Results and Discussions This section discusses the results obtained from pre-processing the dataset and then working on the features of the dataset with four different machine learning classifiers.
3.1 Data Pre-processing

In the present work, since the 20 amino acids are necessarily present across the plastic-degrading microbes, the chances of null values existing were nil; this was confirmed using the Python environment. The datasets used were also checked for skewness and the existence of outliers. Details of all the pre-processing are available in Table 2.
3.2 Algorithm Result

During experimentation, the features were evaluated with four classifiers, namely decision tree, Random Forest, SVM, and KNN; the models obtained different accuracies, which are listed in Table 3. From Table 4, it can be concluded that Random Forest and KNN obtained the best accuracy of 99.1% for the amino acid count feature. A diagrammatic representation is shown in Figs. 2 and 3.
4 Conclusion

In the present world, sustaining life without plastic is unimaginable, and the only way out of the resulting pollution is to use microbes that can degrade the used plastic. In this
Table 2 Results of the checks for missing values, skewness, and existence of outliers

Data pre-processing   Remarks
Null values           Going through all 20 amino acids, it is found that there are no null values associated with this dataset
Outliers              By plotting boxplots, we can easily identify the outliers in the dataset. Since the values are amino acid counts in microbial sequences, these outliers cannot be treated, because the count of amino acids varies in each microbe
Skewness              There is no skewness in the dataset
Table 3 Accuracies (%) of different algorithms

Classifier   Amino acid count   Dipeptide   Hydrophobic   Hydrophilic   Acidic   Basic   Amino acid ratio
RF           99.1               12.088      98.43         93.73         89.2     84.86   98.76
SVM          98.7               36.55       96.13         88.7          89.03    59.9    97.5
DT           98.33              12.08       97.16         90.5          89.16    83.7    98.3
KNN          99.1               26.88       98.26         90.73         89.0     83.6    99.2
Table 4 Accuracies (%) of different algorithms for the amino acid count

Classifier   Accuracy
RF           99.1
SVM          98.7
DT           98.33
KNN          99.1
Fig. 2 ROC-AUC curve of random forest for the amino acid count
work, we have attempted to develop a computational prediction tool that can classify whether a microbial protein can degrade plastic. Four machine learning algorithms were used for this purpose, along with several protein features. Among these features, it was found that the amino acid count gave an accuracy of 99.1% with the Random Forest and KNN classifiers. In future work, we plan to develop a web server to host the prediction tool.
Fig. 3 Bar diagram indicating the accuracies of four classifiers for the amino acid count
Acknowledgements This work is a part of the major research project funded by St. Aloysius Management.
References
1. Caruso G (2015) Plastic degrading microorganisms as a tool for bioremediation of plastic contamination in aquatic environments. J Pollut Effects Control 3(3):1–2
2. Raziya Fathima M, Praseetha PK, Rimal Isaac RS (2016) Microbial degradation of plastic waste: a review. Chem Biol Sci 4:231–242
3. Swapnil Kale K, Amit Deshmukh G, Mahendra S, Vikram Patil B (2015) Microbial degradation of plastic: a review. J Biochem Technol 6(2):952–961
4. Vignesh R, Charu R, Manigandan P, Janani R (2016) Screening of plastic degrading microbes from various dumped soil samples. Int Res J Eng Technol 3(4):2493–2498
5. Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. Emerg Artif Intell Appl Comput Eng 160(1):3–24
6. Sen PC, Hajra M, Ghosh M (2020) Supervised classification algorithms in machine learning: a survey and review. In: Emerging technology in modelling and graphics. Springer, Singapore, pp 99–111
7. Hemalatha N, Narayanan NK (2016) A support vector machine approach for LTP using amino acid composition. In: Lecture notes in electrical engineering, vol 396. Springer, pp 13–23
8. Hemalatha N, Rajesh MK, Narayanan NK (2014) A machine learning approach for detecting MAP kinase in the genome of Oryza sativa L. ssp. Indica. IEEE Xplore. https://doi.org/10.1109/CIBCB.2014.6845513
9. Hemalatha N, Brendon VF, Shihab MM, Rajesh MK (2015) Machine learning algorithm for predicting ethylene responsive transcription factor in rice using an ensemble classifier. Procedia Comput Sci 49:128–135
10. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
11. Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22
12. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
13. Mucherino A, Papajorgji PJ, Pardalos PM (2009) k-nearest neighbor classification. In: Data mining in agriculture. Springer optimization and its applications, vol 34. Springer, New York, NY
14. Kohavi R, Provost F (1998) Glossary of terms. In: Editorial for the special issue on applications of machine learning and the knowledge discovery process
15. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36
16. Wilkinson L, Friendly M (2009) The history of the cluster heat map. Am Stat 63(2):179–184
17. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Chapter 23
Computational Yield Prediction of Rice Using KNN Regression
N. Hemalatha, W. Akhil, and Raji Vinod
1 Introduction
Machine learning (ML), a subset of Artificial Intelligence (AI), is a practical method for yield prediction that exploits several features. ML can be used to find patterns and discover correlations in datasets. In ML, models are trained on given datasets so that outcomes can be characterized from past experience. Several features are used to build the predictive models, and the model parameters are estimated from historical data during the training phase [1]. Historical data that was not used for training is then used in the evaluation phase. An ML model can be descriptive or predictive depending on the research problem: a descriptive model is used to gather knowledge from the collected data, while a predictive model is used for future prediction. Building a high-performance predictive model poses many challenges, including choosing the right algorithm and handling a voluminous amount of data. In this paper, we analyze the past yield of rice in the state of Kerala and predict the future yield from certain agricultural features with the help of ML.
N. Hemalatha · W. Akhil St. Aloysius College of Management and Information Technology, Kotekar, Karnataka, India e-mail: [email protected] R. Vinod (B) Techtern Pvt Ltd, Kannur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_23
2 Methodology
2.1 Data Source
The dataset was collected from the Department of Soil Survey and Soil Conservation, Government of Kerala; the Planning and Economic Affairs Department, Government of Kerala; Kerala Agricultural University; and the Meteorological Department, Government of India. It was gathered through focus group discussions, documents, records, oral histories, and surveys. From these data, a study was conducted on the area and yield of rice in all the block panchayats of the state of Kerala, India.
2.2 Features Used
The data has features such as 'State', 'District', 'Blocks', 'Soil Types', 'Organic Carbon (%)', 'Phosphorous (Kg/Ha)', 'Potassium (Kg/Ha)', 'Manganese (ppm)', 'Boron (ppm)', 'Copper (ppm)', 'Iron (ppm)', 'Sulphur (ppm)', 'Zinc (ppm)', 'Soil PH', 'Temperature (°C)', 'Humidity', 'Precipitation (in)', 'Crop', 'Area (Ha)', and 'Yield (Tonnes)'. The data is in xlsx format.
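A minimal pandas sketch of loading this table (the file name is a placeholder, not the authors' actual file):

    import pandas as pd

    df = pd.read_excel("rice_yield.xlsx")  # placeholder name for the xlsx data
    print(df.columns.tolist())             # the 20 features listed above
    print(df.isnull().sum())               # missing-value check used in Sect. 2.4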
2.3 Workflow
The design of the ML model began with a study of current farm management systems. Granular, Agrivi, Trimble, and FarmERP are some of the most commonly used farm management software packages; RML Farmer, Pusa Krishi, AgriApp, Khethi-Badi, Krishi Gyan, Crop, and AgriMarket are other Indian agricultural applications. For this research work, the data was collected from various government sources in Kerala. The features involve microclimatic conditions, soil properties, and the area of the paddy. Data cleaning then treats missing values, skewness, and outliers. Data transformations are applied to the dataset using standardization and normalization techniques, and data reduction is performed using principal component analysis. The data is split into training and testing sets, after which different regression algorithms, namely Linear Regression, K Nearest Neighbor Regression, XGBoost Regression, and Support Vector Regression, are applied for rice yield prediction. Based on the accuracies obtained from the different models, the best algorithm is selected for model deployment (Fig. 1).
Fig. 1 The architecture of the ML model
2.4 Data Preprocessing
Checking Missing Values. One of the major problems in ML is the presence of missing values in the data. Missing values must be detected and cannot simply be ignored; they need to be treated. Different deletion and imputation techniques are available for treating missing values in a dataset.
Checking Skewness. Skewness measures how far the probability distribution of a random variable departs from the normal distribution. Skewness does not detect outliers, but it can indicate the direction of outliers in the dataset. The data can be positively skewed, negatively skewed, or normally distributed, and box plots and distplots can be used to find the skewness in the data. If Q1 is the lower quartile, Q2 is the median, and Q3 is the upper quartile, then positive skewness is indicated by the distance between the quartiles:

Q3 − Q2 > Q2 − Q1    (1)
Negative skewness can likewise be determined from the quartile distances; it exists when

Q3 − Q2 < Q2 − Q1    (2)
Similarly, the data is symmetrically distributed when

Q3 − Q2 = Q2 − Q1    (3)
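A minimal pandas sketch of this quartile rule (an illustration, not the authors' code):

    import pandas as pd

    def skew_direction(s: pd.Series) -> str:
        # Compare the quartile distances of Eqs. (1)-(3).
        q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])
        if q3 - q2 > q2 - q1:
            return "positively skewed"
        if q3 - q2 < q2 - q1:
            return "negatively skewed"
        return "symmetric"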
Different treatments are available for skewness, including the log transformation and the Box-Cox transformation. In the Box-Cox transformation, λ varies from −5 to 5; all λ values are considered and the optimal value of λ is selected. The transformation of y can be represented as

y(λ) = (y^λ − 1)/λ,  if λ ≠ 0;    y(λ) = log y,  if λ = 0    (4)
This form can be used for positive data only. The Box-Cox transformation can also be applied to data containing negative values through the two-parameter form

y(λ) = ((y + λ2)^λ1 − 1)/λ1,  if λ1 ≠ 0;    y(λ) = log(y + λ2),  if λ1 = 0    (5)
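In practice the optimal λ can be found numerically; a minimal SciPy sketch, assuming strictly positive data as in Eq. (4):

    import numpy as np
    from scipy import stats

    values = np.array([120.5, 98.2, 143.7, 88.9, 132.1])  # placeholder positive data
    transformed, best_lambda = stats.boxcox(values)       # lambda chosen by maximum likelihood
    print(best_lambda)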
Checking Outliers. Outliers are observations that differ markedly from the other observations in the data. With the help of boxplots we can easily detect the outliers in a dataset. Outliers are usually characterized through the InterQuartile Range (IQR), which shows how the data is spread around the median:

IQR = Q3 − Q1    (6)
where Q1 is the first quartile and Q3 the third quartile. The interquartile rule can be used to detect outliers: an upper and a lower bound are computed from the data, and observations that do not lie between them are considered outliers.

Lower Bound = Q1 − 1.5 × IQR    (7)

Upper Bound = Q3 + 1.5 × IQR    (8)
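A minimal pandas sketch of this interquartile rule, combined with the Winsorizing treatment described next (the column handling is an assumption, not the authors' code):

    import pandas as pd

    def winsorize_iqr(s: pd.Series) -> pd.Series:
        # Eqs. (6)-(8): values outside the bounds are capped at the bounds.
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        return s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)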
Different techniques are available for outlier treatment, such as dropping the outliers or Winsorizing. Here we used the Winsorizing method, implemented as a Python function (cf. the sketch above), to treat the outliers in the dataset: values below the lower bound or above the upper bound are replaced by the respective bound.
Data Transformation. For data transformation we used min-max normalization. Since the features are measured on different scales, normalization is needed in this research work. In this technique, the minimum value is converted into zero and the maximum value is converted into one.
All other values in the dataset are converted into a decimal between zero and one:

x_scaled = (x − x_minimum)/(x_maximum − x_minimum)    (9)
where x is an observation in the data, x_minimum is the minimum value among the observations, and x_maximum is the maximum value among the observations.
Data Reduction. The dataset has 20 features, so data reduction techniques can be applied. Here we applied principal component analysis as a dimensionality reduction technique: it converts a large set of features into a smaller set that still contains most of the information in the data.
Label Encoding. The dataset contains four non-numerical variables: State, Block, District, and Soil Type. In order to apply the machine learning algorithms, these non-numerical features must be converted into numerical form. Each categorical value is converted to a numeric value between 0 and the number of classes minus 1. LabelEncoder is the scikit-learn utility used in Python, which encodes the labels with values between 0 and n_classes − 1, where n_classes is the number of distinct class labels; if a label repeats, it is assigned the same value as before.
Data Splitting. The first step for the machine learning algorithms is to divide the dataset into train and test sets. For this research work, the data was split in the ratio of 80:20. After the split, different supervised algorithms can be applied to predict the rice yield.
Algorithms
Linear Regression. This algorithm represents the relationship between two variables by a linear equation [2]. Of the two variables, one is dependent and the other independent. The correlation coefficient measures the strength of the variable relationship and ranges between −1 and +1; it also shows the strength of the association of the observed data for the two variables. A linear regression line equation is written in the form

y = ax + b    (10)
Decision Tree Regression. In decision tree regression, multiple regression trees are created from the dataset, and these trees predict the future yield. A decision tree works by asking a series of questions of the input data to arrive at an answer [3], refining the possible values until the model is confident enough to make a single prediction. Decision tree regression normally uses the mean squared error (MSE) to split a node into two or more sub-nodes [4]. Here we used decision tree regression to predict the rice yield.
Random Forest Regression. Random Forest is one of the most effective algorithms in machine learning studies [5]; here we used it for regression [6]. One of its advantages is that it predicts outcomes
with higher accuracy most of the time, even on datasets without proper parameter tuning. It is therefore comparatively simple and very popular. Random Forest builds a forest in a random manner: multiple decision trees are created and merged together in order to produce more accurate predictions. As the number of decision trees increases, the stability of the predictions also increases [7]. The random forest itself works as an ensemble algorithm and hence gives good accuracy on most problems [8].
KNN Regression. KNN, expanded as K Nearest Neighbors, is a simple algorithm that predicts based on a similarity measure. A simple form of KNN regression takes the average of the numerical targets of the K nearest neighbors; another method uses an inverse-distance-weighted average of the K nearest neighbors. The KNN regressor uses the same distance functions as the KNN classifier, and the number of neighbors can be set. In this work, a Python environment was used to implement this algorithm [9].
XGBoost Regression. Among supervised regression algorithms, the XGBoost algorithm plays an important role [10]. Ensemble learning involves training and combining individual models to obtain a single prediction, and XGBoost is one such ensemble algorithm [11]. In XGBoost, the trees are built successively such that each succeeding tree aims to reduce the errors of the previous tree; each tree learns from its predecessors and updates the residual errors, so the next tree in the sequence learns from an updated version of the residuals.
Support Vector Regression (SVR). Support vector regression cannot be ignored in regression analysis, because it fits this type of regression problem well [12]. SVR is a supervised machine learning technique. A support vector machine (SVM) can be used for both classification and regression, but here we use support vector regression. There are some differences between the SVM and SVR algorithms: in SVR a marginal plane is created in order to minimize the error, and an approximation is set for the margin of tolerance. The main idea remains to reduce the error by finding the hyperplane that maximizes the margin.
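Putting the preprocessing and modeling steps described above together, a minimal end-to-end sketch might look as follows (df is the DataFrame of Sect. 2.1; the column names follow Sect. 2.2 and are assumptions about the actual file):

    from sklearn.preprocessing import LabelEncoder, MinMaxScaler
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    # Encode the four non-numerical variables (Sect. 2.4, Label Encoding).
    for col in ["State", "District", "Blocks", "Soil Types"]:
        df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(columns=["Yield (Tonnes)", "Crop"])  # 'Crop' is constant (rice)
    y = df["Yield (Tonnes)"]

    X_scaled = MinMaxScaler().fit_transform(X)       # Eq. (9)
    X_reduced = PCA(n_components=4).fit_transform(X_scaled)

    # 80:20 split, then the best-performing model of Sect. 3.2.
    X_train, X_test, y_train, y_test = train_test_split(
        X_reduced, y, test_size=0.2, random_state=0)
    knn = KNeighborsRegressor().fit(X_train, y_train)
    print("R^2 on the test split:", knn.score(X_test, y_test))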
3 Results and Discussions
3.1 Pre-Processing Results
There are no missing values associated with the dataset. Skewness existed in the rice dataset, so the data was treated for skewness using the Box-Cox transformation [13]; the skewness after the transformation is given in Table 1.
Table 1 Skewness after the Box-Cox transformation

Feature | Skewness
Organic Carbon (%) | 0.09932
Phosphorous (Kg/Ha) | −0.396601
Potassium (Kg/Ha) | −0.013161
Manganese (ppm) | 0.010392
Boron (ppm) | 0.124288
Copper (ppm) | 0.000970
Iron (ppm) | 0.034329
Sulphur (ppm) | 0.001081
Zinc (ppm) | 0.002723
Soil PH | −0.098701
Temperature (°C) | −0.021672
Humidity | −0.003643
Precipitation (in) | −0.342416
Area (Ha) | −0.049400
Yield (Tonnes) | −0.022793
The outliers were checked with the help of boxplots, and outliers do exist in this dataset. They were treated using the Winsorizing method [14]; the result is shown in Fig. 2. Outliers were present in four variables, and the treatment was successful. In order to normalize the data, min-max normalization was used; the result is given in Table 2. Data reduction was performed using principal component analysis [15], with the following result: explained variation per principal component: [0.23492603, 0.16536821, 0.11506274, 0.10910176]. Principal component 1 holds 23.49% of the information, principal component 2 holds 16.53%, principal component 3 holds 11.50%, and principal component 4 holds 10.91%. From Fig. 3, we can see that laterite soil is present in most places in Kerala and most crops can be planted in this type of soil; its contents support not only rice but also the other crops of Kerala.
Fig. 2 Boxplot after Winsorizing
Table 2 Normalized data (first five records of each feature, grouped as extracted)

Organic carbon (%) | 0.05453 | 0.6701 | 0.2009 | 0.0030 | 0.0003
Phosphorus (Kg/Ha) | 0.0007 | 0.0007 | 0.0007 | 0.0001 | 0.0218
Potassium (Kg/Ha) | 0.0211 | 0.0177 | 0.0060 | 0.0021 | 0.0020
Manganese (ppm) | 0.0018 | 0.0040 | 0.0042 | 0.0044 | 0.0457
Boron (ppm) | 0.0002 | 0.0010 | 0.0010 | 0.0003 | 0.0002
Copper (ppm) | 0.001 | 0.001 | 0.0001 | 0.001 | 0.0010
Iron (ppm) | 0.0054 | 0.004 | 0.004 | 0.004 | 0.0334
Sulphur (ppm) | 0.008 | 0.0118 | 0.0121 | 0.0140 | 0.0038
Zinc (ppm) | 0.001 | 0.001 | 0.001 | 0.001 | 0.003
Soil PH | 0.827 | 0.002 | 0.002 | 0.001 | 0.007
Fig. 3 Pie chart-soil types
The average humidity in Kerala is 83.0% when compared across all the districts of Kerala (Fig. 4). A precipitation of 2500 mm is the most frequently occurring precipitation in the entire state (Fig. 5). It is found that 26.2 °C is the most frequently occurring temperature in Kerala, which is suitable for rice (Fig. 6).
Fig. 4 Humidity
Fig. 5 Precipitation
It is found that a high yield of rice is obtained in hill soil and alluvium soil (Fig. 7). From Fig. 8, it is clear that the yield is highest in Palakkad and Alappuzha. The crop yield of rice is high in hill soil and alluvium soil, while the area where rice is cultivated on laterite soil is very small. Alluvium soil is mostly present in the Alappuzha district; nine districts cultivate rice on laterite soil, but the yield there is low. From the graph, it is found that the presence of hill soil has an impact on rice production. In Kozhikode and Idukki, the yield of rice is very low compared with the other twelve districts. From Table 3, an important observation is that Palakkad obtains a high yield of rice compared to the other districts while at the same time having very little area under rice cultivation; the soil and climatic conditions of Palakkad help to produce a high yield of rice even on a small farming area. Comparing Alappuzha with Palakkad, Alappuzha has more farmland under rice but gives a lower yield than Palakkad. Idukki has very little area under rice and also obtains a very low yield. A total of 545,192.072 tonnes of rice was obtained from 152 block panchayats in Kerala in the year 2018–2019, from 114,715.4 ha of land.
Fig. 6 Temperature
Fig. 7 Bar chart of yield versus area (Ha)
So, we can say that the state of Kerala plays a significant role in rice production in India. It is also found that alluvium soil has a very good impact on the yield of rice in the district of Alappuzha. From this past yield data, we can predict the future yield of the rice crop based on the soil, the climatic factors, and the farm area using the machine learning model.
Fig. 8 Bar chart of district yield for rice
Table 3 District versus area and yield of rice

District | Area (Ha) | Yield (Tonnes)
Kasargod | 1905.27 | 4401.44
Kannur | 4193.41 | 9077.42
Kozhikode | 2080.34 | 3336.56
Wayanad | 6945.58 | 20,127.99
Malappuram | 7536.87 | 24,776.74
Palakkad | 2685.11 | 209,505.17
Thrissur | 19,154.04 | 59,972.33
Ernakulam | 4709.56 | 10,501.83
Idukki | 653.31 | 1510.25
Kottayam | 20,953.64 | 57,038.68
Alappuzha | 37,241.98 | 125,115.538
Pathanamthitta | 2848.25 | 10,478.87
Kollam | 1928.72 | 4433.05
Thiruvananthapuram | 1879.32 | 4916.22
3.2 Algorithm Results
Table 4 shows the results of the different machine learning regression algorithms used in this research. Using these algorithms, the future yield of a crop can be predicted; in this paper we predicted the future yield of rice. From Table 4, the different models predicted the future yield of rice with good accuracy: the rice dataset gave the highest accuracy of 98.77% for KNN Regression and 97.68% for Support Vector Regression. Out of these six models, KNN was selected as the best model for deployment.
Table 4 Results of the algorithms

Model | Accuracy (rice) in %
Linear Regression | 95.45
Decision Tree Regression | 94.94
Random Forest Regression | 94.70
KNN Regression | 98.77
XGBoost Regression | 94.78
Support Vector Regression | 97.68
4 Conclusion
Using machine learning techniques, we can analyze past data and draw a trend from it. Based on the past data obtained for the crops, the present climatic conditions, and the soil properties, the future yield of a crop can be predicted with the help of machine learning. This will help farmers take appropriate decisions and precautions on their farmland. Similarly, other crops in different regions can be analyzed and predicted by collecting past and present data on those crops. In future studies, we plan to conduct similar studies and develop regression models for other plants.
Acknowledgements The authors extend their appreciation to the Deputyship of the RESEARCH AND INNOVATION wing of TECHTERN Pvt. Ltd. through SMART-AGRO research for providing all support for this research work under project number TTRD-DS-05-2021. This is also an extension of a post-doctoral research program at Kannur University.
References
1. Madhavi RP, Niranjan G, Hedge SM, Rajath Navada PR (2017) Survey paper on agriculture yield prediction tool using machine learning. Int J Adv Res Comput Sci Manage Stud 5(11):36–39
2. Freedman DA (2009) Statistical models: theory and practice. Cambridge University Press, p 26
3. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
4. Decision trees for regression. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston
5. Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22
6. Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition. IEEE, pp 278–282
7. Priya P, Muthaiah U, Balamurugan M (2018) Predicting yield of the crop using machine learning algorithm. Int J Eng Sci Res Technol 7(4):1–7
8. Kumar A, Kumar N, Vats V (2018) Efficient crop yield prediction using machine learning algorithms. Int Res J Eng Technol (IRJET) 5(6):3151–3159
9. Song Y, Liang J, Jing L, Zhao X (2017) An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 251:26–34
10. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 785–794
11. Zhou ZH (2009) Ensemble learning. In: Li SZ, Jain A (eds) Encyclopedia of biometrics. Springer, Boston, MA
12. Smola A, Scholkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222
13. Daimon T (2011) Box–Cox transformation. In: Lovric M (ed) International encyclopedia of statistical science. Springer, Berlin, Heidelberg
14. Reifman A, Keyton K (2010) Winsorize. In: Salkind NJ (ed) Encyclopedia of research design. Sage, Thousand Oaks, CA, pp 1636–1638
15. Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci 2(11):559–572
Chapter 24
Exploiting Deep Learning for Overlapping Chromosome Segmentation
Alexander Nikolaou and George A. Papakostas
1 Introduction
Karyotyping [1] is the process of pairing and ordering all the chromosomes of an organism. The entire process is performed by hand by cytogeneticists in order to export the karyotype, a representation of a genome, similar to a snapshot that contains the characteristic structural features of each chromosome. A deep analysis of karyotypes makes it possible to distinguish chromosomal mutations such as deletions, duplications, translocations, or inversions. The molecular karyotype has dramatically changed the approach to diagnosis and allows critical information to be extracted from chromosome changes. A chromosomal abnormality [2] is a condition in an organism or a cell where the structure of any chromosome or the number of chromosomes differs from the normal karyotype. Every human body has trillions of structural units known as cells. Inside a cell there are thousands of genes, consisting of DNA and determining how the organism operates. The genes are organized next to each other in structures called chromosomes. Each cell has 46 chromosomes, or 23 pairs; 22 pairs are common to men and women, while the 23rd pair determines the sex of the individual. The chromosomes are examined using a microscope that captures microscopic images, as shown in Fig. 1, in which the chromosomes are isolated and non-touching. Clinical cytogeneticists analyze human karyotypes in order to detect gross genetic changes and how they are associated with aneuploid conditions, such as trisomy 21 (Down syndrome). To perform the karyotyping analysis, the cytogeneticist inspects many images and selects those in which the chromosomes are in the metaphase stage and
A. Nikolaou · G. A. Papakostas (B) MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece e-mail: [email protected] A. Nikolaou e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_24
Fig. 1 A microscope image with the ideal topology of non-touching chromosomes
isolated, as shown in Fig. 1. Most of the candidate images go on to posterior analysis, where the cytogeneticists analyze each image by counting and identifying each chromosome. Karyotyping is usually performed manually in many hospitals; being a laborious procedure, it is very time-consuming. Geneticists face many dilemmas [3] during the export of a karyotype, as shown in Fig. 2: among them are the detection of the chromosomes, the separation of chromosomes that are touching or overlapping, and the detection of nucleotides or other unknown objects in order to remove them from the image. It is difficult to determine the fragmentation of each chromosome in the microscope image, and for that reason the analysis of a microscope image can take up to a few weeks, remaining expensive in time and cost. The gene-related sciences require solutions for automation of the karyotyping process, that is, the isolation of the abstract geometric shapes of the chromosomes. The problems during automatic metaphase analysis [4] are noticeable in the images, which may contain small noise objects of low contrast or dried border drops of dye. Because of this unusual type of noise, it is more important to classify each pixel than to detect objects as bounding boxes. Segmentation techniques are based on pixel classification and can yield arbitrary sets of pixels as "segments", which makes them more suitable for the overlapping scenarios and for noisy microscopic images. This paper investigates the impact of the first manually annotated dataset for semantic segmentation of such images. For the overlapping scenarios, where at least two chromosomes have overlapping regions, three labels were used to describe the foreground chromosome, the background chromosome, and their common area: the label "Chromosome A" represents the foreground, the
Fig. 2 A karyotype exported from a microscope image
label "Chromosome B" represents the background, and the label "Overlap" represents their common area. The 62 microscope images from the BioImLab segmentation dataset [5] were used to create manual annotations of the "background", "nucleotide", "chromosome", "Chromosome A", "Chromosome B", "Overlap", and "Unknown" entities in the microscope images. One of the contributions is a Python package that is able to draw the annotations and to resize and augment the images. Finally, transfer learning was applied using pretrained Convolutional Neural Networks (CNNs) to investigate the performance of the proposed dataset. Ten microscopic images were purposely selected, from which ten synthetic Folds were generated using the holdout cross-validation method, which is a novel contribution. As for the training experiments, five semantic segmentation backbones were tested in order to compare the performance of each backbone combination with the U-Net model in the segmentation of the microscope entities. The weights of the best-performing combination for each backbone are also part of the contribution.
2 Related Work
There are many methodologies for approaching the segmentation of chromosomes in microscope images, either by detecting the regions of chromosomes in the image or by classifying each chromosome region into one of the 24 classes it may belong to. During cell evolution chromosomes are moving, and there may be cases where two or more of them bump into each other so that one ends up on top of the other. A few papers work on overlapping areas, but none of them face the overlapping scenarios together with nucleotides or other objects at the same time. The separation of the overlapping area is a challenging problem for cytogeneticists [6], since the chromosomes are randomly distributed, resulting in an extra delay in exporting the karyotype. For the karyotyping process there are methodologies for the detection or the classification of the chromosomes; both approaches are able to support the geneticists in continuing the clinical study of chromosomes, which is to identify suspicious anomalies or characteristics of a syndrome.
2.1 Chromosomes Detection
The basic goal is the detection of the chromosomes present in the microscope image, and the related work in chromosome detection has increased in the last 5 years, presenting new methodologies and new approaches to detect either the bounding boxes or the centromere of each chromosome. Several works did not mention the source of the images they used; nevertheless, the methodologies applied are shown in Table 1. The most used dataset is M-FISH [7–9]. In [7] the gradient magnitude of each channel is computed and a grayscale reconstruction is produced for each channel; watershed transforms are used for feature extraction, processed by FCM. Another approach [8] presents a multichannel gradient magnitude region-based classifier for segmentation, which classifies each pixel. The most attractive approach [9] proposes a hybrid algorithm that uses gradient vector flow active contours, discrete curve evolution based on skeleton pruning, and morphological thinning in order to provide a robust and accurate detection of the centerline of the chromosome. One outstanding approach [10] used the UW G-Band dataset, presenting the HAFCO technique, which used geometric features from skeletonized chromosomes to determine the number of chromosomes. Another important dataset is CRCN-NE [11], on which two techniques were applied, adaptive thresholding and fuzzy thresholding, combined together to produce the segmented image. An interesting work [12] used simple image processing, the Gray Level Mask, while in [13] statistical moments were used in order to accurately determine the centromere position.

Table 1 The related works for chromosome detection

References | Dataset | Methodology | Year
[7] | M-FISH | Watershed transforms/FCM | 2017
[8] | M-FISH | Region-based classifier | 2009
[9] | M-FISH | Gradient Vector Flow active contours | 2010
[10] | UW G-Band | HAFCO | 2018
[11] | CRCN-NE | Fuzzy thresholding | 2018
[12] | Not defined | Image processing | 2016
[13] | Not defined | Statistical moments | 2012

2.2 Chromosomes Classification
The next goal is the classification of chromosomes, and the related work is moving in the proper direction, with new methodologies and new models being proposed, as shown in Table 2. The most popular dataset is M-FISH [14–19], and most of the researchers used the DAPI image, which carries information from all five channels, while only one work [14] fed the grayscale reconstruction of the multichannel gradient magnitude to an SVM classifier, reaching 90% accuracy. An improved segmentation performance [15] was achieved by using the Expectation Maximization (EM) algorithm, with a mean efficiency of 90.6%, followed by KNN to classify the chromosomes with an accuracy of 91.68%. An interesting feature extraction technique was presented in [16], producing five features for each pixel of the DAPI image and using K-means

Table 2 The related works for chromosome classification
2.2 Chromosomes Classification The next goal is the classification of chromosomes and the related work is in the proper direction, while new methodologies and new models are being proposed as shown in Table 2. The most popular dataset is the M-FISH [14–19] and most of the researchers used the DAPI image, which carries information from all five channels, while only one research [14] used the grayscale reconstruction of the multichannel gradient magnitude to SVM classifier reaching 90% accuracy. An improved performance [15] for segmentation was achieved by using the Expectation Maximization (EM) algorithm with a mean efficiency of 90.6% followed by KNN to classify the chromosomes with an accuracy of 91.68%. An interesting feature extraction technique was presented in [16] by producing five features for each pixel of the DAPI image and using K-means Table 2 The related works for chromosome classification References
Dataset
Methodology
Year
[14]
M-FISH
Multichannel Gradient Magnitude/SVM
2009
[15]
M-FISH
Expectation Maximization/KNN
2019
[16]
M-FISH
K-Means
2010
[17]
M-FISH
Adaptive Fuzzy C-Means (AFCM)
2011
[18]
M-FISH
Fuzzy C-Means
2017
[19]
M-FISH
Active Contours
2017
[20]
UW G-Banded
Gray Level Co-occurrence Matrix (GLCM)
2016
[21]
UW G-Banded
Noise Removal
2018
[22]
Hospital
SMAC crowdsourcing
2017
[23]
Not defined
GLCM/Fuzzy C-Means/SVM
2015
[24]
Not defined
Active Contours
2016
to cluster the chromosome pixels into the 24 chromosome classes, with 61% accuracy; another experiment revealed that using normalized DAPI images raises the accuracy to 72%. There is an interesting approach [17] where 88% accuracy was achieved by combining segmentation and classification using an Adaptive Fuzzy C-Means (AFCM) clustering-based algorithm. In another interesting approach [18], different techniques combined with Fuzzy C-Means were compared in order to reduce the false positives from 9.7 to 2.63%, while the accuracy dropped from 92 to 87.2%. As for software contributions, [19] proposed an algorithm to straighten bent chromosomes; the software uses active contours to segment the chromosomes in a variety of straight, bent, touching, and overlapping cases and can be used to uncover the structural abnormalities present in the chromosomes. The UW G-Banded dataset is useful for describing features of each chromosome, although some works [20, 21] mentioned it without specifying exactly which dataset was used. One of them [20] presents a methodology where morphological features were extracted using the Gray Level Co-occurrence Matrix (GLCM) based on the Denver grouping; the 24 chromosome pairs were gathered into 7 Denver classes, and an ANN was used to classify the chromosomes. Another work [21] proposed an end-to-end segmentation with noise removal and the rejection of unwanted objects that achieved 97.8% accuracy. It is important to mention that only one dataset was collected from a hospital [22], manually segmented by the authors, with a subset validated by doctors to ensure that the label of each segmented chromosome image is identifiable; in this work two techniques were deployed, SMAC crowdsourcing for medial axis annotations and SPV. As for the model, a Siamese Network was combined either with an MLP or with a KNN, achieving 85.2% and 85.6% accuracy, respectively. A very interesting approach [23] extracted features based on the GLCM with the use of Fuzzy C-Means, which coordinates an SVM to classify chromosomes with 82.5% accuracy. As for software contributions, a tool called the "segmentation tool" was presented in [24], which uses G-band images to detect and separate the chromosomes. One of its main goals is to extract the chromosomes and gather them into a karyotype image using custom variants of parametric active contours; the tool obtains the interesting (concave) points on the image contour and constructs proper hypotheses for segmentation and separation.
2.3 Chromosomes Overlapping
As for the chromosome overlap area, the related work is limited due to several factors, most of them related to the complexity of the overlapping topology of the involved chromosomes, as shown in Table 3. Promising research [25] on the overlapping area has been conducted using a dataset from the Birth Registry of India with G-band chromosomes, performing segmentation by obtaining concave and convex points along with
Table 3 The related works for chromosome overlapping

References | Dataset | Methodology | Year
[25] | Not defined | Image Processing | 2012
[26] | Kaggle | Multilabel Segmentation/U-Net | 2017
[27] | Kaggle | Test Time Augmentation (TTA)/U-Net | 2019
[28] | Kaggle | Dilated Convolutions/U-Net-FIGI | 2021
[29] | Kaggle | Adversarial Multiscale Feature Learning Framework/cGAN | 2020
[30] | Not defined | Compact Seg-U-Net | 2021
a polygonal approximation. The methodology achieved the separation of the overlapping region from the touching chromosomes with 99.68% IoU. Another publicly available dataset is provided on Kaggle at a resolution of 88 × 88 pixels and was used in [26–29]. Among these, an interesting study [26] examined multilabel segmentation to train a convolutional neural network based on the U-Net architecture; the results were 94.7% IoU for the overlap, 88.2% IoU for the first chromosome, and 94.4% IoU for the other chromosome. Another study [27] used Test Time Augmentation (TTA) with the U-Net architecture, reaching an IoU score of up to 99.68%; when TTA was applied to the model of [26], it improved the IoU performance by 2%. A very promising approach [28] uses three improved dilated convolutions in chromosome image segmentation models based on U-Net. The best model on unseen images was U-Net-FIGI, which reached a 93.12% IoU score for overlapping chromosomes; the authors also suggested replacing the standard convolutions in any network for a stable performance improvement. It is quite interesting that the adversarial multiscale feature learning framework of [29] reached a 97.5% IoU score by improving the accuracy and adaptability of overlapping chromosome segmentation. The main part of this framework was the Conditional Generative Adversarial Network (cGAN), whose role was to generate fake segmented images; the training stability of the cGAN was enhanced by applying the least-square GAN objective loss rather than the original GAN loss, and finally Lovasz-Softmax was applied to overcome some optimization problems of the network. One of the publicly available datasets is BioImLab [5], for which an automatic chromosome identification method was proposed based on a pre-processing stage in which a complete hypothetical tree for each topology was generated. The results are interesting: 94% of the chromosomes were correctly segmented, solving 90% of the overlaps and 90% of the touching chromosome cases. The most recent work in the field [30] proposed a deep learning neural network architecture called Compact Seg-U-Net, a hybrid between SegNet and U-Net. The contribution includes the dataset, the pre-processing methodology, the network, and the evaluation metrics, reaching a 93.63% IoU score for the underlying chromosome, 98.11% for the top chromosome, and 88.21% for the overlapping area.
3 Proposed Methodology
This section describes the main stages of this study, which are (a) manual annotation of the BioImLab segmentation dataset, (b) creation of the 10 Holdout Folds, (c) transfer learning experiments using five different pretrained backbones in a U-Net-shaped model, and (d) the results of the experiments. The detailed pipeline of the proposed methodology is shown in Fig. 3.
3.1 Datasets Creation
Data availability in medical image microscopy is limited because each image presents the genetic characteristics of a human, so it is difficult to make such data available for public use, and when it happens the organization should provide the images anonymously. The most common dataset is M-FISH [31], created by Ph.D. student Dr. Wade Schwartzkopf under the guidance of Dr. Kenneth Castleman at Advanced Digital Imaging Research, LLC, Friendswood, TX, acknowledging NIH funding and the support of IRIS International. The Biological Dosimetry Laboratory (CRCN-NE) collaborated with the Computing Department (DC) of the Rural Federal University of Pernambuco (UFRPE) to create the CRCN-NE dataset [32], which contains 74 images of colored metaphases acquired with a Leica microscope at original size, with adaptive thresholding used to create the labels for segmentation. Another common dataset is from the Wisconsin State Laboratory of Hygiene of the University of Wisconsin System, the UW G-Band dataset [33]; it was made for teaching purposes using their own laboratory, and its availability is limited. One of the most cited datasets for overlapping scenarios is the synthetic dataset available on Kaggle [34], which provides synthetically annotated images at a resolution of 93 × 94 pixels. The most recently used dataset on the topic [35] was created with a slightly modified version of the methodology used for the Kaggle dataset [34], giving more realistic images at a resolution of 96 × 96 pixels. Another organization that provides datasets for research and educational purposes is BioImLab, with two datasets related to chromosomes, one for classification [36] and one for segmentation [5]. Every image is collected from a medical instrument called an optical microscope. The BioImLab classification dataset contains 5474 single-chromosome images (119 folders of 46 images each), each rotated according to the International System for Cytogenetic Nomenclature (ISCN). The BioImLab segmentation dataset contains 162 prometaphase images, as shown in Fig. 4, and 117 manual karyograms.
Fig. 3 The pipeline of the proposed methodology, and the creation of the 10 synthetic Folds
Fig. 4 The microscope image “original 9a” from BioImLab segmentation dataset
of Hygiene and data availability is limited. One of the most cited datasets for overlapping scenarios is the synthetic dataset available in Kaggle [34], which provides synthetically annotated images at a resolution of 93 × 94 pixels. The most recently used dataset in the topic [35] was created by using a slightly modified methodology used for the creation of the Kaggle [34] dataset having a more realistic image at resolution 96 × 96 pixels. Another organization that provides a dataset that can be used for research and educational purposes is the BioImLab providing two datasets related to chromosomes, one for classification [36] and one for Segmentation [5]. Every image is collected from a medical instrument called an “optical microscope”. The BioImLab Classification dataset contains 5.474 single chromosome images (119 folders of 46 images in each folder) and each image is rotated according to the international system for cytogenetic nomenclature (ISCN). The BioImLab segmentation dataset contains 162 prometaphase images as shown in Fig. 3 and 117 manual karyograms (Fig. 4). Manual Annotations The cytogeneticists typically use specific software to label the areas and complete the karyotyping process in order to complete a diagnostic report. In the case of BioImLab there are only gray images that were manually annotated for cases where all the entities could be humanely identifiable. In clinical medicine, every annotation should be verified by several cytogeneticists for the correctness of the science. For the purpose of this research, as shown in Fig. 5 the annotation was conducted using a tool called “VGG Image Annotator” [37] which is a web application (website) that allows to mark areas of interest on each image and exports a single file in “json” format. Fig. 5 The process for the manual annotations
318
A. Nikolaou and G. A. Papakostas
Table 4 The classes used for the manual annotations and their color representations

Class        | Color  | R   | G   | B   | Label
Background   | Black  | 0   | 0   | 0   | 0
Chromosome   | Yellow | 255 | 255 | 0   | 1
Chromosome A | Blue   | 0   | 0   | 255 | 2
Chromosome B | Green  | 0   | 255 | 0   | 3
Overlap      | Red    | 255 | 0   | 0   | 4
Nucleotide   | Pink   | 255 | 120 | 255 | 5
Unknown      | Gray   | 100 | 100 | 100 | 6
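A minimal sketch of how such a mapping can be applied to VIA polygon annotations (the json schema and the "class" region attribute are assumptions about the export format, not a description of the authors' package):

    import numpy as np
    import cv2

    CLASSES = {  # class name: ((R, G, B), sparse label), following Table 4
        "Background": ((0, 0, 0), 0), "Chromosome": ((255, 255, 0), 1),
        "Chromosome A": ((0, 0, 255), 2), "Chromosome B": ((0, 255, 0), 3),
        "Overlap": ((255, 0, 0), 4), "Nucleotide": ((255, 120, 255), 5),
        "Unknown": ((100, 100, 100), 6),
    }

    def draw_annotations(record, height=576, width=768):
        color = np.zeros((height, width, 3), np.uint8)
        labels = np.zeros((height, width), np.uint8)
        for region in record["regions"]:
            shape = region["shape_attributes"]
            pts = np.stack([shape["all_points_x"],
                            shape["all_points_y"]], axis=1).astype(np.int32)
            rgb, lab = CLASSES[region["region_attributes"]["class"]]
            cv2.fillPoly(color, [pts], rgb)   # color representation (cf. Fig. 6)
            cv2.fillPoly(labels, [pts], lab)  # sparse label mask
        return color, labels
    # Usage: load the VIA export with json.load and call draw_annotations
    # on each per-image record.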
In this paper, 62 microscope images were selected from the BioImLab segmentation dataset at a resolution of 768 × 576 pixels, rather than the 88 × 88 pixels of most of the related work in this field. For the investigation of the major problems that cytogeneticists face, ten targeted folds of microscope images were selected and used to produce ten synthetic datasets. For the representation of the annotations, one color was used for each class to produce the color image, as shown in Table 4, while a sparse representation was used for the labels. The ten targeted images were purposely selected not only to contain the overlapping scenario, as in the related work, but so that every microscopic image contains at least one of each of the following entities:
• Single chromosomes
• Overlapping chromosomes
• Nucleotides
• Unknown objects
Draw Annotations
To convert the annotations from the "VGG Image Annotator" into colors and sparse labels, a Python package was developed. This package loads the annotation information from the "json" file and draws all annotations on a new image, so that the color image and the labels are produced at the original resolution of 768 × 576 pixels. The logic of this package is to iterate over all annotations and paint the inner pixels with the corresponding class color, as shown in Table 4 (cf. the sketch after the table). As a result, the color representation of the annotated image is created, as shown in Fig. 6; the sparse representation of the labels is not directly visualizable.
Resize Images
The resolution of the images has to match the models' input layer. The manual annotations have the same resolution as the BioImLab microscope images, 768 × 576 pixels, whereas the pretrained Keras models have a 224 × 224 pixel input layer; in order to apply transfer learning, all images were resized to 224 × 224 pixels. This caused serious deformation compared to
Fig. 6 The color representation of the annotated image “original 9a” from BioImLab segmentation dataset
the original microscope image, and as a result the density became sparser due to the incompatible aspect ratio of 224 × 224 pixels compared with that of 768 × 576 pixels.
Augment Images
All deep learning models need a huge amount of data to train on, but it is difficult to annotate an enormous number of images in order to start generalizing the microscopic scenario. To overcome this, the annotated images were augmented at a topology close to reality. There are two approaches to augmentation: the first is to augment the images on the fly at every epoch [27], while the second is to precompute all the augmentations and export them as a concrete dataset. The first approach is time-consuming due to the extra preprocessing before each epoch; in this paper the second approach was applied, in order to do preliminary research using the same concrete images. For each annotated microscopic image, 200 images were produced, resulting in a total of 12,462 synthetic microscopic images at a topology close to reality. To achieve more realistic augmentations, the following transformations were applied on both the X and Y axes (a minimal sketch follows at the end of this subsection):
• Rescale with a random choice between 80 and 110%
• Translate with a random choice between −0.2 and 0.2
• Rotate with a random choice between −20 and 20°
Export Holdout Datasets
The holdout method is the simplest kind of cross-validation: the dataset is separated into a training set and a testing set. The model is fitted using the training set only, and at the end of each epoch the model predicts the labels of the testing set, which it has never seen before. Each Fold has, for Test, 1 selected original image and its 49 augmentations (a total of 50). The
Fig. 7 The exported 10 Holdout Folds and the content of 1st Fold in detail about Test, Validation, and Training images
remaining 61 original images with their augmentations (total of 12,261) were split into Train (total of 6161) and Validation (total of 6100) as shown in Fig. 7.
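A minimal sketch of these transformations with the Albumentations package (one possible implementation; the paper does not name the library it used):

    import numpy as np
    import albumentations as A

    augment = A.ShiftScaleRotate(
        shift_limit=0.2,           # translate by a random factor in -0.2 .. 0.2
        scale_limit=(-0.2, 0.1),   # rescale to roughly 80-110%
        rotate_limit=20,           # rotate by a random angle in -20 .. 20 degrees
        p=1.0,
    )
    image = np.zeros((224, 224, 3), np.uint8)  # placeholder image
    mask = np.zeros((224, 224), np.uint8)      # placeholder label mask
    augmented = augment(image=image, mask=mask)  # same transform applied to both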
3.2 Models
In many semantic segmentation problems the solutions are not obvious, although solutions have been developed that leverage widely known convolutional neural network architectures, some with billions of parameters and some with a few million. For the purpose of this work, five CNNs that can be used on our hardware were selected, Vgg19 [38], DenseNet201 [39], MobileNetv2 [40], ResNet50 [41], and Inceptionv3 [42], together with U-Net [43]. In the field of medical segmentation U-Net [43] is a very common model architecture, and in this study we examine the combination of U-Net with the above-mentioned CNNs. All these CNNs had weights pretrained on the ImageNet dataset, which were kept frozen during the training process in order to achieve transfer learning, while the U-Net [43] part was fully trainable. For a fair comparison between the five CNNs, experiments were created to train them for 400 epochs, using the Train and Validation images, while evaluating them on the Test images. The training process is shown in Fig. 8. One of the best models is Vgg19 [38], developed by the Visual Geometry Group, Department of Engineering Science, University of Oxford; it has been used for medical purposes such as chest X-ray detection [44] and for detecting masks on workers [45] to prevent the spread of COVID-19. One of the important advantages of DenseNets [39] is the improved flow of information; developed by a collaboration of Facebook AI Research and Cornell University, they have been used for medical purposes such as skin lesion classification [46] and also for intelligent tomato classification [47]. The most lightweight among all the models are the MobileNets [40], as they have the ability to
Fig. 8 The transfer learning process using as an example the Fold 1
keep performance as high as possible; developed by Google Research, they have been applied for medical purposes to detect skin cancer [48] and in a gesture recognition system [49], so they could not be ignored in the experiments. On the other hand, there are deep models like ResNet [41], which can be trained up to hundreds or even thousands of layers and still achieve compelling performance; it was developed by Microsoft Research and has been used for medical purposes to diagnose COVID-19 and pneumonia from X-ray images [50]. The most power-efficient model of all is Inceptionv3 [42], which focuses on using less computational power by modifying the earlier Inception architectures; it was developed by a collaboration of Google Research and University College London. The evaluation process for all models was designed to investigate the performance of each model. As the training, validation, and test metric, the Intersection over Union (IoU) shown in Eq. (1) was used, and the best model was exported, as shown in Fig. 8. After the end of all experiments, the average IoU over all ten Folds reveals the ability of each model to adapt to training and generalize to the test images:

J(A, B) = (A ∩ B)/(A ∪ B)    (1)
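On integer label masks, Eq. (1) can be evaluated per class as in the following minimal sketch (an illustration, not the authors' code):

    import numpy as np

    def iou(pred: np.ndarray, target: np.ndarray, cls: int) -> float:
        # Eq. (1): intersection over union for one class.
        p, t = pred == cls, target == cls
        union = np.logical_or(p, t).sum()
        return float(np.logical_and(p, t).sum()) / union if union else 1.0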
All experiments had the same parameters, such as optimizer, learning rate, and loss function. More specifically, the optimizer was Adadelta with a fixed learning rate of 0.001, to keep the same learning rate as the backbones trained on the ImageNet dataset. The loss function was a combination of the "categorical focal loss", as presented
in Eq. (2), due to the imbalance in the occurrence of classes in the microscopic images, and the "Jaccard loss", shown in Eq. (3), due to the small distance between the classes:

L(gt, pr) = −gt · α · (1 − pr)^γ · log(pr)    (2)

L(A, B) = 1 − (A ∩ B)/(A ∪ B)    (3)
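One way to realize this training setup is with the open-source "segmentation_models" package, as in the following minimal sketch (an assumption for illustration; the paper does not name its implementation):

    import segmentation_models as sm
    import tensorflow as tf

    # U-Net decoder on a frozen, ImageNet-pretrained encoder (here Vgg19),
    # 7 output classes (Table 4), focal + Jaccard loss, IoU metric, Adadelta.
    model = sm.Unet("vgg19", classes=7, activation="softmax",
                    encoder_weights="imagenet", encoder_freeze=True)
    loss = sm.losses.CategoricalFocalLoss() + sm.losses.JaccardLoss()
    model.compile(optimizer=tf.keras.optimizers.Adadelta(learning_rate=0.001),
                  loss=loss, metrics=[sm.metrics.IOUScore()])
    # model.fit(train_images, train_masks,
    #           validation_data=(val_images, val_masks), epochs=400)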
4 Results
All the experimental results presented in Table 5 were obtained on a high-performance computer using an RTX 3090 with 24 GB of VRAM (video RAM). The total time required for training all the models was 3 months. The time and the amount of VRAM required to perform one epoch depend on the architecture of the model: the fastest architecture to train was Vgg19, with an epoch time of around 2 min while allocating 12 GB of VRAM, and the slowest was DenseNet201, with an epoch time of around 13 min and an allocation of 14 GB of VRAM. In most cases, the best IoU on the Test images was achieved between 150 and 350 epochs.
The Vgg19 backbone, using the ImageNet pretrained weights and combined with U-Net, reached a performance of 66.67% IoU at Fold 1, as shown in Table 5. The best prediction is presented in Fig. 9, where the model shows the capability to distinguish the classes "Overlap" and "Chromosome" correctly, while its weakness in distinguishing the classes "Nucleotide", "Unknown", and "Chromosome B" is also visible.
The DenseNet201 backbone, using the ImageNet pretrained weights and combined with U-Net, reached a performance of 49.57% IoU at Fold 8, as shown in Table 5. The best prediction is presented in Fig. 10, where the model is able to distinguish the classes "Nucleotide" and "Chromosome", while the class "Chromosome B" seems to be

Table 5 The results (IoU %) of all architectures combined with U-Net using transfer learning for 400 epochs
Fold 1
Vgg19
66.67 52.55 61.10 61.23 61.81 54.37 63.71 53.29 49.59 46.68 57.10
Fold 2
Fold 3
Fold 4
Fold 5
Fold 6
Fold 7
Fold 8
Fold 9
Fold 10
Mean
DenseNet201 46.01 46.24 48.11 48.59 45.11 46.88 44.92 49.57 44.51 48.82 46.87 MobileNetv2 46.83 45.78 54.87 49.52 47.76 48.52 56.92 54.00 46.41 57.69 50.83 ResNet50
45.38 45.42 48.42 48.60 44.22 46.60 54.42 52.12 45.73 49.19 48.01
Inceptionv3
45.68 47.01 55.97 52.60 47.71 47.60 56.15 45.62 45.03 54.81 49.82
Fig. 9 On the left is the ground truth of Fold 1 and on the right is the best prediction of Vgg19 combined with U-Net
misclassified as "Unknown"; the classes "Chromosome A" and "Overlap" are not distinguished at all. The MobileNetV2 backbone, using the ImageNet pretrained weights and combined with U-Net, reached a performance of 57.69% IoU at Fold 10, as shown in Table 5. The best prediction is visible in Fig. 11, where the class "Chromosome A" is segmented correctly, although the same cannot be said for the class "Chromosome B"; regions belonging to "Overlap" are not detected at all. The ResNet50 backbone, using the ImageNet pretrained weights and combined with U-Net, reached a performance of 54.42% IoU at Fold 7, as shown in Table 5. The
Fig. 10 On the left is the ground truth of Fold 8 and on the right is the best prediction of DenseNet201 combined with U-Net
Fig. 11 On the left is the ground truth of Fold 10 and on the right is the best prediction of MobileNetv2 combined with U-Net
best prediction is visible in Fig. 12, where the classes "Chromosome" and "Nucleotide" and most of the region of the class "Unknown" are segmented correctly, while the classes "Chromosome A", "Chromosome B", and "Overlap" are not segmented at all. The InceptionV3 backbone, using the ImageNet pretrained weights and combined with U-Net, reached a performance of 56.15% IoU at Fold 7, as shown in Table 5. The best prediction is visible in Fig. 13, where the classes "Chromosome" and "Nucleotide", along with most of the region of the class "Unknown", are segmented correctly, while the classes "Chromosome A", "Chromosome B", and "Overlap" are not distinguished at all.
Fig. 12 On the left is the ground truth of Fold 7 and on the right is the best prediction of ResNet50 combined with U-Net
Fig. 13 The left is the ground truth of Fold 7 and on the right is the best prediction of Inceptionv3 combined with U-Net
All the experiments show difficulties in the classification of the classes "Chromosome A", "Chromosome B", and "Overlap", although the intention was to investigate the performance impact of increasing the number of annotated images rather than using the augmentation process to produce synthetic data. Another area with room for investigation is discovering the optimal combination of all hyperparameters, which, as is well known, is a very time-consuming process. One of the hyperparameters is the optimizer: the impact of Adam, SGD, Adamax, Nadam, RMSprop, Ftrl, or even another neural network could give a performance boost and needs to be tested. Another crucial parameter is the learning rate, which can either be fixed or changed dynamically; usually the learning rate changes according to a scheduler pattern such as Exponential Decay or an adaptive learning rate. Another field with plenty of space for investigation is the optimal loss function for the problem, by examining the impact of other loss functions such as the Categorical Focal Dice Loss. As for the weights, there are two options: the first is to use the pretrained weights of known networks at popular resolutions like 224 × 224 pixels; the second is to use random weights from the beginning of the training and train the networks at the native resolution of 768 × 576 pixels. The intention of this paper was not to directly compare the results with existing methodologies; instead, it was an early attempt to take the current methodologies a step further, in order to evolve such systems to recognize and discard the irrelevant information that appears during the karyotyping process. Regarding a direct comparison, we consider it inconclusive because previous studies that leverage machine/deep learning methodologies had fewer classes of interest, and more classes require higher network capacity or extended training time.
5 Conclusion The experimental results reveal that the classes “Chromosome” and “Nucleotide” are classified by all models. Objects belonging to the “Unknown” class were classified in two cases. For the overlapping regions, where three classes should be present, there are three cases in which “Chromosome A” is classified, two cases in which “Chromosome B” is classified, and only one case in which the overlapping region itself was classified. ResNet50 and Inceptionv3 correctly distinguish the classes “Unknown” and “Nucleotide”, while the other three models are capable of distinguishing those classes only under certain circumstances. Only in the case of Vgg19 were all the classes segmented correctly; the reason is that it includes a normalization preprocessing function before the input layer. DenseNet201 and MobileNetv2 reveal weaknesses in distinguishing regions with high brightness in the original microscope image. The segmentation of the class “Nucleotide” is very promising for removing the nucleotides, and the label “Unknown” could be relabeled as “Nucleotide” in order to investigate the impact on segmentation performance after removal of the class “Unknown”. The ResNet50 and Inceptionv3 architectures segmented the classes “Unknown” and “Nucleotide” correctly, but the classes “Overlap”, “Chromosome A” and “Chromosome B” were not distinguished at all. With optimal hyperparameter fine-tuning, both models could support geneticists in removing unwanted entities from the microscopic image. The best result was 66.67% IoU, obtained by the Vgg19 backbone combined with U-Net, with a mean IoU score of 57.1%. To achieve complete segmentation of the class “Overlap”, a two-model pipeline is suggested: first InceptionV3 to segment and remove the classes “Nucleotide” and “Unknown”, and then Vgg19 to segment the classes “Chromosome A”, “Chromosome B”, and “Overlap”. Their combination looks promising for assisting geneticists during the karyotyping process. Future work aims to enlarge the number of annotated images as much as possible, in order to rely less on data augmentation methods. Another important goal is to have all annotations verified by cytogeneticists, to make sure that the labels and their topology are correct. As for the models, the main goal is to avoid transfer learning, whose input layer is limited to 224 × 224 pixels, by training the models on the native microscope resolution of 768 × 576 pixels. As a result, the models will be trained from scratch on undistorted microscope images, revealing the strengths and weaknesses of the five representative models as well as of any other model that fits the available hardware.
Chapter 25
Deep Transfer Modeling for Classification and Identification of Tomato Plant Leaf Disease Rajeev Kumar Singh, Akhilesh Tiwari, and Rajendra Kumar Gupta
1 Introduction Tomato is among the most popular agricultural produce in India and around the world, and it is the second-largest agricultural produce next to the potato. The tomato is a pulpy, edible fruit, glossy red, yellowish, or greenish in color, eaten as a vegetable or mostly in salads. According to the monthly tomato report of the Horticulture Statistics Division (HSDD), Department of Agriculture, Cooperation and Farmers’ Welfare, Ministry of Agriculture, New Delhi [1], total production in 2018–19 was 19007.24 metric tonnes, and as per the first advance estimate, tomato production in 2019–20 is estimated to be 193.27 lakh tonnes [1]. The tomato crop is subject to many diseases and is attacked by several insect pests. Tomato leaf diseases cause major economic and production losses in agriculture. Disease diagnosis is a challenging task that depends on symptoms such as colored spots or streaks seen on the leaves of a plant. In tomato plants, fungi, bacteria, and viruses cause most of the leaf diseases. The diseases caused by these organisms are characterized by various visual symptoms that can be analyzed on the leaves or stem of a plant, since disease mostly affects the roots, stems, leaflets, leaves, and fruits of the tomato plant. Tomato plant diseases can be identified, and thus prevented, using computer vision techniques. The present era is the age of computer vision technology, with wide application of deep learning models. Researchers keep developing new advances in interesting areas of computer science like artificial intelligence, image processing, symbolic language detection, human pattern identification, voice or audio recognition, real-time communication identification, medical diagnosis, etc. R. K. Singh (B) · A. Tiwari Department of IT, Madhav Institute of Technology and Science, Gwalior (M.P.), India e-mail: [email protected] R. K. Gupta Department of CSE, Madhav Institute of Technology and Science, Gwalior (M. P.), India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_25
A smart mobile-based diagnosis system [2] has been proposed for citrus plant disease diagnosis based on DenseNet; it is an intelligent, service-computing-based citrus disease diagnosis system. A CNN-based regularization technique with better generalization properties [3] has been discussed, and the approach proved a better application model in the medical field. The performance of an automated system [4] has been investigated for cucumber plant disease classification, dividing leaves into two categories, healthy and unhealthy, with the help of a convolution neural network. The practicability of CNN-based shape classification [5] was investigated using the light-scattering patterns of six different light particles mimicking the shapes of micro-sized phytoplankton. Two different CNN-based diagnosis applications [6] were performed, namely interstitial lung disease classification and thoracoabdominal lymph node detection. A CNN model [7] for the identification of leaf vein patterns has been evaluated. In continuation, research on interstitial lung disease classification [8] using a deep CNN has been developed; this deep CNN classifies lung CT-scan image patches into 7 classes, namely healthy tissue and 6 different ILD patterns.
2 The Proposal of Layering Architecture of Convolution Neural Network Convolution neural networks (CNNs) process images through multiple layers of connected neurons. A CNN can process multidimensional data and is based on the feed-forward neural network. The artificial neural network (ANN) is built on a mathematical model of human brain neurons and the synapses connecting them; convolution neural networks are an upgraded version of the ANN. The CNN models used here are a combination of various layers: an input layer (input images from the PlantVillage dataset), convolution layers, two kinds of pooling layers (max pooling and average pooling), a dense or fully connected layer, and lastly a softmax layer as the output layer. The CNN architecture is explained in Fig. 1.
Fig. 1 Convolution neural network architecture
2.1 Conv2D or Convolution Layer The conv2D or convolution layer extracts features from the input image. Multiple convolution layers can be applied for the selection and extraction of image features; the proposed CNN architecture works with three convolution layers [9]. Each convolution layer filters its input and extracts features such as edges, shapes, and corners, and the resulting output feature maps are combined over multiple convolutions as calculated by Eq. (1):

p_i^l = f\left( \sum_{j \in R_i} p_j^{l-1} * q_{ij}^l + b_i^l \right)   (1)

where q_{ij} denotes a kernel, l a layer, b_i a bias, and R_i the set of input maps contributing to the output p_i.
2.2 Pooling Layer The pooling layer is a building block of the convolution neural network, in the form of max pooling and average pooling layers. Max pooling picks the maximum feature value in each region of the image, average pooling picks the mean feature value, and both help to control over-fitting while identifying the disease features of tomato plant leaves. For an input volume of size W_1 \times H_1 \times D_1, a pooling layer with stride S and spatial extent F produces an output volume of size W_2 \times H_2 \times D_2, where

W_2 = \frac{W_1 - F}{S} + 1   (2)

H_2 = \frac{H_1 - F}{S} + 1, and D_2 = D_1. For example, an 85 × 85 input pooled with F = 3 and S = 2 gives W_2 = (85 − 3)/2 + 1 = 42, matching the second pooling layer in Table 1. The convolution neural network improves its representational ability by adding average pooling and max pooling layers.
2.3 Fully Connected Layer A fully connected layer carries a very large number of free, trainable parameters compared to the preceding convolutional layers, whose sparse connectivity and sharing of filter values across image locations exploit translation invariance. The fully connected layer is where the flattened data is fed into a fully connected feed-forward network. Flattening collapses the spatial dimensions of the input into the channel dimension, turning the shrunken image produced by the last max pooling layer into a one-dimensional tensor, i.e., a vector, which then serves as the input of the fully connected layers.
2.4 Softmax Function or Normalized Exponential Function The softmax function, also known as the softargmax function, maps each element of a vector to the exponential of that element divided by the sum of the exponentials of all the elements; unlike other activation functions, which transform each input value independently of the other elements, the softmax output depends on the whole vector. The hypothesis function is shown as:

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}   (3)

For the trained model parameter \theta, the cost function j(\theta) is

j(\theta) = -\sum_{i=1}^{m} \left[ y^i \log h_\theta(x^i) + (1 - y^i) \log\left(1 - h_\theta(x^i)\right) \right]   (4)

For training data ranging over (x^1, y^1), \ldots, (x^m, y^m), the statistical probability that example x^i belongs to class j is

p\left(y^i = j \mid x^i; \theta\right) = \frac{e^{\theta_j^T x^i}}{\sum_{l=1}^{k} e^{\theta_l^T x^i}}   (5)
For the identification and classification of tomato plant leaf disease, the networks are trained using the backpropagation gradient. The convolution layers and their activation neurons are used for the detection of tomato plant disease and its respective class.
2.5 CNN Training Algorithm of Backpropagation Gradient The CNN training algorithm of backpropagation gradient calculates gradients of the loss (error) function and then improves the existing trainable parameters in response to the gradients. The mean squared error function over P training samples is

E = \frac{1}{P} \sum_{p=1}^{P} \sum_{k=1}^{m} \left( t_k^p - y_k^p \right)^2   (6)

Here, t_k^p is the kth component of the pth training pattern's label, and y_k^p is the value of the kth output-layer unit in response to the pth input pattern. The output vector of the current layer j, taking its input from layer j − h, is

x^j = f\left( W^j x^{j-h} + b^j \right)   (7)

where W^j x^{j-h} is the output of the summation function and b^j is the bias. The backward phase uses the given connections to backpropagate the error from the higher to the lower layers:

\delta^j = \left( W^{j+1} \right)^T \delta^{j+1} \, f'\left( W^j x^{j-h} + b^j \right)   (8)

The weight update for W^j is represented by

\frac{\partial E}{\partial W^j} = x^{j-h} \left( \delta^j \right)^T   (9)

\Delta W^j = -\eta \frac{\partial E}{\partial W^j}   (10)
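A minimal NumPy sketch of the update in Eqs. (7)-(10) for a single fully connected layer follows; the sigmoid activation and the learning rate are illustrative assumptions, since the chapter does not fix them here.

# Hedged sketch of one backpropagation step, Eqs. (7)-(10).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W, b, x_prev, W_next, delta_next, eta=0.01):
    z = W @ x_prev + b                               # Eq. (7): summation plus bias
    delta = (W_next.T @ delta_next) * sigmoid(z) * (1 - sigmoid(z))  # Eq. (8)
    grad_W = np.outer(delta, x_prev)  # Eq. (9), up to the transposition convention
    W = W - eta * grad_W                             # Eq. (10): gradient step
    return W, delta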
2.6 Input Image Acquisition for Tomato Plant Disease The PlantVillage dataset is used for the identification and classification of tomato plant disease. From PlantVillage, nine diseases are identified in the tomato plant alone, with healthy leaves used for comparison. The entire dataset is divided into two parts: the first is the training set and the second is the test set; 600 images were used, with 90% of the images for the testing dataset and 10% for the training dataset. The image dataset covers 09 different tomato leaf diseases together with one healthy tomato leaf class (Table 1).
Table 1 Layer implementation of the CNN model

Layer               | Filter size | Output size
Input               | –           | 256*256*3
Conv2D Layer 1      | 11          | 256*256*32
Max_Pooling_Layer 1 | 5           | 85*85*32
Conv2D_Layer 2      | 7           | 85*85*64
Max_Pooling_Layer 2 | 3           | 42*42*64
Conv2D Layer 3      | 5           | 42*42*128
Max_Pooling_Layer 3 | 3           | 21*21*128
Output              | –           | 15*1
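For concreteness, the following is a minimal Keras sketch of the layer stack in Table 1. The strides, padding, and activations are assumptions (the table reports only filter and output sizes), with pooling parameters chosen so the output shapes approximate those reported.

# Hedged Keras sketch of the Table 1 stack; strides/padding/activations
# are assumptions chosen to approximate the reported output sizes.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(32, kernel_size=11, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=3),   # 256 -> 85
    layers.Conv2D(64, kernel_size=7, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),   # 85 -> 42
    layers.Conv2D(128, kernel_size=5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),   # 42 -> 21
    layers.Flatten(),
    layers.Dense(15, activation="softmax"),        # 15-way output, as in Table 1
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])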
3 Analysis of Proposed Techniques In the proposed study, convolution neural network-based supervised deep learning techniques are used for the detection of tomato leaf disease. We used 1800 images from the PlantVillage dataset covering 09 different tomato leaf diseases, with 200 images for each disease. The tomato plant images are divided into two kinds: an unhealthy leaf dataset and a healthy leaf dataset. The quality of the PlantVillage images is good, with no blur or noise, so they can be used without preprocessing for denoising; the images are read one by one, labeled into groups of tomato leaf disease with an image size of 256*256, and mapped to their corresponding feature maps. The tomato leaf disease categories of the PlantVillage dataset are shown in Fig. 2a–i, and a healthy leaf image can be seen in Fig. 2j. We compare disease-identification performance across various parameters of the convolution neural network model: the convolution filters, max pooling filters, image kernel sizes, and input shapes. The training set uses 200 images per disease and the testing set 150 images per disease; we compared convolution filter sizes of 32*32, 64*64, and 128*128, kernel sizes of 3*3, 2*2, and 2*2, and tensor shapes of 32*32*3, 64*64*3, and 128*128*3. The accuracies obtained by the implemented CNN model on tomato leaf disease are shown in Table 2. The model accuracy was calculated after a total filter of 591/59, with an average test accuracy of 96.84%. From the calculated data of the CNN model in Table 2, the training and validation accuracies and losses for tomato leaf disease are shown in Fig. 3a, b.
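A hedged sketch of this data preparation step is given below; the directory path and the split call are assumptions, since the chapter only states the image counts and the 256*256 resize.

# Hedged sketch: loading the PlantVillage tomato folders (one directory
# per class, hypothetical path) resized to 256x256 for training.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "PlantVillage/tomato",        # hypothetical dataset location
    validation_split=0.25,        # illustrative split; see text for counts
    subset="training",
    seed=42,
    image_size=(256, 256),
    batch_size=32,
    label_mode="categorical",
)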
4 Conclusion The proposed work has been implemented and investigated using convolution neural networks for the identification and classification of tomato leaf disease. The presented model obtains better identification and recognition accuracy and performance in comparison to other deep learning and feature extraction models. In this
Fig. 2 a Tomato__Target_Spot. b Tomato__Tomato_mosaic_virus. c Tomato__Tomato_YellowLeaf__Curl_Virus. d Tomato_Bacterial_spot. e Tomato_Early_blight. f Tomato_Late_blight. g Tomato_Leaf_Mold. h Tomato_Septoria_leaf_spot. i Tomato_Spider_mites_Two_spotted_spider_mite. j Tomato_healthy_Leaf
Table 2 Results of tomato leaf disease identification using CNN model (Epochs = 25, Batch_Size (BS) = 32)

Epoch | Total filter size of convolution | Tensor shape | Max_pooling size | Train loss | Train accuracy | Validation loss | Validation accuracy
01/25 | 32*3*3  | 32*32*3   | 3*3 | 0.2021 | 0.9391 | 0.6433 | 0.8752
02/25 | 32*6*6  | 64*64*3   | 2*2 | 0.1509 | 0.9473 | 0.7832 | 0.9010
03/25 | 32*9*9  | 128*128*3 | 2*2 | 0.1487 | 0.9485 | 1.5699 | 0.8750
04/25 | 64*3*3  | 32*32*3   | 3*3 | 0.1224 | 0.9579 | 1.0015 | 0.8759
05/25 | 64*6*6  | 64*64*3   | 2*2 | 0.1125 | 0.9610 | 0.7103 | 0.8979
06/25 | 64*9*9  | 128*128*3 | 2*2 | 0.1098 | 0.9619 | 0.4621 | 0.9124
07/25 | 128*3*3 | 32*32*3   | 3*3 | 0.1055 | 0.9630 | 0.1446 | 0.9517
08/25 | 128*6*6 | 64*64*3   | 2*2 | 0.0935 | 0.9668 | 0.6412 | 0.9030
09/25 | 128*6*6 | 128*128*3 | 2*2 | 0.0841 | 0.9691 | 0.2328 | 0.9470
10/25 | 32*3*3  | 32*32*3   | 3*3 | 0.0751 | 0.9730 | 0.3133 | 0.9341
11/25 | 32*6*6  | 64*64*3   | 2*2 | 0.0708 | 0.9749 | 0.4679 | 0.9174
12/25 | 32*9*9  | 128*128*3 | 2*2 | 0.0744 | 0.9727 | 0.4710 | 0.9231
13/25 | 64*3*3  | 32*32*3   | 3*3 | 0.0603 | 0.9772 | 0.1772 | 0.9559
14/25 | 64*6*6  | 64*64*3   | 2*2 | 0.1111 | 0.9649 | 1.8174 | 0.8750
15/25 | 64*9*9  | 128*128*3 | 2*2 | 0.0854 | 0.9676 | 0.7875 | 0.8970
16/25 | 128*3*3 | 32*32*3   | 3*3 | 0.0714 | 0.9742 | 0.6028 | 0.9148
17/25 | 128*6*6 | 64*64*3   | 2*2 | 0.0617 | 0.9781 | 0.5249 | 0.9155
18/25 | 128*6*6 | 128*128*3 | 2*2 | 0.0639 | 0.9759 | 0.2399 | 0.9412
19/25 | 32*3*3  | 32*32*3   | 3*3 | 0.0605 | 0.9790 | 0.1152 | 0.9621
20/25 | 32*6*6  | 64*64*3   | 2*2 | 0.0564 | 0.9791 | 0.2154 | 0.9561
21/25 | 32*9*9  | 128*128*3 | 3*3 | 0.0579 | 0.9795 | 0.0755 | 0.9751
22/25 | 64*3*3  | 32*32*3   | 3*3 | 0.0482 | 0.9820 | 0.0630 | 0.9788
23/25 | 64*6*6  | 64*64*3   | 2*2 | 0.0478 | 0.9832 | 0.2837 | 0.9503
24/25 | 64*9*9  | 128*128*3 | 2*2 | 0.0427 | 0.9841 | 0.0652 | 0.9773
25/25 | 128*3*3 | 32*32*3   | 3*3 | 0.0424 | 0.9843 | 0.1153 | 0.9684
research, the main challenge is the large number of convolution layers associated with max pooling and activation layers in each epoch iteration, and handling the large PlantVillage dataset of tomato diseases. The proposed CNN model achieved a better accuracy of 96.84% compared to other classification techniques. Future work will focus on tensor-based algorithms in various fields of automatic computer vision, such as automatic medical diagnosis, automatic disease detection in humans or agriculture, remote sensing devices, and the automated aviation field.
Fig. 3 a Training and validation accuracy. b Training and validation loss
References
1. Akerlof (1970) Monthly report tomato (January 2020) on Horticulture statistics division, department of agriculture, cooperation & farmers welfare, Ministry of agriculture & farmers welfare, government of India, New Delhi. J Chem Inf Model 53(9):1689–1699
2. Pan W, Qin J, Xiang X, Wu YAN, Tan YUN, Xiang L (2019) A smart mobile diagnosis system for citrus diseases based on densely connected convolutional networks. IEEE Access 7
3. Khatami A, Nazari A, Khosravi A, Lim CP, Nahavandi S (2020) A weight perturbation-based regularisation technique for convolutional neural networks and the application in medical imaging. Expert Syst Appl 113196
4. Kawasaki Y, Uga H (2015) Basic study of automated diagnosis of viral plant diseases using convolutional. 1:638–645
5. Ding C (2020) Convolutional neural networks for particle shape classification using light-scattering patterns. J Quant Spectrosc Radiat Transf 245:1–7
6. Shin H et al (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 35:1–14
7. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification using vein morphological patterns. Comput Electron Agric 127:418–424
8. Anthimopoulos M et al (2016) Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans Med Imaging 0062:1–10
9. Khamparia A, Singh A, Kr A, Pandey B (2019) Sustainable computing: informatics and systems classification and identification of primitive kharif crops using supervised deep convolutional networks. Sustain Comput Informatics Syst
Chapter 26
Designing Real-Time Frame Modification Functions Using Hand Gesture Recognition Shivalika Goyal, Himani, and Amit Laddi
1 Introduction At present, several factors have come together to bring about a computer vision revolution. Computer vision algorithms are used to recognize objects in photos or videos and to extract high-dimensional data from the real world, generating numerical or symbolic information. Mobile technology with built-in cameras, cheaper and more easily available computing power, supportive hardware designs, and various new algorithms that take advantage of computer vision technology are all on the horizon, as seen in the work by Varun et al. [1] and Sai Mahitha et al. [2]. Aksaç et al. [3] described gesture as a type of non-verbal communication in which the body's movement is used to express a specific message, emanating from several areas of the human body, the most popular of which are the hands and face. Gesture recognition is a computational method that uses mathematical algorithms to identify and understand human gestures. Building application interfaces that let each part of the human body communicate naturally is a significant focus of research in Human–Computer Interaction (HCI), with the hands as the most effective interaction element, given their capabilities. Thakur et al. [4], Tran et al. [5], and Ghodichor and Chirakattu [6] conducted studies showing that, when communicating, hand movements are purposeful, be they simple or complicated. Hands are used to indicate objects or people (deictic gestures), to elaborate abstract ideas (metaphoric gestures), to explain the visual images of what a person is talking about (iconic gestures), or to convey physical articulations paired with their own grammar and S. Goyal (B) · Himani · A. Laddi CSIR-CSIO, Chandigarh, India e-mail: [email protected] Himani e-mail: [email protected] A. Laddi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_26
vocabulary (sign languages). As a result, when one uses hand gestures to interact with computers instead of a mouse, joystick, glove, or any other external hardware device, it becomes a more interactive way to communicate with machines. Many frameworks and machine learning libraries for hand gesture detection have been developed recently to make it easier for researchers to create AI (Artificial Intelligence) applications, as specified in the work presented by Chowdhury et al. [7] and Oudah et al. [8]. In the past, most of the research focused on hand gesture identification and tracking, with little emphasis on the functions that can be performed after hand gesture tracking. In this paper, the research work focuses on designing algorithms and functions for real-time frame modifications using distance-based calculation and ROI selection with hand gestures. The MediaPipe framework is used to identify data points on hands, and a camera is used to collect hand positions.
2 Research Methodology 2.1 Setup Design All the experimental work was executed using the Python programming language (3.7) with the OpenCV and NumPy libraries, along with the MediaPipe framework, on a 64-bit desktop OS with an Intel(R) Core(TM) i7 processor, 12 GB RAM, and a USB camera (resolution: 640 × 480). Python is a high-level programming language that is interactive, object-oriented, and adaptable for various applications [9]. OpenCV (Open Source Computer Vision Library) is a real-time computer vision programming library. It is intended to provide a common infrastructure for computer vision applications and make it easier for commercial products to incorporate machine perception [10]. For scientific computing, NumPy is the most significant Python package. This library provides support for massive, multi-dimensional arrays, matrices, and various high-level mathematical functions that can be used with these arrays. It creates objects (such as masked arrays and matrices) and routines for executing quick array operations, including mathematical, shape, sorting, logical, discrete Fourier transform, and basic statistical operations, and more [11].
2.2 Experimental Design The conceived and constructed frame modification functions, which operate in real time, involve multiple steps. In the next section, each function is presented in detail along with its algorithm. A flowchart of hand gesture recognition with the designed functionality is shown in Fig. 1.
Fig. 1 Flowchart of functionalities using hand gesture recognition
MediaPipe is used for hand tracking. It is a framework for creating machine learning pipelines for time-series data such as video and audio. It is so light and efficient that it can operate on embedded IoT devices. MediaPipe Hands is a high-resolution hand and finger tracking solution. From a single frame, 21 3D landmarks of a hand are determined, as shown in Fig. 2, using machine learning (ML). It employs an ML pipeline built from the collaboration of several models [12]. Fig. 2 21 3D landmarks of a hand using MediaPipe
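A minimal sketch of this tracking step is shown below, using the MediaPipe Hands solution API; the confidence thresholds are illustrative assumptions rather than the exact values used in this study.

# Hedged sketch: per-frame hand-landmark extraction with MediaPipe Hands.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1,
                                 min_detection_confidence=0.7,  # assumed
                                 min_tracking_confidence=0.5)   # assumed
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark   # 21 normalised points
        h, w = frame.shape[:2]
        tip = (int(lm[8].x * w), int(lm[8].y * h))      # landmark 8: index fingertip
        cv2.circle(frame, tip, 8, (0, 255, 0), -1)
    cv2.imshow("hands", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break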
2.3 Modification Functions Virtual Pen in real time Fingers can be used to draw on the screen as a virtual pen, i.e., any number, alphabet, or drawing can be designed, or a real-time person or object can be marked with the fingers. The function runs when only the index finger is up and stops when more fingers are raised. It uses a NumPy function to create an array of zeros of the frame size, resulting in the proposed canvas image as follows: canvas = np.zeros((480, 640, 3), np.uint8) The canvas image is then converted to an inverse image (img_inverse) using the cv2.threshold function with the cv2.THRESH_BINARY_INV flag. The white backdrop is now visible, and the index finger can be used to draw in black on it, as shown in Fig. 3a. Further, to bring this virtual drawing onto the real-time frame, the function used is as follows:
Fig. 3 Virtual writing and drawing a Binary inverse image, b Conjunction of original and canvas image, c Color of interest chosen as green, and d Disjunction of original and inverse image
cv2.bitwise_and(frame, img_inverse) The bitwise “AND” operation performs a conjunction of the images, overlaying the drawing on the real-time frame (Fig. 3b). An “OR” operation between the frame and the canvas image is utilized to change the color of the drawing pen from black to another color; it conducts a bitwise disjunction of the images. Following that, as illustrated in Fig. 3c, d, the color of interest can be chosen. cv2.bitwise_or(frame, canvas) ROI selection and cropping The proposed algorithm involves selecting points (x, y) when the index and middle fingers are raised. The distance between the tips of these two fingers is calculated using the following function: math.hypot(middle_tip_x - index_tip_x, middle_tip_y - index_tip_y) Next, the draw mode is activated if the distance between the index and middle fingers is less than 25 pixels. The index finger must be raised to mark a point. The x and y pixel values are appended to x_array and y_array, respectively, and the draw mode is automatically quit after the point is marked. The remaining three locations are marked in the same way, either clockwise or anticlockwise. When the four points are indicated, the array length criterion of 4 is met, a red line connects them, and the region within is made visible in real time as the Region of Interest in a separate window. The results, i.e., the ROI in real time obtained with this four-point selection method, are illustrated in Fig. 4a, b. Saving the Desired Frame This function allows the user to save any desired frame from a video in real time. The index, middle, and ring fingers are used, as shown in Fig. 5. The frame is stored when all three of them are up and the distance between the tips of the index and ring fingers is less than 35 pixels.
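A hedged sketch of the distance-threshold logic shared by these functions follows; it continues the MediaPipe snippet from Sect. 2.2 (lm, w, h, and frame come from there), the fingertip indices are MediaPipe's, and the output file name is a hypothetical pattern.

# Hedged sketch of the fingertip-distance checks used by draw mode (25 px)
# and frame saving (35 px); lm, w, h, frame come from the earlier snippet.
import math
import time
import cv2

def tip(lm, idx, w, h):
    # Convert a normalised landmark to pixel coordinates.
    return lm[idx].x * w, lm[idx].y * h

ix, iy = tip(lm, 8, w, h)    # index fingertip
mx, my = tip(lm, 12, w, h)   # middle fingertip
rx, ry = tip(lm, 16, w, h)   # ring fingertip

if math.hypot(mx - ix, my - iy) < 25:   # index-middle pinch: enter draw mode
    draw_mode = True
if math.hypot(rx - ix, ry - iy) < 35:   # index-ring pinch: save the frame
    cv2.imwrite("image_" + str(int(time.time())) + ".jpg", frame)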
Fig. 4 ROI selection a Real-time original frame b Region of interest
The pixel values may be changed according to the user's operable distance and finger thickness. The operation used is cv2.imwrite("image_name" + timestamp + ".jpg", image_variable) Color Mode Change Human vision is not designed to pick up on subtle changes in grayscale pictures; human eyes are more sensitive to color changes, so there is often a need to recolor photographs to make sense of them. A colormap is a mapping of data values to colors, resulting in visual data structures. Colormaps are widely utilized in computer science, including computer graphics, visualization, computer vision, and image processing, to name a few areas [12]. The function operates when the index, middle, ring, and pinky fingers are up and the distance between the tips of the index and pinky fingers is less than 410 pixels (the distance can be modified per user), as shown in Fig. 6a, b. The function used to perform this operation is cv2.applyColorMap() with the COLORMAP_BONE parameter. Window closing When the user wants to close the virtual window displaying the real-time video captured by the camera, the user has to put down the index, middle, ring, and pinky fingers. The real-time window frame will be closed, and the code will successfully quit. The operations executed on detection of this particular gesture are cv2.destroyAllWindows() exit(0)
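A short hedged sketch of the color mode change and the window-closing teardown is given below; converting the frame to grayscale before applying the colormap is an assumption (cv2.applyColorMap also accepts 3-channel input).

# Hedged sketch: COLORMAP_BONE recoloring plus the teardown executed on
# the window-closing gesture.
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cv2.imshow("color mode", cv2.applyColorMap(gray, cv2.COLORMAP_BONE))
cv2.waitKey(0)

# On the all-fingers-down gesture:
cap.release()
cv2.destroyAllWindows()
exit(0)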
Fig. 5 Capture frame from real time with three fingers up
Fig. 6 Color mode change a Original frame, b Frame with Colormap Bone
3 Results and Conclusion The real-time frame modification functions based on the detection and tracking of hand gestures are presented in this paper. Five functions are explained in detail: Virtual Pen in real time, ROI Selection and Cropping, Saving the Desired Frame, Color Mode Change, and Window Closing. These processes use image processing and apply machine learning and computer vision-based algorithms. With no markers, gloves, or mouse, the person uses their hand to interact with the computer via a camera. Testing results indicate that the features perform effectively in several lighting circumstances, as shown in Fig. 7a–c, with various backdrops, and that tracking responds across the camera's field of view. These functions require no further training and hence work well with low computing power. They are also simple to use, with no restrictions on hand mobility and no requirement for external gear such as gloves or finger caps. The customizable function design technique exhibits precise hand motion estimation, with practical applications for medical settings, graphic designers, and virtual and augmented reality setups. The above computer vision-based functions can be highly beneficial to disabled individuals or those with special needs.
Fig. 7 Results under different lighting conditions a Room lights ON with ambient light, b Room lights ON without ambient light, c Only ambient light
4 Discussion and Future Work In recent years, improvements in computer vision, sensors, gesture detection and tracking, machine learning, and deep learning have made this technology more accessible and accurate, and it has begun to permeate different sectors, as elaborated in detail by Sharma and Verma [13] and Rautaray and Agrawal [14]. The functions designed above are real-time frame modification operations which, with further improvement, would be of great value in areas such as health care (surgical setups, empowerment of the disabled, sign-language-controlled applications), virtual reality (hand-controlled augmentation), and consumer electronics (gesture-based devices, smart home systems, and more). Five primary actions are implemented in real time in this study, with the possibility of developing additional functionalities. There are a few constraints that can be worked around: rather than using preset pixel values, the ROI selection can be improved by manually selecting the region to be captured by hand. We plan to expand our work to handle additional gestures and communicate with other smart environments, giving users the freedom to develop gestures based on their own needs, practicality, and use cases.
References
1. Varun KS, Puneeth I, Jacob TP (2019) Virtual mouse implementation using open CV. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 435–438
2. Sai Mahitha G, Revanth B, Geetha G, Sirisha R (2021) Hand gesture recognition to implement virtual mouse using open source computer vision library: python. In: Proceedings of international conference on advances in computer engineering and communication systems. Springer, Singapore, pp 435–446
3. Aksaç A, Öztürk O, Özyer T (2011) Real-time multi-objective hand posture/gesture recognition by using distance classifiers and finite state machine for virtual mouse operations. In: 2011 7th international conference on electrical and electronics engineering (ELECO). IEEE, pp II-457
4. Thakur S, Mehra R, Prakash B (2015) Vision based computer mouse control using hand gestures. In: 2015 international conference on soft computing techniques and implementations (ICSCTI). IEEE, pp 85–89
5. Tran DS, Ho NH, Yang HJ, Kim SH, Lee GS (2021) Real-time virtual mouse system using RGB-D images and fingertip detection. Multimed Tools Appl 80(7):10473–10490
6. Ghodichor A, Chirakattu B (2015) Virtual mouse using hand gesture and color detection. Int J Comput Appl 975:8887
7. Chowdhury SR, Pathak S, Praveena MA (2020) Gesture recognition based virtual mouse and keyboard. In: 2020 4th international conference on trends in electronics and informatics (ICOEI) (48184). IEEE, pp 585–589
8. Oudah M, Al-Naji A, Chahl J (2020) Hand gesture recognition based on computer vision: a review of techniques. J Imaging 6(8):73
9. Welcome to Python.org. https://www.python.org/about/. Last accessed 10 March 2022
10. About. OpenCV (2020). https://opencv.org/about/. Last accessed 10 March 2022
11. About Us. NumPy. https://numpy.org/about/. Last accessed 10 March 2022
12. Hands. MediaPipe. https://google.github.io/mediapipe/solutions/hands.html. Last accessed 10 March 2022
13. Sharma RP, Verma GK (2015) Human computer interaction using hand gesture. Procedia Comput Sci 54:721–727
14. Rautaray SS, Agrawal A (2015) Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 43(1):1–54
Chapter 27
Benchmarking of Novel Convolutional Neural Network Models for Automatic Butterfly Identification Manjunath Chikkamath, DwijendraNath Dwivedi, R. B. Hirekurubar, and Raj Thimmappa
1 Introduction Insects are the most diverse group of animals; they include more than a million described species and represent more than half of all known living organisms. The total number of species is estimated at between six and ten million [1], and potentially over 90% of the animal life forms on Earth are insects. Insects may be found in nearly all environments, although only a small number of species reside in the oceans. Class Insecta has multiple orders, and one of the most important of them is Lepidoptera. Butterflies are the prominent species of Lepidoptera, along with the moths and the skippers, and they comprise numerous species belonging to multiple families. Butterflies live in many different types of habitats and can be found on every continent except Antarctica. There are many kinds of butterflies, and they are very complex and diverse. The most interesting features of butterflies are the scales that cover the whole body, including the wings and proboscis [2]; these give rise to wide variations in butterfly wing color and shape. According to recent estimates, the number of butterfly species found in the world varies from 15,000 to 21,000 [3]. Such a huge number of species with various wing shapes and colors makes the identification of a specific species very difficult. Conventional butterfly identification requires the butterfly to be captured manually by trap/net, which is a time- and energy-consuming process. Not only that, but there M. Chikkamath Bosch Global Software Technologies, Bangalore, India D. Dwivedi (B) Krakow University of Economics, Rakowicka 27, 31-510 Kraków, Poland e-mail: [email protected] R. B. Hirekurubar University of Horticultural Sciences, Bagalkot, India R. Thimmappa University of Agricultural Sciences, Bangalore, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_27
is also a need for an expert entomologist with vast insect taxonomic experience to identify a specific species. Sometimes there is a need to refer to a butterfly classification encyclopedia, which is large and cannot be carried around on a field trip [4]. Because the species are numerous and highly similar, and their distinguishing characteristics are not evident, the identification and classification of butterflies suffer from low accuracy and slow recognition. Furthermore, the number of entomologists specialized in taxonomy and of trained technicians is very small. Thus, there is a need to change traditional identification methods based on anatomy, morphology, genital differences, etc., and a more flexible approach arises in using image data for the automatic identification of butterflies. With novel image processing techniques, the problem of approaching an entomologist or carrying an encyclopedia can be solved with just a click. Based on the problems discussed, a study of butterfly species identification using image processing techniques and a Convolution Neural Network (CNN) is therefore proposed. A CNN is a deep learning algorithm used to classify images, detect objects, and perform segmentation. The most popular pre-trained CNN algorithms available are AlexNet [5], LeNet [6], GoogLeNet [7], ZFNet [8], VGGNet [9], EfficientNet [10], DenseNet [11], and ResNet [12]. Some studies have been carried out on classifying butterflies using these algorithms in a silo [13–17], or they considered fewer species than the authors consider in the current work [13–15, 17–21]. So, in this paper the authors benchmark different convolutional neural networks for butterfly species. The authors downloaded a dataset of butterfly pictures from the Kaggle data repository, and some images were downloaded from freely available insect websites. In this study, the authors collected images of 75 different butterfly species to learn multi-species recognition and to train classifier models with strong generalization ability across a wide range of species. After applying some preprocessing to the butterfly images, the authors used deep convolutional neural network-based VGG16 [9], VGG19 [9], EfficientNetB3 [10], EfficientNetB7 [10], and DenseNet201 [11] architectures to build models for butterfly classification and benchmark the results obtained from each algorithm. This paper is organized as follows: Sect. 2 focuses on the relevant and prominent work done in the concerned field. Section 3 elucidates the materials and the methods used along with the steps taken to obtain the necessary results. Section 4 pertains to the results and the analysis. Section 5 includes the conclusion of the paper and provides the scope for future work.
2 Literature Study and Related Work 2.1 CNN Models for Plant Disease Identification In Shao Xiang et al. [22], a novel lightweight convolutional neural network (CNN)-based network with channel shuffle operation and multiple-size module (L-CSMS)
is proposed for plant disease severity recognition. Vibhor Kumar Vishnoi (2020) used modern feature extraction techniques for several crop categories. Sasikala Vallabhajosyula (2021) [23] proposed an automatic plant disease detection technique using deep ensemble neural networks (DENN), whose performance outperforms state-of-the-art pre-trained models such as ResNet 50 & 101, InceptionV3, DenseNet 121 & 201, MobileNetV3, and NasNet. Sharifah Farhana Syed-Ab-Rahman et al. [24] employed a two-stage deep CNN model for plant disease detection and citrus disease classification using leaf images; the proposed model delivers 94.37% accuracy in detection and an average precision of 95.8%. Rajeev Kumar Singh et al. [25] explored the AlexNet model for fast and accurate detection of leaf disease in maize plants; using various iteration counts such as 25, 50, 75, and 100, their model obtained an accuracy of 99.16%. Shantkumari et al. (2020) proposed an Adaptive Snake Model for segmentation and region identification, a two-phase segmentation model comprising common segmentation and absolute segmentation. Namita Sengar et al. [26] used adaptive intensity-based thresholding for the automatic segmentation of powdery mildew disease, which makes the method invariant to image quality and noise; tested on a comprehensive dataset of cherry crop leaf images, it achieved a good accuracy of 99%. Edson Ampélio Pozza et al. [27] evaluated computer vision using red, green, blue (RGB) imagery and machine learning algorithms to detect seed-borne fungi on common bean (Phaseolus vulgaris L.) seeds; the use of spectral indices derived from RGB imagery extended the training capability of the algorithms, demonstrated by the importance of the variables and the decision tree used for target prediction by the rf and rpart1SE algorithms. Chandrasen Pandey et al. [28] proposed an automatic computer vision-based method for the identification of yellow disease, also called chlorosis, in a prominent leguminous crop, Vigna mungo. Nandhini et al. [29] presented an efficient Mutation-based Henry Gas Solubility Optimization (MHGSO) algorithm to optimize the hyperparameters of the DenseNet121 architecture; when tested with a field dataset with complicated backgrounds, the MHGSO-optimized DenseNet-121 architecture achieved accuracy, precision, and recall scores of 98.81, 98.60, and 98.75, respectively. Viswanath Muthukrishnan et al. (2021) converted Philodendron leaves from natural color to grayscale and applied the hue, saturation, and value technique to the gray image for disease recognition. Gunjan Mukherjee et al. (2021) presented a computer vision-based system to classify medicinal leaves along with the corresponding maturity level; the presented CNN-driven framework provides about 99% classification accuracy for the simultaneous prediction of leaf species and maturity stage. Ruchi Gajjar et al. (2021) proposed a deep convolutional neural network architecture to classify crop disease, with a single-shot detector used for the identification and localization of the leaf. Ismail El Massi et al. (2020) proposed a system built on the classifier combination method, which included two variants: a serial combination of two classifiers, and a hybrid combination of three classifiers consisting of a serial combination of two classifiers in parallel with an individual classifier. Siddharth Singh Chouhan et al.
[30] proposed a method named IoT_FBFN using Fuzzy Based Function Network (FBFN) enabled with IoT. The proposed IoT_FBFN
network, having the computational power of fuzzy logic and the learning adaptability of neural networks, achieves higher accuracy for the identification and classification of galls when compared with the other approaches. Junde Chen et al. (2021) started with an enhancement of the artificial neural network, in which the extracted pixel and feature values are used as input to the enhanced network for image segmentation; following the establishment of a CNN-based model, the segmented images are input to the proposed CNN model for image classification. Athiraja et al. (2020) used an adaptive neuro-fuzzy inference system and case-based reasoning to identify banana diseases at the earliest stage. Archana et al. (2021) used a modified K-means segmentation algorithm to separate the targeted region from the background of the rice plant image.
2.2 CNN Models for Butterfly and Other Species Prediction Prudhivi et al. [22] aimed to introduce efficient techniques for animal species image classification with the goal of achieving a good amount of accuracy; bottleneck features were trained and synched to the pretrained architecture to achieve high accuracy, and numerous deep learning architectures were compared on the dataset. Yang et al. [23] found that state-of-the-art lightweight convolutional neural networks (such as SqueezeNet and ShuffleNet) have the same accuracy as classical convolutional neural networks such as AlexNet but fewer parameters, thereby not only reducing communication across servers during distributed training but also being more feasible to deploy on mobile terminals and other hardware with limited memory. Almryad and Kutucu [13] proposed an automated butterfly species recognition model with the help of deep neural networks; experimental outcomes on 10 familiar butterfly species showed that the technique successfully identified almost all the butterfly species. Wang et al. [24] built an effective CNN-based automatic hand-held mobile pest monitoring system to detect and count rice planthoppers. Tang et al. [18] showed through experimental results on the public Leeds Butterfly dataset that their method outperforms the state-of-the-art deep learning-based image segmentation approaches. Tetila et al. [18] compared the performance of Inception-v3, ResNet-50, VGG-16, VGG-19, and Xception and found that the architectures evaluated can support specialists and farmers in pest control management in soybean fields. To shape a coarse-to-fine object perception, a hierarchical convolutional neural network, the skip-connections convolutional neural network (S-CCNN), was proposed by Lin et al. [19], focusing on a butterfly domain at the subspecies level owing to the fine-grained structure of the category taxonomy. Amarathunga et al. [26] conducted a systematic literature review (SLR) to analyze and compare primary studies of image-based insect detection and species classification methods. To perform rapid identification of tiny congeneric species and to improve rapid and automatic identification in the field, Takimoto et al. [27] customized an existing CNN-based method for a field video involving two Phyllotreta beetles. The team performed data augmentation using transformations, syntheses,
and random erasing of the original images. They then proposed a two-stage method for the detection and identification of small insects based on CNN.
3 Materials and Methods 3.1 Dataset In this study the authors collected more than 9000 images belonging to 75 different species of butterflies. The dataset contains both specimen and natural images. The authors downloaded images from the Kaggle data repository [31], and some were downloaded from freely available insect websites. All the downloaded images belong to the Red Green Blue color space and are stored in JPG format. The dataset captures butterflies in various positions: from the top, front, rear, right, or left side. Each downloaded image was saved to a directory named after its species, and the labeled butterfly images were cross-checked and confirmed by a professional entomologist. The dataset therefore contains a total of 75 classes of images, one per butterfly species. Sample images for a few of the species are given in Fig. 1. Images containing more than one butterfly, even of the same species, were deleted, as were images containing butterfly larvae or caterpillars.
Fig. 1 Butterfly species
The downloaded dataset consisted of photographs with little noise; therefore, noise removal was not an essential preprocessing step. Since the images are of different resolutions, resizing was the only essential preprocessing step applied. Resizing is an operation in which all entries of the image matrix appear as distinct, isolated pixels in the displayed image. In a true color image, the first plane characterizes the depth of red pixels, the second plane denotes the depth of green pixels, and the last plane symbolizes the depth of blue pixels. The authors used this approach to alter the dimensions of each image. If the input image has extra dimensions, only the primary two dimensions are adjusted [28]. Here, every image is rescaled to a 224 × 224 scale, as follows (Eq. 1):

$r(\mathrm{Img}, \mathrm{scale}) = \sum_{k=1}^{n} \mathrm{Img}_k \quad (1)$
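A minimal sketch of this resizing step, assuming the TensorFlow image utilities (the file path and helper name are illustrative, not the authors' exact code):

import tensorflow as tf

def load_and_resize(path, size=(224, 224)):
    # Read a JPG file, decode its three RGB channels,
    # and rescale it to the 224 x 224 target size.
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, size)   # bilinear interpolation by default
    return img / 255.0                 # scale pixel values to [0, 1]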
3.2 Deep Neural Network
Deep learning, particularly over the past few years, has become an essential research area for applications driven by artificial intelligence [29]. As a result of significant accomplishments in computer vision, natural language processing, and speech recognition, the rate of its use is expanding very fast. Deep learning algorithms, primarily built on artificial neural networks and motivated by a simplification of human brain neurons, come to the forefront through their success in the learning stage. Deep learning algorithms can resolve issues like feature extraction and selection by automatically selecting distinctive features from the input data. Plenty of labeled data is required to build deep learning models in comparison to traditional neural network models. The fast growth of available data nowadays has made the role of deep learning exceptionally vital in problem-solving. These developments have attracted the attention of many researchers in computer science and its applications.
Convolutional Neural Networks (CNNs) are considered one of the fundamental architectures of deep learning: a Multi-Layer Perceptron (MLP)-style feed-forward neural network motivated by human vision [32]. CNNs are mainly used for image classification, similarity detection, and object recognition. CNNs, which emphasize largely image classification, are nowadays used in almost every field involving classification. A CNN's architecture consists of many building blocks, including convolution layers, pooling layers, and fully connected dense layers. A CNN is based on three foremost modules: convolution layers, pooling layers, and activation functions. A typical architecture repeats a stack of multiple convolution and pooling layers, followed by one or more fully connected dense layers and, at the end, the output (softmax) layer for classification. A sample CNN structure is shown in Fig. 2.
Fig. 2 CNN architecture
In the first layer, the input layer, images are given as input data to the model. The structure of the input layer is quite vital in achieving the desired results from the model. Where necessary, preprocessing steps like scaling and noise reduction are applied to the input data. In the convolution layer, also known as the transformation layer, the convolution procedure is applied to the data through filters that act as feature selectors. Filters can be predetermined or randomly generated. The outcome of the convolution process generates a feature map of the input data. This step is repeated for all images and all filters, and the features are generated at the end of this process. If the input images have 3 channels (RGB), the convolution process is applied to every channel. After the convolution step, the output dimensions obtained by applying an m × m filter to an image of dimension n × n are given by Eq. (2):

$n_x, n_y = \frac{n + 2p - m}{s} + 1 \quad (2)$
where p is the padding size and s is the stride. The pooling layer is used to reduce the number of feature vectors passed to the next convolution layer. Max pooling and average pooling are the most commonly used approaches in the pooling layer. In pooling, a × b-sized windows are advanced by a certain step to create a new image by fetching the maximum value (max-pooling) or the average value (average-pooling) of the kernel. The fully connected layer follows after multiple consecutive convolution and pooling layers. In a fully connected layer, the output from the pooling layer is reduced to one-dimensional data. In this layer all neurons are connected to each other, hence the name fully connected layer. Here the categorization procedure is carried out, and activation functions such as ReLU are used. The dense layer is followed by a softmax activation function (Eq. 3) to compute the probability scores for all classes and assign inputs to their respective classes:

$\mathrm{Softmax}(m_i) = \frac{\exp(m_i)}{\sum_j \exp(m_j)} \quad (3)$
A dropout regularization technique is used (mostly with a keep probability of 0.5) to avoid overfitting the training dataset. Dropout randomly drops neurons
in the network during each iteration of training to reduce the variance of the model and simplify the network, which aids in the prevention of overfitting. The CNN architecture consists of two main parts. In the first part, convolutional and pooling layers are used for the extraction of features. In the second part, fully connected layers are used for classification, and activation layers introduce nonlinearity into the network to perform problem-specific classification.
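To make this two-part structure concrete, the following is a minimal Keras sketch (illustrative only, not the authors' exact architecture) combining convolution/pooling layers for feature extraction with fully connected, dropout, and softmax layers for classification:

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 75  # one class per butterfly species in this dataset

model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),             # RGB input images
    layers.Conv2D(32, 3, activation="relu"),      # feature extraction
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                             # reduce to one dimension
    layers.Dense(128, activation="relu"),         # fully connected layer
    layers.Dropout(0.5),                          # keep probability of 0.5
    layers.Dense(num_classes, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])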
3.3 Transfer Learning
In machine learning, when a model pretrained on one dataset is reused to solve a similar or different kind of new problem, this is called Transfer Learning [5]. Through transfer learning, we fundamentally try to utilize what has been learned in one task to improve the understanding of concepts in another. Weights learned by a network that executed "task A" are transferred to a network executing a new "task B." In image processing, neural networks typically detect boundaries in the first layers, shapes in the middle layers, and problem-specific features in the final layers. The initial and middle layers are reused in transfer learning, and the final layers are the only layers that are retrained. Nowadays transfer learning is becoming more and more popular and widely accepted in computer vision and natural language processing due to the reduced training time, improved model performance, and the lack of availability of large amounts of data [33]. The most popular models trained with transfer learning (AlexNet [5], LeNet [6], GoogLeNet [7], ZFNet [8], VGGNet [9], EfficientNet [10], ResNet [12], and DenseNet [11]) have been applied to ImageNet [30], which has more than 1.2 million images, and have achieved very good accuracy. Among these, in the current work the authors used the VGGNet [9], EfficientNet [10], and DenseNet [11] algorithms to benchmark and compare the results for identification and detection of different butterfly species.
VGGNet
VGGNet is the convolutional neural network architecture developed by Simonyan and Zisserman in their 2014 paper [9]. This architecture is characterized by its simplicity, using 3 × 3 convolutional layers stacked on top of each other in increasing depth. Max pooling is used to decrease the number of feature vectors by reducing volume size. Two fully connected layers, each with 4,096 nodes, are followed by a softmax activation function to classify the images. The VGG architecture is shown in Fig. 3. VGG16 and VGG19 are the two released VGG models; 16 and 19 stand for the number of weight layers in the network. Here, both VGG16 and VGG19 models are used to train on the data. VGG models require 224 × 224 RGB input images.
Fig. 3 VGG architecture [34]
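As an illustration of the transfer-learning setup described above (a sketch under stated assumptions, not the authors' exact training code), a pretrained VGG16 base can be frozen and only a new classification head retrained:

from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained initial and middle layers fixed

model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),   # fully connected layer as in VGG
    layers.Dense(75, activation="softmax"),  # retrained final layer, 75 species
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])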
EfficientNet
EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient. It was proposed by Tan and Le in their 2020 paper [10]. So far, eight EfficientNet variants, EfficientNet-B0 to B7, have been introduced. The variants differ in the total number of layers used in each of them; EfficientNet-B0 uses 237 layers and EfficientNet-B7 uses 813. Here, the EfficientNet-B3 and EfficientNet-B7 algorithms are used to train on the butterfly dataset. Figures 4, 5, and 6 give detailed information about how the many layers are connected to each other in EfficientNet-B3 and B7. Figure 4 shows the stem and final layers of the network. Figure 5 shows the groups of modules that are linked to each other and replicated in the network. In Fig. 6, ×2, ×3, ×4, ×5, ×8, and ×11 signify that the modules within the brackets recur that many times, respectively.
DenseNet
In a DenseNet architecture, each layer is connected directly to every other layer, hence the name Densely Connected Convolutional Network. For 'N' layers, there are N(N + 1)/2 direct connections. In every layer, the feature maps of all the earlier layers are not summed but concatenated and used as input to the next layer. DenseNets are
Fig. 4 Stem and final layers of EfficientNet [35]
Fig. 5 Modules of EfficientNet [35]
Fig. 6 The architectures of EfficientNet [35] [EfficientNetB7]
separated into DenseBlocks, where the dimensions of the feature maps remain constant inside a block, while the number of filters changes between blocks [11]. The layers joining different dense blocks, called Transition Layers, halve the number of channels by carrying out convolution and pooling. Each DenseNet architecture consists of four DenseBlocks with different numbers of layers. In this paper the authors used DenseNet201 to train the model to classify the butterfly images; it has 6, 12, 48, and 32 layers in its four dense blocks, as shown in Fig. 7.
Fig. 7 DenseNet architecture [11]
4 Results and Discussion
In this study, the authors used different deep learning models to classify the images in the given dataset. Fine-tuned transfer learning approaches are used for the categorization of the butterfly images. The transfer learning-based convolutional neural network algorithms VGG16, VGG19, EfficientNet-B3, EfficientNet-B7, and DenseNet201 are used to train on the butterfly dataset. A total of 9660 images belonging to 75 different species of butterfly are used to train the models, and 375 images belonging to the same species were set aside for model validation. To enlarge the dataset, automatic data augmentation techniques have been used: randomly rotating the images by a small amount, horizontal flipping, and vertical and horizontal shifting of images. The deep learning framework Keras/TensorFlow is used to train all the models, on a laptop with an Intel Xeon E3-1505M v5 processor. To evaluate the performance of the classifiers, the authors used an accuracy metric derived from the confusion matrix. Epoch-wise graphical representations of training and validation accuracy and loss are shown in Figs. 1, 2, 3, 4, and 5 for each of the algorithms in the Appendix. Different numbers of epochs were used for different models to obtain better accuracy. The accuracy of all the CNN models is provided in Table 1. The DenseNet201 and EfficientNet-B3 models achieved higher accuracy on the validation dataset than the other models.
Table 1 CNN results comparison

CNN model          Train accuracy (%)   Validation accuracy (%)
VGG16              91.69                91.20
VGG19              74.12                87.20
EfficientNet-B3    99.63                97.86
EfficientNet-B7    99.64                96.53
DenseNet201        99.10                97.87
There is not much difference between the validation accuracies of DenseNet201 (97.87%) and EfficientNet-B3 (97.86%), even though the training accuracy is comparatively higher for EfficientNet-B3 than for DenseNet201. EfficientNet-B7 also achieved a good validation accuracy of 96.53%, higher than the VGG models but lower than DenseNet201 and EfficientNet-B3. Among all the models, both training and validation accuracy were significantly lower for VGG19, in spite of training for a higher number of epochs. Overall, the DenseNet model performs best, followed by the EfficientNets and the VGGs.
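A sketch of the augmentation and training setup described above, assuming Keras' ImageDataGenerator; the directory names and epoch count are hypothetical stand-ins, not the authors' exact configuration:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir, val_dir = "butterflies/train", "butterflies/valid"  # hypothetical paths

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,        # small random rotations
    horizontal_flip=True,     # horizontal flipping
    width_shift_range=0.1,    # horizontal shifting
    height_shift_range=0.1,   # vertical shifting
)
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory(train_dir, target_size=(224, 224),
                                           class_mode="categorical")
val_data = val_gen.flow_from_directory(val_dir, target_size=(224, 224),
                                       class_mode="categorical")

# model built as in the earlier sketches (e.g., a DenseNet201 or VGG16 base)
history = model.fit(train_data, validation_data=val_data, epochs=30)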
5 Conclusion
In this paper, the authors used butterfly images, some from Kaggle and some downloaded from freely available sources on the internet. The image classes were confirmed by an entomologist. The butterfly images are classified with deep learning models without using any hand-crafted feature extraction approaches. We used pretrained models and carried out transfer learning to train on our data. Comparison and evaluation of the results obtained from five different algorithms are performed. According to the accuracy metrics, the highest accuracy was attained by DenseNet201, followed by the EfficientNet-B3 architecture. We have achieved more than 90% accuracy for both training and validation data for all the algorithms except VGG19. In conclusion, we found that the transfer learning method can be successfully employed on a variety of agricultural datasets such as butterfly images. In our future work, we are planning to use these pretrained models to develop mobile applications for identifying and classifying different classes of butterflies. We are also planning to increase the number of butterfly species to train and classify.
Grant/Funding The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
References
1. Chapman AD (2006) Numbers of living species in Australia and the World. Canberra, Aust Biol Resour Study. ISBN 978-0-642-56850-2
2. Powell JA (2009) Lepidoptera: moths, butterflies. In: Resh VH, Cardé RT (eds) Encyclopedia of insects. Academic Press, Massachusetts, pp 559–587
3. Stork NE (2018) How many species of insects and other terrestrial arthropods are there on Earth? Annu Rev Entomol 63:31–45. https://doi.org/10.1146/annurev-ento-020117-043348
4. Slager BH, Malcolm SB (2015) Evidence for partial migration in the southern monarch butterfly, Danaus erippus, in Bolivia and Argentina. Biotropica 47(3):355–362
5. Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60:84–90. https://doi.org/10.1145/3065386
6. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551. https://doi.org/10.1162/neco.1989.1.4.541
7. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv:1409.4842. https://doi.org/10.48550/arXiv.1409.4842
8. Zeiler M, Fergus R (2013) Visualizing and understanding convolutional networks. Eur Conf Comput Vis 8689:818–833
9. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv eprint arXiv:1409.1556
10. Tan M, Le QV (2020) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv eprint arXiv:1905.11946
11. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2018) Densely connected convolutional networks. arXiv eprint arXiv:1608.06993
12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
13. Almryad AS, Kutucu H (2020) Automatic identification for field butterflies by convolutional neural networks. Eng Sci Technol Int J 23(1):189–195. https://doi.org/10.1016/j.jestch.2020.01.006
14. Arzar NNK et al (2019) Butterfly species identification using convolutional neural network (CNN). In: IEEE international conference on automatic control and intelligent systems (I2CACIS). IEEE
15. Kaya Y, Kayci L (2014) Application of artificial neural network for automatic detection of butterfly species using color and texture features. Vis Comput 30:71–79. https://doi.org/10.1007/s00371-013-0782-8
16. Skreta M, Luccioni S, Rolnick D (2020) Spatiotemporal features improve fine-grained butterfly image classification. Tackling Climate Change with Machine Learning, NeurIPS 2020. https://www.climatechange.ai/papers/neurips2020/63
17. Fauzi F, Permanasari AE, Setiawan NA (2021) Butterfly image classification using convolutional neural network (CNN). In: 2021 3rd international conference on electronics representation and algorithm (ICERA). IEEE. https://doi.org/10.1109/ICERA53111.2021.9538686
18. Tang H, Wang B, Chen X (2020) Deep learning techniques for automatic butterfly segmentation in ecological images. Comput Electron Agric 178:105739. https://doi.org/10.1016/j.compag.2020.105739
19. Lin Z, Jia J, Gao W, Huang F (2020) Fine-grained visual categorization of butterfly specimens at sub-species level via a convolutional neural network with skip-connections. Neurocomputing 384:295–313. https://doi.org/10.1016/j.neucom.2019.11.033
20. Bakri BA, Ahmad Z, Hatim S (2019) Butterfly family detection and identification using convolutional neural network for lepidopterology. Int J Recent Technol Eng 8(2S11). ISSN: 2277-3878
21. Chang Q, Qu H, Wu P, Yi J (2020) Fine-grained butterfly and moth classification using deep convolutional neural networks. Appl Sci 10(5):1681. https://doi.org/10.3390/app10051681
22. Prudhivi L, Narayana M, Subrahmanyam C, Krishna MG (2021) Animal species image classification. Mater Today Proc. https://doi.org/10.1016/j.matpr.2021.02.771
23. Yang Z, Yang X, Li M, Li W (2022) Automated garden-insect recognition using improved lightweight convolution network. Inf Process Agric. https://doi.org/10.1016/j.inpa.2021.12.006
24. Wang F, Wang R, Xie C, Zhang J, Li R, Liu L (2021) Convolutional neural network based automatic pest monitoring system using hand-held mobile image analysis towards non-site-specific wild environment. Comput Electron Agric 187:106268. https://doi.org/10.1016/j.compag.2021.106268
25. Tetila EC, Machado BB, Astolfi G, de Souza Belete NA, Amorim WP, Roel AR, Pistori H (2020) Detection and classification of soybean pests using deep learning with UAV images. Comput Electron Agric 179:105836. https://doi.org/10.1016/j.compag.2020.105836
26. Amarathunga DC, Grundy J, Parry H, Dorin A (2021) Methods of insect image capture and classification: a systematic literature review. Smart Agric Technol 1:100023. https://doi.org/10.1016/j.atech.2021.100023
27. Takimoto H, Sato Y, Nagano AJ, Shimizu KK, Kanagawa A (2021) Using a two-stage convolutional neural network to rapidly identify tiny herbivorous beetles in the field. Ecol Inform 66:101466. https://doi.org/10.1016/j.ecoinf.2021.101466
28. Prajwala TM, Pranathi A et al (2018) Tomato leaf disease detection using convolutional neural networks. In: Proceedings of 2018 eleventh international conference on contemporary computing (IC3), 2–4 August 2018, Noida, India
29. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
30. Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
31. https://www.kaggle.com/gpiosenka/butterfly-images40-species
32. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36:193–202. https://doi.org/10.1007/BF00344251
33. Sharma P (2022) Understanding transfer learning for deep learning. Accessed 25 February 2022. https://www.analyticsvidhya.com/blog/2021/10/understanding-transfer-learning-for-deep-learning/
34. Rosebrock A (2022) ImageNet: VGGNet, ResNet, inception, and xception with keras. Accessed 20 February 2022 [Online]. https://pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
35. Agarwal V (2022) Architectural details of all efficientnet models. https://towardsdatascience.com/complete-architectural-details-of-all-efficientnet-models-5fd5b736142
Chapter 28
Machine Learning Techniques in Intrusion Detection System: A Survey Roshni Khandait, Uday Chourasia, and Priyanka Dixit
1 Introduction
In information security, the keyword "security" represents a set of processes, techniques, and procedures which protect a system, network, program, or information from intruders as well as from illegal changes, modification, and elimination. In recent years, information security has seen significant advances as a computer systems discipline, focusing on the protection of sensitive information that could otherwise be corrupted, damaged, or destroyed [1]. Firewalls, antivirus software, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs) are just a few techniques that work together to block threats as well as detect security problems. As the use of the internet increases day by day, the number of connected systems increases, which enlarges the attack surface and hence the risk of attack. New and unseen cyber-attacks such as zero-day attacks, polymorphic malware, and other advanced persistent threats go undetected by traditional algorithms. This poses an increasing challenge to information security, as the types, variability, and number of threats keep growing. Machine learning technologies have developed quickly, with applications ranging from computer vision and image analysis to self-driving cars and facial recognition. AI techniques can be used in the cyber security field to provide better tools and techniques for attack defense [2]. Cyber attackers continue adopting alternate and more advanced methods to enhance the speed and volume of their strikes. On this basis, highly dynamic, responsive, and more sophisticated cybersecurity systems are required to handle the large number of malicious activities.

R. Khandait (B) · U. Chourasia · P. Dixit
University Institute of Technology, RGPV, Bhopal, M.P., India
e-mail: [email protected]
U. Chourasia
e-mail: [email protected]
P. Dixit
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_28

The use of artificial intelligence during
the last few years has increased, because AI techniques tend to play an important role in controlling and mitigating cyber threats. Sports, natural language processing, health care, manufacturing, and education are all instances of sectors that benefit from AI. This trend is impacting the field of information security too, in which machine learning is now used for both attack and defense in the digital world. On the offensive side, cybercriminals may use machine learning to enhance the intensity and range of their cyberattacks. On the defensive side, artificial intelligence is being used to improve defense methods so that defense processes are more complex, resilient, adaptable, and productive, including adaptability to environmental modifications in order to reduce the impact of attacks.
The paper is organized as follows. Section 2 provides background on intrusion detection and machine learning. Section 3 provides background related to deep learning. Section 4 provides a literature review of machine learning techniques. Section 5 presents a comparative analysis of machine learning algorithms. Section 6 concludes the paper.
2 Machine Learning
Intrusion Detection (ID) is a critical component of network security. It is described as a technology which keeps track of system and network operations in order to detect unusual as well as harmful activity, such as network attack attempts. The largest issue for ID is identifying attacks quickly, correctly, and efficiently. Traditional systems were built to detect well-known attacks but are incapable of detecting unexpected threats. Machine learning is the study of how to get computers to act without being explicitly programmed. Machine learning, data mining, and pattern recognition algorithms can be used in intrusion detection to discriminate between legitimate and malicious communications [3]. Machine learning is a sub-discipline of computer science which deals with creating systems that learn without being explicitly designed to do so; the process of teaching machines to perform tasks on their own is known as machine learning [3]. In the area of information security, machine learning is used to detect malware in encrypted communications, to predict internal risks, to predict which "bad neighbourhoods" are online so as to keep individuals safe while browsing, to secure information stored in the cloud, and to reveal suspicious user behavior. The cyber threat landscape requires the continuous surveillance and correlation of millions of external and internal data points across an organization's infrastructure and users. This volume of data is basically unmanageable for a small number of people with today's technology. Machine learning succeeds in this field because it can quickly detect patterns that indicate risks in huge data sets. By automating the analysis, cybersecurity experts can instantly recognize threats and events that need additional human study.
Fig. 1 IDS machine learning methods [3]
Machine learning algorithms are classified as supervised, semi-supervised, and unsupervised. Unsupervised machine learning is the process of training an algorithm with data which is neither classified nor labeled, and then letting the method act on that data in the absence of supervision; it is exemplified by clustering techniques, as shown in Fig. 1. Supervised learning, in contrast, is the machine learning task of learning a function which maps an input to an output based on sample input-output pairs; it is called supervised because the process of an algorithm learning from a labeled training dataset can be compared to a teacher supervising the learning process. Supervised learning includes regression and classification.
The machine learning technologies used in IDS can be categorized into shallow learning and deep learning. There are two forms of shallow learning algorithms: supervised shallow learning, which uses support vector machines and k-nearest neighbors, and unsupervised shallow learning, which uses clustering and association.
2.1 Support Vector Machine
Among machine learning techniques, the Support Vector Machine (SVM) has been one of the most precise, efficient, and robust methods [4]. Support Vector Regression (SVR) and Support Vector Classification (SVC) are the two most used variants. SVM is founded on the idea of decision boundaries: a decision border divides two groups with distinct class values. It handles both binary and multi-class problems. The main idea is to determine the best hyperplane, the one with the largest margin to the nearest support vectors; in the process, an input vector is assigned to one class or the other depending on which side of the hyperplane it falls. Kernel functions are used to separate data that cannot be separated linearly, as shown in Fig. 2.
Fig. 2 Support Vector Machine [4]
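A minimal scikit-learn sketch of SVC with a kernel function for non-linearly separable data (illustrative only; the synthetic data stands in for real network traffic features):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 200 samples, 10 features, binary labels
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0)    # RBF kernel handles non-linear boundaries
clf.fit(X_train, y_train)         # find the maximum-margin hyperplane
print(clf.score(X_test, y_test))  # classification accuracy on held-out data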
2.2 K Nearest Neighbor
The k-nearest neighbour (kNN) [4] supervised machine learning method is built on the basis of a distance function, which measures how similar or different two instances are. Between two instances x and y, d(x, y) is the standard Euclidean distance:

$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$

where $x_k$ and $y_k$ are the k-th feature elements of instances x and y, as shown in Fig. 3.
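The Euclidean distance above can be written directly, for example with NumPy (a small illustrative sketch):

import numpy as np

def euclidean_distance(x, y):
    # Standard Euclidean distance between two feature vectors
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

# kNN assigns a query point the majority class of its k nearest neighbours
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # -> 5.0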
2.3 Decision Tree
A decision tree is a machine learning model used to predict outcomes [4]. The model represents a mapping between object attributes and object values: each internal node of the tree represents an attribute test, and each branch represents a test outcome. CART, C4.5, and ID3 are the most often used decision tree models, as shown in Fig. 4.
Fig. 3 k-nearest neighbor [3]
Fig. 4 Decision tree [4]
3 Deep Learning
Deep learning has two categories: supervised deep learning and unsupervised deep learning. Supervised deep learning uses convolutional feed-forward deep neural networks and fully connected feed-forward deep neural networks, while unsupervised deep learning uses stacked autoencoders and deep belief networks.
3.1 Deep Belief Network
A deep belief network (DBN) is a statistical machine learning model whose architecture is made up of numerous hidden layers of stochastic variables [4]. The model is a generative probabilistic model with numerous hidden layers and stochastic variables. Stacking a large number of RBMs, so that the activations of one RBM feed the next, allows the hidden layers to be trained efficiently. The Restricted Boltzmann Machine (RBM) and the DBN are connected during the training stages; an RBM is a Boltzmann machine (BM) with a restricted topological structure, as shown in Fig. 5.
3.2 Recurrent Neural Network
Recurrent neural networks (RNNs) [5] are good at learning sequences, especially with gated topologies and unit constructions such as long short-term memory. The recurrent neural network is a widely used technology for classification and other types of analysis on sequential data. LSTM, a subset network of RNN, is a significant approach for machine learning tasks that use sequential data. RNNs are called recurrent because they perform a similar task for each element in a sequence, with the outcome depending upon past calculations, as shown in Fig. 6.
Fig. 5 Deep belief network [4]
Fig. 6 Recurrent neural network [5]
3.3 Long Short Term Memory
Long short-term memory (LSTM) [5] is a recurrent network which has been proposed as one of the artificial intelligence strategies to solve a wide range of sequential data challenges. LSTM aids in the preservation of the error signal, which can propagate backwards through time as well as across layers [6]. It is a significant approach in the execution of machine learning tasks that use sequential data, and it is a subset network of RNN. The hidden layer's units are all replaced by memory blocks, as shown in Fig. 7.
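A minimal Keras sketch of an LSTM classifier for sequential data (illustrative; the input shape of 100 timesteps × 41 features loosely mirrors KDD-style records and is an assumption, not any surveyed paper's exact setup):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100, 41)),           # (timesteps, features per step)
    layers.LSTM(64),                        # memory blocks replace plain hidden units
    layers.Dense(1, activation="sigmoid"),  # normal vs. attack
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])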
3.4 Convolutional Neural Network
A convolutional neural network (CNN) is a frequently used type of artificial neural network based on the deep learning paradigm. The most common applications of convolutional neural networks are image recognition and audio analysis; CNNs gained popularity in the area of speech analysis as well as in the picture recognition field.
The organization of its weight-sharing network is close to that of a biological neural network, decreasing model complexity as well as the number of weights. Input, convolution, pooling, fully connected, dropout, activation function, and classification layers make up the CNN architecture [7]. Input data is passed through kernels, pooling, fully connected, and softmax functions in a CNN. The classification layer will classify data into one of two categories, 0 or 1, as shown in Fig. 8.
· Convolution Layer: It extracts features from the input data. The layer performs a convolution operation between an M × M filter and the input data, generating a feature map output from the dot products of the filter and the input [8].
· Pooling Layer: The pooling layer in the CNN design reduces computational expense by reducing the size of the convolved feature map. Pooling operations can be divided into three categories: minimum, average, and maximum. It bridges the convolution layers and the fully connected layer [9].
· The fully connected layer: It is utilized to link the neurons of two different levels and contains neurons as well as weights and biases [10].
· Dropout layer: This is used to alleviate the training dataset's overfitting issues [11].
Fig. 7 Long short term memory [5]
Fig. 8 Convolutional neural network [13]
· Activation function: Activation functions are applied to learn and approximate the complicated relationships among network variables, introducing nonlinearity. The sigmoid and softmax functions are preferred for binary classification CNN models, while softmax is typically used for multi-class classification [13].
4 Literature Survey
Yin et al. [6] proposed an RNN-based intrusion detection system constructed with deep learning. The performance was measured using binary and multiclass classification, varying the number of layers and neurons, and compared against strategies such as ANN, RF, SVM, and other machine learning algorithms that had previously been used. The experimental results showed that the suggested intrusion detection model is suitable for modeling complicated classification problems with high accuracy and performance, and that it outperforms existing machine learning classification approaches in binary and multiclass classification. The proposed model ensures enhanced intrusion detection accuracy and also provides a better intrusion detection strategy.
Shone et al. [7] introduced the nonsymmetric deep autoencoder (NDAE) for unsupervised feature learning, a novel deep learning technique for intrusion detection, together with a deep learning classification model built from stacked NDAEs. The classifier was tested using the benchmark KDD Cup'99 and NSL-KDD datasets and was implemented in TensorFlow with GPU support. The model demonstrated effectiveness, indicating improvements over existing methodologies and high potential for application in modern NIDSs.
Staudemeyer et al. [12] showed that all attacker categories hidden in the KDD Cup'99 training dataset may be learned using an LSTM-RNN model. They tested the system on all features as well as on selected minimal features from the dataset, using decision trees and backward elimination for feature selection. They calculated the ROC curve and the related AUC value to highlight the model's performance.
Niyaz et al. [13] introduced a deep learning-based IDS model adopting self-taught learning (STL), with trials analyzed on the NSL-KDD dataset for network intrusion detection. The proposed approach improves on other conventional approaches used in earlier research. Characteristics such as accuracy, precision, recall, and f-measure values were evaluated.
Niyaz et al. [14] introduced a DL-based multi-vector DDoS detection system in an SDN environment. SDN allows network devices to be programmed to perform various tasks. The suggested system is a network application that runs on top of a software-defined network controller, with deep learning used for feature selection and categorization. The results revealed good accuracy with a reduced false-positive rate for threat detection.
Radford et al. [15] used recurrent neural networks and long short-term memory to capture the complexities and nuances of network communication as a language model: a model learned separately for each network but generalized to normal computer-to-computer communication within and outside the network. On the benchmark ISCX IDS dataset, they achieved excellent unsupervised threat identification performance (AUC 0.84). The experiments showed that the proposed technique is outstanding compared with existing ones.
Farid et al. [16] developed a representation for intrusion detection which employs one of the most effective learning methods, naïve Bayesian techniques, to classify data. The overall implementation of the developed intrusion detection method was evaluated on 10 percent of the KDD Cup'99 dataset. The results of experiments revealed a high level of accuracy with a minimal number of false positives.
Alsahli et al. [17] considered how the resurgence of the worldwide network known as the internet of things, revolutionizing how things are linked together, is realized through wireless sensor networks (WSNs), which offer new security threats for IT researchers. The security problems in WSN are addressed in their study by proposing possible automated ways of recognizing attacks. It also compares the contribution of different machine learning algorithms on two datasets, the KDD99 and WSN datasets. The goal is to analyze and secure WSN networks using firewalls, Deep Packet Inspection (DPI), and Intrusion Prevention Systems (IPS), all of which are specialized for WSN network protection. Cross-validation and percentage split were two of the testing strategies used. Based on the findings, the most accurate method and the quickest processing time for both datasets were recommended.
da Costa et al. [18] noted that concerns about computer network security and privacy are becoming more widespread in today's world because of the pervasive use of information systems in everyday life; computer security has become a necessity. As the internet penetration rate grows and modern inventions arise, like the internet of things, new and existing attempts to breach computer systems and networks persist, and companies are investing more in research to improve the detection of these assaults. By evaluating trade-offs between cost and accuracy, institutions are choosing intelligent methods for monitoring and verification. This research therefore concentrates on the most up-to-date, state-of-the-art publications on machine learning approaches for the internet of things and intrusion detection for network security. The work aims to conduct a contemporary and in-depth review of major works dealing with a number of expert systems and their intrusion detection designs in networked computers, with a focus on the internet of things and deep learning.
Sultana et al. [19] observed that, because of the advent of programmable elements, software-defined networking (SDN) technology offers an opportunity for effective detection and prevention of malicious security threats. To guard computer networks and address network security concerns, machine learning methods have recently been incorporated into SDN-based network intrusion detection systems; in the context of SDN, a stream of advanced machine learning methodologies, deep learning (DL), is beginning to develop. They studied various existing works on machine learning techniques which use SDN to create an NIDS,
and looked closely at deep learning methods for building SDN-based NIDS designs in an SDN network.
Their study concludes with a discussion of open issues in using machine learning to implement NIDS, and of upcoming projects.
Liu and Lang [20] noted that in today's world networks are critical, and cyber security has risen as a crucial field of research. An intrusion detection system (IDS), a vital cyber security mechanism, keeps track of the state of the network's software and hardware. Present intrusion detection systems still confront difficulties in boosting detection performance, decreasing the false positive rate, and identifying unexpected threats, even after decades of development. Many academics have concentrated on building IDSs that use machine learning methods to overcome the difficulties mentioned above: with great accuracy, machine learning systems can automatically determine the essential differences between normal and anomalous data, and machine learning algorithms are also good at detecting threats due to their high transferability. Deep learning, a branch of machine learning, has already achieved impressive results and has become a research hub. The study suggests an IDS classification system which categorizes and summarizes the machine learning and deep learning-based IDS literature using data objects as the primary dimension; this taxonomy, the authors assume, is suitable for cyber security experts. The survey starts by defining IDSs and their typology. The machine learning techniques, metrics, and standard datasets that are often employed in IDSs are then discussed. Using the suggested system as a baseline, they demonstrate how to use machine learning and deep learning approaches to solve important IDS concerns, with the relevant research as guidance. Finally, current relevant research is used to discuss obstacles and expected developments.
Haq et al. [21] observed that network security is one of the most critical problems of the present day. With the rapid expansion and extensive usage of the internet over the last decade, network security vulnerabilities have become a key problem. Suspicious activity and abnormal attacks on network security are discovered via intrusion detection systems, and many studies on intrusion detection systems have been performed in recent years. Therefore, in order to gain a better understanding of the state of machine learning strategies for handling intrusion detection issues, the survey aggregated 49 studies from 2009 to 2014 which concentrated on the structure of single, hybrid, and ensemble classifier models. A statistical comparison of classification methods, datasets, and other experimental settings is also included in the survey, as well as an analysis of the feature selection process.
Kishor Wagh et al. [22] noted that in today's society virtually everyone has access to a computer, and internet software is continually developing; as a result, network security is becoming a crucial component of any computer system. An intrusion detection system (IDS) is a technology that detects system threats and categorizes them as normal or anomalous. Intrusion detection systems, which play a vital role in detection and prevention, have been given support by machine learning approaches. This research examined various machine learning methods for intrusion detection. The system architecture of an IDS is also investigated in this work in order to minimize false detection rates and increase intrusion detection accuracy.
5 Comparative Analysis of Machine Learning Algorithms
Table 1 lists the different types of attack categories in the NSL-KDD data set. The attacks present in NSL-KDD are U2R, R2L, Probe, and DoS. Table 2 describes the features, training sets, and test sets of the KDD-CUP99, UNSW-NB15, and NSL-KDD data sets. The KDD-CUP99 data set consists of 4,898,431 instances and 41 features in the training set and 311,027 instances and 41 features in the test set. The NSL-KDD data set consists of 125,973 instances and 42 features in the training set and 22,544 instances and 42 features in the test set. The UNSW-NB15 data set consists of 175,341 instances and 49 features in the training set and 82,332 instances and 49 features in the test set.
Table 3 presents a comparative analysis of intrusion detection systems using machine learning techniques. The parameters applied in the comparison are the algorithm used, accuracy, F1-score, and precision. The SVM algorithm uses the NSL-KDD dataset with an F1-score of 0.78, 82.2% accuracy, and 74% precision. The PSO-SVM algorithm uses the KDD-CUP99 dataset with 99% accuracy and 84.1% precision. The kFN-KNN algorithm uses the NSL-KDD dataset with an F1-score of 0.82, 99% accuracy, and 98% precision. The Multi-DT algorithm uses the KDD-CUP99 dataset with an F1-score of 0.81, 91.95% accuracy, and 99.90% precision. The DBN algorithm uses the KDD-CUP99 dataset with 93.50% accuracy and 92.34% precision. The LSTM algorithm uses the KDD-CUP99 dataset with an F1-score of 0.84, 96.94% accuracy, and 98.8% precision. The CNN algorithm uses the UNSW-NB15 dataset with an F1-score of 0.85, 98.2% accuracy, and 98.1% precision. The RNN algorithm uses the KDD-CUP99 dataset with an F1-score of 0.73, 84.45% accuracy, and 77.55% precision.
Table 1 Attack types in NSL-KDD

Attack category   Attack types
R2L               phf, warezclient, spy, multihop, ftp_write, imap, guess_password, warezmaster [31]
DoS               teardrop, back, pod, smurf, land, Neptune
U2R               perl, loadmodule, buffer_overflow, rootkit
Probe             ipsweep, satan, nmap, portsweep
Table 2 Data set description

Data set           Training set (instances, features)   Test set (instances, features)
KDD-CUP99 [32]     4,898,431, 41                        311,027, 41
NSL-KDD [31]       125,973, 42                          22,544, 42
UNSW-NB15 [33]     175,341, 49                          82,332, 49
Table 3 Machine learning algorithms with parameters

Authors                     Methods    Precision (%)   F1-score   Accuracy (%)   Data set
Farid et al. [23]           SVM        74              0.78       82.2           NSL-KDD
Shapoorifard et al. [24]    kFN-KNN    98              0.82       99             NSL-KDD
Malik et al. [25]           Multi-DT   99.90           0.81       91.95          KDD-CUP99
Gao et al. [26]             DBN        92.34           –          93.50          KDD-CUP99
Kim et al. [27]             LSTM       98.8            0.84       96.94          KDD-CUP99
Yu et al. [28]              CNN        98.2            0.85       98.1           UNSW-NB15
Raajan et al. [29]          RNN        77.55           0.73       84.45          KDD-CUP99
Saxena et al. [30]          PSO-SVM    84.1            –          99             KDD-CUP99
6 Conclusion
In the area of cybersecurity, AI approaches can be applied to create improved tools and techniques for attack defense. To launch faster and larger cyberattacks, hackers are adopting new and sophisticated methods. Machine learning-based intrusion detection systems are effective at analyzing attacks by drawing on knowledge from attack-detection mechanisms. ML algorithms in IDSs analyze data to locate insider threats, identify suspicious user activity, secure data in the cloud, predict dangerous areas, keep online individuals safe while browsing, and detect encrypted traffic malware in networks. The identification of intrusions and the accuracy of dynamic attack detection have both improved thanks to machine learning methods. This paper presented a comprehensive literature review of machine learning algorithms for IDSs.
Acknowledgements I take this opportunity to gratefully acknowledge the assistance of my college faculty guides, Prof. Uday Chourasia and Prof. Priyanka Dixit, who had faith in me. I am also very grateful to UIT-RGPV, DoCSE, for motivating the students.
References
1. Gümüsbas D, Yıldırım T, Genovese A, Scotti F (2020) A comprehensive survey of databases and deep learning methods for cybersecurity and intrusion detection systems. IEEE Syst J 1–15
2. Wiafe I, Koranteng FN, Obeng EN, Assyne N, Wiaef A, Gulliver SR (2020) Artificial intelligence for cybersecurity: a systematic mapping of literature. IEEE Access 8:146598–146613
3. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C (2018) Machine learning for security and the internet of things: the good, the bad, and the ugly. IEEE Access 6:35365–35382
4. Wiafe I, Koranteng FN, Obeng EN, Assyne N, Wiaef A, Gulliver SR (2020) Machine learning and deep learning methods for cybersecurity. IEEE Access 8:146598–146613
5. Muhuri PS, Chatterjee P, Yuan X, Roy K, Esterline A (2020) Using a long short-term memory recurrent neural network (LSTM-RNN) to classify network attacks. MDPI J 243–262
6. Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 7:21954–21962
7. Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50
8. Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Al-Nemrat A, Venkatraman S (2019) Deep learning approach for intelligent intrusion detection system. IEEE Access 7:41525–41551
9. Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5:21954–21961. https://doi.org/10.1109/ACCESS.2017.2762418
10. Vinayakumar R, Soman KP, Poornachandran P (2017) Applying convolutional neural network for network intrusion detection. In: Proceedings of the international conference on advances in computing, communications, and informatics (ICACCI), Udupi, India, 13–16 September 2017. https://doi.org/10.1109/ICACCI.2017.8126009
11. Staudemeyer RC, Omlin CW (2013) Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data. In: Proceedings of the South African institute for computer scientists and information technologists conference; Association for Computing Machinery, New York, NY, USA, pp 218–224. https://doi.org/10.1145/2513456.2513490
12. Staudemeyer RC (2015) Applying long short-term memory recurrent neural networks to intrusion detection. S Afr Comput J 56:136–154. https://doi.org/10.18489/sacj.v56i1.248
13. Niyaz Q, Sun W, Javaid AY, Alam M (2016) A deep learning approach for network intrusion detection system. ICST, pp 1–6
14. Niyaz Q, Sun W, Javaid AY (2016) A deep learning based DDoS detection system in software-defined networking (SDN)
15. Radford BJ, Richardson BD, Davis SE (2018) Sequence aggregation rules for anomaly detection in computer network traffic, pp 1–13
16. Farid D, Rahman MZ, Rahman CM (2011) Adaptive intrusion detection based on boosting and naïve Bayesian classifier. Int J Comput Appl 24(1):12–19
17. Alsahli MS, Almasri MM, Al-Akhras M, Al-Issa AI, Alawairdhi M (2021) Evaluation of machine learning algorithms for intrusion detection system in WSN. Int J Adv Comput Sci Appl 12(5):617–626. https://doi.org/10.14569/IJACSA.2021.0120574
18. da Costa KAP, Papa JP, Lisboa CO, Munoz R, de Albuquerque VHC (2019) Internet of Things: a survey on machine learning-based intrusion detection approaches. Comput Netw 151(2):147–157. https://doi.org/10.1016/j.comnet.2019.01.023
19. Sultana N, Chilamkurti N, Peng W, Alhadad R (2019) Survey on SDN based network intrusion detection system using machine learning approaches. Peer-to-Peer Netw Appl 12(2):493–501. https://doi.org/10.1007/s12083-017-0630-0
20. Liu H, Lang B (2019) Machine learning and deep learning methods for intrusion detection systems: a survey. Appl Sci 9(20):3–28. https://doi.org/10.3390/app9204396
21. Haq NF, Onik AR, Hridoy AK, Rafni M, Shah FM, Farid D (2015) Application of machine learning approaches in intrusion detection system: a survey. Int J Adv Res Artif Intell 4(3):9–18
22. KishorWagh S, Pachghare VK, Kolhe SR (2013) Survey on intrusion detection system using machine learning techniques. Int J Comput Appl 78(16):30–37. https://doi.org/10.5120/136081412
23. Farid DM, Pervez MS (2014) Feature selection and intrusion classification in NSL-KDD CUP 99 dataset employing SVMs. In: Proceedings of 8th international conference on software knowledge, information management and application (SKIMA), pp 1–6
24. Shapoorifard H, Shamsinejad P (2017) Intrusion detection using a novel hybrid method incorporating an improved KNN. Int J Comput Appl 173(1):5–9
25. Malik AJ, Khan FA (2017) A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection. Clust Comput 2(3):1–14
26. Gao N, Gao L, Gao Q, Wang H (2014) An intrusion detection model based on deep belief networks. In: Proceedings of 2nd international conference on advanced cloud big data, pp 247–252
27. Kim J, Kim J, Thu HLT, Kim H (2016) Long short term memory recurrent neural network classifier for intrusion detection. In: Proceedings of international conference on platform technology and service, pp 1–5
28. Yu Y, Long J, Cai Z (2017) Network intrusion detection through stacking dilated convolutional autoencoders. Secur Commun Netw 2(3):1–10
29. Raajan NR, Krishnan RB (2017) An intellectual intrusion detection system model for attacks classification using RNN. Int J Pharm Technol 8(4):23157–23164
30. Saxena H, Richariya V (2014) Intrusion detection in KDD99 dataset using SVM-PSO and feature reduction with information gain. Int J Comput Appl 98(6):25–29
Chapter 29
A Machine Learning Approach for Honey Adulteration Detection Using Mineral Element Profiles Mokhtar A. Al-Awadhi
and Ratnadeep R. Deshmukh
1 Introduction Honey is a natural food with high economic value making it a target for adulteration with inexpensive industrial sugar syrups. Honey adulteration aims to increase its volume and gain fast profits. Adulteration of honey decreases its quality and has economic and health repercussions [1]. Detecting adulterated honey is a challenging problem for consumers since it is difficult to detect adulteration using taste or smell. Several studies have used various technologies for detecting adulteration in honey. Among these technologies are traditional approaches, such as isotopic analysis [2], chromatographic analysis [3], and honey physicochemical parameter analysis [4]. These classical detection methods are time-consuming and require sample preparation and skillful professionals. Besides, the traditional approaches fail to detect adulteration with some industrial sugar syrup. Several researchers used modern methods for detecting honey adulteration. These techniques include various spectroscopic technologies [5–8], hyperspectral imaging [9], electronic nose and electronic tongue [10], and optic fiber sensors [11]. Although the modern techniques have overcome some of the limitations of the traditional methods, they are less accurate and do not provide information about the chemical composition of honey.
M. A. Al-Awadhi (B) · R. R. Deshmukh Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India e-mail: [email protected] R. R. Deshmukh e-mail: [email protected] M. A. Al-Awadhi Department of Information Technology, Faculty of Engineering and Information Technology, Taiz University, Taiz, Yemen © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-7892-0_29
379
380
M. A. Al-Awadhi and R. R. Deshmukh
A few studies have investigated utilizing minerals with ML models to detect adulteration in honey. Therefore, there is a need to develop robust ML-based techniques for discriminating between authentic and adulterated honey. This study aims to develop an accurate ML model for detecting honey adulteration using mineral element data. In the present paper, we use the Random Forest (RF) algorithm for discriminating between pure and adulterated honey. Besides, we compare the performance of the RF model to the performance of other ML models, such as Logistic Regression (LR) and Decision Tree (DT). The dataset [12] used in the present study contains measurements of various mineral elements in samples of pure and adulterated honey. The botanical sources of the honey specimens are acacia, chaste, jujube, linden, rape, and Triadica Cochinchinensis (TC). The measured minerals are aluminum, boron, barium, calcium, iron, potassium, magnesium, manganese, sodium, phosphorus, strontium, and zinc. The mineral element content was measured using inductively coupled plasma optical emission spectrometry. The falsified honey samples were obtained by mixing pure honey with sugar syrup at different concentrations. Since some mineral elements were not detected in some sugar syrup and adulterated honey samples, they were labeled as Not Detected (ND), representing missing values in the dataset. The dataset consists of 429 instances and 13 variables, including the class labels. The dataset contains three class labels representing pure honey, sugar syrup, and adulterated honey. Figure 1 depicts the distribution of honey samples in the dataset according to their botanical origins. There are two previous works on this dataset. Liu et al. [13] used Partial Least Squares-Discriminant Analysis (PLS-DA) to detect adulteration in five monofloral honey types and one multi-floral honey type. The study achieved a classification accuracy of 93% for the mono-floral honey and 87.7% for the multi-floral honey. Templ et al. [14] used Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN) to classify the honey samples in the dataset into pure and impure. The study reported that the ANN classifier obtained the lowest misclassification rate. 80 60 40 20 0 Acacia
Chaste
Jujube
Pure Honey
Linden
Rape
TC
Adulterated Honey
Fig. 1 The distribution of honey samples in the dataset according to their botanical origins
2 Proposed System

The proposed approach for identifying adulterated honey, depicted in Fig. 2, comprises two main phases: preprocessing and classification. These two phases are described in detail in the following subsections.
2.1 Preprocessing

The dataset utilized in the present study contains instances with missing values, corresponding to the mineral elements that were not detected in some honey and sugar syrup samples. In this phase, we set the values of the missing attributes to zero. We then use the min–max normalization approach [15], described by Eq. (1), to scale each attribute in the dataset to the interval between zero and one, which helps improve the classification performance of ML models.

$$X_n = \frac{X_o - \min(X_o)}{\max(X_o) - \min(X_o)} \tag{1}$$

where $X_o$ and $X_n$ are the original and normalized values, respectively.
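A minimal sketch of this preprocessing phase, assuming the mineral dataset has been exported to a CSV file; the file name, label column, and handling of the "ND" marker are illustrative assumptions, not the authors' actual pipeline:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file and column names; "ND" marks undetected minerals in the source data.
df = pd.read_csv("honey_minerals.csv", na_values=["ND"])
df = df.fillna(0)  # undetected mineral elements are set to zero

mineral_cols = [c for c in df.columns if c != "label"]
scaler = MinMaxScaler()  # applies Eq. (1) independently to each attribute
df[mineral_cols] = scaler.fit_transform(df[mineral_cols])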
2.2 Classification

In this phase, we have used three different ML models to discriminate between authentic and adulterated honey: LR, DT, and RF. We chose these three models because they represent both linear and nonlinear classifiers and achieved the highest performance compared with other ML classification models.

Fig. 2 The proposed system's block diagram

LR is a predictive ML approach that solves classification tasks based on the concept of probability. The independent variables in this model are used to predict the dependent variable; they can be measured on a nominal, ordinal, interval, or ratio scale, while the dependent variable can have two or more categories and may have a linear or nonlinear relationship with the independent variables [16]. DT is a supervised machine learning algorithm that can solve both classification and regression problems. Decision tree-based classifiers are easy to interpret, work with numeric and categorical attributes, and can handle binary and multi-class problems [17]. RF is a supervised ensemble method capable of carrying out both classification and regression tasks. The RF algorithm constructs a large number of decision trees; once created, the trees vote for the most popular class [18].
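In scikit-learn terms, the three classifiers could be instantiated as follows; the hyperparameters are defaults chosen for illustration, since the paper does not report the settings used:

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),  # linear classifier
    "Decision tree": DecisionTreeClassifier(random_state=0),   # nonlinear, interpretable
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),  # ensemble of voting trees
}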
2.3 ML Models Performance Evaluation

To assess the efficiency of the ML models for discriminating between pure and adulterated honey, we used the tenfold cross-validation method, which helps guard against overfitting to a particular train–test split. The performance metrics used were precision, recall, and F1 score. We employed these metrics because the dataset utilized in this study is imbalanced, so classification accuracy alone is not sufficient for assessing the performance of the ML models. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual class. The F1 score is the harmonic mean of precision and recall; as a result, it takes both false positives and false negatives into account, making it a particularly useful statistic for imbalanced datasets. Figure 3 illustrates the process of evaluating the performance of the ML algorithms.
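The evaluation protocol can be sketched with scikit-learn's cross-validation utilities, continuing the preprocessing sketch above; the weighted averaging of the per-class metrics is an assumption made here because the dataset is imbalanced, not a choice stated in the paper:

from sklearn.model_selection import cross_validate

X = df[mineral_cols].values  # normalized mineral features from the preprocessing sketch
y = df["label"].values       # pure honey / sugar syrup / adulterated honey

scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=10, scoring=scoring)  # tenfold cross-validation
    print(name, {m: cv[f"test_{m}"].mean().round(3) for m in scoring})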
3 Results and Discussion

In the present paper, we evaluated the performance of the ML models for discriminating between authentic and adulterated honey using two methods. In the first method, a single ML model identifies adulteration in honey from all botanical sources. In the second method, six ML models separately detect adulteration in each of the six honey varieties.
3.1 General Model for Detecting Honey Adulteration

Fig. 3 ML models' performance evaluation approach

In this method, we used one ML model for classifying the samples in the dataset into pure honey, adulterated honey, and sugar syrup. We trained the model using all the samples in the dataset.
The objective was to assess the effectiveness of the ML models in distinguishing authentic honey samples from adulterated samples regardless of the honey's botanical origin. In the experiments, we divided the samples in the dataset into three classes: the first and second classes contain samples of pure and adulterated honey from all floral sources, and the third class comprises the sugar syrup samples.

Logistic Regression. Table 1 shows the performance metrics (precision, recall, and F1 score) of the logistic regression model for the three classes in the dataset. Table 2 displays the confusion matrix of the model. The confusion matrix gives more information about a predictive model's performance, including which classes are correctly predicted, which are wrongly predicted, and what types of errors are made. The results show that the logistic regression model perfectly discriminates between honey and sugar syrup. For discriminating between pure and impure honey, the logistic regression model achieved a recall of 73.1% for pure honey and 68.9% for adulterated honey.

Decision Tree. Table 3 displays the performance of the decision tree model for detecting adulteration in honey. The model's confusion matrix is shown in Table 4. The results show an excellent performance of the decision tree for discriminating between authentic honey, sugar syrup, and adulterated honey.

Table 1 Performance of logistic regression for detecting honey adulteration
Honey type | Precision | Recall | F1 score
Authentic honey | 0.721 | 0.731 | 0.726
Sugar syrup | 1.000 | 1.000 | 1.000
Adulterated honey | 0.700 | 0.689 | 0.694

Table 2 Confusion matrix of the logistic regression classifier

Classified as | Authentic honey | Sugar syrup | Adulterated honey
Authentic honey | 147 | 0 | 54
Sugar syrup | 0 | 45 | 0
Adulterated honey | 57 | 0 | 126

Table 3 Performance of decision tree for detecting honey adulteration

Honey type | Precision | Recall | F1 score
Authentic honey | 0.970 | 0.970 | 0.970
Sugar syrup | 1.000 | 1.000 | 1.000
Adulterated honey | 0.967 | 0.967 | 0.967

Table 4 Confusion matrix of the decision tree classifier

Classified as | Authentic honey | Sugar syrup | Adulterated honey
Authentic honey | 195 | 0 | 6
Sugar syrup | 0 | 45 | 0
Adulterated honey | 6 | 0 | 177
Random Forest. Table 5 displays the performance of the RF model for detecting honey adulteration. Table 6 displays the confusion matrix of the model. The findings show that the RF model obtained excellent performance for detecting adulterated honey, achieving a recall of 98.6%. Table 5 Performance of random forest for detecting honey adulteration
Honey type | Precision | Recall | F1 score
Authentic honey | 0.995 | 0.970 | 0.982
Sugar syrup | 1.000 | 1.000 | 1.000
Adulterated honey | 0.968 | 0.986 | 0.981

Table 6 Confusion matrix of the random forest classifier

Classified as | Authentic honey | Sugar syrup | Adulterated honey
Authentic honey | 195 | 0 | 6
Sugar syrup | 0 | 45 | 0
Adulterated honey | 1 | 0 | 182
Table 7 Comparison between ML models' performance

ML model | Accuracy | Precision | Recall | F1 score
Logistic regression | 0.741 | 0.741 | 0.741 | 0.741
Decision tree | 0.972 | 0.972 | 0.972 | 0.972
Random forest | 0.984 | 0.984 | 0.984 | 0.984
Fig. 4 Comparison between ML models’ performance for detecting honey adulteration
Table 7 and Fig. 4 compare the models' performance in discriminating between authentic honey, sugar syrup, and adulterated honey. The results show that the ML models performed well in discriminating between pure and impure honey, and that the RF model outperformed the other classifiers, achieving the highest classification accuracy. In this study, we used linear and nonlinear classifiers. The Logistic Regression model achieved the lowest detection accuracy, implying that the classes in the dataset are not linearly separable. The Decision Tree, being a nonlinear classifier, achieved higher accuracy than the Logistic Regression. The Random Forest classifier achieved the highest accuracy because it aggregates the votes of many decision trees, which reduces the overfitting of individual trees. All classifiers successfully distinguished sugar syrup from honey, owing to the marked difference in mineral element content between honey and the syrup.
3.2 Class-Wise Adulteration Detection

In this method, we use six ML models for discriminating between pure and impure honey, where each model detects adulteration in honey from a specific botanical origin. Table 8 and Fig. 5 display the accuracy of the ML classifiers for detecting adulteration in the different honey varieties in the dataset.
Table 8 The cross-validation accuracy (%) of the ML classifiers for identifying adulterated honey from various botanical sources

ML model | Acacia | Chaste | Jujube | Linden | Rape | TC | Average
Logistic regression | 91.67 | 88.33 | 96.67 | 87.50 | 100 | 92.42 | 92.77
Decision tree | 97.22 | 91.67 | 88.33 | 95.83 | 100 | 98.48 | 95.26
Random forest | 98.61 | 100 | 100 | 98.61 | 100 | 100 | 99.54
Fig. 5 The accuracy of ML classifiers for identifying adulterated honey from various botanical sources
The results show that all three classifiers achieve good performance, but the RF model achieves the best performance, with an average cross-validation accuracy of 99.54%. The RF model perfectly detects adulteration in honey from the chaste, jujube, rape, and TC floral sources. The findings show only a slight improvement in the classifiers' performance when class-wise detection models are used. As with the general detection model, the Logistic Regression obtained the lowest accuracy and the Random Forest the highest. The classifiers' detection performance varied according to the honey's botanical origin, except for rape honey, for which all classifiers perfectly detected the adulteration.
3.3 Mineral Element Importance for Identifying Adulterated Honey

Figure 6 shows the significance of the mineral elements in detecting honey adulteration according to the Random Forest classifier.
Fig. 6 Mineral element importance for identifying adulterated honey
The figure reveals that barium, boron, zinc, potassium, and iron are the most significant mineral elements for discriminating between authentic and adulterated honey. Barium was the most significant attribute, since this mineral was absent in most of the sugar syrup and adulterated honey samples.
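The ranking in Fig. 6 corresponds to the impurity-based feature importances exposed by scikit-learn's Random Forest; a sketch continuing the earlier assumptions about the dataset and models:

import pandas as pd

# Fit the forest on the full dataset and rank the mineral attributes.
rf = models["Random forest"].fit(X, y)
importances = pd.Series(rf.feature_importances_, index=mineral_cols)
print(importances.sort_values(ascending=False))  # barium, boron, zinc, potassium, iron rank highest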
4 Conclusion

Distinguishing genuine honey from adulterated honey is a challenging problem for consumers. This paper proposed an ML-based system for detecting adulterated honey using mineral element profiles. Experimental findings show that mineral element content provides discriminative information for detecting honey adulteration, and that the RF classifier obtained the best performance, outperforming the other ML classifiers. The performance of the ML classifiers in identifying adulterated honey varied according to the honey's botanical origin; among the six botanical origins, adulteration was detected most accurately in the chaste, jujube, rape, and TC honey. Moreover, this study concludes that ML models combined with mineral element profiles can effectively discriminate between pure and impure honey regardless of the honey's botanical origin.

Acknowledgements This research was supported by the Department of Science and Technology's Funds for Infrastructure through Science and Technology (DST-FIST) grant SR/FST/ETI-340/2013 to Dr. Babasaheb Ambedkar Marathwada University in Aurangabad, Maharashtra, India. The authors would like to express their gratitude to the department and university administrators for providing the research facilities and assistance.
References

1. Al-Awadhi MA, Deshmukh RR (2021) A review on modern analytical methods for detecting and quantifying adulteration in honey. In: 2021 international conference of modern trends in information and communication technology industry (MTICTI), pp 1–6. IEEE. https://doi.org/10.1109/mticti53925.2021.9664767
2. Tosun M (2013) Detection of adulteration in honey samples added various sugar syrups with 13C/12C isotope ratio analysis method. Food Chem 138:1629–1632. https://doi.org/10.1016/j.foodchem.2012.11.068
3. Islam MK, Sostaric T, Lim LY, Hammer K, Locher C (2020) Sugar profiling of honeys for authentication and detection of adulterants using high-performance thin layer chromatography. Molecules (Basel, Switzerland) 25. https://doi.org/10.3390/molecules25225289
4. Al-Mahasneh M, Al-U'Datt M, Rababah T, Al-Widyan M, Abu Kaeed A, Al-Mahasneh AJ, Abu-Khalaf N (2021) Classification and prediction of bee honey indirect adulteration using physiochemical properties coupled with k-means clustering and simulated annealing-artificial neural networks (SA-ANNs). J Food Qual. https://doi.org/10.1155/2021/6634598
5. Song X, She S, Xin M, Chen L, Li Y, Heyden YV, Rogers KM, Chen L (2020) Detection of adulteration in Chinese monofloral honey using 1H nuclear magnetic resonance and chemometrics. J Food Compos Anal 86. https://doi.org/10.1016/j.jfca.2019.103390
6. Liu W, Zhang Y, Han D (2016) Feasibility study of determination of high-fructose syrup content of Acacia honey by terahertz technique. Infrared, Millimeter-Wave, Terahertz Technol IV 10030:100300J. https://doi.org/10.1117/12.2245966
7. Guelpa A, Marini F, du Plessis A, Slabbert R, Manley M (2017) Verification of authenticity and fraud detection in South African honey using NIR spectroscopy. Food Control 73:1388–1396. https://doi.org/10.1016/j.foodcont.2016.11.002
8. Azmi MFI, Jamaludin D, Abd Aziz S, Yusof YA, Mohd Mustafah A (2021) Adulterated stingless bee honey identification using VIS-NIR spectroscopy technique. Food Res 5:85–93. https://doi.org/10.26656/fr.2017.5(S1).035
9. Al-Awadhi MA, Deshmukh RR (2022) Honey adulteration detection using hyperspectral imaging and machine learning. In: 2022 2nd international conference on artificial intelligence and signal processing (AISP), pp 1–5. IEEE. https://doi.org/10.1109/AISP53593.2022.9760585
10. Bodor Z, Kovacs Z, Rashed MS, Kókai Z, Dalmadi I, Benedek C (2020) Sensory and physicochemical evaluation of acacia and linden honey adulterated with sugar syrup. Sensors (Switzerland) 20:1–20. https://doi.org/10.3390/s20174845
11. Irawati N, Isa NM, Mohamed AF, Rahman HA, Harun SW, Ahmad H (2017) Optical microfiber sensing of adulterated honey. IEEE Sens J 17:5510–5514. https://doi.org/10.1109/JSEN.2017.2725910
12. Luo L (2020) Data for: discrimination of honey and adulteration by elemental chemometrics profiling. Mendeley Data V1. https://doi.org/10.17632/tt6pp6pbpk.1
13. Liu T, Ming K, Wang W, Qiao N, Qiu S, Yi S, Huang X, Luo L (2021) Discrimination of honey and syrup-based adulteration by mineral element chemometrics profiling. Food Chem 343:128455. https://doi.org/10.1016/j.foodchem.2020.128455
14. Templ M, Templ B (2021) Statistical analysis of chemical element compositions in food science: problems and possibilities. Molecules 26:1–15. https://doi.org/10.3390/molecules26195752
15. Singh D, Singh B (2020) Investigating the impact of data normalization on classification performance. Appl Soft Comput 97:105524. https://doi.org/10.1016/j.asoc.2019.105524
16. Tsangaratos P, Ilia I (2016) Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. CATENA 145:164–179. https://doi.org/10.1016/j.catena.2016.06.004
17. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21:660–674. https://doi.org/10.1109/21.97458
18. Breiman L (2001) Random forests. Mach Learn 45:5–32
Chapter 30
Change Detection on Earth’s Surface Using Machine Learning: A Survey Pathan Misbah, Jhummarwala Abdul, and Dave Dhruv
P. Misbah (B)
Master of Computer Engineering, LDRP-ITR, Gandhinagar, Gujarat, India
e-mail: [email protected]
J. Abdul
BISAG-N, Gandhinagar, Gujarat, India
e-mail: [email protected]
D. Dhruv
Department of Computer Engineering, LDRP-ITR, Gandhinagar, Gujarat, India
e-mail: [email protected]

1 Introduction

In remote sensing, change detection refers to the identification of changes that have occurred on the Earth's surface; it is a quantitative measurement of the differences that have occurred in a region over time. Temporal analysis is important for understanding changes in land cover and land use, glacier melting, urban sprawl, forestry and deforestation, flood mapping, etc. Some specific uses of satellite images and spatial data include monitoring mangrove ecosystems; monitoring oil palm, paddy fields, and rice production estimates; tracking meteorological droughts; measuring ground deformation; and multi-year tracking of forest fires. These applications can benefit greatly from the temporal analysis of satellite imagery [1]. Large amounts of temporal data have been collected over the past two decades and would be a boon to such applications through automated change detection and analysis techniques. This data has to be well processed to derive meaningful information of practical significance.

Multispectral satellite data has varying characteristics, such as spatial, spectral, temporal, and radiometric resolutions. Datasets such as Sentinel 2A Multispectral Instrument [2], Landsat 8 OLI (Operational Land Imager) [3], Landsat-5 Thematic Mapper (TM) [4], MODIS version 6 [5], Landsat 7 [6], and other time-series data have been used for change detection. The process of change detection requires several steps, which
include data acquisition, image pre-processing, image classification, and analysis of the classified images.

This paper is organized as follows. Section 2 describes image pre-processing, spectral indices, and supervised and unsupervised machine learning algorithms. A brief review of recent approaches is given in Sect. 3. Section 4 details the dataset and methodology used for performing change detection in this study. The results are presented and discussed in Sect. 5. The conclusion and recommendations of the study are in Sect. 6.
2 Background Study

2.1 Image Pre-processing

Remote sensing data suffer from a variety of radiometric, geometric, and atmospheric errors. These errors reduce the accuracy, and hence the utility, of the data. The three most commonly used preprocessing techniques for minimizing these errors are radiometric, geometric, and atmospheric corrections. The data products available from various sources are pre-processed at various levels; for example, NASA's Earth Observing System Data and Information System (EOSDIS) data products have processing levels ranging from Level 0 (raw data) to Level 4 (model output and results). The type of analysis that can be conducted depends on further processing of the variables available at each level of the data products [7]. The DOS (Dark Object Subtraction) tool has been used for radiometric and atmospheric corrections [2]. Dark subtraction removes the effects of atmospheric scattering from the image: a pixel value representing a background signature is identified for each band, and subtracting this value from every pixel in the band removes the scattering [3]. The Semi-Automatic Classification Plugin (SCP) tool [4] of QGIS software has also been used for radiometric and atmospheric corrections. Radiometric calibration of the data was done to reduce the effects of the atmosphere, sensor calibration, and sun angle at different dates [8]. Geometric registration using Rational Polynomial Coefficients (RPC) and a Digital Terrain Model (DTM) utilizes feature matching methods such as Scale Invariant Feature Transform (SIFT) and Random Sample Consensus (RANSAC) [9].
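As an illustration of the dark-object subtraction idea (a simplified sketch, not the exact implementation of the DOS tool), the correction amounts to subtracting a per-band dark value from every pixel; here the dark value is approximated by the band minimum:

import numpy as np

def dark_object_subtraction(band: np.ndarray) -> np.ndarray:
    """Subtract the band's dark-object value, approximated here by its minimum,
    i.e. a pixel assumed to contain only atmospheric scattering."""
    dark_value = band.min()
    return np.clip(band - dark_value, 0, None)  # clamp negatives to zero

# Hypothetical usage on a stack of band arrays:
# corrected = np.stack([dark_object_subtraction(b) for b in image_bands])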
2.2 Spectral Indices

Calculation of spectral indices is an image-processing operation that highlights important biophysical properties of the Earth's surface. Spectral indices are
combinations of two or more of the bands available in multispectral data. Numerous spectral indices can be generated from various band combinations for different applications. Some widely used indices include (1) NDWI (Normalized Difference Water Index), which measures the water bodies on the Earth's surface [2]; (2) SAVI (Soil Adjusted Vegetation Index); (3) GCI (Green Chlorophyll Index), used to estimate the amount of leaf chlorophyll in numerous classes of plants [10]; and (4) NDBI (Normalized Difference Built-up Index), for quantifying urban change [11]. A comprehensive list of indices that can be generated from multispectral and hyperspectral data is available [12]; there, the spectral indices are grouped according to their area of application. Vegetative spectral indices aim to provide information about different land covers. Other indices, such as the Clay Mineral Ratio (CMR), highlight hydrothermally altered rocks containing clay and alunite, and spectral indices such as the Burn Area Index (BAI) represent the amount of charcoal in post-fire images. For vegetative analysis, the standardized index that measures vegetation cover using a combination of the RED and NIR bands is NDVI (Normalized Difference Vegetation Index). The index output value ranges between –1 and 1, with positive values representing greenery, whereas negative values may arise from clouds, water, rock, and bare soil [13]. Ensembles of spectral indices prove effective in the classification of land-use/land-cover classes such as built-up land, barren and semi-barren land, vegetation, and surface water bodies [13]. Experiments have shown that classification using indices performs better than classification using the raw band data directly [14].
2.3 Machine Learning Techniques

Change detection methods can be categorized as either supervised or unsupervised, according to the nature of the data processing [8].

Supervised Machine Learning. Supervised techniques require training data (labeled data) in order to derive results. Both classification and regression problems can be solved by supervised learning. Some of the most popular supervised classification algorithms include K-Nearest Neighbor (KNN) [15], Random Forest (RF) [16, 17], Support Vector Machine (SVM) [18], and Classification and Regression Trees (CART) [19].

Unsupervised Machine Learning. In unsupervised classification, the image analyst does not need prior knowledge of the land covers in the study area, as the algorithm reads the pixels and aggregates them into a number of clusters, known as spectral classes [20]. The Iterative Self-Organizing Data Analysis Technique (ISODATA) [11], K-Means, and Fuzzy C-Means (FCM) [14] are the most widely applied cluster-based unsupervised classification algorithms for change detection.
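A minimal unsupervised sketch, assuming a multispectral image array of shape (bands, rows, cols); K-Means stands in here for the cluster-based classifiers named above, and the number of classes is an arbitrary choice:

import numpy as np
from sklearn.cluster import KMeans

def classify_unsupervised(image: np.ndarray, n_classes: int = 5) -> np.ndarray:
    """Cluster pixels into spectral classes without ground truth labels."""
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T  # one row per pixel, one column per band
    labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(rows, cols)    # map of spectral classes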
3 Analysis of Literature Review

A supervised classification approach was used to map rosemary cover in a region of Morocco using Sentinel 2A data [2]. Three spectral indices, namely NDVI, NDWI, and SAVI, were stacked together and used as input to a Random Forest classifier, and the correlation of the indices was used for accuracy assessment. The impact of NDVI, SAVI, albedo, Bare Soil Index (BSI), tasseled cap greenness (TCG), and tasseled cap brightness (TCB) on the vegetation and soil condition of the island of Crete, Greece has been assessed using Landsat 5 TM, Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 OLI datasets [6]. Unsupervised classification techniques with visible band (RGB) images and spectral indices have been applied to Landsat 8 datasets [14]; the authors used NDVI and NDWI images as input to FCM and an RGB image as input to the ISODATA and K-Means classifiers. In [21], land cover classification from satellite images and extraction of various land cover features were performed for the Greater Noida region of Uttar Pradesh using supervised classification algorithms; the results of Naïve Bayes, Decision Tree, Support Vector Machine, and Random Forest classification models were compared on Sentinel 2A image data. In [10], Landsat 8 data were used to analyze vegetation change for a region in Bangladesh: mean NDVI and GCI over six years (2013–2019) were used to predict the vegetation change for the year 2020. The paper [22] presented a procedure to determine forest degradation by fire in the region of Porto Dos Gauchos, Mato Grosso using the Linear Spectral Mixing Model (LSMM) and Landsat 8 imagery; the authors generated fraction images of soil, vegetation, and shade using the LSMM to highlight the burned area of the forest. Time series of fraction images can be generated using Google Earth Engine [23]. A new vegetation index combining NDVI, Enhanced Vegetation Index 2 (EVI2), and Optimized SAVI (OSAVI) has been used to predict and forecast Boro rice areas in the Haor region of Bangladesh using MODIS version 6 image data [5]. Forecasting was done for the total Boro rice area at the beginning of the Boro season (Dec–Jan), more than three months before harvest, without using any ground truth data; the authors of [5] used a 3D scatter plot and the K-Nearest Neighbor classifier for prediction. In [11], image classification in the absence of ground truth data was performed using the ISODATA and K-Means classifiers on Sentinel 2A data; the results were of low accuracy, which can be attributed to the absence of ground truth data. Several other studies have compared existing approaches to determine the most suitable approach for change detection in a particular application. In [24], four classifiers, namely Support Vector Machine, Random Forest, CART, and MaxEnt, were tested, and MaxEnt performed best among them. The study in [25] noted that the post-classification approach using Maximum Likelihood Classification (MLC) is an appropriate change detection method, and also stated that no single change detection method is applicable in all cases. A comparison between
Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), Fuzzy Adaptive Resonance Theory-Supervised Predictive Mapping (Fuzzy ARTMAP), Spectral Angle Mapper (SAM), and Mahalanobis distance (MD) has also been made [26]; the Random Forest classifier was found to give better results than the others. An approach for change detection using Landsat time-series datasets achieved an accuracy of 88.16% [27]. Google Earth Engine (GEE) based web applications and the GEE platform have been utilized for mapping vegetation, using MODIS-derived EVI products for Vietnam and an NDVI composite spanning the past 30 years for the USA [28]. The LULC (Land Use Land Cover) pattern of change in Lodhran District of Pakistan over 40 years has been analyzed using NDVI and NDBI [29]; factors such as temperature and rainfall were derived using Landsat 4,5 (TM), Landsat 7 (ETM+), and Landsat 8 OLI datasets. The performance of NDVI, NDWI, Modified NDWI, and the Automated Water Extraction Index (AWEI) has been evaluated using Landsat 8 data in Nepal [30]; it was observed that no single index extracts water surfaces with consistently better accuracy. Existing indices have been examined to find an optimal index for the detection of Yellow Rust, a wheat disease, in Changping District, Beijing, and Hebei Province, China [31]. For that study, hyperspectral datasets were utilized, as they have shown potential for the detection of plant disease; PRI (Photochemical Reflectance Index) and ARI (Anthocyanin Reflectance Index) were found to be optimal for monitoring Yellow Rust in the early-mid and mid-late growth stages.
4 Material and Methodology

4.1 Sentinel 2A

Based on the existing studies, and considering the spatial resolution available in open-source data, the image dataset for this study has been derived from Sentinel 2A imagery. Table 1 shows the complete details of the study area and the datasets used in this paper. Image data for five years were downloaded from the USGS Earth Explorer platform. All the images are from the month of December, except for the year 2019 because of cloud cover. The tile id of the study area is 43QCF. The bands used in the calculation of the spectral indices are green, red, and Near Infrared (NIR), each with a spatial resolution of 10 m.
4.2 Methodology

To assess the performance of the spectral indices, NDVI and NDWI have been used to identify vegetation and water bodies (or the vegetative water content).
Table 1 Details of study dataset

Date | Satellite | Band | Spatial resolution (m)
24/12/2017, 29/12/2018, 29/11/2019, 23/12/2020, 23/12/2021 | Sentinel 2A, Tile Id: 43QCF | Band 3—Green, Band 4—Red, Band 8—NIR | 10
The results of the indices were derived using Python programming and QGIS 3 software. The value of each spectral index ranges from –1 to 1.

NDVI (Normalized Difference Vegetation Index). The combination of the RED and NIR bands gives the value of NDVI, computed as NDVI = (NIR − RED)/(NIR + RED) [30]. NDVI quantifies the ratio between the energy absorbed by the vegetation canopy in the red portion of the electromagnetic spectrum and the energy reflected in the near-infrared (NIR) [29, 32].

NDWI (Normalized Difference Water Index). The combination of the GREEN and NIR bands gives the value of NDWI, computed as NDWI = (GREEN − NIR)/(GREEN + NIR) [30].
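The index computation reported here ("derived using Python programming") can be reproduced along these lines; loading the Sentinel 2A bands with rasterio and the band file names are assumptions about tooling, not the authors' exact script:

import numpy as np
import rasterio

def load_band(path: str) -> np.ndarray:
    with rasterio.open(path) as src:
        return src.read(1).astype("float32")

# Hypothetical Sentinel 2A band file names for tile 43QCF.
green = load_band("T43QCF_B03.jp2")
red = load_band("T43QCF_B04.jp2")
nir = load_band("T43QCF_B08.jp2")

eps = 1e-10  # guard against division by zero over no-data pixels
ndvi = (nir - red) / (nir + red + eps)      # vegetation index, values in [-1, 1]
ndwi = (green - nir) / (green + nir + eps)  # water index, positive values indicate water

print(ndvi.min(), ndvi.max(), ndwi.min(), ndwi.max())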
5 Results and Discussion

The approach of spectral indices has been used in this paper to analyze the change over a period of five years. The results are shown in Table 2. A positive index value indicates the presence of the corresponding land cover type, such as vegetation or water bodies, whereas a negative value represents a decrease in its amount. Figure 1 shows the NDVI and NDWI images for the years 2017, 2018, 2019, 2020, and 2021. The value of NDVI ranges from –0.292 to +0.813 in 2017, –0.327 to +0.816 in 2018, –0.199 to +0.623 in 2019, –0.334 to +0.865 in 2020, and –0.275 to +0.803 in 2021. Both the highest and the lowest NDVI values occur in the year 2020: 0.865 and –0.334, respectively.

Table 2 Spectral indices results
Year | NDVI min | NDVI max | NDWI min | NDWI max
2017 | –0.292 | 0.813 | –0.675 | 0.452
2018 | –0.327 | 0.816 | –0.713 | 0.523
2019 | –0.199 | 0.623 | –0.524 | 0.320
2020 | –0.334 | 0.865 | –0.666 | 0.493
2021 | –0.275 | 0.803 | –0.654 | 0.470
Fig. 1 NDVI and NDWI images of the study area
Greater NDVI values indicate healthy and dense vegetation, lower values indicate sparse vegetation, and negative values indicate water. For better visualization, the value ranges of NDVI and NDWI are divided into five range groups. For NDVI, the range groups are: less than 0, 0 to +0.15, +0.15 to +0.25, +0.25 to +0.55, and +0.55 to +1. The less-than-0 range group indicates water, and the +0.55 to +1 range group indicates dense vegetation. The value of NDWI ranges from –0.675 to 0.452 in 2017, –0.713 to 0.523 in 2018, –0.524 to 0.320 in 2019, –0.666 to 0.493 in 2020, and –0.654 to 0.470 in 2021. The water area is clearly detected in both index images. In NDWI, positive values indicate the water surface, whereas negative values indicate non-water surfaces. Both the highest and the lowest NDWI values occur in the year 2018: 0.523 and –0.713, respectively. For NDWI, the range groups are: less than –0.6, –0.6 to –0.3, –0.3 to –0.2, –0.2 to +0.2, and +0.2 to +1. The less-than-–0.6 range group indicates vegetation, and the +0.2 to +1 range group indicates water. The visual difference between the NDVI and NDWI images for the year 2019 can be attributed to the different acquisition month of that dataset. Tables 3 and 4 show the area details (in %) for NDVI and NDWI, respectively, for the defined range groups. The major decrease and increase between the 2018 and 2019 values can likewise be attributed to the difference in acquisition season of the datasets used.

The results of the regression analysis are shown in Fig. 2. The ideal value of R square is 1 for a perfectly fitted model, and in the worst case it can be negative; the closer the R square value is to 1, the better the linear model. The analysis between NDVI and NDWI reveals an inverse relationship between them: in areas where NDWI is positive, NDVI is negative, and vice versa. The value of R square ranges from +0.926 to +0.947 for the years 2017 to 2021.
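The reported R square values can be reproduced with an ordinary least-squares fit between the two index images; a sketch continuing the index arrays computed above, under the assumption that the regression was taken per pixel:

from sklearn.linear_model import LinearRegression

x = ndwi.reshape(-1, 1)  # predictor: NDWI pixel values
y = ndvi.reshape(-1)     # response: NDVI pixel values
reg = LinearRegression().fit(x, y)
r_squared = reg.score(x, y)     # coefficient of determination (R square)
print(reg.coef_[0], r_squared)  # a negative slope reflects the inverse relationship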
Table 3 NDVI area (%) details

Range group | 2017 (%) | 2018 (%) | 2019 (%) | 2020 (%) | 2021 (%)