English Pages 1098 [1042] Year 2022
Lecture Notes on Data Engineering and Communications Technologies 101
D. Jude Hemanth Danilo Pelusi Chandrasekar Vuppalapati Editors
Intelligent Data Communication Technologies and Internet of Things Proceedings of ICICI 2021
Lecture Notes on Data Engineering and Communications Technologies Volume 101
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data technologies and communications. It will publish the latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems. The series will have a prominent applied focus on data technologies and communications, with the aim of bridging fundamental research on data science and networking with data engineering and communications that lead to industry products, business knowledge and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/15362
Editors D. Jude Hemanth Department of Electronics and Communication Engineering Karunya Institute of Technology and Sciences Coimbatore, India
Danilo Pelusi Faculty of Communication Sciences University of Teramo Teramo, Italy
Chandrasekar Vuppalapati Department of Computer Engineering San Jose State University San Jose, CA, USA
ISSN 2367-4512 ISSN 2367-4520 (electronic) Lecture Notes on Data Engineering and Communications Technologies ISBN 978-981-16-7609-3 ISBN 978-981-16-7610-9 (eBook) https://doi.org/10.1007/978-981-16-7610-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honored to dedicate the proceedings of ICICI 2021 to all the participants and editors of ICICI 2021.
Foreword
This conference proceedings volume contains the written versions of most of the contributions presented at ICICI 2021. The conference provided a setting for discussing recent developments in a wide variety of topics, including data communication, computer networking, communication technologies, wireless and ad hoc networks, cryptography, big data, cloud computing, IoT, and healthcare informatics. The conference was a good opportunity for participants from various locations to present and discuss topics in their respective research areas. ICICI 2021 aims to collect the latest research results and applications in intelligent data communication technologies and the Internet of Things. This volume includes a selection of 77 papers from the 275 papers submitted to the conference by universities and industries all over the world. All accepted papers were subjected to strict peer review by 2–4 expert referees. The papers were selected for this volume on the basis of their quality and relevance to the conference. We would like to express our sincere appreciation to all authors for their contributions to this book. We would also like to thank all the referees for their constructive comments on the papers; in particular, we would like to thank the Guest Editors, Dr. D. Jude Hemanth, Professor, Department of ECE, Karunya Institute of Technology and Sciences, India; Dr. Danilo Pelusi, Faculty of Communication Sciences, University of Teramo, Italy; and Dr. Chandrasekar Vuppalapati, Professor, San Jose State University, California, USA, for their hard work. Finally, we would like to thank Springer for producing this volume.
Dr. K. Geetha
Conference Chair, ICICI 2021
Preface
It is with deep satisfaction that we write this Preface to the Proceedings of ICICI 2021, held at JCT College of Engineering and Technology, Coimbatore, Tamil Nadu, from August 27 to 28, 2021. The conference brought together researchers, academics, and professionals from all over the world, all experts in data communication technologies and the Internet of Things. It encouraged research students and developing academics to interact with the more established academic community in an informal setting to present and discuss new and current work. The papers contributed the most up-to-date scientific knowledge in the fields of data communication and computer networking, communication technologies, and their applications such as IoT, big data, and cloud computing. Their contributions helped make the conference as successful as it was. The members of the local organizing committee and their assistants put in a lot of time and effort to ensure that the meeting ran smoothly on a daily basis. We hope that this program will stimulate further research in intelligent data communication technologies and the Internet of Things, as well as provide practitioners with improved techniques, algorithms, and deployment tools. Through this exciting program, we feel honored and privileged to bring you the most recent developments in the field of intelligent data communication technologies and the Internet of Things. We thank all authors and participants for their contributions.
Dr. D. Jude Hemanth, Coimbatore, India
Dr. Danilo Pelusi, Teramo, Italy
Dr. Chandrasekar Vuppalapati, San Jose, USA
Acknowledgements
ICICI 2021 would like to acknowledge the excellent work of our conference organizing committee and the keynote speakers for their presentations on August 27–28, 2021. The organizers also wish to acknowledge publicly the valuable services provided by the reviewers. On behalf of the editors, organizers, authors, and readers of this conference, we wish to thank the keynote speakers and the reviewers for their time, hard work, and dedication to this conference. The organizers wish to acknowledge Dr. D. Jude Hemanth and Dr. K. Geetha for their discussions, suggestions, and cooperation in organizing the keynote speakers of this conference. The organizers also wish to acknowledge the speakers and participants who attended this conference. Many thanks go to everyone who helped and supported this conference. ICICI 2021 would like to acknowledge the contribution made to the organization by its many volunteers, who contribute their time, energy, and knowledge at local, regional, and international levels. We also thank all the chairpersons and conference committee members for their support.
Contents
An Optimized Convolutional Neural Network Model for Wild Animals Detection Using Filtering Techniques and Different Opacity Levels
    Pavan Nageswar Reddy Bodavarapu, T. Ashish Narayan, and P. V. V. S. Srinivas
A Study on Current Research and Challenges in Attribute-based Access Control Model
    K. Vijayalakshmi and V. Jayalakshmi
Audio Denoising Using Deep Neural Networks
    S. Jassem Mohammed and N. Radhika
Concept and Development of Triple Encryption Lock System
    A. Fayaz Ahamed, R. Prathiksha, M. Keerthana, and D. Mohana Priya
Partially Supervised Image Captioning Model for Urban Road Views
    K. Srihari and O. K. Sikha
Ease and Handy Household Water Management System
    K. Priyadharsini, S. K. Dhanushmathi, M. Dharaniga, R. Dharsheeni, and J. R. Dinesh Kumar
Novel Intelligent System for Medical Diagnostic Applications Using Artificial Neural Network
    T. P. Anithaashri, P. Selvi Rajendran, and G. Ravichandran
Extracting Purposes from an Application to Enable Purpose Based Processing
    Amruta Jain and Sunil Mane
Cotton Price Prediction and Cotton Disease Detection Using Machine Learning
    Priya Tanwar, Rashi Shah, Jaini Shah, and Unik Lokhande
Acute Leukemia Subtype Prediction Using EODClassifier
    S. K. Abdullah, S. K. Rohit Hasan, and Ayatullah Faruk Mollah
Intrusion Detection System Intensive on Securing IoT Networking Environment Based on Machine Learning Strategy
    D. V. Jeyanthi and B. Indrani
Optimization of Patch Antenna with Koch Fractal DGS Using PSO
    Sanoj Viswasom and S. Santhosh Kumar
Artificial Intelligence-Based Phonocardiogram: Classification Using Cepstral Features
    A. Saritha Haridas, Arun T. Nair, K. S. Haritha, and Kesavan Namboothiri
Severity Classification of Diabetic Retinopathy Using Customized CNN
    Shital N. Firke and Ranjan Bala Jain
Study on Class Imbalance Problem with Modified KNN for Classification
    R. Sasirekha, B. Kanisha, and S. Kaliraj
Analysis of (IoT)-Based Healthcare Framework System Using Machine Learning
    B. Lalithadevi and S. Krishnaveni
Hand Gesture Recognition for Disabled Person with Speech Using CNN
    E. P. Shadiya Febin and Arun T. Nair
Coronavirus Pandemic: A Review of Different Machine Learning Approaches
    Bhupinder Singh and Ritu Agarwal
High Spectrum and Efficiency Improved Structured Compressive Sensing-Based Channel Estimation Scheme for Massive MIMO Systems
    V. Baranidharan, C. Raju, S. Naveen Kumar, S. N. Keerthivasan, and S. Isaac Samson
A Survey on Image Steganography Techniques Using Least Significant Bit
    Y. Bhavani, P. Kamakshi, E. Kavya Sri, and Y. Sindhu Sai
Efficient Multi-platform Honeypot for Capturing Real-time Cyber Attacks
    S. Sivamohan, S. S. Sridhar, and S. Krishnaveni
A Gender Recognition System from Human Face Images Using VGG16 with SVM
    S. Mandara and N. Manohar
Deep Learning Approach for RPL Wormhole Attack
    T. Thiyagu, S. Krishnaveni, and R. Arthi
Precision Agriculture Farming by Monitoring and Controlling Irrigation System Using Sensors
    Badri Deva Kumar, M. Sobhana, Jahnavi Duvvuru, Chalasani Nikhil, and Gopisetti Sridhar
Autonomous Driving Vehicle System Using LiDAR Sensor
    Saiful Islam, Md Shahnewaz Tanvir, Md. Rawshan Habib, Tahsina Tashrif Shawmee, Md Apu Ahmed, Tafannum Ferdous, Md. Rashedul Arefin, and Sanim Alam
Multiple Face Detection Tracking and Recognition from Video Sequence
    M. Athira, Arun T. Nair, Kesavan Namboothiri, K. S. Haritha, and Nimitha Gopinath
Review Analysis Using Ensemble Algorithm
    V. Baby Shalini, M. Iswarya, S. Ramya Sri, and M. S. Anu Keerthika
A Blockchain-Based Expectation Solution for the Internet of Bogus Media
    Rishi Raj Singh, Manish Thakral, Sunil Kaushik, Ayur Jain, and Gunjan Chhabra
Countering Blackhole Attacks in Mobile Adhoc Networks by Establishing Trust Among Participating Nodes
    Mukul Shukla and Brijendra Kumar Joshi
Identification of Gene Communities in Liver Hepatocellular Carcinoma: An OffsetNMF-Based Integrative Technique
    Sk Md Mosaddek Hossain and Aanzil Akram Halsana
Machine Learning Based Approach for Therapeutic Outcome Prediction of Autism Children
    C. S. KanimozhiSelvi, K. S. Kalaivani, M. Namritha, S. K. Niveetha, and K. Pavithra
An Efficient Implementation of ARIMA Technique for Air Quality Prediction
    Rudragoud Patil, Gayatri Bedekar, Parimal Tergundi, and R. H. Goudar
A Survey on Image Emotion Analysis for Online Reviews
    G. N. Ambika and Yeresime Suresh
An Efficient QOS Aware Routing Using Improved Sensor Modality-based Butterfly Optimization with Packet Scheduling for MANET
    S. Arivarasan, S. Prakash, and S. Surendran
IoT Based Electricity Theft Monitoring System
    S. Saadhavi, R. Bindu, S. Ram. Sadhana, N. S. Srilalitha, K. S. Rekha, and H. D. Phaneendra
An Exploration of Attack Patterns and Protection Approaches Using Penetration Testing
    Kousik Barik, Karabi Konar, Archita Banerjee, Saptarshi Das, and A. Abirami
Intrusion Detection System Using Homomorphic Encryption
    Aakash Singh, Parth Kitawat, Shubham Kejriwal, and Swapnali Kurhade
Reversible Data Hiding Using LSB Scheme and DHE for Secured Data Transfer
    D. N. V. S. L. S. Indira, Y. K. Viswanadham, J. N. V. R. Swarup Kumar, Ch. Suresh Babu, and Ch. Venkateswara Rao
Prediction of Solar Power Using Machine Learning Algorithm
    M. Rupesh, J. Swathi Chandana, A. Aishwarya, C. Anusha, and B. Meghana
Prediction of Carcinoma Cancer Type Using Deep Reinforcement Learning Technique from Gene Expression Data
    A. Prathik, M. Vinodhini, N. Karthik, and V. Ebenezer
Multi-variant Classification of Depression Severity Using Social Media Networks Based on Time Stamp
    M. Yohapriyaa and M. Uma
Identification of Workflow Patterns in the Education System: A Multi-faceted Approach
    Ganeshayya Shidaganti, M. Laxmi, S. Prakash, and G. Shivamurthy
Detection of COVID-19 Using Segmented Chest X-ray
    P. A. Shamna and Arun T. Nair
A Dynamic Threshold-Based Technique for Cooperative Blackhole Attack Detection in VANET
    P. Remya Krishnan and P. Arun Raj Kumar
Detecting Fake News Using Machine Learning
    Ritik H. Patel, Rutvik Patel, Sandip Patel, and Nehal Patel
Predicting NCOVID-19 Probability Factor with Severity Index
    Ankush Pandit, Soumalya Bose, and Anindya Sen
Differentially Evolved RBFNN for FNAB-Based Detection of Breast Cancer
    Sunil Prasad Gadige, K. Manjunathachari, and Manoj Kumar Singh
A Real-Time Face Mask Detection-Based Attendance System Using MobileNetV2
    Kishansinh Rathod, Zeel Punjabi, Vivek Patel, and Mohammed Husain Bohara
A New Coded Diversity Combining Scheme for High Microwave Throughput
    Yousra Lamrani, Imane Benchaib, Kamal Ghoumid, and El Miloud Ar-Reyouchi
Extractive Text Summarization of Kannada Text Documents Using Page Ranking Technique
    C. P. Chandrika and Jagadish S. Kallimani
Destructive Outcomes of Digitalization (Credit Card), a Machine Learning Analysis
    Yashashree Patel, Panth Shah, Mohammed Husain Bohara, and Amit Nayak
Impact of Blockchain Technology in the Healthcare Systems
    Garima Anand, Ashwin Prajeeth, Binav Gautam, Rahul, and Monika
A Comparison of Machine Learning Techniques for Categorization of Blood Donors Having Chronic Hepatitis C Infection
    Sukhada Bhingarkar
Monitoring the Soil Parameters Using IoT for Smart Agriculture
    K. Gayathri and S. Thangavelu
NRP-APP: Robust Seamless Data Capturing and Visualization System for Routine Immunization Sessions
    Kanchana Rajaram, Pankaj Kumar Sharma, and S. Selvakumar
Methodologies to Ensure Security and Privacy of an Enterprise Healthcare Data Warehouse
    Joseph George and M. K. Jeyakumar
Comparative Analysis of Open-Source Vulnerability Scanners for IoT Devices
    Christopher deRito and Sajal Bhatia
Emotion and Collaborative-Based Music Recommendation System
    R. Aparna, C. L. Chandana, H. N. Jayashree, Suchetha G. Hegde, and N. Vijetha
Cricket Commentary Classification
    A. Siva Balaji, N. Gunadeep Vignan, D. S. V. N. S. S. Anudeep, Md. Tayyab, and K. S. Vijaya Lakshmi
Performance Comparison of Weather Monitoring System by Using IoT Techniques and Tools
    Naveen S. Talegaon, Girish R. Deshpande, B. Naveen, Manjunath Channavar, and T. C. Santhosh
A Study on Surface Electromyography in Sports Applications Using IoT
    N. Nithya, G. Nallavan, and V. Sriabirami
Detection of IoT Botnet Using Recurrent Neural Network
    P. Tulasi Ratnakar, N. Uday Vishal, P. Sai Siddharth, and S. Saravanan
Biomass Energy for Rural India: A Sustainable Source
    Namra Joshi
Constructive Approach for Text Summarization Using Advanced Techniques of Deep Learning
    Shruti J. Sapra, Shruti A. Thakur, and Avinash S. Kapse
Lane Vehicle Detection and Tracking Algorithm Based on Sliding Window
    R. Rajakumar, M. Charan, R. Pandian, T. Prem Jacob, A. Pravin, and P. Indumathi
A Survey on Automated Text Summarization System for Indian Languages
    P. Kadam Vaishali, B. Khandale Kalpana, and C. Namrata Mahender
A Dynamic Packet Scheduling Algorithm Based on Active Flows for Enhancing the Performance of Internet Traffic
    Y. Suresh, J. Senthilkumar, and V. Mohanraj
Automated Evaluation of Short Answers: a Systematic Review
    Shweta Patil and Krishnakant P. Adhiya
Interactive Agricultural Chatbot Based on Deep Learning
    S. Suman and Jalesh Kumar
Analytical Study of YOLO and Its Various Versions in Crowd Counting
    Ruchika, Ravindra Kumar Purwar, and Shailesh Verma
IoT Enabled Elderly Monitoring System and the Role of Privacy Preservation Frameworks in e-health Applications
    Vidyadhar Jinnappa Aski, Vijaypal Singh Dhaka, Sunil Kumar, and Anubha Parashar
Hybrid Beamforming for Massive MIMO Antennas Under 6 GHz Mid-Band
    Kavita Bhagat and Ashish Suri
Multi-Class Detection of Skin Disease: Detection Using HOG and CNN Hybrid Feature Extraction
    K. Babna, Arun T. Nair, and K. S. Haritha
DeepFake Creation and Detection Using LSTM, ResNext
    Dhruti Patel, Juhie Motiani, Anjali Patel, and Mohammed Husain Bohara
Classification of Plant Seedling Using Deep Learning Techniques
    K. S. Kalaivani, C. S. Kanimozhiselvi, N. Priyadharshini, S. Nivedhashri, and R. Nandhini
A Robust Authentication and Authorization System Powered by Deep Learning and Incorporating Hand Signals
    Suresh Palarimath, N. R. Wilfred Blessing, T. Sujatha, M. Pyingkodi, Bernard H. Ugalde, and Roopa Devi Palarimath
Author Index
About the Editors
Dr. D. Jude Hemanth received his B.E. degree in ECE from Bharathiar University in 2002, M.E. degree in communication systems from Anna University in 2006 and Ph.D. from Karunya University in 2013. His research areas include Computational Intelligence and Image Processing. He has authored more than 120 research papers in reputed SCIE indexed International Journals and Scopus indexed International Conferences. His Cumulative Impact Factor is more than 150. He has published 33 edited books with reputed publishers such as Elsevier, Springer and IET.
Dr. Danilo Pelusi received the Ph.D. degree in Computational Astrophysics from the University of Teramo, Italy. Associate Professor at the Faculty of Communication Sciences, University of Teramo, he is an Associate Editor of IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Access, International Journal of Machine Learning and Cybernetics (Springer) and Array (Elsevier). Guest editor for Elsevier, Springer and Inderscience journals, he has served as program member of many conferences and as editorial board member of many journals. His research interests include Fuzzy Logic, Neural Networks, Information Theory and Evolutionary Algorithms.
Dr. Chandrasekar Vuppalapati is a Software IT Executive with diverse experience in Software Technologies, Enterprise Software Architectures, Cloud Computing, Big Data Business Analytics, Internet of Things (IoT), and Software Product and Program Management. Chandra held engineering and product leadership roles at GE Healthcare, Cisco Systems, Samsung, Deloitte, St. Jude Medical, and Lucent Technologies, Bell Laboratories Company. Chandra teaches Software Engineering, Mobile Computing, Cloud Technologies, and Web and Data Mining for the Master's program at San Jose State University. Additionally, Chandra held market research, strategy and technology architecture advisory roles at Cisco Systems and Lam Research, and served as Principal Investigator for Valley School of Nursing, where he connected Nursing Educators and Students with Virtual Reality technologies. Chandra has served as Chair at numerous technology and advanced computing conferences, such as IEEE Oxford, UK; IEEE Big Data Services 2017, San Francisco, USA; and the Future of Information and Communication Conference 2018, Singapore. Chandra graduated from the San Jose State University Master's Program, specializing in Software Engineering, and completed his Master of Business Administration at Santa Clara University, Santa Clara, California, USA.
An Optimized Convolutional Neural Network Model for Wild Animals Detection Using Filtering Techniques and Different Opacity Levels
Pavan Nageswar Reddy Bodavarapu, T. Ashish Narayan, and P. V. V. S. Srinivas
Abstract Although there are numerous techniques for object identification, they underperform in real-world conditions such as heavy rain or fog at night. This research work therefore devises a new convolutional neural network for identifying animals in low-light environments. In the proposed system, samples of different animals (both domestic and wild) are collected from various sources in the form of images and videos. The dataset initially contains 2300 samples; because convolutional neural networks require more samples for training, data augmentation techniques (horizontal flip, rotation, and padding) are employed to raise the number of samples to 6700. Without edge detection, the proposed model achieves an accuracy of 0.88 on the training set and 0.72 on the testing set. After applying the Canny edge detection technique to the animal dataset, the proposed model achieves an accuracy of 0.81, outperforming the state-of-the-art models ResNet-50 and EfficientNet-B7.
Keywords Object detection · Edge detection · Convolutional neural network · Deep learning · Animal detection
P. N. R. Bodavarapu (B) · T. A. Narayan · P. V. V. S. Srinivas
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_1
1 Introduction
There is a lot of ongoing research in the field of object detection. Deep learning and computer vision have produced impressive results in detecting various classes of objects in a given image, and recent advances in this domain make it possible to draw bounding boxes around the detected objects. Future developments in this sector may benefit visually impaired people [1, 2]. The most frequent strategy in classical computer vision techniques for object detection is to transform color images into grayscale format and subsequently into binary images. Region-based convolutional neural networks were later constructed to outperform these standard computer vision algorithms in terms of accuracy [3, 4]. Object detection in remote sensing images is challenging. The major difference between natural images and remote sensing images is the size of the object, which in remote sensing images is small relative to the background. This makes the object hard to detect. A further challenge is that remote sensing images offer only a top-down view [5]. Feature pyramids can generate segment proposals for object detection. Two detectors that can be combined with feature pyramids are (1) the region proposal network and (2) Fast R-CNN [6, 7]. It is difficult to detect objects in classes for which no labeled detection data exist. However, YOLO9000 helps detect object classes that lack labeled detection data, and it can do so in real time [8]. Unmanned aircraft systems play an important role in wild animal surveys for better ecosystem management. Manually going through all the aerial photographs to detect wild animals is a difficult task. Deep learning methods help detect wild animals more accurately and in less time. The steps involved in detecting wild animals in aerial images are: (1) image collection, (2) image splitting, (3) image labeling, (4) dividing data into train and test sets, (5) training and validation, and (6) testing [9]. The advantages of the ReLU activation function are that its computations are cheaper and that training converges faster. A major problem, the vanishing gradient effect, may be addressed with the ReLU activation function. On the MNIST dataset, the error of a deep convolutional neural network (DCNN) with the ReLU activation function is 0.8%, which is lower than the 1.15% and 1.12% obtained with the sigmoid and tanh activation functions, respectively [10, 11].
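To make the vanishing-gradient remark concrete, the following minimal sketch multiplies the local derivative of an activation function across a stack of layers. It is an illustration only and is not taken from the cited works: the function names, the depth of 10, and the chosen pre-activation values are assumptions, and layer weights are ignored.

```python
import math


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives.

    A toy stand-in for backpropagation through `depth` layers that all
    see the same pre-activation value (hypothetical setup, weights ignored).
    """
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g


# Best case for the sigmoid (x = 0) versus an active ReLU unit (x = 1).
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)
relu_chain = chained_gradient(relu_grad, 1.0, 10)
```

Under these assumptions the sigmoid chain shrinks toward zero because each factor is at most 0.25, while the ReLU chain stays at 1 for positive inputs, which is one common way to explain why ReLU eases training of deeper networks.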
Vehicle collisions with animals cause traffic accidents, resulting in injuries and deaths for both humans and animals. When the vehicle’s speed exceeds 35 kmph, the driver has a more difficult time avoiding a collision with an animal since the distance between the car and the animal is shorter. The influence of humans in road accidents is nearly 92%. Vehicle collision with animals can be categorized into direct and indirect collisions [12]. Convolutional neural network can be significantly effected (increase or decrease) by various techniques, namely (1) weighted residual connections, (2) cross stage partial connections, (3) cross mini-batch normalization, (4) self-adversarial training, (5) mish activation, (6) mosaic data augmentation, and (7) DropBlock regularization. The training of the object detection can be improved by (1) activation functions, (2) data augmentation, (3) regularization methods, and (4) skip connections [13, 14]. The steps involved in classifying the wild animals in video footage are: (1) input video, (2) background subtraction, (3) difference clearing, (4) calculate energy, (5) average variation, and (6) classification [15, 16]. The important contributions made in this research paper can be outlined as: (1) devised a novel convolutional neural network, (2) applied various edge detection techniques, (3) experimented on different opacity levels, and (4) compared all the results and provided valid conclusion. This research is based on animal detection in an image that has been taken in low-light conditions, and we collected different animals (both domestic and wild animals) from many resources in form of images and videos. These videos include several animals; we split the recordings into frames using a Python
script, and relevant images were chosen and grouped into their respective directories. The dataset contains 2300 samples. Since convolutional neural networks need a large number of samples for training, a few data augmentation techniques are used to increase the size of the dataset to 6700 samples. The data augmentation techniques used here are horizontal flip, rotation, and padding. The proposed model contains four convolutional layers, two batch normalization layers, two max-pooling layers, and two dropout layers. The activation function used in the convolutional layers is the rectified linear unit (ReLU), and the activation function used in the output layer is softmax. The learning rate and weight decay used in this work are 0.0001 and 1e−4, respectively. The proposed model is trained for 100 epochs with a batch size of 32.
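As a rough illustration of the architecture described above, the following Python sketch traces tensor shapes through one possible arrangement of the four convolutional, two batch normalization, two max-pooling, and two dropout layers. The layer ordering, the filter counts (32 and 64), the 3 × 3 "same"-padded kernels, the 2 × 2 pooling, and the dropout rates are assumptions made purely for illustration; the paper states only the layer counts, the activations, and (in Sect. 3.3) the 48 × 48 grayscale input.

```python
# Shape trace for a hypothetical layout of the proposed CNN.
# Only the layer counts and the 48x48 grayscale input come from the paper;
# the ordering, filter counts, kernel size, and dropout rates are assumed.

def conv2d_shape(shape, filters, kernel=3, padding="same"):
    """Output shape of a stride-1 convolution on an (H, W, C) input."""
    h, w, _ = shape
    if padding == "valid":
        h, w = h - kernel + 1, w - kernel + 1
    return (h, w, filters)


def maxpool_shape(shape, pool=2):
    """Output shape of non-overlapping max pooling on an (H, W, C) input."""
    h, w, c = shape
    return (h // pool, w // pool, c)


# (layer kind, argument): filters for conv, window for maxpool, rate for dropout.
LAYERS = [
    ("conv", 32), ("conv", 32), ("batchnorm", None), ("maxpool", 2), ("dropout", 0.25),
    ("conv", 64), ("conv", 64), ("batchnorm", None), ("maxpool", 2), ("dropout", 0.25),
]


def trace_shapes(input_shape=(48, 48, 1), layers=LAYERS):
    """Return a list of (layer kind, output shape) pairs, starting with the input."""
    shape = input_shape
    trace = [("input", shape)]
    for kind, arg in layers:
        if kind == "conv":
            shape = conv2d_shape(shape, arg)
        elif kind == "maxpool":
            shape = maxpool_shape(shape, arg)
        # Batch normalization and dropout leave the shape unchanged.
        trace.append((kind, shape))
    return trace


def flattened_size(shape):
    """Number of features fed to the softmax classifier after flattening."""
    h, w, c = shape
    return h * w * c
```

Under these assumed settings the final feature map is 12 × 12 × 64, so the softmax layer would receive 9216 features; other filter counts or padding choices would change these numbers.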
2 Related Work
Fu et al. [17] proposed a framework, “deepside,” to integrate convolutional neural network features, together with a new fusion method to integrate the different outputs of the network; this helps to obtain high-precision boundary details of objects. The deepside framework contains VGG-16 and a backbone of deep side structures. The learning rate used is 1e−9 with a batch size of 1. The proposed framework is evaluated on various datasets. The inference time is 0.08 s for the linear deepside framework and 0.07 s for the nonlinear deepside framework. Hou et al. [18] have proposed a novel saliency method for salient object detection. This framework is designed by including short connections within the holistically nested edge detector. The approach is tested and evaluated on five salient object detection benchmarks. The learning rate and weight decay used in this approach are 1e−8 and 0.0005, respectively. The total time taken to train this model is 8 h, and the processing time for each image is 0.08 s. Data augmentation increased the performance of the saliency method by 0.5%. Jia et al. [19] have proposed a salient object detection method that can pass back global information more efficiently. The proposed method has obtained state-of-the-art results. The VOC dataset and ImageNet are mixed to form a new dataset for salient object detection. The proposed method has obtained an F-measure of 0.87 on the PASCAL-S dataset. Ren et al. [20] have proposed a fully convolutional network that estimates object bounds and scores at every position. The frame rate of the proposed method on a GPU is 5 frames per second (fps). Liu et al. [21] have proposed a model to improve region detection performance. The authors combine center saliency, center, background saliency, and foreground in this method to make it more efficient.
The runtime of the proposed model on the MSRA-1000 dataset is 0.2 s/image. This model with 20 color superpixels can detect the important objects in an image even though they touch the image boundary. Yu et al. [22] have proposed an algorithm for detecting the moving objects. The classifier used in this algorithm is Haar cascade classifier. The frame rate of the
proposed method before adding the recognition algorithm is 43 fps, and the frame rate after adding the recognition algorithm is 36 fps. Othman et al. [23] have proposed a system for real-time object detection that can run at a high frame rate. The authors used the MobileNet architecture combined with a single shot detector, trained with the Caffe framework. The model is implemented on a Raspberry Pi 3, where a Movidius Neural Compute Stick is used to obtain a high frame rate. Data augmentation is used, since convolutional neural networks need a large number of samples for training. On the Raspberry Pi 3 CPU, this method obtained 0.5 frames per second. Gasparovsky et al. [24] have discussed the importance of outdoor lighting and the factors affecting it. Outdoor lighting depends on various conditions, namely (1) season, (2) time of day, and (3) the number of buildings and the population of the area. Guo et al. [25] have proposed a neural network for two sub-tasks: (1) region proposals and (2) object classification. RefineNet is included after the region proposal network in the region proposals stage to obtain the best region suggestions. The proposed technique is tested on the PASCAL VOC dataset. After analyzing the results, the authors explain that the fully connected layer with the softmax layer must be fine-tuned. The proposed model achieved 71.6% mAP on the PASCAL VOC dataset, whereas the state-of-the-art model R-CNN obtained 66.0% mAP on the same dataset. The results indicate that the proposed method performs significantly better than R-CNN. Guo et al. [26] have proposed a convolutional neural network for object detection that does not use region proposals. To detect the objects, DarkNet is converted into a fully convolutional network and later fine-tuned. Region proposal systems are not effective in real time, since they require more run time.
The proposed model has obtained 73.2% mAP, while fast R-CNN and faster R-CNN obtained 68.4% and 70.4% mAP, respectively.
3 Proposed Work
3.1 Dataset Description
This study is focused on detecting animals in images shot in low-light settings. We collected images and videos of numerous animals (both domestic and wild) from various sources. The videos include several animals; we split the recordings into frames using a Python script, and relevant images were chosen and grouped into their corresponding folders. The dataset contains 2300 samples. Since convolutional neural networks need a large number of samples for training, we used a few data augmentation techniques to increase the size of the dataset to 6700 samples. The data augmentation techniques used are horizontal flip, rotation, and padding. After data augmentation, the dataset is divided in the ratio of 80:20, where 80% (5360 images) are used for training and 20% (1340 images) are used for evaluation (testing).
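The augmentation and splitting steps described above can be sketched in plain Python on images stored as nested lists. This is a minimal sketch under stated assumptions: the rotation angle (90 degrees here), the padding width and fill value, and the random seed are not given in the paper, and all function names are illustrative.

```python
import random


def horizontal_flip(img):
    """Mirror each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise (the angle is an assumption)."""
    return [list(row) for row in zip(*img[::-1])]


def pad(img, width=1, value=0):
    """Surround the image with a constant border (width and value are assumed)."""
    cols = len(img[0]) + 2 * width
    top = [[value] * cols for _ in range(width)]
    bottom = [[value] * cols for _ in range(width)]
    body = [[value] * width + list(row) + [value] * width for row in img]
    return top + body + bottom


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them 80:20 by default."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(round(len(items) * train_fraction))
    return items[:cut], items[cut:]
```

With 6700 augmented samples, `train_test_split(range(6700))` yields 5360 training and 1340 test items, matching the counts reported above.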
3.2 Edge Detection Techniques
Edge detection is the technique of finding the boundaries of objects within an image. It can be employed in various real-world applications such as autonomous cars and unmanned aerial vehicles. Edge detection helps to decrease the computation and processing time of the data while training the deep learning model. There are several edge detection approaches; in this study, we use Canny, Laplacian, Sobel, and Prewitt edge detection.
3.2.1 Canny Edge Detection
Canny edge detection is used for finding many edges in images. The edges detected in an image generally have high local maxima of gradient magnitude. This technique decreases the probability of not finding an edge in the image. The steps involved in this technique are (1) smoothing, (2) finding gradients, (3) non-max suppression, (4) thresholding, (5) edge tracking, and (6) output. Canny edge detection equation:
\[ \text{Edge\_gradient} = \sqrt{G_x^2 + G_y^2}, \qquad \text{Angle}(\theta) = \tan^{-1}\left(\frac{G_y}{G_x}\right) \tag{1} \]
3.2.2 Laplacian Edge Detection
The Laplacian is a second-derivative mask, which makes it susceptible to noise. Laplacian edge detection is therefore not preferred for noisy images, which is a major drawback of this technique. To use it on an image that contains noise, the noise is first suppressed with a denoising filter, and the Laplacian filter is then applied to the denoised image. Laplacian edge detection equation:
\[ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{2} \]
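To make the denoise-then-Laplacian procedure concrete, here is a small Python sketch on nested-list images. It assumes a 3 × 3 mean filter as the denoising step and the common 4-neighbour finite-difference approximation of Eq. (2); the paper does not say which denoising filter or discrete kernel was used, so both are illustrative choices.

```python
def mean_filter(img):
    """3x3 box blur on interior pixels; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total / 9.0
    return out


def laplacian(img):
    """4-neighbour finite-difference Laplacian; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1]
                         - 4 * img[y][x])
    return out


def denoised_laplacian(img):
    """Apply the denoising filter first, then the Laplacian, as described above."""
    return laplacian(mean_filter(img))
```

On an image whose intensity is x² along each row, `laplacian` returns 2 at every interior pixel, which is the analytic second derivative; on a constant image it returns 0 everywhere.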
3.2.3 Sobel Edge Detection
Sobel edge detection finds edges where the gradient of the image is very high. Unlike Canny edge detection, Sobel edge detection does not generate smooth edges, and it also produces fewer edges than Canny edge detection. Sobel edge detection equation:
\[ M = \sqrt{S_x^2 + S_y^2} \tag{3} \]
where
\[ S_x = (a_2 + c\,a_3 + a_4) - (a_0 + c\,a_7 + a_6), \qquad S_y = (a_0 + c\,a_1 + a_2) - (a_6 + c\,a_5 + a_4) \]
with constant c = 2. Here a_0, ..., a_7 denote the eight neighbors of the pixel, labeled clockwise starting from the top-left.
3.2.4 Prewitt Edge Detection
Prewitt edge detection is used for finding vertical and horizontal edges in images. The Prewitt technique is fast compared with the Canny and Sobel techniques. It is considered one of the best techniques for determining gradient magnitude and detecting edges. Prewitt edge detection equation:
\[ M = \sqrt{S_x^2 + S_y^2} \tag{4} \]
where
\[ S_x = (a_2 + c\,a_3 + a_4) - (a_0 + c\,a_7 + a_6), \qquad S_y = (a_0 + c\,a_1 + a_2) - (a_6 + c\,a_5 + a_4) \]
with constant c = 1.
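Equations (3) and (4) differ only in the constant c, so a single routine can illustrate both operators. The Python sketch below computes the magnitude M at every interior pixel of a nested-list image. It assumes the neighbours a0 to a7 are labelled clockwise from the top-left and uses the standard form of S_y, with a5 (the pixel directly below) in the bottom-row term; the function name and the zero-valued border are illustrative choices.

```python
import math


def gradient_magnitude(img, c):
    """Return M = sqrt(Sx^2 + Sy^2) for each interior pixel of a 2-D image.

    c = 2 gives the Sobel operator and c = 1 the Prewitt operator.
    Neighbours are labelled clockwise from the top-left:
        a0 a1 a2
        a7  p a3
        a6 a5 a4
    Border pixels are left at 0 (an illustrative choice).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a0, a1, a2 = img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1]
            a7, a3 = img[y][x - 1], img[y][x + 1]
            a6, a5, a4 = img[y + 1][x - 1], img[y + 1][x], img[y + 1][x + 1]
            sx = (a2 + c * a3 + a4) - (a0 + c * a7 + a6)  # right minus left
            sy = (a0 + c * a1 + a2) - (a6 + c * a5 + a4)  # top minus bottom
            out[y][x] = math.hypot(sx, sy)
    return out
```

For a vertical step edge such as `[[0, 0, 1], [0, 0, 1], [0, 0, 1]]`, the centre pixel has S_y = 0 and S_x = 2 + c, so the magnitude is 4 for Sobel and 3 for Prewitt.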
3.3 Algorithm
Step 1: Input the animal dataset containing different animal images.
Step 2: Convert all the images into JPG format, and then convert the RGB color images into grayscale format.
Step 3: Resize all the grayscale images to 48 × 48 pixels.
Step 4: Apply the Canny edge detection technique to all the resized images.
Step 5: Repeat the same process with the Laplacian, Sobel, and Prewitt edge detection techniques.
Step 6: Divide each of the resulting datasets in the ratio of 80:20 for training and evaluation.
Step 7: Train the proposed model and the different state-of-the-art models on the train set, and test them on the test set.
Step 8: Lastly, display the performance metrics of the different models on the train and test sets.
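Steps 2 to 5 of the algorithm above can be sketched as a small preprocessing function. This is only a sketch under assumptions: the grayscale conversion uses the ITU-R BT.601 luma weights and the resize uses nearest-neighbour sampling, neither of which is stated in the paper; file-format conversion (JPG) is omitted; and `edge_detector` is a hypothetical callable standing in for any of the four edge detection techniques.

```python
def rgb_to_gray(img):
    """Convert rows of (r, g, b) tuples to luminance values.

    The BT.601 weights are an assumption; the paper does not state the formula.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in img]


def resize_nearest(img, size=48):
    """Resize a 2-D image to size x size by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(img, edge_detector=None, size=48):
    """Grayscale, resize to 48 x 48, then optionally apply an edge detector."""
    gray = resize_nearest(rgb_to_gray(img), size)
    return edge_detector(gray) if edge_detector is not None else gray
```

Running `preprocess` once per edge detector over the whole image collection would produce the separate edge-filtered datasets that Step 6 then splits 80:20.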
4 Experimental Results
4.1 Performance of Various Models on Wild Animal Dataset
See Table 1 and Figs. 1, 2, and 3. Table 1 compares the accuracy and loss of different deep learning models with those of the proposed model on the animal dataset. The proposed model achieved an accuracy of 0.88 on the training set and 0.72 on the testing set. The state-of-the-art models ResNet-50 and EfficientNet-B7 were trained and tested on the same dataset; on the test set, ResNet-50 achieved 0.56 accuracy and EfficientNet-B7 achieved 0.64 accuracy. The train loss of the proposed model is 0.26 and its test loss is 0.64, whereas the train and test losses are 1.48 and 2.69 for ResNet-50 and 1.31 and 1.92 for EfficientNet-B7. The proposed model therefore performs better than ResNet-50 and EfficientNet-B7 in terms of both accuracy and loss on the train and test sets. Its train accuracy of 0.88, compared with 0.70 for ResNet-50 and 0.76 for EfficientNet-B7, suggests that the proposed model extracts the features that are important for detecting animals in images. EfficientNet-B7 performs better than ResNet-50: its train and test accuracies are higher than those of ResNet-50 but lower than those of the proposed model. Overall, the proposed model outperforms the state-of-the-art models ResNet-50 and EfficientNet-B7. It achieves higher accuracy than the other two models because it is able to detect animals in low-light conditions, such as images taken at night or in heavy fog. Sample images in which the proposed model detects animals in night and fog conditions are shown in Fig. 4.

Table 1 Outline of accuracy and loss of different models
| S. no. | Model name | Train accuracy | Test accuracy | Train loss | Test loss |
| --- | --- | --- | --- | --- | --- |
| 1 | ResNet-50 | 0.70 | 0.56 | 1.48 | 2.69 |
| 2 | EfficientNet-B7 | 0.76 | 0.64 | 1.31 | 1.92 |
| 3 | Proposed model | 0.88 | 0.72 | 0.26 | 0.64 |

Fig. 1 Accuracy and loss of ResNet-50
4.2 Performance of Proposed Model After Applying Different Edge Detecting Techniques
See Table 2 and Figs. 5, 6, 7, and 8. Edge detection helps to decrease the computation and processing time of the data while training the deep learning model. Of the different edge detection techniques available, we use Canny, Laplacian, Sobel, and Prewitt edge detection in this work. The proposed model achieved a test accuracy of 0.81 after applying Canny edge detection to the animal dataset. Similarly, it achieved accuracies of 0.68, 0.68, and 0.65 when the Laplacian, Sobel, and Prewitt edge detection techniques were applied, respectively. The results show that the Canny technique performs better than the remaining techniques on the animal dataset. The train accuracy of the proposed model after applying Canny edge detection is 0.92. Without any edge detection technique, the proposed model achieved 0.72 accuracy on the animal dataset; with Canny edge detection, it achieved 0.81. Both the train accuracy and the test accuracy of the proposed model therefore increase when Canny edge detection is applied. The proposed model achieved its lowest accuracy when Prewitt edge detection was used. Of the four edge detection techniques compared, Canny edge detection is the best, since it achieves higher accuracy than the others.

Fig. 2 Accuracy and loss of EfficientNet-B7
Fig. 3 Accuracy and loss of proposed model
Fig. 4 Animals detected by proposed model during night and fog conditions

Table 2 Outline of accuracy and loss of proposed model after applying edge detection techniques
| S. no. | Technique | Train accuracy | Test accuracy | Train loss | Test loss |
| --- | --- | --- | --- | --- | --- |
| 1 | Canny | 0.92 | 0.81 | 0.26 | 1.05 |
| 2 | Laplacian | 0.88 | 0.68 | 0.35 | 1.16 |
| 3 | Sobel | 0.90 | 0.68 | 0.31 | 1.60 |
| 4 | Prewitt | 0.81 | 0.65 | 1.13 | 2.01 |
Fig. 5 Accuracy and loss using Canny edge
Fig. 6 Accuracy and loss using Laplacian edge
Fig. 7 Accuracy and loss using Sobel edge
Fig. 8 Accuracy and loss using Prewitt edge
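Table 3 Accuracy of proposed model for various opacity levels on animal dataset
| S. no. | Opacity level | No. of actual animals | No. of detected animals | Accuracy (%) |
| --- | --- | --- | --- | --- |
| 1 | 1.0 | 7 | 7 | 100 |
| 2 | 0.9 | 7 | 6 | 85.7 |
| 3 | 0.7 | 7 | 5 | 71.4 |
| 4 | 0.5 | 7 | 3 | 42.8 |
| 5 | 0.3 | 7 | 0 | 0 |
| 6 | 0.1 | 7 | 0 | 0 |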
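The accuracy column of Table 3 is the percentage of actual animals that were detected. A minimal Python check of that arithmetic is given here; the function and variable names are illustrative. Note that 3 out of 7 is 42.857…%, which rounds to 42.9, so the 42.8 reported in the table appears to be truncated rather than rounded.

```python
def detection_accuracy(detected, actual):
    """Percentage of the actual animals that the model detected."""
    return 100.0 * detected / actual


# (opacity level, actual animals, detected animals), copied from Table 3.
TABLE_3 = [(1.0, 7, 7), (0.9, 7, 6), (0.7, 7, 5),
           (0.5, 7, 3), (0.3, 7, 0), (0.1, 7, 0)]

accuracies = [round(detection_accuracy(detected, actual), 1)
              for _, actual, detected in TABLE_3]
# accuracies == [100.0, 85.7, 71.4, 42.9, 0.0, 0.0]
```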
4.3 Performance of Proposed Model on Different Opacity Levels
Table 3 shows the accuracy of the proposed model at different opacity levels. When the opacity level is 1, the proposed model detected every object in the image and obtained 100% accuracy. We next reduced the opacity level to 0.9, and the proposed model obtained 85.7% accuracy, detecting 6 objects out of 7 correctly. When the opacity level is reduced to 0.5, the accuracy of the proposed model is 42.8%, which means it detected only 3 objects out of 7. When the opacity levels are 0.3 and 0.1, the accuracy of the model is 0; that is, it did not detect any of the 7 objects in the image. The accuracy of the model thus decreases as the opacity level decreases, which shows that light is a very important factor in an image for object detection. The drawback of the traditional models and of the proposed model is that they do not perform well at opacity levels below 0.5. The future work of this research is to develop a system that works better at opacity levels below 0.5. Below is a sample of the objects detected in an image by the proposed model when the opacity level is 1.0.
5 Conclusion
In the proposed system, images and videos of various kinds of animals (both domestic and wild) are collected from many sources. All the videos are then divided into frames using a Python script, and appropriate images are selected. The dataset contains 2300 samples. Since convolutional neural networks require a large number of samples for training, we used a few data augmentation techniques to increase the size of the dataset to 6700 samples. The data augmentation techniques used are horizontal flip, rotation, and padding. After data augmentation, the dataset is divided in the ratio of 80:20, where 80% (5360 images) are used for training and 20% (1340 images) are used for evaluation (testing). Without applying edge detection techniques, the proposed model achieves an accuracy of 0.88 on the training set and 0.72 on the testing set. The proposed model achieved 0.81 accuracy after applying the Canny edge technique to the animal dataset. Similarly, the proposed model achieved accuracies of 0.68, 0.68, and 0.65 when the Laplacian, Sobel, and Prewitt edge detection techniques were applied, respectively. The results suggest that the accuracy of the proposed model on both the train and test sets improves significantly when Canny edge detection is used, and that it outperforms the state-of-the-art models ResNet-50 and EfficientNet-B7.
References
1. Nasreen J, Arif W, Shaikh AA, Muhammad Y, Abdullah M (2019) Object detection and narrator for visually impaired people. In: 2019 IEEE 6th international conference on engineering technologies and applied sciences (ICETAS). IEEE, pp 1–4
2. Mandhala VN, Bhattacharyya D, Vamsi B, Thirupathi Rao N (2020) Object detection using machine learning for visually impaired people. Int J Curr Res Rev 12(20):157–167
3. Zou X (2019) A review of object detection techniques. In: 2019 International conference on smart grid and electrical automation (ICSGEA). IEEE, pp 251–254
4. Gullapelly A, Banik BG (2020) Exploring the techniques for object detection, classification, and tracking in video surveillance for crowd analysis
5. Chen Z, Zhang T, Ouyang C (2018) End-to-end airplane detection using transfer learning in remote sensing images. Remote Sens 10(1):139
6. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
7. Kumar KR, Prakash VB, Shyam V, Kumar MA (2016) Texture and shape based object detection strategies. Indian J Sci Technol 9(30):1–4
8. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
9. Peng J, Wang D, Liao X, Shao Q, Sun Z, Yue H, Ye H (2020) Wild animal survey using UAS imagery and deep learning: modified Faster R-CNN for kiang detection in Tibetan Plateau. ISPRS J Photogramm Remote Sens 169:364–376
10. Ding B, Qian H, Zhou J (2018) Activation functions and their characteristics in deep neural networks. In: 2018 Chinese control and decision conference (CCDC). https://doi.org/10.1109/ccdc.2018.8407425
11. NarasingaRao MR, Venkatesh Prasad V, Sai Teja P, Zindavali Md, Phanindra Reddy O (2018) A survey on prevention of overfitting in convolution neural networks using machine learning techniques. Int J Eng Technol 7(2.32):177–180
12. Sharma SU, Shah DJ (2016) A practical animal detection and collision avoidance system using computer vision technique. IEEE Access 5:347–358
13. Bochkovskiy A, Wang CY, Liao HY (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
14. Krishnaveni G, Bhavani BL, Lakshmi NV (2019) An enhanced approach for object detection using wavelet based neural network. J Phys Conf Ser 1228(1):012032. IOP Publishing
15. Chen R, Little R, Mihaylova L, Delahay R, Cox R (2019) Wildlife surveillance using deep learning methods. Ecol Evol 9(17):9453–9466
16. Chowdary MK, Babu SS, Babu SS, Khan H (2013) FPGA implementation of moving object detection in frames by using background subtraction algorithm. In: 2013 International conference on communication and signal processing. IEEE, pp 1032–1036
17. Fu K, Zhao Q, Gu IY, Yang J (2019) Deepside: a general deep framework for salient object detection. Neurocomputing 356:69–82
18. Hou Q, Cheng MM, Hu X, Borji A, Tu Z, Torr PH (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3203–3212
19. Jia S, Bruce NDB (2019) Richer and deeper supervision network for salient object detection. arXiv preprint arXiv:1901.02425
20. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497
21. Liu G-H, Yang J-Y (2018) Exploiting color volume and color difference for salient region detection. IEEE Trans Image Process 28(1):6–16
22. Yu L, Sun W, Wang H, Wang Q, Liu C (2018) The design of single moving object detection and recognition system based on OpenCV. In: 2018 IEEE international conference on mechatronics and automation (ICMA). IEEE, pp 1163–1168
23. Othman NA, Aydin I (2018) A new deep learning application based on movidius ncs for embedded object detection and recognition. In: 2018 2nd international symposium on multidisciplinary studies and innovative technologies (ISMSIT). IEEE, pp 1–5
24. Gasparovsky D (2018) Directions of research and standardization in the field of outdoor lighting. In: 2018 VII. lighting conference of the Visegrad Countries (Lumen V4). IEEE, pp 1–7
25. Guo Y, Guo X, Jiang Z, Zhou Y (2017) Cascaded convolutional neural networks for object detection. In: 2017 IEEE visual communications and image processing (VCIP). IEEE, pp 1–4
26. Guo Y, Guo X, Jiang Z, Men A, Zhou Y (2017) Real-time object detection by a multi-feature fully convolutional network. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 670–674
A Study on Current Research and Challenges in Attribute-based Access Control Model
K. Vijayalakshmi and V. Jayalakshmi
Abstract Access control models are used to identify and detect anonymous users or attacks when sharing big data or other resources in distributed environments such as cloud, edge, and fog computing. The attribute-based access control model (ABAC) is a promising model used in intrusion detection systems. Compared with the primary access control models, namely the discretionary access control model (DAC), the mandatory access control model (MAC), and the role-based access control model (RBAC), ABAC has gained attention in current research due to its flexibility, efficiency, and granularity. Although ABAC performs well in addressing the security requirements of today’s computing technologies, there are open challenges such as policy errors, scalability, delegations, and policy representation with heterogeneous datasets. This paper presents the fundamental concepts of ABAC and a review of current research works toward framing efficient ABAC models. It also identifies and discusses the current challenges in ABAC based on the study and analysis of the surveyed works.
Keywords Access control models · Attribute-based access control model · Cloud computing · Big data · DAC · Intrusion detection system · MAC · RBAC
K. Vijayalakshmi (B)
Vels Institute of Science, Technology and Advanced Studies, Chennai, India
Arignar Anna Govt. Arts College, Cheyyar, India
V. Jayalakshmi
School of Computing Sciences, Vels Institute of Science, Technology and Advanced Studies, VISTAS, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_2

1 Introduction
The intrusion detection system (IDS) is a software protection mechanism used in security systems to monitor, identify, and detect attacks by anonymous users. The primary roles of an IDS are monitoring all incidents, watching log information, and reporting illegal attempts [1]. The growing quantity of malicious software poses serious challenges for researchers designing efficient IDS. Connected information technology also faces further security threats such as denial of service, data loss, data leakage, and loss of data confidentiality. Hence, security is an important issue, and the design of an efficient IDS is a challenging task [2]. The formal definition of IDS was introduced in 1980. IDS are mainly classified as misuse-based IDS and anomaly-based IDS. A misuse-based IDS uses recognized patterns to detect illegal access: possible harmful and suspicious activities are stored as patterns in a database, and the IDS monitors and detects illegal activities based on these patterns. An anomaly-based IDS uses network behavior as the key to detecting anonymous users or attacks. If the network behavior conforms to the predefined behavior, access is granted; otherwise, the anomaly-based IDS generates alerts [3]. An IDS uses access control models as an agent and analysis engine to monitor and identify signs of intrusion [4]. Traditional IDS fail to address the security attacks on today’s computing technologies such as cloud computing, edge computing, fog computing, and the Internet of Things (IoT). With the development of the Internet and the use of social networks, the number of resources and users is increasing exponentially, and attacks and security threats are increasing day by day. Developing an IDS that meets all the security needs is a big challenge [5]. An IDS implements an access control model to monitor and detect malicious intrusion. Implementing a flexible and efficient access control model is an important task in addressing today’s complex security needs [6]. The access control model is a software function established with a set of security policies. The three main operations of the access control model are authentication, authorization, and accountability.
Authentication is the process of identifying legal users based on proof of identity. Authorization decides whether to allow or deny the request of the user. Accountability is the task of monitoring users’ activity within the system and logging information about these activities for further auditing [7]. Thus, the access control model allows or denies the request of the user based on the security policies. Many access control models have been proposed; some have had great success in addressing security needs, while others have failed [8]. The discretionary access control model (DAC) uses an access control list (ACL) for each shared resource that specifies the access rights of the users [9]. DAC operates at the owner’s discretion; thus, it allows the owner of a resource to create the ACL for that resource. The mandatory access control model (MAC) uses security labels for both the user and the resource, and identifies legal access or users based on these labels [10]. Both DAC and MAC perform well when the number of users and resources is limited, but they fail to address the security issues of today’s complex computing technologies. The role-based access control model (RBAC) was proposed to address security attacks in large-scale applications [11, 12]. RBAC establishes two mappings: permission-role and role-user. RBAC first assigns all feasible access rights (permissions) to the role (job) of the user and then assigns the role to the user. Hence, the user obtains access rights only up to the limit of the assigned role. Many versions of RBAC with new technical concepts have been proposed to refine and improve the efficiency of the model [13, 14]. Although RBAC performs well, it has some limitations, such as the poor expressive power of its policies and its inability to address the dynamic and complex security needs of today’s computing technologies. The attribute-based access control model (ABAC) is promising in addressing sophisticated and complex security attacks and threats [15]. The most challenging security attacks are denial of service, account hijacking, data breach, and data loss [16, 17]. ABAC identifies and allows legal activities based on security policies. A security policy is a set of security rules, and a rule is constructed from the attributes of the subject (who requests the access), the resource, and the environmental conditions [18, 19]. Although ABAC meets complex security needs, some open challenges affect the performance and efficiency of the model [20, 21]. In this paper, we describe the basic concepts of ABAC and present a review of current research works toward framing efficient ABAC models. We also identify and analyze the important challenges of policy errors, scalability, delegations, and policy representation with heterogeneous datasets. Section 2 presents the related research works toward developing the ABAC model. Section 3 describes the fundamental concepts of the ABAC model and presents a review of ABAC models. Section 4 categorizes and discusses the current research works in ABAC. Section 5 discusses the current open challenges in designing an efficient ABAC model. Finally, we conclude in Sect. 6.
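The rule structure described in the introduction, in which a rule is built from attributes of the subject, the resource, and the environment, can be illustrated with a short Python sketch. Everything here is a simplified assumption made for illustration rather than a model taken from the surveyed works: rules match by attribute equality only, the first matching rule decides, and requests that match no rule are denied. The class, function, and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One ABAC rule: an effect plus required attribute values per category."""
    effect: str  # "permit" or "deny"
    subject: dict = field(default_factory=dict)
    resource: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)

    def matches(self, request):
        """True if every attribute the rule names has the required value."""
        categories = (("subject", self.subject),
                      ("resource", self.resource),
                      ("environment", self.environment))
        return all(request.get(category, {}).get(name) == value
                   for category, conditions in categories
                   for name, value in conditions.items())


def evaluate(policy, request, default="deny"):
    """Return the effect of the first matching rule (assumed combining strategy)."""
    for rule in policy:
        if rule.matches(request):
            return rule.effect
    return default
```

For example, a policy holding the single rule `Rule("permit", subject={"role": "doctor"}, resource={"type": "medical_record"}, environment={"shift": "day"})` permits a day-shift doctor's request for a medical record and denies the same request made on the night shift.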
2 Literature Survey
Heitor Henrique and his team designed and implemented an access control model to overcome the security problems in federated clouds (interconnected clouds). They experimented with bioinformatics applications [22]. Muhammad Umar Aftab proposed a hybrid access control model that combines the strengths and features of ABAC and RBAC while removing the limitations of these two models. This hybrid model has the policy-structuring ability of ABAC and the high-security power of RBAC [23]. Jian Shu and Lianghong Shi proposed an extended access control model by introducing actions into ABAC. This model avoids the use of multi-attributes with complex structures and resolves the issues of dynamic authorization and changes of access rights [24]. Bai and Zheng surveyed access control models and provided a detailed analysis of them through research on the access control matrix, the access control list, and policies [25]. Xin Jin and Ram Krishnan proposed a hybrid ABAC model called ABACα, which can easily be configured to the other dominant access control models DAC, MAC, and RBAC. Thus, ABACα combines the strengths and features of DAC, MAC, RBAC, and ABAC [26]. Canh Ngo and his team proposed a new version of ABAC that incorporates complex security mechanisms for multi-tenant cloud services. They extended their model to inter-cloud platforms [27]. Riaz Ahmed Shaikh proposed a data classification method for ABAC policy validation, with algorithms for detecting and resolving rule inconsistency and rule redundancy anomalies [28]. Daniel Servos and Sylvia L. Osborn gave a detailed review of ABAC and discussed the current challenges in ABAC [29]. Maryem Ait El Hadj and her team proposed a cluster-based approach for detecting and resolving anomalies in ABAC security policies [30]. Harsha S. Gardiyawasam Pussewalage and Vladimir A. Oleshchuk proposed a flexible, fine-grained ABAC model that incorporates access delegation features for e-health care platforms [31]. Majid Afshar and his team proposed an ABAC framework for health care policies [32]. Xingbing Fu proposed a new ABAC scheme with large universe attributes and efficient encryption for addressing the security requirements of cloud storage systems [33]. Table 1 illustrates the analysis of the current research on the ABAC model. Figure 1 shows the flow diagram of the literature survey. Edelberto Franco Silva and his team proposed an extended framework called ACROSS for authentication and authorization. They developed this framework based on the policies and attributes of virtual organizations; it is thus intended to address the security issues of virtual organization platforms [34]. Hui Qi and his team proposed a hybrid model called role and attribute-based access control (RABAC) that incorporates the efficiencies of both RBAC and ABAC. RABAC has the static relationships of RBAC (permission-role and role-user mappings) and the dynamic policies of ABAC [6]. Maryem Ait El Hadj proposed an approach for clustering ABAC policies and algorithms for detecting and resolving policy errors [35]. Youcef Imine and his team proposed a flexible and fine-grained ABAC scheme to improve the security level of the model. This novel ABAC scheme also solves revocation problems such as removing users or attributes from the system and preventing a user from getting access [36]. Mahendra Pratap Singh and his team proposed an approach and solutions for converging, specifying, enforcing, and managing the security policies of other access control models with ABAC policies [37]. Charles Morisset proposed an extended ABAC framework for evaluating missing information in ABAC security policies with the use of binary decision data structures [38]. In our previous research, we proposed a priority-based clustering approach to cluster the ABAC policies before policy validation [39]. We have also reviewed access control models and analyzed them based on a study of previous research [40, 41].
3 Attribute-based Access Control Model 3.1 Background of ABAC The ABAC model is a software protection mechanism that monitors and identifies intrusion by malicious users [42]. ABAC is established with a set of security policies, and the decision on each user request is made based on the specified policy set. Each ABAC policy is a set of security rules. ABAC allows or denies a request to access a shared resource based on these rules. Thus, the ABAC model admits only legitimate users by checking their identity at two gates [43]. The first gate is a traditional authentication process that verifies the common identities of the users, such as username, password, and date of birth. The second gate is the ABAC model that
A Study on Current Research and Challenges …
Table 1 Analysis of the research on ABAC model

| References | Technique | Efficiency | Limitations |
|---|---|---|---|
| Heitor Henrique et al. [22] | Access control model for federated clouds | Improved security for interconnected clouds | Increases the complexity of developing an efficient access control model |
| Muhammad et al. [23] | Hybrid access control model | Combined features of ABAC and RBAC | The poor expressive power of policies; increases the complexity of managing attributes dynamically |
| Shu et al. [24] | Extended ABAC model | Introduced action-based ABAC; avoids the complexity of attributes | Degrades the efficiency of the authorization process |
| Jin et al. [26] | Hybrid access control model ABACα | Easily compromised and configured with the primary access control models DAC, MAC, and RBAC | Fails to address the inefficiencies of the configured access control models |
| Ngo et al. [27] | A new version of ABAC with complex security mechanisms | Efficient in multi-tenant clouds and inter-cloud platforms | Implementing the model with complex security requirements is a difficult task |
| Shaikh et al. [28] | ABAC policy validation using data classification method | Identified and resolved the anomalies rule inconsistency and rule redundancy | Not concentrated on all anomalies like conflict-demand, rule discrepancy |
| Ait et al. [30] | Policy validation using a cluster-based approach | Policy validation with reduced computation time | Generated more clusters and increases complexity and cost |
| Pussewalage et al. [31] | Fine-grained ABAC with delegation feature | Efficient in health care platforms | The delegation feature may cause critical security issues |
| Afshar et al. [32] | A framework for ABAC | Efficient in expressing health care policies | The scope is limited to the health care platform |
| Fu et al. [33] | ABAC model with large universe attributes | Implemented efficient encryption | Increased the complexity of managing large universe attributes |
| Edelberto et al. [34] | ACROSS, an extended framework based on virtual organizations | Efficient authorization and authentication feature | Increased complexity in implementing the model |
| Qi et al. [6] | A hybrid model with the features of RBAC and ABAC | Efficient in managing static relationships and dynamic ABAC policies | High computation time and complex implementation |
Fig. 1 Flow diagram of literature survey. Steps shown: (1) search for articles related to ABAC models in various databases such as Springer, Elsevier, IEEE, and Google Scholar; (2) 500 articles related to ABAC models downloaded; (3) 100 articles unrelated to ABAC models excluded; (4) remaining articles categorized and filtered based on ABAC concepts; (5) 52 articles related to ABAC research and current challenges taken for the review process; (6) review conducted, and current research and challenges in ABAC identified and discussed.
verifies the users with more attributes, such as department name, designation, resource name, resource type, and time. The common terms in ABAC models are as follows:
Subject: The user who requests access to a shared resource is called the subject. The subject may be a person (user), process, application, or organization.
Subject attributes $\{S_1, S_2, \ldots, S_n\}$: The important properties or characteristics used to describe the subject are referred to as subject attributes. Example: $\{S_1, S_2, S_3\}$ = {Department, Designation, Grade}.
Subject attribute values $\{V_{S_1}, V_{S_2}, \ldots, V_{S_n}\}$: A set of possible values (domain) is assigned to each of the subject attributes $\{S_1, S_2, \ldots, S_n\}$, such that $V_{S_k} = \{s_k v_1, s_k v_2, \ldots, s_k v_n\}$ is the value domain for attribute $S_k$, and $S_k = \{\text{values} \in V_{S_k}\}$. Example: $V_{\text{Department}}$ = {Cardiology, Hematology, Urology, Neurology}.
Subject value attribute assignment: The values of the subject attribute are assigned as $S_k = \{\text{values} \in V_{S_k}\}$. Example: Department = {Hematology, Urology} ∧ Designation = {Nurse, Doctor}.
Object: The shared resource is called the object.
Object attributes $\{O_1, O_2, \ldots, O_n\}$: The important properties or characteristics used to describe the object are referred to as object attributes. Example: $\{O_1, O_2, O_3\}$ = {ResourceName, ResourceType, LastUpdatedOn}.
Object attribute values $\{V_{O_1}, V_{O_2}, \ldots, V_{O_n}\}$: A set of possible values (domain) is assigned to each of the object attributes $\{O_1, O_2, \ldots, O_n\}$, such that $V_{O_k} = \{o_k v_1, o_k v_2, \ldots, o_k v_n\}$ is the value domain for attribute $O_k$, and $O_k = \{\text{values} \in V_{O_k}\}$. Example: $V_{\text{ResourceName}}$ = {Pat_007_Blood_Report, Pat_435_CBC_Report}.
Object value attribute assignment: The values of the object attribute are assigned as $O_k = \{\text{values} \in V_{O_k}\}$. Example: ResourceName = {Pat_435_CBC_Report} ∧ ResourceType = {DataFile}.
Environmental condition: This category specifies the information about the environmental conditions.
Environmental condition attributes $\{E_1, E_2, \ldots, E_n\}$: The characteristics used to describe the environment. Example: $\{E_1, E_2, E_3\}$ = {DateOfRequest, Time, PurposeOfRequest}.
Environmental condition attribute values $\{V_{E_1}, V_{E_2}, \ldots, V_{E_n}\}$: A set of possible values (domain) is assigned to each of the environment attributes $\{E_1, E_2, \ldots, E_n\}$, such that $V_{E_k} = \{e_k v_1, e_k v_2, \ldots, e_k v_n\}$ is the value domain for attribute $E_k$, and $E_k = \{\text{values} \in V_{E_k}\}$. Example: $V_{\text{Time}}$ = {07:12, 12:05, 08:16}.
Environmental value attribute assignment: The values of the environmental attribute are assigned as $E_k = \{\text{values} \in V_{E_k}\}$.
Example: Time = {07:12}.
An ABAC rule is expressed as $R = \{X_{op} \mid A_1 \in V_{A_1}, A_2 \in V_{A_2}, \ldots, A_n \in V_{A_n}\}$. $X$ is the decision (allow or deny) for the requested operation $op$ (read, write, print, etc.). $\{A_1, A_2, \ldots, A_n\}$ is the list of attributes belonging to the categories {subject, object, environmental conditions}. $V_{A_1}, V_{A_2}, \ldots, V_{A_n}$ are the sets of permitted values of the attributes $\{A_1, A_2, \ldots, A_n\}$, respectively. The decision is made based on the attributes specified in the ABAC rule. The ABAC rule can be written as follows:
$R_1 = \{\text{allow}_{\text{read}} \mid \text{Designation} = \{\text{Surgeon, Chief doctor}\},\ \text{Department} = \{\text{Cardiology}\},\ \text{FileName} = \{\text{Pat\_567\_CBC\_Report}\}\}$
The above rule, R1, states that any person working as a surgeon or chief doctor in the department of cardiology can read the file Pat_567_CBC_Report.
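The rule form above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the `Rule` class, the `matches` and `decide` helpers, the first-match evaluation order, and the deny-by-default fallback are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One ABAC rule R = {X_op | A_1 in V_A1, ..., A_n in V_An}."""
    decision: str    # "allow" or "deny" (the X in X_op)
    operation: str   # e.g. "read" (the op in X_op)
    permitted: dict  # attribute name -> set of permitted values


def matches(rule, operation, attributes):
    """True if the request satisfies every attribute condition of the rule."""
    if operation != rule.operation:
        return False
    return all(attributes.get(name) in values
               for name, values in rule.permitted.items())


def decide(rules, operation, attributes, default="deny"):
    """Return the decision of the first matching rule (assumed ordering),
    falling back to the assumed default when no rule matches."""
    for rule in rules:
        if matches(rule, operation, attributes):
            return rule.decision
    return default


# Rule R1 from the text.
R1 = Rule(
    decision="allow",
    operation="read",
    permitted={
        "Designation": {"Surgeon", "Chief doctor"},
        "Department": {"Cardiology"},
        "FileName": {"Pat_567_CBC_Report"},
    },
)

request = {"Designation": "Surgeon", "Department": "Cardiology",
           "FileName": "Pat_567_CBC_Report"}
print(decide([R1], "read", request))   # request satisfies R1
print(decide([R1], "write", request))  # no rule covers "write"
```

A request is granted only when its operation and every listed attribute fall inside the rule's permitted value sets; anything else falls through to the default decision.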
3.2 Policy Expression ABAC policies can be written using access control policy languages. Most ABAC implementations use the extensible access control markup language (XACML) to express ABAC policies [44]. The Organization for the Advancement of Structured Information Standards (OASIS) created a standard for XACML based on XML concepts in 2002. OASIS also developed the security assertion markup language (SAML) v2.0 in 2005 for exchanging authentication and authorization assertions [45]. The security policy set of the ABAC model can also be expressed in JavaScript Object Notation (JSON). In XACML, each attribute is expressed as a pair (attribute's name, attribute's value) using a markup language. The ABAC policy set can be expressed by XACML as follows:
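Actions: read, write
Subject attributes: Department = dermatology, Designation = chief doctor
Object attribute: FileName = PatID_005_CBC_Report
Environment attribute: Time = 07:12
… // more rules can be specified
// more policies can be specified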
In the above example, the rule R1 states that the security policy allows the chief doctor in the department of dermatology to read and write the file "PatID_005_CBC_Report" at 07:12 h.
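To make the (attribute name, attribute value) pairing concrete, the sketch below serializes rule R1 into a simplified XML document using Python's standard library. The element and attribute names (`Policy`, `Rule`, `Action`, `Attribute`, `Category`, `Name`) are illustrative assumptions and do not follow the OASIS XACML schema, which uses a richer structure (targets, match expressions, and URN identifiers).

```python
import xml.etree.ElementTree as ET


def build_policy():
    """Build a simplified, XACML-like policy holding rule R1 from the text."""
    policy = ET.Element("Policy", {"PolicyId": "P1"})  # hypothetical id
    rule = ET.SubElement(policy, "Rule", {"RuleId": "R1", "Effect": "Permit"})
    for action in ("read", "write"):
        ET.SubElement(rule, "Action").text = action
    # Each attribute is stored as a (name, value) pair within its category.
    pairs = [
        ("Subject", "Department", "dermatology"),
        ("Subject", "Designation", "chief doctor"),
        ("Resource", "FileName", "PatID_005_CBC_Report"),
        ("Environment", "Time", "07:12"),
    ]
    for category, name, value in pairs:
        attr = ET.SubElement(rule, "Attribute",
                             {"Category": category, "Name": name})
        attr.text = value
    return policy


xml_text = ET.tostring(build_policy(), encoding="unicode")
print(xml_text)

# Reading the policy back recovers the same name/value pairs.
parsed_rule = ET.fromstring(xml_text).find("Rule")
actions = [a.text for a in parsed_rule.findall("Action")]
attributes = {a.get("Name"): a.text for a in parsed_rule.findall("Attribute")}
print(actions, attributes)
```

The same pairs could equally be written as a JSON object; XML is used here only because the section discusses XACML.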
4 Taxonomy of ABAC Research The current study on ABAC research is classified based on model, implementation, policy, and attributes. The research on each category can be divided into subcategories specific to the area or domain of the research. Figure 2 shows the taxonomy of ABAC research.
4.1 ABAC Models Research on designing ABAC models, either original or hybrid, is receiving great attention. An original model is a purely new attribute-based access control model, not an extension of any previous access control model. An original model may be general or domain-specific. Hybrid models are designed by combining the features or strengths of two or more existing models. Table 2 shows the types of ABAC models.
4.2 ABAC Implementation Research on the implementation of ABAC models also has great impact and interest in today's communication technology. In terms of research volume, implementation ranks next after the design of ABAC models. The framework for implementing an ABAC model comprises several functional components, such as representing ABAC policies in one of the access control languages (XACML, SAML, ORACLE, MySQL, or others), establishing security policies, storing and managing policies and metadata, and testing and maintaining the framework.
Fig. 2 Taxonomy of ABAC research. Categorization of current research in ABAC: model-related research, information-related research, policy-related research, and attributes-related research.
Table 2 Taxonomy of ABAC research in designing models

| ABAC model | Techniques |
|---|---|
| Original model | General models: (a) Logic-based ABAC: designing is mainly concentrated on consistency, representation, and validation of ABAC policies [46]; (b) ABACα: designing the model by incorporating the features of DAC, MAC, and RBAC [26]; (c) Attribute-based access matrix model: implementing an ABAC matrix called ABACM, in which each row represents the subject's attribute and value pair, each column represents the object's attribute and value pair, and each cell specifies the access right [47]. Domain-specific models: (a) Cloud computing: designing the model for the domain of cloud computing [48]; (b) Grid computing: designing the model for the domain of grid computing; (c) Real-time systems: designing the model for the domain of real-time systems |
| Hybrid model | RABAC: designing the model with the combined features of RBAC and ABAC. PRBAC: designing the model with the combined features of parameterized RBAC and ABAC. Attribute-based and role assignment: a model with attribute-based and role-assignment policies |
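The attribute-based access matrix listed in Table 2 can be pictured as a lookup table. The sketch below is only an illustration of the row/column/cell idea described there; the dictionary layout, the `access_rights` helper, and the sample entries are assumptions, not the formal model of [47].

```python
# Rows are (subject attribute, value) pairs, columns are (object attribute,
# value) pairs, and each cell holds the set of access rights.
matrix = {
    (("Designation", "Doctor"), ("ResourceType", "DataFile")): {"read", "write"},
    (("Designation", "Nurse"), ("ResourceType", "DataFile")): {"read"},
    (("Department", "Cardiology"), ("ResourceName", "Pat_435_CBC_Report")): {"print"},
}


def access_rights(subject_attrs, object_attrs):
    """Union of the rights in every cell whose row matches the subject's
    attributes and whose column matches the object's attributes."""
    rights = set()
    for (row, column), cell in matrix.items():
        row_name, row_value = row
        col_name, col_value = column
        if (subject_attrs.get(row_name) == row_value
                and object_attrs.get(col_name) == col_value):
            rights |= cell
    return rights


nurse = {"Designation": "Nurse", "Department": "Cardiology"}
report = {"ResourceType": "DataFile", "ResourceName": "Pat_435_CBC_Report"}
print(sorted(access_rights(nurse, report)))
```

Taking the union of matching cells is one possible combination strategy; a stricter model could instead require all matching cells to agree.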
4.3 ABAC Policies Research on the development, testing, and validation of ABAC security policies is also receiving great attention. Researchers show as much interest in policy-related tasks as in the implementation of the ABAC model. Previous and current research contributions on policies include preserving the consistency and confidentiality of policies, flexible and efficient policymaking, testing policies, and detecting and resolving policy anomalies.
4.4 ABAC Attributes The literature review shows that there are also many research contributions on determining and specifying attributes in policies. Research on policy attributes involves preserving confidentiality, adding more attributes to improve the security level, flexible attribute specification, and storing and managing attributes. Figure 3 shows the evolution of ABAC research, and Fig. 4 shows the research rate of each category of ABAC.
Fig. 3 Evolution of ABAC research
Fig. 4 Research rate of each category of ABAC
5 Challenges in ABAC 5.1 Policy Errors The main critical issues are anomalies or conflicts in the security policies. The policy errors cause dangerous security issues like denial of service, data loss, or data breach. The primary policy errors are rule redundancy and rule discrepancy [49]. The rule redundancy errors consume high storage space and increase the complexity
in updating security policies [50]. The rule discrepancy error causes confusion in granting permissions to the users, and can lead to unavailability of the shared resource or to illegal access.
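The two policy errors can be illustrated with a simple pairwise check. This is a sketch under stated assumptions, not the detection method of [49] or [50]: here a rule is treated as redundant when another rule with the same decision covers every request it matches, and two rules are treated as discrepant when they give opposite decisions and can match the same request. The dictionary layout and helper names are hypothetical.

```python
from itertools import combinations


def covers(general, specific):
    """True if every request matching `specific` also matches `general`."""
    if general["operation"] != specific["operation"]:
        return False
    return all(
        name in specific["permitted"] and specific["permitted"][name] <= values
        for name, values in general["permitted"].items()
    )


def overlaps(a, b):
    """True if some request could match both rules."""
    if a["operation"] != b["operation"]:
        return False
    shared = a["permitted"].keys() & b["permitted"].keys()
    return all(a["permitted"][k] & b["permitted"][k] for k in shared)


def find_anomalies(rules):
    """Return index pairs of redundant rules and of discrepant rules."""
    redundant, discrepant = [], []
    for (i, a), (j, b) in combinations(enumerate(rules), 2):
        if a["decision"] == b["decision"]:
            if covers(a, b) or covers(b, a):
                redundant.append((i, j))
        elif overlaps(a, b):
            discrepant.append((i, j))
    return redundant, discrepant


rules = [
    {"decision": "allow", "operation": "read",
     "permitted": {"Designation": {"Surgeon", "Chief doctor"},
                   "Department": {"Cardiology"}}},
    {"decision": "allow", "operation": "read",   # subset of rule 0
     "permitted": {"Designation": {"Surgeon"},
                   "Department": {"Cardiology"}}},
    {"decision": "deny", "operation": "read",    # contradicts rules 0 and 1
     "permitted": {"Designation": {"Surgeon"},
                   "Department": {"Cardiology", "Urology"}}},
]

redundant, discrepant = find_anomalies(rules)
print(redundant, discrepant)
```

The pairwise scan is quadratic in the number of rules, which is one reason the surveyed work clusters policies before validation.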
5.2 Scalability An important challenge in implementing or adopting the ABAC framework is the scalability of the model. The traditional access control models DAC and MAC have proved their scalability in small-scale applications [51]. RBAC also performs well in large-scale applications. ABAC has to meet complex security requirements and manage millions of subject and object attributes. ABAC solutions still require many case studies to prove their scalability.
5.3 Delegations The most essential feature of access control models is delegation. The delegation feature allows one subject to grant (delegate) certain permissions (access rights) to the other subjects. Due to frequent and dynamic changes of attributes and policies, achieving dynamic delegation is more complex [52]. The delegation requires constant policies with constant attributes and role-user assignments. The researchers are struggling to fulfill the requirement of dynamic delegation.
5.4 Auditability Another important and necessary aspect of all security systems and access control models is auditing. The term auditing refers to the ability to determine how many subjects hold particular access rights (read, write, or share) for a certain object, or for how many objects a particular subject holds access rights. ABAC never maintains the identity of the users [42]. The users are unknown, and they obtain access rights if their attributes satisfy the predefined ABAC policies. Thus, it is more difficult to determine the number of users allowed to access a particular object and the number of objects a particular user is allowed to access.
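The auditing difficulty can be seen in a small sketch: because rules refer to attributes rather than identities, answering "who can read this object?" requires evaluating the policy against every known subject. The example assumes a hypothetical `subject_store` that records each subject's attributes; ABAC itself does not provide such a store, and the helper names are illustrative.

```python
def satisfies(rule, operation, attributes):
    """True if the operation and every attribute condition of the rule match."""
    if rule["operation"] != operation:
        return False
    return all(attributes.get(name) in values
               for name, values in rule["permitted"].items())


def audit_object(rules, subject_store, obj, operation):
    """List the subjects that some 'allow' rule would let perform the
    operation on the object (deny rules are ignored in this sketch)."""
    granted = []
    for subject_id, subject_attrs in subject_store.items():
        request = {**subject_attrs, **obj}
        if any(r["decision"] == "allow" and satisfies(r, operation, request)
               for r in rules):
            granted.append(subject_id)
    return sorted(granted)


rules = [{"decision": "allow", "operation": "read",
          "permitted": {"Designation": {"Surgeon", "Chief doctor"},
                        "Department": {"Cardiology"},
                        "FileName": {"Pat_567_CBC_Report"}}}]

# Hypothetical attribute store; the identifiers u1..u3 are placeholders.
subject_store = {
    "u1": {"Designation": "Surgeon", "Department": "Cardiology"},
    "u2": {"Designation": "Nurse", "Department": "Cardiology"},
    "u3": {"Designation": "Chief doctor", "Department": "Cardiology"},
}
report = {"FileName": "Pat_567_CBC_Report"}

readers = audit_object(rules, subject_store, report, "read")
print(readers, len(readers))
```

The cost grows with the number of subjects times the number of rules, and the answer changes whenever any attribute value changes.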
6 Conclusion With the help of the Internet, communication, and information technology, the number of users and resources is growing rapidly. Hence, security is an essential, critical, and challenging concern. Access control models play a vital role in addressing security threats and attacks such as denial of service, account hijacking, and data loss. ABAC is receiving more attention from researchers due to its flexibility and efficiency. This paper has presented the fundamental concepts of the ABAC model and the taxonomy of research in ABAC, described each category of ABAC research, and discussed the challenges in ABAC models. This review may help researchers and practitioners gain knowledge of ABAC models, implementation, policies, attributes, and challenges.
References
1. Kumar A, Maurya HC, Misra R (2013) A research paper on hybrid intrusion detection system. Int J Eng Adv Technol 2(4):294–297
2. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1). https://doi.org/10.1186/s42400-019-0038-7
3. Hydro C et al (2013) We are IntechOpen, the world's leading publisher of Open Access books Built by scientists, for scientists TOP 1 %. INTECH 32(July):137–144
4. Liang C et al (2020) Intrusion detection system for the internet of things based on blockchain and multi-agent systems. Electronics 9(7):1–27. https://doi.org/10.3390/electronics9071120
5. Varal AS, Wagh SK (2018) Misuse and anomaly intrusion detection system using ensemble learning model. In: International conference on recent innovations in electrical, electronics & communication engineering ICRIEECE 2018, pp 1722–1727. https://doi.org/10.1109/ICRIEECE44171.2018.9009147
6. Qi H, Di X, Li J (2018) Formal definition and analysis of access control model based on role and attribute. J Inf Secur Appl 43:53–60. https://doi.org/10.1016/j.jisa.2018.09.001
7. Suhendra V (2011) A survey on access control deployment. In: Communication in computer and information science, vol 259 CCIS, pp 11–20. https://doi.org/10.1007/978-3-642-27189-2_2
8. Sahafizadeh E (2010) Survey on access control models, pp 1–3
9. Conrad E, Misenar S, Feldman J (2016) Domain 5: identity and access management (Controlling Access And Managing Identity). In: CISSP Study Guide, pp 293–327. https://doi.org/10.1016/b978-0-12-802437-9.00006-0
10. Xu L, Zhang H, Du X, Wang C (2009) Research on mandatory access control model for application system. In: Proceedings of international conference on networks security, wireless communications and trusted computing NSWCTC 2009, vol 2, no 1, pp 159–163. https://doi.org/10.1109/NSWCTC.2009.322
11. Sandhu RS et al (1996) Role based access control models. IEEE 6(2):21–29. https://doi.org/10.1016/S1363-4127(01)00204-7
12. Sandhu R, Bhamidipati V, Munawer Q (1999) The ARBAC97 model for role-based administration of roles. ACM Trans Inf Syst Secur 2(1):105–135. https://doi.org/10.1145/300830.300839
13. Sandhu R, Munawer Q (1999) The ARBAC99 model for administration of roles. In: Proceedings 15th annual computer security applications conference, vol Part F1334, pp 229–238. https://doi.org/10.1109/CSAC.1999.816032
14. Hutchison D (2011) Data and applications security and privacy XXV. In: Lecture notes computer science, vol 1, pp 3–18. https://doi.org/10.1007/978-3-319-20810-7
15. Crampton J, Morisset C (2014) Monotonicity and completeness in attribute-based access control. In: LNCS 8743, Springer International Publication, pp 33–34
16. Prakash C, Dasgupta S (2016) Cloud computing security analysis: challenges and possible solutions. In: International conference on electrical, electronics, and optimization techniques ICEEOT 2016, pp 54–57. https://doi.org/10.1109/ICEEOT.2016.7755626
17. Markandey A, Dhamdhere P, Gajmal Y (2019) Data access security in cloud computing: a review. In: 2018 International conference on computing, power and communication technologies GUCON 2018, pp 633–636. https://doi.org/10.1109/GUCON.2018.8675033
18. Que Nguyet Tran Thi TKD, Si TT (2017) Fine grained attribute based access control model for privacy protection. Springer International Publication A, vol 10018, pp 141–150. https://doi.org/10.1007/978-3-319-48057-2
19. Vijayalakshmi K, Jayalakshmi V (2021) Analysis on data deduplication techniques of storage of big data in cloud. In: Proceedings of 5th international conference on computing methodologies and communication ICCMC 2021. IEEE, pp 976–983
20. Vijayalakshmi K, Jayalakshmi V (2021) Identifying considerable anomalies and conflicts in ABAC security policies. In: Proceedings of 5th international conference on intelligent computing and control systems ICICCS 2021. IEEE, pp 1286–1293
21. Vijayalakshmi K, Jayalakshmi V (2021) A similarity value measure of ABAC security rules. In: Proceedings of 5th international conference on trends electronics and informatics ICOEI 2021, IEEE
22. Costa HH, de Araújo AP, Gondim JJ, de Holanda MT, Walter ME (2017) Attribute based access control in federated clouds: A case study in bionformatics. In: Iberian conference on information systems and technologies CIST. https://doi.org/10.23919/CISTI.2017.7975855
23. Aftab MU, Habib MA, Mehmood N, Aslam M, Irfan M (2016) Attributed role based access control model. In: Proceedings of 2015 conference on information assurance and cyber security CIACS 2015, pp 83–89. https://doi.org/10.1109/CIACS.2015.7395571
24. Shu J, Shi L, Xia B, Liu L (2009) Study on action and attribute-based access control model for web services. In: 2nd International symposium on information science and engineering ISISE 2009, pp 213–216. https://doi.org/10.1109/ISISE.2009.80
25. Bai QH, Zheng Y (2011) Study on the access control model in information security. In: Proceedings of 2011 cross strait quad-regional radio science wireless technology conference CSQRWC 2011, vol 1, pp 830–834. https://doi.org/10.1109/CSQRWC.2011.6037079
26. Jin X, Krishnan R, Sandhu R (2012) A unified attribute-based access control model covering DAC, MAC and RBAC. In: Lecture notes in computer science, vol 7371, pp 41–55
27. Ngo C, Demchenko Y, De Laat C (2015) Multi-tenant attribute-based access control for cloud infrastructure services. https://doi.org/10.1016/j.jisa.2015.11.005
28. Shaikh RA, Adi K, Logrippo L (2017) A data classification method for inconsistency and incompleteness detection in access control policy sets. Int J Inf Secur 16(1):91–113. https://doi.org/10.1007/s10207-016-0317-1
29. Servos D, Osborn SL (2017) Current research and open problems in attribute-based access control. ACM Comput Surv (CSUR) 49(4):1–45. https://doi.org/10.1145/3007204
30. El Hadj MA, Ayache M, Benkaouz Y, Khoumsi A, Erradi M (2017) Clustering-based approach for anomaly detection in xacml policies. In: ICETE 2017: proceedings of 14th international joint conference on E-business telecommunication, vol 4, no Icete, pp 548–553. https://doi.org/10.5220/0006471205480553
31. Pussewalage HSG, Oleshchuk VA (2017) Attribute based access control scheme with controlled access delegation for collaborative E-health environments. J Inf Secur Appl 37:50–64. https://doi.org/10.1016/j.jisa.2017.10.004
32. Afshar M, Samet S, Hu T (2018) An attribute based access control framework for healthcare system. J Phys Conf Ser 933(1). https://doi.org/10.1088/1742-6596/933/1/012020
33. Fu X, Nie X, Wu T, Li F (2018) Large universe attribute based access control with efficient decryption in cloud storage system. J Syst Softw 135:157–164. https://doi.org/10.1016/j.jss.2017.10.020
34. Franco E, Muchaluat-saade DC (2018) ACROSS: a generic framework for attribute-based access control with distributed policies for virtual organizations. Futur Gener Comput Syst 78:1–17. https://doi.org/10.1016/j.future.2017.07.049
35. Ait El Hadj M, Khoumsi A, Benkaouz Y, Erradi M (2018) Formal approach to detect and resolve anomalies while clustering ABAC policies. ICST Trans Secur Saf 5(16):156003. https://doi.org/10.4108/eai.13-7-2018.156003
36. Imine Y, Lounis A, Bouabdallah A (2018) AC SC. https://doi.org/10.1016/j.jnca.2018.08.008
37. Pratap M, Sural S, Vaidya J (2019) Managing attribute-based access control policies in a unified framework using data warehousing and in-memory database. Comput Secur 86:183–205. https://doi.org/10.1016/j.cose.2019.06.001
38. Morisset C, Willemse TAC, Zannone N (2019) A framework for the extended evaluation of ABAC policies. Cybersecurity 2(1). https://doi.org/10.1186/s42400-019-0024-0
39. Vijayalakshmi K, Jayalakshmi V (2020) A priority-based approach for detection of anomalies in ABAC policies using clustering technique. In: Iccmc, pp 897–903. https://doi.org/10.1109/iccmc48092.2020.iccmc-000166
40. Vijayalakshmi K, Jayalakshmi V (2021) Shared access control models for big data: a perspective study and analysis. Springer, pp 397–410. https://doi.org/10.1007/978-981-15-8443-5_33
41. Vijayalakshmi K, Jayalakshmi V (2021) Improving performance of ABAC security policies validation using a novel clustering approach. Int J Adv Comput Sci Appl 12(5):245–257
42. Hu VC et al (2014) Guide to attribute based access control (abac) definition and considerations. NIST Spec Publ 800:162. https://doi.org/10.6028/NIST.SP.800-162
43. Cavoukian A, Chibba M, Williamson G, Ferguson A (2015) The importance of ABAC: attribute-based access control to big data: privacy and context. In: Private Big Data Institute, p 21
44. Deng F et al (2019) Establishment of rule dictionary for efficient XACML policy management. Knowl-Based Syst 175:26–35. https://doi.org/10.1016/j.knosys.2019.03.015
45. OASIS (2008) SAML v2.0. Language (Baltim)
46. Dovier A, Piazza C, Pontelli E, Rossi G (2000) Sets and constraint logic programming. ACM Trans Program Lang Syst 22(5):861–931. https://doi.org/10.1145/365151.365169
47. Zhang X, Li Y, Nalla D (2005) An attribute-based access matrix model. In: Proceedings of the 2005 ACM symposium on applied computing, vol 1, pp 359–363. https://doi.org/10.1145/1066677.1066760
48. Ahuja R, Mohanty SK, Sakurai K (2016) A scalable attribute-set-based access control with both sharing and full-fledged delegation of access privileges in cloud computing. Comput Electr Eng, pp 1–16. https://doi.org/10.1016/j.compeleceng.2016.11.028
49. Vijayalakshmi K, Jayalakshmi V (2021) Resolving rule redundancy error in ABAC policies using individual domain and subset detection method. In: Proceedings of 6th international conference on communication and electronics systems. ICCES 2021, IEEE
50. Ait M, Hadj E, Erradi M, Khoumsi A (2018) Validation and correction of large security policies: a clustering and access log based approach. In: 2018 IEEE international conference on big Data (Big Data), no 1, pp 5330–5332. https://doi.org/10.1109/BigData.2018.8622610
51. Fugkeaw S, Sato H (2018) Scalable and secure access control policy update for outsourced big data. 79:364–373. https://doi.org/10.1016/j.future.2017.06.014
52. Servos D, Mohammed S, Fiaidhi J, Kim TH (2013) Extensions to ciphertext-policy attribute-based encryption to support distributed environments. Int J Comput Appl Technol 47(2–3):215–226. https://doi.org/10.1504/IJCAT.2013.05435
Audio Denoising Using Deep Neural Networks S. Jassem Mohammed and N. Radhika
Abstract Improving speech quality is becoming a basic requirement with the increasing interest in speech processing applications. Many speech enhancement techniques have been developed to reduce or completely remove listener fatigue on devices such as smartphones and in online communication applications. Background noise often interrupts communication, and this has traditionally been solved with a physical hardware device that emits a negative frequency of the incoming audio noise signal to cancel out the noise. Deep learning has recently made a breakthrough in the speech enhancement process. This paper proposes an audio denoising model built on a deep neural network architecture based on spectrograms (a hybrid between the frequency domain and the time domain). The proposed deep neural network model effectively predicts the negative noise frequency for a given noisy input audio file. The predicted values are then removed from the original noisy audio file to create the denoised audio output.
Keywords Deep neural network · Spectrogram · Transfer learning · Activation function · Sampling rate · Audio synthesizing · Preprocessing
S. J. Mohammed · N. Radhika, Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_3
1 Introduction Speech signals are transmitted, recorded, played back, analyzed, or synthesized by electronic systems in the context of audio communication. Noise influences must be carefully considered when building a system for any of these purposes. Different types of noise and distortion can be identified, and a variety of signal processing principles can help mitigate their impact. One of the most researched problems in audio signal processing is denoising. Noise is an inevitable and, in most cases, an
S. J. Mohammed and N. Radhika
Fig. 1 Working of AI-based audio denoising model
undesired component of audio recordings, necessitating a denoising stage in signal processing pipelines for applications such as music transcription, sound categorization, and voice recognition. The goal of audio denoising is to reduce noise while preserving the underlying signals. There are several applications, including music and speech restoration. Figure 1 shows the basic overall working of a trained audio denoising artificial intelligence model. For an incoming noisy speech file, the trained model predicts the noise values present in the input speech. When these noise values are subtracted from the incoming audio file, a clean (denoised) audio file is obtained. For decades, researchers have been working on speech enhancement techniques that predict clean speech based on statistical assumptions about how speech and noise behave. With the development of deep learning-based approaches, a new age of speech enhancement has begun. By training a deep neural network, these strategies learn the mapping function that transfers noisy speech to clean speech without making any statistical assumptions. The deep neural network is fed a large amount of training data in the form of clean and noisy speech pairs, and it updates its parameters during the supervised learning process to produce the best prediction of the target clean speech. This paper has seven sections in total. Section 2 presents the background literature survey, Sects. 3 and 4 elucidate the design and implementation of the proposed model, while Sects. 5 and 6 discuss the results and inferences. The conclusion and future scope are described at the end in Sect. 7. In Sect. 7, the appendix contains important snippets of code that were used for the implementation of this project.
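The subtraction step described for Fig. 1 can be sketched in a few lines. This is a simplified illustration working directly on time-domain samples rather than on spectrograms, and `stand_in_model` is a hypothetical placeholder for the trained network: it simply returns 90% of the true noise so that the effect of an imperfect prediction is visible. The signal frequencies and amplitudes are arbitrary choices.

```python
import math

SAMPLE_RATE = 8000  # assumed sampling rate in Hz


def tone(freq_hz, amplitude, n_samples):
    """Generate a sine tone as a list of samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]


def denoise(noisy, predict_noise):
    """Subtract the predicted noise from the noisy signal, sample by sample."""
    predicted = predict_noise(noisy)
    return [x - n for x, n in zip(noisy, predicted)]


def snr_db(clean, estimate):
    """Signal-to-noise ratio of an estimate relative to the clean signal."""
    signal_power = sum(c * c for c in clean)
    error_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10 * math.log10(signal_power / error_power)


clean = tone(440, 1.0, 800)   # stand-in for clean speech
noise = tone(50, 0.3, 800)    # stand-in for background hum
noisy = [c + n for c, n in zip(clean, noise)]


def stand_in_model(_noisy):
    """Hypothetical predictor: recovers 90% of the true noise."""
    return [0.9 * n for n in noise]


denoised = denoise(noisy, stand_in_model)
snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, denoised)
print(round(snr_before, 2), round(snr_after, 2))
```

Because the residual noise amplitude drops to one tenth, the SNR in this toy case improves by about 20 dB; a real model's gain depends on how well it predicts the noise.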
2 Background In this section, some recent techniques for denoising speech using artificial intelligence models and algorithms are surveyed. The deep autoencoder (DAE) model described in [1] has been utilized to perform dimensionality reduction, face recognition, and natural language processing. In that work, the author studied the use of a linear regression function as the DDAE model's decoder (termed DAELD) and evaluated the DAELD model on two speech enhancement tasks (Aurora-4 and TIMIT datasets). The encoder and decoder layers of the DAELD model transform speech signals into high-dimensional feature representations and then back into speech signals. The encoder consists of a nonlinear transformation, and the decoder consists of the linear regression algorithm. The author showed that using linear regression in the decoder improved performance in terms of PESQ and STOI scores. In [2], using deep neural networks (DNNs), the author used an ideal binary mask (IBM) as a binary classification function for speech enhancement in complex noisy conditions. During training, the IBM is employed as the target function, and the trained DNNs are used to estimate the IBM during the enhancement stage. The target speech is then created by applying the predicted target function to the complex noisy mixtures. The author showed that a deep neural network model with four hidden layers and mean square error as its loss function provides an average improvement of seven percent in speech quality. The work in [3] provides a deep learning-based method for improving voice denoising in real-world audio situations without the need for clean speech signals, in a self-supervised manner.
Two noisy realizations of the same speech signal, one as the input and the other as the output, are used to train a fully convolutional neural network. The author used LeakyReLU as the activation function, noting that it helps speed up the training process. For this reason, LeakyReLU was selected as the activation function for the model proposed in this paper. In [4], the author presented a simple mean squared error (MSE)-based loss function for supervised voice enhancement in the spectro-temporal modulation domain. Because of its close relationship to the template-based spectro-temporal modulation index (STMI), which correlates well with speech intelligibility, the author terms this loss the spectro-temporal modulation error (STME). For the small-scale dataset, the training and test sets contained 9.4 hours and 35 min of noisy speech, respectively. For the large-scale dataset, the author used the Interspeech 2020 deep noise suppression (DNS) dataset. The author's model consists of four fully connected layers, with two stacked gated recurrent units between the first and the second layer. Unlike the model proposed in this paper, the model in [4] performs speech enhancement in the modulation domain.
S. J. Mohammed and N. Radhika
In [5], deep neural networks are first trained to classify spoken words or environmental sounds from audio. An audio transform is then trained to convert noisy speech into an audio waveform that minimizes the recognition network's "perceptual" losses. Using perceptual loss as the loss function, the authors evaluated several Wave-UNet architectures, all similar to the UNet model; the highest scores reached across the proposed architectures were a PESQ score of 1.585 and a STOI score of 0.773.
The Stanford report by Kayser and Zhong [6] presents two approaches to audio denoising. In the first, the noisy spectrogram is fed to a convolutional neural network, which produces a clean output spectrogram; the clean spectrogram is then used to generate mel-frequency cepstral coefficients (MFCCs). In the second, the noisy spectrogram is fed to a multilayer perceptron that is in turn connected to a convolutional neural network, and this combined network learns to predict the MFCC features. The authors also conclude from their experiments that, across various architectures, the tanh activation function gives better results than rectified linear units when training on audio spectrograms.
In [7], the authors propose the fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning system for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform that is optimized for separating individual speakers. Speaker separation is achieved by applying a set of weighting functions (masks) to the encoder output.
This literature survey was carried out in order to design a deep neural network model and to improve its performance. Paper [2] shows that the presence of hidden layers can improve the performance of the model. Using LeakyReLU, as mentioned in [3], reduces the training time of the proposed model, and [6] shows that the tanh activation function can improve the performance of a denoising model. Paper [5] shows how the UNet architecture can be used to build an audio denoising model. Combining these findings, a deep neural network model that is a hybrid of the UNet model and dense layers is proposed and explained in the next section.
3 Methodology
In this section, a description of the dataset and a detailed explanation of the model architecture are provided.
3.1 Dataset
The datasets chosen for the project are LibriSpeech and ESC-50. LibriSpeech [8], created by Vassil Panayotov together with Daniel Povey, is a corpus of around 1000 h of 16 kHz read English speech. The data comes from read audiobooks of the LibriVox project, which have been carefully segmented and aligned. The ESC-50 dataset [9] is a labeled collection of 2000 environmental audio recordings that can be used to compare sound classification systems. The dataset is made up of 5-s recordings that are divided into 50 semantic classes (each with 40 examples) and loosely arranged into five major categories:
1. Animals
2. Natural soundscapes and water sounds
3. Human, non-speech sounds
4. Interior/domestic sounds
5. Exterior/urban noises.
The noisy input dataset was synthesized by randomly mixing noise clips from the ESC-50 dataset into speech from the LibriSpeech dataset.
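The mixing step can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' code: the function name `mix_speech_and_noise`, the synthetic stand-in signals, and the reading of the "20 to 80" noise level from the experimental setup as a percentage (0.2 to 0.8) are all assumptions.

```python
import numpy as np

def mix_speech_and_noise(speech, noise, rng, min_level=0.2, max_level=0.8):
    """Blend a random window of a noise clip into a speech clip.

    The 0.2-0.8 range is an assumption: it reads the "20 to 80" noise
    level quoted in the experimental setup as a percentage.
    """
    # Repeat the noise clip if it is shorter than the speech clip.
    if len(noise) < len(speech):
        repeats = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, repeats)
    # Pick a random noise window with the same length as the speech.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise_window = noise[start:start + len(speech)]
    # Scale the noise by a random level before adding it to the speech.
    level = rng.uniform(min_level, max_level)
    scaled_noise = level * noise_window
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(seed=0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
speech = 0.5 * np.sin(2 * np.pi * 220 * t)     # stand-in for a LibriSpeech clip
noise = rng.normal(0.0, 0.1, 5 * sample_rate)  # stand-in for a 5-s ESC-50 clip
noisy, noise_target = mix_speech_and_noise(speech, noise, rng)
```

Returning the scaled noise alongside the mixture reflects the idea that the model is trained to predict the noise component, which is later subtracted from the noisy input.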
3.2 Model Design
The proposed neural network was constructed with the UNet model as its base. The architecture was modified to work with spectrograms, and its last five layers are dense layers. The overall working of the entire system is shown as a block diagram in Fig. 2. The UNet architecture was chosen for this application because it is normally used for image segmentation problems, which resemble audio denoising: the network has to identify and segment out the clean audio from the incoming noisy audio file. The constructed model has two major portions. The first is known as the contracting portion, and the second is called the expansive portion. The expansive portion has five dense layers at the end of the architecture, as shown in Fig. 3, the architecture diagram of the proposed model. The output of the proposed model is the predicted noise, which is then subtracted from the noisy speech spectrogram to produce the denoised audio file.
Fig. 2 Block diagram of entire model
Fig. 3 Architecture model diagram
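As a rough illustration of the final subtraction step described in this section, the sketch below removes a predicted noise spectrogram from a noisy one. It assumes linear magnitude spectrograms; the helper name `denoise_spectrogram`, the clipping at zero, and the "oracle" stand-in for the trained network are hypothetical and not taken from the paper.

```python
import numpy as np

def denoise_spectrogram(noisy_mag, predict_noise):
    """Subtract a model's noise estimate from a noisy magnitude spectrogram."""
    predicted_noise = predict_noise(noisy_mag)
    # Clipping at zero is an added safeguard (an assumption, not from the
    # paper): a linear magnitude spectrogram cannot be negative.
    return np.maximum(noisy_mag - predicted_noise, 0.0)

# Toy 2 x 2 "spectrograms" used only to exercise the function.
clean = np.array([[1.0, 0.0], [0.5, 2.0]])
noise = np.array([[0.2, 0.3], [0.0, 0.1]])
noisy = clean + noise

# Hypothetical stand-in for the trained network: it returns the true noise.
oracle = lambda spectrogram: noise
denoised = denoise_spectrogram(noisy, oracle)
```

With a real network, `predict_noise` would be the trained model's prediction call, and the result would still need to be converted back to a waveform.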
4 Experimental Setup
Dataset Preprocessing. The audio files cannot be used directly for training, because the noisy speech data must first be synthesized by randomly combining the ESC-50 and LibriSpeech datasets. Each audio file is first loaded as a NumPy array, which is then converted into a spectrogram matrix. For this conversion, the following parameters were initially set:
1. sampling rate = 8000
2. frame length = 255
3. hop frame length = 63
4. minimum duration = 1.
Here, the sampling rate is the number of samples taken per second of the time series. The standard value of 8000 Hz was used initially; later in the experiments, it was increased to 16,000 Hz to check the performance of the model. To train a model to recognize the noise present in audio, the model should be trained on noisy speech audio and also on the noise audio itself, so that it learns the features of the noise. Audio was sampled at 8 kHz, and windows slightly longer than 1 s were extracted to form the datasets for training, validation, and testing. For the environmental noises, data augmentation was performed by taking windows at random positions in time to create different noise windows. The noises were blended into the clean voices with a randomized noise level between 20 and 80. A single noise audio file, into which the audio files from the ESC-50 dataset were randomly merged, was created as training data.
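The framing parameters listed above (frame length 255, hop length 63) can be illustrated with a short-time Fourier transform written directly in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the paper does not name the window function or the library used, so the Hann window, the absence of padding, and the function name `magnitude_spectrogram` are illustrative choices.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_length=255, hop_length=63):
    """Magnitude of a short-time Fourier transform (Hann window, no padding)."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop_length:i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # A real FFT of a 255-sample frame gives 255 // 2 + 1 = 128 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (bins, frames)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)               # 1 s test tone at 1 kHz
spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 255            # bin index back to Hz
```

For this one-second test tone the spectrogram has 128 frequency bins, and the strongest bin lies within one bin width (about 31 Hz) of 1 kHz. A frame length of 255 is convenient because it yields a power-of-two number of frequency rows.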
4.1 Evaluation Metrics
This subsection explains the evaluation metrics used to assess the performance of the proposed model. According to the literature survey, various authors [1–5] have used the same metrics for evaluating denoised audio.
PESQ Score. PESQ [10] stands for perceptual evaluation of speech quality and is defined by International Telecommunication Union recommendation P.862. The score ranges from −0.5 to 4.5, where a higher score indicates better audio quality. The PESQ algorithm is normally used in the telecommunications industry for objective voice quality testing by phone manufacturers and telecom operators. Computing the PESQ score requires both the clean speech audio file and the noisy speech audio file. The PESQ library was used for this purpose.
STOI Score. STOI [11] stands for short-time objective intelligibility, a metric used for predicting the intelligibility of noisy speech. It does not evaluate the speech quality of the audio (speech quality is normally evaluated in silence); instead, it returns a value between 0 and 1, where 1 is the highest score and indicates that the noisy speech can be understood easily. The pystoi library was used to obtain the STOI scores of the models and, as with PESQ, computing STOI requires both the clean speech audio and the noisy speech audio file.
5 Implementation and Results
The synthesized noisy speech and its paired clean speech audio are converted into spectrograms, which are then given to the proposed model as training data. The input dataset is split into 80% training data and 20% testing data. The Adam optimizer and the mean squared error loss function are used in the proposed model. Training stops once the validation loss starts to increase, which is necessary to avoid overfitting. Once the model is trained, a noisy speech audio file, in the form of its spectrogram matrix, is given as input to the model, and the predicted output values are obtained. These predicted values are then subtracted from the noisy speech spectrogram to obtain the clean speech spectrogram, which is then converted into an audio file.
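The stopping rule described above (stop when the validation loss starts to increase) can be sketched as a small framework-independent function. The function name, the `patience` parameter, and the loss values are hypothetical and used only for illustration; the paper does not state how the rule was implemented.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss never stopped improving, so all epochs are used.
    return len(val_losses)

# Made-up validation losses, for illustration only (not the paper's values).
stop_epoch = early_stopping_epoch([0.020, 0.012, 0.010, 0.011, 0.013])
```

Here the loss first rises at the fourth epoch, so training would stop there. Most deep learning frameworks provide an equivalent callback, and a larger `patience` tolerates short-lived increases.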
5.1 Implementation of UNet Model
The base UNet model [12] was implemented on the same dataset for comparison purposes. Figure 4 shows the training graph for the UNet model. The UNet model [12] was chosen because its contracting path and symmetric expanding path enable precise localization while requiring less training data. The UNet model performs particularly well on image segmentation problems. The audio files are converted into spectrograms [13], which are in turn converted into NumPy arrays when training the model. This is similar to the handling of image files, where images are converted into NumPy arrays before being fed to the model as training data. In Fig. 4, there is a slight increase in validation loss after the third epoch, which shows that the model has stopped learning. On increasing the number of epochs above 4, the model starts overfitting as the validation loss starts increasing. The UNet model was implemented using standard hyperparameters.
Fig. 4 Learning curve of the UNet model
Fig. 5 Frequency–time graph of the input noisy audio file
Fig. 6 Spectrogram of the input noisy audio file
For this implementation, a different, external noisy speech audio file was generated and used to obtain the evaluation metrics of the model. Figures 5 and 6 show the graphical representation of the input audio file.
Results. The following test results were obtained by testing the UNet model on the above noisy voice file. Figures 7 and 8 show the graphical representation of the output audio file. The output spectrogram in Fig. 8 shows fewer, cleaner spikes of red against a deeper blue background than the input spectrogram in Fig. 6. This visually shows the absence of noise in the output spectrogram.
Fig. 7 Frequency–time graph of the output audio file
Fig. 8 Spectrogram of the output audio file
The evaluation metrics obtained for the implementation of the UNet model are:
1. STOI score is 0.738.
2. PESQ score is 1.472.
5.2 Implementation of the Proposed Model
Initially, the proposed model was trained with default hyperparameter values. The number of epochs was set to 4 because, with more epochs, the proposed model starts overfitting and the validation loss starts increasing, indicating that the model has stopped learning. LeakyReLU was used as the activation function based on the results reported in [14] for the ESC-50 dataset [9]. The mean squared error loss function is used, together with the Adam [15] optimizer, since this is a prediction problem and not a classification problem. The hyperparameter values are:
1. number of epochs = 4
2. activation function = LeakyReLU
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 8000.
Results. The model was trained with the above hyperparameter values, and the following evaluation metrics were obtained:
1. STOI score is 0.727.
2. PESQ score is 1.681.
The obtained evaluation metrics do not differ drastically from those of the UNet model implemented for comparison. The dense layers in the proposed architecture did not change the evaluation metrics except for a slight increase in the PESQ score. To boost the performance of the proposed model, the hyperparameter values must be tuned.
Fig. 9 Frequency–time graph of the input audio file
Fig. 10 Spectrogram of the input audio file
Hyperparameter tuning. In their report "Denoising convolutional autoencoders for noisy speech recognition" [6], Kayser and Zhong report that the tanh activation function yields better results when a deep learning model is trained on audio data; hence, tanh was added as an activation function in the proposed architecture. The visual representation of the input noisy speech audio file is shown in Figs. 9 and 10. To obtain improved results, the sampling rate of the audio was increased from 8000 to 16,000 Hz during training, since the sampling rate defines the number of samples per second taken from a continuous signal to make it a discrete (digital) signal. Increasing the sampling rate therefore increases the number of samples available for training, which helps the model learn the features of the audio. The proposed model was trained for 3 epochs with the following hyperparameter values:
1. number of epochs = 3
2. activation function = LeakyReLU, tanh
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 16,000.
For the above hyperparameter values, Fig. 11 shows the learning curve.
Fig. 11 Learning curve of the proposed model
From the training graph in Fig. 11, it is observed that the validation loss increases slightly after the second epoch, while the training loss continues to decrease at a faster rate. This shows that the model has slowly started to overfit the training data. If the model is trained for more than 3 epochs, it overfits: the training loss keeps decreasing, but the validation loss saturates. The following results were obtained by testing the model on a random noisy voice file. Figures 12 and 13 show the graphical representation of the output audio file.
Fig. 12 Frequency–time graph of the output audio file
Fig. 13 Spectrogram of the output audio file
Table 1 Summary of results obtained for various implementations

| S. no. | Model | PESQ score | STOI score |
|---|---|---|---|
| 1 | UNet model | 1.472 | 0.738 |
| 2 | Proposed model | 1.681 | 0.727 |
| 3 | Proposed model (tuned) | 1.905 | 0.756 |
| 4 | WaveNet model [5] | 1.374 | 0.738 |
| 5 | A1 + W1 [5] | 1.585 | 0.767 |
Results. On visual comparison of the input and output spectrograms, the brightness of red is slightly reduced in areas where noise is present, but it is not completely removed. From this, it is inferred that the denoised audio file still contains noise, but at a lower magnitude than the noisy file had initially. The evaluation metrics obtained for this experiment are:
1. STOI score is 0.756.
2. PESQ score is 1.905.
From the obtained evaluation metrics, it is observed that increasing the sampling rate of the audio during training and changing the activation function to tanh have increased the performance of the proposed model. Comparing the spectrograms in Figs. 13 and 10, there is a drastic visual difference in the shade of red, showing that the magnitude of the noise has been reduced further. Table 1 summarizes all the results obtained and shows that improved results can be obtained by changing the hyperparameter values. The table also includes, for comparison, the existing WaveNet and A1 + W1 models from [5]. The tuned proposed model has the highest PESQ score in Table 1 and a higher STOI score than the UNet and WaveNet models, while its STOI score (0.756) is slightly below that of the A1 + W1 model (0.767).
6 Inference
From Table 1, we can infer that the proposed deep neural network model works better than the standard UNet architecture owing to the dense layers in the proposed model. Dense layers are normally used to identify unlabeled or untagged features, in contrast to a standard convolutional layer, which can accurately learn the marked or highlighted features. Moreover, using the tanh activation function increases the performance of the proposed model. The performance can be increased further by increasing the sampling rate of the audio file. A higher sampling rate means more samples are obtained for each second of audio; thus, more detailed features are available for the proposed model to learn, hence the increased performance.
7 Conclusion
In this project, a deep neural network model has been proposed and tested that enhances speech and removes multiple kinds of noise present in any given audio file. The proposed model shows significant improvement in terms of PESQ and STOI, with spectrograms of clean speech audio files and synthesized noisy speech audio files used as training data. The experimental results, a STOI value of 0.756 and a PESQ score of 1.905, show how dense layers with the tanh activation function and an increased sampling rate (from 8000 to 16,000 Hz) during training can significantly improve the results of the proposed model.
References
1. Zezario RE, Hussain T, Lu X, Wang H-M, Tsao Y (2020) Self-supervised denoising autoencoder with linear regression decoder for speech enhancement. In: ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6669–6673. https://doi.org/10.1109/ICASSP40776.2020.9053925
2. Saleem N, Khattak MI (2019) Deep neural networks for speech enhancement in complex-noisy environments. Int J Interact Multimed Artif Intell InPress, p 1. https://doi.org/10.9781/ijimai.2019.06.001
3. Alamdari N, Azarang A, Kehtarnavaz N (2020) Improving deep speech denoising by noisy2noisy signal mapping. Appl Acoust. https://doi.org/10.1016/j.apacoust.2020.107631
4. Vuong T, Xia Y, Stern RM (2021) A modulation-domain loss for neural-network-based real-time speech enhancement. In: ICASSP 2021 - 2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6643–6647. https://doi.org/10.1109/ICASSP39728.2021.9414965
5. Saddler M, Francl A, Feather J, Kaizhi A, Zhang Y, McDermott J (2020) Deep network perceptual losses for speech denoising
6. Kayser M, Zhong V (2015) Denoising convolutional autoencoders for noisy speech recognition. CS231 Stanford Reports, 2015, cs231n.stanford.edu
7. Luo Y, Mesgarani N (2019) Conv-TasNet: surpassing ideal time–frequency magnitude masking for speech separation. IEEE/ACM Trans Audio Speech Lang Process 27(8):1256–1266
8. Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5206–5210. https://doi.org/10.1109/ICASSP.2015.7178964
9. Piczak KJ (2015) ESC: dataset for environmental sound classification. Harvard Dataverse, V2. https://doi.org/10.7910/DVN/YDEPUT
10. Rix A (2003) Comparison between subjective listening quality and P.862 PESQ score
11. Taal CH, Hendriks RC, Heusdens R, Jensen J (2010) A short-time objective intelligibility measure for time-frequency weighted noisy speech. In: ICASSP, IEEE international conference on acoustics, speech and signal processing - proceedings, pp 4214–4217. https://doi.org/10.1109/ICASSP.2010.5495701
12. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. LNCS 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
13. French M, Handy R (2007) Spectrograms: turning signals into pictures. J Eng Technol 24:32–35
14. Zhang X, Zou Y, Shi W (2017) Dilated convolution neural network with LeakyReLU for environmental sound classification, pp 1–5. https://doi.org/10.1109/ICDSP.2017.8096153
15. Kherdekar S (2021) Speech recognition of mathematical words using deep learning. In: Recent trends in image processing and pattern recognition. Springer Singapore, pp 356–362
16. Pandey A, Wang DL (2019) A new framework for CNN-based speech enhancement in the time domain. IEEE/ACM Trans Audio Speech Lang Process 27(7):1179–1188
17. Zhao Y, Xu B, Giri R, Zhang T (2018) Perceptually guided speech enhancement using deep neural networks. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, Calgary, AB, pp 5074–5078
18. Martin-Donas JM, Gomez AM, Gonzalez JA, Peinado AM (2018) A deep learning loss function based on the perceptual evaluation of the speech quality. IEEE Signal Process Lett 25(11):1680–1684
19. Mohanapriya SP, Sumesh EP, Karthika R (2014) Environmental sound recognition using Gaussian mixture model and neural network classifier. In: International conference on green computing communication and electrical engineering (ICGCCEE)
20. Kathirvel P, Manikandan MS, Senthilkumar S, Soman KP (2011) Noise robust zero-crossing rate computation for audio signal classification. In: TISC 2011 - proceedings of the 3rd international conference on trendz in information sciences and computing, Chennai, pp 65–69
21. Manoj C, Magesh S, Sankaran AS, Manikandan MS (2011) Novel approach for detecting applause in continuous meeting speech. In: ICECT 2011 - 2011 3rd international conference on electronics computer technology, Kanyakumari, vol 3, pp 182–186
22. Bhaskar J, Sruthi K, Nedungadi P (2015) Hybrid approach for emotion classification of audio conversation based on text and speech mining. In: Proceedings of the international conference on information and communication technologies (ICICT), Procedia Computer Science
23. Raj JS (2020) Improved response time and energy management for mobile cloud computing using computational offloading. J ISMAC 2(1):38–49
24. Suma V, Wang H (2020) Optimal key handover management for enhancing security in mobile network. J Trends Comput Sci Smart Technol (TCSST) 2(4):181–187
Concept and Development of Triple Encryption Lock System
A. Fayaz Ahamed, R. Prathiksha, M. Keerthana, and D. Mohana Priya
Abstract The main aim of the triple encryption lock system is to strengthen security and to eliminate threats, and it allows higher authorities to authorise the concerned person to access restricted areas. Controlling access to highly restricted areas is of paramount importance everywhere. This system is suitable for server rooms, examination cells, home security, and other highly secured places. The door is designed with three encryptions: DTMF, password security, and fingerprint sensing. The circuit is designed to be in an OFF condition by default. The user sends a signal through the audio jack; the relay is then triggered, and the system moves to the other two encryptions, the keypad and fingerprint sensing. The key feature of our encryption system is that the microcontroller is turned on only when the signal is sent from the user, so that the heating issues of 24 h operation are resolved. The real benefit is that it significantly changes how highly restricted areas are accessed and can bring a great change to security systems.
Keywords DTMF (dual-tone multi-frequency) · Door lock · Triple encryption lock · Keypad · Microcontroller · Fingerprint sensor
A. F. Ahamed · R. Prathiksha (B) · M. Keerthana · D. M. Priya
Department of Electrical and Electronics Engineering, R.M.K Engineering College, Kavaraipettai 601206, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_4

1 Introduction
Security is the main intent of the project, and its most important application is to provide security in homes, examination cells, manufacturing units, etc. Shruti Jalapur and Afsha Maniyar published an article on "DOOR LOCK SYSTEM USING CRYPTOGRAPHIC ALGORITHMS BASED ON IOT" in which secure locking is achieved with AES-128 (Advanced Encryption Standard) and SHA-512 (Secure Hash Algorithm). Hardware such as an Arduino, servo motor, Wi-Fi module, and keypad is used to build the proposed locking system [1]. Neelam Majgaonkar et al. proposed a door lock system based on Bluetooth technology, but the communication range of a Bluetooth module is too low in comparison with Wi-Fi or GSM communication [2]. Harshada B. More et al. made a deep technical survey of face detection, face recognition, and door locking systems. The Eigenfaces algorithm is one of the algorithms mainly used for face recognition. The recognised face is compared with the stored face in order to lock and unlock the door [3]. Varasiddhi Jayasuryaa Govindraj et al. proposed a smart door using a biometric NFC band and OTP-based methods, in which Arduino, biometric, OTP, NFC, RFID, and GSM modules are used. The NFC band is used for registered members, and OTP technology is used for guest users [4].
We have analysed and studied the locks of various manufacturers such as Godrej and Yale. According to this study, the microcontrollers in all of these systems are switched on for 24 h, which results in heating issues and reduces the lifetime of the system. We therefore suggest an efficient improvement over current locking systems that offers high security without heating issues. As the name suggests, the triple encryption lock system has three encryptions: a DTMF module, password security, and a fingerprint sensor. As it is a secure and safe lock system, it includes an electronic control assembly which ensures that safety is only in the hands of the authorities. Two things are required in authenticated places: security, and easy access for unlocking the door that is available only to the specific person under the user's control. A dual-tone multi-frequency module, password security, and a fingerprint sensor are attached to the door to realise the proposed system. This paper gives a clear idea of the mechanism of each encryption, the flow chart of the system, and its working.
2 Objective
The main objective of the paper is to present a study of the three encryptions (dual-tone multi-frequency module, password security, and fingerprint sensor) from a general perspective. Authorisation is the process of verifying the credentials of a person and granting permission to access. Our system is able to grant authorisation with a higher degree of assurance. It provides security not only for homes but also for other authenticated places. The report describes the techniques and working of each encryption.
3 Methodology
The report is structured by identifying the importance of, and demand for, security in door locking and unlocking. The system involves electrical work to realise our idea, and its design methodology consists of several steps. A single operator can use this system in minutes. First, the user's security problem is studied in order to plan the desired system. Problems in existing systems are analysed; the essential part is then the electrical design, from which the workflow and functional block diagram are obtained. To integrate the system into any existing design structure, the microcontroller and motor are selected accordingly. The code is then tested. The final prototype is developed to provide effective access to highly authenticated places, enabling the user to enter highly authenticated areas (Figs. 1 and 2).
Fig. 1 Workflow of designing a triple encryption lock system
4 Major Components Required
See Table 1.
Fig. 2 Functional block diagram
Table 1 Components used in the system

| S. No. | Name | Qty |
|---|---|---|
| 1 | Arduino Uno | 1 |
| 2 | MT 8870 DTMF decoder | 1 |
| 3 | AS608 fingerprint sensor | 1 |
| 4 | 4 × 4 matrix keypad | 1 |
| 5 | Servomotor | 1 |
| 6 | Relay | 1 |
| 7 | Lock | 1 |
| 8 | 12 V battery | 1 |
5 Encryptions of the System
5.1 First Encryption
This project uses DTMF technology for opening and closing doors. The positive terminal of the LED is connected to the output pin of the decoder, and the negative terminal of the LED is connected to the ground of the decoder. The mobile phone is connected to the DTMF decoder by an auxiliary cable. Every numeric button on the keypad of the mobile phone generates a unique pair of frequencies when pressed. The user presses a key, and the signal is sent through the audio jack of the mobile phone. The DTMF decoder decodes the audio signal, and the value corresponding to the received frequencies selects the function to perform. The positive and negative terminals of the 9 V battery are connected to Vcc and ground, respectively. When the authorised person, far away from the locked spot, presses number "1", the mobile phone connected to the lock receives the tone and the microcontroller is turned on (Fig. 3).
Fig. 3 Signal sent from user to audio jack frequency
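To make the DTMF idea concrete, the sketch below generates the two-tone signal for a key and decodes it in software with the Goertzel algorithm. The row and column frequencies are the standard DTMF values. Everything else is an illustrative assumption: the system described here uses an MT 8870 hardware decoder, not this code, and the function names, 8 kHz sampling rate, and 0.1 s tone duration are hypothetical.

```python
import numpy as np

ROW_HZ = (697, 770, 852, 941)      # standard DTMF row frequencies
COL_HZ = (1209, 1336, 1477, 1633)  # standard DTMF column frequencies
KEYS = ("123A", "456B", "789C", "*0#D")

def dtmf_tone(key, sample_rate=8000, duration=0.1):
    """Synthesize the sum of the row and column tones for one key."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            c = row.index(key)
            t = np.arange(int(sample_rate * duration)) / sample_rate
            return (np.sin(2 * np.pi * ROW_HZ[r] * t)
                    + np.sin(2 * np.pi * COL_HZ[c] * t))
    raise ValueError(f"not a DTMF key: {key!r}")

def tone_power(signal, freq, sample_rate):
    """Goertzel algorithm: signal power at a single frequency."""
    coeff = 2.0 * np.cos(2.0 * np.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in signal:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def decode_dtmf(signal, sample_rate=8000):
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda i: tone_power(signal, ROW_HZ[i], sample_rate))
    col = max(range(4), key=lambda i: tone_power(signal, COL_HZ[i], sample_rate))
    return KEYS[row][col]

decoded = decode_dtmf(dtmf_tone("1"))  # key "1" is 697 Hz + 1209 Hz
```

The Goertzel algorithm is used because only eight frequencies need to be checked, which is cheaper than a full FFT. A practical decoder would also check minimum tone duration and signal level before accepting a digit.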
5.2 Second Encryption
The keypad consists of a set of push buttons. The encryption is implemented by comparing the entered PIN with the preprogrammed PIN. The keypad lock works with a 3-digit code, which is 888. Once the correct combination is entered on the keypad, the door gets unlocked. The door remains closed if a wrong PIN is entered (Fig. 4).
Fig. 4 Keypad encryption
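The PIN comparison can be sketched as follows. This is a hypothetical illustration in Python rather than the Arduino firmware, which is not shown in the paper; the function names are invented, and the constant-time comparison is an added precaution that the paper does not describe.

```python
import hmac

STORED_PIN = "888"  # the 3-digit code quoted in the text

def read_pin(key_presses, length=3):
    """Collect the first `length` digits from a sequence of key presses."""
    digits = [key for key in key_presses if key.isdigit()]
    return "".join(digits[:length])

def pin_matches(entered, stored=STORED_PIN):
    """Compare the keyed-in PIN with the preprogrammed one."""
    # hmac.compare_digest takes the same time wherever the inputs differ;
    # using it here is an assumption, not part of the described system.
    return hmac.compare_digest(entered.encode(), stored.encode())

door_may_proceed = pin_matches(read_pin(["8", "8", "8"]))
```

A hard-coded PIN is shown only because the text quotes one; a deployed lock would normally store the PIN in non-volatile memory and limit the number of failed attempts.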
5.3 Third Encryption
The Vcc pin of the fingerprint sensor is connected to the 5 V pin of the Arduino, and the ground of the fingerprint sensor is connected to the ground of the Arduino. Similarly, the Rx pin of the fingerprint sensor is connected to pin 2 of the Arduino, and the Tx pin of the fingerprint sensor is connected to pin 3. The door gets unlocked when the user scans a fingerprint that matches one recorded in the system (Fig. 5).
6 Proposed System and Workflow
This project is proposed to provide access to the system when the authorised person is far away from the locked spot. In examination cells and homes, it is sometimes necessary to provide access even if the authorised person is not present. The proposed solution is a triple encryption lock system. The system is designed so that the door has three encryptions: DTMF (dual-tone multi-frequency), password security and fingerprint sensing. The circuit is initially in the OFF condition. The user sends a signal through the audio jack of the mobile to turn on the microcontroller. The relay gives true or false information, and the microcontroller, keypad and fingerprint sensor get turned on. If the password security result is true, the system moves on to fingerprint sensing. If both inputs are true, the motor runs forward and the door lock opens. The relay provides the supply required to turn ON the microcontroller once the DTMF signal is decoded. Once the task is completed, Relay 2 is triggered to lock the door. The aim of our encryption system is that the microcontroller is turned on only when the user sends the signal, so that the heating issues of running for 24 h are resolved. This can bring a great change in security systems. In certain cases, it is difficult for higher authorities to authorise the concerned person to access restricted areas. The project is designed so that access is given at that instant, and the triple encryption lock system can be made affordable. The proposed system offers high security and convenient access with three encryptions (Fig. 6).
Fig. 5 Fingerprint encryption
Fig. 6 Flowchart of the system
The user has to pass the three verification points described below.
6.1 Step 1: DTMF Triggering
The authorised person, who is far away from the locked spot, has to call the mobile that is connected to the locking system and enter number “1” on his/her dial pad. The DTMF decoder then decodes the audio frequency and powers the microcontroller, keypad and fingerprint sensor. Power to the microcontroller and the other equipment is turned ON only if the relay is closed.
Fig. 7 Triple encrypted lock system
6.2 Step 2: Pin Code Verification
The user has to enter a number on the keypad, and the entered PIN is verified against the preprogrammed PIN. If it is verified as true, the process moves on to the fingerprint encryption. If either input is false, the process is stopped.
6.3 Step 3: Fingerprint Verification
The third encryption is the fingerprint sensor. The user has to place a finger on the fingerprint sensor, and the captured fingerprint is verified against the recorded fingerprint. If the result is true, the motor runs forward and the door gets unlocked (Fig. 7).
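The three verification steps above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name `TripleLock`, its method names, the enrolled template ID and the choice to drop power after a failed check are hypothetical, while the trigger digit "1" and the PIN 888 come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class TripleLock:
    pin: str = "888"  # 3-digit code quoted in Sect. 5.2
    enrolled_ids: set = field(default_factory=lambda: {1})  # hypothetical template IDs
    powered: bool = False
    pin_ok: bool = False
    unlocked: bool = False

    def on_dtmf(self, digit):
        """Step 1: digit '1' closes the relay and powers the controller."""
        if digit == "1":
            self.powered = True
        return self.powered

    def on_pin(self, entered):
        """Step 2: compare the keypad entry with the preprogrammed PIN."""
        if not self.powered:
            return False
        self.pin_ok = entered == self.pin
        if not self.pin_ok:
            self._stop()
        return self.pin_ok

    def on_fingerprint(self, matched_id):
        """Step 3: unlock only if the PIN passed and the print is enrolled."""
        if not (self.powered and self.pin_ok):
            return False
        if matched_id in self.enrolled_ids:
            self.unlocked = True  # corresponds to the motor running forward
        else:
            self._stop()
        return self.unlocked

    def _stop(self):
        """Abort the sequence (assumed behaviour: drop power, clear progress)."""
        self.powered = False
        self.pin_ok = False
```

A failed PIN or fingerprint leaves `unlocked` as `False`, and the sequence has to be restarted with a new DTMF trigger.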
7 Conclusion
With the developments enumerated, we have developed expertise in the design, development, performance and modelling of lock systems. This will be pivotal in ensuring access to the concerned areas. The designed system not only provides easy access to the user but also resolves the heating issues of 24 h operation. Thus, access can be given to the concerned person in highly restricted areas. A comparable lock on the market costs close to ₹9000–₹14,000, whereas our triple encrypted lock system is modest in cost, offers high security and is suited for examination cells, server rooms, home security and other restricted areas. Therefore, when a solution is needed for easy access to sensitive areas with high security, our triple encrypted lock system will be the solution.
Partially Supervised Image Captioning Model for Urban Road Views K. Srihari and O. K. Sikha
Abstract Automatically generating a natural language description of an image has attracted interest because of its significance in practical applications and because it connects two major fields of artificial intelligence: natural language processing and computer vision. This paper proposes a partially supervised model for generating image descriptions based on instance segmentation labels. Instance segmentation, a combined approach of object detection and semantic segmentation, is used for generating instance level labels, which are then used for generating natural language descriptions of the image. The instance segmentation model uses the MRCNN framework with feature pyramid networks and region proposal networks for object detection, and fully convolutional layers for semantic segmentation. Information obtained from different local region proposals is used to generate region wise captions. Important aspects of the caption include distance, color and region calculations based on the results obtained from the instance segmentation layers. This paper uses instance segmentation layer information such as ROIs, class labels, probability scores and segmentation values for generating effective captions for the image. The proposed model is evaluated on the Cityscapes dataset, where the primary objective is to provide semantic scene understanding based on the instances available in urban areas.
Keywords Instance segmentation · Partially supervised model · MRCNN (Mask region based convolutional neural networks) framework
K. Srihari · O. K. Sikha (B) Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_5
1 Introduction
Generating meaningful natural language descriptions of an image has been used in many applications over the past few years. Image captioning applications use both computer vision and natural language processing. Most of the state-of-the-art image captioning algorithms [1] apply separate preprocessing techniques to the image and to the output text, and use different types of sequence modeling to map the input image to the output text. Supervised image captioning has had a strong impact but has a few important limitations. Applications of supervised image captioning such as Supercaptioning [2] use two-dimensional word embedding while mapping the image to its corresponding natural language description. Partially supervised image captioning [3] applies its approach to existing neural captioning models using the COCO dataset. This approach uses weakly annotated data that is available in object detection datasets. The primary objective of this paper is to derive a set of natural language captions from the partially supervised data retrieved from an instance segmentation network trained on the Cityscapes dataset. Object detection and semantic segmentation results are used to create the partially supervised data. The major contributions of this work include:
• Training an end-to-end MRCNN with U-NET model for instance segmentation to obtain semantic information.
• Object classification and localization based on images of urban street views.
• Developing an inference level captioning module, without sequence modeling, that generates meaningful captions from the information produced by the instance segmentation layers.
The rest of the paper is organized as follows. Related works are discussed in Sect. 2; Sect. 3 briefly describes the dataset attributes, image details and the proposed model; the captioning results obtained from the proposed model are detailed in Sect. 4. Finally, the paper concludes with Sect. 5.
2 Related Works
Object detection and semantic segmentation have become key research problems in computer vision, since most high-end vision-based tasks such as indoor navigation [4], autonomous driving [5], facial part object detection [6] and human computer interaction require accurate and efficient segmentation. With the advent of deep learning models over the past few decades, the semantic segmentation problem has also witnessed great progress, especially using deep convolutional neural networks. Mask R-CNN (MRCNN) [7] is an important breakthrough in the instance segmentation domain. Mask R-CNN uses Feature Pyramid Networks [8] and Region Proposal Networks as in Faster R-CNN [9] for object detection, and uses fully convolutional layers for semantic segmentation. Ronneberger et al. developed the U-NET [10] model, an influential semantic segmentation algorithm used mainly in medical AI applications. De Brabandere et al. [11] proposed a semantic instance segmentation model based on a discriminative loss function for autonomous driving applications. Recurrent neural networks for semantic instance segmentation, proposed by Salvador et al. [12], use CNN, RNN and LSTM for semantic segmentation, object detection and classification, respectively. Developments in the instance segmentation domain have also put forward new research directions in image captioning [13]. Image captioning with object detection and localization [14] by Yang et al. uses a sequence-to-sequence encoder-decoder model with an attention mechanism to generate image captions. DenseCap, fully convolutional localization networks for dense captioning by Johnson et al. [15], used localization values as ground truth while generating captions. Pedersoli et al. [16] proposed areas of attention for image captioning, where object localization and proposals are used for generating captions. The major drawback of sequence modeling based approaches is that training end-to-end sequence models is computationally expensive. Most state-of-the-art image captioning models fail to generate captions with semantic information about the image under consideration, and they tend to generate identical captions for similar images. Anderson et al. proposed a partially supervised image captioning model [3], where labeled objects and detection results are used to generate finer sentences as captions. A semi-supervised framework for image captioning proposed by Chen et al. [17] detects visual concepts from the image and uses a reviewer-decoder with an attention mechanism for caption generation. Liu et al. [18] proposed image captioning by self-retrieval with partially labeled data, implemented using a reinforcement learning algorithm. Kamel et al. [19] proposed tenancy status identification of parking slots based on a MobileNet classifier. Image processing techniques are of great significance and are used in most computer vision applications. 3D image processing using machine learning [20], developed by Sungheetha and Sharma, is based on input processing for man-machine interaction.
3 Proposed Image Captioning System
This section describes the proposed partially supervised image captioning model in detail. An improved Mask-RCNN model [21] with UNET architecture, proposed in our previous work, is used for generating instance segmentation labels. The bounding box and pixel-wise semantic information obtained from the hybrid MRCNN-UNET model is used as the initial input to the image captioning model. The annotations, masked outputs, localization results and object level labels obtained from the instance segmentation model are used for generating meaningful captions. Figure 1 shows the MRCNN-UNET [21] hybrid model architecture for instance segmentation, and Fig. 2 shows the proposed image captioning model. The instance segmentation labels obtained from the MRCNN-UNET hybrid model are shown in Fig. 4, with pixel level annotations and the corresponding confidence scores.
Fig. 1 Architecture of MRCNN-UNET hybrid instance segmentation model
Fig. 2 Architecture of the proposed image captioning model
A region proposal network (RPN) is used to generate proposals for object detection in Faster R-CNN. The RPN does so by learning from feature maps obtained from a base network (VGG16, ResNet, etc.), and it tells the R-CNN where to look. The input given to the RPN is the convolutional feature map obtained from a backbone network. The primary function of the RPN is to generate anchor boxes based on scale and aspect ratio. Five scales and three aspect ratios are initialized, creating 15 anchor boxes at each position in the feature map. The next task of the RPN is to classify each box as foreground or background, based on the IOU of each anchor box with the ground truth. The values used at this level are rpn_cls_score and rpn_bbox_pred. In anchor target generation, the IOU of the ground truth (GT) boxes with the anchor boxes is calculated to decide whether each anchor is foreground or background, and the differences in coordinates are then calculated as the targets to be learned by the regressor. These targets are used as inputs to the cross entropy loss and the smooth L1 loss. The final proposals are propagated forward through the ROI pooling layer and fully connected layers.
The feature pyramid network (FPN) is a feature extractor that generates multiple feature map layers (multi-scale feature maps) with better quality information than the regular feature pyramid for object detection. As more high-level structures are detected, the semantic value of each layer increases. FPN provides a top-down pathway to construct higher resolution layers from a semantically rich layer. FPN extracts feature maps and feeds them into a detector, such as the RPN, for object detection. The RPN applies a sliding window over the feature maps to predict the objectness (whether an object is present or not) and the object bounding box at each location.
U-NET is a fully convolutional neural network used mainly for training end-to-end image processing algorithms in which the input images can be of any domain and the corresponding output images are masks of the primary objects present in the input image. The input and output images have the same size. The U-NET model can be viewed as a convolutional autoencoder that maps input images to masked output images. One important modification in U-Net is the large number of feature channels in the upsampling part, which allows the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, and yields a u-shaped architecture. The network uses only the valid part of each convolution and has no fully connected layers. The output images are binary images in which only the primary objects appear as masks. The U-NET model consists of 10 layers, with the first 5 layers in the contracting phase and the last 5 layers in the expansive phase. The loss function is binary cross entropy and the optimizer is Adam. The metrics used for validating the FCN model are the cross entropy loss and the mean IOU with 2 classes, one for the background and one for foreground objects. There were 1,940,817 trainable parameters. Figure 3 shows a sample image and its corresponding output mask.
Fig. 3 a Sample input image to UNET. b Output mask
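The anchor generation and foreground/background labelling described for the RPN earlier in this section can be illustrated with a short sketch. The scale values, aspect ratios and IOU thresholds below are assumptions chosen for illustration (the text only states that 5 scales and 3 ratios are used), and the function names are hypothetical.

```python
import itertools


def make_anchors(cx, cy, scales=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (x1, y1, x2, y2) box per scale/ratio pair, centred on (cx, cy)."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w = s * (r ** 0.5)  # r is width / height; the area stays s * s
        h = s / (r ** 0.5)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """Label each anchor: 1 = foreground, 0 = background, -1 = ignored."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best >= fg_thresh:
            labels.append(1)
        elif best < bg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels
```

With 5 scales and 3 ratios, `make_anchors` yields the 15 boxes per position mentioned in the text; anchors whose best IOU falls between the two thresholds are left out of the classification loss in this sketch.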
Fig. 4 Sample output from MRCNN-UNET instance segmentation model
Table 1 Quantitative evaluation metrics analysis of MRCNN-UNET instance segmentation model

Metric name                      Value
Box-classifier loss              0.3874
Box-localization loss            0.3232
Box-mask loss                    2.5810
RPN localization loss            1.7640
RPN classification loss          1.8240
Total loss                       7.0620
Mean Average Precision (mAP)     0.0341
mAP at 0.5 IOU                   0.0401
mAP at 0.75 IOU                  0.0001
mAP (small)                      0.0029
mAP (medium)                     0.0017
mAP (large)                      0.0094
Average Recall (AR)              0.0017
AR (small)                       0.0012
AR (medium)                      0.0362
AR (large)                       0.0197
Table 1 summarizes various evaluation metrics of the MRCNN-UNET instance segmentation model [21]. The box classification loss measures the correctness of object classification during the object detection phase; its value was 0.3874 after 50,000 epochs. The box localization loss shows how tightly the bounding box values are predicted. It is a 4 K variable compared with the ground truth values, and its value was 0.3232 after 50,000 epochs. The RPN loss is the metric obtained while the region proposal network tries to match the proposals to the ground truth values; its value (the RPN classification loss in Table 1) was 1.824 at the end of training. The box-classifier mask loss is the metric used to compare the predicted mask with the ground truth masked object. The instance segmentation mask obtained from the MRCNN-UNET hybrid model is fed into the image caption module as shown in Fig. 2. The information available at the end of the instance segmentation layers includes the regions of interest for the detected objects, their probability scores, their class labels and their corresponding binary labeled pixel wise segmented masks, which are shown in Fig. 5. The captioning module does not have ground truth captions for training. The information obtained from the instance level labels is fit into meaningful natural language descriptions including semantic information such as object color, location and distance between the objects. The skeleton structure of the output captions is fixed for all images, but the distance, color and region values of the objects differ in every image based on the 3 captioning modules. The captioning modules are: estimating size and distance using a reference object, color detection using k-means clustering and HSI color calculations, and image-region wise captioning. This combination of instance segmentation labels and inference captioning modules is a novel approach to image captioning. It does not include any sequence modeling for generating captions, which makes the inference part computationally simple and effective. The first level of caption lists the important objects present in the image based on class label information. The second level of caption is based on the estimated distance between vehicles, or between a vehicle and a traffic signal, with respect to a real world reference object. The colors of the contour detected objects are found by the color detection captioning module. Finally, the object’s location in the image is found using the image-region wise captioning module.
Fig. 5 Information available at the instance segmentation layer
Instance segmentation is used because it offers the best scene understanding for real-time applications. The issues with existing traditional image captioning systems are a limited ability to generate captions and a tendency to generate identical sentences for similar images. Sequence modeling is computationally expensive and is not used in the proposed model. Localization and segmentation results, which are lacking in most state-of-the-art image captioning approaches, are used in the proposed approach. The proposed model therefore overcomes these issues of the usual image captioning algorithms and provides good results by using the inference level captioning modules.
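The fixed caption skeleton described above can be sketched as a set of sentence templates filled in from the instance labels. This is a minimal illustration, assuming a flat list of class labels as input; the function names and the naive pluralisation rule are hypothetical, and the sentence wording follows the sample captions in Sect. 4.

```python
from collections import Counter


def list_objects(labels):
    """First-level caption: count the detected classes and list them."""
    counts = Counter(labels)  # keeps first-seen order of the class labels
    parts = []
    for name, n in counts.items():
        plural = name + ("es" if name.endswith("s") else "s")
        parts.append(f"{n} {name if n == 1 else plural}")
    if not parts:
        return "No objects were detected in this image."
    joined = parts[0] if len(parts) == 1 else ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"The objects present in this image are: {joined}."


def distance_sentence(obj_a, obj_b, metres):
    """Second-level caption: report an estimated distance in metres."""
    return f"The distance between {obj_a} and {obj_b} is {metres:g} m."


def region_sentence(obj, region):
    """Location caption: report the image region the object falls in."""
    return f"The {obj} is present in the {region} part of the image."
```

Because the templates are fixed, only the counts, distances, colors and regions change from image to image, which is what keeps the inference step free of sequence modeling.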
3.1 Dataset Used
The Cityscapes dataset is used for evaluating the proposed model. The dataset contains images from 50 cities captured over several months (spring, summer and fall), and the images are augmented with fog and rain, making it diverse. It has manually selected frames that include a large number of dynamic objects, varying scene layouts and varying backgrounds. It covers various vehicles such as car, truck, bus, motorcycle, bicycle and caravan, and also distinguishes between a person walking on a sidewalk and a rider on the road. Table 2 shows the set of classes present in the dataset and Fig. 6 shows a few sample images. The Cityscapes dataset contains 25,000 images across 30 classes of objects covering 50 different cities. From each class, around 500–600 instances across different images are taken into consideration for the training process. The Tensorflow object detection and instance segmentation Mask-RCNN approach is used. The training and validation records used for modeling are created for the 30 classes from the annotations, which are parsed into a single JSON file containing all image and object details as the ground truth. Horizontal and vertical flips are the main image augmentation and preprocessing techniques used. Contrast, saturation, brightness and hue adjustments are also used in the augmentation process. The first-stage maximum number of proposals is set to around 500 during training, and dropout, weight decay and L2 regularization techniques are also used. Pipeline configuration files are used to fine tune the hyperparameters of the MRCNN model. The training is connected to TensorBoard, where real-time graphs of all metrics can be seen and analyzed. After training and validation are completed, the saved hybrid MRCNN-UNET model is generated using the Tensorflow inference graph session-based mechanisms.

Table 2 Dataset instances

Group          Objects
Flat           Road, sidewalk, parking, rail track
Human          Person, rider
Vehicle        Car, truck, bus, on rails, motorcycle, bicycle, caravan
Construction   Building, wall, fence, guard rail, bridge, tunnel
Object         Pole, traffic sign, traffic light
Nature         Vegetation, terrain
Others         Ground, sky

Fig. 6 Cityscape dataset sample images
3.2 Estimation of Size and Distance Using a Reference Object
The captions generated by the proposed model contain information about the distance between vehicle instances, or between a vehicle and a traffic signal. A reference object based algorithm is used for calculating the distance, which requires a reference object whose original size is known. The pixel wise vehicle mask obtained in the segmentation result is mapped to the original size to get a relative pixel wise size. The reference ratio is then calculated by dividing the original size by the pixel wise size. Whether the corresponding length or the corresponding width is taken, the reference ratio is approximately the same. Our objective is to find the original distance of any object, or the original distance between any 2 vehicles. The pixel wise distance between two vehicles is calculated using the Euclidean formula and then multiplied by the reference ratio to obtain the actual distance, as shown in Eqs. 1 and 2. Table 3 shows sample vehicle masks obtained from the instance segmentation module and the corresponding distances calculated by the reference object algorithm.

$$\text{Reference\_Ratio} = \frac{\text{Original height of car}}{\text{Calculated height of car}} = \frac{\text{Original\_Distance\_Between\_Cars}}{\text{Calculated\_Difference\_Between\_Cars}} \tag{1}$$

$$\text{Original\_Distance\_Between\_Cars} = \text{Reference\_Ratio} \times \text{Calculated\_Difference\_Between\_Cars} \tag{2}$$

Table 3 Sample results of distance calculation

Pixel wise vehicle mask obtained from instance segmentation module    Calculated distance
(mask image not reproduced)                                           11 m
(mask image not reproduced)                                           2.45 m

The basic information needed for calculating the distance between 2 cars is the region of interest (ROI) box detections of the 2 cars. Consider that the reference object in the image is another car whose original height and width are known. The reference object is also detected by the model, so its ROI box coordinates are known. We therefore have both the original height of the reference object and its height in pixels computed from the box coordinates. The reference ratio is calculated by dividing the original height by the pixel height of the reference object. This reference ratio is common to all objects present in the image, but it changes from image to image, since each image has different orientation and zoom attributes. The objective is to find the original distance between the cars. Since the ROIs of the 2 cars are known, the center coordinates of both cars are also calculated. The pixel wise distance between the 2 center coordinates is calculated using the Euclidean distance. When this distance is multiplied by the reference ratio, the original distance between the cars is obtained. Using this algorithm, the distance between any 2 objects can be calculated, provided the object classes are present in the training data and the model is well trained.
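The reference object calculation described in this section can be sketched in a few lines. The box format `(x1, y1, x2, y2)` in pixels, the function names and the numbers in the usage note are assumptions made for illustration only.

```python
import math


def box_center(box):
    """Centre point of an (x1, y1, x2, y2) ROI box in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def reference_ratio(real_height_m, ref_box):
    """Metres per pixel, from a reference object of known real height."""
    pixel_height = ref_box[3] - ref_box[1]
    if pixel_height <= 0:
        raise ValueError("reference box must have a positive height")
    return real_height_m / pixel_height


def real_distance(box_a, box_b, ratio):
    """Euclidean pixel distance between box centres, scaled to metres."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return ratio * math.hypot(ax - bx, ay - by)
```

For example, a reference car assumed to be 1.5 m tall that spans 100 pixels gives a ratio of 0.015 m per pixel, so two box centres 300 pixels apart correspond to about 4.5 m. As the text notes, the ratio has to be recomputed for every image.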
Fig. 7 Image-region wise captioning
Table 4 Regions and their corresponding center coordinates

Region          Center coordinates
Top-left        (256, 512)
Top-right       (256, 1536)
Center          (512, 1024)
Bottom-left     (768, 512)
Bottom-right    (768, 1536)
3.3 Image-Region Wise Captioning
The bounding box coordinates of the objects present in the image are further processed to calculate their location. The entire image, of size 1024 × 2048, is divided into 5 regions, namely top-left, top-right, bottom-left, bottom-right and center, as in Fig. 7. The center pixel coordinates of each region were calculated as tabulated in Table 4. Euclidean distances are calculated between the center coordinates of the object and the center coordinates of every region, and the object is assigned to the region with the nearest center. In Fig. 7, the object of interest, i.e., the car, is located in the center part of the image.
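The nearest-region rule can be sketched as follows, using the center coordinates from Table 4. The sketch assumes that those coordinates are given as (row, column) for a 1024 × 2048 image and that boxes are `(row_min, col_min, row_max, col_max)`; the function name `locate` is hypothetical.

```python
import math

# Region centres from Table 4, read here as (row, column) pixel coordinates.
REGION_CENTERS = {
    "top-left": (256, 512),
    "top-right": (256, 1536),
    "center": (512, 1024),
    "bottom-left": (768, 512),
    "bottom-right": (768, 1536),
}


def locate(box):
    """Return the name of the region whose centre is nearest to the box centre."""
    row = (box[0] + box[2]) / 2
    col = (box[1] + box[3]) / 2
    return min(
        REGION_CENTERS,
        key=lambda name: math.dist((row, col), REGION_CENTERS[name]),
    )
```

A box centred near the middle of the frame maps to "center", while boxes near a corner map to the corresponding quadrant.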
4 Results and Discussion
The captions produced by our image captioning algorithm give a clear understanding of the instances available in the image. Semantic details such as the distance between vehicles, the distance between a traffic signal and the vehicles, the region in which the instances are present and the color of the instances are captured in the output captions. A few example images and their sample output captions are given below.
Sample 1:
Output Caption: The objects present in this image are: a building, 2 cars on the road and 2 cars in the parking, 2 traffic sign boards, 3 poles. A tree is present in the top right part of the image. The 2 cars are present in the top left part of the image. The distance between black car and white car is 11 m.
Sample 2:
Output Caption: The objects present in this image are: 3 cars, 3 bicycles, 1 motorcycle, 1 traffic light, 2 persons and 1 tree. The traffic light shows green. The distance between white car and traffic light is 6 m. The 3 cars are present in the top right part of the image. The distance between black car and traffic light is 2 m.
Figures 8 and 9 describe the module wise output captions in detail. The initial level of captions lists the objects present in the image. The distance between truck and car is calculated in module 1 using the reference object distance calculation. The color of the truck and car is obtained in the second module based on k-means and CIE L*a*b color space values. Object locations are found in the image-region wise captioning part, which is the third module.
Fig. 8 Module wise output captions, Sample 1. a Input image. b Generated instance segmentation mask
Fig. 9 Module wise output captions, Sample 2. a Input image. b Generated instance segmentation mask
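The color module can be illustrated with a small k-means sketch over the pixels of one object mask. This is a simplified stand-in, not the authors' implementation: it clusters in plain RGB rather than in the HSI or CIE L*a*b spaces mentioned in the text, uses a deterministic initialisation, and the palette `NAMED_COLORS` and all function names are hypothetical.

```python
# Hypothetical palette used to turn a cluster centre into a color word.
NAMED_COLORS = {
    "black": (0, 0, 0), "white": (255, 255, 255), "gray": (128, 128, 128),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k=3, iters=10):
    """Lloyd's k-means on RGB tuples; returns (centroids, last cluster sizes)."""
    # Deterministic start: k pixels spread evenly through the list.
    step = max(1, len(pixels) // k)
    centroids = [tuple(float(v) for v in pixels[min(i * step, len(pixels) - 1)])
                 for i in range(k)]
    sizes = [0] * k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Empty clusters keep their previous centre.
        centroids = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        sizes = [len(g) for g in groups]
    return centroids, sizes


def dominant_color(pixels, k=3):
    """Name of the palette color closest to the largest cluster's centre."""
    centroids, sizes = kmeans(pixels, k)
    biggest = centroids[sizes.index(max(sizes))]
    return min(NAMED_COLORS, key=lambda name: sq_dist(biggest, NAMED_COLORS[name]))
```

In use, `pixels` would hold only the RGB values inside one instance mask, so that the background does not influence the color word placed in the caption.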
5 Conclusion
This paper proposes an image captioning model for the Cityscapes dataset that uses instance segmentation labels as its input. Mask-RCNN-UNET is used as the instance segmentation algorithm, whose bounding box predictions and pixel segmentation values are available as partially supervised output data. The proposed image captioning system generates semantic descriptions, including the distance between vehicles obtained by the reference object distance method and the colors of objects present in the image obtained using k-means clustering and LAB color space values. The generated captions are meaningful and can be applied to many real world applications. Captions generated from urban city images carry detailed information relevant to traffic control and pedestrian safety, which can be useful for autonomous driving. Based on the output of the model, additional indications or alerts can be raised for pedestrians crossing the road and for vehicles that disobey traffic rules. In real world applications, the captions can be delivered through an audio aid for blind users. The model can be used for automated captions on YouTube for videos containing urban street views. While driving, unusual behavior on the road can be reported as an instruction in order to avoid accidents. In case of road accidents, an exact set of reports can be collected instantly.
References
1. Sanjay SP et al (2015) AMRITA-CEN@FIRE2015: automated story illustration using word embedding. In: FIRE workshops
2. Sun B et al (2019) Supercaptioning: image captioning using two-dimensional word embedding. arXiv preprint arXiv:1905.10515
3. Anderson P, Gould S, Johnson M (2018) Partially-supervised image captioning. arXiv preprint arXiv:1806.06004
4. Sanjay Kumar KKR, Subramani G, Thangavel S, Parameswaran L (2021) A mobile-based framework for detecting objects using SSD-MobileNet in indoor environment. In: Peter J, Fernandes S, Alavi A (eds) Intelligence in big data technologies: beyond the hype. Advances in intelligent systems and computing, vol 1167. Springer, Singapore. https://doi.org/10.1007/978-981-15-5285-4_6
5. Deepika N, Sajith Variyar VV (2017) Obstacle classification and detection for vision based navigation for autonomous driving. In: 2017 International conference on advances in computing, communications and informatics (ICACCI). IEEE
6. Vikram K, Padmavathi S (2017) Facial parts detection using Viola Jones algorithm. In: 2017 4th International conference on advanced computing and communication systems (ICACCS), Coimbatore, pp 1–4. https://doi.org/10.1109/ICACCS.2017.8014636
7. He K et al (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision
8. Lin T-Y et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition
9. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. NIPS
10. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
Partially Supervised Image Captioning Model for Urban Road Views
73
11. De Brabandere B, Neven D, Gool LC (2017) Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551 12. Salvador A et al (2017) Recurrent neural networks for semantic instance segmentation. arXiv preprint arXiv:1712.00617 13. Weeks AR, Hague GE (1997) Color segmentation in the hsicolor space using the k-means algorithm. In: Nonlinear image processing VIII. Vol. 3026. International Society for Optics and Photonics 14. Yang Z et al. (2017) Image captioning with object detection and localization. In: International conference on image and graphics. Springer, Cham 15. Johnson J, Karpathy A, Fei-Fei L (2016) Densecap: Fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition 16. Pedersoli M et al (2017) Areas of attention for image captioning. In: Proceedings of the IEEE international conference on computer vision 17. Chen W, Lucchi A, Hofmann T (2016) A semi-supervised framework for image captioning. arXiv preprint arXiv:1611.05321 18. Liu X et al (2018) Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data. In: Proceedings of the European conference on computer vision (ECCV) 19. Kamel K, Smys S, Bashar A (2020) Tenancy status identification of parking slots using mobile net binary classifier. J Artif Intell 2(3):146–154 20. Sungheetha A, Sharma R (2021) 3D image processing using machine learning based input processing for man-machine interaction. J Innov Image Process (JIIP) 3(01):1–6 21. Srihari K, Sikha OK (2021) An improved MRCNN model for instance segmentation. Pattern Recogn Lett
Ease and Handy Household Water Management System K. Priyadharsini, S. K. Dhanushmathi, M. Dharaniga, R. Dharsheeni, and J. R. Dinesh Kumar
Abstract Managing waste water is important because it is directly connected to the entire water chain, so managing water utility is essential. Now is the right time to start saving water, as the population is increasing drastically and the need for water is increasing with it. For instance, a family wastes about 85 L of water per day on average. The water we save today will serve us tomorrow. Although many technologies exist for reducing water wastage, this project narrows them down into a single IoT application that makes water management at home both easy and handy, helping the user monitor and access the household water supply even when not physically present.
Keywords IoT · Water management · Handy device · Portable
1 Introduction
We present a simple IoT device that helps the user manage household water efficiently. This handy and easily portable device is especially helpful for elderly people who want to keep water from being wasted [1]. An automatic water management system does not require a person's involvement in its maintenance. All automated water systems are embedded with electronic appliances and specific sensors. The automatic irrigation system senses the soil moisture, and submersible pumps are then switched on or off using relays; as a result, the system functions without the householder being present. The main advantage of this irrigation system is that it reduces human intervention and ensures proper irrigation [2]. The use of automatic controllers in faucets and water storage tanks helps save a large amount of water that would otherwise be wasted. This project shows a system in which domestic water use can be controlled using IoT devices from 200 m away. Household water is consumed mainly in gardening, at water taps in the kitchen and other areas, and in refilling overhead water tanks [3]. The system is also used to read the water level in the tank and to prevent overflow by switching off the pump when the tank is full.
K. Priyadharsini (B) · S. K. Dhanushmathi · M. Dharaniga · R. Dharsheeni · J. R. Dinesh Kumar Department of Electronics and Communication Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, India e-mail: [email protected] S. K. Dhanushmathi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_6
2 Existing System
In agriculture, a drip irrigation system [4] is generally used to maintain soil moisture, but there is usually no means of measuring the moisture already present in the soil. People water the plants based on experience: if the soil is already moist, the water goes to waste, and if the soil has dried out before watering, watering afterwards is of no use [5]. The same happens in sprinkler systems, where some areas receive more water and others less. A lot of water is also wasted in houses while operating taps. Taps must be closed tightly to avoid spillage; otherwise, they drain water from the tank. If a person opens a tap to fill a bucket, there is always a chance that the bucket will overflow before the tap is closed. Water tanks are not monitored, mainly because of their location. One learns that the tank is empty only when the taps run dry, and only then is the motor turned ON to fill the tank [6]. While the tank is being filled, the motor should be switched off before the tank overflows; if we forget, the water flows out and drains away as waste. Systems exist to monitor each of these causes, but not all of them together. If a tap is left half closed and drains the tank while the water level indicator senses the falling level and turns ON the motor, the entire stored volume of water is wasted. According to recent research, IoT-based household water management is used only for managing the tank water level through an application [7].
3 Proposed System
The proposed methodology introduces an application with IoT options that controls the water usage of the house. The proposed system overcomes the drawback of single-purpose usage, namely supervising only the tank water level, by extending it to multi-purpose usage that controls three main areas of water use [8]: gardening, taps in the house and the water tank. In the application, the first option is for gardening: it senses the soil moisture, and if the moisture content is below the required amount, the user can switch on the gardening motor directly from the mobile application. The taps in the house are sensor based, so water flows when hands are placed under them. If the user suspects that a tap may be leaking while they are away from the house, they can shut off the water flow to all taps through the application. Throughout, there should always be water in the tank; if the water level is low, the application sends out a notification to switch ON the motor to fill the tank. When the tank is full, the system automatically switches OFF the motor so that the water does not overflow. The retrieved data is sent to the cloud through the IoT module, and the user manages it through the application [9].
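The decision rules described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Readings` fields, the three threshold constants and the action strings are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Readings:
    soil_moisture_pct: float  # soil moisture on an assumed 0-100 scale
    tank_level_pct: float     # tank fill level on an assumed 0-100 scale
    user_at_home: bool
    leak_suspected: bool


# Hypothetical thresholds; the paper does not state numeric values.
MOISTURE_MIN_PCT = 30.0
TANK_LOW_PCT = 20.0
TANK_FULL_PCT = 95.0


def decide_actions(r: Readings, pump_on: bool) -> list[str]:
    """Return the actions the application would offer or take."""
    actions = []
    # Garden: the user may switch on the gardening motor when the soil is dry.
    if r.soil_moisture_pct < MOISTURE_MIN_PCT:
        actions.append("offer: switch ON garden motor")
    # Taps: a suspected leak while away can be handled by closing all taps.
    if r.leak_suspected and not r.user_at_home:
        actions.append("offer: close all taps")
    # Tank: notify when low, switch the motor off automatically when full.
    if pump_on and r.tank_level_pct >= TANK_FULL_PCT:
        actions.append("auto: switch OFF tank motor")
    elif not pump_on and r.tank_level_pct < TANK_LOW_PCT:
        actions.append("notify: switch ON tank motor")
    return actions
```

Keeping the rules in one pure function makes them easy to check without any hardware attached; the sensor and cloud layers only need to fill in a `Readings` value.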
3.1 Advantages
• Usage of smart IoT devices simplifies work and helps to plan efficiently.
• Minimum amount of water is used to satisfy the daily needs.
• Wastage of water can be controlled from anywhere through IoT.
4 Block Diagram
The working model of the system is briefly described in the block diagram. All modules and sensors, including the moisture and temperature sensors, are directly connected to the Arduino, which serves as the controlling system [10]. These modules are in turn connected via the cloud to the user's mobile phone, which allows the application to run (Fig. 1).
5 Flow Chart
Application Software Flowchart
The application ‘E-Water’ aims at conserving water for future generations, starting from every individual home. The ecosystem should also be kept balanced by giving the plants, herbs and trees planted in homes a sufficient amount of water (Fig. 2). Initially, the application has control over the taps and the tank of the home. The following flowcharts represent the working procedure of each individual hardware setup.
5.1 Automatic Faucet System
Here, the tap is automated with a servo motor so that it turns on when the user is present and off when the user is absent; presence is determined by using an ultrasonic sensor to check whether an object lies within a specific distance range [11] (Fig. 3).
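As a rough sketch of the faucet logic, the snippet below converts an ultrasonic round-trip echo time into a distance and decides whether the tap should open. The 15 cm hand range and the servo angles are hypothetical values picked for illustration; the paper does not give them. The only physical constant used is the speed of sound in air, about 343 m/s at room temperature.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at room temperature
HAND_RANGE_CM = 15.0               # hypothetical detection range for a hand


def echo_to_distance_cm(echo_us: float) -> float:
    """Convert a round-trip echo time (microseconds) to a one-way distance (cm)."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def faucet_should_open(echo_us: float, hand_range_cm: float = HAND_RANGE_CM) -> bool:
    """Open the tap only while an object is within the hand range."""
    return echo_to_distance_cm(echo_us) <= hand_range_cm


def servo_angle(open_tap: bool) -> int:
    """Map the open/closed decision to a servo position (angles are assumed)."""
    return 90 if open_tap else 0
```

The division by two accounts for the pulse travelling to the hand and back before the sensor registers the echo.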
Fig. 1 Block diagram of the water management system model
Fig. 2 Flowchart of the water management system model
Fig. 3 Flowchart of smart faucet
5.2 Smart Tank System
The water tank at home is also made smart: the setup is mounted at the lid of the tank so that the ultrasonic sensor measures the distance from the lid to the water surface. The resulting status is updated instantly on a liquid crystal display (Fig. 4). If the water is at the bottom, the LCD displays ‘very low’, which in turn makes the L293D IC direct the servo motor to turn the motor ‘ON’; as the water level rises, the corresponding notification is updated on the LCD. When the water level reaches the lid, the IC again changes the servo motor position so as to turn the motor ‘OFF’ [12].
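The tank behaviour described above (switch on at ‘very low’, keep filling, switch off when full) is a simple hysteresis rule. The sketch below illustrates it under stated assumptions: the tank depth, the percentage cut-offs and every label other than ‘very low’ are hypothetical and not taken from the paper.

```python
TANK_DEPTH_CM = 100.0  # hypothetical distance from the lid sensor to the tank bottom


def level_percent(distance_from_lid_cm: float, tank_depth_cm: float = TANK_DEPTH_CM) -> float:
    """Convert the lid-to-surface distance into a fill percentage."""
    d = min(max(distance_from_lid_cm, 0.0), tank_depth_cm)
    return 100.0 * (tank_depth_cm - d) / tank_depth_cm


def level_label(pct: float) -> str:
    """Text for the LCD; only 'very low' appears in the paper, the rest are assumed."""
    if pct < 10:
        return "very low"
    if pct < 40:
        return "low"
    if pct < 90:
        return "medium"
    return "full"


def next_pump_state(pump_on: bool, pct: float) -> bool:
    """Switch on at 'very low', off at 'full', otherwise keep the current state."""
    label = level_label(pct)
    if label == "very low":
        return True
    if label == "full":
        return False
    return pump_on
```

Keeping the previous state between the two limits prevents the motor from switching rapidly on and off around a single threshold.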
Fig. 4 Flowchart of smart tank
5.3 Smart Irrigation System
To irrigate the plants, the soil moisture is tracked. If it falls below the threshold value, the potentiometer reading rises, an LED glows, the LCD (liquid crystal display) shows ‘water irrigation’ and the motor is switched to the ‘ON’ position [13]. Once the soil moisture is restored, the motor is switched ‘OFF’ (Fig. 5).
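A minimal sketch of this threshold rule is shown below. The `IrrigationOutputs` type and the empty LCD text for the non-dry case are assumptions made for the example; the paper only specifies the behaviour when the moisture is below the threshold.

```python
from dataclasses import dataclass


@dataclass
class IrrigationOutputs:
    motor_on: bool
    led_on: bool
    lcd_text: str


def irrigation_step(moisture: float, threshold: float) -> IrrigationOutputs:
    """Drive the motor, LED and LCD from one soil moisture reading."""
    dry = moisture < threshold
    return IrrigationOutputs(
        motor_on=dry,
        led_on=dry,
        # The message for the dry case comes from the text; "" is an assumed idle state.
        lcd_text="water irrigation" if dry else "",
    )
```

Calling `irrigation_step` once per sensor reading reproduces the on/off behaviour of the flowchart, with `moisture` and `threshold` expressed in whatever unit the sensor reports.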
Fig. 5 Flowchart of smart irrigation
6 Concept Implementation
6.1 Sensors
The Arduino connects to ultrasonic sensors, which detect the water level in the tank and sense a hand in order to open the faucet, and to a soil moisture sensor, which measures the moisture in the garden for irrigation.
6.2 IoT Module
The IoT module provides the interface between the sensors and the cloud. The data collected from the integrated hardware is stored in cloud memory (a NoSQL big-data database) through IoT [14].
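To make the data path concrete, here is a hedged sketch of how one sensor reading might be packaged as a JSON document before being sent to a NoSQL store. The field names (`device_id`, `sensor`, `value`, `unit`, `timestamp`) are hypothetical; the paper does not describe its record layout or name a specific database.

```python
import json


def make_record(device_id: str, sensor: str, value: float, unit: str, timestamp: str) -> dict:
    """Build one sensor reading as a document (all field names are assumed)."""
    return {
        "device_id": device_id,
        "sensor": sensor,
        "value": value,
        "unit": unit,
        "timestamp": timestamp,
    }


def to_payload(record: dict) -> str:
    """Serialize the record to JSON text for upload by the IoT module."""
    return json.dumps(record, sort_keys=True)
```

A document-style record like this suits a NoSQL store because readings from different sensors can carry different fields without a fixed schema.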
6.3 Application
An if-then-else approach is used to keep the application logic simple. The user receives the collected data in the application and can access and manage home water usage through it (Fig. 6).
Fig. 6 Workflow of proposed system
Fig. 7 Simulation of garden irrigation by measuring moisture in soil
7 Simulation
The household water management system comprises three implementations that together reduce the wastage of water: the smart irrigation system, the smart faucet system and the water tank. Their operation has been simulated and verified as follows. The simulations were done in Tinkercad [15], and all of them are further connected to the application (Figs. 7, 8 and 9).
8 Function of Application
The application combines hardware and software with three main uses, and its ultimate aim is to preserve water through the integration of the individual circuits. The proposed project integrates tank water level indication, an automatic on/off water tap (triggered by sensing a hand) and smart water irrigation for home gardens [16]. On opening the application, the user sees the options garden, tap, tank, status and exit at the bottom of the screen. If the user chooses garden, the application checks the soil moisture and notifies the user to turn on water for irrigation when the moisture level is low. It also shows the moisture level of the soil. Likewise, the tap option manages leakage from taps. When a hand is sensed, water flow is activated in the hand washing area, and as soon as the hands leave the sensing area the tap closes completely, preventing any wastage [17, 18]. The automated faucet can thus be implemented not only in newly constructed buildings but also in homes that have stood for many years with faulty, leaky pipes. If the user chooses the tank option, the application checks the water level in the water tank and indicates it with different colors. When the tank is filling or emptying, the buzzer sounds an alert and the user is also notified by the application. The status option shows whether the motor is turned ON or not. Choosing the exit option closes the application. Because it relies on automatic devices, this household water management system can be used even when the user is not at home [19]. The system also works with the water systems of older homes and with older leaky faucets (Figs. 10, 11, 12, 13, 14 and 15).
Fig. 8 Simulation of tank water level indication by buzzer
Fig. 9 Simulation of automatic opening and closing of tap by sensing hands
Fig. 10 Selection of garden
8.1 Working Principle
The soil moisture sensor uses capacitance to measure the dielectric permittivity of the surrounding medium. The sensor produces a voltage proportional to the dielectric permittivity and hence to the water content of the soil. Depending on how this reading compares with the threshold value, the motor is switched on or off. The L293D IC can change the direction and speed of the servo motor, by which the faucet toggles its position between on and off. Based on the threshold comparison, the entire setup works to achieve effective use of water at the taps. Tank level monitoring, tank filling and notification of the user employ the sensor probe in the arrangement. The probes used here are triggered to send information to the control panel to indicate that the tank is empty or is about to become completely full, before it overflows. Thus, the application aims to control water wastage according to the user's choice: when the user selects one of the three options, the principle behind that option is applied and operation commences.
Fig. 11 Showing of moisture level
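Because the sensor voltage is described as proportional to the water content, a two-point linear calibration is one plausible way to turn a voltage into a moisture percentage. The sketch below assumes two calibration voltages, `v_dry` and `v_wet`, measured in dry and saturated soil; these names and the clamping to 0 to 100 are assumptions made for the example, not details from the paper.

```python
def voltage_to_moisture_pct(v: float, v_dry: float, v_wet: float) -> float:
    """Map a sensor voltage to 0-100 % moisture by linear two-point calibration."""
    if v_dry == v_wet:
        raise ValueError("calibration voltages must differ")
    pct = 100.0 * (v - v_dry) / (v_wet - v_dry)
    # Clamp readings that fall outside the calibrated range.
    return min(100.0, max(0.0, pct))
```

The resulting percentage can then be compared against the irrigation threshold instead of a raw voltage, which keeps the threshold independent of the particular sensor.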
9 Result
Using the E-Water application, we examined the following water wastage situations. A drinking water tap can waste up to 75 L a day due to leakage. Of the total water used, 15% is wasted per day through leakage when the occupants are absent (Source Google). After installing our product at home, this wastage can be avoided by remotely shutting off the flow from the tank whenever a decrease in the tank water level is observed. It is estimated that 7% of the water supplied is wasted through overflow while refilling water tanks (Source Google). With our application, the refilling of water tanks can be monitored continuously and the pump turned off before the tank overflows. This reduces overflow spillage and the associated wastage of water to nil. During gardening, plants are usually given twice the amount of water they need, which leads to major water wastage in a household. This can be controlled by checking the moisture content with the application and watering the plants only when required (Fig. 16). The graph compares water usage before and after implementation of the proposed system. The survey gives the water usage of a home over a period of approximately two months. The abscissa is the time period, where one unit equals one week. The ordinate is the amount of water used in kiloliters, where one unit equals one kiloliter. The existing system curve shows the larger amount of water used before the proposed idea was implemented. The proposed system curve shows lower water usage, since wastage is controlled with the help of the application. The variation from week to week is due to circumstances at home (for example special occasions, functions, or a malfunction or repair of the user's mobile). Hence, the conclusion is that when the application is used in a home, more water is preserved than before (Table 1).
Fig. 12 Selection of tap option
Fig. 13 Showing of water leakage
Fig. 14 Selection of water tank
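The before/after comparison in the graph amounts to a percentage saving over the survey period. The helper below is a small sketch of that arithmetic; the function name and the input format (one kiloliter value per week for each system) are assumptions, and no numbers from the paper's graph are reproduced here.

```python
def percent_saving(existing_kl: list[float], proposed_kl: list[float]) -> float:
    """Percentage of water saved over the same weeks, given weekly usage in kiloliters."""
    if len(existing_kl) != len(proposed_kl):
        raise ValueError("both series must cover the same number of weeks")
    total_existing = sum(existing_kl)
    if total_existing <= 0:
        raise ValueError("existing usage must be positive")
    return 100.0 * (total_existing - sum(proposed_kl)) / total_existing
```

Summing over all weeks before taking the ratio smooths out the week-to-week variation mentioned in the text.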
Fig. 15 Selection of motor status
[Figure 16: Usage of water in existing system vs proposed system. Vertical axis: water in kilo litres, 0 to 10. Series: Proposed system, Existing system.]
Fig. 16 Graph analysis
Table 1 Comparison between before and after implementation of the proposed system
Wastage of water in household | After installing application
A flush of the toilet uses 6 L of water. On average a person wastes about 0–45 L of water per day for flushing, which is 30% of the water requirement per person per day. Hence, wasted water amounts to 125 million liters per day | At present, leakage is checked only manually. With our handy IoT-embedded application, water leakage can be checked, monitored and controlled from anywhere
A drinking water tap can waste up to 75 L a day due to leakage. Of the total usage of water, 15% is wasted in leakage per day | We can reduce at least 98% of total wastage after installing our product at homes
It is estimated that 7% of the water supplied is wasted during refilling of water tanks because of overflow | With our application, refilling of water tanks can be monitored and the pump can be turned off before the tank overflows. This prevents overflow spillage and reduces wastage of water to nil
Usually during gardening, twice the required amount of water is given to plants, which leads to major water wastage in a household | This can be controlled by checking the moisture content and watering the plants only when required. Moisture in the soil is monitored and the application gives a notification when it is low
10 Conclusion
The household water management system connects its devices via IoT and brings the entire household water management into a single handy application that can be used effortlessly. With this application, we conclude that preserving water is the need of the hour in the modern world. Starting with implementation in homes and then extending it to the entire country could help other countries take us as a role model and begin to save water. On average, a family of three could save 40% of its water. The system can also be used by a large family, which could save up to 50% of its water. If it is used in densely populated places such as hotels, hostels and halls, India could escape water scarcity. This is an effective way to save water and prevent its wastage. The final outcome of the project is a single handy application controlling the IoT-connected devices placed to manage household water. The proposed system helps to save water in the upcoming busy world.
References
1. Robles T, Alcarria R, Martín D, Morales A (2014) An internet of things based model for smart water management. In: Proceedings of the 8th international conference on advanced information networking and applications workshops (WAINA), Victoria, Canada. IEEE, pp 821–826
2. Kumar S (2014) Ubiquitous smart home system using android application. Int J Comput Netw Commun 6(1)
3. Perumal T, Sulaiman M, Leon CY (2019) Internet of Things (IoT) enable water monitoring system. In: IEEE 4th Global conference consumer electronics (GCCE)
4. Dinesh Kumar JR, Ganesh Babu C, Priyadharsini K (2021) An experimental investigation to spotting the weeds in rice field using deepnet. Mater Today Proc. ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.01.086; Dinesh Kumar JR, Dakshinavarthini N (2015) Analysis and elegance of double tail dynamic comparator in analog to digital converter. IJPCSC 7(2)
5. Rawal S (2017) IOT based smart irrigation system. Int J Comput Appl 159(8):1–5
6. Kansara K, Zaveri V, Shah S, Delwadkar S, Jani K (2015) Sensor based automated irrigation system with IOT: a technical review. IJCSIT 6
7. Maqbool S, Chandra N. Real time wireless monitoring and control of water systems using Zigbee 802.15.4
8. Durham R, Fountain W (2003) Water management within the house landscape. Retrieved day, 2011
9. Kumar A, Rathod N, Jain P, Verma P. Towards an IoT based water management system for a campus. Department of Electronic System Engineering, Indian Institute of Science, Bangalore
10. Pandian AP, Smys S (2020) Effective fragmentation minimization by cloud enabled back up storage. J Ubiquit Comput Commun Technol (UCCT) 2(1):1–9
11. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman Filter approach based on intensity parameter. J Innov Image Process (JIIP) 3(01):7–20
12. Parvin JR, Kumar SG, Elakya A, Priyadharsini K, Sowmya R (2020) Nickel material based battery life and vehicle safety management system for automobiles. Mater Sci 2214:7853
13. Dinesh Kumar JR, Priyadharsini K, Srinithi K, Samprtiha RV, Ganesh Babu C (2021) An experimental analysis of lifi and deployment on localization based services & smart building. In: 2021 International conference on emerging smart computing and informatics (ESCI), pp 92–97. https://doi.org/10.1109/ESCI50559.2021.9396889
14. Priyadharsini K et al (2021) IOP Conf Ser Mater Sci Eng 1059:012071
15. Priyadharsini K, Kumar JD, Rao NU, Yogarajalakshmi S (2021) AI-ML based approach in plough to enhance the productivity. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV), pp 1237–1243. https://doi.org/10.1109/ICICV50876.2021.9388634
16. Priyadharsini K, Kumar JD, Naren S, Ashwin M, Preethi S, Ahamed SB (2021) Intuitive and impulsive pet (IIP) feeder system for monitoring the farm using WoT. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer Nature, p 125
17. Nanthini N, Soundari DV, Priyadharsini K (2018) Accident detection and alert system using arduino. J Adv Res Dyn Control Syst 10(12)
18. Kumar JD, Priyadharsini K, Vickram T, Ashwin S, Raja EG, Yogesh B, Babu CG (2021) A systematic ML based approach for quality analysis of fruits impudent. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 1–10
19. Priyadharsini K, Nanthini N, Soundari DV, Manikandan R (2018) Design and implementation of cardiac pacemaker using CMOS technology. J Adv Res Dyn Control Syst 10(12). ISSN 1943-023X
Novel Intelligent System for Medical Diagnostic Applications Using Artificial Neural Network T. P. Anithaashri, P. Selvi Rajendran, and G. Ravichandran
Abstract In recent years, image recognition with feature extraction has been a major challenge in medical applications. Diagnosing diseases through image recognition from scanned images or x-ray images is a difficult task for doctors. To enhance image recognition with feature extraction for medical applications, a novel intelligent system has been developed using an artificial neural network. It recognizes images with feature extraction more efficiently than a fuzzy logic system. The artificial neural network algorithm was used for feature extraction from the scanned images of patients. The implementation was carried out with the help of TensorFlow and PyTorch. The algorithm was tested on 200 sets of scanned images, which were used for the classification and prediction of trained dataset images. The analysis of the data set and test cases was performed successfully and achieved 81% accuracy for image recognition using the artificial neural network algorithm. With a level of significance of p < 0.005, the resulting data show reliability in independent sample t tests. In predicting image recognition accuracy, the ANN performs significantly better than the fuzzy logic system.
Keywords Artificial neural network · Novel diagnostic system · Fuzzy logic system · Feature extraction · Image recognition
T. P. Anithaashri (B) Institute of CSE, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai 602105, India e-mail: [email protected] P. S. Rajendran Department of CSE, Hindustan Institute of Technology and Science, Chennai, India e-mail: [email protected] G. Ravichandran AMET University, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_7
1 Introduction
The lack of a highly efficient detection system for image recognition of chronic diseases such as cardiovascular diseases, cancers, respiratory diseases, pulmonary diseases, asthma and diabetes [1] can prove fatal in the later stages. The demanding factors for these chronic diseases are continuous treatment, pharmaceutical requirements and the medical electronic equipment needed to periodically diagnose the stage [2] of damage in the organ. Diagnosing these diseases and taking measures for treatment are becoming more challenging [3]. Analyzing chronic diseases using factors such as periodic observation through tests and monitoring the adequacy of glucose, sucrose [4], etc., is a tedious process, and diagnosing a chronic disease from symptoms alone is less efficient. Hence, artificial intelligence techniques [3] can be used for better detection with high resolution and accuracy, to overcome diagnosis errors between scanned images [2] and x-ray images.
2 Existing System
Image analysis for chronic diseases using artificial intelligence techniques gives low efficiency in image recognition. With a convolutional neural network, diagnosis on diseased images yields only an approximation [5] in image recognition. The main function of a convolutional neural network is to return the feature map used for image identification, with various parameters involved in recognizing the images [6]. Comparing convolutional neural networks with other artificial intelligence techniques for image analysis [7], the recurrent convolutional neural network provides more accuracy than the convolutional neural network but takes more time. The convolutional neural network algorithm [2] can be used for image classification to predict and differentiate diseased images from normal images. In order to reduce errors and time delays, the artificial neural network algorithm is used with chest x-ray images [8] of patients to identify the disease. In many sectors, emerging trends in artificial intelligence have paved the way to enhance existing systems in terms of image recognition, image analysis, image classification, etc. This has become an industrial revolution [6] in the automation of image recognition. In the medical field, many AI algorithms [9] and techniques are being implemented. Disease prediction has always been a challenge for doctors, and it is time consuming. To overcome these drawbacks, automating disease prediction using AI techniques [2] can make the process simple and feasible. However, existing AI-based smart medical diagnosing [10] systems for diagnosing [11] chronic diseases through image recognition [12] are less efficient in diagnosis. The use of neural networks helps in analyzing trained data sets with validation but gives low accuracy in identifying the disease through image analysis.
3 Proposed System
Applying artificial intelligence techniques to any field enhances the efficiency of automation in emerging technology. The use of AI in medical applications [7] has been tremendous in automating manual work for real-time applications. Image recognition for the analysis of various diseases [6] has become a big challenge for the medical community, which must deal with parameters such as the time taken to diagnose, feature extraction from scanned images, etc. To overcome these drawbacks [8], AI techniques can be used to enhance the system for image recognition of scanned images. In this novel system, normal and abnormal scanned images of patients are evaluated through test procedures and classified [13] on the basis of various clinical observations. The artificial intelligence algorithms ANN [14] and fuzzy logic system are compared on their prediction performance. In the fuzzy logic system, the weights and biases [14] are assigned from layer to layer, but in the ANN algorithm, the first and last layers are connected, which is considered as an output layer. To address the problem [4] of diagnosing disease through image recognition, a novel system has been proposed. An overview of the proposed system is depicted in Fig. 1. In this framework, the input data is processed through a cloud application. Neural architecture search automates the working function of the artificial neural network: it explores the search space and helps to evaluate the ANN for the specific task. Processing the images in the neural network algorithm helps in extracting features from the images. The process starts with the identification of data sets; once the data sets are identified, image classification is carried out by the artificial neural network. This paves the way to extract the image accuracy for diagnosing diseases through image processing. The study was set at Saveetha University. Two groups were identified: group 1 is the fuzzy logic system, and group 2 is the artificial neural network algorithm. The artificial neural network and the fuzzy logic system were each iterated a number of times with a sample size of 200.
Fig. 1 Novel framework for diagnosing chronic diseases through artificial neural network
3.1 Fuzzy Logic System
In this system, an image is treated as frames and pixels; with data augmentation, complicated images are classified as trained. An image that appears clear and precise to the human eye may still be inaccurate and lacking in detail. Analyzing the scanned images through a different permeation for each layer provides clarity in the analysis. This gives time efficiency, but in terms of accuracy of data it is not efficient.
Step 1: Start.
Step 2: Load the dataset path through the cloud application.
Step 3: Read the images and resize them.
Step 4: Convert to grayscale.
Step 5: Train and test the images.
Step 6: Repeat the process for analysis.
Step 7: Predict the accuracy of the extracted images.
Step 8: Stop.
After this process, the number of samples for each class is found, and test images are compared with the classified training data to predict image recognition effectively. Data intensification helps to enhance the performance of the algorithm in classifying the scanned images. After data intensification [15], the quality images and the classified images [5] are saved in random order. Classifying the patients' scanned images and modifying the dataset was a difficult process and hence gave lower accuracy.
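The preprocessing part of the steps above (resize, convert to grayscale, split into training and test sets) can be sketched with the standard library alone. This is a simplified illustration, not the authors' pipeline: images are represented as nested lists of (r, g, b) tuples, resizing uses nearest-neighbour sampling, and the 20% test fraction is an assumed value. The grayscale weights are the common ITU-R BT.601 luma coefficients.

```python
import random


def resize_nearest(img, new_h, new_w):
    """Resize a nested-list image by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def to_grayscale(img):
    """Convert (r, g, b) pixels to luminance using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in img]


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]
```

In practice a library such as TensorFlow or PyTorch, both named in the paper, would handle these steps on arrays; the pure Python version only makes the individual operations explicit.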
3.2 Artificial Neural Network Algorithm
The artificial neural network is used to identify the feature values of the samples of external data. It processes the inputs and analyzes the images to extract the features in an image. The feature extraction of scanned images through image analysis from the
Novel Intelligent System for Medical Diagnostic Applications …
classified data provides clarity and thus helps to predict the accuracy. The peak signal ratio, noise, or disturbances that are part of the image being classified are called loss. The use of neural architecture search improves the application of the algorithm and provides the novelty of the proposed system. Neural architecture search paves the way for better image processing through automation in three layers, namely the input layer, the hidden layer, and the output layer. Hence, this novel artificial neural network architecture is a better model because it has fewer parameters and reuses the assigned weights, giving time efficiency with high accuracy.
Step 1: Start.
Step 2: Read the number of data items n; for i in range(1, n), read xi, yi; next i.
Step 3: Assign the weights to the inputs.
Step 4: Add a bias to every input xi, yi.
Step 5: Find the sum and activate.
Step 6: Stop.
The simulation tools TensorFlow and Keras were used to execute the project code. They help to manage and access various kinds of files. In the Python environment, a command prompt provides easy access to the code and its execution. The main tools that need to be installed in the Python environment are Keras and TensorFlow. A minimum of 4 GB RAM is required to compile and execute the project code. The preferred operating systems are Windows and Ubuntu. Anaconda Navigator and Anaconda Prompt help to install the necessary modules and tools. Testing various kinds of scanned images [4] for classification with the number of epochs set to 10 increased efficiency, with less execution time and higher accuracy of image extraction. Reducing the data size for image classification also improved accuracy and efficiency, in terms of time taken, with the novel diagnostic system. To check the reliability of the data and the accuracy, SPSS is used with a significance level of 0.05.
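Steps 3 to 5 of the algorithm (assign weights, add a bias, sum and activate) can be sketched for a small network with an input, a hidden, and an output layer. This is a minimal illustration under stated assumptions, not the Keras model used in the study: the sigmoid activation, the layer sizes, and the weight values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Layer = List[Tuple[Sequence[float], float]]  # one (weights, bias) pair per neuron


def sigmoid(z: float) -> float:
    """Activation function (an assumption; the paper does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Steps 3-5: weighted sum of the inputs plus a bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def forward(inputs: Sequence[float], hidden: Layer, output: Layer) -> List[float]:
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden_values = [neuron_output(inputs, w, b) for w, b in hidden]
    return [neuron_output(hidden_values, w, b) for w, b in output]


# Hypothetical weights: two inputs, two hidden neurons, one output neuron.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]
prediction = forward([1.0, 1.0], hidden_layer, output_layer)
```

Training, that is, adjusting the weights from labelled images over the 10 epochs, is left to the framework and is not shown here.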
4 Results and Discussions
The model for image analysis is trained for 10 epochs on a total sample of 200 images from chest scan datasets. A total of 10 epochs and a batch size of 22 are used in the model, and the results are tabulated by epoch stage as shown
Table 1 Analysis on accuracy (0.9147) of train and loss (0.3262) data of images for different epochs stages (10) with the model trained by various categories of scanned images of diseased patients
Epoch stage   Training accuracy   Training loss   Validation accuracy   Validation loss
1             0.61                0.58            0.71                  0.63
2             0.82                0.62            0.65                  0.71
3             0.81                0.51            0.62                  0.54
4             0.79                0.51            0.60                  0.36
5             0.87                0.68            0.66                  0.45
6             0.79                0.31            0.52                  0.71
7             0.69                0.48            0.63                  0.67
8             0.72                0.52            0.58                  0.74
9             0.93                0.54            0.58                  0.62
10            0.72                0.61            0.55                  0.31
in Table 1. Thus, training with 10 epochs and the specified batches provides an accuracy of 81% in disease prediction through the proposed system. For the results in Table 1, the data is trained by the novel diagnostic system and, after the training data is classified, the system is trained to categorize different kinds of virus affecting the human body. Figure 2 represents the variation between accuracy and loss obtained by analyzing the trained data sets, reaching an accuracy of 0.93, which indicates the improvement achieved through the artificial neural network. In Table 2, F refers to the F statistic, which is calculated by dividing the mean square regression by the mean square residual. T refers to the t score and depicts
Fig. 2 Accuracy scores of image extraction across the different epoch stages on the major axis, with the minor axis covering the range of accuracy (0.93) and loss (0.31)
Table 2 SPSS statistics depicting data reliability for the artificial neural network and fuzzy logic system with an independent-samples t-test; the result is applied to fix the dataset with a confidence interval of 95% and a level of significance of 0.05 to analyze the data sets for both algorithms, achieving more accuracy for the artificial neural network than for fuzzy logic
Statistic                    Accuracy,         Accuracy,          Loss,             Loss,
                             equal variances   equal variances    equal variance    equal variance
                             assumed           not assumed        assumed           not assumed
F                            4.5               -                  1.8               -
Sig                          0.040             -                  0.265             -
T                            0.97              0.97               2.61              2.61
df                           10.0              18.0               16.0              18.0
Sig (2-tailed)               0.36              0.360              0.003             0.002
Mean difference              16.31             16.31              0.18              0.19
Std. error difference        17.0              17.0               0.050             0.51
95% CI of difference, lower  −19.30            −23.14             0.078             0.075
95% CI of difference, upper  53.04             54.88              0.32              0.28
the population variance; when the t value exceeds the critical value, the means are different. It is calculated by dividing the difference between the sample mean and the given number by the standard error. Sig (2-tailed) is the significance value, which is compared with 0.05 and should be within that level of significance. Figure 2 depicts the accuracy and loss for the respective algorithms compared. Compared with the fuzzy logic system, the artificial neural network algorithm shows higher accuracy in image recognition analysis. Scanned images of the chest are considered for the classification of data. After classification, the trained data is tested and validated with 10 epochs, and the validation results are obtained. The graphical representation of the loss and accuracy for the artificial neural network gives 81% accuracy with an assumed variance of 0.18, obtained with the help of SPSS. The reliability of the data for the artificial neural network, with a mean difference of 0.02 for assumed and non-assumed variances, provides more accuracy than the fuzzy logic algorithm in terms of mean accuracies, and thus high accuracy is obtained in the extraction of images.
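The quantities reported for an independent-samples t-test (t, df, mean difference, and standard error of the difference) can be reproduced with a short sketch. It uses the standard textbook formulas for the pooled-variance test and for the Welch test, and is not a description of how SPSS computes them internally; the two sample lists are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Equal variances assumed: returns (t, df, mean difference, std. error)."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return diff / std_err, float(na + nb - 2), diff, std_err


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Equal variances not assumed: returns (t, df, mean difference, std. error)."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    va, vb = variance(a) / na, variance(b) / nb
    std_err = math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return diff / std_err, df, diff, std_err


# Hypothetical accuracy scores for the two groups.
group_ann = [1.0, 2.0, 3.0, 4.0, 5.0]
group_fuzzy = [2.0, 3.0, 4.0, 5.0, 6.0]
t_pooled, df_pooled, mean_diff, se = pooled_t_test(group_ann, group_fuzzy)
t_welch, df_welch, _, _ = welch_t_test(group_ann, group_fuzzy)
```

The two-tailed significance is then read from the t distribution with the computed degrees of freedom and compared with 0.05.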
5 Conclusion
The validation results are obtained by classifying the images with trained data and 10 epochs. The implementation results show an improved accuracy of 81% in image recognition with extraction of images. With the artificial neural network, the connection between each layer helps to achieve more accuracy in the classification of images. With the novel diagnostic system, grouping the images of affected and unaffected people helps to classify and train models to diagnose the presence of disease through image recognition in a significant manner. The proposed system considers only scanned images, which is a limitation that can be overcome by using radiology images for more accuracy. The use of a web application with bot interaction and of AI tools for disease prediction and treatment is a future scope of this system.
References
1. Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, Yang D et al (2020) Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nat Commun
2. Kılıc MC, Bayrakdar IS, Çelik Ö, Bilgir E, Orhan K, Aydın OB, Kaplan FA et al (2021) Artificial intelligence system for automatic deciduous tooth detection and numbering in panoramic radiographs. Dento Maxillo Facial Radiol
3. Grampurohit S, Sagarnal C (2020) Disease prediction using machine learning algorithms. https://doi.org/10.1109/incet49848.2020.9154130
4. Livingston MA, Garrett CR, Ai Z (2011) Image processing for human understanding in low-visibility. https://doi.org/10.21236/ada609988
5. Ponnulakshmi R, Shyamaladevi B, Vijayalakshmi P, Selvaraj J (2019) In silico and in vivo analysis to identify the antidiabetic activity of beta sitosterol in adipose tissue of high fat diet and sucrose induced type-2 diabetic experimental rats. Toxicol Mech Methods 29(4):276–290
6. Fouad F, King Abdul Aziz University, and Saudi Arabia Kingdom (2019) The fourth industrial revolution is the AI revolution an academy prospective. Int J Inf Syst Comput Sci. https://doi.org/10.30534/ijiscs/2019/01852019
7. Girija AS, Shankar EM, Larsson M (2020) Could SARS-CoV-2-induced hyperinflammation magnify the severity of coronavirus disease (COVID-19) leading to acute respiratory distress syndrome? Front Immunol
8. Greenspan RH, Sagel SS (1970) Timed expiratory chest scanneds in diagnosis of pulmonary disease. Invest Radiol. https://doi.org/10.1097/00004424-197007000-00014
9. Ramesh A, Varghese S, Jayakumar ND, Malaiappan S (2018) Comparative estimation of sulfiredoxin levels between periodontitis and healthy patients: a case-control study. J Periodontol 89(10):1241–1248
10. Rahman T, Chowdhury MEH, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep fuzzy logic system for scanned detection using chest scanned. Appl Sci. https://doi.org/10.3390/app10093233
11. Anithaashri TP, Ravichandran G, Kavuru S, Haribabu S (2018) Secure data access through electronic devices using artificial intelligence, IEEE Xplore. In: 3rd International conference on communication and electronics systems (ICCES). https://doi.org/10.1109/CESYS.2018.8724060
12. Manjunath KN, Rajaram C, Hegde G, Kulkarni A, Kurady R, Manuel K (2021) A systematic approach of data collection and analysis in medical imaging research. Asian Pac J Cancer Prevent APJCP 22(2):537–546
13. Knok Ž, Pap K, Hrnčić M (2019) Implementation of intelligent model for scanned detection. Tehnički Glasnik. https://doi.org/10.31803/tg-20191023102807
14. Liu C, Ye G (2010) Application of AI for CT image identification. In: 2010 3rd International congress on image and signal processing. https://doi.org/10.1109/cisp.2010.5646291
15. Rahman T, Chowdhury MEH, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-Ray. Appl Sci. https://doi.org/10.3390/app10093233
Extracting Purposes from an Application to Enable Purpose Based Processing
Amruta Jain and Sunil Mane
Abstract In general, applications are built to serve certain business purposes. For example, a bank invests in application development to offer its customers online services such as online shopping, FD services, and utility bill payments. Within the application, every screen has its own purpose. For example, the login page is used to authenticate a customer, and the dashboard screen gives a high-level view of the activities done by a customer. Similarly, every field appearing on a screen has a purpose too: the username field of the login screen lets the customer supply the username given by the bank. However, the screens of a given enterprise application are not always developed with exact purposes in mind, and a screen often serves multiple purposes. An application with fewer screens, each serving several purposes, suits an enterprise because it reduces the expenditure on development and subsequent maintenance, but it is at loggerheads with privacy laws, which demand purpose-based processing. Therefore, it is necessary to first build a repository of the purposes that a given application can serve; the application can then be refactored (if required) to comply with privacy laws. To find the purposes, we apply keyword extraction and K-means clustering to the text data of the application to obtain keywords and their respective purposes.
Keywords K-means · Crawljax · html parser jsoup · Purpose repository · Keyword extraction · tf-idf · Similarity · Web data
A. Jain (B) · S. Mane
Department of Computer Engineering and Information Technology, College of Engineering Pune, Pune, Maharashtra, India
e-mail: [email protected]
S. Mane
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_8
A. Jain and S. Mane
1 Introduction
Data scientists often say that data will give up any information if it is processed hard enough; that is, to make data confess anything, we need to torture it until we get our result. This is especially true for business organizations that want to make more profit from their current systems by analyzing historical data sets and applying the knowledge gained to make better, more efficient business decisions. Nowadays, organizations are also trying to make their applications and existing systems more secure and privacy-aware, to gain customer trust by keeping personal data private and by taking customers' consent. Governments are also making rules and laws regarding the privacy of data, so organizations have to develop or update their systems according to the new rules made by the government of their country. Currently, most applications are not developed with the exact purpose in mind. Every screen of a given enterprise application often serves multiple purposes, but so far this has not been given much importance, for several reasons. Chief among them is that a multi-purpose screen reduces the development time and hence the subsequent resources required for testing and maintaining the application. This is at loggerheads with privacy laws, which demand purpose-based processing, and it is the motivation for the present research. Data privacy and data security are often used interchangeably, but there is a difference between them. Data privacy regulates how data is collected, how it is shared for any business purpose, and how it is used after sharing. Data privacy is a part of data security. Data security safeguards data from intruders and malicious insiders. The main concept of data privacy is how exactly it deals with data consent, governing obligations, notice, or task.
More specifically, practical data privacy is concerned with: (1) exactly how, or whether, information is shared with third parties, and (2) how information is legally collected or stored. In the digital age, PHI (personal health information) and PII (personally identifiable information) describe how the data privacy concept is applied to critical personal information. This can include medical and health records, Social Security numbers (SSNs), financial information such as bank account and credit card numbers, and even basic but still sensitive information such as addresses, full names, and birth dates. The system architecture developed for this research work is shown in Fig. 1. Its major focus is how the inference engine is developed for purpose-based processing. For that, we first need to clarify what an inference engine is, how it works, why it is important, and how it makes a system an expert system. Inference engines: these components of an artificial intelligence (AI) system apply logical rules to data repositories or knowledge graphs to derive new facts and relationships. The process of inferring relationships between entities using machine learning (ML), natural language processing (NLP), and machine vision (MV) has exponentially expanded the scale and value of relational databases and knowledge graphs in the past few years. There are two ways of building an inference engine: backward chaining and forward chaining.
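Forward chaining, one of the two ways mentioned above, can be sketched in a few lines: rules are applied to the known facts repeatedly until no new fact can be derived. The sketch is a generic illustration and not the inference engine built in this work; the facts and rules are hypothetical examples in the spirit of a login screen.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Derive every fact reachable from the initial facts through the rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: screen fields imply a screen type, which implies a purpose.
rules = [
    (frozenset({"username field", "password field"}), "login screen"),
    (frozenset({"login screen"}), "purpose: authenticate customer"),
]
derived = forward_chain({"username field", "password field"}, rules)
```

Backward chaining would instead start from a goal such as "purpose: authenticate customer" and work back to the facts that support it.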
Extracting Purposes from an Application …
Fig. 1 System architecture
Here, to find the purposes for given application data, we create a system that we call the inference engine. It mainly performs the last four to five steps of the concrete system design (see Fig. 2). The data used in this research is web data, that is, a set of websites from different domains; a detailed explanation is given in the following sections. The purpose repository used in this research is built manually by considering the keywords and their purposes for the respective data domains. When data from a different domain is added, the purpose repository must be updated manually, which is time-consuming. Finally, we obtain the purposes related to the keywords that match keywords in the purpose repository. Wherever keywords or combinations of keywords match, their purpose is extracted from the repository and presented in the output.
2 Related Work
Much of the existing research relates to identity management and, more recently, to privacy, in particular privacy policy enforcement and privacy obligations. Marco Casassa Mont and Robert Thyne present their work, which
mainly focuses on how to automate the enforcement of privacy within enterprises in a systemic way, in particular privacy-aware access to personal data and the enforcement of privacy obligations in the context of identity management systems [1–3]. Similarly, IBM's Hippocratic database preserves the privacy of data by taking the roles of users into account, as explained in [4]; an example is a role-based access control (RBAC) database system in which tasks such as delete, update, and insert on tables, tuples, and columns are assigned according to role. Agrawal et al. explain the Hippocratic database and its architecture in their research [4]. Other authors focus on approaches to knowledge or information inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and obtain new knowledge [5]. To keep data private or hidden from some people, data masking or screen masking may be used; Goldsteen et al. describe a hybrid approach to screen masking that merges the low overhead and flexibility of masking at the network layer with the context available at the presentation layer [6]. Many approaches are available for extracting data from online web pages; one is a hybrid approach that integrates hand-crafted rules with automatically extracted rules, described in [7]. Parvathi and Jyothis note that there are many approaches to finding which words in text documents are important for describing the class they are associated with; their proposed technique uses a convolutional neural network (CNN) with deep learning (DL) to predict the classes correctly [8]. Ravindranath et al. present another algorithm, a statistical model based on Gibbs sampling, to infer the structure and meaning of semi-structured documents [9].
Singh et al. survey and compare various inference engines [10]. Among the many techniques for keyphrase extraction is the graph-based ranking technique used by Yind et al. Since a document contains many topics, the extracted keywords or keyphrases should cover all the main topics in the document; motivated by this, the authors take a topic model into consideration. A detailed explanation is given in [11]; a similar approach is used in our research. Beliga et al. [12] present a survey of keyword extraction covering supervised and unsupervised methods, graph-based methods, simple statistical methods, and others. They propose selectivity-based keyword extraction as a new unsupervised graph-based method that extracts nodes from a complex network as keyword candidates. Yan et al. [13] describe their own approach to keyphrase extraction, in which the task is usually carried out in two steps: (1) extracting a set of words that serve as candidate keyphrases and (2) determining the correct keyphrases using unsupervised or supervised approaches. Research on classifying websites by their functional purposes was done by Gali et al. [14]. In their research, they
propose a novel method to classify websites based on their functional purposes, classifying a website as a single service, a brand, or a service directory. For web data extraction, existing approaches use decoupled strategies, attempting data record detection and attribute labeling in two separate phases; Zhu et al. propose a model in which both tasks are done simultaneously [15]. The rest of this paper is structured as follows: Sect. 3 describes our methodology, experimental results are presented in Sect. 4, and Sect. 5 concludes the paper and describes future work related to this research.
3 Methodology
This section first presents the structure of our model, that is, the system architecture shown in Fig. 2, and then describes the workflow of the architecture and what is done in each phase. As shown in Fig. 2, we first gather data from the web applications whose purposes we want to find, using Crawljax [16]. Crawljax is an open-source, Java-based web crawler that extracts the web data of an application in the form of HTML states, DOMs, a result .json file containing JSON data, and screenshots of the web pages, all placed in one output folder that is used in the next phase of the system. Because it is open source, its jar file, source code, or Maven file is available and can be run in the specified framework. The jar file can be run from the command prompt of the operating system by supplying the parameters described in the readme file that accompanies the downloaded jar files. Many options can be supplied as needed, such as states -s, depth -d, -waitAfterReload, -waitAfterEvent, and override -o; the readme file lists the initial values of these options. The compulsory parameters are the URL of the page or website to be crawled and the path of the output folder where the result is stored. The extracted files are parsed with a Java-based parser, "Jsoup: Java HTML parser"; the BeautifulSoup HTML parser may be used instead. From the parser, we obtain the text content of the application and save it to a text file; this text data is where the actual work of the inference engine starts. Before moving further, a basic idea of the Jsoup parser is needed: as the name indicates, it is a parser that converts data from one form to another, and here we require data in text format for finding keywords or keyphrases in any document or file.
This text can be obtained with the help of the Document Object Model (DOM) object and its functions or methods, such as getElementById() and getElementsByTag(). The text data obtained in this way needs to be pre-processed. An application yields many files, one for each state or web page. The text data is pre-processed using libraries or APIs; many NLP (Natural Language Processing) libraries are available for pre-processing text data. In pre-processing, we perform the following tasks:
Fig. 2 Concrete System Design
• Lowercase all letters in the text.
• Remove punctuation and any other symbols from the text.
• Remove white space from the text and join the text content.
• Remove stopwords, using the English stopword list available in a library corpus package.
• Apply a word-tokenize API to tokenize the text data.
• If necessary, remove duplicate words from the text using the set() function.
• If necessary, apply word stemming and lemmatization to the text data obtained from the above steps.
Some of these steps are implemented in the model. Other pre-processing tasks are available, but this research uses the steps above to clean the text data. After that, we perform clustering and find the keywords related to each cluster (here we mostly use the K-means clustering algorithm or an unsupervised graph-based keyword extraction technique). We also used other keyword extraction algorithms, such as TextRank, RAKE, YAKE, the Gensim summarization package, LDA, and TF-IDF. Using the keywords, we try to find the purposes related to them with the help of the purpose repository, by applying a matching algorithm such as the flashtext API or regular expressions [17]. We obtain the purpose related to each keyword and combination of keywords present in the purpose repository, which is created manually.
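The text extraction and pre-processing steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: it relies on Python's standard html.parser instead of Jsoup or BeautifulSoup, the stopword set is a tiny hypothetical stand-in for a library corpus, and stemming and lemmatization are omitted.

```python
import re
from html.parser import HTMLParser
from typing import List

# Tiny illustrative stopword set (an assumption; a real run would use a library corpus).
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "your", "for", "with"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def preprocess(text: str) -> List[str]:
    """Lowercase, strip symbols, tokenize, drop stopwords and duplicates."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOPWORDS]
    unique: List[str] = []
    for token in tokens:  # keeps first-seen order, unlike a bare set()
        if token not in unique:
            unique.append(token)
    return unique


page = ("<html><head><title>Login</title><style>p{}</style></head>"
        "<body><p>Enter your Username and Password!</p></body></html>")
tokens = preprocess(html_to_text(page))
```

Duplicates are removed with an ordered scan rather than set() so that the token order of the page is preserved for later steps.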
3.1 Clustering Technique
Many clustering techniques and algorithms exist; in this model, we use the k-means clustering algorithm, an unsupervised method based on the partitioning approach to clustering [11]. This operation yields several clusters, each containing words that are similar to the word at the centroid of the cluster in the document. We then select as keyphrases the "n" words nearest, that is, most similar, to the centroid of each cluster, and set the variable "k" to the number of clusters we want to form. The elbow method can also be used to find an optimal number of clusters. From the clusters, we find the top "n" keyphrases for every cluster. The value of "k" can be chosen by taking the length of the document into account and is found by performing a number of experiments (on a trial basis; the default value is taken as 2).
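The clustering step and the selection of the "n" words nearest to each centroid can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in this work: the 2-D vectors stand in for real word embeddings, the words are hypothetical, and the first k points are used as initial centroids to keep the run deterministic.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; returns the cluster index of each point and the centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # simplification: first k points
    assignment = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return new_assignment, centroids


def top_words(words, points, assignment, centroids, n) -> Dict[int, List[str]]:
    """For every cluster, the n words closest to its centroid."""
    result = {}
    for c, centroid in enumerate(centroids):
        members = sorted(
            (math.dist(p, centroid), w)
            for w, p, a in zip(words, points, assignment) if a == c
        )
        result[c] = [w for _, w in members[:n]]
    return result


words = ["login", "password", "loan", "interest"]            # hypothetical keywords
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]  # stand-in embeddings
assignment, centroids = kmeans(vectors, k=2)
keyphrases = top_words(words, vectors, assignment, centroids, n=2)
```

With real embeddings the vectors would have hundreds of dimensions, but the assignment and update steps are unchanged.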
3.2 Graph Based Keyphrase Extraction
This method is generally used when the data is unlabeled, that is, in an unsupervised setting. The authors of [11] apply this approach by creating graphs from words and sentences and evaluating the similarity between them. Three graphs are constructed: a sentence-to-sentence graph (s-s graph), a word-to-word graph (w-w graph), and a sentence-to-word graph (s-w graph). All graphs are built using similarity measures, such as the cosine similarity between sentences and between words.
• For the s-s graph, every sentence consists of several words, so we construct a word set for each sentence. The cosine between the vectors of two sentences is taken as the similarity between them and used as the edge weight, with the sentences as the nodes of the s-s graph.
• For the w-w graph, to find the similarity between words, each word is converted into a numerical vector using a word embedding; here we use the fastText word embedding. fastText is a library built by Facebook's AI Research lab for text classification and for learning word embeddings. It supports both unsupervised and supervised learning algorithms for obtaining vector representations of words.
• For the s-w graph, both graphs above are taken into consideration, and a third graph is constructed using word frequency and inverse sentence frequency, with a formula similar to the TF-IDF (Term Frequency-Inverse Document Frequency) method. This gives the weight matrix for the s-w graph.
For keyword extraction, some algorithms find keywords automatically, such as TextRank, RAKE, and TF-IDF.
• TextRank generally works faster on small datasets than the other two algorithms. It gives proper keywords because it is based on a graph ranking criterion similar to the PageRank algorithm created by Google to rank websites.
• RAKE stands for Rapid Automatic Keyword Extraction. It finds the keyphrases in a document without considering any other context, and it produces longer, more complex keyphrases that carry more information than single words.
• TF-IDF stands for Term Frequency-Inverse Document Frequency. This algorithm is mostly used when the dataset contains a large number of documents. Term frequency is the count of a word's occurrences in a particular document. Inverse document frequency takes into account how many other documents also contain the word. Together they yield a single value that shows how important the word is in that document, that is, the rank of the word with respect to that document.
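The TF-IDF score described in the last item can be computed directly. The sketch below uses one common variant, term frequency normalized by document length multiplied by the natural logarithm of N/df; this choice of variant is an assumption, since libraries differ in smoothing and normalization, and the three token lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} mapping per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score: Dict[str, float], n: int) -> List[str]:
    """The n highest-scoring terms of one document."""
    return sorted(score, key=score.get, reverse=True)[:n]


# Hypothetical token lists for three web pages.
pages = [
    ["login", "password", "login"],
    ["loan", "interest", "login"],
    ["doctor", "appointment"],
]
scores = tf_idf(pages)
best_first_page = top_keywords(scores[0], 1)
```

In the first page, "login" occurs twice but also appears in another page, so the rarer "password" receives the higher score.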
4 Results
Because the research topic is very broad, we restrict our data to web applications, namely websites from three different domains. In total, we collected more than 530 webpages from websites in the banking, education, and hospital and health care domains. A webpage contains many kinds of data, such as text, images, audio, video, and advertisements. From these, we extract the text data needed for our work, such as the text from the body, title, and form tags. We collect all of this data into a text document and also convert it to Excel format. To find the purpose of a webpage, a purpose repository is first required; it is created manually for this work and has three columns (Domain, Keywords, Purpose). According to our data, we enter the keywords and their purposes. If new domains or new webpages are added, the purpose repository must be updated manually.
The next phase is keyword extraction. Before this, we first apply a clustering algorithm to group similar words into the same cluster, and then take the top 8–10 keywords for each cluster. Many keyword extraction algorithms exist, including automatic, graph-based, statistical, unsupervised, and supervised algorithms; we use some of them to find the keywords and then their purposes using the purpose repository. The last phase finds the purposes related to the keywords from the purpose repository using a matching technique: manual or hand-crafted rules, or a similarity measurement or comparison between the extracted keywords and the keywords and purposes present in the purpose repository. Another method uses the VLOOKUP function in an Excel sheet. We also use the flashtext library, which is mainly intended for search-and-replace tasks; in our case, we adapt it to search for keywords and match them with their purposes. We use it because flashtext works faster than regular expressions [17]. Word clouds and word frequency plots can also be produced, as shown in Fig. 3, and the final result is shown in Fig. 4.
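The matching phase can be sketched with a small purpose repository that has the three columns described earlier (Domain, Keywords, Purpose). The sketch uses regular expressions from the Python standard library rather than flashtext, and every repository row and function name is a hypothetical example, not an entry from the repository built in this work.

```python
import re
from typing import List, Sequence, Tuple

RepositoryRow = Tuple[str, Sequence[str], str]  # (Domain, Keywords, Purpose)

# Hypothetical rows; the real repository is created manually per domain.
PURPOSE_REPOSITORY: List[RepositoryRow] = [
    ("banking", ("username", "password"), "Authenticate the customer"),
    ("banking", ("bill", "payment"), "Pay utility bills"),
    ("hospital", ("appointment",), "Book a doctor appointment"),
]


def find_purposes(text: str, repository: List[RepositoryRow]) -> List[Tuple[str, str]]:
    """Return (domain, purpose) for every row whose keywords all occur in the text."""
    lowered = text.lower()
    matches = []
    for domain, keywords, purpose in repository:
        # A row matches only when each keyword appears as a whole word.
        if all(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            matches.append((domain, purpose))
    return matches


login_page = "Enter your Username and Password to continue"
found = find_purposes(login_page, PURPOSE_REPOSITORY)
```

Requiring all keywords of a row to be present models the "combination of keywords" case; a row with a single keyword covers the single-keyword case.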
Fig. 3 Word frequency graph and word cloud
Fig. 4 Final result from sample data
5 Conclusion
Nowadays, privacy management plays an increasingly important role for enterprises. Its main objective is to address customers' privacy by considering their preferences and rights. It is important to consider the data subject's consent and the data requester's purpose when a person wants information from an organization. In this research work, we surveyed existing techniques and tools for text analysis and studied their drawbacks and limitations. To understand the purposes related to each field or screen of given application data, we implemented a model that provisionally fulfils our requirement. It aims to address the existing challenges of purpose extraction to enable purpose-based processing with the help of keyphrases and keywords, and it provides a purpose repository that makes it easier to find the relationship between keywords and their purposes. We also try to obtain all matched purposes from the application screens in order to develop purpose-based processing. In addition, we used clustering algorithms to find similar words or keywords according to topic in a document and, by combining them, obtain their purposes too, as shown in the final result image in the results section.
5.1 Future Work
There is considerable scope for future research in this area. So far, we have not found much related research specific to this topic. Anyone who wants to do further research in this area can extend the scope by considering more types of data (and adding their purposes to the purpose repository accordingly). Another task is to make the purpose repository automatic in nature. New approaches to the extraction, matching and selection of keywords and their purposes could also be developed, and the system could be made more scalable across different application types. At present we consider webpages from some websites, which fall under the web application type; other types include desktop, mobile and gaming applications, or, in other words, data from more domains. Finally, we would like to encourage new researchers with a few words: “a research idea can come to anyone’s mind, so read more relevant work and get new ideas for research in this domain or any other.” It is for that purpose that we have outlined the future scope above.
Extracting Purposes from an Application …
References
1. Mont MC, Thyne R (2006) A systemic approach to automate privacy policy enforcement in enterprises. In: Privacy enhancing technologies 6th international workshop, PET 2006, Cambridge, UK, 28–30 June 2006. Revised Selected papers
2. Mont MC, Thyne R, Bramhall P (2005) Privacy enforcement with HP select access for regulatory compliance. Technical report, Technical Report HPL-2005-10, HP Laboratories Bristol, Bristol, UK
3. Mont MC (2004) Dealing with privacy obligations in enterprises. In: ISSE 2004-securing electronic business processes. Springer, pp 198–208
4. Agrawal R, Kiernan J, Srikant R, Xu Y (2002) Hippocratic databases. In: VLDB’02: proceedings of the 28th international conference on very large databases. Elsevier, pp 143–154
5. Barbosa D, Wang H, Yu C (2015) Inferencing in information extraction: techniques and applications. In: 2015 IEEE 31st international conference on data engineering. IEEE, pp 1534–1537
6. Goldsteen A, Kveler K, Domany T, Gokhman I, Rozenberg B, Farkash A (2015) Application-screen masking: a hybrid approach. IEEE Softw 32(4):40–45
7. Kaddu MR, Kulkarni RB (2016) To extract informative content from online web pages by using hybrid approach. In: 2016 International conference on electrical, electronics, and optimization techniques (ICEEOT). IEEE, pp 972–977
8. Parvathi P, Jyothis TS (2018) Identifying relevant text from text document using deep learning. In: 2018 International conference on circuits and systems in digital enterprise technology (ICCSDET). IEEE, pp 1–4
9. Ravindranath VK, Deshpande D, Girish KV, Patel D, Jambhekar N, Singh V (2019) Inferring structure and meaning of semi-structured documents by using a gibbs sampling based approach. In: 2019 International conference on document analysis and recognition workshops (ICDARW), vol 5. IEEE, pp 169–174
10. Singh S, Karwayun R (2010) A comparative study of inference engines. In: 2010 Seventh international conference on information technology: new generations. IEEE, pp 53–57
11. Yan Y, Tan Q, Xie Q, Zeng P, Li P (2017) A graph-based approach of automatic keyphrase extraction. Procedia Comput Sci 107:248–255
12. Beliga S, Meštrović A, Martinčić-Ipšić S (2015) An overview of graph-based keyword extraction methods and approaches. J Inf Organ Sci 39(1):1–20
13. Ying Y et al (2017) A graph-based approach of automatic keyphrase extraction. Procedia Comput Sci 107:248–255
14. Gali N, Mariescu Istodor R, Fränti P (2017) Functional classification of websites. In: Proceedings of the eighth international symposium on information and communication technology
15. Zhu J et al (2006) Simultaneous record detection and attribute labeling in web data extraction. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining
16. Mesbah A, Van Deursen A, Lenselink S (2012) Crawling ajax-based web applications through dynamic analysis of user interface state changes. ACM Trans Web (TWEB) 6(1):1–30
17. analyticsvidhya. https://www.analyticsvidhya.com/blog/2017/11/flashtext-a-library-faster-than-regular-expressions/. Last accessed 7 Dec 2017
18. Zhao Y, Li J (2009) Domain ontology learning from websites. In: 2009 Ninth annual international symposium on applications and the internet. IEEE, pp 129–132
19. Gao R, Shah C (2020) Toward creating a fairer ranking in search engine results. Inf Process Manage 57(1):102138
20. Lindemann C, Littig L (2007) Classifying web sites. In: Proceedings of the 16th international conference on World Wide Web
21. Qi X, Davison Brian D (2009) Web page classification: features and algorithms. ACM Comput Surv (CSUR) 41(2):1–31
Cotton Price Prediction and Cotton Disease Detection Using Machine Learning Priya Tanwar, Rashi Shah, Jaini Shah, and Unik Lokhande
Abstract Agricultural productivity is something on which the economy depends extensively. This is one reason why price prediction and plant disease detection play an important role in agriculture. The proposed system is an effort to predict the price of cotton and to classify cotton as fresh or diseased as accurately as possible, for the benefit of all those who depend on it as a source of income, be they farmers or traders. The main goals of the system are to prevent losses and to boost the economy. For this purpose, we propose a website in which both price prediction using the LSTM algorithm and disease detection using the CNN algorithm are implemented.
Keywords Machine learning · LSTM · CNN · Classification · Disease detection · Price prediction · Agriculture
1 Introduction
Agriculture, the main occupation in India, accounts for about 70% of business, including the primary and secondary businesses that depend completely on agriculture. The market arrival time of a crop plays a prominent role in the price farmers receive for it. In the case of cotton specifically, a large number of people depend on the crop through one or another of the processes involved in it. The increase in demand for cotton at the domestic as well as the international level has steered productivity towards mission-oriented purposes in recent times [1]. However, it is not easy to predict the price of cotton or to detect its leaf diseases, because both are strongly affected by factors such as weather conditions, soil type and rainfall. This shows the necessity of price prediction and disease detection for cotton crops. Implementing such systems adds to the revenue of the farmers as
P. Tanwar · R. Shah · J. Shah · U. Lokhande (B) Department of Information Technology, Fr. Conceicao Rodrigues College of Engineering, Mumbai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_9
P. Tanwar et al.
well as the country. A robust automated solution, especially in developing countries such as India, therefore not only helps the government take decisions in a timely manner but also benefits a large part of the population. Various methodologies have been used to forecast the retail or wholesale value of agricultural commodities, such as regression, time series and machine learning methods. Regression methods such as the auto-regressive model and the vector auto-regressive moving average model predict agricultural commodity prices by taking into consideration the various factors affecting them [2]. The study of changes in agricultural product prices is both intriguing and significant from the standpoint of the government. Manually recorded data is prone to human error, such as missing or incorrect data reported on a specific day. With new pricing data arriving every day for ML/DL-based models, updating the models may raise stability concerns because of crop price data quality issues. The data in the price prediction module is continuous, hence the task is treated as a regression problem. The prices are determined by recognizing patterns in the training dataset, which is passed as input to the algorithm. Diseases are natural factors that have severe effects on plants; for example, they reduce overall productivity. The identification and accurate classification of leaf diseases are essential for preventing agricultural losses. Different plants have different leaf diseases. Viruses, fungi and bacteria are the prominent categories of leaf disease [3]. In accordance with this, plants are being isolated from their normal environment and grown in unique settings. Many important crops and plants are highly susceptible to disease.
Plant diseases affect plant development and yield, which has a socio-biological and monetary impact on agriculture. Plant diseases are one of the ecological elements that contribute to the coexistence of live plants and animals. Plant cells primarily strengthen their defences against animals, insects and pathogens via signalling pathways contained within them. For thousands of years, humans have carefully selected and cultivated plants for food, medicine, clothing, shelter, fibre and beauty. As a result, monitoring regional crop diseases is critical for improving food security. Cotton is a drought-resistant crop that delivers a consistent income to farmers who cultivate in climate-change-affected areas. To detect cotton leaf diseases correctly, prior knowledge and the use of several image processing methods and machine learning techniques are helpful. In leaf disease detection, we focus on the number of predictions that are classified correctly, so the task is treated as a classification problem. Various classification models can be used to detect disease in a leaf, and they are evaluated on the basis of their results. This system will thus be handy for farmers cultivating cotton, who can use it to identify diseased cotton plants and to follow price trends for the crop.
2 Literature Review
2.1 Price Prediction
The work in [4] focuses on developing a precise forecasting model for wheat production using an LSTM neural network, which is well suited to time series forecasting. The proposed mechanism is also compared with some existing models in the literature. The R value obtained for LSTM is 0.81. According to the results, the system can achieve better forecasts, and whilst grain production will accelerate over a decade, the output ratio will keep decelerating and pose a threat to the economy as a whole. In [5], the authors presented a comparative study of LSTM, SARIMA and the seasonal Holt-Winters method for predicting arecanut prices. Monthly arecanut price data for 14 districts of Kerala was taken from the Department of Economics and Statistics of Kerala. The RMSE value for LSTM was 146.86 on non-stationary data and 7.278 on stationary data, the SARIMA value was 16.5, and the Holt-Winters value was 18.059. It was concluded that the LSTM neural network was the model that best fit the data. In [6], the main aim of the researchers is to help farmers in Sri Lanka grow vegetables profitably by developing an Android application. The collected data set is divided into three parts in the ratio 8:1:1, so that 80% of the data was used for training, 10% for testing and the remaining 10% for validation. The model is then created using an LSTM RNN for vegetable forecasting and ARIMA for price forecasting. In [7], the researchers suggest a prediction model for vegetable prices that uses seasonal-trend decomposition using Loess (STL) as a pre-processing method together with long short-term memory (LSTM). To predict monthly vegetable prices, the model used vegetable price data, meteorological data from major producing districts, and other data.
For this system, the model was applied to Chinese cabbage and radish on the Korean agricultural market. The performance measurements showed that the suggested model achieved a prediction accuracy of 92.06% for Chinese cabbage and 88.74% for radish. In [8], the researchers suggest the STL-ATTLSTM model, which combines seasonal-trend decomposition using Loess (STL) as a pre-processing method with an attention-based LSTM. The STL-ATTLSTM model predicts monthly vegetable prices from different forms of data. The LSTM attention model improved predictive accuracy by about 4–5% compared with the plain LSTM model. The combination of LSTM and STL (STL-LSTM) reached a predictive accuracy 12% higher than the LSTM attention model. The STL-ATTLSTM model outperforms the other models, with an RMSE of 380 and a MAPE of 7%. In [9], the authors presented an artificial intelligence based solution to predict future market trends from time series data of cotton prices collected since 1972. The datasets are evaluated using various models such as moving average, KNN, auto-ARIMA, Prophet and LSTM. After comparison, the LSTM model was concluded to be the best fit, with an RMSE of 0.017 and an accuracy of 97%. In [10], the authors presented a user-friendly interface to predict crop prices and forecast them for the next 12 months. Data containing the wholesale price index and rainfall for various Kharif and Rabi crops such as wheat, barley, cotton and paddy was collected and used to train six different algorithms, of which the supervised machine learning algorithm Decision Tree Regressor was the most accurate, with an RMSE of 3.8. In [11], the researchers presented a comparative survey of different machine learning algorithms for predicting crop prices. Data consisting of the prices of fruits, vegetables and cereals was collected from the website of the Agricultural Department of India. Random Forest Regressor was concluded to be the optimal algorithm, with an accuracy of 92%, compared with other algorithms such as Linear Regression, Decision Tree Regressor and Support Vector Machine. In [12], the researchers proposed a web-based automated system to predict agricultural commodity prices. In two series of experiments, algorithms such as ARIMA, SVR, Prophet, XGBoost and LSTM were compared on large historical datasets from Malaysia, and the best performer, the LSTM model with an average mean square error of 0.304, was selected as the prediction engine of the proposed system. In [13], the authors present techniques to build robust crop price prediction models that consider various features: historical prices and market arrival quantities of crops, historical weather data that influence crop production and transportation, and data quality-related features obtained by statistical analysis using the time series models ARIMA, SARIMA and Prophet.
In [14], the researchers proposed a model that is enhanced by applying deep learning techniques along with crop prediction. Their objective is to present a Python-based system that anticipates the most productive harvest under given conditions at lower expense. In that paper, SVM is used as the machine learning algorithm, whilst LSTM and RNN are used as the deep learning algorithms, and the accuracy is reported as 97%.
2.2 Disease Detection
Prajapati et al. [15] presented a survey on detecting and classifying cotton diseases with image processing and machine learning methodologies. They also investigated segmentation and background removal techniques and found that RGB to HSV colour space conversion is effective for background removal. They further concluded that thresholding is better to work with than the other background removal techniques. The data set included about 190 pictures of various types of disease spots, taken by Anand Agricultural University, for classifying and detecting the type of infection. After colour segmentation, in which the green pixels of the background-removed image were masked, the Otsu threshold was applied to the masked image to obtain a binary image. The results showed that SVM provides quite good accuracy. Rothe et al. [16] presented a system that identifies and classifies diseases that commonly affect the cotton crop, such as Alternaria, bacterial leaf disease and Myrothecium. The images were obtained from cotton fields in the Buldhana and Wardha districts and from ICRC Nagpur. The active contour model (snake segmentation algorithm) is used for image segmentation. The images of diseased cotton leaves were classified using a back-propagation neural network, trained on seven invariant moments extracted from three types of diseased-leaf images. The mean classification accuracy was 85.52%. In [17], the author developed an advanced processing system capable of identifying the infected portion of a leaf spot on a cotton plant using image analysis. The digital images were captured with a mobile phone camera and enhanced after segmentation of the colour images using edge detection techniques such as Sobel and Canny. After thorough study, a homogeneous pixel counting technique was used for image analysis and disease classification in the Cotton Disease Detection Algorithm. In [18], the researchers detected leaf diseases with a neural network classifier. Various kinds of diseases, such as target leaf spot, cotton and tomato leaf fungal diseases, and bacterial spot diseases, were detected. Segmentation is performed by k-means clustering. Various features were extracted and provided as inputs to the ANN. The average classification accuracy for four types of diseases is 92.5%. In [19], the researchers propose an approach for accurate disease detection, diagnosis and timely management to avoid severe crop losses.
In this proposal, the input image is first pre-processed with histogram equalization to increase contrast in low-contrast images. The K-means clustering algorithm is then used for segmentation, grouping objects into K classes according to a set of features, and classification is performed by a neural network. Imaging techniques are thus used to detect diseases in cotton leaves quickly and accurately. In [20], the authors compared various algorithms such as SVM, KNN, NFC, ANN and CNN and found that CNN is 25% more precise than the rest. They then compared two CNN models, GoogleNet and Resnet50, for examining lesions on cotton leaves, and concluded that Resnet50 has an edge over GoogleNet and is more reliable. In [21], the authors perform cotton leaf disease detection and also suggest a suitable pesticide. The proposed system implements the CNN algorithm and, using a Keras model with appropriate processing layers, builds a precise system for disease detection. Paper [22] is an extensive comparative analysis of the detection of organic and non-organic cotton diseases. It provides information about various diseases and an advisable method to detect each disease at its initial stage. A survey of different algorithms, with their efficiencies and their pros and cons, is also discussed in order to recognize the most apt one. In short, it is an in-depth analysis and comparison of many techniques.
3 Price Prediction
3.1 Dataset
This system is based on statistical data released almost every year by the Agriculture Department, Government of India, on the website data.gov.in [23]. The daily market price records for cotton include the state, the district, the market in that district, the variety of cotton grown, the arrival date of the cotton produce, and the minimum, maximum and modal prices of cotton in the market.
3.2 Long Short Term Memory (LSTM) Algorithm
LSTM is a deep learning model that requires a large data set. Its architecture is well suited to prediction systems because it can handle lags of unknown duration between important events in a time series. A unit cell of the LSTM model has an input gate, an output gate and a forget gate. The input gate controls how much new information flows into the current cell state, using the pointwise multiplication of a sigmoid output and a tanh output. The output gate decides which information is passed on to the next hidden state. The forget gate decides which information from the previous cell state need not be remembered.
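The interplay of the three gates can be illustrated with a single scalar cell. The following is a minimal sketch of the standard LSTM update equations for one input feature and one hidden unit; the weights are arbitrary illustrative numbers (not trained values), and the names `lstm_step` and `params` are assumptions made for this example rather than part of the described system.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (one input feature, one hidden unit).

    params maps each gate name to (w_x, w_h, b): input weight, recurrent weight, bias.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate values for the cell state
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # new hidden state passed to the next step
    return h, c

# Arbitrary illustrative weights (not trained values).
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}

h, c = 0.0, 0.0
for price in [0.2, 0.4, 0.6]:  # a short, already-scaled price sequence
    h, c = lstm_step(price, h, c, params)
print(round(h, 4), round(c, 4))
```

The product `i * g` is the sigmoid-times-tanh term attributed to the input gate above, and `f * c_prev` is the part of the previous cell state that the forget gate lets through.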
3.3 Methodology
First, the data set is loaded. The data is then pre-processed and filtered where necessary. Box plots of the minimum, maximum and modal prices are drawn to reveal the outliers present in the data, and the outlying values are dropped to avoid data inconsistency. The sklearn module is used for pre-processing. The price columns are retained, and the arrival date column is converted to datetime format using the pandas framework and set as the index of the data. The dataset is then sorted in ascending order of arrival date. This step is followed by visualizing the data set.
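The pre-processing steps above can be sketched with the standard library alone. This is a hedged illustration rather than the pandas/sklearn code used in the system: the rows are made-up values, the date format `%d/%m/%Y` is an assumption, and the 1.5 × IQR whisker rule is the conventional box-plot definition of an outlier, which the text does not state explicitly.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical rows: (arrival date as text, modal price); values are made up.
rows = [
    ("03/01/2020", 5200.0),
    ("01/01/2020", 5100.0),
    ("02/01/2020", 5150.0),
    ("05/01/2020", 9900.0),   # deliberately extreme value
    ("04/01/2020", 5250.0),
    ("06/01/2020", 5300.0),
]

def drop_outliers(records, k=1.5):
    """Keep records whose price lies within the box-plot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    prices = [price for _, price in records]
    q1, _, q3 = quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(d, p) for d, p in records if low <= p <= high]

def to_time_series(records, date_format="%d/%m/%Y"):
    """Parse the arrival date and sort the records in ascending date order."""
    parsed = [(datetime.strptime(d, date_format), p) for d, p in records]
    return sorted(parsed, key=lambda item: item[0])

series = to_time_series(drop_outliers(rows))
for date, price in series:
    print(date.date(), price)
```

With pandas, the same effect would typically be obtained by filtering on the quartiles of each price column, converting the date column to datetime and sorting by the resulting index.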
The dataset is divided into training and testing data in the ratio 80:20 using train_test_split from the sklearn module, and the model is trained on the training portion. The data is scaled using MinMaxScaler. Five hidden layers are used in the model. Training divides the data set into small batches, and the error is calculated per epoch. The Keras Sequential model is used to build and evaluate the network. For this system, the model is trained for 200 epochs with a batch size of 32. The optimizers used are Adam, RMSProp and AdaDelta. The objective is to predict the prices of cotton crops, so the results of the optimizers are compared to decide which one gives the best output. The training and validation loss graph is plotted, followed by the price prediction graph for the prediction model; the graphs make the results easier to understand. The matplotlib library is used for plotting. Mean squared error is used as the loss function in Keras. Further, error metrics are calculated to understand the performance of the model. For this, the math library and the functions mean_squared_error, mean_absolute_error, max_error, r2_score, explained_variance_score and median_absolute_error are imported, and each metric is calculated from the testing data and the predicted data.
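The data preparation that precedes training can be sketched as follows. This is a standard-library illustration under stated assumptions, not the sklearn/Keras code of the system: the split is done chronologically without shuffling, the scaling range is learned from the training part only, and the sliding `lookback` window is a common way of feeding a series to an LSTM that the text does not specify. All function names and prices are hypothetical.

```python
def chronological_split(values, train_fraction=0.8):
    """Split an ordered series into train/test parts without shuffling (assumed here)."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

def fit_min_max(values):
    """Return (minimum, maximum) learned from the training data only."""
    return min(values), max(values)

def scale(values, lo, hi):
    """Map values to [0, 1] relative to the fitted range."""
    return [(v - lo) / (hi - lo) for v in values]

def make_windows(values, lookback):
    """Turn a series into (input window, next value) pairs for sequence training."""
    return [(values[i:i + lookback], values[i + lookback])
            for i in range(len(values) - lookback)]

prices = [5000.0, 5050.0, 5100.0, 5080.0, 5120.0, 5200.0, 5250.0, 5230.0, 5300.0, 5350.0]
train, test = chronological_split(prices)        # 8 training values, 2 test values
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)                # may exceed 1.0 if prices rise later
windows = make_windows(train_scaled, lookback=3)
print(len(train), len(test), len(windows))
```

Fitting the scaler on the training part alone avoids leaking information from the test period into training; the price is that test values outside the training range fall outside [0, 1].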
3.4 Results
Figure 1 shows the results obtained on the test data for the modal price prediction of cotton with the Adam optimizer, using a batch size of 32 and 200 epochs. The green line represents the actual price of the cotton crop and the red line the predicted price. The predicted curve follows the actual trend throughout.
Fig. 1 LSTM model prediction using Adam optimizer
Accuracy. Figure 2 shows the training loss versus validation loss curve for the LSTM model with the Adam optimizer. The blue line represents the training data and the red line the validation data. The graph shows that the values converge during training, so the model is neither overfitting nor underfitting. Table 1 reports the performance of the LSTM model on various accuracy measures. We used three different optimizers in order to find the one that gives the best result. The values obtained
Fig. 2 Training loss versus validation loss graph using Adam optimizer
Table 1 Accuracy metrics
| Accuracy metric (LSTM) | Adam (optimizer) | RMSProp (optimizer) | AdaDelta (optimizer) |
| --- | --- | --- | --- |
| Root mean square error | 184.52 | 209.12 | 323.74 |
| R2 score | 0.8304 | 0.7822 | 0.4780 |
| Explained variance score | 0.8304 | 0.8181 | 0.6428 |
| Max error | 1771.18 | 1804.04 | 1753.84 |
| Mean absolute error | 104.04 | 154.442 | 277.47 |
| Mean squared error | 34,047.95 | 43,731.64 | 104,805.51 |
| Median absolute error | 49.30 | 146.14 | 290.33 |
| Mean squared log error | 0.0016 | 0.0019 | 0.0045 |
for each accuracy measure are compared, and the best values are considered. These values were calculated after training and testing the models on the same dataset. The comparison shows that the LSTM model with the Adam optimizer outperforms the models with the other optimizers on every measure except max error, where AdaDelta is slightly lower (1753.84 versus 1771.18). Thus, the LSTM model with the Adam optimizer is better suited to the price prediction of the cotton crop for this system.
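For readers who want to reproduce the measures in Table 1, the sketch below computes several of them with the standard library. It is an illustration of the usual definitions rather than the sklearn functions used in the system; the function name `regression_metrics` and the sample prices are assumptions. The last line checks that the reported RMSE and MSE for the Adam optimizer are mutually consistent.

```python
import math
from statistics import median

def regression_metrics(actual, predicted):
    """Compute several of the error measures listed in Table 1 for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "max_error": max(abs(e) for e in errors),
        "median_ae": median(abs(e) for e in errors),
        "r2": 1.0 - (mse * n) / total_ss,
    }

# Made-up prices purely for illustration.
actual = [5000.0, 5100.0, 5200.0, 5300.0]
predicted = [5050.0, 5080.0, 5230.0, 5260.0]
print(regression_metrics(actual, predicted))

# Consistency check on the reported Adam column: RMSE should equal sqrt(MSE).
print(round(math.sqrt(34047.95), 2))
```

Because RMSE is the square root of MSE, the two rows of the table carry the same information; RMSE is simply easier to read because it is expressed in the same unit as the price.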
4 Disease Detection
4.1 Dataset
The initial step is to collect data from a public database, taking images as input. The most common image formats are supported, so files such as .bmp, .jpg or .gif can be used as batch input. The dataset comprises 1951 training images, 106 testing images and 253 validation images. Each of these sets contains four categories of images: diseased cotton leaf, diseased cotton plant, fresh cotton leaf and fresh cotton plant.
4.2 Convolutional Neural Network (CNN)
A convolutional neural network (ConvNet/CNN) is a deep learning system that accepts an input image and assigns importance (learnable weights and biases) to different characteristics or objects in the image, allowing them to be distinguished from one another. In comparison with other classification techniques, a ConvNet requires very little pre-processing. Whilst filters are designed by hand in primitive approaches, ConvNets can learn these filters/features given enough training. The architecture of a ConvNet is inspired by the visual cortex and is comparable to the connectivity pattern of neurons in the human brain, where individual neurons respond to stimuli only in a restricted region of the visual field. Essentially, a CNN works by performing convolutions in the various layers of the network. This produces different representations of the training data, beginning with the most generic in the initial layers and progressing to the most detailed in the deeper ones. Because the size of the representation shrinks from layer to layer, the convolutional layers operate as a form of feature extractor. The dimensionality of the input data divides it into layers.
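What a single convolutional filter does can be shown on a toy example. The following sketch slides a small hand-written kernel over a 4 × 4 "image" and applies a ReLU; in a trained CNN the kernel weights would be learned rather than written by hand. The function names and the sample values are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Zero out negative responses, as a ReLU activation would."""
    return [[max(0, v) for v in row] for row in feature_map]

# A 4x4 toy "image" with a dark left half and a bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written vertical-edge filter; a CNN would learn such weights from data.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The filter responds only in the middle column, where the dark and bright halves meet, which is the sense in which a convolutional layer acts as a feature extractor. The output is also smaller than the input (3 × 3 instead of 4 × 4), illustrating how representations shrink from layer to layer.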
4.3 Methodology
There are different types of diseases, such as:
• Bacterial diseases: “Bacterial leaf spot” is the common name for a bacterial illness. It starts as small, yellow-green lesions on young leaves, which appear warped and twisted, or as black, damp, greasy lesions on older foliage.
• Viral diseases: The most evident symptoms of virus-infected plants appear on the leaves, but they can also be seen on the fruits and roots. The disease is caused by a virus that is difficult to diagnose. Because of the virus, the leaves appear wrinkled and curled, and growth may be stunted.
• Fungal diseases: Fungal infections can affect contaminated seed, soil, yield and weeds, and they propagate with the help of wind and water. The infection first appears as water-soaked, grey-green spots and is easily recognizable on the lower or older leaves. It causes the leaf’s surface to turn yellow as it spreads inward [24].
Gathering data and separating it into training, testing and validation datasets is the first step. The training data set should account for roughly 80% of the total labelled data and is used to train the model to recognize the various types of images. The validation data set should contain roughly 20% of the total labelled data and is used to see how well the system identifies known labelled data. The remaining unlabelled data makes up the testing data set, which is used to see how well the system classifies data it has never seen before. After importing the necessary libraries, we set the image dimensions and found 224 × 224 pixels most suitable for our system. All image pixels are then converted to their equivalent numpy arrays and stored for further use. We then define the path where all the images are stored. Next, we define the number of epochs and the batch size, a critical step for neural networks. The number of epochs was set to 7, and the batch size to 50.
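A short calculation relates the reported image counts to the split guideline and to the training settings. The sketch below uses only the numbers stated in the text (1951, 106 and 253 images, batch size 50, 7 epochs); it assumes one weight update per batch and that the final partial batch is still processed, which the text does not state.

```python
import math

# Image counts as reported in the dataset description.
counts = {"training": 1951, "testing": 106, "validation": 253}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

batch_size = 50
epochs = 7
steps_per_epoch = math.ceil(counts["training"] / batch_size)
total_updates = steps_per_epoch * epochs

print(total, shares)
print(steps_per_epoch, total_updates)
```

Under these assumptions the 2310 images split into about 84.5% training, 4.6% testing and 11.0% validation, so the actual dataset is somewhat more training-heavy than the rough 80%/20% guideline, and training amounts to 40 batches per epoch, or 280 updates in total.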
The VGG16 model chosen for our system is loaded next. This brings in the transfer learning component of the convolutional neural network. Transfer learning is simple to use because the pretrained network already contains the layers and other important components that we would otherwise have to develop ourselves. Various transfer learning models exist. VGG16 was chosen because it has only 13 convolutional layers and is simple to use. We then used VGG16 to set the weights and features. After that, the process of building the CNN model begins. The first step is to define the model as a sequential model. We then flatten the data and add three more hidden layers. We tried a variety of models with different dropouts, hidden layers and activations. Because the task is multi-class classification of labelled data, the final activation is generally softmax. After that, we fit the training and validation data to the model using the settings specified previously. Finally, we create an evaluation phase to compare the accuracy of the model on the training set with that on the validation set. We then evaluated the classification metrics and created the confusion matrix. To use the classification metrics, we converted the testing data into a numpy array.
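The role of the final softmax activation can be made concrete with a few lines of standard-library code. This is only a sketch of the last step of the classifier, not the Keras model itself: the four class names come from the dataset description, whereas the raw scores and the function names `softmax` and `predict` are made up for illustration.

```python
import math

CLASSES = ["diseased cotton leaf", "diseased cotton plant",
           "fresh cotton leaf", "fresh cotton plant"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

# Made-up output scores of the last dense layer for one image.
label, confidence = predict([1.2, 0.3, 3.1, 0.5])
print(label, round(confidence, 3))
```

Softmax suits this task because each image belongs to exactly one of the four categories, so the outputs can be read as a probability distribution over the classes.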
Table 2 Classification accuracy metrics
| Category | Precision (%) | Recall (%) | F1-score (%) |
| --- | --- | --- | --- |
| Diseased cotton leaf | 100 | 80 | 89 |
| Diseased cotton plant | 96 | 89 | 93 |
| Fresh cotton leaf | 81 | 100 | 90 |
| Fresh cotton plant | 93 | 96 | 95 |
4.4 Results
The classification accuracy metrics report for the CNN algorithm was generated with the values shown in Table 2. To compute the accuracy metrics, we first convert the testing data into a numpy array. A confusion matrix is easiest to work with as a dataframe, so the numpy array is then converted into a dataframe. A normalized confusion matrix was also computed, and it indicates good per-class accuracies, as shown in Fig. 3. The training accuracy versus validation accuracy and training loss versus validation loss graphs were also plotted, as shown in Figs. 4 and 5, respectively; they indicate neither underfitting nor overfitting. The plot represents a good fit for the following reasons:
• The training loss plot decreases until a point of stability.
• The validation loss plot decreases up to a point of stability, and only a small gap exists between it and the training loss plot.
Fig. 3 Confusion matrix
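The per-class figures in Table 2 follow from the confusion matrix by the usual definitions of precision, recall and F1-score. The sketch below shows the calculation on a small two-class matrix with hypothetical counts (not the values behind Fig. 3); the function names are assumptions. The last line cross-checks one row of Table 2.

```python
def per_class_metrics(matrix, labels):
    """Compute precision, recall and F1 per class from a confusion matrix.

    matrix[i][j] counts samples whose true class is i and predicted class is j.
    """
    report = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)   # column total
        actual_k = sum(matrix[k])                     # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

# Hypothetical counts for two classes, not the values behind Fig. 3.
labels = ["diseased", "fresh"]
matrix = [
    [8, 2],   # true diseased: 8 predicted diseased, 2 predicted fresh
    [0, 10],  # true fresh: all 10 predicted fresh
]
report = per_class_metrics(matrix, labels)
print(report, accuracy(matrix))

# Cross-check of one Table 2 row: precision 100% and recall 80% give F1 of about 89%.
print(round(2 * 1.00 * 0.80 / (1.00 + 0.80), 2))
```

The same pattern appears in Table 2: a class with perfect precision but lower recall (diseased cotton leaf) is never predicted wrongly, but some of its images are assigned to other classes.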
Fig. 4 Training versus validation accuracy
Fig. 5 Training versus validation loss
5 Conclusion
Agriculture contributes about 20% of India’s GDP and plays an important role in India’s economy and employment, so it is important to ensure that this sector does not suffer losses. Hence, a machine learning based system consisting of price prediction and disease detection modules was created with the aim of benefiting society as much as possible. The novelty of the proposed system is that it integrates price prediction with disease detection; based on the research done, no such integrated system exists at present. Such a system is of immense utility and benefit to its actual users. The price prediction module was implemented using the LSTM algorithm, and different optimizers were compared. LSTM with the Adam optimizer performed best, with an RMSE value of 184.52, whilst RMSProp and AdaDelta had RMSE values of 209.12 and 323.74, respectively. The disease detection module used the CNN algorithm to classify cotton plants and leaves as fresh or diseased, with an accuracy of 91.5%.
Acute Leukemia Subtype Prediction Using EODClassifier
S. K. Abdullah, S. K. Rohit Hasan, and Ayatullah Faruk Mollah
Abstract Leukemia is a type of blood cancer having two major subtypes: acute lymphoblastic leukemia and acute myeloid leukemia. A possible cause of leukemia is the genetic factors of a person. Machine learning techniques are being increasingly applied to analyze the relation between gene expression and genetic diseases such as leukemia. In this paper, we report prediction of leukemia subtypes from microarray gene expression samples using a recently reported ensemble classifier called EODClassifier. Across multiple cross-validation experiments, a classification accuracy of over 96% is obtained, which indicates consistent performance and robustness. The results also show that, like other popular classifiers, the EODClassifier performs well in leukemia prediction.
Keywords Data classification · Leukemia gene expression · Feature selection · Ensemble approach · EODClassifier
S. K. Abdullah (B) · A. F. Mollah
Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata 700160, India
S. K. R. Hasan
Infosys Ltd, Kharagpur, West Bengal 721305, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_10

1 Introduction
Leukemia is a group of cancers related to blood cells. Acute leukemia is of two types, i.e., acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). Genetic factors are believed to be a possible cause of leukemia [1]. Hence, besides hematological diagnosis, investigation with microarray gene expression data samples of such subjects has also been carried out in recent times. Maria et al. [2] have reported five machine learning algorithms for diagnosis of leukemia, i.e., support vector machines (SVM), neural networks (NN), k-nearest neighbors (KNN), naïve Bayes (NB) and deep learning. The performance of these algorithms has been compared, and their merits and demerits have been pointed out. Joshi et al. [3] have worked with blood slide images for detection of leukemia employing feature selection and classification. They reported 93% accuracy with KNN. Subhan et al. [4] applied a similar approach in which they segment blood cell images, extract features and classify. Feature extraction inspired by visual cues from segmented cells is also followed in [5–7]. Neural network-based classification is followed in similar experiments [8, 9]. A comparative study of such classification algorithms is made in [10]. Recently, deep learning approaches have also been explored. Sahlol et al. [11] have presented a hybrid approach for leukemia classification from white blood cells using VGGNet and the bio-inspired salp swarm algorithm. Besides blood cell images, bone marrow images have also been used with convolutional neural networks for leukemia prediction [12]. Since the introduction of the microarray gene expression leukemia dataset by Golub et al. [13], the genetic roots of leukemia have been increasingly investigated. Besides prediction of leukemia subtypes for a subject, identification of the associated gene(s) has become a prime interest. A recently developed classifier called EODClassifier [14] integrates discriminant feature selection and classification following an ensemble approach. Using this classifier, one can select the top n discriminant features for training and prediction. In this paper, two acute leukemia subtypes, i.e., ALL and AML, are predicted using the EODClassifier from microarray gene expression data [13]. Multi-fold experiments revealed consistently high performance and robustness in leukemia prediction.
2 Overview of EODClassifier
In this section, a brief introduction to the EODClassifier is presented. Conceptualized by Hasan et al. [14], it applies an ensemble approach to predict a recall sample. Every individual feature has a fitness value and makes its own decision, and these decisions are combined to make the final prediction. In addition, it provides an option to select the top n features for training and prediction. Thus, it integrates feature selection with pattern classification, which may be preferred in certain applications. It is also a faster way of classification, since it compares the expressions for likelihood instead of computing the probabilities to make a decision. Installation instructions and usage guidelines for this classifier are available at [15]. Below, we briefly discuss how it works.
EODClassifier has two parameters, i.e., p = degree and nof = number of features. According to the fitness values, the top nof features are taken for training and prediction. To use all the features, set nof = ‘all’. Here, p is a parameter subject to tuning.

    eod = EODClassifier(nof=5, p=2)

By default, nof = ‘all’ and p = 1. Given a training set of samples X_train with labels y_train, training can be done as

    eod.fit(X_train, y_train)

and test samples X_test can be predicted as

    y_pred = eod.predict(X_test)

where y_pred is an array of predicted classes for the test samples. Subsequently, standard evaluation measures can be applied to quantify the classification performance. At present, the classifier supports binary classification only, i.e., two classes. Multi-class support is not available.
3 Methodology
The presented system applies a supervised approach to leukemia subtype prediction. It requires a collection of AML-type and ALL-type samples for training. Hence, the dataset needs to be divided into training and test sets in some ratio. The training samples are passed to any suitable pattern classifier, such as the EODClassifier, for training, and the test samples without their class labels are passed to the trained model for prediction. Afterwards, the class labels of the test samples are used to quantify the prediction performance. The working methodology of the system is shown in Fig. 1.
3.1 Leukemia Gene Expression Dataset
A brief introduction to the leukemia gene expression dataset [13] employed in this work is presented here. There are 72 gene expression samples of leukemia patients. Each sample contains the measured and quantified expression levels of 7129 genes. The gene expression levels of the samples are shown visually in Fig. 2. It may be noted that some genes are negatively expressed.
3.2 Leukemia Subtype Prediction
Gene expression datasets such as the present leukemia dataset [13] usually contain a limited number of samples and a high number of features. Moreover, Fig. 2 shows that the gene expression levels of the AML and ALL types are not very distinct. Figure 3 shows the distributions of four sample genes for all 72 samples. It indicates that there are no distinct decision boundaries for most of the features.
Fig. 1 Block diagram of the leukemia subtype prediction method (prediction is done on the basis of microarray gene expression data of different subjects)
Fig. 2 Expression levels of 7129 genes for 72 samples (The first 25 samples are of AML type and the remaining 47 samples are of ALL type)
Hence, ensemble approaches such as the one followed in the EODClassifier are suitable for prediction of high-dimensional samples. It may be noted that this classifier predicts the final class based on the decisions of the individual features and their fitness measures. Thus, in this classifier, a discriminating feature contributes more to determining the final class of a recall sample.
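The idea that each feature casts its own decision, weighted by a fitness value, with an optional restriction to the top features, can be illustrated with a toy model. The sketch below is not the EODClassifier algorithm: the class name, the fitness formula (class-mean separation divided by spread) and the nearest-mean voting rule are all assumptions made purely for illustration, and the data are synthetic.

```python
import numpy as np

class FeatureVoteSketch:
    """Toy per-feature weighted voting for two classes (NOT the EODClassifier)."""

    def __init__(self, nof="all"):
        self.nof = nof  # number of top-fitness features to keep, or "all"

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        X0, X1 = X[y == 0], X[y == 1]
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        # Assumed fitness: distance between class means relative to class spread.
        fitness = np.abs(m1 - m0) / (X0.std(axis=0) + X1.std(axis=0) + 1e-12)
        order = np.argsort(fitness)[::-1]  # most discriminating features first
        keep = order if self.nof == "all" else order[: self.nof]
        self.idx_, self.w_ = keep, fitness[keep]
        self.m0_, self.m1_ = m0[keep], m1[keep]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, self.idx_]
        # Each kept feature votes +1 (class 1) or -1 (class 0) by nearest class mean.
        votes = np.where(np.abs(X - self.m1_) < np.abs(X - self.m0_), 1.0, -1.0)
        # Votes are combined with the fitness values as weights.
        return (votes @ self.w_ > 0).astype(int)

# Synthetic data: 40 samples, 50 features, only the first 5 are informative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 50))
y_demo = np.array([0] * 20 + [1] * 20)
X_demo[y_demo == 1, :5] += 4.0

model = FeatureVoteSketch(nof=5).fit(X_demo, y_demo)
pred = model.predict(X_demo)
train_accuracy = (pred == y_demo).mean()
```

In this sketch a feature with a larger fitness value carries more weight in the final vote, which mirrors the statement that a discriminating feature contributes more to the final class.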
Fig. 3 Expression levels of four sample genes for 25 AML (first class) and 47 ALL (second class) samples
4 Results and Discussion
Experiments have been carried out on the leukemia gene expression dataset [13], which contains 72 instances and 7129 attributes. All attributes have numerical values, and the class label is binary, i.e., ‘1’ or ‘0’. Class 1 signifies that the subject has acute lymphocytic leukemia (ALL), and class 0 signifies that the subject has acute myelocytic leukemia (AML). The experimental setup is discussed in Sect. 4.1. Prediction performance, along with a comparative analysis with respect to other classifiers, is presented in Sect. 4.2. Finally, some observations are discussed in Sect. 4.3.
4.1 Experimental Setup
Classification is done using the EODClassifier discussed in Sect. 2 for different folds of cross-validation. Cross-validation is a well-accepted practice in pattern classification problems, since it gives a more reliable picture of a classification model than a single train/test pass. Therefore, a cross-validation strategy is followed to measure the performance of the presented system. Moreover, as there are only a limited number of samples in high dimensions, a leave-one-out cross-validation strategy is also adopted. Besides the experiments with the EODClassifier, similar experiments have been conducted with other well-known classifiers for a comparative study. To report the obtained results, standard evaluation metrics such as precision, recall, f-score and accuracy have been adopted.
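The fold structure behind k-fold and leave-one-out cross-validation can be sketched in a few lines. The helper below is a hypothetical illustration in plain Python; the paper does not state whether its folds were shuffled or stratified, so the optional shuffling via `seed` is an assumption.

```python
import random

def k_fold_indices(n_samples, k, seed=None):
    """Yield (train_idx, test_idx) pairs for k near-equal folds.

    If seed is given, sample indices are shuffled first (an assumption here;
    the paper does not say how its folds were formed).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 72 samples as in the leukemia dataset; leave-one-out is the case k == n_samples.
folds_5 = list(k_fold_indices(72, 5, seed=0))
folds_loo = list(k_fold_indices(72, 72))
```

Because the samples in Fig. 2 are ordered by class (25 AML followed by 47 ALL), unshuffled contiguous folds would be poorly balanced, which is why the sketch offers shuffling.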
4.2 Prediction Performance
The classification models are trained with mostly default parameters; only a few parameters were changed. These parameters are presented in Table 1. Since leukemia subtype prediction is demonstrated with the EODClassifier, the confusion matrices obtained with it for the different cross-validation settings are shown in Fig. 4. The classification performance of this classifier is reasonably good and the misclassification rate is low. Mean precision, recall, f-score, accuracy and RMSE over all folds are reported in Table 2 for the threefold, fivefold, tenfold, 20-fold and leave-one-out (LOO) cross-validation experiments with multiple classifiers, including the classifier of interest, i.e., the EODClassifier. For naïve Bayes, the default parameter values available in scikit-learn are used. In KNN, the number of neighbors k is 3. In SVM, a linear kernel with gamma = ‘auto’ and C = 1 is employed. For the multilayer perceptron (MLP), 100 neurons in the hidden layer with the ‘relu’ activation function are used. In the random forest (RF) classifier, n_estimators = 10 and random_state = 0. Finally, for the EODClassifier, the parameters are nof = ‘all’ and p = 5.

Table 1 Parameters of different classifiers for training and prediction of acute leukemia subtypes

Classifier      Parameters
GNB             priors = None
KNN             n_neighbors = 3, weights = ‘uniform’, p = 2, metric = ‘minkowski’
SVM             kernel = ‘linear’, gamma = ‘auto’, C = 1
MLP             random_state = 41, hidden_layer_sizes = 100, activation = ‘relu’, solver = ‘adam’, alpha = 0.0001, batch_size = ‘auto’, learning_rate = ‘constant’, learning_rate_init = 0.001, power_t = 0.5, max_iter = 200
Random forest   n_estimators = 10, random_state = 0, criterion = ‘gini’, max_depth = None, min_samples_split = 2
EODClassifier   nof = ‘all’, p = 5
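The baseline settings in Table 1 map directly onto scikit-learn estimators. The sketch below shows one way they might be instantiated; the dictionary name `baselines` is hypothetical, and `hidden_layer_sizes` is written as the tuple `(100,)`, which is assumed to be equivalent to the value 100 listed in the table. The EODClassifier is left out because its import path is not given in the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Baseline classifiers configured with the parameter values listed in Table 1.
baselines = {
    "GNB": GaussianNB(priors=None),
    "KNN": KNeighborsClassifier(
        n_neighbors=3, weights="uniform", p=2, metric="minkowski"
    ),
    "SVM": SVC(kernel="linear", gamma="auto", C=1),
    "MLP": MLPClassifier(
        random_state=41,
        hidden_layer_sizes=(100,),  # one hidden layer of 100 neurons
        activation="relu",
        solver="adam",
        alpha=0.0001,
        batch_size="auto",
        learning_rate="constant",
        learning_rate_init=0.001,
        power_t=0.5,
        max_iter=200,
    ),
    "RF": RandomForestClassifier(
        n_estimators=10,
        random_state=0,
        criterion="gini",
        max_depth=None,
        min_samples_split=2,
    ),
}
```

Each estimator exposes the same `fit` and `predict` methods as the EODClassifier snippet in Sect. 2, so all models can be evaluated in the same cross-validation loop.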
Fig. 4 Confusion matrices obtained for different folds of cross-validation with the EODClassifier. The misclassification rate is very low (as reflected in the off-diagonal positions)
4.3 Discussion
As evident from Table 2, the EODClassifier achieves over 96% accuracy in all cross-validation experiments. The accuracies of the other classifiers are sometimes lower than and sometimes close to that of the EODClassifier. Each method has its own merits and demerits. It is important to note that no single classifier can be identified as the best for all problems. A classifier that struggles on one dataset may yield great results on another. However, consistency is undeniably important. In that respect, the performance of the EODClassifier has been consistent in all experiments conducted in the present work, which reflects its robustness in addition to its high classification performance.
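The consistency argument can be made concrete by looking at how much each classifier's accuracy varies across the five cross-validation settings. The short sketch below copies the accuracy values from Table 2, assuming the fourth value in each row is the accuracy column, and computes the range (maximum minus minimum) per classifier; the variable names are hypothetical.

```python
# Accuracy values copied from Table 2, ordered as 3-, 5-, 10-, 20-fold and LOO.
accuracy = {
    "NB":  [0.9473, 0.9818, 0.9833, 0.9833, 0.9824],
    "KNN": [0.8771, 0.8954, 0.9100, 0.9083, 0.9122],
    "SVM": [0.9649, 0.9500, 0.9666, 0.9500, 0.9473],
    "MLP": [0.9122, 0.9121, 0.9433, 0.9416, 0.8596],
    "RF":  [0.9123, 0.9303, 0.9133, 0.9666, 0.8771],
    "EOD": [0.9649, 0.9636, 0.9666, 0.9666, 0.9649],
}

# Range of accuracy across the five settings: smaller means more consistent.
spread = {name: max(values) - min(values) for name, values in accuracy.items()}
most_consistent = min(spread, key=spread.get)

for name in sorted(spread, key=spread.get):
    print(f"{name}: min={min(accuracy[name]):.4f} range={spread[name]:.4f}")
```

On these figures the EODClassifier's accuracy varies by about 0.003 across the five settings, the smallest range among the six classifiers, and never drops below 0.96.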
Table 2 Acute leukemia subtype prediction performance with multiple classifiers for threefold, fivefold, tenfold, 20-fold and LOO cross-validation

#Fold     Classifier  P       R       F-score  Accuracy  RMSE
3         NB          0.9267  1.0     0.9610   0.9473    0.1846
3         KNN         0.8571  0.9743  0.9116   0.8771    0.3487
3         SVM         0.9743  0.9777  0.9733   0.9649    0.1529
3         MLP         0.8898  0.8944  0.9398   0.9122    0.2406
3         RF          0.9440  0.9583  0.9311   0.9123    0.229
3         EOD         0.9696  0.9761  0.9761   0.9649    0.1529
5         NB          0.975   1.0     0.9866   0.9818    0.0603
5         KNN         0.8683  1.0     0.9276   0.8954    0.2849
5         SVM         0.9666  0.975   0.9633   0.9500    0.1180
5         MLP         0.8955  0.9777  0.9411   0.9121    0.2199
5         RF          0.95    0.9355  0.9447   0.9303    0.2022
5         EOD         0.9666  0.9714  0.9664   0.9636    0.1206
10        NB          0.9800  1.0     0.9888   0.9833    0.0408
10        KNN         0.89    1.0     0.9377   0.9100    0.2119
10        SVM         0.9800  0.975   0.9746   0.9666    0.0816
10        MLP         0.9400  0.975   0.9638   0.9433    0.1040
10        RF          0.9600  0.9550  0.9292   0.9133    0.2080
10        EOD         0.975   0.975   0.9714   0.9666    0.0816
20        NB          0.9833  1.0     0.99     0.9833    0.0288
20        KNN         0.9083  0.975   0.9433   0.9083    0.1508
20        SVM         0.9666  0.975   0.9633   0.9500    0.0860
20        MLP         0.9416  1.0     0.9633   0.9416    0.0930
20        RF          0.9666  0.9166  0.9800   0.9666    0.0577
20        EOD         0.975   0.975   0.9666   0.9666    0.0577
72 (LOO)  NB          0.6491  0.6527  0.6491   0.9824    0.0175
72 (LOO)  KNN         0.6491  0.6250  0.6491   0.9122    0.0877
72 (LOO)  SVM         0.6315  0.6388  0.6315   0.9473    0.0563
72 (LOO)  MLP         0.6491  0.6527  0.6491   0.8596    0.1403
72 (LOO)  RF          0.5789  0.6250  0.5789   0.8771    0.1280
72 (LOO)  EOD         0.6315  0.6315  0.6315   0.9649    0.3508

The EOD rows (bold in the original table) denote the performance of the EODClassifier.
5 Conclusion
In this paper, prediction of acute lymphocytic leukemia and acute myelocytic leukemia from microarray gene expression samples using a recently reported ensemble classifier called EODClassifier is presented. Over 96% classification accuracy is obtained in multiple cross-validation experiments. Like some other popular classifiers, the EODClassifier is found to be high performing. Additionally, the performance of this classifier is found to be consistent, which reflects its robustness. Future work may include prediction using a limited number of features having relatively high fitness values.
References
1. Bullinger L, Dohner K, Dohner H (2017) Genomics of acute myeloid leukemia diagnosis and pathways. J Clin Oncol 35(9):934–946
2. Maria IJ, Devi T, Ravi D (2020) Machine learning algorithms for diagnosis of Leukemia. Int J Sci Technol Res 9(1):267–270
3. Joshi MD, Karode AH, Suralkar SR (2013) White blood cells segmentation and classification to detect acute leukemia. Int J Emerg Trends Technol Comput Sci 2(3):147–151
4. Subhan MS, Kaur MP (2015) Significant analysis of leukemic cells extraction and detection using KNN and hough transform algorithm. Int J Comput Sci Trends Technol 3(1):27–33
5. Laosai J, Chamnongthai K (2014) Acute leukemia classification by using SVM and K-Means clustering. In: Proceedings of the international electrical engineering congress, pp 1–4
6. Supardi NZ, Mashor MY, Harun NH, Bakri FA, Hassan R (2012) Classification of blasts in acute leukemia blood samples using k-nearest neighbor. In: International colloquium on signal processing and its applications. IEEE, pp 461–465
7. Adjouadi M, Ayala M, Cabrerizo M, Zong N, Lizarraga G, Rossman M (2010) Classification of Leukemia blood samples using neural networks. Ann Biomed Eng 38(4):1473–1482
8. Sewak MS, Reddy NP, Duan ZH (2009) Gene expression based leukemia sub-classification using committee neural networks. Bioinform Biol Insights 3:BBI-S2908
9. Zong N, Adjouadi M, Ayala M (2006) Optimizing the classification of acute lymphoblastic leukemia and acute myeloid leukemia samples using artificial neural networks. Biomed Sci Instrum 42:261–266
10. Bakas J, Mahalat MH, Mollah AF (2016) A comparative study of various classifiers for character recognition on multi-script databases. Int J Comput Appl 155(3):1–5
11. Sahlol AT, Kollmannsberger P, Ewees AA (2020) Efficient classification of white blood cell leukemia with improved swarm optimization of deep features. Sci Rep 10(2536):1–11
12. Rehman A, Abbas N, Saba T, Rahman SIU, Mehmood Z, Kolivand H (2018) Classification of acute lymphoblastic leukemia using deep learning. Microsc Res Tech 81(11):1310–1317
13. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
14. Hasan SR, Mollah AF (2021) An ensemble approach to feature selection and pattern classification. In: Proceedings of international conference on contemporary issues on engineering and technology, pp 72–76
15. EODClassifier (2021) https://github.com/iilabau/EODClassifier. Accessed 15 June 2021
Intrusion Detection System Intensive on Securing IoT Networking Environment Based on Machine Learning Strategy
D. V. Jeyanthi and B. Indrani
Abstract The Internet of Things (IoT) is a technology that is spreading rapidly into day-to-day life, from the home to large industrial environments. The IoT connects various applications and services via the internet to make the environment more convenient. The way the devices communicate exposes the network to various attacks. To protect against the security vulnerabilities of the IoT, an Intrusion Detection System (IDS) is employed in the network layer. The network packets from the interconnected IoT applications and services are stored on a Linux server on the end nodes. A crawler fetches the packets from the server into the network layer for attack prediction. The main objective of this work is to identify and detect intrusions in the IoT environment based on machine learning (ML) using the benchmark dataset NSL-KDD. The NSL-KDD dataset is pre-processed to handle null values and to eliminate duplicate and unwanted columns. The cleaned dataset is then used to construct the novel custom features and basic features for attack detection, which together form the feature vector. The novel features are constructed to reduce the learning confusion of the machine learning algorithms. The feature vector is then processed with the LASSO feature selection strategy to obtain the significant features and increase the prediction accuracy. Because ensemble machine learning algorithms perform well, HSDTKNN (Hybrid Stacking Decision Tree with KNN), HSDTSVM (Hybrid Stacking Decision Tree with SVM) and TCB (Tuned CatBoost) are used for classification. The Tuned CatBoost (TCB) technique predicts the attacks that occur among the packets and generates an alarm. The experimental outcomes establish the suitability of the proposed model for the IoT IDS environment, with an accuracy of 97.8313%, an error rate of 0.021687, a sensitivity of 97.1001% and a specificity of 98.7052%.
D. V. Jeyanthi (B)
Department of Computer Science, Sourashtra College, Madurai, Tamilnadu, India
B. Indrani
Department of Computer Science, DDE, MKU, Madurai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_11
Keywords IDS · NSL-KDD · IoT network environment · Custom novel features · PSO · LASSO · Machine learning · HSDTSVM · HSDTKNN · TCB
1 Introduction
An intrusion detection system (IDS) acts as a shield that protects a computer system from attacks. Because attackers use various techniques to compromise the security and stability of networks connected to the internet, an IDS is an essential part of a network security configuration. Generally, IDSs fall into two categories: anomaly-based detection and signature-based detection. An anomaly-based detection system builds a database of normal behavior and generates an alarm when behavior deviates from that norm. A signature-based detection system maintains a database of known attack patterns [1, 2]; it checks whether similar patterns or data occur in the current traffic and indicates whether an attack is under way. IoT relies on the network layer of the intercommunication design, which is responsible for moving data packets among hosts. In the IoT architecture, the network layer is a vulnerable and heterogeneous layer that is exposed to various security concerns. The main reason for the security vulnerability of the IoT is that it contains a large number of linked nodes, so a single affected node may cause the entire system to fail. Flaws in the IoT architecture lead to attacks such as DDoS, remote recording, botnets, data leakage, and ransomware. Deploying a firewall is a primary security measure against IoT vulnerabilities, but it is not a sufficient solution because of the variety of issues in the IoT architecture. This work proposes a framework that employs machine learning techniques to predict security anomalies in an IoT environment, focusing on intrusions on IoT-connected devices. The work adopts the NSL-KDD dataset for attack prediction and creates novel custom features from it to increase the prediction accuracy and reduce the training time. The proposed framework provides better results for NSL-KDD_CF than applying feature selection to the NSL-KDD dataset. The following section describes the attack prediction process.
2 Review of Literature
The scheme suggested by Soni et al. [3] makes use of two methods, C5.0 and ANN. To classify the data based on the performance of C5.0 and ANN, a set of significant features must be selected. To detect unique attacks, Somwang et al. [4] developed a hybrid clustering model combining PCA and FART. Su et al. [5] improved detection accuracy by combining an IDS with hierarchical clustering and SVM. The KDD 99 dataset was used to conduct the experiments, and improved results were observed for DoS and probe attacks. El Boujnouni and Jedra [6] proposed an anomaly detection-based network intrusion detection system. This scheme comprises data conversion, normalization, relevant feature and novelty detection models based on a classification and resolution scheme composed of SPPVSSVDs and SMDs for determining whether traffic is normal or intrusive. Bhumgara et al. [7] proposed hybrid approaches combining J48 Decision Tree, SVM, and NB to detect different kinds of attacks and reported different accuracies depending on the algorithm. Based on the OPSO-PNN model, Sree Kala and Christy [8] proposed an anomaly-based IDS. In [9], an IDS with a minimal set of features is proposed that employs a random forest classifier as the supervised machine learning method. The manually selected features assist in training and in detecting intrusions in an IoT environment with a minimal set of relevant features. The work in [10] constructs an accurate model by employing various data pre-processing techniques that allow the machine learning algorithm to classify the possible attacks accurately using a cybersecurity dataset. The work in [11] identifies various kinds of IoT threats using a deep learning based IDS for the IoT environment on five benchmark datasets. The main objective of [12] was to compare KDDCup99 and NSL-KDD by evaluating the performance of various machine learning techniques with a large set of classification metrics. The work in [13] focuses on IoT threats by detecting and localizing infected IoT devices and generating an alarm. The research in [14] proposes an architectural design and implementation with a hybrid strategy based on multi-agent systems and blockchain using deep learning algorithms. The researcher in [15] proposed a novel framework for inspecting and labeling the suspected packet header and payload to increase the prediction accuracy. The paper [16] proposed a system that monitors soldiers who are wounded or lost on the front line by tracking data from sensors. The paper [17] evaluated the performance of a blockchain-based system in wireless networks for sustainable smart farming. The paper [18] designed a system to control devices that are far from the control system by sending their status using sensors.
3 Proposed Scheme
This section describes the proposed IDS scheme for the IoT environment using NSL-KDD. The proposed architecture identifies and recognizes attacks based on the basic features and the derived novel custom features, which are fed to the ML algorithms for attack prediction. The detailed architecture of the proposed scheme is shown in Fig. 1. The architecture handles packet information, missing value imputation, duplicate detection, best feature selection and classification. The proposed work mainly focuses on generating novel features to solve the learning confusion problem for classifiers, and it helps analysts understand the features. The proposed work has five layers for detecting attacks: (i) Data Collection Layer, (ii) Pre-Processing Layer, (iii) Construction Layer, (iv) Feature Selection Layer and (v) Detection Layer. Table 1 shows the parameters used in this work.
Fig. 1 Proposed architecture
Table 1 Used parameters

Parameter  Description
D          Dataset
DN         Novel dataset
DC         Cleaned dataset
ACC        Accuracy
ER         Error rate
SE         Sensitivity
SP         Specificity
DB         Best feature selected dataset
MR         Miss rate
FO         Fall out
Xn         Training set
Yn         Testing set
PT         True positive
NT         True negative
PF         False positive
NF         False negative
3.1 Data Collection Layer
3.1.1 Dataset
The NSL-KDD dataset has 41 features, which are grouped into basic, content, and traffic features. Compared with the KDD-Cup dataset, NSL-KDD is a refined version that does not suffer from KDD-Cup's shortcomings [17]. In addition, the NSL-KDD (D) training set contains a reasonable number of records. Because of this, the experiments can be run on the entire dataset without manually selecting a small portion. The dataset D includes the attack groups DoS (79%), Probing (1%), R2L (0.70%), and U2R (0.30%), together with Normal traffic (19%).
3.2 Pre-processing Layer
The pre-processing layer cleans the raw dataset and prepares it for deriving the novel custom features. The dataset (D) is processed by eliminating duplicate and redundant columns and by handling missing values, which reduces its size for further processing. Figure 2 illustrates the missing value analysis for the given dataset and shows that the dataset does not contain any missing values.
Fig. 2 Missing value
Table 2 Encoding values
Features   Values               Encoded value
Service    Http, Telnet, etc.   0–70
Flag       SF, REJ, etc.        0–11
Protocol   TCP/UDP/ICMP         0/1/2
Class      Normal/Attack        0/1
In this phase, the features in the dataset are encoded into a uniform format for processing. The fields in the dataset are in various formats, which makes it difficult to compute the custom features, so the work encodes the fields into a uniform format using encoded values. Sample encoded values for some of the features are shown in Table 2.
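The encoding step can be illustrated with a minimal sketch. It assumes the protocol and class codes listed in Table 2; the field names (`protocol_type`, `service`, `flag`, `class`) and the rule of numbering services and flags in order of first appearance are assumptions made for illustration, since the paper gives only the code ranges.

```python
# Fixed codes taken from Table 2 (TCP/UDP/ICMP -> 0/1/2, Normal/Attack -> 0/1).
PROTOCOL_CODES = {"tcp": 0, "udp": 1, "icmp": 2}


def first_seen_codes(values):
    """Assign 0, 1, 2, ... to distinct values in order of first appearance.

    Assumption: the paper does not state how service and flag codes are ordered.
    """
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return codes


def encode_records(records):
    """Return a copy of the records with the categorical fields replaced by integers."""
    service_codes = first_seen_codes(r["service"] for r in records)
    flag_codes = first_seen_codes(r["flag"] for r in records)
    encoded = []
    for r in records:
        encoded.append({
            "protocol_type": PROTOCOL_CODES[r["protocol_type"]],
            "service": service_codes[r["service"]],
            "flag": flag_codes[r["flag"]],
            # Every label other than "normal" is treated as an attack.
            "class": 0 if r["class"] == "normal" else 1,
        })
    return encoded


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"protocol_type": "tcp", "service": "http", "flag": "SF", "class": "normal"},
    {"protocol_type": "udp", "service": "domain_u", "flag": "SF", "class": "normal"},
    {"protocol_type": "tcp", "service": "telnet", "flag": "REJ", "class": "neptune"},
]
encoded_sample = encode_records(sample)
```

After encoding, every field is numeric, so the custom features can be computed with ordinary arithmetic.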
3.3 Construction Layer
The construction layer builds the proposed novel features, which are derived from the dataset (D). These novel features (DN) are extracted from the fields in the dataset (D) and are employed in the prediction to increase accuracy and to avoid learning confusion for the ML techniques.
Total Bytes: The source and destination bytes of the packet transaction are summed to derive the custom feature Total Bytes.
Total Bytes = Source Bytes + Destination Bytes
Byte Counter: This custom feature relates the Total Bytes to the connection count.
Byte Counter = Total Bytes/Count
Interval Counter: This custom feature relates the duration of the connection to the connection count.
Interval Counter = Duration/Count
Unique ID: The custom feature Unique ID is derived by concatenating the protocol type, service, and flag of the captured packet.
UID = ProtocolType + Service + Flag
Average SYN Error: The SYN Error Rate and the Destination Host SYN Error Rate of the captured packets are averaged to obtain the average synchronize error rate.
Average SYN Error = (SYN Error Rate + Destination Host SYN Error Rate)/2
Total Service Rate: The same service rate and the different service rate of the packets are added to obtain the Total Service Rate.
TSR = Same Service Rate + Different Service Rate
Nominal of Same Service Rate: This custom feature expresses the same service rate of the packets as a fraction of the total service rate.
Nominal of Same Service Rate = Same Service Rate/TSR
REJ Error Mean: The rejection error rate and the destination host rejection error rate are averaged to obtain the REJ Error Mean (REM).
REJ Error Mean = (REJ Error Rate + Destination Host REJ Error Rate)/2
Login State: This feature identifies whether the host login is enabled or not. It is derived from the feature logged-in.
L State = 'F' if Loggedin = False; otherwise 'H' if Loggedin = 'Host', else 'G'
Nominal of Different Service Rate: This custom feature expresses the different service rate of the captured packets as a fraction of the TSR.
Nominal of Different Service Rate = Different Service Rate/TSR
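The custom features defined above can be sketched as a single function over one record. The field names follow the usual NSL-KDD column names and are an assumption here, as are the `-` separator in the Unique ID, the reading of the login state rule through a separate `is_host_login` flag, and the choice to return 0.0 when a denominator is zero (the paper does not say how that case is handled).

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for a zero denominator (an assumption, not from the paper)."""
    return numerator / denominator if denominator else 0.0


def derive_custom_features(r):
    """Compute the custom features of the construction layer for one record."""
    total_bytes = r["src_bytes"] + r["dst_bytes"]
    tsr = r["same_srv_rate"] + r["diff_srv_rate"]

    # Login state: 'F' when not logged in, 'H' for a host login, 'G' otherwise.
    if not r["logged_in"]:
        login_state = "F"
    elif r["is_host_login"]:
        login_state = "H"
    else:
        login_state = "G"

    return {
        "total_bytes": total_bytes,
        "byte_counter": safe_div(total_bytes, r["count"]),
        "interval_counter": safe_div(r["duration"], r["count"]),
        # The separator is an illustrative choice to keep the parts unambiguous.
        "uid": f'{r["protocol_type"]}-{r["service"]}-{r["flag"]}',
        "avg_syn_error": (r["serror_rate"] + r["dst_host_serror_rate"]) / 2,
        "total_service_rate": tsr,
        "nominal_same_srv_rate": safe_div(r["same_srv_rate"], tsr),
        "rej_error_mean": (r["rerror_rate"] + r["dst_host_rerror_rate"]) / 2,
        "login_state": login_state,
        "nominal_diff_srv_rate": safe_div(r["diff_srv_rate"], tsr),
    }


# Hypothetical record used only to exercise the formulas.
record = {
    "src_bytes": 200, "dst_bytes": 300, "count": 5, "duration": 10,
    "protocol_type": "tcp", "service": "http", "flag": "SF",
    "serror_rate": 0.2, "dst_host_serror_rate": 0.4,
    "same_srv_rate": 0.75, "diff_srv_rate": 0.25,
    "rerror_rate": 0.0, "dst_host_rerror_rate": 0.5,
    "logged_in": 1, "is_host_login": 0,
}
features = derive_custom_features(record)
```

For this record, Total Bytes is 500, Byte Counter is 100, and Interval Counter is 2.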
3.4 Selection Layer
The purpose of this layer is to identify the significant features among the derived features of the given dataset in order to increase prediction accuracy. This work uses the following techniques to select the best features from the cleaned NSL-KDD dataset (DC). The selected features help to improve the accuracy of the classifier.
3.4.1 PSO
In PSO, the particles work together as a group: each particle learns from its own experience, and these experiences are consolidated across the swarm. Each particle keeps track of the best position it has found so far, which is called the personal best solution (pbest) of the particle. By drawing on its own experience and its interactions with others, each particle in the search space looks for the best solution. The best fitness value achieved by any particle in the group is denoted as gbest. Each particle has an associated velocity that accelerates it toward pbest and gbest. The basic idea of PSO is to reach the global optimal solution by moving each particle toward pbest and gbest with random weights at every step. Particle swarms are randomly generated and then move through the search space until they identify the optimal set of features, keeping track of their position and velocity. The current position (p) and velocity (v) of particle i are described as follows:
$$p_i = \{p_{i1}, p_{i2}, \ldots, p_{iD}\}, \quad v_i = \{v_{i1}, v_{i2}, \ldots, v_{iD}\}$$
where D is the dimension of the search space. The following equations are used to update the position and velocity of particle i:
$$p_{iD}^{k+1} = p_{iD}^{k} + v_{iD}^{k+1}$$
$$v_{iD}^{k+1} = w \, v_{iD}^{k} + a_1 r_1 \left(p_{id} - p_{iD}^{k}\right) + a_2 r_2 \left(p_{gd} - p_{iD}^{k}\right)$$
where k denotes the kth iteration of the procedure. In the search space, the dth dimension is represented as d ∈ D. The inertia weight w regulates the influence of the previous velocity on the present velocity. The random values r1 and r2 are uniformly distributed in [0, 1]. The acceleration constants are represented as a1 and a2. The elements of pbest and gbest in the dth dimension are represented as pid and pgd. Particle positions and velocities are updated repeatedly until the stopping criterion is met, which can be either a maximum number of iterations or a suitable fitness value.
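The update rules can be illustrated with a minimal continuous PSO sketch. It is not the authors' feature selection procedure: the fitness function (a sphere function), the swarm size, and the values w = 0.7 and a1 = a2 = 1.5 are assumptions chosen only to show the velocity and position updates in action.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, iterations=50,
                 w=0.7, a1=1.5, a2=1.5, seed=0):
    """Minimize `fitness` with the standard inertia-weight PSO update."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far
    history = [gbest_val]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia term plus pulls toward pbest and gbest.
                vel[i][d] = (w * vel[i][d]
                             + a1 * r1 * (pbest[i][d] - pos[i][d])
                             + a2 * r2 * (gbest[d] - pos[i][d]))
                # Position: previous position plus the new velocity.
                pos[i][d] += vel[i][d]
            value = fitness(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Toy fitness function with its minimum at the origin."""
    return sum(c * c for c in x)


best_pos, best_val, history = pso_minimize(sphere, dim=3)
```

Because gbest is only replaced by a strictly better value, the recorded best fitness never increases from one iteration to the next. For feature selection, the position vector would typically be mapped to a feature mask and the fitness would be a classifier score, which this sketch leaves out.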
3.4.2 LASSO Feature Selection
LASSO feature selection shrinks the regression coefficients, and many of them are driven to zero. This regularizes the model parameters. After shrinkage, the model is refitted using only the features with non-zero coefficients. In statistical models, this technique reduces prediction error. LASSO models offer a high degree of accuracy: because the coefficients are shrunk, variance is reduced in exchange for a small amount of bias. The result depends strongly on the parameter "λ", which controls the amount of shrinkage. The larger "λ" becomes, the more coefficients are forced to zero. LASSO is therefore useful for eliminating variables that are not related to the response variable. In linear regression (LR), the algorithm minimizes the squared error subject to an upper bound on the sum of the absolute values of the coefficients. The LASSO estimate therefore depends on the chosen value of "λ", and increasing "λ" increases shrinkage. An inverse relationship exists between the upper bound on the coefficients and "λ": when the upper bound increases, "λ" decreases, and when the upper bound decreases, "λ" grows.
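The effect of "λ" on the coefficients can be shown with a small coordinate descent sketch that minimizes (1/2n)·‖y − Xβ‖² + λ·‖β‖₁ using soft thresholding. The solver, the toy data, and the two λ values are assumptions for illustration; the paper does not state which LASSO implementation was used.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=50):
    """Cyclic coordinate descent for the LASSO objective on a small dense dataset."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from every feature except j (the partial residual).
                pred_without_j = sum(X[i][m] * beta[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Toy data: y depends only on the first feature (y = 3 * x1); the second
# feature is orthogonal to both x1 and y, so it should be dropped.
X = [[-2, 1], [-1, -2], [0, 2], [1, -2], [2, 1]]
y = [-6, -3, 0, 3, 6]

beta_small = lasso_coordinate_descent(X, y, lam=0.1)   # mild shrinkage
beta_large = lasso_coordinate_descent(X, y, lam=6.0)   # strong shrinkage
```

With the small penalty the informative coefficient is only slightly shrunk (from 3 to about 2.95) while the irrelevant one is exactly zero; with the large penalty both coefficients are zero, which matches the statement that a larger "λ" forces more coefficients to zero.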
3.5 Detection Layer
The detection layer consists of machine learning algorithms that detect attacks using classification techniques. The ML models perform attack prediction for the intrusion detection system. The best features (DB) obtained from the feature selection phase are split into training (Xn) and testing (Yn) sets for the prediction models. The prediction models employed are as follows:
3.5.1 HSDTKNN
Stacking is an ensemble-based machine learning algorithm. Its advantage is that it can harness the capabilities of a range of well-performing models on a classification or regression task and produce predictions that perform better than any single model in the ensemble. The proposed algorithm is entitled "Hybrid Stacking Decision Tree with K-Neighbors Classifier." Tree-based models are a class of nonparametric algorithms that work by partitioning the feature space into a number of smaller regions with similar response values using a set of splitting rules. Predictions are obtained by fitting a simpler model in each region. Consider training data Xn = {t1, …, tn}, where each instance ti is a record for a network packet described by the attributes {T1, T2, …, Tn}, and each attribute Tn takes the attribute values {T1i, T2i, …, Tni}. Each instance in the training data Xn has a specific class "yn", the class tag that gives the output of the record. The algorithm first searches for multiple copies of the same instance in the training data Xn. Stacking with a Decision Tree (DT) is employed to predict attacks in the network. The ensemble classifier is designed by stacking DT (Meta) and KNN (Base) together. The DT classifier is integrated with KNN to improve overall performance and training time. The training set Xn is constructed from the significant feature set (SFS) and divided into n parts. The model is then fitted to n − 1 parts of the set, while predictions are made on the nth part. To cover the entire set Xn, the same process is repeated for every part of the training set Xn(i). The stacked classifier KNN is fitted to both yn and Xn.
There are two sets for training: the training set and the validation set. The validation set is used to construct the new model, with evaluations performed on the set yn. Fitting the meta learner of a stacking model amounts to finding the best combination of the base learners. In this classifier (HSDTKNN, Table 3), the base learner is KNN and the meta learner is a Decision Tree (DT). The algorithm begins by specifying the number of base algorithms; here a single base algorithm, KNN, is used. KNN is configured with ten neighbors, the KD-Tree algorithm, and the Euclidean distance measure. The DT meta learner is configured with a max depth of five, a random state of None, and a maximum of 20 leaf nodes. Next, k-fold cross-validation with k = 5 is performed to obtain predictions from the base algorithm. Having received the predictions from the base learner, the meta learner generates the ensemble predictions.
Table 3 Parameter of HSDTKNN
HSDTKNN parameters       Value(s)
Base learner             K Neighbor Classifier (KNN)
Meta learner             Decision Tree (DT)
Cross validation         5
Max depth (DT)           5
Random state (DT)        None
Max leaf node (DT)       20
K-Neighbors (KNN)        10
Algorithm (KNN)          KD-Tree
Distance metric (KNN)    Euclidean
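The stacking workflow (k-fold out-of-fold predictions from the base learner, then a meta learner fitted on those predictions) can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the toy two-cluster data, k = 3 neighbors, and the lookup-table meta learner (a stand-in for a one-split decision tree) are assumptions, whereas Table 3 specifies ten neighbors and a depth-5 tree.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Base learner: majority vote among the k nearest points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def out_of_fold_predictions(X, y, k, folds):
    """Predict each sample with a KNN fitted on the folds that do not contain it."""
    oof = [None] * len(X)
    for f in range(folds):
        held_out = [i for i in range(len(X)) if i % folds == f]
        kept = [i for i in range(len(X)) if i % folds != f]
        kept_X = [X[j] for j in kept]
        kept_y = [y[j] for j in kept]
        for i in held_out:
            oof[i] = knn_predict(kept_X, kept_y, X[i], k)
    return oof


def fit_meta_learner(base_pred, y):
    """Stand-in meta learner: map each base output to the majority true label.

    On a single categorical input this is what a one-split decision tree learns.
    """
    table = {}
    for p in set(base_pred):
        labels = [t for bp, t in zip(base_pred, y) if bp == p]
        table[p] = Counter(labels).most_common(1)[0][0]
    return table


def stacked_predict(X, y, meta, x, k):
    """Final prediction: base learner output passed through the meta learner."""
    base = knn_predict(X, y, x, k)
    return meta.get(base, base)


# Toy data: two well-separated clusters, interleaved so every fold sees both classes.
normal = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
attack = [(5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
X = [pt for pair in zip(normal, attack) for pt in pair]
y = [0, 1] * 5

oof = out_of_fold_predictions(X, y, k=3, folds=5)
meta = fit_meta_learner(oof, y)
pred_normal = stacked_predict(X, y, meta, (0.2, 0.3), k=3)
pred_attack = stacked_predict(X, y, meta, (5.8, 5.1), k=3)
```

Using out-of-fold predictions means the meta learner is trained on base outputs for samples the base learner did not see, which is the role of the 5-fold cross-validation listed in Table 3.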
3.5.2 HSDTSVM
A Decision Tree is a tree structure in which internal nodes represent tests on attributes, branches represent outcomes, and leaf nodes represent class labels. Subtrees rooted at new nodes are created recursively using the same procedure. An algorithm named "Hybrid Stacking Decision Trees Using Support Vector Machines" is proposed in this work. SVMs are essentially binary classifiers that separate classes by decision boundaries. SVM is capable of reducing the empirical classification error and increasing class separability simultaneously using various transformations. When the margin reaches its maximum, the separation between classes is maximized. Assume that "yn = {xi, yi}" is a testing sample containing two classes, yi = 1/0, where each sample is composed of the attributes "xi, i = 1, …, m". Stacked together, DT and SVM perform better than either model used alone. Although SVM is an accurate classification method, its computational cost makes training very slow on large datasets, which is a critical flaw of the SVM training phase. To train on very large datasets, an efficient data selection method based on decision trees and support vector classification is needed. During the training phase of the proposed technique, the training dataset for SVM is reduced using a decision tree. This addresses the issue of selecting and constructing features by reducing the number of dataset dimensions. An SVM can be trained on the disjoint regions uncovered by the decision tree. A smaller dataset thus finds a more complex region than a larger one obtained from the entire training set. The complexity of decision trees is reduced with small learning datasets, despite the fact that the decision rules are more complex.
Table 4 Parameter of HSDTSVM
HSDTSVM parameters    Value(s)
Base learner          Support Vector Machine (SVM)
Meta learner          Decision Tree (DT)
Cross validation      5
Max depth (DT)        5
Random state (DT)     None
Max leaf node (DT)    20
Kernel                Sigmoid
Co-efficient          Sigmoid
Verbose               True
In this classifier (HSDTSVM, Table 4), the base learner is the Support Vector Machine (SVM) and the meta learner is the Decision Tree (DT). The workflow of the algorithm begins by specifying the number of base algorithms. Here, the base algorithm (SVM) is configured with the Sigmoid kernel, the Sigmoid coefficient, and verbose set to true. The DT meta learner is configured with a max depth of five, a random state of None, and a maximum of twenty leaf nodes. Next, k-fold cross-validation with k = 5 is performed to obtain predictions from the base algorithm. After receiving the predictions from the base learner, the meta learner generates the ensemble prediction output.
3.5.3 Tuned CatBoost (TCB)
CatBoost is an implementation of the boosted decision trees (BDT) algorithm that counters the prediction shift found in other boosting solutions for certain kinds of distributions and supports categorical features with a faster method. The CatBoost algorithm relies on the ordered boosting technique, an improved form of the gradient boosting algorithm. Prediction shift happens due to particular kinds of target leakage. For every new split of the current tree, CatBoost uses a greedy approach. Except for the first split, every subsequent split considers the combinations of the categorical features already in the current tree with the categorical features of the dataset. For each model built after any number of trees, every training example being evaluated is assigned a gradient value. To ensure that this gradient value is unbiased, the model should be trained without that specific training example. For overfitting detection, the training dataset is divided and a small portion is used for testing. Assume the dataset of samples D = {(Xj, yj)}, j = 1, 2, 3, …, m, where Xj is a feature vector and yj ∈ R is the response, which can be numeric (0 or 1). The model has a significant number of parameters that can be adjusted to achieve better performance. It is also important to cross-validate the model to check that it generalizes to testing data and avoids overfitting, which offers the chance to train the model with a pool of parameters and pick the ones that generalize best to testing data. The tuned parameters of the TCB classifier in this work are listed in Table 5. The table gives the tuned values of parameters such as iterations, learning rate, loss function, and depth. With these tuned parameters, the proposed classifier achieves higher accuracy and a lower error rate in attack prediction.
Table 5 Tuning parameters of TCB
TCB parameters                         Tuned value-1             Tuned value-2             Tuned value-3
Iterations                             1000                      1200                      1500
Learning rate                          0.01                      0.03                      0.1
Loss function, verbose and task type   Cross entropy, 0 and GPU  Cross entropy, 0 and GPU  Cross entropy, 0 and GPU
Depth                                  6                         8                         10
Leaf regularization                    3                         6                         9
Classifier accuracy (%)                ≤87                       ≤93                       ≥97
Error rate (%)                         13                        7                         3
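The three tuned settings in Table 5 can be written down as parameter dictionaries. The keys below are the CatBoost parameter names that appear to correspond to the table rows (`iterations`, `learning_rate`, `depth`, `l2_leaf_reg`, `loss_function`, `verbose`, `task_type`); this mapping is an assumption, since the paper lists only descriptive labels.

```python
# Settings shared by all three configurations in Table 5.
COMMON = {"loss_function": "CrossEntropy", "verbose": 0, "task_type": "GPU"}

# One dictionary per tuned column of Table 5 (assumed parameter-name mapping).
TCB_CONFIGS = [
    {"iterations": 1000, "learning_rate": 0.01, "depth": 6, "l2_leaf_reg": 3, **COMMON},
    {"iterations": 1200, "learning_rate": 0.03, "depth": 8, "l2_leaf_reg": 6, **COMMON},
    {"iterations": 1500, "learning_rate": 0.1, "depth": 10, "l2_leaf_reg": 9, **COMMON},
]

# Reported accuracy bounds from Table 5, kept next to the settings for reference.
REPORTED_ACCURACY = ["<=87%", "<=93%", ">=97%"]

# With the catboost package installed, a configuration could be passed as
# keyword arguments, for example CatBoostClassifier(**TCB_CONFIGS[2]).
best_config = TCB_CONFIGS[2]
```

Keeping the configurations as plain data makes it straightforward to loop over them and cross-validate each one before picking the setting that generalizes best.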
3.6 Evaluation Result
The proposed IDS architecture is implemented in Python with the Anaconda environment on a 64-bit Ubuntu Linux rack server with an Intel Xeon E5-2600, 16 GB RAM, a GTX 1050Ti 4 GB graphics card, and a 6 TB hard disk. The training and testing sets contain the attack and normal packet information collected from the NSL-KDD dataset. The experiment begins with the cleaning and encoding module, which extracts the non-duplicate, useful features and stores them in the feature vector. To reduce the learning confusion (zero value) problem of the machine learning algorithms, novel features (NSLKDD_CF) are constructed. Feature selection (PSO, LASSO) is then applied to discard undesirable features and keep only the best ones. In the next step, classification (HSDTSVM, HSDTKNN, and TCB) is performed using the training and test sets. The results are compared on detection accuracy between NSLKDD (existing dataset features) and NSLKDD_CF (proposed custom features). This section presents the evaluation results of the feature selection and classification strategies on both datasets, with illustrations. The elapsed time (in seconds) of each feature selection strategy is displayed in Fig. 3. It shows that LASSO takes less time than PSO.
Fig. 3 Feature selection time (x-axis: feature selection algorithm, PSO and LASSO; y-axis: elapsed time in seconds)
Figure 4 illustrates the accuracy and error rate of the classifiers on NSLKDD and NSLKDD_CF. The proposed method TCB achieves higher accuracy and a lower error rate than the other methods. Table 6 shows TP, TN, FP, and FN
Fig. 4 Accuracy and error rate for NSLKDD and NSLKDD_CF (x-axis: classifiers HSDTKNN, HSDTSVM, and TCB for each dataset; y-axis: accuracy and error rate)
Table 6 TP, TN, FP, and FN values for the classifiers
Dataset      Algorithm(s)   TP (PT)   TN (NT)   FP (PF)   FN (NF)
NSLKDD       HSDTKNN        62,755    57,800    4587      830
NSLKDD       HSDTSVM        64,983    54,503    2359      4127
NSLKDD       TCB            66,599    56,641    743       1989
NSLKDD_CF    HSDTKNN        56,028    36,971    2836      467
NSLKDD_CF    HSDTSVM        53,286    33,094    5578      4344
NSLKDD_CF    TCB            58,477    36,645    387       793
values for the classifiers HSDTKNN, HSDTSVM, and TCB on the datasets NSLKDD and NSLKDD_CF. Table 7 compares the classification metrics of the three classifiers on the two datasets, including accuracy, error rate, sensitivity, miss rate, specificity, and fall out.
(a) Accuracy (ACC) and Error Rate (ER): One way to measure a machine learning algorithm's accuracy is to determine how many data points it classifies correctly. The accuracy is the number of correctly predicted points divided by the total number of data points.
$$\mathrm{ACC} = \frac{P_T + N_T}{P_T + P_F + N_T + N_F}$$
$$\mathrm{ACC}_{\mathrm{HSDTKNN}} = \frac{62{,}755 + 57{,}800}{62{,}755 + 57{,}800 + 4587 + 830} = \frac{120{,}555}{125{,}972} = 0.9569984$$
The error rate (ER) is the number of all incorrect predictions divided by the number of data points that were analyzed. The error rate is best at 0.0 and worst at 1.0.
$$\mathrm{ER} = 1 - \mathrm{ACC}$$
$$\mathrm{ER}_{\mathrm{HSDTKNN}} = 1 - 0.9569984 = 0.0430016$$
Figure 4 shows the accuracy and error rate for NSLKDD and NSLKDD_CF. TCB has the highest accuracy and the lowest error rate.
(b) Sensitivity and Miss Rate: The sensitivity (SE) is the number of correct positive predictions divided by the total number of positives. It is also known as recall (REC) or true positive rate (TPR). Sensitivity is best at 1.0 and worst at 0.0.
$$\mathrm{SE(TPR)} = \frac{P_T}{P_T + N_F}$$
$$\mathrm{SE}_{\mathrm{HSDTKNN}} = \frac{62{,}755}{62{,}755 + 830} = 0.9869466$$
The miss rate, or false negative rate (FNR), is the number of false negative predictions divided by the total number of true positives and false negatives. The best false negative rate is 0.0, and the worst is 1.0.
$$\mathrm{MR(FNR)} = 1 - \mathrm{SE(TPR)}$$
Table 7 Classification metrics
Dataset      Algorithm(s)   Accuracy    Error Rate   Sensitivity (TPR)   Miss Rate (FNR)   Specificity (TNR)   Fall Out (FPR)
NSLKDD       HSDTKNN        0.9569984   0.0430016    0.9869466           0.0130534         0.9264751           0.0735249
NSLKDD       HSDTSVM        0.9485124   0.0514876    0.9402836           0.0597164         0.9585136           0.0414864
NSLKDD       TCB            0.9783126   0.0216874    0.9710008           0.0289992         0.9870521           0.0129479
NSLKDD_CF    HSDTKNN        0.9657016   0.0342984    0.9917338           0.0082662         0.9287562           0.0712438
NSLKDD_CF    HSDTSVM        0.8969699   0.1030301    0.9246226           0.0753774         0.8557613           0.1442387
NSLKDD_CF    TCB            0.9877469   0.0122531    0.9866206           0.0133794         0.9895496           0.0104504
Fig. 5 Sensitivity and miss rate for NSLKDD and NSLKDD_CF (x-axis: classifiers HSDTKNN, HSDTSVM, and TCB for each dataset; y-axis: sensitivity (TPR) and miss rate (FNR))
$$\mathrm{MR}_{\mathrm{HSDTKNN}} = 1 - 0.9869466 = 0.0130534$$
The sensitivity and miss rate for the three classifiers and two datasets are shown in Fig. 5. The proposed method TCB holds high sensitivity and a low miss rate for both the NSLKDD and NSLKDD_CF datasets.
(c) Specificity and Fall Out: Specificity (SP) is the number of correct negative predictions divided by the total number of negatives. It is also called the true negative rate (TNR). Specificity is best at 1.0 and worst at 0.0.
$$\mathrm{SP(TNR)} = \frac{N_T}{N_T + P_F}$$
$$\mathrm{SP}_{\mathrm{HSDTKNN}} = \frac{57{,}800}{57{,}800 + 4587} = 0.9264751$$
The fall out, or false positive rate (FPR), is the number of incorrect positive predictions divided by the total number of negatives. The best false positive rate is 0.0, and the worst is 1.0.
$$\mathrm{FO(FPR)} = 1 - \mathrm{SP(TNR)}$$
$$\mathrm{FO}_{\mathrm{HSDTKNN}} = 1 - 0.9264751 = 0.0735249$$
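The six metrics can be recomputed from the confusion counts with a short helper. The sketch below uses the HSDTKNN counts on NSLKDD from Table 6; the function name and the dictionary keys are illustrative choices that follow the abbreviations of Table 1.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, ER, SE, MR, SP, and FO from the four confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total   # accuracy
    se = tp / (tp + fn)       # sensitivity (true positive rate)
    sp = tn / (tn + fp)       # specificity (true negative rate)
    return {
        "ACC": acc,
        "ER": 1 - acc,        # error rate
        "SE": se,
        "MR": 1 - se,         # miss rate (false negative rate)
        "SP": sp,
        "FO": 1 - sp,         # fall out (false positive rate)
    }


# HSDTKNN on NSLKDD, counts taken from Table 6.
hsdtknn_nslkdd = classification_metrics(tp=62755, tn=57800, fp=4587, fn=830)
```

The resulting values agree with the HSDTKNN row for NSLKDD in Table 7 to seven decimal places, and the same call can be repeated for the other rows of Table 6.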
Fig. 6 Specificity and fall out for NSLKDD and NSLKDD_CF (x-axis: classifiers HSDTKNN, HSDTSVM, and TCB for each dataset; y-axis: specificity (TNR) and fall out (FPR))
Figure 6 shows the specificity (SP) and fall out (FO) for both NSLKDD and NSLKDD_CF. The proposed method TCB provides higher specificity and lower fall out than the other classifiers.
4 Conclusion
The IDS for the IoT networking environment is implemented using machine learning techniques by constructing the custom feature dataset NSLKDD_CF. The custom features are constructed with the aim of reducing prediction time and increasing the accuracy of attack identification, which the results show was achieved. PSO and LASSO are used for feature selection to discard undesirable features. The ensemble hybrid machine learning classification algorithms HSDTSVM, HSDTKNN, and TCB classified the attacks in the two datasets, NSLKDD and NSLKDD_CF, and their performance was measured. TCB with tuned parameters outperformed the other classifiers on both datasets. This work is limited to two benchmark datasets. In the future, it can be evaluated on various real-time IoT datasets and implemented and deployed as a product.
References
1. Devaraju S, Ramakrishnan S (2014) Performance comparison for intrusion detection system using neural network with KDD dataset. ICTACT J Soft Comput 4(3):743–752
2. Phadke A, Kulkarni M, Bhawalkar P, Bhattad R (2019) A review of machine learning methodologies for network intrusion detection. In: Third national conference on computing methodologies and communication (ICCMC 2019), pp 272–275
3. Soni P, Sharma P (2014) An intrusion detection system based on KDD-99 data using data mining techniques and feature selection. Int J Soft Comput Eng (IJSCE) 4(3):1–8
4. Somwang P, Lilakiatsakun W (2012) Intrusion detection technique by using fuzzy ART on computer network security. In: 7th IEEE conference on industrial electronics and applications (ICIEA)
5. Horng S-J, Su M-Y, Chen Y-H, Kao T-W, Chen R-J, Lai J-L, Perkasa CD (2011) A novel intrusion detection system based on hierarchical clustering and support vector machines. Exp Syst Appl 38(1):306–313
6. El Boujnouni M, Jedra M (2018) New intrusion detection system based on support vector domain description with information metric. Int J Network Secur pp 25–34
7. Bhumgara A, Pitale A (2019) Detection of network intrusions using hybrid intelligent system. In: International conferences on advances in information technology, pp 500–506
8. Sree Kala T, Christy A (2019) An intrusion detection system using opposition based particle swarm optimization algorithm and PNN. In: International conference on machine learning, big data, cloud and parallel computing, pp 184–188
9. Rani D, Kaushal NC (2020) Supervised machine learning based network intrusion detection system for internet of things. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT)
10. Larriva-Novo X, Villagrá VA, Vega-Barbas M, Rivera D, Sanz Rodrigo M (2021) An IoT-focused intrusion detection system approach based on preprocessing characterization for cybersecurity datasets. Sensors 21:656. https://doi.org/10.3390/s21020656
11. Islam N, Farhin F, Sultana I, Kaiser MS, Rahman MS et al (2021) Towards machine learning based intrusion detection in IoT networks. CMC-Comput Mater Continua 69(2):1801–1821
12. Sapre S, Ahmadi P, Islam K (2019) A robust comparison of the KDDCup99 and NSL-KDD IoT network intrusion detection datasets through various machine learning algorithms
13. Houichi M, Jaidi F, Bouhoula A (2021) A systematic approach for IoT cyber-attacks detection in smart cities using machine learning techniques. In: Barolli L, Woungang I, Enokido T (eds) Advanced information networking and applications. AINA 2021. Lecture notes in networks and systems, vol 226. Springer, Cham. https://doi.org/10.1007/978-3-030-75075-6_17
14. Liang C, Shanmugam B, Azam S (2020) Intrusion detection system for the internet of things based on blockchain and multi-agent systems. Electronics 9(1120):1–27
15. Urmila TS, Balasubramanian R (2019) Dynamic multi-layered intrusion identification and recognition using artificial intelligence framework. Int J Comput Sci Inf Secur (IJCSIS) 17(2):137–147
16. Rahimunnisa K (2020) LoRa-IoT focused system of defense for equipped troops [LIFE]. J Ubiquitous Comput Commun Technol 2(3):153–177
17. Sivaganesan D (2021) Performance estimation of sustainable smart farming with blockchain technology. IRO J Sustain Wireless Syst 3(2):97–106. https://doi.org/10.36548/jsws.2021.2.004
18. Dr PK (2020) A sensor based IoT monitoring system for electrical devices using Blynk framework. J Electron Inform 2(3):182–187
Optimization of Patch Antenna with Koch Fractal DGS Using PSO
Sanoj Viswasom and S. Santhosh Kumar
Abstract An edge-fed patch antenna is designed to operate in the Wi-Fi band of 5.2 GHz. To improve the bandwidth and achieve multiband operation, a Koch fractal DGS structure was incorporated into the ground plane. On introduction of the fractal DGS, the antenna exhibits dual-band operation at 3.9 and 6.8 GHz. In order to recover the originally designed frequency of 5.2 GHz, the antenna structure was optimized using Particle Swarm Optimization (PSO). The optimized antenna resonates at 5.2 GHz and also at 3.5 GHz, which is the proposed frequency for 5G operations. The optimized antenna therefore exhibits dual-band operation and is suitable for Wi-Fi and 5G applications. It also provides good gain in the operating frequency bands. This novel antenna design approach provides dual-band operation with enhanced bandwidth in a compact size. The antenna structure was simulated and its performance parameters were evaluated using OpenEMS.
Keywords Microstrip antenna · Defected Ground Structure (DGS) · Koch Snowflake fractal · Particle Swarm Optimization (PSO)
1 Introduction Microstrip patch antennas find wide application in wireless communication systems because of their meritorious features like small size, light weight and easy fabrication using printed circuit technology [1]. The major drawback of the microstrip antenna includes narrow frequency bandwidth, spurious feed radiation, low power handling capability and single-band operation [2]. A number of approaches that can be adopted for designing dual-band antennas is proposed in [3]. All the antenna configurations proposed in this paper are based on Euclidean geometry. Of late, a considerable amount of interest has been directed to develop antennas based on fractal geometry [4]. Fractal antenna engineering is an emerging area in S. Viswasom (B) · S. Santhosh Kumar Department of ECE, College of Engineering, Trivandrum, Kerala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_12
159
160
S. Viswasom and S. Santhosh Kumar
antenna design that utilizes fractal shapes to design new antennas with improved features. Two of the most important properties of fractal antenna are Space Filling and Self Similarity properties. Fractals exhibit self-similarity as they consist of multiple scaled down versions of itself at various iterations. Hence, a fractal antenna can resonate at large number of resonant frequencies and exhibit multiband behavior [5]. The space filling property of antenna can be used to pack electrically larges antennas into small areas, leading to miniaturization of the antenna structures [4]. Fractal shapes has been used to improve the features of patch antennas [6–9]. Defected Ground Structures (DGS) refers to some compact geometry that is etched out as a single defect or as a periodic structure on the ground plane of a microwave printed circuit board. The DGS slots have a resonant nature. They can be of different shapes and sizes. Also, their frequency responses can vary with different equivalent circuit parameters [7]. The presence of DGS is found to exhibit a slow wave effect, which increases the overall effective length of the antenna, thereby reducing its resonant frequency leading to antenna miniaturization [10]. To achieve maximum slow wave effect, fractal structures can be etched to the ground plane. In [11], Koch curve fractal DGS structure has been etched in the ground plane of a circularly polarized patch antenna, which resulted in considerable improvement in terms of better radiation efficiency, optimal return loss bandwidth and size reduction. In [12] Sierpenski carpet fractal DGS structure has been incorporated into a microstrip patch to improve its performance and the structure optimized using PSO to achieve the desired performance characteristics. A microstrip patch antenna has been designed for an operating frequency of 5.2 GHz. The substrate material used for this design is FR4 glass epoxy having a relative permittivity of 4.4. 
A Koch Snowflake fractal structure has been introduced in the ground plane of the designed antenna for multiband operation and wideband behavior. The modified antenna with the DGS structure resonates at two new frequencies, 3.9 and 6.8 GHz. The antenna structure was further optimized using Particle Swarm Optimization (PSO), and the optimized antenna resonates at 3.5 and 5.2 GHz with reasonably good gain. OpenEMS [13] and Octave were used for the antenna simulation and analysis.
2 Antenna Design
This section discusses the methodology adopted to design the antenna. In the proposed work, an edge-fed patch antenna has been designed to operate in the Wi-Fi band at 5.2 GHz. Introducing a Koch Snowflake fractal DGS structure in the ground plane improves the performance in terms of multiband behavior. However, the frequency of operation then shifts from the originally designed frequency of 5.2 GHz. Using Particle Swarm Optimization, the patch antenna dimensions and the DGS structure are optimized so that the optimized antenna operates at 5.2 GHz and at a second resonant frequency of 3.5 GHz.
Optimization of Patch Antenna with Koch Fractal DGS Using PSO
2.1 Patch Antenna
The patch antenna is a hugely popular antenna used for a wide array of applications, such as satellite communication, mobile communication and aerospace applications. A basic rectangular patch antenna can be designed using the following equations [1]:

$$W = \frac{c}{2 f_o \sqrt{\dfrac{\varepsilon_r + 1}{2}}} \tag{1}$$

$$\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left[1 + \frac{12h}{W}\right]^{-1/2} \tag{2}$$

$$\Delta L = \frac{0.412\,h\,(\varepsilon_{reff} + 0.3)\left(\frac{W}{h} + 0.264\right)}{(\varepsilon_{reff} - 0.258)\left(\frac{W}{h} + 0.8\right)} \tag{3}$$

$$L = \frac{c}{2 f_o \sqrt{\varepsilon_{reff}}} - 2\Delta L \tag{4}$$

where W is the width of the patch antenna, $\varepsilon_r$ is the dielectric constant of the substrate, $\varepsilon_{reff}$ is the effective dielectric constant, L is the length of the patch, $\Delta L$ is the extended length, h is the substrate height, $f_o$ is the operating frequency and c is the speed of light in free space. Using the above design equations, a microstrip patch antenna was designed, and its dimensions are shown in Fig. 1. The microstrip antenna is fed using a microstrip line of characteristic impedance 50 Ω. The patch antenna has an input impedance of 200 Ω. To match the impedance of the patch antenna to that of the feed line, a quarter-wave transformer [14] is introduced between the antenna and the feed line.
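As a rough numerical check of Eqs. (1)–(4), the following Python sketch evaluates the patch dimensions for the stated design frequency (5.2 GHz) and relative permittivity (4.4). The constants 0.3 and 0.258 in the length-extension term follow the standard form of this expression in [1]. The substrate height of 1.6 mm is an assumption made only for illustration, since the text does not state it, so the computed length need not match Fig. 1. The function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in free space, m/s


def patch_dimensions(f0_hz, eps_r, h_m):
    """Evaluate Eqs. (1)-(4); returns (W, eps_reff, delta_L, L) in SI units."""
    # Eq. (1): patch width
    W = C / (2.0 * f0_hz * math.sqrt((eps_r + 1.0) / 2.0))
    # Eq. (2): effective dielectric constant
    eps_reff = (eps_r + 1.0) / 2.0 + (eps_r - 1.0) / 2.0 * (1.0 + 12.0 * h_m / W) ** -0.5
    # Eq. (3): length extension caused by fringing fields
    delta_L = (0.412 * h_m * (eps_reff + 0.3) * (W / h_m + 0.264)
               / ((eps_reff - 0.258) * (W / h_m + 0.8)))
    # Eq. (4): physical patch length
    L = C / (2.0 * f0_hz * math.sqrt(eps_reff)) - 2.0 * delta_L
    return W, eps_reff, delta_L, L


def quarter_wave_impedance(z_line, z_load):
    """Characteristic impedance of a quarter-wave matching section."""
    return math.sqrt(z_line * z_load)


# h = 1.6 mm is an ASSUMED substrate height, not a value taken from the text.
W, eps_reff, delta_L, L = patch_dimensions(5.2e9, 4.4, 1.6e-3)
print(f"W = {W * 1e3:.2f} mm, eps_reff = {eps_reff:.3f}, "
      f"delta_L = {delta_L * 1e3:.3f} mm, L = {L * 1e3:.2f} mm")
print(f"Quarter-wave section: {quarter_wave_impedance(50.0, 200.0):.0f} ohm")
```

The width depends only on the frequency and permittivity, so it can be compared directly with the original width listed in Table 1. The transformer impedance is the geometric mean of the 50 Ω line and the 200 Ω patch input impedance.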
2.2 Koch Snowflake Structure
To obtain multiband operation, a Koch snowflake DGS structure is introduced in the ground plane of the edge-fed microstrip antenna. A Koch curve is obtained by dividing a straight line section into three equal parts and replacing the middle part with a bent section (two sides of an equilateral triangle). In the succeeding iteration, each resulting edge is again divided into three equal parts and its middle part is replaced by a bent section. This process is repeated at every subsequent iteration. The iterative steps in the design of the Koch Snowflake fractal are shown in Fig. 2. A Koch snowflake fractal of third iteration has been etched into the ground plane of the proposed antenna, as shown in Fig. 3.
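The iterative construction described above can be sketched in a few lines of Python. This is only an illustration of the geometry, not the authors' CSXCAD model. The function names and the 10 mm side length are hypothetical choices.

```python
import math


def koch_snowflake(side, iterations):
    """Return the outline vertices (x, y) of a Koch snowflake, as a closed polygon."""
    height = side * math.sqrt(3.0) / 2.0
    # Equilateral triangle listed clockwise, so that a +60 degree rotation
    # of each edge direction points away from the interior.
    pts = [(0.0, 0.0), (side / 2.0, height), (side, 0.0)]
    cos60, sin60 = 0.5, math.sqrt(3.0) / 2.0
    for _ in range(iterations):
        new_pts = []
        for i, (x1, y1) in enumerate(pts):
            x2, y2 = pts[(i + 1) % len(pts)]
            dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
            a = (x1 + dx, y1 + dy)                 # end of first third
            b = (x1 + 2.0 * dx, y1 + 2.0 * dy)     # start of last third
            # Apex of the bend: middle third rotated by +60 degrees about a
            apex = (a[0] + dx * cos60 - dy * sin60,
                    a[1] + dx * sin60 + dy * cos60)
            new_pts.extend([(x1, y1), a, apex, b])
        pts = new_pts
    return pts


def perimeter(pts):
    """Length of the closed polygon through pts."""
    return sum(math.dist(p, pts[(i + 1) % len(pts)]) for i, p in enumerate(pts))


outline = koch_snowflake(10.0, 3)  # third iteration; 10 mm side is illustrative
print(len(outline), "vertices, perimeter =", round(perimeter(outline), 3), "mm")
```

Each iteration multiplies the number of edges by four and the perimeter by 4/3 while the outline stays within the same footprint, which is the space-filling behaviour exploited by the DGS.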
Fig. 1 Dimensions of the patch antenna
Fig. 2 Iterative steps in the fractal design of the Koch curve
3 Results and Discussion
The microstrip patch antenna was designed and simulated in openEMS through its Octave interface. OpenEMS is a free and open source electromagnetic field solver that uses the finite-difference time-domain (FDTD) technique. It supports both cylindrical and Cartesian coordinate systems. Octave provides a flexible scripting tool for OpenEMS.
Fig. 3 Koch snowflake fractal DGS structure of iteration 3
3.1 Patch Antenna
The reflection coefficient (S11), 2D and 3D radiation patterns of the patch antenna are shown in Figs. 4, 5 and 6.
Fig. 4 S11 of patch antenna
Fig. 5 2D pattern of patch antenna
Fig. 6 Patch antenna gain
The antenna resonates at 5.2 GHz, where its reflection coefficient is −18 dB, as shown in Fig. 4. Its 2D and 3D patterns are shown in Figs. 5 and 6. The gain of the antenna is 5.14, as shown in Fig. 6.
3.2 Patch Antenna with Fractal DGS
To improve the performance of the patch antenna, a Koch Snowflake fractal DGS structure was introduced in the ground plane. The antenna then operates at two frequencies, 3.9 and 6.8 GHz, so its operating frequency has shifted from the originally designed resonant frequency of 5.2 GHz. To recover the original operating frequency of 5.2 GHz, Particle Swarm Optimization has been applied to the antenna structure. Using PSO, the dimensions of the fractal DGS and of the patch have been optimized so that the antenna again resonates at 5.2 GHz, along with a new operating frequency of 3.5 GHz. The reflection coefficient (S11), 2D and 3D patterns of the patch antenna with fractal DGS, before optimization, are shown in Figs. 7, 8 and 9. The antenna resonates at two frequencies, 3.9 and 6.8 GHz, with reflection coefficients of −15 dB and −10 dB, respectively, as shown in Fig. 7. Its 2D and 3D patterns are shown in Figs. 8 and 9, respectively. The antenna gain is 3.376, as shown in Fig. 9.
Fig. 7 S11 of patch antenna with fractal DGS
Fig. 8 2D pattern of patch antenna with fractal DGS
Fig. 9 Gain of the patch antenna with fractal DGS
Fig. 10 PSO Implementation using OpenEMS
3.3 PSO Using OpenEMS
Particle Swarm Optimization is a population-based stochastic optimization algorithm motivated by the collective intelligent behavior exhibited by certain animals, such as swarms of bees or flocks of birds. PSO builds on the flocking models of Reynolds and Heppner and was introduced as an optimization algorithm by Kennedy and Eberhart in 1995. PSO is generally considered computationally more efficient than the Genetic Algorithm. The block diagram explaining the PSO implementation in OpenEMS is shown in Fig. 10. The substrate parameters and the desired operating frequency are given as input to OpenEMS, and the structure is modeled in the CSXCAD format. We set minimum and maximum values for the dimensions of the antenna (patch and fractal dimensions). We also set a seed value to initiate the PSO algorithm. A suitable fitness function is formed using the S-parameters, as given below:

$$F(w, l, L) = \frac{1}{\mathrm{Gain}(f)} + \lambda \times S_{11}(f) \tag{5}$$
where l and w are the length and width of the patch antenna, L denotes the dimensions of the fractal DGS, Gain(f) is the gain at the designed frequency of 5.2 GHz, S11(f) is the magnitude of the reflection coefficient at 5.2 GHz, and λ is the Lagrange multiplier. The dimensions that give the best fitness value are taken as the optimized dimensions. The attractive feature of this technique is that the antenna can be designed for a desired frequency (Table 1). The reflection coefficient (S11), 2D and 3D patterns of the optimized patch antenna with fractal DGS are shown in Figs. 11, 12 and 13. The optimized antenna operates at two frequencies, 3.5 and 5.2 GHz, with reflection coefficients of −23 dB and −16 dB, respectively, as shown in Fig. 11. Its 2D and 3D patterns are shown in Figs. 12 and 13, respectively. The antenna gain is 2.275, as shown in Fig. 13. A summary of the results is given in Table 2.
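The optimization loop of Sect. 3.3 can be illustrated with a minimal PSO sketch in Python. It is not the authors' openEMS/Octave implementation. The function `toy_simulate` is a hypothetical stand-in for a full-wave solver run and has no physical meaning. The bounds, the inertia and acceleration constants, the value of λ, the use of a linear |S11| magnitude and the choice to minimize the fitness of Eq. (5) are all assumptions made for illustration.

```python
import random


def fitness(dims, simulate, lam=1.0):
    """Eq. (5): 1/Gain + lambda * |S11|, both evaluated at the design frequency."""
    gain, s11_mag = simulate(*dims)
    return 1.0 / gain + lam * s11_mag


def toy_simulate(w, l, L):
    """HYPOTHETICAL surrogate for a solver run: returns (gain, |S11|).

    It is a smooth bowl with its best point at an arbitrary target; it does
    not model any antenna.
    """
    target = (15.0, 12.0, 20.0)
    err = sum((a - b) ** 2 for a, b in zip((w, l, L), target))
    gain = 3.0 / (1.0 + 0.05 * err)
    s11_mag = 1.0 - 0.9 / (1.0 + 0.05 * err)
    return gain, s11_mag


def pso(objective, bounds, n_particles=20, n_iter=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) inside the box given by bounds."""
    rng = random.Random(seed)  # seed value that initiates the swarm
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each dimension between its minimum and maximum value
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


bounds = [(10.0, 20.0), (8.0, 16.0), (5.0, 25.0)]  # illustrative limits for w, l, L in mm
best_dims, best_val = pso(lambda d: fitness(d, toy_simulate), bounds)
print("best (w, l, L):", [round(v, 3) for v in best_dims], "fitness:", round(best_val, 4))
```

In a real run, each call to the objective would launch one electromagnetic simulation, so the number of particles and iterations directly determines the total simulation time.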
Table 1 Optimized dimensions

| Dimensions (in mm) | l | w | L |
|---|---|---|---|
| Original | 12.56 | 17.56 | 10 |
| Optimized | 12 | 15.835 | 21.725 |

Fig. 11 S11 of the optimized antenna
Fig. 12 2D pattern of the optimized antenna
Fig. 13 Gain of the optimized antenna
Table 2 Result summary

| Antenna structure | Resonant frequency | S11 value | Bandwidth at 5.2 GHz |
|---|---|---|---|
| Patch antenna | 5.2 GHz | −18 dB | 50 MHz |
| Patch antenna with DGS | 3.9 & 6.8 GHz | −15 & −10 dB | – |
| Optimized antenna | 3.5 & 5.2 GHz | −23 & −16 dB | 100 MHz |
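To put the S11 values of Table 2 in perspective, the short Python sketch below converts a reflection coefficient in dB into its linear magnitude, the fraction of incident power that is reflected, and the corresponding VSWR. These are standard conversions; the function name is hypothetical.

```python
def s11_db_to_metrics(s11_db):
    """Return (|Gamma|, reflected power fraction, VSWR) for an S11 value in dB."""
    gamma = 10.0 ** (s11_db / 20.0)       # linear magnitude of the reflection coefficient
    reflected_fraction = gamma ** 2       # share of incident power reflected
    vswr = (1.0 + gamma) / (1.0 - gamma)  # voltage standing wave ratio
    return gamma, reflected_fraction, vswr


for s11_db in (-18.0, -15.0, -10.0, -23.0, -16.0):  # values listed in Table 2
    gamma, refl, vswr = s11_db_to_metrics(s11_db)
    print(f"{s11_db:6.1f} dB -> |Gamma| = {gamma:.3f}, "
          f"reflected = {100.0 * refl:.1f} %, VSWR = {vswr:.2f}")
```

For example, −10 dB corresponds to 10% of the incident power being reflected, which is why it is commonly used as the threshold for an acceptable match.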
4 Conclusion
The design and simulation of a microstrip patch antenna for an operating frequency of 5.2 GHz are presented in this paper. A Koch Snowflake fractal DGS structure was incorporated into the ground plane to improve the antenna performance. However, the resonant frequency then deviated from the originally designed frequency of 5.2 GHz, so the antenna structure was optimized using PSO. The optimized antenna resonates at 3.5 GHz and 5.2 GHz and shows a marked improvement in bandwidth, as shown in Table 2. The antenna can be used for Wi-Fi and 5G mobile applications.
References
1. Balanis CA (2005) Antenna theory: analysis and design, 3rd edn. Wiley, New York
2. Pozar DM (1992) Microstrip antennas. Proc IEEE 80(1):79–81
3. Maci S, Biffi Gentili G (1997) Dual-frequency patch antennas. IEEE Antennas Propag Mag 39(6):13–20
4. Werner DH, Ganguly S (2003) An overview of fractal antenna engineering research. IEEE Antennas Propag Mag 45(1)
5. Sindou M, Ablart G, Sourdois C (1999) Multiband and wideband properties of printed fractal branched antennas. Electron Lett 35:181–182
6. Petko JS, Werner DH (2004) Miniature reconfigurable three-dimensional fractal tree antennas. IEEE Trans Antennas Propag 52(8):1945–1956
7. Masroor I, Ansari JA, Saroj AK (2020) Inset-fed cantor set fractal multiband antenna design for wireless applications. In: 2020 International Conference for Emerging Technology (INCET), pp 1–4
8. Yu Z, Yu J, Ran X (2017) An improved Koch snowflake fractal multiband antenna. In: 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp 1–5
9. Tiwari R (2019) A multiband fractal antenna for major wireless communication bands. In: 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp 1–6
10. Guha D, Antar YMM (2011) Microstrip and printed antennas: new trends, techniques and applications, 1st edn. Wiley, UK
11. Ratilal PP, Krishna MGG, Patnaik A (2015) Design and testing of a compact circularly polarised microstrip antenna with fractal defected ground structure for L-band applications. IET Microwaves Antennas Propag 9(11):1179–1185
12. Kakkar S, Rani S (2013) A novel antenna design with fractal-shaped DGS using PSO for emergency management. Int J Electron Lett 1(3):108–117
13. Liebig T, Rennings A, Erni D (2012) OpenEMS: a free and open source Cartesian and cylindrical EC-FDTD simulation platform supporting multi-pole Drude/Lorentz dispersive material models for plasmonic nanostructures. In: 8th Workshop on numerical methods for optical nanostructures
14. Pozar DM (2012) Microwave engineering, 4th edn. Wiley, New York
Artificial Intelligence-Based Phonocardiogram: Classification Using Cepstral Features
A. Saritha Haridas, Arun T. Nair, K. S. Haritha, and Kesavan Namboothiri
Abstract When cardiovascular issues arise in a cardiac patient, it is essential to diagnose them as early as possible, because monitoring and treatment are then less difficult than at a later age. Paediatric cardiologists have a difficult time keeping track of their patients' cardiovascular condition. To address this, a phonocardiogram (PCG) device was created in combination with MATLAB software based on artificial intelligence (AI) for automatic classification of the heart state as normal or pathological. Due to the safety concerns associated with COVID-19, testing on school-aged children is currently being explored. The goal of this work is to detect cardiac conditions using PCG analysis and machine learning methods while operating on a limited amount of computing resources. This makes it possible for anybody, including non-medical professionals, to screen for cardiac issues. The existing system consists of a separate portable electronic stethoscope, headphones linked to the stethoscope, a sound-processing computer, and specifically developed software for capturing and analysing heart sounds. However, this approach is more difficult and time-consuming, and its accuracy is lower as a result. According to statistical studies, even expert cardiologists achieve an accuracy of only approximately 80%, while primary care doctors and medical students usually attain an accuracy of between 20 and 40%. PCG signals provide valuable information regarding heart diseases, and because heart sounds are non-stationary, they must be modelled and analysed robustly even in the presence of noise. Spectral characteristics of the PCG are used to characterise heart sounds in order to diagnose cardiac conditions. Prompted by the effectiveness of cepstral features in speech signal classification, we classify normal and abnormal sounds using cepstral coefficients of the PCG waveform for fast and effective identification.
On the basis of their statistical properties, we suggest a new feature set for cepstral coefficients. The PhysioNet PCG training dataset is used in the experiments. This work compares KNN with SVM classifiers, indicating that KNN is more accurate. Furthermore, the results indicate that statistical features derived from PCG Mel-frequency cepstral coefficients outperform both frequently used wavelet-based features and conventional cepstral coefficients, including MFCCs.
Keywords Phonocardiogram · AI · Health care · Cardiovascular disorders
A. Saritha Haridas (B) Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala, India
A. T. Nair · K. Namboothiri KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha Government Engineering College, Kannur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_13
1 Introduction
Worldwide, cardiopulmonary disease is the major cause of mortality. Congenital heart disease may not exhibit symptoms until later in life, at which time treatment becomes difficult. As a result, it is important to study childhood cardiac disorders. Automatic analysis of heart sound waves (phonocardiograms) using artificial intelligence algorithms is one straightforward approach in this field. Such a setup enables non-medical persons to perform cardiac tests. For centuries, physicians have used auscultation to diagnose cardiovascular disease (CVD). The stethoscope was later introduced to make auscultation more comfortable for the patient, and the same device is still widely used in modern medicine to diagnose cardiovascular illness. Diagnosing CVD by auscultation requires significant training and expertise in recognising abnormal heart sounds. According to statistical studies, even expert cardiologists achieve only approximately 80% accuracy, whereas primary care doctors and medical students usually reach an accuracy of around 20–40%. The phonocardiogram has been shown to be very successful and risk-free in detecting heart abnormalities early. Additionally, PCG measuring equipment is simple to use and cost-effective. A homemade electronic phonocardiograph is used to record and analyse the heart sound from the chest wall in order to identify whether the heart activity is normal or pathological. Patients may then be referred to a specialist for additional evaluation and treatment based on the results. Due to the immense promise of this field, much research is being conducted to develop automated methods for detecting cardiac problems using the PCG signal. Different cardiac illnesses can be categorised based on the nonlinear variations of the heart sound, which is a physiological signal.
For automated cardiac auscultation, we selected wavelet-based feature extraction and a support vector machine (SVM) because of their better capacity to model and assess sequential data in the presence of noise. The dataset used for training and testing was obtained from the 2016 PhysioNet/Computing in Cardiology Challenge [1]. Its phonocardiograms were recorded using sensors placed on four typical areas of the human body: the pulmonic area, the aortic area, the mitral area and the tricuspid area. The analysis in this work is based on these phonocardiograms. It is anticipated that the results of this research will have significant ramifications for the early identification of cardiac disease in school-age children in India. This research work summarises the idea behind our effort, including the hardware and software phases, as well as the works that inspired us. We addressed the problem of heart sound analysis by developing a low-cost and effective biotechnological system for capturing and processing PCG signals.
2 Literature Review
Luisada et al. [2] performed a clinical and graphic study on 500 children of school age, using the phonocardiogram technique to collect data. Three examiners conducted clinical auscultation, and the phonocardiographic test revealed 114 (22.8%) abnormal cases. The study found no correlation between the heart sound and the child's height or weight. Tae H. Joo et al. examined phonocardiograms (PCGs) of aortic heart valves [3]. They identified frequency domain features by using a parametric signal modelling method designed specifically for sound wave classification. The model provides a high-resolution spectral estimate, from which the frequency domain features are derived. PCGs were classified in two stages: feature selection first, followed by classification. The classifiers were trained on the locations of the two largest spectral peaks. The classifier correctly identified 17 patients out of a total of 20 cases in the training set. A method for screening children for murmurs and adults for valve abnormalities was established by Lukkarinen et al. [4] for situations where ultrasonography exams are not readily accessible. The equipment comprises a stand-alone electronic stethoscope, stethoscope-mounted headphones, a sound-capable personal computer, and software applications for capturing and analysing heart sounds. The system makes it possible to perform a number of operations and studies on heart sounds and murmurs in the 20 Hz to 22 kHz frequency range. The authors of [5] highlighted the essential phases involved in the generation and interpretation of PCG signals. Their article discusses how to filter and extract features from PCG signals using wavelet transforms. Additionally, the authors highlight the gaps that exist between existing methods of heart sound signal processing and their clinical application.
The article highlights the limitations of current diagnostic methods, namely their complexity and expense. Additionally, it addresses the need for systems capable of correctly acquiring, analysing and interpreting heart sound data to aid in clinical diagnosis.
B. Artificial Intelligence-Based Techniques
Shino et al. [6] present a technique for automatically classifying the phonocardiogram using an Artificial Neural Network (ANN). A national phonocardiogram screening of Japanese students was used to validate the method. The test data comprised 44 systolic murmurs, 61 innocent murmurs and 36 normal recordings. The melodic murmur was effectively separated from the potentially dangerous systolic murmur by frequency analysis. The procedure is very helpful to medical professionals when making the final decision. Strunic et al. [7] created an alternative method for PCG classification by using Artificial Neural Networks (ANN) as a detector and classifier of heart murmurs. Heart sounds were classified into three categories (normal, aortic stenosis and aortic regurgitation) using both simulated and real patient heart sounds. They achieved a classification accuracy of up to 85.74%. For simulated sounds, the accuracy of the ANN system was also shown to correlate strongly with that of a group of medical students. Using convolutional neural networks, Sinam Singh et al. [8] developed an efficient technique for classifying PCGs. They generated two-dimensional scalogram images from the PhysioNet 2016 challenge sound data using the continuous wavelet transform (CWT) and used these images with a pre-trained AlexNet convolutional neural network. The proposed approach achieved favourable results while avoiding the complexity of segmentation.
3 Materials and Methods
Phonocardiography (PCG) is a method for capturing and visualising the sounds produced by the human heart during a cardiac cycle [9]. It is used in the diagnosis and treatment of heart failure. The recording obtained with this technique is called a phonocardiogram. The sounds are produced by various dynamic processes within the circulatory system, such as the relaxation of the atria and ventricles, valve motion and blood flow. For screening and detecting heart rhythms in healthcare settings, the well-known stethoscope method has long been the gold standard. Cardiac auscultation is the assessment of the acoustic properties of heart sounds and murmurs, including their frequency, intensity, number, duration and quality. One significant disadvantage of conventional auscultation is that it relies on the subjective judgement of the physician, which may lead to mistakes in sound perception and interpretation, thus impairing the accuracy of the diagnosis. Four distinct heart sounds are produced during a cardiac cycle. At the beginning of systole, the first heart sound, abbreviated S1, is generated by the turbulence caused by the nearly simultaneous closing of the mitral and tricuspid valves. The closing of the aortic and pulmonic valves produces the second heart sound (S2), which is represented by the term "dub". When a stethoscope is placed on the chest, the first and second heart sounds are easily distinguishable in a healthy heart, whereas the third and fourth heart sounds are much harder to detect (Fig. 1). The low-frequency third heart sound (S3) is usually produced by the ventricular walls vibrating in response to the abrupt distension caused by the pressure difference between the ventricles and the atria. It is normally heard only in young people and in individuals suffering from heart problems or ventricular dilatation [10]. S4 is very seldom heard in a normal heart; it is produced by vibrations in the expanding ventricles caused by atrial contraction. Each of the four heart sounds has a distinct frequency range: 50–150 Hz for S1, 50–200 Hz for S2, 50–90 Hz for S3 and 50–80 Hz for S4. Moreover, S3 starts 120–180 ms after S2, and S4 begins 90 ms before S1.
Fig. 1 A typical phonocardiogram showing the S1, S2, S3 and S4 sounds
4 Existing System
The well-known stethoscope technique is the usual method of screening and diagnosing heart sounds in primary health care settings. Cardiac auscultation is the assessment of the acoustic properties of heart sounds and murmurs, including their frequency, intensity, number, duration and quality. One significant disadvantage of this technique is that it relies on the subjective judgement of the physician, which may lead to mistakes in sound perception, thus impairing the validity of the diagnostic information. Figure 2 depicts a block diagram of a common phonocardiogram configuration.
Fig. 2 Schematic representation of the phonocardiogram setup
The sound waves are detected using a sensor, most often a high-fidelity microphone. After the detected signal has been processed by a signal conditioner such as a pre-filter or amplifier, it is displayed or saved on a personal computer. The main hardware components of this setup are the CZN-15E electret microphone, two NE5534P amplifiers, a transducer block, a connector (3.5 mm jack) for signal transmission to the computer, and a 12 V DC power supply. The sensor is an electret microphone, which does not need an external power source for polarisation because the electret material is permanently charged. The use of the CZN-15E electret microphone in this system is being explored [11, 12]. Table 1 lists the suitable electret microphone (CZN-15E) together with its physical characteristics. In response to the heart's vibrations, vibrating air particles reach the diaphragm and change the distance between the plates. This movement of the diaphragm relative to the electret-coated back plate generates a voltage. The voltage produced is very low, so it must be fed to an amplifier. The amplifier for audio use should have a low noise floor and low power consumption. The NE5534P is a low-noise operational amplifier suited to this purpose.
5 Proposed System
The hardware component of this project is not functional, so only the simulation part is used. It consists of a few simple stages, for example to determine the cepstral features. Because heart sounds are non-stationary, the wavelet transform is an appropriate technique for analysing them: it represents the data jointly in time and frequency. Cepstrum analysis has several advantages. First, the cepstrum is a representation used in homomorphic signal processing to convert convolutionally mixed signals (such as a source and a filter) into sums of their cepstra, so that the components can be separated linearly. Second, the power cepstrum is a useful feature vector for modelling audio signals. Feature extraction is the most important component of the pattern recognition process, since the accuracy of the classifier depends on it. To measure the features of each cardiac cycle, the complete cycle is analysed using cepstral coefficients, which efficiently capture the log-spectral distance between two frames. In this section, we compare two models, K-Nearest Neighbour (KNN) and Support Vector Machines (SVM) [13], to determine which is the better choice for this binary classification problem. KNN achieves the higher accuracy.
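The statement that the cepstrum turns a convolution into a sum can be checked numerically. The Python sketch below uses only the standard library and a naive DFT; the two short sequences are arbitrary illustrative signals (a decaying "source" and a two-tap "filter"), not PCG data, and circular convolution is used so that the identity holds exactly for a finite DFT length.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for short sequences."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [c.real for c in idft(log_mag)]


def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


N = 16
x = [0.5 ** t for t in range(N)]        # illustrative "source" sequence
h = [1.0, -0.4] + [0.0] * (N - 2)       # illustrative "filter" sequence
y = circular_convolve(x, h)             # convolutionally mixed signal

c_x, c_h, c_y = real_cepstrum(x), real_cepstrum(h), real_cepstrum(y)
max_err = max(abs(cy - (cx + ch)) for cx, ch, cy in zip(c_x, c_h, c_y))
print("largest deviation between cepstrum(y) and cepstrum(x) + cepstrum(h):", max_err)
```

The deviation is at the level of floating-point rounding, which is the additive property that homomorphic processing relies on.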
Table 1 Review on conventional methods

| Author (Citation) | Methodology | Features | Challenges |
|---|---|---|---|
| Tae H. Joo et al. | PCG | Frequency domain features; high-resolution spectral estimate | The denoising method can't remove noises coming from children crying and moving the recording |
| Lukkarinen S et al. | PCG | A system for screening murmurs of children and valve defects of adults; frequency ranges from 20 Hz to 22 kHz | Mild, moderate and severe murmurs were not graded |
| Hideaki Shino et al. | ANN | Nationally verified simulated and recorded patient heart sounds were tested and classified | The sample size is relatively low |
| S. L. Strunic et al. | ANN | Implemented as a detector and classifier of heart murmurs; classified up to 85.74% accuracy | The method may not work well if a nonprofessional volunteer records the PCG signal |
| F. Rios-Gutierrez et al. | Jack-Knife method | An iterative process in which one sample was left out each time | The predicted value is quantified to 0 (≤0.5) or 1 (>0.5) by a threshold of 0.5 |
| Sinam Ajitkumar Singh et al. | CNN, SVM | Effective way of classifying PCG; more accurate than ANN | The accuracy rates were closely correlated |
| Iga Grzegorczyk et al. | Hidden Markov model | The segmentation of the PCG signals is performed | The best overall score achieved in the official phase of the PhysioNet challenge is 0.79 with specificity 0.76 and sensitivity 0.81 |
| Mawloud Guermoui et al. | EGG, PCG, SVM | Characteristic features extracted from PCGs | A low-dimensional feature space tested on a relatively big dataset |
| James H. and Robert S. Lees et al. | Parametric signal modelling method | Frequency domain features suitable for the classification of the valve state can be derived | |
5.1 Cepstral Coefficients
Mel-frequency cepstral coefficients (MFCCs) were initially proposed to aid the recognition of monosyllabic words in continuously spoken sentences, rather than for speaker identification. Computing cepstral coefficients is a way of artificially imitating the human hearing system, on the assumption that the human ear is a reliable speaker recogniser. Cepstral features are based on the known variation of the human ear's critical bandwidths with frequency. In order to preserve the phonetically significant features of the speech signal, filters are spaced linearly at low frequencies and logarithmically at high frequencies. Most speech signals consist of tones with different frequencies; each tone has an actual frequency f (Hz) and a subjective pitch measured on the Mel scale. The Mel-frequency scale is linearly spaced below 1000 Hz and logarithmically spaced above 1000 Hz. As a reference point, a 1 kHz tone played at 40 dB above the perceptual hearing threshold is defined to have a pitch of 1000 mels. The coefficients of a signal are calculated by decomposing it with a filter bank and computing the energy in each frequency band. They are then obtained as the discrete cosine transform (DCT) of the real logarithm of the short-term energy spectrum expressed on the Mel-frequency scale. Cepstral coefficients are used in speech recognition applications such as flight reservation systems, recognition of phone numbers spoken into a telephone, and voice recognition systems used for security. A number of modifications to the basic method have been suggested in order to improve robustness, including raising the log-mel amplitudes to a suitable power (about 2 or 3) before applying the DCT, which reduces the impact of the low-energy components.
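The two numerical ingredients described above, the Mel scale and the DCT of log band energies, can be sketched as follows. The text does not give a formula for the Mel scale; the widely used expression mel = 2595 log10(1 + f/700) is assumed here. The function names are hypothetical, and the band energies would in practice come from a Mel-spaced filter bank, which is not shown.

```python
import math


def hz_to_mel(f_hz):
    """Common Mel-scale approximation (an assumption; the text gives no formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def dct_ii(values, n_coeffs):
    """Unnormalised type-II discrete cosine transform, first n_coeffs terms."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def cepstral_coefficients(band_energies, n_coeffs):
    """DCT of the log of the (positive) filter-bank band energies."""
    log_energies = [math.log(e) for e in band_energies]
    return dct_ii(log_energies, n_coeffs)


print("1000 Hz in mels:", round(hz_to_mel(1000.0), 1))
# Illustrative band energies only; real values come from a Mel filter bank.
print(cepstral_coefficients([4.0, 3.0, 2.5, 1.0, 0.8, 0.5, 0.3, 0.2], 4))
```

With this formula a 1 kHz tone maps to approximately 1000 mels, consistent with the reference point given in the text. A perfectly flat set of band energies yields zero for every coefficient after the first, which shows that the higher coefficients describe the shape of the spectrum rather than its overall level.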
5.2 KNN Classifier The K-Nearest Neighbour algorithm is a fundamental component of Machine Learning. It is founded on the technique of Supervised Learning. To maximise accuracy, this approach makes an assumption about the similarity between the new case/data and the existing cases and assigns the new instance to the category that is most similar to the existing categories. The K-NN method keeps all available data and classifies new data points according to their similarity to previously classified data. This means that as new data is generated, it may be rapidly categorised into one of the suitable categories using the K-NN method. Although the K-NN technique may be used to solve regression as well as classification problems, it is most commonly employed to solve classification difficulties. K-NN is a non-parametric technique, which means that it makes no assumptions regarding the data used as a baseline. It
Artificial Intelligence-Based Phonocardiogram: Classification …
Fig. 3 Diagram of KNN
is commonly referred to as a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and acts on it only when a new point has to be classified. During the training phase, the K-NN algorithm simply saves the dataset, and it later assigns new data to the category whose stored examples are most similar to it. Consider the following scenario: there are two categories, A and B, and a new data point x1 that has to be allocated to one of them. K-NN is well suited to this type of problem, since it readily identifies the category of a given data point (Fig. 3). The operation of K-NN can be described by the following algorithm:
Step-1: Choose the number K of neighbours.
Step-2: Compute the Euclidean distance from the new data point to each point in the training set.
Step-3: Take the K nearest neighbours according to the computed Euclidean distances.
Step-4: Count the number of data points in each category amongst these K neighbours.
Step-5: Assign the new data point to the category with the most neighbours.
Step-6: The model is ready.
Assume we have a new data point and need to assign it to the appropriate category (Fig. 4):
• First, we select the number of nearest neighbours; here k = 5.
• Next, we calculate the Euclidean distance between the new data point and each existing data point. The Euclidean distance is the straight-line distance between two points, familiar from geometry. For two points $(p_1, p_2)$ and $(q_1, q_2)$ it is $d = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}$.
• From these distances we determine the five nearest neighbours, of which three are in category A and two in category B.
A. Saritha Haridas et al.
Fig. 4 Diagram of classification 1
• As we can see, three of the five nearest neighbours belong to category A, so the new data point is assigned to category A (Fig. 5).
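The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the two-dimensional points are hypothetical, chosen so that the five nearest neighbours of the new point split three to two in favour of category A, as in the worked example.

```python
import math
from collections import Counter

def knn_classify(training, query, k=5):
    """Return the majority label among the k training points nearest to query."""
    # training is a list of ((x, y), label) pairs; distances are Euclidean.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points for categories A and B.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((0.5, 0.5), "A"),
    ((4.0, 4.0), "B"), ((3.0, 3.0), "B"), ((3.5, 2.5), "B"), ((6.0, 6.0), "B"),
]
new_point = (2.4, 2.2)
predicted = knn_classify(training, new_point, k=5)
```

An odd value of k is used so that a two-class vote cannot end in a tie.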
5.3 SVM Algorithm The Support Vector Machine (SVM) is a widely used supervised learning method for both classification and regression problems. In machine learning, however, it is mostly used for classification. The goal of the SVM method is to find the optimal line or decision boundary
Fig. 5 Diagram of classification 2
Fig. 6 Diagram of classification 1
that divides n-dimensional space into classes, so that future data points can easily be placed in the correct category. This optimal decision boundary is called a hyperplane. The SVM algorithm selects the extreme points/vectors that help to define the hyperplane. These extreme cases are called support vectors, which gives the technique its name, Support Vector Machine. The diagram illustrates two distinct categories separated by a decision boundary or hyperplane (Fig. 6).
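The geometry behind this description can be illustrated with a small sketch. It does not train an SVM; it takes a hand-chosen separating hyperplane and hypothetical labelled points (both assumptions made for illustration) and shows how the margin and the support vectors follow from the distances to the boundary.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if signed_distance(w, b, x) >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 5 = 0 and labelled points (label is +1 or -1).
w, b = (1.0, 1.0), -5.0
points = [
    ((1.0, 1.0), -1), ((2.0, 2.0), -1), ((1.0, 2.0), -1),
    ((3.0, 3.0), 1), ((4.0, 4.0), 1), ((4.0, 3.0), 1),
]

# A positive value means the point lies on the correct side of the boundary.
margins = [label * signed_distance(w, b, x) for x, label in points]
margin = min(margins)
# The points that attain the smallest distance are the support vectors.
support_vectors = [x for (x, _), m in zip(points, margins) if math.isclose(m, margin)]
```

Only the support vectors determine where the boundary lies; moving any of the other points slightly would leave the margin unchanged.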
6 Experimental Setup
[Figure: block diagrams of the training and testing phases]
6.1 Hardware Requirements A software requirements specification (SRS) is a comprehensive description of the software system to be developed, covering both functional and non-functional requirements. The SRS is drawn up according to the agreement reached between the client and the contractors, and it may include use cases showing how the user will interact with the system. The SRS document should state every requirement that the system has to meet. To build a software system, we must first have a comprehensive understanding of it, which requires continuous communication with the customers. A well-written SRS defines how the software system will interact with all internal modules, hardware, other programmes and human users under a wide range of real-world circumstances. Testers must understand all of the information given in the SRS in order to avoid errors in test cases and their expected results, so it is highly recommended that SRS documents be reviewed thoroughly before test cases or a testing strategy are developed. MATLAB is important because of its capabilities for matrix computation. Today, we require an environment in which computation, programming and visualisation of numerical data can be carried out together, and hence a fourth-generation language that supports high-level programming. MATLAB was created by MathWorks.
MATLAB supports matrix manipulation, computation, plotting of data and functions, algorithm development and user interface design; it can integrate programmes written in other languages, including FORTRAN, C++, Java and C; and it supports data analysis and the creation of custom applications and models. It provides a large number of built-in commands and mathematical functions for mathematical computation, plot generation and numerical methods, which makes it a very useful tool for numerical calculations. The hardware requirements are listed in Table 2. MATLAB has several important features:
• It is used for numerical computation and application development.
• It provides an interactive environment for problem solving, design and iterative exploration.
• Its library of mathematical functions covers statistics, filtering, numerical integration, linear algebra, ordinary differential equations and optimisation, and it includes built-in tools for graphical data visualisation and custom plots.
Table 2 Hardware requirement
Component    Specification
Processor    PC with a Core i3 processor (recommended)
RAM          4 GB (recommended)
Hard disk    320 GB (recommended)
Table 3 Commonly used MATLAB commands
Command    Purpose
clc        Clears command window
clear      Removes variables from memory
exist      Checks for existence of file or variable
global     Declares variables to be global
help       Searches for a help topic
lookfor    Searches help entries for a keyword
quit       Stops MATLAB
who        Lists current variables
whos       Lists current variables (long display)
• It provides development tools for improving code quality and maximising performance, and tools for building graphical user interfaces.
• It also offers tools for integrating MATLAB computations with external applications and languages such as Microsoft Excel, .NET, Java and C.
MATLAB is also widely used across a range of applications. It is an interactive programme that lets users carry out numerical computation and visualise data. Commands are entered at the prompt ">>" in the command window. A few basic commands are used frequently; they are listed in Table 3.
A vector is a one-dimensional array of numbers. MATLAB has two types of vector:
Row vectors: created by enclosing the elements in square brackets and separating them with commas or spaces.
Column vectors: created by enclosing the elements in square brackets and separating them with semicolons.
Plotting To create a plot in MATLAB, we follow these steps:
1. Define x by specifying the range of values of the variable x over which the function is to be plotted.
2. Define the function y = f(x).
3. Call the plot command, plot(x, y), to draw the graph.
The plot can be refined further by adding a title, labelling the x- and y-axes, adding grid lines and adjusting the axes.
7 Experimental Results 7.1 Training The dataset is preprocessed before training begins. Randomised augmentation is used to enlarge the training dataset; it also helps the trained network to be insensitive to distortions in the image data. Preprocessing comprises resizing and grayscale conversion. The samples are classified into two groups, “Normal” and “Abnormal”. Training progress can be monitored by tracking several metrics. When the “Plots” option in trainingOptions is set to “training-progress” and the network is trained, trainNetwork creates a figure and displays training metrics at every iteration. Each iteration estimates the gradient and updates the parameters of the network. If the training options include validation data, the figure also displays validation metrics each time trainNetwork validates the network. It reports training accuracy, validation accuracy and training loss (Fig. 7). The confusion matrix and the receiver operating characteristic (ROC) curve illustrate the system’s performance. The confusion matrix chart is constructed from the true labels and the predicted labels. The rows of the confusion matrix correspond to the true class, and its columns to the predicted class. Diagonal and off-diagonal cells represent correctly classified and incorrectly classified observations, respectively (Fig. 8).
Fig. 7 Signal analyser
Fig. 8 Confusion matrix
In this binary classification, the first green cell denotes the positive (abnormal) class, whereas the second green cell denotes the negative (normal) class. We selected a total of 19 abnormal signals and 29 normal signals for testing. For the abnormal signals, the true positives (TP) number 19 (green in the confusion matrix) and the false negatives (FN) number zero (pink). That is, all 19 abnormal signals are correctly predicted as abnormal, which corresponds to one hundred per cent accuracy for this class. For the 29 normal signals, five false positives (FP) and twenty-four true negatives (TN) are displayed. That is, 5 normal signals are incorrectly classified as abnormal, whereas 24 normal signals are correctly classified as normal. This equates to an accuracy of 82.8% for this class. Overall, the accuracy is 95%. The receiver operating characteristic (ROC) curve is a graph that illustrates the performance of a classification model at all classification thresholds. The curve plots the true positive rate (y-axis) against the false positive rate (x-axis). True positive rate (TPR) is another name for recall. It is defined as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
False positive rate (FPR) is defined as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
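As a small worked example, the two rates can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the counts quoted in the text for the 19 abnormal and 29 normal test signals and treats "abnormal" as the positive class; the function name and the returned dictionary keys are illustrative choices, not part of the original work.

```python
def binary_metrics(tp, fn, fp, tn):
    """Rates derived from the four cells of a binary confusion matrix."""
    return {
        "tpr": tp / (tp + fn),          # true positive rate (recall, sensitivity)
        "fpr": fp / (fp + tn),          # false positive rate
        "specificity": tn / (tn + fp),  # true negative rate, equal to 1 - FPR
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Counts as described in the text, with abnormal taken as the positive class.
metrics = binary_metrics(tp=19, fn=0, fp=5, tn=24)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```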
The receiver operating characteristic (ROC) curve depicts the relationship between TPR and FPR over a range of classification thresholds. Lowering the threshold for positive classification labels more items as positive, which increases both false positives and true positives. A typical ROC curve is shown in Fig. 9. To compute the points on a ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there is an efficient, sorting-based algorithm that provides this information, called AUC. In this case, the curve obtained is nonlinear (Fig. 9). AUC stands for “Area Under the ROC Curve”. That is, AUC measures the entire two-dimensional area underneath the ROC curve (think of integral calculus) from (0,0) to (1,1).
Dataset Testing Results:
Abnormal Case: see Fig. 10.
Normal Case: see Fig. 11 and Table 4.
Fig. 9 ROC Curve
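The sorting-based computation of ROC points and the area under the curve can be sketched as follows. The scores and labels are hypothetical values invented for illustration, and the trapezoidal rule is one common way of computing the area; the chapter does not say how its own curve was computed.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping the threshold over sorted scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only once all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores; label 1 = abnormal, 0 = normal.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

A single sort of the scores is enough to trace the whole curve, which is why no repeated evaluation at different thresholds is needed.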
Fig. 10 PCG Classification result (Abnormal)
Fig. 11 PCG Classification result (Normal)
Table 4 Features of ROC curve
Metric         Value
Accuracy       0.899
Sensitivity    1
Specificity    0.79167
Precision      0.82759
Recall         1
f_measure      0.90566
g mean         0.88976
8 Future Scope In future work, electrocardiogram (ECG) data should likewise be analysed using the artificial intelligence (AI) approach. Rather than merely classifying cardiovascular conditions as normal or abnormal, this future work will identify the specific diseases by name. Finally, it is recommended that the PCG and ECG techniques be integrated and applied to heart disease diagnosis in order to improve the prediction of coronary artery disease.
9 Conclusion This research comprises three stages. The first stage involves gathering phonocardiogram data, and the second involves creating an artificial intelligence-based computer system that automatically distinguishes normal from pathological heart sounds. The third stage is cardiovascular disease screening of a limited group using PCG, carried out as part of social responsibility. The procedure has been completed in its entirety. After the PCG signal was acquired, features were extracted using cepstral coefficients and classification was carried out using the KNN and Support Vector Machine (SVM) techniques. KNN gave the best results. We used PCG data from the well-known PhysioNet online service for training and testing. The training procedure is significantly faster than with previous feature extraction techniques.
References
1. Moody B, Li-wei H, Johnson I (2016) Classification of normal/abnormal heart sound recordings: the PhysioNet/computing in cardiology challenge 2016. In: International conference on computing in cardiology (CinC), Vancouver, BC, Canada
2. Luisada A, Haring OM, Aravanis C (1958) Murmurs in children: a clinical and graphic study in 500 children of school age. Brit Heart J 48:597–615
3. Joo TH, James H, Lees RS (1983) Pole-zero modeling and classification of phonocardiograms. IEEE Trans Biomed Eng BME-30:110–118
4. Lukkarinen S, Noponen AL, Skio K (1997) A new phonocardiographic recording system. J Comput Cardiol 24:117–120
5. Emmanuel BS (2012) A review of signal processing techniques for heart sound analysis in clinical diagnosis. J Med Eng Technol 36:303–307
6. Shino H, Yoshida H, Sudoh J (1996) Detection and classification of systolic murmur for phonocardiogram screening. In: Proceedings of the 18th Annual international conference of the IEEE Engineering in Medicine and Biology Society, Amsterdam, pp 123–125
7. Strunic SL, Rios-Gutierrez F, Alba-Flores R (2007) Detection and classification of cardiac murmurs using segmentation techniques and artificial neural networks. In: IEEE symposium on computational intelligence and data mining, Honolulu, HI, USA
8. Singh SA, Majumder S, Mishra M (2019) Classification of short un-segmented heart sound based on deep learning. In: IEEE international instrumentation and measurement technology conference, Auckland, New Zealand
9. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005 (29 p). https://doi.org/10.1142/S0219519421500056
10. Nair AT, Muthuvel K (2020) Blood vessel segmentation and diabetic retinopathy recognition: an intelligent approach. In: Computer methods in biomechanics and biomedical engineering: imaging & visualization. https://doi.org/10.1080/21681163.2019.1647459
11. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy. Publication in Lecture Notes, Springer, Berlin
12. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy. In: Publication in the IOP: Journal of Physics Conference Series (JPCS), Web of Science
13. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diagnosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030 (29 p). https://doi.org/10.1142/S0219467820500308
14. Guermoui M, Mekhalfi ML, Ferroudji K (2013) Heart sounds analysis using wavelets responses and support vector machines. In: 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA)
15. Grzegorczyk I, Perka A, Rymko J (2016) PCG classification using a neural network approach. In: Computing in cardiology conference, Vancouver, BC, Canada
16. Abbas AK, Bassam R (2009) Phonocardiography signal processing: IEEE synthesis lectures on biomedical engineering. Aachen University of Applied Science
Severity Classification of Diabetic Retinopathy Using Customized CNN Shital N. Firke and Ranjan Bala Jain
Abstract Diabetic retinopathy is a complication of diabetes that affects the eyes. It is caused by damage to the blood vessels in the light-sensitive tissue of the eye. Early diagnosis is therefore becoming extremely important. This work classifies patients according to the severity of diabetic retinopathy. A convolutional neural network has been developed with the K-Fold cross-validation technique to make this diagnosis and to give highly accurate results. The image is passed through convolution and max-pooling layers activated with the ReLU function before being classified. The softmax function is then used to complete the process by activating the neurons in the dense layers. As the system learns, the accuracy improves while the loss decreases. Image augmentation is applied before training to reduce overfitting. The proposed convolutional neural network gave a total validation accuracy of 89.14%, recall of 82%, precision of 83%, and F1-Score of 81%.
Keywords Convolutional neural network · Cross validation · Diabetic retinopathy · Deep learning · K-Fold
S. N. Firke (B) · R. B. Jain
Electronics and Telecommunication Engineering, Vivekanand Education Society’s Institute of Technology, University of Mumbai, Mumbai, India
e-mail: [email protected]
R. B. Jain e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_14
1 Introduction Diabetes is a long-term condition in which blood sugar levels rise due to a lack of insulin [1]. It affects 425 million adults globally. Diabetes affects the retina, nerves, heart, and kidneys [2]. Diabetic retinopathy (DR) is a common cause of eyesight loss. DR will affect 191 million individuals worldwide by 2030 [3]. It happens when diabetes harms
Fig. 1 Steps of DR
the blood vessels of the retina. It can cause blind spots and blurred vision. It can affect anyone with diabetes, and it often affects both eyes. DR is a progressive disease, and medical experts therefore recommend that people with diabetes be examined at least twice a year for signs of the disease [4]. There are four stages of DR: Mild Nonproliferative DR (Mild NPDR), Moderate Nonproliferative DR (Moderate NPDR), Severe Nonproliferative DR (Severe NPDR), and Proliferative DR (PDR). Figure 1 shows the stages of DR. In the past, researchers have worked on improving the efficiency of DR screening, which detects lesions such as microaneurysms, hemorrhages, and exudates, and have established a range of models to do so. The strategies presented thus far are based on DR prediction and feature extraction. Convolutional Neural Networks (CNN) or Deep CNN (DCNN) have frequently been used to extract and classify information from fundus images. Recently, deep learning has been widely used in DR detection and classification. It can successfully learn the features of input data even when many heterogeneous sources are integrated [5]. This paper’s main contribution is an automatic DR detection technique that relies on a small dataset. With the goal of achieving end-to-end real-time classification from input images to patient conditions, we classify fundus images according to the severity of DR. Image preprocessing approaches are applied so that multiple significant features can be extracted and the images classified into their corresponding categories. We assess the model’s Accuracy, Recall, Precision, and F1-Score.
The rest of the article is organized as follows: Section 2 discusses relevant research. Section 3 explains the proposed method. Section 4 outlines the experimental findings and assesses the system’s performance. The conclusion is given in the final section.
2 Related Work In the developing world, DR is a major cause of disability. A variety of work has been carried out on DR, depending on the research area and field of the authors. The studies most relevant to our paper are summarized here. Chandrakumar and Kathirvel [6] proposed a method for classifying DR using deep learning techniques. The authors used the Kaggle, DRIVE, and STARE datasets, together with various image preprocessing and augmentation methods, and achieved accuracy ranging from 94 to 96%. Chen et al. [7] developed a neural network-based technique for identifying DR and categorized fundus images into 5 stages. They used the publicly available APTOS blindness detection dataset and obtained an accuracy of 80% and a kappa score of 0.64. Lands et al. [8] considered pre-trained ResNet and DenseNet architectures to classify DR using 23,302 images of the APTOS database. Images were resized to 256 × 256, and Adam was used as the optimizer. Image augmentation was used to overcome the problem of overfitting. The authors compared three pre-trained architectures (ResNet 50, DenseNet 121, and DenseNet 169) for multi-class classification. The best model was DenseNet 169, with an average accuracy of 95%. Xiaoliang et al. [9] used transfer learning with AlexNet, VGG16, and InceptionV3 to train a CNN. The authors used a Kaggle dataset comprising 166 images. The images were cropped to 227 × 227 for AlexNet, 224 × 224 for VGG16, and 299 × 299 for InceptionV3. They used the network’s accuracy as the evaluation metric and cross-validated it using K-Fold validation with K = 5. The best reported accuracy was 63.2% for the InceptionV3 architecture. Shaban et al. [10] detected the DR stages of the Kaggle dataset using a CNN architecture. The images were resized to a standard size of 224 × 224 before being passed to the CNN. The authors used SGD as the optimizer with a learning rate of 0.001. They obtained a training accuracy of 91% for fivefold and 92% for tenfold cross-validation. Many techniques for identifying and classifying DR stages have been used or proposed in the literature, but they have some drawbacks. Most studies ignored preprocessing steps, although noise and low contrast affect the classification accuracy. Some studies proposed models for diagnosing DR grades; these models were conservative, and they were not applicable in practice because of limited and imbalanced datasets. Besides, they are prone to overfitting. In previous work, the authors used lengthy models, so they required large computation power and time to make the model
more realistic. For that reason, a customized CNN model with the K-Fold cross-validation (K-Fold CV) technique is developed here. CNNs are simpler to train and require far fewer parameters than fully connected networks with the same number of hidden units. The K-Fold CV method can balance out the predicted classes when one is dealing with an unbalanced dataset. This helps to prevent the proposed model from overfitting the training dataset.
3 Proposed Work The deep learning CNN approach is highly adaptable and is among the most important techniques for detection tasks. It works especially well in image classification, particularly in the study of retinal fundus images. CNN can extract useful information from images, obviating the need for time-consuming manual image processing. Its huge popularity is due to its architecture, which eliminates the need for manual feature extraction. These properties of CNN motivate us to use a customized CNN model for our work. Figure 2 shows the general workflow of the proposed system. It consists of four main steps: preprocessing, data augmentation, model training, and testing. The preprocessing step includes image resizing, image normalizing, and label encoding. The image augmentation step is used to reduce overfitting. The model is trained using the CNN with the K-Fold cross-validation technique. Finally, measures such as Precision, Recall, Accuracy, and F1-score are calculated to evaluate the findings and compare them with commonly used methodologies.
3.1 Dataset The collection of data is an important aspect of the experiments for the proposed technique’s analysis. The dataset must be chosen carefully since it must contain a diverse collection of images. For this work, we have obtained a dataset from APTOS
Fig. 2 Proposed system
Fig. 3 Number of images in each class
blindness detection 2019 [11]. We used a publicly accessible DR detection dataset of fundus images on Kaggle, which has 3699 images taken under a variety of imaging conditions. The images are divided into five folders, each corresponding to a different class. The folders are numbered from 0 to 4, where 0 indicates No DR, 1 mild, 2 moderate, 3 severe, and 4 proliferative DR. There are 1974 images labeled as No DR, 315 as Mild, 959 as Moderate, 168 as Severe, and 283 as Proliferative. Figure 3 shows the number of images in each class.
3.2 Preprocessing The fundus photographs used in this method are collected from Kaggle in various forms. So it is necessary to apply preprocessing stages. Here, we apply various image preprocessing stages such as Image Resizing, Normalizing Image, and Label Encoder.
3.2.1 Image Resizing Images are resized to 128 × 128 pixels to be made ready as input to the system.
3.2.2 Image Normalizing Image normalization adjusts a set of pixel values to a standard range so that the image is easier to interpret and process. It is used here to reduce the effect of noise in the images. Dividing by 255 rescales the pixel values to the range between 0 and 1.
3.2.3 Label Encoder Label encoding replaces each value of a categorical variable with an integer between 0 and the number of classes minus 1. Since the target variable here has five distinct classes, the encoded labels are 0, 1, 2, 3, and 4.
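The normalization and label-encoding steps can be sketched in plain Python as follows. This is an illustration under simple assumptions: the image is a small hypothetical grayscale patch stored as nested lists, and the helper names are not taken from the original work, which does not describe its code.

```python
def normalize(image):
    """Rescale 8-bit pixel values (0..255) to floats in the range 0..1."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def encode_labels(labels, classes):
    """Replace each class name by its index in `classes` (0 .. n_classes - 1)."""
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels]

# Class names listed in the severity order used for the dataset folders.
CLASSES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative"]

tiny_image = [[0, 64, 128], [191, 255, 32]]   # hypothetical 2 x 3 grayscale patch
scaled = normalize(tiny_image)
encoded = encode_labels(["Moderate", "No DR", "Proliferative"], CLASSES)
```

Listing the classes explicitly keeps the integer codes aligned with the severity order, rather than with an alphabetical ordering of the names.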
3.3 Dataset Splitting Here, the dataset is divided into two subsets: a training set and a testing set. 80% of the data is used for training and 20% for testing. The training set contains known outputs, and the model learns from it so that it can generalize to new data in future. The testing set is used to evaluate the model’s predictions. The full dataset consists of 3699 fundus images [11], which are divided into 2959 training and 740 testing images. Of the 2959 training images, 1590 are labeled as No DR, 259 as Mild, 751 as Moderate, 132 as Severe, and 227 as Proliferative. Of the 740 testing images, 384 are labeled as No DR, 56 as Mild, 208 as Moderate, 36 as Severe, and 56 as Proliferative. Figure 4 shows the number of images in each class for the training and testing sets.
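A minimal sketch of an 80/20 split over sample indices is given below. The shuffling and the seed are assumptions made for illustration; the paper does not state how the images were assigned to the two sets, only the resulting sizes.

```python
import random

def train_test_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # fixed seed keeps the split reproducible
    n_train = int(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

# 3699 images in total, as reported for the dataset.
train_idx, test_idx = train_test_split(3699)
```

With 3699 images and a training fraction of 0.8, the split sizes match the 2959 training and 740 testing images reported above.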
3.4 Data Augmentation CNN (deep learning) models are used to classify the DR images. The performance of such models can be improved by data augmentation, which is applied to the training data to increase its size, quality, and diversity. The parameters used for data augmentation are shown in Table 1.
Fig. 4 Number of images in each class for training and testing sets
Table 1 Data augmentation used
Technique            Setting
Zoom                 0.2
Rotation             50
Width shift range    0.2
Horizontal flip      True
Fill mode            Nearest
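Two of the settings in Table 1, the width shift with nearest-pixel filling and the horizontal flip, can be illustrated with a short sketch. It is not the augmentation pipeline used in the paper: the image is a hypothetical nested list, the helper names are invented here, and zoom and rotation are left out to keep the example small.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def width_shift(image, shift):
    """Shift columns right (shift > 0) or left (shift < 0).

    Vacated columns repeat the nearest edge pixel, mimicking a 'nearest' fill mode.
    """
    width = len(image[0])
    return [[row[min(max(col - shift, 0), width - 1)] for col in range(width)]
            for row in image]

def augment(image, rng, max_shift_fraction=0.2, flip=True):
    """Apply a random width shift of up to 20% of the width and a random flip."""
    max_shift = int(max_shift_fraction * len(image[0]))
    out = width_shift(image, rng.randint(-max_shift, max_shift))
    if flip and rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

image = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]   # hypothetical 2 x 5 image
shifted = width_shift(image, 1)
flipped = horizontal_flip(image)
sample = augment(image, random.Random(0))
```

Each call to `augment` yields a slightly different version of the same image, which is how augmentation enlarges the effective training set without new data.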
3.5 K-Fold CV The K-Fold CV resampling method is used to partition the training dataset, which helps in evaluating the CNN model’s capability. Here, the training dataset is split into 5 independent folds; 4 folds (5 − 1) are used to train the algorithm, and the remaining fold is held out for validation. This procedure is repeated until every fold has been used exactly once as the validation set. The 5-Fold CV is depicted in Fig. 5.
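The fold rotation described above can be sketched as follows. This is a minimal illustration over sample indices; it assumes the samples have already been shuffled and simply cuts them into contiguous folds, which may differ from how the authors' library partitions the data.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# 2959 training images, as reported in the dataset split.
folds = list(k_fold_splits(2959, k=5))
fold_sizes = [len(validation) for _, validation in folds]
```

Every sample appears in exactly one validation fold and in the training portion of the other four, so each image contributes to both training and validation over the full procedure.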
3.6 CNN Classification The CNN is a well-known and commonly used deep learning (DL) algorithm [12]. It is among the best-known methods for recognizing and classifying images. CNN requires less preprocessing than other classification methods. The CNN architecture is shown in Fig. 6.
Fig. 5 Process of 5-fold cross validation
Fig. 6 Architecture of CNN
3.6.1 Convolutional Layer The first and most significant layer in a CNN is the convolution layer. It is also called the feature extraction layer because it is where the image’s features are extracted. In convolution, the image is passed through a set of filters. Mathematically, convolution combines two functions to produce a third function. A convolved feature is a matrix created by applying a kernel to an image and computing the convolution operation. In the example illustrated in Fig. 7, there is a 5 × 5 input image with pixel values of 0 or 1, together with a 3 × 3 filter matrix. The filter matrix slides over the image and computes the dot product at each position to produce the convolved feature matrix.
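The sliding dot product can be written out directly. The sketch below uses a hypothetical 5 × 5 binary image and 3 × 3 filter, which are not necessarily the values shown in Fig. 7; as is usual in CNNs, the kernel is applied without flipping, with stride 1 and no padding.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1      # e.g. 5 - 3 + 1 = 3
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 5 x 5 binary image and 3 x 3 filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d_valid(image, kernel)
```

A 5 × 5 input with a 3 × 3 filter gives a 3 × 3 convolved feature, following the rule that the output size is the input size minus the kernel size plus one.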
3.6.2 Pooling Layer A pooling layer is used to reduce the spatial dimensions (width and height) of its input volume. The depth of the input is not reduced by this layer. This layer is often used to reduce the image’s dimensionality, minimizing the processing power needed to process it.
Fig. 7 Example of convolution layer
Fig. 8 Example of pooling layer
In max pooling, the highest value within each kernel window is kept and all other values are discarded. In average pooling, the mean of all the values within the kernel window is stored. In the example, max pooling is applied to a 4 × 4 input, which is divided into separate 2 × 2 regions. The output is 2 × 2, and each output value is simply the maximum value of the corresponding shaded region. With average pooling, the output is instead the average of each region; for the region shown in green, the average is 10. Figure 8 illustrates the pooling operation.
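Both pooling variants can be sketched with one small function. The 4 × 4 input below is hypothetical and is not the matrix shown in Fig. 8; it is chosen so that one of its regions averages to 10.

```python
def pool2d(image, size=2, mode="max"):
    """Non-overlapping pooling: summarise each size x size block by its max or mean."""
    reduce_block = max if mode == "max" else (lambda values: sum(values) / len(values))
    out = []
    for r in range(0, len(image) - size + 1, size):
        row = []
        for c in range(0, len(image[0]) - size + 1, size):
            block = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_block(block))
        out.append(row)
    return out

# Hypothetical 4 x 4 input feature map.
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 7],
    [8, 12, 0, 2],
    [9, 11, 3, 1],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
```

Either way the 4 × 4 input shrinks to 2 × 2, which is the reduction in spatial size that lowers the processing cost of later layers.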
3.6.3 Fully Connected (FC) Layer The final output of the convolution and pooling layers is flattened and fed to one or more fully connected layers, also termed dense layers. In a fully connected layer, all neurons in the preceding layer are connected to all neurons in
Fig. 9 Example of fully connected layer
the subsequent layer. Fully connected layers are often known as linear layers [13]. Figure 9 depicts the FC layer.
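The flatten, dense, and softmax steps can be sketched as follows. The feature map, weights, and biases are hypothetical numbers, and only three output neurons are used to keep the example short, whereas the classifier in this work has five classes.

```python
import math

def flatten(feature_maps):
    """Turn a stack of 2-D feature maps into one flat list of values."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(inputs, weights, biases):
    """Each output neuron is a weighted sum of every input plus a bias."""
    return [sum(w * x for w, x in zip(neuron_weights, inputs)) + b
            for neuron_weights, b in zip(weights, biases)]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)   # subtracting the maximum improves numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

pooled = [[[1.0, 0.0], [0.5, 2.0]]]   # one hypothetical 2 x 2 feature map
x = flatten(pooled)
weights = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1], [0.5, 0.5, -0.5, 0.0]]
biases = [0.0, 0.1, -0.1]
logits = dense(x, weights, biases)
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))
```

Every input value contributes to every output neuron, which is what distinguishes a fully connected layer from the locally connected convolution and pooling layers.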
4 Results and Discussion The results obtained with the proposed method are compared with those of existing methods. The experiments are performed in Jupyter Notebook. Keras is used with TensorFlow as the machine learning back-end library. Optimizers are crucial in the deep learning process; here, the Adam optimizer is used, as it is efficient and requires little memory. In this work, the learning rate and the number of epochs are set to 0.0001 and 20, respectively. The batch size is 32 images. The neural network is trained with these parameters. Performance is measured with Accuracy, Confusion Matrix, Precision, Recall, and F1 Score.
4.1 Hardware and Software Requirement
An image editing tool is employed for contrast adjustment, color balance adjustment, rotation and cropping. The NumPy package is used for image resizing during the preprocessing stage. The Theano library is used to implement the CNN architecture. The hardware is an Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz with 8 GB RAM.
4.2 Performance Evaluation
Various evaluation measures are used to assess the quality of a model. We evaluated the proposed model using accuracy, precision, recall and F1 score, where TP = true positive, TN = true negative, FP = false positive and FN = false negative.
4.2.1 Accuracy
Accuracy is calculated from the correctly classified cases of both the positive and the negative class.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

4.2.2 Precision
Precision is the ratio of correctly predicted positive findings to the total predicted positive findings [14].

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

4.2.3 Recall (Sensitivity)
Recall is the ratio of correctly predicted positive findings to all findings in the actual positive class [15].

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

4.2.4 F1-Score
It is the harmonic mean of precision and recall [15].

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
4.3 CNN Architecture
The CNN architecture implementation is described in Table 2. The architecture consists of 2 convolutional layers and 3 fully connected layers, along with max-pooling layers. It is trained with images from the APTOS 2019 Blindness Detection dataset. Table 3 shows the results obtained from the confusion matrix. In this matrix, each diagonal value is the number of correct predictions, out of the available images, for that class. Table 4 shows the performance evaluation report.

Table 2 CNN architecture

| Layer number | Type | Maps and neurons | Kernel size |
|---|---|---|---|
| 0 | Input | 3 × 128 × 128 | – |
| 1 | Conv1 | 32 × 126 × 126 | 3 × 3 |
| 2 | ReLu | 32 × 126 × 126 | – |
| 3 | Pool1 | 32 × 63 × 63 | 2 × 2 |
| 4 | Conv2 | 64 × 61 × 61 | 3 × 3 |
| 5 | ReLu | 64 × 61 × 61 | – |
| 6 | Pool2 | 64 × 30 × 30 | 2 × 2 |
| 7 | FC3 | 256 | – |
| 8 | FC4 | 128 | – |
| 9 | Softmax | 5 | – |
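The map sizes in Table 2 can be checked with a short shape walk. The sketch below assumes unpadded convolutions with stride 1 and non-overlapping pooling; these settings are not stated in the text but are the ones that reproduce the listed sizes. The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel):
    """Output width of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, window):
    """Output width of non-overlapping pooling; a leftover border is dropped."""
    return size // window


# (name, kind, kernel/window width, number of output maps)
layers = [
    ("Conv1", "conv", 3, 32),
    ("Pool1", "pool", 2, None),
    ("Conv2", "conv", 3, 64),
    ("Pool2", "pool", 2, None),
]

channels, size = 3, 128                      # input: 3 x 128 x 128
shapes = {"Input": (channels, size, size)}
for name, kind, k, maps in layers:
    if kind == "conv":
        size, channels = conv_out(size, k), maps
    else:
        size = pool_out(size, k)
    shapes[name] = (channels, size, size)

# Number of values passed to the first fully connected layer after flattening.
flattened = channels * size * size
```

Under these assumptions the walk gives 126, 63, 61 and 30 for the successive map widths, and 64 × 30 × 30 = 57,600 inputs to FC3.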
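Table 3 Confusion matrix results (rows: actual results; columns: predicted results)

| Actual \ Predicted | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Class 0 | 366 | 4 | 13 | 0 | 1 |
| Class 1 | 6 | 40 | 5 | 1 | 4 |
| Class 2 | 27 | 26 | 151 | 1 | 3 |
| Class 3 | 3 | 3 | 12 | 18 | 0 |
| Class 4 | 7 | 10 | 10 | 0 | 29 |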
Table 4 Performance evaluation report

| Stages | Precision | Recall | F1-Score |
|---|---|---|---|
| Class 0 | 0.89 | 0.95 | 0.92 |
| Class 1 | 0.48 | 0.71 | 0.58 |
| Class 2 | 0.79 | 0.73 | 0.76 |
| Class 3 | 0.90 | 0.50 | 0.64 |
| Class 4 | 0.78 | 0.52 | 0.62 |
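The per-class values in Table 4 follow from the confusion matrix in Table 3 when its rows are read as actual classes and its columns as predicted classes. The sketch below is a plain-Python check of that reading; the function name `per_class_report` is illustrative and is not taken from the paper.

```python
# Confusion matrix from Table 3 (rows: actual class, columns: predicted class).
confusion = [
    [366, 4, 13, 0, 1],
    [6, 40, 5, 1, 4],
    [27, 26, 151, 1, 3],
    [3, 3, 12, 18, 0],
    [7, 10, 10, 0, 29],
]


def per_class_report(matrix):
    """Return (precision, recall, F1) per class, rounded to two decimals."""
    n = len(matrix)
    report = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                         # actual k, predicted otherwise
        fp = sum(matrix[r][k] for r in range(n)) - tp    # predicted k, actually otherwise
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        report.append((round(precision, 2), round(recall, 2), round(f1, 2)))
    return report


report = per_class_report(confusion)
```

For Class 1, for example, this gives precision 40/83 ≈ 0.48 and recall 40/56 ≈ 0.71, matching the second row of Table 4.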
Table 5 Comparison with recent works

| Authors | Method | Number of class | Training accuracy | Average CV accuracy (%) |
|---|---|---|---|---|
| Xiaoliang et al. [9] | AlexNet | 5 | – | 37.43 |
| | VGG 16 | | | 50.03 |
| | InceptionNet V3 | | | 63.23 |
| Shaban et al. [10] | Modified VGG 19 | 3 | 91% | 88 |
| Proposed method | CNN | 5 | 93.39% | 89.14 |
4.4 Comparison with Previous Works
The results of the developed CNN on the APTOS dataset were compared with two recent studies that employed the same K-fold CV method. Table 5 contrasts the performance measures. As the table shows, the constructed CNN model achieved the highest performance for detecting the five stages of DR. The comparison considers four indicators: (i) method, (ii) number of classes, (iii) training accuracy and (iv) average CV accuracy. The proposed customized CNN gives 93.39% training accuracy and 89.14% average CV accuracy. This improvement is attributed to the use of two convolutional layers for feature extraction and three fully connected layers for classification.
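The average CV accuracy reported here is the mean of the validation accuracies over the K folds. A minimal sketch of K-fold splitting is given below; the number of folds is not stated in this section, so `k=5` and the sample count are assumptions, and in practice the sample indices would be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def average_cv_accuracy(fold_accuracies):
    """Mean of the per-fold validation accuracies."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Assumed toy setting: 10 samples and 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
```

A model would be trained once per fold on `train` and scored on `val`, and the per-fold scores passed to `average_cv_accuracy`.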
5 Conclusion
Diabetes is an incurable disease that has spread throughout the world. The only way to address this problem is to detect the disease early and take preventative action to reduce its impact. In this study, a customized CNN model is established for DR classification with the K-fold CV technique. The customized CNN model has 5 layers: 2 convolutional layers for feature extraction and 3 fully connected layers for classification. The customized model shows more promising results than pre-trained models. Experimental results show an average CV accuracy of 89.14% with the K-fold CV technique.
References
1. Taylor R, Batey D (2012) Handbook of retinal screening in diabetes: diagnosis and management, 2nd edn. Wiley-Blackwell
2. International diabetes federation: what is diabetes. Available online at https://www.idf.org/aboutdiabetes/what-is-diabetes.html. Accessed on Aug 2020
3. Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access pp 514–525
4. Sungheetha A, Sharma R (2021) Design an early detection and classification for diabetic retinopathy by deep feature extraction based convolution neural network. J Trends Comput Sci Smart Technol (TCSST) 3(02):81–94
5. Zheng Y et al (2012) The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol pp 428–431
6. Chandrakumar T, Kathirvel R (2016) Classifying diabetic retinopathy using deep learning architecture. Int J Eng Res Technol (IJERT), pp 19–24
7. Chen H et al (2018) Detection of DR using deep neural network. In: IEEE twenty third international conference on digital signal processing (ICDSP)
8. Lands A et al (2020) Implementation of deep learning based algorithms for DR classification from fundus images. In: Fourth international conference on trends in electronics and informatics (ICTEI), pp 1028–1032
9. Xiaoliang et al (2018) Diabetic retinopathy stage classification using convolutional neural networks. In: International conference on information reuse and integration for data science (ICIRIDS), pp 465–471
10. Shaban et al (2020) A CNN for the screening and staging of diabetic retinopathy. Open Access Research Article, pp 1–13
11. APTOS 2019 Blindness detection dataset. Available online at https://www.kaggle.com/c/aptos2019-blindness-detection. Accessed on Feb 2021
12. Alzubaidi L et al (2021) Review of deep learning: CNN architectures, concepts, applications, challenges, future directions. Open Access J Big Data, pp 2–74
13. FC layer. Available online at https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html. Accessed on Mar 2021
14. Precision and recall. Available online at https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/. Accessed on Mar 2021
15. F1-Score. Available online at https://deepai.org/machine-learning-glossary-and-terms/f-score. Accessed on Mar 2021
Study on Class Imbalance Problem with Modified KNN for Classification
R. Sasirekha, B. Kanisha, and S. Kaliraj
Abstract Identifying data imbalance is a challenging task. A data warehouse holds a vast amount of data, but managing that data and keeping it in a balanced state is difficult in any sector. Data imbalance arises when specimens are classified according to their behaviour. In this paper, the imbalanced state of data is analysed, and machine learning techniques are studied in order to choose the best technique for handling data imbalance problems. The k-nearest neighbour (KNN) algorithm is analysed in detail as a means of keeping the classified specimens equally grouped.
Keywords Imbalance · KNN algorithm · Classification specimens
1 Introduction
A data warehouse is a vast environment that holds an enormous amount of data. Data mining lets the data scientist extract the needed data from the warehouse, and data mining environments are widely used to carry out evaluations that produce good results. Data imbalance can therefore have severe effects in any sector. This paper studies the imbalance problem and the machine learning techniques suited to solving it. The flow chart in Fig. 1 shows the classification of test and training data in supervised and unsupervised learning. Before turning to the imbalance problem, a study of classification and clustering gives the background needed to understand the mechanisms for handling imbalanced data. Before studying those handling mechanisms, it is important to know what the imbalance problem is and how it occurs. The related work section clarifies the imbalance problem in three steps: first, where imbalance problems occur; second, what is considered an imbalance problem; and finally, how the imbalance problem can be handled. In many contexts, KNN is regarded as the best algorithm for handling imbalance problems. The upcoming sections discuss various modified KNN algorithms and how they are suited to classification.

Fig. 1 Flow chart for imbalanced classification in supervised and unsupervised environment

R. Sasirekha (B) · B. Kanisha · S. Kaliraj
SRM Institute of Science and Technology, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_15
2 Related Works
Many classifiers are used for classifying data in supervised and unsupervised settings. Random forest, Naive Bayes and KNN are the classifiers most widely used for class imbalance problems. The related works in this paper cover the study of the imbalance problem, the handling methods that have been proposed, and how they work.
2.1 Curriculum Learning
In discussing imbalanced data problems, the main curriculum is learning about supervised and unsupervised environments. With a good understanding of these environments, imbalance problems are easier to handle.

Supervised learning: Labelled data come under supervised learning, and every classification algorithm is a supervised learning algorithm. Classification algorithms classify data based on their nature, and the decision vector plays a vital role in classification-based approaches. A supervised learning environment can give strong predictive results. Weakly supervised learning [1], which covers semi-supervised learning, domain adaptation, multi-instance learning and label-noise learning, needs more intensive care.

Unsupervised learning: Clustering algorithms come under the unsupervised environment. Studying the unsupervised environment is more difficult than studying a supervised environment. Recently, unsupervised learning has been used to extract information accurately from heterogeneous specimens; however, most of the methods used are very time-consuming [2]. Clusters are formed based on the characteristics of the data points, and a centroid is identified for each group of data points that lie within a particular range. K-means algorithms are broadly used for clustering imbalanced data. Imbalanced data are hard to handle, and the k-means algorithm helps to identify them. K-means assigns each data point to its nearest centroid, with distances measured by the Euclidean metric, and recomputes each centroid from the points assigned to it, so that clusters are formed from the points within a particular range. Many people are not clear about what the KNN and K-means algorithms are, and some think that they are the same. In this paper, the KNN classifier is studied carefully, and the major differences between KNN and K-means are described in the upcoming sections (Fig. 2).
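The k-means procedure described above (assign each point to the nearest centroid by Euclidean distance, then recompute the centroids) can be sketched as follows. This is a plain textbook variant, not the method of any cited paper; the data points and the initial centroids are made-up values.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means; `centroids` holds the initial guesses."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Note that no class labels are used anywhere: k-means is unsupervised, and `k` is the number of clusters, whereas in KNN `k` is the number of labelled neighbours consulted for a prediction.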
Fig. 2 Sample experiment in weka tool for supervised learning-soybean dataset using naive bayes classifier
2.2 Clarification About Imbalance Problem
Any research work needs a thorough analysis of its topic. First, where does the imbalance problem occur? When specimens are classified into several classes based on their behaviour, an imbalance problem occurs [3] if the number of specimens in one group is not equal to the number of specimens in another group. Second, what is considered an imbalance problem? Data are considered imbalanced when a group does not contain enough data to produce an exact evaluation result, and a situation that cannot be handled at that moment is considered an imbalance problem. Finally, how can imbalance problems be handled? The widely used approaches are resampling, ensembles, clustering and evaluation metrics. These approaches are discussed in depth in the upcoming sections, which cover the different methods for handling imbalanced classification.
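The degree of imbalance between two groups is commonly summarised by the imbalance ratio, the size of the majority group divided by the size of the minority group. The short sketch below applies it to the class distributions listed in Table 1 of this chapter; the function name `imbalance_ratio` is illustrative.

```python
def imbalance_ratio(class_counts):
    """Majority class size divided by minority class size."""
    return max(class_counts) / min(class_counts)


# Class distributions as listed in Table 1 of this chapter.
datasets = {
    "Heart": (120, 150),
    "diabetes": (268, 500),
    "climate-model-simulation-crashes": (46, 494),
}

ratios = {name: round(imbalance_ratio(counts), 3) for name, counts in datasets.items()}
```

A ratio of 1 means perfectly balanced groups; the larger the ratio, the more the minority group is outnumbered.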
3 Resampling
Resampling techniques are widely used to handle data imbalance problems. Several resampling techniques exist; all of them resample the training data.
3.1 Data Imbalance Problem Can Be Handled by Oversampling
Oversampling is a widely used resampling method for handling data imbalance problems. When the specimens of a class are classified into separate groups, they may be split unequally between two groups. The group with the greater number of specimens is called the majority group, and the group with the lower number of specimens is called the minority group. Oversampling addresses the imbalance by adding specimens to the minority group. The following oversampling techniques are widely used to resample the training data.
1. SMOTE: The synthetic minority oversampling technique is widely used to overcome the imbalance problem by adding synthetic specimens, generated from existing specimens, to the minority group [4]. It is simple and effective compared with other oversampling techniques.
2. MC-SMOTE: Minority clustering SMOTE [5] handles imbalanced data by taking the specimens from minority classes.
3. Adaptive synthetic sampling: Generates synthetic data without copying existing data of the minority class. This is why adaptive synthetic sampling is popular among data balancing strategies.
4. Augmentation [6]: When cross-validation is used, oversampling is applied only after the data have been split into training and test sets, so that only the training data are augmented.
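The idea behind SMOTE-style oversampling can be sketched in plain Python: a new minority specimen is placed at a random position on the line between an existing minority specimen and one of its nearest minority neighbours. This is a simplified illustration under assumed parameters, not the reference SMOTE implementation; `smote_like`, `k`, `seed` and the data values are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority group with two numeric attributes.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies between two real minority points, the new specimens stay inside the region already occupied by the minority group instead of being exact copies.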
3.2 Data Imbalance Problem Can Be Handled by Undersampling
Undersampling [7] is a riskier method: because specimens are eliminated from the majority class, important instances may be deleted, which can have severe effects [8]. RUS: Random undersampling [4] is the most widely used undersampling method for imbalanced classes, but it must be applied with care. It concentrates on the majority class and addresses the imbalance problem by removing some of its instances while the class itself is retained.
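Random undersampling can be sketched as drawing, without replacement, as many majority specimens as there are minority specimens. The function name and the toy data below are illustrative assumptions.

```python
import random


def random_undersample(majority, minority, seed=0):
    """Keep a random subset of the majority group, the same size as the minority group."""
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))  # sampling without replacement
    return kept, minority


# Hypothetical groups: 100 majority specimens and 10 minority specimens.
majority = list(range(100))
minority = list(range(100, 110))
kept, minority_out = random_undersample(majority, minority)
```

The 90 discarded majority specimens are chosen blindly, which is exactly why informative instances can be lost.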
4 Ensemble
In the ensemble approach [9], the specimens of the majority group are divided into several equal sections to solve the imbalance problem. For example, suppose group A has 40,000 specimens but group B has only 4000. The ensemble strategy divides the 40,000 specimens into 10 sets of 4000 specimens each. After the large group has been divided into sets of equal size, the training data can be resampled to handle the imbalance problem.
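The 40,000 versus 4000 example can be sketched as follows: the majority group is shuffled and cut into chunks of the minority group's size, and each chunk is paired with the full minority group to give one balanced training set. The function name is illustrative; how the resulting models are trained and combined is left open here.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority group into minority-sized chunks, each paired with the minority group."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(chunk, minority) for chunk in chunks]


# Group A (majority) and group B (minority) with the sizes used in the text.
majority = [("A", i) for i in range(40000)]
minority = [("B", i) for i in range(4000)]
subsets = balanced_subsets(majority, minority)
```

One classifier can then be trained per balanced subset, so that no majority specimen is discarded.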
4.1 Bagging
Bootstrap aggregation (bagging) [10] is an ensemble approach that combines the predictions of several machine learning models to obtain more accurate predictions. It is widely used to reduce the variance of algorithms that have high variance [11]. Decision trees are an example of a high-variance algorithm: they are sensitive to the training data, so changes in the training data result in different predictions. Assume we have a sample dataset of 20,000 instances (x) and are employing the CART technique, which has high variance. The CART algorithm would be bagged as follows.
1. Make a large number (e.g., 2000) of random sub-samples of the dataset with replacement.
2. Train a CART model on each sample.
3. Given a new dataset, calculate the average prediction from each model.
For example, if we had 7 bagged decision trees that predicted the following classes for a given input sample: read, write, read, write, write, read, and read, we would take read as the most frequent class. Because bagging averages over many models, it tends not to overfit the training data.
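The two building blocks of bagging, bootstrap sampling and vote aggregation, can be sketched with the standard library. The tree training itself is omitted; the seven votes are the ones from the example above, and the helper names are illustrative.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the most frequent predicted class."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
data = list(range(20))                 # stand-in for the training instances
sample = bootstrap_sample(data, rng)   # one sub-sample; bagging draws many

# Predictions of 7 bagged trees for one input, as in the text.
votes = ["read", "write", "read", "write", "write", "read", "read"]
winner = majority_vote(votes)          # "read" wins 4 votes to 3
```

For classification the aggregate is the most frequent class, as here; for regression the predictions would be averaged instead.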
5 Clustering Versus Classification
Clustering can also be used for undersampling: points that lie far from the cluster centroids are removed. K-means [12] is an algorithm that data scientists readily apply to handle imbalance problems. KNN, in contrast, is a supervised learning algorithm. It predicts the class of a sample from the behaviour of its neighbours: it finds the samples with similar characteristics and assigns the sample to the category to which those neighbours belong.
5.1 K-NN for Imbalance Problem
When KNN is used in an application, balanced data are needed to get an exact result. Nowadays, there is a need to choose the best classifier to obtain good accuracy. KNN is a user-friendly algorithm for finding the nearest points in a dataset, and it is widely used in medicine and in engineering technology. In the KNN algorithm, K indicates the number of nearest neighbours. KNN is an excellent algorithm for classification and for predicting nearest neighbours [13] (Table 1).

Table 1 Description of sample binary datasets with imbalance ratio

| Datasets | DataCount | Number of attribute | Class distribution | Class | Imbalance ratio |
|---|---|---|---|---|---|
| Heart | 270 | 14 | 120, 150 | 2 | 1.250 |
| diabetes | 768 | 9 | 268, 500 | 2 | 1.866 |
| climate-model-simulation-crashes | 540 | 20 | 46, 494 | 2 | 10.739 |

Sample application of KNN in the Weka tool: items having similar behaviour are grouped together. KNN is applied to a chosen training dataset; the weather dataset has been chosen to show the application of KNN (Fig. 3). When pre-processing a training dataset in the Weka tool, a nominal dataset is chosen for the KNN algorithm, because classification is implemented for nominal datasets only. Hence, the weather nominal dataset has been chosen. In this example, the value of k is set to 4, so the 4 nearest neighbours of a specimen determine its group, based on the similarity of the data. The distance between points is calculated by means of the Euclidean distance.

Fig. 3 Preprocess the trained dataset: weather

Now, let us discuss the various modified KNN algorithms.

Fuzzy k-NN: A theoretical analysis of KNN is the starting point for understanding fuzzy K-NN. The main rule of fuzzy K-NN [14] is the grouping of specimens into fuzzy sets, which makes it more flexible to analyse an instance in terms of its degree of membership in a class. Fuzzy set theory is used rather than concentrating only on the distance of points to the centroids. Bayes' decision rule fails here because it does not satisfy the rule of fuzzy K-NN.

D-KNN: D-KNN improves K-nearest neighbour search by using distributed storage and performing distributed calculations effectively. It reduces the main-memory storage requirement by providing distributed storage nodes [15]. This algorithm is used in cyber-physical-social networks.
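The basic KNN rule with k = 4 and Euclidean distance, as used in the example above, can be sketched as follows. The training specimens are made-up numeric points with yes/no labels, not the Weka weather data, and the function name `knn_predict` is illustrative. With an even k, ties are possible; this sketch simply returns the tied class encountered first among the nearest neighbours.

```python
import math
from collections import Counter


def knn_predict(train, query, k=4):
    """Classify `query` by a majority vote among its k nearest labelled neighbours."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled specimens: (attributes, class).
train = [
    ((1.0, 1.0), "yes"), ((1.2, 0.8), "yes"), ((0.9, 1.3), "yes"),
    ((5.0, 5.0), "no"), ((5.2, 4.9), "no"), ((4.8, 5.3), "no"), ((5.1, 5.1), "no"),
]

prediction = knn_predict(train, query=(1.1, 1.0), k=4)  # 3 "yes" vs 1 "no"
```

Here the majority class has more training specimens, yet the query is still labelled "yes" because its nearest neighbours are. When the minority group is very small relative to k, however, majority neighbours can outvote it, which is one motivation for the modified KNN variants discussed above.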
6 Discussion
This section discusses the important elements for handling imbalance problems. Imbalance problems may occur when classifying specimens, and a suitable classifier must be chosen to prevent them, since imbalanced data may cause severe effects in any sector. Widely used classifiers are Naive Bayes, random forest and KNN. Obtaining details about neighbouring nodes is a big task, but it can be solved by means of the K-NN classifier. The difference between the K-NN and K-means algorithms is very important in the classification of specimens. KNN has been discussed in an earlier section.
Let us discuss K-means-based methods for handling imbalance problems.

FSCL: This is a k-means-based algorithm that improves the K-means framework [16]. It moves the seed points towards the incoming data points, and a frequency-based weight is applied to the seed points so that the data points are distributed equally among the clusters. The uniform effect is a drawback of this method: because of the frequency weight, clusters are pushed towards equal sizes, which may cause severe problems on imbalanced data.

RPCL: RPCL has been used to improve FSCL [8]. With the rival penalization method, the imbalance problem can be handled carefully: under rival penalization, the rival points travel in the opposite direction. The disadvantage of this method is that there is no knowledge about the number of clusters affected by the rival penalization mechanism.

MPCA: The multi-prototype clustering algorithm [8] uses multiple prototypes to overcome the uniform effect. Its disadvantage is that the number of prototypes is user defined; if the number of imbalanced data clusters exceeds the pre-defined number of prototypes, the framework may fail.

Euclidean metrics [17] can be used for clustering: based on the distances, data points are chosen to form a cluster. Outlier analysis [18] is a simple way to find data that do not belong to the same group. An outlier is like the odd one out; in general, points that lie outside the region are said to be outliers, and they must be handled carefully. NMOTe [19], the Navo minority oversampling technique, is a novel oversampling approach used to find an efficient and consistent solution.
7 Background Study on Classification Metrics
According to the study, evaluation can be done with the key classification metrics: accuracy, recall, precision and F1 score. Recall and precision may differ in certain cases (Fig. 4). Two further tools are decision thresholds and the receiver operating characteristic (ROC) curve. Whether a ROC curve is suitable is judged by looking at the AUC (area under the curve); the other measures are derived from the confusion matrix. A confusion matrix is a table that describes the performance of a classification model on test data for which the true values are already known. Except for AUC, all the measures are calculated from the following four parameters. The correctly predicted observations are the true positives and true negatives; the false positives and false negatives are to be reduced.

Fig. 4 Sample experiment for classification metrics using trained dataset-weather

True Positives (TP): correctly predicted positive values; the original class is Yes and the predicted class is also Yes.
True Negatives (TN): correctly predicted negative values; the original class is No and the predicted class is also No.
False Positives (FP): the original class is No but the predicted class is Yes.
False Negatives (FN): the original class is Yes but the predicted class is No (Table 2).

Understanding these four parameters is important for calculating accuracy, precision, recall and F1 score.
1. Accuracy is the ratio of correctly predicted specimens to the total number of specimens. Accuracy = (TP + TN)/(TP + FP + FN + TN)
2. Precision is the ratio of correctly predicted positive specimens to the total predicted positive specimens. Precision = TP/(TP + FP)
3. Recall is the ratio of correctly predicted positive specimens to all specimens whose original class is Yes. Recall = TP/(TP + FN)
4. F1 score is the harmonic mean of precision and recall. It takes both false positive and false negative specimens into account. F1 Score = 2 * (Recall * Precision)/(Recall + Precision)
Table 2 Positive and negative observations of the actual class with the predictions

| | Predicted class: yes | Predicted class: no |
|---|---|---|
| Actual class: yes | TP | FN |
| Actual class: no | FP | TN |
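The four metrics can be computed directly from the four counts. The sketch below uses hypothetical counts for an imbalanced test set of 10 "yes" and 90 "no" specimens; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts: 10 actual "yes" (5 found, 5 missed), 90 actual "no" (5 false alarms).
accuracy, precision, recall, f1 = binary_metrics(tp=5, tn=85, fp=5, fn=5)
# accuracy = 0.9, precision = 0.5, recall = 0.5, f1 = 0.5
```

The example shows why accuracy alone is misleading on imbalanced data: 90% of the specimens are classified correctly, yet only half of the minority "yes" specimens are found.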
8 Conclusion
In this paper, supervised and unsupervised learning have been studied and mechanisms for handling the imbalance problem have been discussed. In this survey, several modified KNN classifiers are studied together with outlier analysis, and classification metrics are discussed in detail with training samples, with the aim of choosing a suitable K-nearest-neighbour-based classifier to represent each specimen.
9 Future Work
Future work can be carried out by selecting suitable oversampling mechanisms to solve imbalance problems. According to this survey, the undersampling method is not well suited to handling imbalance problems. In future work, a suitable modified K-nearest neighbour algorithm can be used with the Euclidean metric, accompanied by outlier analysis, to handle the imbalance problem with due care. Very few studies examine the effect of different distance metrics on the performance of KNN. Hence, future work can test a large number of distance metrics on a data set and identify the distance metrics that are least affected by added noise.
References
1. Li Y-F, Guo L-Z, Zhou Z-H (2021) Towards safe weakly supervised learning. IEEE Trans Pattern Anal Mach Intell 43(1):334–346
2. Xiang L, Zhao G, Li Q, Hao W, Li F (2018) TUMK-ELM: a fast unsupervised heterogeneous data learning approach. IEEE Access 6:35305–35315
3. Lu Y, Cheung Y, Tang YY (2020) Bayes imbalance impact index: a measure of class imbalanced data set for classification problem. IEEE Trans Neural Networks Learn Syst 31(9):2020
4. Lin W-C (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409:17–26
5. Yi H (2020) Imbalanced classification based on minority clustering smote with wind turbine fault detection application. IEEE Trans Ind Inform 1551–3203
6. Zhang L, Zhang C, Quan S, Xiao H, Kuang G, Liu L (2020) A class imbalance loss for imbalanced object recognition. IEEE J Sel Top Appl Earth Observ Rem Sens 13:2778–2792
7. Ng WW, Xu S, Zhang J, Tian X, Rong T, Kwong S (2020) Hashing-based undersampling ensemble for imbalanced pattern classification problems. IEEE Trans Cybern
8. Lu Y, Cheung YM (2021) Self-adaptive multiprototype-based competitive learning approach: a k-means-type algorithm for imbalanced data clustering. IEEE Trans Cybern 51(3):1598–1612
9. Yang Y, Jiang J (2016) Hybrid sampling-based clustering ensemble with global and local constitutions. IEEE Trans Neural Networks Learn Syst 27(5):952–965
10. Chakraborty S, Phukan J, Roy M, Chaudhuri BB (2020) Handling the class imbalance in land-cover classification using bagging-based semisupervised neural approach. IEEE Geosci Remote Sens Lett 17(9):1493–1497
11. Yang W, Nam W (2020) Brainwave classification using covariance-based data augmentation. IEEE Access 8:211714–211722
12. Zhang T (2019) Interval type-2 fuzzy local enhancement based rough k-means clustering imbalanced clusters. IEEE Trans Fuzzy Syst 28(9)
13. Zhuang L, Gao S, Tang J, Wang J, Lin Z, Ma Y, Yu N (2015) Constructing a nonnegative low-rank and sparse graph with data-adaptive features. IEEE Trans Image Process 24(11):3717–3728
14. Banerjee I, Mullick SS, Das S (2019) On convergence of the class membership estimator in fuzzy nearest neighbor classifier. IEEE Trans Fuzzy Syst 27(6):1226–1236
15. Zhang W, Chen X, Liu Y, Xi Q (2020) A distributed storage and computation k-nearest neighbor algorithm based cloud-edge computing for cyber-physical-social systems. IEEE Access 8:50118–50130
16. Chen D, Jacobs R, Morgan D, Booske J (2021) Impact of nonuniform thermionic emission on the transition behavior between temperature-and space-charge-limited emission. IEEE Trans Electron Devices 68(7):3576–3581
17. Ma H, Gou J, Wang X, Ke J, Zeng S (2017) Sparse coefficient-based k-nearest neighbor classification. IEEE Access 5:16618–16634
18. Bezdek JC, Keller JM (2020) Streaming data analysis: clustering or classification. IEEE Trans Syst Man Cybern: Syst
19. Chakrabarty N, Biswas S (2020) Navo minority over-sampling technique (NMOTe): a consistent performance booster on imbalanced datasets. J Electron Inform 02(02):96–136
Analysis of (IoT)-Based Healthcare Framework System Using Machine Learning
B. Lalithadevi and S. Krishnaveni
Abstract In recent years, the Internet of things (IoT) has been applied in several fields such as smart healthcare, smart cities and smart agriculture, and IoT-based applications are growing day by day. In the healthcare industry, wearable sensor devices are widely used to track patients' health status and mobility. In this paper, IoT-based frameworks for healthcare using suitable machine learning algorithms are analysed in depth. Data transmission using various standards is reviewed. Ways of securely storing and retrieving medical data are discussed. Machine learning techniques and storage mechanisms are analysed to ensure quality of service in patient care.
Keywords Internet of Things (IoT) · Wearable sensors · Machine learning · Cloud computing · Fog computing · Security
1 Introduction
The Internet of Things is a technology that provides interfaces and smart gadgets to make human life easier. It gathers sensor data, processes it and sends it through the internet. Many organizations, Cisco Systems among them, predict the expansion of IoT over the years. A report by Cisco Systems states that IoT will be an operational domain of over 50 billion devices by 2023 [1]. Moreover, IoT provides numerous advantages in daily life. It can efficiently and smoothly monitor the working and functionality of devices and gadgets with limited resources. The smart healthcare industry can provide services such as telehealth, assisted living, hospital asset management, drug management, hygiene compliance, disease management and rehabilitation.

B. Lalithadevi (B)
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
S. Krishnaveni
Department of Software Engineering, SRM Institute of Science and Technology, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_16
This variety of management services makes patients' lives easier and helps medical experts and hospitals to manage and deliver services within short time intervals [2]. IoT is a dynamic community infrastructure that interconnects different sensor networks via the Internet and accumulates sensor data, transmitting and receiving records for further processing. Given the structure and strategies followed in an IoT environment, there are security challenges in keeping a person's clinical records and hospital records confidential.
1.1 Contributions

The main contributions of this paper are as follows:
• A deep analysis of the smart healthcare framework based on the Internet of Things.
• A discussion of the architectural design of cloud computing and fog computing in the healthcare field for data storage and retrieval.
• The impact of blockchain on data security, particularly the security challenges of managing medical data records.
• The influence of the Internet of Things on the medical care industry, which turns it into a smart environment.
• A review of existing frameworks for the healthcare industry, focusing on cloud computing in general and on sensors for the generalization of health parameters.
• A discussion of the machine learning algorithms used for disease prediction.

This paper is organized as follows. Section 2 discusses the background of IoT in the healthcare field, addressing the related technologies and the monitoring of remote patients through wearable sensors. Section 3 lists different IoT architectures, the healthcare services available nowadays, and applications for patients, hospitals, and doctors. Section 4 describes machine learning techniques and algorithms. Section 5 discusses the data communication standards used for deploying an IoT-based healthcare environment. Section 6 describes cloud computing in IoT for data transmission. Section 7 introduces fog computing in smart environments. Section 8 describes security challenges in IoT that arise from global data access. Section 9 discusses the research challenges and limitations in IoT-based healthcare framework design. Finally, Sect. 10 presents the concluding remarks.
2 Background

IoT has attracted many research ideas over the years, and the areas in which it can be implemented successfully have been studied closely. Elderly people can track their health through IoT, which reduces stress and dependency on the healthcare system [2].
Analysis of (IoT)-Based Healthcare Framework …
2.1 m-Health Things (m-IoT)

Mobile IoT (m-IoT) has emerged in recent years from the fusion of contemporary digital health standards with LTE connectivity. It has been combined with high-speed 4G networks to raise the standard further, and 6LoWPAN is combined with the 4G protocols.
2.2 Wearable Medical Sensors

The health-related dataset includes attributes such as lung disease, severe headache, kidney disorder, liver disorder, LDL, TC, DBP, HDL, obesity, BG, and HR [3]. Any one of these attributes can be a major cause of hypertension [4]. Figure 1 gives an overview of the healthcare framework model. In the healthcare sector, the basic building blocks of IoT are wearable sensors that collect data from patients remotely. This data is transmitted to a cloud server and stored there. An artificial intelligence or machine learning based prediction and detection model is then applied to predict the risk values. An alert or warning message is sent to medical experts via the cloud server. In turn, the respective physician can prescribe the appropriate medicine and take the necessary action to protect people in a critical situation.
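The flow described above (sensor reading, cloud storage, risk check, alert to the physician) can be sketched in a few lines of Python. This is a minimal sketch, not the framework's implementation: the `Reading` class, the `LIMITS` table, and the list-based `cloud_store` and `alerts` are hypothetical stand-ins, and the numeric limits are illustrative placeholders rather than clinical guidance. A fixed range check takes the place of the machine learning model.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    patient_id: str
    hr: float   # heart rate, beats per minute
    dbp: float  # diastolic blood pressure, mmHg
    bg: float   # blood glucose, mg/dL


# Hypothetical acceptable ranges; placeholders only, not clinical guidance.
LIMITS = {"hr": (50, 120), "dbp": (60, 90), "bg": (70, 180)}


def out_of_range(reading):
    """Return the names of the attributes that fall outside their range."""
    flagged = []
    for name, (low, high) in LIMITS.items():
        if not low <= getattr(reading, name) <= high:
            flagged.append(name)
    return flagged


def process(reading, cloud_store, alerts):
    """Store the reading, then raise an alert if any attribute is flagged."""
    cloud_store.append(reading)  # stands in for the cloud server
    flagged = out_of_range(reading)
    if flagged:
        alerts.append((reading.patient_id, flagged))  # message to the physician
    return flagged
```

In a real deployment the range check would be replaced by the trained prediction model, and the two lists by the cloud storage and notification services.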
Fig. 1 Overview of healthcare framework
Role of Datasets and Acquisition Methods
A dataset is a collection of information used to train the model through machine learning or deep learning techniques. Data is acquired from the end nodes in several ways, such as data collection, data conversion, and data sharing. Table 1 shows the role of the patient dataset and the acquisition method of wearable body sensors.

Pulse Sensors
Accurate measurement of the pulse depends on a variety of factors, but mostly on the body part where it is measured. The chest and wrist are common areas. The pulse is measured from fingertips and earlobes as well [1].

Respiratory Rate Sensors
These sensors rely on the fact that exhaled air is warmer than the ambient temperature. The number of breaths taken is also taken into account. Advanced signal processing is often used to make the measurement more precise.

Body Temperature Sensors
Body temperature sensors are important in wearable devices as well. Fever, high body temperature, and other ailments can be detected. Figure 2 shows a collection of IoT-based healthcare sensors. Negative temperature coefficient (NTC) and positive temperature coefficient (PTC) sensors are both temperature sensors [1]. Heat stroke can be detected through this sensor [4]. Hypothermia can also be identified with a body temperature sensor embedded in a wearable device.

Blood Pressure Sensor
Hypertension is a significant risk factor for cardiovascular disease, including heart attack. Pulse transit time (PTT) is defined as the time taken between the pulse at the heart and the pulse at another location, such as the earlobe or radial artery [4].

Glucose Sensor
A smart device for monitoring blood glucose levels in diabetic patients has been proposed earlier. This device requires patients to check their blood sugar levels manually at regular intervals [5].

ECG Sensor
ECG sensors are based on the electrocardiogram. Their function is to monitor cardiac activity. Smart sensors and microcontrollers are used in these systems. The sensing device is connected through Bluetooth to a smartphone, which handles the clinical data [6].

Table 1 Patient medical report
| Patient data | Wearable body sensors | Sensed data | Disease symptoms |
|---|---|---|---|
| Patient name: xyz; Unique ID No.: 1234; Address: ABC | Pulse sensors, respiratory rate sensors, body temperature sensors, blood pressure sensor, glucose sensor, ECG sensor, pulse oximetry sensors | Acquired data from various wearable body sensors | Fever, headache, heart beat rate, inconsistent pulse rate and breath rate, abnormal glucose level and pressure level |
Fig. 2 Representation of various IoT healthcare sensors
Pulse Oximetry Sensors
A pulse oximetry sensor estimates the oxygen level in the blood. It is not essential in the design of a medical wearable device, but it can certainly provide an edge in certain cases [1].

The accuracy and precision of sensed data from wearable devices may be corrupted by malicious intruders who alter it into erroneous data. This misleads the end user when the data feeds a decision support system for treatment. A smart healthcare environment based on IoT therefore needs a highly secured framework and an efficient model to maintain the privacy and confidentiality of patient medical data.
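One common way to detect the kind of tampering described above is to attach a keyed message authentication code to every reading. The sketch below uses HMAC-SHA256 from the Python standard library; it is an illustration under stated assumptions, not the scheme used by any framework in this review. The function names, the JSON encoding, and the shared secret key are all hypothetical choices. Note that an HMAC protects integrity only; confidentiality would additionally require encryption.

```python
import hashlib
import hmac
import json


def sign_reading(reading, key):
    """Serialize a reading and compute its HMAC-SHA256 tag."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_reading(payload, tag, key):
    """Return True only if the payload still matches the tag."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

The wearable device would call `sign_reading` before transmission, and the receiving server would call `verify_reading` and discard any payload that fails the check.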
3 IoT Architecture, Healthcare Services and Application

The IoT architecture describes the flow of data from edge devices to the cloud server through an interconnected network for data analysis, storage, and retrieval. All sensed data is then processed further by the prediction model. Figure 3 represents the evolution of IoT architecture. Within IoT healthcare services and their applications, different fields play a significant part in the management of personal health and wellness, pediatric care, the management of chronic diseases, and the care of elderly patients, among others [2].

Fig. 3 Evolution of IoT architectures
3.1 Real-Time Significance of Proposed Model

The proposed model transmits data from wearable medical sensors to the cloud server via data transmission and communication protocol standards, as shown in Fig. 1. A smartphone acts as an intermediate agent between the sensors and the web apps. A request and response protocol (XMPP) is used to send the data payload and data length code from the sensors to the Android listening port. Finally, the web interface layer enables the physician, patients, and hospital to respond to requests in parallel. Figure 4 lists the healthcare services in the medical field. A robotic assistant is used for tracking the status of senior citizens. The robot uses a ZigBee sensor device to uniquely identify the people it is tracking [7].
Fig. 4 List of healthcare services
3.2 IoT for Patients

Doctors cannot attend to all patients at all times, so patients can check their own health progress through wearable IoT devices. The healthcare system makes this efficient for doctors as well. The basic research challenge here is to monitor multiple reports from various patients at a time through their mobile phones and to give appropriate medication on time without delay.
3.3 IoT for Physicians

The various health parameters are useful for monitoring the overall condition of the patient. One of the research challenges concerns the doctor, who should give preventive medication; patients also suffer less if the disease is diagnosed at an early stage.
3.4 IoT for Hospitals

IoT devices tagged with sensors are used to track the real-time location of clinical equipment such as wheelchairs, nebulizers, and oxygen pumps. Appointment fees and travel costs can be reduced by IoT devices [8]. An Azure web application is used to store the data, perform analysis, and predict health conditions within the expected time [9]. The most important advantages and challenges of IoT in healthcare are:
• Cost and error reduction
• Improved and proactive treatment
• Faster disease diagnosis
• Drugs and equipment management.
4 Machine Learning Techniques for Disease Prediction

The medical field has seen innovation in disease diagnosis and medical data analysis since its research began to draw on machine learning [10]. The volume of patient data generated every day is huge and cannot be surveyed by simple methods. This statistical data is given to a trained model. The model may not achieve the highest accuracy, but it can come close to it.
4.1 Naive Bayes

To understand the naive Bayes classifier, let us first understand Bayes' theorem. It is based on conditional probability. There are two types of events, namely dependent and independent events.

$$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}, \quad \text{if } P(Y) \neq 0 \tag{1}$$

$$P(Y \cap X) = P(X \cap Y) \tag{2}$$

$$P(X \cap Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X) \tag{3}$$

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}, \quad \text{if } P(Y) \neq 0 \tag{4}$$

P(X | Y) is the conditional probability: the probability of event X occurring given that Y is true. P(Y | X) is the likelihood: the probability of event Y occurring given that X is true. This idea is applied to classification datasets that contain a categorical column such as YES and NO or TRUE and FALSE. The categorical data can be binary or multi-class [11].
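To make Eq. (4) concrete, the following Python sketch applies it to a tiny categorical dataset with a YES/NO target. It is an illustration only: the records and feature names are invented, the function names are hypothetical, and Laplace smoothing (`alpha`) is added as an assumption so that an unseen feature value does not force a probability of zero. The "naive" step is the product over features, which treats them as independent given the class.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records):
    """Count classes and per-class feature values from (features, label) pairs."""
    class_counts = Counter(label for _, label in records)
    feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
    feature_values = defaultdict(set)      # feature -> distinct values seen
    for features, label in records:
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, feature_counts, feature_values


def posterior(model, query, alpha=1.0):
    """Return P(class | query) for every class using Bayes' theorem."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(X)
        for name, value in query.items():
            num = feature_counts[(label, name)][value] + alpha
            den = count + alpha * len(feature_values[name])
            score *= num / den  # smoothed likelihood P(Y_i | X)
        scores[label] = score
    evidence = sum(scores.values())  # P(Y), the normalizing term
    return {label: s / evidence for label, s in scores.items()}


# Invented toy records: symptoms and whether the disease was present.
records = [
    ({"fever": "yes", "headache": "yes"}, "YES"),
    ({"fever": "yes", "headache": "no"}, "YES"),
    ({"fever": "yes", "headache": "yes"}, "YES"),
    ({"fever": "no", "headache": "yes"}, "NO"),
    ({"fever": "no", "headache": "no"}, "NO"),
    ({"fever": "yes", "headache": "no"}, "NO"),
]
model = train_naive_bayes(records)
result = posterior(model, {"fever": "yes", "headache": "yes"})
```

For this query the smoothed likelihoods are 0.8 and 0.6 for YES against 0.4 and 0.4 for NO, so with equal priors the posterior for YES is 0.24 / (0.24 + 0.08) = 0.75.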
4.2 Artificial Neural Networks

Machine learning has found use in various fields, and medical science is one of them. Machine learning and deep learning models need very little human assistance to solve a problem. Figure 5 shows an overview of the artificial neural network model. A machine learning model uses feature selection techniques to find the probable outcome.
Fig. 5 Overview of artificial neural network
Fig. 6 Representation of support vector machine
Certain limitations hinder the performance of a machine learning model. A machine learning algorithm is bound to the features given to it as input; if an unfamiliar feature appears, it may predict wrongly.
4.3 Support Vector Machine

There are three kinds of learning techniques in artificial intelligence: supervised, unsupervised, and reinforcement learning. SVM falls under supervised learning and is used for classification and regression analysis. Figure 6 shows the support vector machine classifier. Classification datasets contain categorical data as the target variable, and regression datasets contain a continuous variable as the target [12]. SVM is a supervised learning technique that works on labeled data [12]. Given a dataset consisting of circles and quadrilaterals, it can predict whether a new data point is a circle or a quadrilateral. SVM creates a boundary between the two classes. This boundary plane is the decision boundary that helps the model predict.
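The idea of a decision boundary between two labeled classes can be sketched with a small linear SVM. The code below is a minimal sketch that trains a soft-margin linear SVM by sub-gradient descent on the hinge loss, one of several possible training methods and not necessarily the one used in the cited work. The two-dimensional points, the learning rate `lr`, and the regularization weight `lam` are assumptions chosen for illustration; label +1 stands for "circle" and -1 for "quadrilateral".

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=50):
    """Soft-margin linear SVM trained by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization step.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented, clearly separable feature vectors for the two shapes.
points = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = circle, -1 = quadrilateral
w, b = train_linear_svm(points, labels)
```

Here the learned boundary is the line where `w[0] * x0 + w[1] * x1 + b` equals zero; new points are classified by the sign of that expression.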
4.4 Random Forest

Ensemble techniques in machine learning are methods that combine multiple models into one model. There are two types of ensemble techniques, bagging and boosting. Random forest is a bagging technique. In bagging, many base models can be created for feature extraction [13]. A new random set of data is given as a sample to each model. This method is also known as row sampling with replacement. Figure 7 represents the random forest classifier model. For a particular test data point, the outputs of the different models are observed. The outputs of the models may not all be the same, so a voting classifier is used. All the votes are combined, and the output with the highest frequency, that is, the majority vote, is taken [5]. Multiple decision trees are used in a random forest. Decision trees have low bias and high variance. The decision trees of the different models are aggregated, and the final output obtained after majority voting has low variance [5].

Fig. 7 Overview of random forest classifier model
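Row sampling with replacement and majority voting can be illustrated with a deliberately simplified forest. In the sketch below each base model is a one-split decision tree (a stump) on one randomly chosen feature, which stands in for the full decision trees of a real random forest. The data (heart rate and systolic pressure pairs), the labels, the tree count, and the seed are all invented for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Fit a one-split decision tree (stump) on a single feature."""
    values = sorted(set(r[feature] for r in rows))
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    if best is None:  # all sampled values identical: always predict the majority
        m = majority(labels)
        return (feature, 0.0, m, m)
    return (feature, best[1], best[2], best[3])


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (bagging)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # row sampling with replacement
        feature = rng.randrange(len(rows[0]))       # random feature for this tree
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Collect one vote per tree and return the majority vote."""
    votes = [l if row[f] <= thr else r for f, thr, l, r in forest]
    return majority(votes)


# Invented (heart rate, systolic pressure) samples.
rows = [(62, 110), (70, 115), (68, 118), (75, 120),
        (95, 150), (102, 160), (110, 155), (98, 165)]
labels = ["normal"] * 4 + ["at_risk"] * 4
forest = train_forest(rows, labels)
```

Individual stumps can disagree because each sees a different bootstrap sample; the vote across all of them is what reduces the variance of the final prediction.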
5 Data Communication Standards

To understand data communication in wireless devices, we need to understand the body area network. EEG sensors [14], body temperature sensors, heart rate sensors, and similar devices are interconnected with the cloud through handheld devices.
5.1 Short-Range Communication Techniques

The short-range communication techniques presented in this paper are ZigBee and Bluetooth. Both fall under home networks. ZigBee is a technology created for control and sensor networks. The layers in ZigBee [15] are the application layer, security layer, networking layer, media access control (MAC) layer, and physical layer. Table 2 lists a few communication standards for short-distance coverage.
Table 2 Short-range communication standards
| Features | Bluetooth | Infrared | ZigBee | References |
|---|---|---|---|---|
| Band | 2.4 GHz | 430 THz to 300 GHz | 2.4 GHz | [15] |
| Data rate | 1 Mbps | 9.6–115.2 kbps | 20–250 kbps | [15] |
| Range | 150 m | Less than 1.5 m | 10–100 m | [16] |
| Security | P-128-AES encryption | Very low bit error rate | S-128-AES encryption | [7] |
| Topology | Star | Point-to-point | Mesh | [7] |
5.2 Long-Range Communication Techniques

A network that allows long-distance data communication is known as a wide area network (WAN). This kind of transmission is suitable for large geographical areas. The range of a WAN is beyond 100 km. Table 3 lists the communication standards for long-range coverage. The major challenge in choosing a data communication standard is selecting the coverage range based on the feasibility of dataset collection and on the problem statement.

Table 3 Long-range communication standards
| Features | LoRaWAN | 6LoWPAN | SigFox | Reference |
|---|---|---|---|---|
| Band | 125 kHz and 250 kHz (868 MHz band and 780 MHz band); 125 kHz and 500 kHz (915 MHz band) | 5 MHz (2.4 GHz band); 2 MHz (915 MHz band); 600 kHz (868.3 MHz band) | 100 Hz–1.2 kHz | [15] |
| Data rate | 980 bit/s to 21.9 kbit/s (915 MHz band) | 250 kbit/s (2.4 GHz band); 40 kbit/s (915 MHz band); 20 kbit/s (868.3 MHz band) | 100 to 600 bit/s | [15] |
| Range | 5–15 km | 10 to 100 m | 10 to 50 km | [15] |
| Security | NwkSKey (128 bits) ensures data integrity; AppSKey (128 bits) provides data confidentiality | Handled at the link layer, which includes secure and non-secure modes | Encryption mechanism | [15] |
| Payload | Between 19 and 250 bytes | Header (6 bytes) and session data unit (127 bytes) | Between 0 and 12 bytes | [16] |
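The range and payload figures in Tables 2 and 3 can be turned into a simple selection aid. The following Python sketch is only an illustration of how such a lookup might work: the dictionary names and helper functions are hypothetical, the values are the upper bounds quoted in the tables, and per-message headers, duty-cycle limits, and data rates are ignored.

```python
import math

# Upper range limits in meters, taken from Tables 2 and 3.
MAX_RANGE_M = {
    "Infrared": 1.5,
    "ZigBee": 100,
    "Bluetooth": 150,
    "LoRaWAN": 15_000,
    "SigFox": 50_000,
}

# Upper payload limits in bytes per message, taken from Table 3.
MAX_PAYLOAD_BYTES = {"LoRaWAN": 250, "SigFox": 12}


def candidates(distance_m):
    """Return the standards whose quoted range covers the given distance."""
    return sorted(name for name, reach in MAX_RANGE_M.items()
                  if reach >= distance_m)


def messages_needed(standard, payload_bytes):
    """Return how many messages are needed to carry the payload."""
    return math.ceil(payload_bytes / MAX_PAYLOAD_BYTES[standard])
```

For example, a 30-byte sensor record needs three SigFox messages but fits in a single LoRaWAN message, which is one reason payload size matters when the coverage range is being selected.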
Fig. 8 Representation of cloud computing for healthcare
6 Cloud Computing in Healthcare

Cloud computing is a technology that emerged when the volume of data generated daily became impossible to store and handle locally. Patient data needs security and privacy. Resources such as storage and computational power, which are otherwise provided by our own computers, are managed by cloud services according to the data [15]. Figure 8 shows the impact of cloud computing in the healthcare field for instant data transmission. The storage access control layer is the backbone of the cloud-enabled environment; it accesses medical services using sensors such as BG sensors and sphygmomanometers in everyday activities [16]. The data annotation layer resolves the heterogeneity problems that normally occur during data processing. The data testing layer analyzes the medical records saved in the cloud platform. Maintaining the portability and integrity of data transferred from end devices to the cloud server is a challenging task in an IoT healthcare platform.
7 Influence of Fog Computing in Healthcare

The potential of fog computing for medical services was realized in 2017 [16]. A healthcare system called Health Fog was launched in 2016 [15]. Figure 9 illustrates recent fog computing studies in the healthcare industry that use machine learning and deep learning techniques. To improve system reliability, cloud-based security functionality was then added to Health Fog [15]. The current architecture exploits the benefits of edge computing for home and hospital control systems. Health technologies have been moving from cloud computing to fog computing in recent years [17]. Similar work has been performed by the authors of [18] in the form of a four-layered healthcare model comprising a sensation layer, classification layer, mining layer, and application layer. The sensation layer obtains data from various sensors located in the office room. In the classification layer, data is classified into five categories: data about health, data about environment, data about meal, data about physical posture, and data about behavior. The mining layer is used to extract information from a cloud database. Finally, the application layer provides services such as a personal health recommender system, a remote medical care monitoring system, and a personal health maintenance system to the end user [19].

Fig. 9 Representation of recent fog based healthcare studies
7.1 Fog Computing Architecture

Within cloud computing, the fog computing architecture presents potential challenges, including security. The basic architecture of fog computing is separated into three main layers, as shown in Fig. 10. The device layer is the layer nearest to the end users and devices. It comprises hardware such as mobile devices and sensors, which are spread globally. These devices are responsible for sensing information about physical objects and communicating it to the upper layer for analysis and storage. The fog layer is the second layer, located at the edge of the network, and it contains a large number of fog nodes [20]. The cloud layer is responsible for permanent management and the comprehensive computational processing of data [21]. Fog nodes and the cloud platform must address the following challenges:
• Retrieve data from IoT devices using any protocol.
• Run IoT-enabled applications for control and analysis of real-time applications, with minimum response time.
• Send periodic data to the cloud for further processing.
• Aggregate the data received from many fog nodes.
• Analyze the IoT data to gain business insight.

Fig. 10 Overview of Fog computing architecture
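The division of work listed above (react quickly at the fog node, send periodic summaries to the cloud) can be sketched as follows. This is a minimal sketch under stated assumptions: the `FogNode` class, its threshold and batch size, and the list that stands in for the cloud layer are all hypothetical and are not taken from any of the cited systems.

```python
from statistics import mean


class FogNode:
    """Buffers sensor values, alerts locally, and uploads batch summaries."""

    def __init__(self, alert_threshold, batch_size):
        self.alert_threshold = alert_threshold
        self.batch_size = batch_size
        self.buffer = []
        self.cloud_uploads = []  # stands in for the cloud layer

    def ingest(self, value):
        """Handle one reading; return True if it needs an immediate alert."""
        self.buffer.append(value)
        alert = value > self.alert_threshold  # decided at the edge, no cloud round trip
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return alert

    def flush(self):
        """Send an aggregated summary of the buffered values to the cloud."""
        if not self.buffer:
            return
        summary = {
            "count": len(self.buffer),
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": mean(self.buffer),
        }
        self.cloud_uploads.append(summary)
        self.buffer = []
```

With a threshold of 38.0 and a batch size of 3, feeding the body temperature readings 36.5, 39.0, 37.0, and 36.0 raises one local alert and produces one cloud upload summarizing the first three readings, while the fourth waits in the buffer.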
8 Security Challenges in IoT

The primary goal of IoT security is to safeguard consumer privacy, data confidentiality, availability, and the transportation infrastructure of an IoT platform. Blockchain technology improves accountability between patients and doctors [23]. DDoS attacks are perhaps one of the best examples of the problems caused by shipping devices with default passwords and not telling customers to change them as soon as they receive them [16]. As the number of IoT-connected devices continues to rise, a wide assortment of malware and ransomware is being used to exploit them [24]. An IP camera can capture sensitive information in a wide variety of places, such as a private home, a workplace, or even the local gas station [25]. There are various security problems in IoT. In most cases the password of an IoT device is weak or hard coded [26]. The main security challenges in IoT are:
• Insufficient testing and updating
• Brute-forcing and the issue of default passwords
• IoT malware and ransomware
• Data security, privacy, and untrustworthy communication.
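The default-password problem listed above can be reduced with a simple check when a device is first set up. The Python sketch below is illustrative only: the function name, the small list of default passwords, and the minimum length are assumptions, and a production system would rely on a much larger list and an established password policy.

```python
# Hypothetical short list of factory-default passwords.
DEFAULT_PASSWORDS = {"admin", "password", "12345", "root", "default"}


def password_problems(password, min_length=12):
    """Return the reasons a device password should be rejected."""
    problems = []
    if password.lower() in DEFAULT_PASSWORDS:
        problems.append("default password")
    if len(password) < min_length:
        problems.append("too short")
    if password.isalpha() or password.isdigit():
        problems.append("single character class")
    return problems
```

A device could refuse to join the network until this function returns an empty list for the password chosen by the customer.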
Investigators have applied various machine learning and deep learning models in healthcare frameworks for the detection of different diseases. Table 4 summarizes the methodologies used for disease prediction and detection.
Table 4 Summary of data analysis using machine learning and deep learning algorithms in healthcare
| Reference | Attainment | Objective | Type of data | Methodology | Performance metrics |
|---|---|---|---|---|---|
| [27] | Better decision support given using XG boost classifier for further treatment | Early prediction and survival rate through MRI images for breast cancer | Clinical data (MRI) | Logistic regression, Support vector machine, XG boost, Linear discriminant analysis | AUC, ROC |
| [28] | High accuracy prediction model for survival rate of HCC disease | Development of deep learning system to predict carcinoma and its risk level | Clinical data | DNN, KNN, Support vector machine | Precision, Recall, F1 score, Accuracy |
| [29] | Efficient feature selection model for classification of T2DM | Implementation of semi-automated framework based on genotype-phenotype association for diabetes | Electronic health records | Random forest, Decision trees, KNN, Bayes classifier | AUC, Precision, Recall, Sensitivity, Specificity |
| [30] | Temporal based prediction model for high risk factors of emergency admission | Comparative analysis of traditional and machine learning models for early prediction of emergency case | Electronic health records | Random forest, Gradient boost, Cox model | Confidence matrix, AUC |
| [31] | SVM based multi-class classification of brain tumor | Analysis of pattern classification models for variety of brain tumors | Clinical data (MRI) | Recursive feature elimination, Linear discriminant analysis, KNN | Confusion matrix, Entropy, t-test for standard deviation |
| [32] | Monitoring early movement of infants through wearable sensors | Monitoring early and delayed movement of infant based on kinematics analysis model | Wearable sensor data through ankle of infant | Ada boost, Support vector machine, Logistic regression | Accuracy, Precision, Recall, F1 score |
| [33] | DBN based activity recognition through PCA and LDA approaches | Human activity recognition using DBN model and dimensionality reduction through kernel principal component analysis | Wearable body sensors | Deep belief network, Feature extraction | Accuracy |
| [34] | Progression and severity analysis of Parkinson disease symptoms using ML | Detection and progressive analysis of Parkinson's disease using machine learning | Sensor data collected from hands, thighs and arms | CNN, Random forest | AUC, Confidence interval, Standard deviation, ROC |
| [35] | Hybrid deep learning model for efficient emotion classification | Deep learning based emotion classification through physiological, environmental and location based signals | Sensor data from wearable devices through smart phones | CNN, LSTM | Precision, Recall, F1 score, Accuracy, Error rate |
| [36] | Autoencoder model for high prediction rate of proteins associated with FLT3-ITD | Severity detection of proteins associated with acute myeloid leukemia patients | Protein data | Stacked autoencoder (SAE) | Sensitivity, Specificity, Accuracy |
9 Discussion

This section discusses the research limitations and challenges of an IoT-based healthcare framework. Flexible wearable sensors are required to monitor the patient's health status. A secured framework is needed for data transmission from the edge device to the control device and then to the cloud server, because intruders may try to modify the data and break its confidentiality. Signal analysis in ECG and EEG monitoring should be done using ML. An energy-efficient optimization algorithm is needed to limit energy consumption and reduce the usage level. Data privacy is especially important in the healthcare domain. It can be achieved through cryptographic models and standards.
10 Conclusion

IoT-based healthcare technologies offer different architectures and platforms for healthcare networks that support connectivity to the IoT backbone and enable the transmission and reception of medical data. Because it surveys studies across healthcare in IoT, cloud computing, and fog computing, this review is valuable for researchers. The Internet of Things has advanced the healthcare sector by enhancing performance, reducing costs, and concentrating on quality care for patients. It provides a complete healthcare network associated with cloud and fog computing, serves as a backbone for cloud computing applications, and provides a framework for sharing medical data between medical devices and remote servers.
References
1. Baker SB, Xiang W, Atkinson I (2017) Internet of things for smart healthcare: technologies, challenges, and opportunities. IEEE Access 5:26521–26544. https://doi.org/10.1109/ACCESS.2017.2775180
2. Carnaz GJF, Nogueira V (2019) An overview of IoT and healthcare. Available: https://www.researchgate.net/publication/330933788
3. Hussain S, Huh E, Kang BH, Lee S (2015) GUDM: automatic generation of unified datasets for learning and reasoning in healthcare, pp 15772–15798. https://doi.org/10.3390/s150715772
4. Majumder AJA, Elsaadany YA, Young R, Ucci DR (2019) An energy efficient wearable smart IoT system to predict cardiac arrest. Adv Hum-Comput Interact, vol 2019. https://doi.org/10.1155/2019/1507465
5. Ani R, Krishna S, Anju N, Sona AM, Deepa OS (2017) IoT based patient monitoring and diagnostic prediction tool using ensemble classifier. In: 2017 International Conference on Advanced Computing and Communication Informatics, ICACCI 2017, vol 2017-January, pp 1588–1593. https://doi.org/10.1109/ICACCI.2017.8126068
6. Joyia GJ, Liaqat RM, Farooq A, Rehman S (2017) Internet of Medical Things (IOMT): applications, benefits and future challenges in healthcare domain, May 2018. https://doi.org/10.12720/jcm.12.4.240-247
7. Konstantinidis EI, Antoniou PE, Bamparopoulos G, Bamidis PD (2015) A lightweight framework for transparent cross platform communication of controller data in ambient assisted living environments. Inf Sci (NY) 300(1):124–139. https://doi.org/10.1016/j.ins.2014.10.070
8. Saba T, Haseeb K, Ahmed I, Rehman A (2020) Secure and energy-efficient framework using Internet of Medical Things for e-healthcare. J Infect Public Health 13(10):1567–1575. https://doi.org/10.1016/j.jiph.2020.06.027
9. Krishnaveni S, Prabakaran S, Sivamohan S (2016) Automated vulnerability detection and prediction by security testing for cloud SAAS. Indian J Sci Technol 9(S1). https://doi.org/10.17485/ijst/2016/v9is1/112288
10. Yang X, Wang X, Li X, Gu D, Liang C, Li K (2020) Exploring emerging IoT technologies in smart health research: a knowledge graph analysis 9:1–12
11. Nashif S, Raihan MR, Islam MR, Imam MH (2018) Heart disease detection by using machine learning algorithms and a real-time cardiovascular health monitoring system. World J Eng Technol 06(04):854–873. https://doi.org/10.4236/wjet.2018.64057
12. Krishnaveni S, Vigneshwar P, Kishore S, Jothi B, Sivamohan S (2020) Anomaly-based intrusion detection system using support vector machine. In: Artificial intelligence and evolutionary computations in engineering systems, pp 723–731
13. Ram SS, Apduhan B, Shiratori N (2019) A machine learning framework for edge computing to improve prediction accuracy in mobile health monitoring. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), July 2019, vol 11621 LNCS, pp 417–431. https://doi.org/10.1007/978-3-030-24302-9_30
14. Umar S, Alsulaiman M, Muhammad G (2019) Deep learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Futur Gener Comput Syst 101:542–554. https://doi.org/10.1016/j.future.2019.06.027
15. Minh Dang L, Piran MJ, Han D, Min K, Moon H (2019) A survey on internet of things and cloud computing for healthcare. Electronics 8(7). https://doi.org/10.3390/electronics8070768
16. Dewangan K, Mishra M (2018) Internet of things for healthcare: a review. Researchgate.Net 8(Iii):526–534. Available: http://ijamtes.org/
17. Sood SK, Mahajan I (2019) IoT-Fog-based healthcare framework to identify and control hypertension attack. IEEE Internet Things J 6(2):1920–1927. https://doi.org/10.1109/JIOT.2018.2871630
18. Bhatia M, Sood SK (2019) Exploring temporal analytics in fog-cloud architecture for smart office healthcare. Mob Networks Appl 24(4):1392–1410. https://doi.org/10.1007/s11036-018-0991-5
19. Raj JS (2021) Security enhanced blockchain based unmanned aerial vehicle health monitoring system. J ISMAC 3(02):121–131
20. Nandyala CS, Kim HK (2016) From cloud to fog and IoT-based real-time U-healthcare monitoring for smart homes and hospitals. Int J Smart Home 10(2):187–196. https://doi.org/10.14257/ijsh.2016.10.2.18
21. Dubey H, Yang J, Constant N, Amiri AM, Yang Q, Makodiya K (2016) Fog data: enhancing Telehealth big data through fog computing. In: ACM international conference on proceeding series, vol 07–09-October-2015. May 2016. https://doi.org/10.1145/2818869.2818889
22. He W, Yan G, Da Xu L, Member S (2017) Developing vehicular data cloud services in the IoT environment. https://doi.org/10.1109/TII.2014.2299233
23. Suma V (2021) Wearable IoT based distributed framework for ubiquitous computing. J Ubiquitous Comput Commun Technol (UCCT) 3(01):23–32
24. Hariharakrishnan J, Bhalaji N (2021) Adaptability analysis of 6LoWPAN and RPL for healthcare applications of internet-of-things. J ISMAC 3(02):69–81
25. Pazienza A, Polimeno G, Vitulano F (2019) Towards a digital future: an innovative semantic IoT integrated platform for Industry 4.0, healthcare, and territorial control
26. Aceto G, Persico V, Pescapé A (2018) The role of Information and Communication Technologies in healthcare: taxonomies, perspectives, and challenges. J Netw Comput Appl 107:125–154. https://doi.org/10.1016/j.jnca.2018.02.008
27. Tahmassebi A et al (2019) Impact of machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neo adjuvant chemotherapy and survival outcomes in breast cancer patients. Invest Radiol 54(2):110–117. https://doi.org/10.1097/RLI.0000000000000518
28. Kayal CK, Bagchi S, Dhar D, Maitra T, Chatterjee S (2019) Hepatocellular carcinoma survival prediction using deep neural network. In: Proceedings of international ethical hacking conference 2018, pp 349–358
29. Zheng T et al (2017) A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform 97:120–127. https://doi.org/10.1016/j.ijmedinf.2016.09.014
30. Rahimian F et al (2018) Predicting the risk of emergency admission with machine learning: development and validation using linked electronic health records. PLoS Med 15(11):e1002695. https://doi.org/10.1371/journal.pmed.1002695
31. Zacharaki EI et al (2009) Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn Reson Med 62(6):1609–1618. https://doi.org/10.1002/mrm.22147
32. Goodfellow D, Zhi R, Funke R, Pulido JC, Mataric M, Smith BA (2018) Predicting infant motor development status using day long movement data from wearable sensors. Available: http://arxiv.org/abs/1807.02617
33. Hassan MM, Huda S, Uddin MZ, Almogren A, Alrubaian M (2018) Human activity recognition from body sensor data using deep learning. J Med Syst 42(6):99. https://doi.org/10.1007/s10916-018-0948-z
34. Lonini L et al (2018) Wearable sensors for Parkinson's disease: which data are worth collecting for training symptom detection models. npj Digit Med 1(1). https://doi.org/10.1038/s41746-018-0071-z
35. Kanjo E, Younis EMG, Ang CS (2019) Deep learning analysis of mobile physiological, environmental and location sensor data for emotion detection. Inf Fusion 49:46–56. https://doi.org/10.1016/j.inffus.2018.09.001
36. Liang CA, Chen L, Wahed A, Nguyen AND (2019) Proteomics analysis of FLT3-ITD mutation in acute myeloid leukemia using deep learning neural network. Ann Clin Lab Sci 49(1):119–126. https://doi.org/10.1093/ajcp/aqx121.148
Hand Gesture Recognition for Disabled Person with Speech Using CNN E. P. Shadiya Febin and Arun T. Nair
Abstract Because disabled people account for a large percentage of our community, we should make an effort to interact with them in order to exchange knowledge, perspectives, and ideas. To that end, we aim to establish a means of communication. People who are deaf or hard of hearing can communicate with one another using sign language, which lets a person communicate without relying on sound. The objective of this article is to explain the design and development of a hand gesture-based sign language recognition system. To aid disabled individuals, particularly those who are unable to communicate verbally, sign language is translated into text and subsequently into speech. The solution uses a web camera as its major component to record a live video stream, which is processed by a proprietary MATLAB algorithm. The technology makes it possible to recognize hand movements. Hand gesture recognition is a straightforward way of providing a meaningful, highly flexible interaction between robots and their users, with no physical contact between the user and the devices. A deep learning system that is efficient at image recognition is used to identify the dynamically recorded hand movements. Convolutional neural networks (CNNs) are used to optimize performance. The model is trained on static images of hand gestures, and the CNN is constructed without relying on a pre-trained model.
Keywords Human–computer interaction · Gesture recognition · Web camera · CNN · MATLAB
E. P. Shadiya Febin (B) Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala, India A. T. Nair KMCT College of Engineering, Kozhikode, Kerala, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_17
1 Introduction Technology has become ingrained in human existence in modern society. It pervades many facets of our lives, from work to education, health, communication, and security [1]. Human–computer interaction (HCI) is one of many disciplines undergoing rapid evolution. It helps people improve their understanding of technology and develop innovative ideas and algorithms that benefit society, with the goal that humans will be able to converse with technology in the same manner they do with each other. Machine recognition of hand gestures was one of the early solutions tested [2]. In the proposed system, a CNN recognizes hand gestures so that disabled people can use them for communication. Gesture detection is particularly important for real-time applications. Owing to the ubiquitous availability of digital cameras, numerous researchers are studying applications of gesture recognition, but because gesture recognition is intricate, several challenges remain. The proposed system addresses these challenges with convolutional neural networks and deep learning, which often outperforms classical machine learning in image recognition. This work uses the ASL dataset containing the hand gestures for the numbers 1–5. Typically, an image is preprocessed (e.g., by background subtraction and image binarization) to make it easier to extract gestures from static images. Features are then extracted from all of the images after binarization. Convolutional neural networks are built from neurons that have learnable weights and biases. Each neuron receives several inputs and computes their weighted sum. An activation function then processes this sum and produces an output.
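The neuron computation described above (a weighted sum of the inputs followed by an activation function) can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the example inputs, and the choice of ReLU and sigmoid activations are assumptions made for the sketch and are not taken from the chapter's MATLAB implementation.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical example values, chosen only for illustration.
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.05)
print(output)
```

During training, the weights and the bias are the quantities that are adjusted; the activation function itself stays fixed.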
2 Literature Survey “A Deep Learning Method for Detecting Non-Small Cell Lung Cancer.” The researchers [1] conducted a series of experiments to build a statistical model that translates speech into sign language for deaf individuals. They also created a system combining automatic speech recognition (ASR), an animated presentation, and a statistical translation module for a variety of sign sets. The text was translated using finite-state transducer and phrase-based methods. Several metrics were used during evaluation: WER, BLEU, and NIST. The article describes speech translation using an automatic recognizer in each of the three configurations. With a finite-state transducer applied to the ASR output, the reported word error rate was between 28.21 and 29.27%.
A review on deaf-mute communication interpretation [2]: This article examines the deaf-mute communication translator systems in use today. Wearable communication devices and online learning systems are the two major communication techniques used by deaf-mute persons. Wearable communication systems
are classified into three categories: glove-based systems, keypad-based systems, and handicom touch-screen systems. All three make use of a number of sensors, an accelerometer, a suitable microcontroller, a text-to-voice conversion module, a keypad, and a touch screen. The second method, the online learning system, removes the need for external equipment to interpret messages between deaf and hearing-impaired individuals. It makes use of a number of instructional techniques, subdivided into five: the SLIM module, TESSA, Wi-See technology, the SWI PELE system, and Web-Sign technology.
An efficient framework for recognizing Indian sign language using the wavelet transform [3]: The suggested ISLR system is a pattern recognition technique with two key modules: feature extraction and classification. Sign language is recognized by combining feature extraction based on the Discrete Wavelet Transform (DWT) with nearest-neighbor classification. According to the experimental results, the system achieves a maximum classification accuracy of 99.23% when the cosine distance classifier is used.
Hand gesture recognition using principal component analysis [4]: The authors proposed a database-driven hand gesture recognition technique that is useful for human–robot interaction and related applications. It is based on a skin color model and thresholding, together with an effective template matching strategy. First, the hand region is segmented using a skin color model in the YCbCr color space. Next, thresholding is applied to separate foreground from background. Finally, a template-matching recognition technique is built using Principal Component Analysis (PCA).
Hand gesture recognition system for dumb people [5]: The authors presented a static hand gesture recognition system using digital image processing. The SIFT technique is used to construct the feature vector representing the hand gestures. SIFT features, which are invariant to scaling, rotation, and added noise, are computed at the edges.
An automated system for recognizing Indian sign language [6]: This article discusses an approach to automatic sign identification based on shape-based features. The hand region is separated from the images using Otsu’s thresholding approach, which calculates the optimal threshold by minimizing the within-class variance of the thresholded black and white pixels. Hu’s invariant moments are used to describe the segmented hand region, and the resulting features are classified using an artificial neural network. System performance is measured by accuracy, sensitivity, and specificity.
Recognition of hand gestures for sign language recognition: a review [7]: The authors examined a variety of previously proposed approaches to hand gesture and sign language recognition. Sign language is the sole method of communication accessible to deaf and dumb persons, who use it to express their feelings and thoughts to others.
The design issues and proposed implementation of a communication aid for deaf and dumb persons [8]: The author developed a technique to help deaf and dumb
individuals interact with hearing people using Indian sign language (ISL), in which suitable hand gestures are converted to text messages. The major objective is to build an algorithm capable of instantly translating dynamic gestures to text. After testing is complete, the system will be ported to the Android platform and made available as a mobile application for smartphones and tablet computers.
Real-time detection and identification of Indian and American Sign Language using SIFT [9]: The author demonstrated a real-time vision-based system for hand gesture detection that may be used in a range of human–computer interaction applications. The system can recognize 35 distinct hand gestures used in Indian and American Sign Language (ISL and ASL). An RGB-to-gray segmentation method was used to reduce the chance of incorrect detection. The authors demonstrated how to extract features using an improvised Scale Invariant Feature Transform (SIFT). The system is modeled in MATLAB, and a graphical user interface (GUI) was created to make the hand gesture recognition system efficient and user-friendly.
A review of the extraction of Indian and American Sign Language features [10]: This article examined the current state of sign language research and development, which focuses on manual communication and body language. Sign language recognition generally involves three steps: pre-processing, feature extraction, and classification. Neural Networks (NN), Support Vector Machines (SVM), Hidden Markov Models (HMM), and the Scale Invariant Feature Transform (SIFT) are among the methods available (Table 1).
The existing technique makes use of the orientation histogram, which has a number of disadvantages: similar gestures may have different orientation histograms, while dissimilar gestures may have similar ones. Additionally, the technique responded to any object that took up the majority of the image, even if it was not a hand gesture.
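Otsu's thresholding, mentioned in the survey above in connection with [6], can be illustrated with a short sketch. The code below searches for the threshold that maximizes the between-class variance, which is equivalent to minimizing the within-class variance. The function names and the sample pixel values are hypothetical and chosen for illustration; this is not the implementation used in the cited work.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best splits pixels into two classes.

    Pixels with value <= t form one class and the rest form the other.
    Maximizing the between-class variance is equivalent to minimizing
    the within-class variance.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels in the lower class so far
    sum0 = 0    # sum of gray levels in the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no valid split here
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between = between
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 (foreground) and the rest to 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Hypothetical flattened image: three dark and three bright pixels.
sample = [10, 11, 12, 198, 200, 205]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

For a hand image, the same split would separate the (brighter or darker) hand region from the rest of the frame before shape features are computed.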
3 Proposed System This article uses deep learning techniques to identify hand gestures. Static image datasets are used to train the proposed system. The network is built from convolutional neural networks rather than pre-trained models. The proposed vision-based solution requires no external hardware and places no restrictions on clothing. The CNN deep learning algorithm converts hand gestures to numbers. A camera captures a gesture, which is then used as input to the gesture recognition system. The goal is to convert sign language to numbers and speech in real time. More precisely, the steps are:
1. Recognizing male and female sign gestures
2. Creating a machine-learning model for image-to-text translation
3. Forming words
4. Forming sentences
5. Composing the complete text
6. Producing the audio output
Figure 1 depicts the steps necessary to accomplish the project’s objectives.
Table 1 Review on types of hand gestures used

| Author | Methods | Features | Challenges |
|---|---|---|---|
| Virajshinde [11] | Electronics based | Use of electronic hardware | Lot of noise in transition |
| Maria Eugenia [12] | Glove based | Requires use of hand gloves | Complex to use |
| Sayemmuhammed [13] | Marker based | Use of markers on fingers or wrist | Complex to use multiple markers |
| Shneiderman [14] | HCI | Describe practical techniques | Current HCI design methods |
| Lawerence d o [15] | HCI | Usability, higher educations | User should be prepared to use the system |
| Chu, Ju, Jung [16] | EMG | Differential mechanism | Minimize the weight of the prosthetic hand |
| O. Marques [17] | MATLAB | More than 30 tutorials | Accessible all people |
| A. Mcandrew [18] | MATLAB | Information about DSP using MATLAB | Understand every one |
| Khan, T. M. [19] | CED | Accurate orientation | Noisy real life test data |
| Nishad PM [20] | Algorithm | Different color conversion | Convert one space to other |
Fig. 1 The system block diagram
Gestures are captured by the web camera. The OpenCV video stream captures the whole signing period. Frames are extracted from the stream and converted to grayscale images with a resolution of 50 × 50 pixels. Because every image in the dataset has the same size, this dimension is used consistently throughout the project. Hand gestures are detected in the captured images. This is a preprocessing phase that occurs before the image is submitted to the model for prediction. The regions containing gestures are highlighted, which improves the chance of a correct prediction. The preprocessed images are fed into the Keras CNN model. The trained model generates the predicted label. Each gesture label is associated with a probability, and the label with the highest probability is taken as the prediction. The model converts recognized gestures into text. The pyttsx3 package is used to convert the recognized words to speech. Although the text-to-speech output is a simple workaround, it is beneficial because it imitates a spoken dialog. The system architecture is depicted in Fig. 2.
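The preprocessing and label-selection steps described above can be sketched without any external libraries. The example below assumes a frame is given as rows of (R, G, B) tuples, uses the common luminance weights for the grayscale conversion, and uses nearest-neighbor resampling to reach 50 × 50 pixels. The function names, the resampling method, and the probability vector are assumptions for illustration; the chapter's own pipeline relies on OpenCV and a trained Keras model, which are not reproduced here.

```python
def to_grayscale(frame):
    """Convert rows of (r, g, b) tuples (0..255) to rows of gray levels."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame
    ]


def resize_nearest(image, size=50):
    """Resize a 2-D image to size x size using nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[(y * height) // size][(x * width) // size] for x in range(size)]
        for y in range(size)
    ]


def preprocess(frame, size=50):
    """Grayscale, resize to size x size, and scale pixel values to [0, 1]."""
    gray = to_grayscale(frame)
    small = resize_nearest(gray, size)
    return [[p / 255.0 for p in row] for row in small]


def predict_label(probabilities, labels):
    """Pick the label with the highest predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


# Hypothetical 80 x 100 all-white frame standing in for a camera image.
frame = [[(255, 255, 255)] * 100 for _ in range(80)]
features = preprocess(frame)
print(len(features), len(features[0]))

# Hypothetical model output for the five number gestures.
label, confidence = predict_label([0.05, 0.1, 0.7, 0.1, 0.05],
                                  ["1", "2", "3", "4", "5"])
print(label, confidence)
```

In the described system, the selected label would then be passed as text to a text-to-speech package such as pyttsx3.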
Fig. 2 System architecture
3.1 Convolutional Neural Network (CNN) Convolutional neural networks (CNNs) are used for detection. CNNs are a special form of neural network that is highly effective for computer vision problems. They are inspired by the way images are perceived in the visual cortex of the brain. They use a filter (kernel) to scan the pixel values of the entire image and perform computations with suitable weights to enable feature detection [25, 26]. A CNN is made up of multiple layers, including convolution, max pooling, flatten, dense, and dropout layers. Combined, these layers form a very powerful tool for detecting features in images. The early layers detect low-level features, and later layers move gradually to higher-level features. AlexNet is a widely used deep learning architecture. Deep learning approaches use images, video, text, and sound to perform classification tasks. CNNs excel at recognizing patterns in images, allowing for the detection of hand gestures, faces, and other objects. A benefit of CNNs is that training the model does not require manual feature extraction. CNNs tolerate small shifts in the input; robustness to changes in scale and rotation is more limited and usually has to be learned from varied training data. In the proposed system, AlexNet is used for object detection. AlexNet has eight layers with learnable parameters: five convolutional layers, some of which are followed by max pooling, followed by three fully connected layers. ReLU activation is used in each of these layers except the output layer.
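To make the roles of the convolution, ReLU, and max pooling layers concrete, the sketch below applies a single hand-written kernel to a tiny image. It is a toy illustration under stated assumptions: the kernel values, the 5 × 5 image, and the function names are hypothetical, the kernel is fixed rather than learned, and the code is not AlexNet or the chapter's MATLAB network.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution as used in CNNs (cross-correlation, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu_map(feature_map):
    """Apply ReLU element-wise: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(feature_map[y + i][x + j]
                for i in range(size) for j in range(size))
            for x in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for y in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5 x 5 image: dark on the left, bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]

features = conv2d(image, kernel)     # 4 x 4 response map
activated = relu_map(features)       # keep positive responses only
pooled = max_pool(activated, 2)      # 2 x 2 summary of the strongest responses
print(pooled)
```

In a trained CNN the kernel weights are learned from data, and many such kernels run in parallel in every convolution layer.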
3.2 MATLAB MATLAB is a widely used programming environment for signal processing and analysis. It is well suited to creating and manipulating discrete-time signals. Individual expressions may be entered directly into the text
window of the MATLAB interpreter. Collections of commands can be stored in text files or scripts (with .m extensions) and then executed from the command line. Users can also write their own MATLAB functions. MATLAB’s matrix algebra procedures are optimized [27]. Loops typically take longer to run than equivalent straight-line (vectorized) code. MATLAB functions can be compiled into C executables to increase efficiency (though a compiler is required). Additionally, class structures can be used to organize code and to create applications with intricate graphical user interfaces. MATLAB includes built-in functions for importing and exporting audio files: the audioread and audiowrite functions read and write data from and to a variety of audio file formats. In most versions of MATLAB, the sound (unnormalized) and soundsc (normalized) functions send signals to the computer’s audio hardware.
3.3 Recognition of Numbers To find the bounding boxes of the various objects, we used Gaussian background subtraction, a technique that models each background pixel with a mixture of K Gaussian distributions (K varies from 3 to 5). The colors assumed to belong to the background are those that persist for longer and are therefore more static. We build a rectangular bounding box around the pixels that change. After all gesture and background images were collected, a convolutional neural network model was built to separate the gesture signs from the background. The feature maps illustrate how the CNN can capture the common hidden structures shared by the training gesture signs and so discriminate between a gesture and the background. The numbers associated with the hand gestures are depicted in Fig. 3. Training is done with the CNN. After training, an input image captured from a webcam is supplied and tested to recognize the gesture.
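The idea behind Gaussian background subtraction and the bounding box can be sketched as follows. Note the simplification: the text describes a mixture of K Gaussians per pixel, whereas this sketch keeps a single running Gaussian (mean and variance) per pixel to stay short. The class name, the learning rate, the threshold factor, and the synthetic frames are all assumptions for illustration.

```python
class RunningGaussianBackground:
    """Simplified background model: ONE Gaussian per pixel.

    This is a deliberate simplification of the mixture-of-K-Gaussians
    model; it only illustrates the principle.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5,
                 initial_var=225.0, min_var=4.0):
        self.alpha = alpha        # learning rate for mean and variance
        self.k = k                # threshold in standard deviations
        self.min_var = min_var    # floor so the variance never collapses
        self.mean = [[float(p) for p in row] for row in first_frame]
        self.var = [[initial_var for _ in row] for row in first_frame]

    def apply(self, frame):
        """Return a boolean foreground mask and update the background."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, p in enumerate(row):
                d = p - self.mean[y][x]
                foreground = d * d > (self.k ** 2) * self.var[y][x]
                if not foreground:
                    # Only background pixels update the model.
                    self.mean[y][x] += self.alpha * d
                    new_var = self.var[y][x] + self.alpha * (d * d - self.var[y][x])
                    self.var[y][x] = max(self.min_var, new_var)
                mask_row.append(foreground)
            mask.append(mask_row)
        return mask


def bounding_box(mask):
    """Smallest rectangle (x_min, y_min, x_max, y_max) around foreground."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, fg in enumerate(row) if fg]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical 6 x 6 grayscale frames: a static background, then a bright
# "hand" covering rows 2-3 and columns 1-4.
background = [[20] * 6 for _ in range(6)]
frame = [row[:] for row in background]
for y in (2, 3):
    for x in range(1, 5):
        frame[y][x] = 200

model = RunningGaussianBackground(background)
print(bounding_box(model.apply(frame)))
```

A mixture model would keep several (mean, variance, weight) triples per pixel so that backgrounds with more than one typical appearance, such as flickering lights, are still treated as static.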
3.4 Results and Discussions The dataset of sign-language numerals is carefully assembled in two separate sets for training and test data. The model was trained on these datasets using the Adam optimizer for 20 epochs, reporting accuracy, validation accuracy, loss, and validation loss for each epoch, as shown in Table 2. The results indicate a progressive rise in training accuracy. As demonstrated in Fig. 4, accurate classification requires a minimum of twenty epochs. The accuracy obtained at the final epoch is taken as the overall accuracy on the training dataset. Categorical cross-entropy is used as the loss function to determine the overall system performance. Between training and testing, the performance of the CNN algorithm [28, 29] is compared using a range of parameters, including execution time, the amount of time necessary
Fig. 3 The gesture symbols for numbers that will be in the training data

Table 2 Training on single CPU, initializing input data normalization

| Epoch | Iteration | Time elapsed (hh:mm:ss) | Mini batch accuracy (%) | Mini batch loss | Base learning rate |
|---|---|---|---|---|---|
| 1 | 1 | 00:00:07 | 28.13 | 2.2446 | 0.0010 |
| 10 | 50 | 00:04:42 | 100.00 | 0.0001 | 0.0010 |
| 20 | 100 | 00:09:23 | 100.00 | 2.1477e-06 | 0.0010 |
Fig. 4 Confusion matrix: alexnet
for the program to accomplish the task. Sensitivity measures the fraction of actual positives that are correctly classified. Specificity measures the fraction of actual negatives that are correctly classified, so a high specificity means few false positives. The graph displays the ROC curve for class 5. To obtain the best results, evaluations are carried out both visually and in real time. The CNN algorithm is advantageous in this task for a variety of reasons. To begin, a CNN can extract image features without human intervention. It is faster than an ANN at learning from images or videos, although a CNN executes more slowly than an ANN. The proposed system is implemented using MATLAB 2019. The system can also be used in real time by adding cameras, which helps address the complex background problem and improves the robustness of hand detection (Table 2).
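The evaluation quantities named in this section can be written out explicitly. The sketch below computes the categorical cross-entropy for a single sample and the per-class sensitivity and specificity from a confusion matrix. The confusion matrix values are invented for illustration and are not the results shown in Fig. 4; the function names are likewise assumptions.

```python
import math


def categorical_cross_entropy(true_index, probabilities, eps=1e-12):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(max(probabilities[true_index], eps))


def sensitivity_specificity(confusion, cls):
    """Per-class sensitivity and specificity (one-vs-rest).

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
print(sensitivity_specificity(confusion, 0))
print(categorical_cross_entropy(2, [0.1, 0.2, 0.7]))
```

Averaging the per-sample cross-entropy over a mini-batch gives a quantity of the kind reported in the mini batch loss column of a training log.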
3.5 Future Enhancement The hand gesture recognition system was designed and developed in accordance with current design and development methodologies and scopes. The system is very flexible, allowing for simple maintenance and modification in response to changing environments and requirements simply by adding more information. Further modifications to bring the evaluation tools up to date are possible. This section may be reorganized if required.
4 Conclusion The major goal of the system was to advance hand gesture recognition. A prototype of the system has been constructed and tested, with promising results. The system can recognize hand gestures and generate the corresponding audio. It combines image capture and image processing, using built-in MATLAB functions to enhance and identify the image. The project is built using MATLAB. This choice of language is based on the user requirements statement and an evaluation of the existing system, and it leaves room for future expansion.
References
1. Hegde B, Dayananda P, Hegde M, Chetan C (2019) Deep learning technique for detecting NSCLC. Int J Recent Technol Eng (IJRTE) 8(3):7841–7843
2. Sunitha KA, Anitha Saraswathi P, Aarthi M, Jayapriya K, Sunny L (2016) Deaf mute communication interpreter: a review. Int J Appl Eng Res 11:290–296
3. Anand MS, Kumar NM, Kumaresan A (2016) An efficient framework for Indian sign language recognition using wavelet transform. Circuits Syst 7:1874–1883
4. Ahuja MK, Singh A (2015) Hand gesture recognition using PCA. Int J Comput Sci Eng Technol (IJCSET) 5(7):267–27
5. More SP, Sattar A, Hand gesture recognition system for dumb people. Int J Sci Res (IJSR)
6. Kaur C, Gill N, An automated system for Indian sign language recognition. Int J Adv Res Comput Sci Software Eng
7. Pandey P, Jain V (2015) Hand gesture recognition for sign language recognition: a review. Int J Sci Eng Technol Res (IJSETR) 4(3)
8. Nagpal N, Mitra A, Agrawal P (2019) Design issue and proposed implementation of communication Aid for Deaf & Dumb People. Int J Recent Innov Trends Comput Commun 3(5):147–149
9. Gilorkar NK, Ingle MM (2015) Real time detection and recognition of Indian and American sign language using sift. Int J Electron Commun Eng Technol (IJECET) 5(5):11–18
10. Shinde V, Bacchav T, Pawar J, Sanap M (2014) Hand gesture recognition system using camera 03(01)
11. Gebrera ME (2016) Glove-based gesture recognition system. In: IEEE international conference on robotics and biomimetics 2016
12. Siam SM, Sakel JA (2016) Human computer interaction using marker based hand gesture recognition
13. Shneiderman B, Plaisant C, Cohen M, Jacobs S, Elmqvist N, Diakopoulos N (2016) Designing the user interface: strategies for effective human-computer interaction. Pearson
14. Lawrence DO, Ashleigh M (2019) Impact of Human-Computer Interaction (HCI) on users in higher educational system: Southampton University as a case study. Int J Manage Technol 6(3):1–12
15. Chu JU, Jung DH, Lee YJ (2008) Design and control of a multifunction myoelectric hand with new adaptive grasping and self-locking mechanisms. In: 2008 IEEE international conference on robotics and automation, pp 743–748, May 2008
16. Marques O (2011) Practical image and video processing using MATLAB. Wiley
17. McAndrew A (2004) An introduction to digital image processing with Matlab notes for SCM2511 image processing, p 264
18. Khan TM, Bailey DG, Khan MA, Kong Y (2017) Efficient hardware implementation for fingerprint image enhancement using anisotropic Gaussian filter. IEEE Trans Image Process 26(5):2116–2126
19. Nishad PM (2013) Various colour spaces and colour space conversion. J Global Res Comput Sci 4(1):44–48
20. Abhishek B, Krishi K, Meghana M, Daaniyaal M, Anupama HS (2019) Hand gesture recognition using machine learning algorithms. Int J Recent Technol Eng (IJRTE) 8(1) ISSN: 2277-3878
21. Ankita W, Parteek K (2020) Deep learning-based sign language recognition system for static signs. Neural Comput Appl
22. Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep convolutional neural networks. Comput Vis Pattern Recogn
23. Chuan CH, Regina E, Guardino C (2014) American sign language recognition using leap motion sensor. In: 13th International Conference on Machine Learning and Applications (ICMLA), pp 541–544
24. Devineau G, Moutarde F, Xi W, Yang J (2018) Deep learning for hand gesture recognition on skeletal data. In: 13th IEEE International conference on automatic face & gesture recognition, pp 106–113
25. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005 (29 p). https://doi.org/10.1142/S0219519421500056
26. Nair AT, Muthuvel K (2019) Blood vessel segmentation and diabetic retinopathy recognition: an intelligent approach. Comput Methods Biomech Biomed Eng: Imaging Visual. https://doi.org/10.1080/21681163.2019.1647459
27. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diagnosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030 (29 p). https://doi.org/10.1142/S0219467820500308
28. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy. In: Publication in lecture notes. Springer, Berlin
29. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy. In: Publication in the IOP: Journal of Physics Conference Series (JPCS)
Coronavirus Pandemic: A Review of Different Machine Learning Approaches Bhupinder Singh and Ritu Agarwal
Abstract Millions of individuals have been affected by coronavirus disease. The coronavirus epidemic poses a significant medical danger to the wider population. The COVID-19 outbreak and the subsequent control strategies have created a global crisis that has affected all aspects of human life. Detecting COVID-19 at an early stage has become a difficult task for researchers and scientists. Various machine learning (ML) and deep learning (DL) techniques exist to detect COVID-19. COVID-19 spreads in stages: initially, it was spread by people who had travelled from countries that were severely affected by the coronavirus, and after some time it entered the community transmission phase. The virus affects individuals differently, and no cure is known for the disease. It has an immediate effect on some individuals, takes a few days to weeks to show symptoms in others, and causes no symptoms at all in some people. The most common symptoms are dry cough, fever, lung infection, etc. This paper provides information about the tests available for the detection of COVID-19 and a detailed comparison of the deep learning (DL) and artificial intelligence (AI) based techniques used to detect COVID-19.
Keywords COVID-19 disease · CXR images · AI · Stages and symptoms
B. Singh · R. Agarwal (B) Delhi Technological University, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_18
1 Introduction COVID-19 is a viral infection that began spreading in December 2019. The epidemic affected every part of the globe in a very short period of time. The World Health Organization declared it a pandemic on March 11, 2020 [1]. The coronavirus is very contagious and can easily transfer from one person to another. This virus, also called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), affects the respiratory system and produces different reactions in different individuals [2]. The basic structure of the COVID-19 virus is shown in Fig. 1. The spread of COVID-19 is divided into four stages [4]: STAGE-1, STAGE-2, STAGE-3, and STAGE-4. STAGE-1 occurs when infected people travel from one country to another. In particular, people who have travelled overseas are found to be infected with the respiratory illness, and the illness is not yet spreading domestically. During STAGE-2, local transmission occurs, and the source, i.e., the sick individual who may have traveled to other countries that were already affected, is identified and can be traced. STAGE-3 is community transmission. At this stage, the virus spreads from one person to another or from one cluster to another, and the infected person is hard to trace. STAGE-4 is the most serious stage of a contagious disease propagating inside a country. During this stage, there are multiple clusters of illness in various sections of the population, and the disease has taken on pandemic proportions. Because of the virus’s quick and extensive spread, the World Health Organization classified coronavirus disease 2019 as a pandemic on March 11, 2020. However, it started as a Chinese outbreak, with the first case verified on February 26th in Wuhan, Hubei province [4]. The causative agent of COVID-19 was identified as a novel coronavirus, initially named 2019-nCoV [5]. The viral genome was subsequently sequenced, and the virus was named SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) by the International Committee on Taxonomy of Viruses because it is genetically closely related to the coronavirus that caused the SARS outbreak in 2003.
Fig. 1 Structure of COVID-19 [3]
Different tests are available for COVID-19 detection, such as the PCR (polymerase chain reaction) test [5], the COVID-19 antigen test [6], and the COVID-19 antibody test. COVID-19 infection results in a range of serious illnesses with prominent signs and symptoms [7]. Nausea, a blocked nose, tiredness, and breathing problems were among the flu-like symptoms reported by participants. The structure of the COVID-19 virus is difficult for researchers to understand. COVID-19 has several symptoms: some are common, some less common, and some rare. COVID-19 affects mental as well as physical health. The COVID-19 epidemic has caused many workers to lose their jobs, and people experience anxiety and stress. The various mental issues and symptoms associated with COVID-19 are listed in Table 1. Early detection is essential for identifying COVID-19 and can improve patient outcomes. In the early stages of COVID-19, image processing is a vital approach for examining and identifying the disease. However, manually analyzing a large number of medical images can be time-consuming and monotonous, and it is susceptible to human error and bias [10]. Deep learning (DL) and artificial intelligence (AI) make it easier to distinguish between infected and non-infected patients. AI and ML are helpful for COVID-19 detection in several ways. In clinical practice, AI approaches are used to diagnose disease and predict treatment outcomes. AI can provide essential information for resource allocation and decision-making by prioritizing the need for mechanical ventilation and respiratory support among ICU (intensive care unit) patients, based on supporting documents and clinical factors. AI can also be used to forecast recovery or mortality in COVID-19, as well as to provide regular reports, storage and predictive analytics, and treatment tracking.
AI is used to triage individuals into low, medium, and serious categories depending on their symptoms, predisposition, and clinical reports, so that appropriate actions can be taken to treat them as quickly and efficiently as possible [11]. In medical applications, deep learning has achieved significant improvements. Deep learning models can find patterns in very large, complicated datasets, and they have been recognized as a viable tool for analyzing COVID-19-infected individuals. A neural-network-based COVID-19 identification model is a deep learning model that uses 2-dimensional and 3-dimensional features extracted from a volumetric lung CT scan to distinguish between COVID-19 and community-acquired pneumonia. The use of DL in COVID-19 medical imaging reduces false-positive and false-negative errors in the detection and classification of the disease, providing a unique opportunity to give patients fast, low-cost, and reliable diagnosis [12].
Table 1 Various issues/problems due to COVID-19
S. No. | Mental issues [9] | Symptoms
1 | Common mental illness | Fear, Depression, Burnout, Anxiety
2 | Less common mental illness | Sadness, Sleep shortness, Energy shortness, Dizziness
3 | Rare issues | Self-harm addiction, Domestic abuse, Loneliness, Suicide, Social isolation
This paper is divided into five sections. Section 1 introduces COVID-19. In Sect. 2, several existing methods are reviewed. In Sect. 3, the various approaches for COVID-19 detection are discussed. The research challenges and limitations of COVID-19 detection techniques are discussed in Sect. 4. Finally, Sect. 5 presents the overall conclusion and future scope.
2 Literature Review
For the analysis and detection of coronavirus, Jain et al. [13] designed a novel approach based on DL. The DL-based models were trained and tested on chest X-ray (CXR) images of infected and non-infected persons. The chest X-ray images were cleaned, and data augmentation was applied to them. Three DL-based approaches, ResNeXt, Inception V3, and Xception, were compared on their COVID-19 detection accuracy. The dataset contained 6432 CXR images collected from Kaggle, of which 5467 were used for training and 965 for testing. The Xception model provided the highest accuracy among the models. The performance of the models was examined on three parameters: precision, recall, and F1-score. CNN approaches provide state-of-the-art results in the medical field. To obtain efficient results with a deep convolutional model, Kamal et al. [14] evaluated pre-trained models for the COVID-19 classification of CXR images. The Neural Architecture Search Network (NASNet), ResNet, MobileNet, DenseNet, VGG19, and InceptionV3 pre-trained models were examined. The comparison of the pre-trained models shows that three-class classification achieved the highest accuracy. Ibrahim et al. [15] proposed a system that can classify three different classes of COVID-19. The AlexNet pre-trained model was used for the classification of patients. The model was used to predict the class of disease as well as whether a patient is infected or not. The CXR images were collected from public datasets. The image database contained bacterial pneumonia, COVID-19, viral pneumonia, and healthy (normal) CXR images.
The classification outcomes of the proposed model were reported for two-way, three-way, and four-way classification. In the two-way classification of normal (non-infected) and viral pneumonia images, the proposed model achieved 94.43% accuracy. In the classification of normal and bacterial pneumonia images, the model achieved 91.43% accuracy. The model achieved 93.42% accuracy in the four-way classification of images. Annavarapu et al. [16] proposed a DL-based COVID-19 detection technique. In the proposed system, a pre-trained feature extractor was used for efficient results. The pre-trained model was ResNet-50, which enhanced the learning. The model was evaluated on the COVID CXR dataset, which contains 2905 chest images, including COVID-19 and pneumonia cases. The model performance was examined with AU-PR, AU-ROC, and the Jaccard index. The model achieved 95% accuracy, a 95% F1-score, and 97% specificity. Ozturk et al. [17] designed a technique for automatically detecting COVID-19 infection from raw chest X-ray images. The technique provides diagnostics for binary classification (COVID vs. no-findings) and multi-class (MC) classification (COVID, no-findings, and pneumonia). In the binary case, the approach had a classification accuracy of 98 percent, while in the multi-class case it had an accuracy of 87 percent. In this approach, the DarkNet architecture was used as the classifier, with 17 convolutional layers and different filtering in each layer. Based on a pre-trained deep learning model, Al-antari et al. [18] implemented a system that identifies COVID-19 patients. The pre-trained YOLO model was used within a computer-aided diagnosis (CAD) system. The proposed system was used for multi-class classification of respiratory diseases and differentiates between eight types of respiratory disease. The performance of the system was examined on two datasets: the ChestX-ray8 collection and a COVID-19 dataset.
Using two separate datasets of CXR images, the proposed system was evaluated using fivefold tests for the MC prediction problem. For training, 50,490 images were used, and the system achieved 96.31% detection accuracy and 97.40% classification accuracy. The CAD system works in real time and can predict at a rate of 108 frames per second (FPS), i.e., about 0.0093 s per image. Table 2 lists the existing COVID-19 detection methods. A graphical comparison of the results of the existing methods is shown in Fig. 2.
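The evaluation measures named in this section (precision, recall, F1-score, specificity, accuracy, and the Jaccard index) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in Python; the function name and the confusion-matrix counts are hypothetical and are not taken from any of the cited studies. Only the 5467/965 split from [13] and the 108 FPS figure from [18] come from the text above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)      # intersection over union for the positive class
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "jaccard": jaccard,
    }

# Hypothetical counts for illustration only (not results from any cited paper).
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)

# Split reported for the Kaggle CXR dataset in [13]: 5467 training + 965 test images.
train_images, test_images = 5467, 965
train_fraction = train_images / (train_images + test_images)

# Throughput reported for the CAD system in [18]: 108 frames per second.
seconds_per_frame = 1 / 108
```

Under these assumptions, the training share works out to roughly 85% of the 6432 images, and the per-image time at 108 FPS is consistent with the 0.0093 s quoted above.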
3 Various Approaches for Disease Detection of COVID-19
For the detection of COVID-19, there are two approaches: AI based and DL based. Because AI can analyze radiographic findings quickly using DL (deep learning) and ML (machine learning) systems, it helps reduce doctors' workload. AI platforms enable cost-effective and time-efficient operation by swiftly assessing a large number of images, leading to better patient care.
Table 2 COVID-19 detection existing methods
Author name, year | Proposed methods used | Problem/Gaps | Dataset | Performance metrics
Jain et al. [13] | ResNeXt, Inception V3, and Xception | Need of large dataset for validation of the model | Dataset of 6432 images of chest X-ray | Accuracy = 97.97%
Kamal et al. [14] | The deep convolutional model with a pre-trained model | Work only on specific datasets | Dataset of 760 images | Accuracy = 98.69%
Ibrahim et al. [15] | AlexNet pre-trained model with deep learning | For better performance, CNN models must collaborate with SVR (support vector regression) and SVM (support vector machine) | Chest X-ray images collected from public datasets | Accuracy = 99.62%
Annavarapu et al. [16] | Transfer learning with ResNet-50 pre-trained model | Hard to implement | Publicly available dataset of 2905 images | Accuracy = 95%
Ozturk et al. [17] | Deep learning-based Darknet model with pre-trained YOLO model | Less images for model validation | Collection of 1125 images | Accuracy = 98.08%
Al-antari et al. [18] | Pre-trained YOLO model with computer-aided diagnosis system | Need to collect more digital X-ray and CT images for validation purpose | ChestX-ray8 dataset and COVID-19 dataset | Accuracy = 97.40%
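The comparison in Table 2 can be reproduced as a simple ranking. The sketch below uses only the accuracy values listed in the table; the variable names are illustrative. The studies were evaluated on different datasets, so the ranking is indicative only.

```python
# Accuracies (%) as listed in Table 2, keyed by the cited study.
reported_accuracy = {
    "Jain et al. [13]": 97.97,
    "Kamal et al. [14]": 98.69,
    "Ibrahim et al. [15]": 99.62,
    "Annavarapu et al. [16]": 95.0,
    "Ozturk et al. [17]": 98.08,
    "Al-antari et al. [18]": 97.40,
}

# Sort from highest to lowest reported accuracy.
ranking = sorted(reported_accuracy.items(), key=lambda item: item[1], reverse=True)
best_method, best_accuracy = ranking[0]

for name, acc in ranking:
    print(f"{name:<24} {acc:6.2f}%")
```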
Fig. 2 Comparative analysis of existing approaches
3.1 AI (Artificial Intelligence) Based Approaches
With recent advances in computerized information collection, predictive analytics, and computational technologies, artificial intelligence is expanding into fields that were traditionally thought to be the domain of human intelligence. Medical practice is being influenced by machine learning. It is still difficult to develop prediction systems that can reliably predict and identify such viruses. AI methods, also known as classification methods, take in information, analyze it statistically, and make predictions based on the statistical features of the data. These techniques have a variety of uses, including image processing, facial recognition, estimation, recommender systems, and so on. ML has a great deal of potential in diagnosis and treatment as well as in the advancement of machine systems. Healthcare professionals have encouraged the use of machine learning techniques as a resource to support diagnosis and treatment [19]. Figure 3 depicts an artificial intelligence-based approach to COVID-19 detection.
Fig. 3 AI based approach for COVID detection [19] (flowchart with the stages: Symptomatic COVID-19 patients dataset; AI diagnosis; Symptomatic analysis; Positive samples taken and start therapy; AI based treatment; Recovery; COVID-19 retest; Positive or negative; Cured)
To control the impact of the illness, AI techniques are used in a variety of domains. Applications include available treatments, COVID-19 image classification, pharmacological investigations, and epidemiology.
3.1.1 Types of AI Approaches
Machine learning: Machine learning methods are employed to analyze medical conditions in order to diagnose COVID-19 individuals. Individuals are encouraged to report their own conditions. An artificial intelligence system is then used to detect COVID-19 from continuously assessed medical information. In a number of studies, a variety of ML algorithms were used to identify coronavirus infection. In [20], multiple machine learning algorithms are employed to analyze patient information and evaluate COVID-19 cases: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT).
SVM model for the detection of COVID-19: SVM is a simple way to classify binary classes of data. It is a supervised machine learning algorithm that supports classification as well as regression on CXR images of COVID-19.
LR based COVID-19 detection model: Logistic regression is used to identify the category of disease in a COVID-19 identification system. It provides probabilistic measures for the classification of CXR images.
Artificial telemedicine: Artificial telemedicine services are increasingly valuable during an epidemic because they allow people to obtain the care they require from the comfort of their own homes, thereby limiting virus transmission. Artificial telehealth algorithms have been developed using AI methods in several studies. One study describes a novel AI-based technique for determining the risk of COVID-19 transmission in broadband linkages. AI types with their advantages and disadvantages are listed in Table 3.
Table 3 AI types with advantages, disadvantages, layers, and problems
Type | Model type | Advantage | Disadvantage
Generic machine learning | XGBoost model [21] | Optimized methodology for early stage detection | False detection of positive cases
Ensemble machine learning | SVM, Decision tree, KNN, Naive Bayes [22] | Multi-class identification of diseases | High computational time
Artificial telemedicine | NLP based [23] | Remote and less time consuming | Hard to implement
Machine learning | SVM [24] | Multi-class classification | False positive results
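The probabilistic output of logistic regression mentioned in Sect. 3.1.1 can be illustrated with a minimal sketch. The code below is not the implementation used in any cited study: the two features per sample are hypothetical stand-ins for values extracted from a CXR image, the data are synthetic, and the learning rate and epoch count are arbitrary choices.

```python
import math
import random

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability that sample x belongs to the positive (infected) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic, hypothetical data: class 0 clustered near (-1, -1), class 1 near (1, 1).
random.seed(0)
X, y = [], []
for _ in range(50):
    X.append([random.gauss(-1.0, 0.5), random.gauss(-1.0, 0.5)])
    y.append(0)
    X.append([random.gauss(1.0, 0.5), random.gauss(1.0, 0.5)])
    y.append(1)

w, b = train_logistic_regression(X, y)
accuracy = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(X)
```

Thresholding `predict_proba` at 0.5 gives a class label, while the raw probability can be used to rank cases by estimated risk.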
3.2 Deep Learning-Based Approaches
When applied to the interpretation of multimodal images, DL-based models have the potential to provide an effective and precise strategy for diagnosing and classifying COVID-19, with significant increases in image resolution. Deep learning methods have advanced significantly in the past two decades, providing enormous prospects for application in a variety of sectors and services. Figures 4 and 5 depict the core architecture-based approach for detecting COVID-19. Deep learning-based methods with their key points, layers, advantages, and limitations are shown in Table 4. The main types of deep learning techniques are:
• Convolutional Neural Network (CNN): Convolutional Neural Networks have grown in popularity as a result of their improved image classification performance. The layers of the network, in conjunction with their filters, aid in the extraction of temporal and spatial features from images. In the layers, a weight-sharing scheme is implemented, which considerably reduces computing time [27].
• Recurrent Neural Network (RNN): Because of its internal memory, the RNN (Recurrent Neural Network) was one of the first algorithms able to retain information about earlier inputs, making it well suited to problems involving sequential information, such as voice and communication [28].
The collection of COVID-19 samples and the dataset descriptions are given in Table 5.
Fig. 4 Deep learning-based approach [26]
Fig. 5 Different types of approaches based on deep learning
Table 4 Approaches based on deep learning with keypoints, layers, advantages and limitations
Method | Keypoints | Layers | Advantages | Limitations
CNN based [13] | Work only on limited images | Multilayers | High accuracy | Data scarcity
DCNN [14] | Three class differentiation | 50 layers | Low computational cost | Implementation is hard
ResNet-50 pre-trained model [16] | Provided high accurate results | 50 layers | Less false results | High computational cost
Deep learning-based YOLO model [17] | More suitable for binary classification | Multilayer | Highly accurate results | Small dataset
Table 5 Datasets of COVID-19 with description [28]
S. No. | Datasets | Description
1 | ImmPort | ImmPort is supported by the National Institutes of Health, the National Institute of Allergy and Infectious Diseases, and the Defense Advanced Research Projects Agency. NIH-funded programs, other research institutions, and scientific organizations have contributed information to ImmPort, ensuring that these findings will become the core of research consideration
2 | N3C (National COVID Cohort Collaborative) | The National Institutes of Health (NIH) has developed a centralized, secure enclave to collect and manage large quantities of patient records from people across the nation who have been diagnosed with coronavirus illness. There are 35 collaborative hubs around the United States
3 | OpenSAFELY | The National Health Service's (NHS) OpenSAFELY platform is a secure and trustworthy platform designed for the analysis of electronic medical records. It was designed to provide rapid results amid the ongoing COVID-19 crisis
4 | Vivli | The Vivli platform includes clinical trials on COVID-19. Johns Hopkins is a member of the organization
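The weight-sharing property attributed to CNN layers in Sect. 3.2 can be sketched as follows. This is an illustrative example only: the 5x5 input, the edge-detecting kernel, and the function name are hypothetical, and the operation is written as cross-correlation, which is what deep learning frameworks usually call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
    return out

# Hypothetical 5x5 "image" with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# 3x3 vertical-edge kernel, chosen only for illustration.
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)

# Parameter count: the convolution reuses 9 weights for all 9 output positions,
# whereas a fully connected mapping from 25 inputs to 9 outputs needs 25 * 9 weights.
conv_params = len(kernel) * len(kernel[0])
dense_params = (5 * 5) * (3 * 3)
```

The feature map responds strongly where the window covers the edge and is zero over the uniform region, while the number of trainable weights stays at 9 regardless of the image size.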
4 Research Challenges and Limitations
Regulations, limited resources, the unavailability of large training datasets, massive noisy data and speculation, limited knowledge at the intersection of medicine and computer science, privacy and confidentiality issues, inconsistent availability of textual information, and many other factors pose challenges to artificial intelligence implementations, such as machine learning and deep learning, in COVID-19 research. The main research challenges and limitations of COVID-19 techniques are as follows:
• Large training datasets are scarce or unavailable.
• Several deep learning approaches rely on large training samples, such as diagnostic imaging and various environmental variables. However, because of the rapid spread of COVID-19, too few samples are available for training AI.
• In practice, annotating training datasets takes a long time and may require the assistance of qualified healthcare workers.
• There is a gap at the intersection of medicine and computer science.
• Data are often unstructured or inconsistently structured, for example text, image, and numerical data [29].
5 Conclusion and Future Scope
Coronavirus illness is a global pandemic. In the fight against COVID-19, intelligent image processing has been crucial. CT scans, X-rays, and PCR data are used by professionals to assess the condition. Besides COVID-19, PCR analysis can identify a number of other respiratory diseases, such as bacterial pneumonia. This paper provides a brief introduction to COVID-19. The multiple stages of COVID-19 are discussed, together with the available tests, such as the PCR, antigen, and antibody tests. The signs and symptoms of COVID-19 and the various concerns and problems it causes are also discussed. COVID-19 detection can be performed using a variety of techniques, including ML and DL techniques. A comparative analysis of existing methods is presented with the help of a graphical representation. The method of Ibrahim et al. [15] provided the highest accuracy for the detection of COVID-19. The accuracy of the techniques depends on the training samples. In future work, more advanced techniques will be studied and compared for a better analysis.
References
1. Sungheetha A (2021) COVID-19 risk minimization decision making strategy using data-driven model. J Inf Technol 3(01):57–66
2. Pereira RM, Bertolini D, Teixeira LO, Silla Jr CN, Costa YM (2020) COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios. Comput Methods Programs Biomed 194:105532
3. Haque SM, Ashwaq O, Sarief A, Azad John Mohamed AK (2020) A comprehensive review about SARS-CoV-2. Future Virol 15(9):625–648
4. COVID-19: The 4 Stages Of Disease Transmission Explained (2021). Retrieved 24 June 2021, from https://www.netmeds.com/health-library/post/covid-19-the-4-stages-of-diseasetransmission-explained
5. Cai Q, Du SY, Gao S, Huang GL, Zhang Z, Li S, Wang X, Li PL, Lv P, Hou G, Zhang LN (2020) A model based on CT radiomic features for predicting RT-PCR becoming negative in coronavirus disease 2019 (COVID-19) patients. BMC Med Imaging 20(1):1–10
6. Mohanty A, Kabi A, Kumar S, Hada V (2020) Role of rapid antigen test in the diagnosis of COVID-19 in India. J Adv Med Med Res 77–80
7. Coronavirus disease (COVID-19) - World Health Organization (2021). Retrieved 9 June 2021, from https://www.who.int/emergencies/diseases/novel-coronavirus-2019?gclid=Cj0KCQjwzYGGBhCTARIsAHdMTQwyiiQqt3qEn89y0AL5wCEdGwk1bBViX2aoqA__F7MaGeQEiuahTI4aAh4uEALw_wcB
8. Larsen JR, Martin MR, Martin JD, Kuhn P, Hicks JB (2020) Modeling the onset of symptoms of COVID-19. Front Public Health 8:473
9. Shastri S, Singh K, Kumar S, Kour P, Mansotra V (2021) Deep-LSTM ensemble framework to forecast Covid-19: an insight to the global pandemic. Int J Inform Technol, 1–11
10. Huang S, Yang J, Fong S, Zhao Q (2021) Artificial intelligence in the diagnosis of COVID-19: challenges and perspectives. Int J Biol Sci 17(6):1581
11. Arora N, Banerjee AK, Narasu ML (2020) The role of artificial intelligence in tackling COVID-19
12. Nayak J, Naik B, Dinesh P, Vakula K, Dash PB, Pelusi D (2021) Significance of deep learning for Covid-19: state-of-the-art review. Res Biomed Eng, 1–24
13. Jain R, Gupta M, Taneja S, Hemanth DJ (2020) Deep learning-based detection and analysis of COVID-19 on chest X-ray images. Appl Intell 51(3):1690–1700
14. Kamal KC, Yin Z, Wu M, Wu Z (2021) Evaluation of deep learning-based approaches for COVID-19 classification based on chest X-ray images. Sign Image Video Process, 1–8
15. Ibrahim AU, Ozsoz M, Serte S, Al-Turjman F, Yakoi PS (2020) Pneumonia classification using deep learning from chest X-ray images during COVID-19. Cogn Comput, 1–13
16. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for COVID-19 chest X-ray classification. Appl Intell, 1–17
17. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2021) Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol Med 121:103792
18. Al-antari MA, Hua CH, Bang J, Lee S (2020) Fast deep learning computer-aided diagnosis of COVID-19 based on digital chest x-ray images. Appl Intell, 1–18
19. Eljamassi DF, Maghari AY (2020) COVID-19 detection from chest X-ray scans using machine learning. In: 2020 International Conference on Promising Electronic Technologies (ICPET), pp 1–4
20. Tayarani-N MH (2020) Applications of artificial intelligence in battling against Covid-19: a literature review. Chaos, Solitons Fractals 110338
21. Feng C, Huang Z, Wang L, Chen X, Zhai Y, Chen H, Wang Y, Su X, Huang S, Zhu W, Sun W (2020) A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected COVID-19 pneumonia in fever clinics. MedRxiv
22. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for COVID-19 chest X-ray classification. Appl Intell 51(5):3104–3120
23. Bharti U, Bajaj D, Batra H, Lalit S, Lalit S, Gangwani A (2020) Medbot: conversational artificial intelligence powered Chatbot for delivering tele-health after Covid-19. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp 870–875
24. de Moraes Batista AF, Miraglia JL, Donato THR, Chiavegatto Filho ADP (2020) COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. medRxiv
25. Mukhtar AH, Hamdan A (2021) Artificial intelligence and coronavirus COVID-19: applications, impact and future implications. The importance of new technologies and entrepreneurship in business development: in the context of economic diversity in developing countries, vol 194, p 830
26. Burugupalli M (2020) Image classification using transfer learning and convolution neural networks
27. Ganatra N, Patel A (2018) A Comprehensive study of deep learning architectures, applications and tools. Int J Comput Sci Eng 6:701–705
28. Chen JIZ (2021) Design of accurate classification of COVID-19 disease in X-ray images using deep learning approach. J ISMAC 3(02):132–148
29. Welch Medical Library Guides: Finding Datasets for Secondary Analysis: COVID-19 Datasets (2021). Retrieved 30 July 2021, from https://browse.welch.jhmi.edu/datasets/Covid19
30. Aishwarya T, Kumar VR (2021) Machine learning and deep learning approaches to analyze and detect COVID-19: a review. SN Comput Sci 2(3):1–9
High Spectrum and Efficiency Improved Structured Compressive Sensing-Based Channel Estimation Scheme for Massive MIMO Systems
V. Baranidharan, C. Raju, S. Naveen Kumar, S. N. Keerthivasan, and S. Isaac Samson
Abstract Owing to its high spectral and energy efficiency, massive MIMO is among the most promising techniques for future 5G communications. Accurate channel estimation is essential to realize its potential performance gain. In conventional channel estimation schemes, the pilot overhead grows with the enormous number of antennas used at the base station (BS) and becomes too expensive; for frequency division duplex (FDD) massive MIMO, it is unaffordable. We introduce a structured compressive sensing (SCS)-based temporal joint channel estimation scheme that reduces the required pilot overhead by leveraging the spatiotemporal common sparsity of delay-domain MIMO channels. Accurate channel estimation is required to fully exploit the large array gain, which relies on channel state information at the transmitter side. However, FDD downlink channel estimation requires more training and computation than TDD mode, because the uplink and downlink channels are not straightforwardly reciprocal and the number of antennas at the base station is massive. At the base station, we first introduce non-orthogonal pilots under the framework of compressive sensing theory to reduce the required pilot overhead. Then, a structured compressive sensing (SCS) algorithm is introduced to jointly estimate the channels associated with multiple OFDM symbols from a limited number of pilots, exploiting the spatiotemporal common sparsity of massive MIMO channels to improve the precision of channel estimation. Furthermore, we propose a space–time adaptive pilot scheme to decrease the pilot overhead by making use of the spatiotemporal channel correlation. Additionally, we discuss the proposed channel estimation scheme in the multi-cell scenario. The spatial correlation in wireless channels is exploited for outdoor communication scenarios, where wireless channels are mostly sparse. Meanwhile, the scale of the transmit antenna array is negligible compared with the long signal transmission distance. By utilizing the large number of spatial degrees of freedom, massive MIMO can raise the system capacity and energy efficiency by orders of magnitude. Simulation results show that the proposed scheme outperforms the existing schemes.
V. Baranidharan (B) · C. Raju · S. Naveen Kumar · S. N. Keerthivasan · S. Isaac Samson
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_19
Keywords Frequency division duplexing · Massive MIMO · Structured compressive sensing · Pilot overhead
1 Introduction
Multiple input multiple output (MIMO) systems employ a small number of antennas at the base station (BS), whereas massive MIMO systems employ a large number of antennas at the BS (more than 100). With massive MIMO, the system capacity, energy efficiency, and spectral efficiency can be increased by orders of magnitude. For future 5G communications, massive MIMO has been identified as a key technology for achieving high energy and spectral efficiency. In massive MIMO, information about the properties of the channel between the transmitter and receiver, known as channel state information (CSI), is important for signal detection, resource allocation, and beamforming. Because of the massive number of transmit antennas, each user must estimate the channels from hundreds of transmit antennas, which increases the pilot overhead. Since the number of transmit antennas at the base station is large and leads to high pilot overhead, channel estimation in frequency division duplex (FDD) massive MIMO systems is difficult. To date, most massive MIMO research has focused on time division duplex (TDD), and the FDD case has been avoided. For FDD-based massive MIMO systems, schemes that exploit temporal correlation and delay-domain sparsity have been proposed to reduce the pilot overhead; however, the number of antennas at the transmitter is high, which makes cancelling the interference among the training sequences of different antennas complicated. Compressive sensing techniques that exploit the spatial correlation have also been investigated.
In this work, the spatial–temporal common sparsity of delay-domain channel vectors is exploited through structured compressive sensing theory to reduce the pilot overhead in FDD massive MIMO systems. At the BS, non-orthogonal pilots are designed under the framework of compressive sensing theory, which is completely different from orthogonal pilots designed under the framework of the Nyquist sampling theorem. Both orthogonal and non-orthogonal pilot schemes aim to reduce the pilot overhead of channel estimation. An adaptive structured subspace pursuit (ASSP) algorithm is used for MIMO channels with spatial–temporal common sparsity in the delay domain, to estimate the channel accurately with a small number of pilots by exploiting the temporal correlation, reducing the overhead while improving the estimation accuracy. The required number of pilot signals depends on the antenna array at the BS and the mobility of the user, and the scheme is considered from single-cell to multi-cell scenarios.
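The idea of recovering a sparse delay-domain channel from fewer pilot observations than unknown taps can be sketched with orthogonal matching pursuit (OMP). OMP is used here as a simpler stand-in and is not the ASSP algorithm of this paper. The sketch is real-valued and noise-free, and the sensing matrix (identity plus scaled Hadamard, chosen for its low mutual coherence), the tap positions, and all function names are assumptions made for illustration.

```python
def hadamard(n):
    """Sylvester Hadamard matrix of order n (n must be a power of two)."""
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]

def solve(G, b):
    """Solve the small system G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (aug[i][n] - sum(aug[i][j] * sol[j] for j in range(i + 1, n))) / aug[i][i]
    return sol

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: recover a sparse x from y = A x."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    support, coef, residual = [], [], y[:]
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        corr = [abs(sum(c[i] * residual[i] for i in range(m))) for c in cols]
        support.append(max(range(n), key=lambda j: corr[j]))
        # Least squares on the selected columns via the normal equations.
        S = [cols[j] for j in support]
        G = [[sum(u[i] * v[i] for i in range(m)) for v in S] for u in S]
        b = [sum(u[i] * y[i] for i in range(m)) for u in S]
        coef = solve(G, b)
        residual = [y[i] - sum(cf * s[i] for cf, s in zip(coef, S)) for i in range(m)]
    x = [0.0] * n
    for j, cf in zip(support, coef):
        x[j] = cf
    return x, sorted(support)

# Hypothetical setup: 32 delay taps, 2 of them non-zero, observed through 16 pilots.
M_PILOTS, N_TAPS = 16, 32
H = hadamard(M_PILOTS)
A = [[1.0 if i == j else 0.0 for j in range(M_PILOTS)]
     + [H[i][j] / 4.0 for j in range(M_PILOTS)] for i in range(M_PILOTS)]
h_true = [0.0] * N_TAPS
h_true[3], h_true[20] = 1.0, -0.5
y = [sum(A[i][j] * h_true[j] for j in range(N_TAPS)) for i in range(M_PILOTS)]
h_hat, support = omp(A, y, sparsity=2)
```

In this toy case 16 observations suffice to locate and estimate the 2 active taps among 32, which is the kind of pilot saving that sparsity-based schemes aim for; practical channels are complex-valued and noisy.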
2 Related Work
Massive MIMO aims to boost the spectral efficiency and multiplexing gain by employing more than a hundred antennas at the base station [1]. Accurate channel state information is required at the transmitter side, and acquiring it consumes a large amount of downlink channel estimation overhead, especially in FDD systems. To overcome this problem, distributed compressive sensing (DCS) is introduced. The slow variation of the channel statistics is fully exploited in the multi-channel frequency domain, and the sparsity is shared among multiple sub-channels. Hybrid training is proposed in which the channel matrices of previous frames are used to represent the downlink channel with three components. These three components are obtained through the uplink so that the real-time channel state information can be tracked quickly. This technique is used to optimize the convergence of channel estimation with low complexity.
In massive MIMO systems, accurate channel estimation is important to ensure good performance [2]. Compared with time division duplex (TDD), FDD offers a higher data rate and wider coverage. TDD does not require training and computation as heavy as FDD, because in FDD the uplink and downlink channels are not straightforwardly reciprocal and the number of antennas is large. Real-time channel variation makes downlink channel estimation in FDD massive MIMO even more difficult.
In FDD systems, cascaded pre-coding has been used to reduce the pilot overhead of downlink channel estimation and of uplink feedback [3]. With cascaded pre-coding, the low-dimensional channel can be estimated accurately and fed back. A parametric model is used in massive MIMO for downlink channel estimation. The path delays are first estimated through the forward link, and then quantized and sent to the base station.
Both downlink and uplink have the identical path delay where parametric models lead to data fitting errors. The high spectrum and energy efficiency in the massive MIMO is the most promising and developing technology for the wireless communications [4]. In FDD, downlink channel estimation becomes unaffordable due to a greater number of base station antennas. Perfect channel recovery in the minimum pilot symbols with Gaussian mixture distribution is followed by each channel vector of the general channel model. Weighted sum of Shannon mutual informal design pilot symbols between the user and corresponds channel of grass mannianmoni FDD. NMSE level is not that much good for multi-user scenarios. Least square (LS) method and distributed compressive sensing (DCS) method are combined in the DCS techniques for better estimation. Among the different subcarriers, channel vectors in the angular domain are estimated in the form of two parts and the overall problem is that computation complexity is high and channel estimation is not accurate and reduces pilot overhead. In order to obtain the channel state information accurately at the transition side, we have to exploit and improve the multiplexing and array gain of the multiple input
and multiple output (MIMO) systems [5]. Due to the overwhelming pilot and feedback overhead, conventional channel estimation is not practical in FDD. Compressive channel estimation is introduced to reduce the pilot overhead in FDD massive MIMO systems. The beam space is exploited in beam-blocked massive MIMO, and the pilot overhead in the downlink training can be reduced through a beam-blocked compressive channel estimation scheme. To acquire reliable CSIT under limited pilot overhead, an optimal block orthogonal matching pursuit algorithm is proposed, and the effective channel matrix is represented by the amplitude and phase of the received signal to limit the feedback load. The major problems of uplink and downlink channel estimation in FDD-based massive MIMO systems are discussed in [6]. A codebook- and dictionary-based channel model is presented to reduce the pilots in the uplink and downlink, and a robust channel representation is obtained by observing the reciprocity of the AOA/AOD between the uplink and downlink data transmissions. The downlink training overhead, which is a bottleneck of FDD massive MIMO systems, can be reduced by utilizing the information from simple uplink training. A parametric channel estimation method for massive MIMO is proposed in [7]. The spatial correlation of the wireless channel is exploited: the wireless channel is sparse, and the scale of the antenna array is negligible compared with the long signal transmission distance, so the channel impulse responses (CIRs) of the transmit antennas usually share similar path delays. The method exploits this spatial common sparsity of massive MIMO to reduce the pilot overhead significantly. 
The channel estimation accuracy improves gradually as the number of antennas increases, so the same accuracy can be acquired with fewer pilots; the limitation is that the method does not support low-dimensional CSI.
2.1 Spatiotemporal Common Sparsity and Delay Domain Extensive experimental studies show that broadband channels in 5G wireless communication exhibit sparsity in the delay domain, while the channel delay spread, measured from the earliest path to the latest arrival, can be large. For the m-th transmit antenna at the base station, the channel impulse response (CIR) is expressed as $\mathbf{h}_{m,r} = [h_{m,r}[1], h_{m,r}[2], \ldots, h_{m,r}[L]]^{T}$, $1 \le m \le M$, where r is the index of the OFDM symbol in the delay domain and L is the equivalent channel length. The support set $D_{m,r} = \mathrm{supp}\{\mathbf{h}_{m,r}\} = \{l : |h_{m,r}[l]| > p_{th},\ 1 \le l \le L\}$ is a subset of the indices of $\mathbf{h}_{m,r}$, where $p_{th}$ is the noise level of the wireless channel. The sparsity level can be expressed as $P_{m,r} = |D_{m,r}|_c$, and $P_{m,r} \ll L$ holds in typical massive MIMO systems for two reasons. One transmitting antenna is associated with pilot estimation, then Np is the number of pilot overhead which can be at least 64 s; moreover, the delay is 3–5 µs and the bandwidth of the typical system is 10 MHz if we refer to LTE. Where they are advanced system parameter, which c p R > c p W F F F can be estimated as D with s = P and cP, c P, and c P are constants. Structured restricted isometric property (SRIP) constant can be shown as cP,c P, and c P and
$\delta_P$, $\delta_{2P}$, and $\delta_{3P}$. For the investigation of the convergence case s = P, we consider $D = D^{\rangle s} + (D - D^{\rangle s})$, where $D^{\rangle s}$ denotes the matrix that keeps the s largest sub-matrices of $\{D_l\}_{l=1}^{L}$ according to their F-norms and sets the other sub-matrices to 0. The received signal can then be expressed as $Y = \Psi D^{\rangle s} + \Psi(D - D^{\rangle s}) + W = \Psi D^{\rangle s} + W'$, with $W' = \Psi(D - D^{\rangle s}) + W$ for the case of s = P, where P is the sparsity level of the signal D and s is the sparsity level of the estimated signal Ds. The acquired and partial correct in the set of estimation s-sparse matrix is appropriate SRIP theorem. Then s ∩ T = where s supports the s-sparse matrix and s is the true support of D and is denoted as null set. Hence,
s ∩ T = which will reduce the number of the iterations in convergence of sparsity level s + 1. The (s + 1) which is the first iteration of the sparsity level s and the prior information. The estimate support of the sparsity level which is pointed out of the proof theorem.
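The truncation described above, which keeps the sub-matrices with the largest F-norms and sets the others to zero, can be illustrated with a minimal sketch. The function names and the toy values are hypothetical and are not taken from the paper; each sub-matrix is assumed to be stored as a list of rows.

```python
import math


def frobenius_norm(block):
    """Frobenius norm of one sub-matrix stored as a list of rows."""
    return math.sqrt(sum(abs(x) ** 2 for row in block for x in row))


def keep_largest_blocks(blocks, s):
    """Keep the s sub-matrices with the largest F-norms and zero the rest."""
    order = sorted(range(len(blocks)),
                   key=lambda l: frobenius_norm(blocks[l]),
                   reverse=True)
    kept = set(order[:s])
    return [
        [list(row) for row in blk] if l in kept
        else [[0] * len(row) for row in blk]
        for l, blk in enumerate(blocks)
    ]


# Toy example (hypothetical values): L = 4 sub-matrices of size 1 x 2.
D = [[[0.9, 0.1]], [[0.05, 0.02]], [[0.4, 0.3]], [[0.01, 0.0]]]
D_s = keep_largest_blocks(D, s=2)
```

With s = 2 the first and third sub-matrices survive, since they carry most of the energy, which mirrors how dominant channel taps are retained while weak ones are treated as noise.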
2.5 Computation Complexity of ASSP Algorithm In each iteration, the computational complexity of the ASSP algorithm comes from several operations, where MG denotes the number of transmit antennas in each group of the adaptive space–time pilot scheme. These operations are the correlation operation, the support merger, the π3(·) operation, the norm operation and the update by Moore–Penrose matrix inversion. The ratios of the complexity of the first four operations to that of the Moore–Penrose matrix inversion are 2.3 × 10−2, 1.7 × 10−6, 5.7 × 10−5, and 2.3 × 10−2, respectively. The Moore–Penrose matrix inversion is therefore the main contributor to the computational complexity of the ASSP algorithm, with the inversion complexity
$\mathcal{O}\left(2N_p (M_G s)^2 + (M_G s)^3\right)$
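To get a feeling for how this term grows with the sparsity level s, the expression can be evaluated directly. The sketch below only counts operations according to the formula above; the pilot number Np = 350 is a hypothetical value chosen for illustration, and MG = 32 is the group size used in the simulations.

```python
def pinv_update_cost(n_p, m_g, s):
    """Operation count 2*Np*(MG*s)^2 + (MG*s)^3 for one inversion update."""
    k = m_g * s  # number of columns involved at sparsity level s
    return 2 * n_p * k ** 2 + k ** 3


# Hypothetical sizes: Np = 350 pilots, MG = 32 antennas per group.
costs = {s: pinv_update_cost(350, 32, s) for s in (1, 3, 6)}
```

The cubic term dominates once MG·s becomes comparable to 2Np, which is why the cost rises quickly as the estimated sparsity level increases.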
3 Simulation Results and Discussion In this section, we give a detailed description of the simulation study performed to assess the proposed channel estimation scheme for FDD massive MIMO systems. The simulation parameters were set as follows:
Table 1 Initial simulation parameters
Simulation parameters                      Values
OFDM Symbol (R)                            1
Fluctuation of the ASSP algorithm (SNR)    10 dB
(ITUVA) channel model (p)                  6
System carrier (fc)                        2 GHz
Bandwidth of a system (fs)                 10 MHz
Guard Interval (Ng)                        64
The pilot overhead ratio (ηp)              5–6
DFT size N = 4096, system carrier fc = 2 GHz, length of the guard interval Ng = 64, and the system bandwidth fs = 10 MHz, which can combat a maximum delay spread of 6.4 µs. We assume a 4 × 16 planar antenna array (M = 64), and MG = 32 is considered to guarantee the spatial common sparsity of channels in each antenna group. For SNR = 10 to 30 dB, the pth value is set to 0.1, 0.08, 0.06, 0.05, and 0.04, respectively (Table 1). From the simulations, it is clear that the ASSP algorithm outperforms the oracle ASSP algorithm for ηp > 19.04%, and its performance is even better than the performance bound obtained by the oracle LS algorithm with Np_avg > 2P at SNR = 10 dB. This is because the ASSP algorithm adaptively acquires the effective channel sparsity level, denoted by Peff, instead of using P to obtain better channel estimation performance. Considering ηp = 17.09% at SNR = 10 dB as an example, we find that Peff = 5 with high probability for the proposed ASSP algorithm. Therefore, the average pilot overhead obtained for each transmit antenna, Np_avg = Np/MG = 10.9, is still larger than 2Peff = 10. From the analysis, we can conclude that, when Np is insufficient to estimate channels with P, the proposed ASSP algorithm can be utilized to estimate sparse channels with Peff < P, where the path gains accounting for the majority of the channel energy are determined, while those with small energy are discarded as noise. Also, the fluctuation of the MSE performance of the ASSP algorithm at SNR = 10 dB occurs because Peff increases from 5 to 6 when ηp increases, which causes some strong noise to be taken as channel paths and thus degrades the MSE performance (Table 2). 
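The two numerical claims in this paragraph can be checked with a few lines of arithmetic. The guard interval calculation follows directly from Ng and fs. The relation between ηp and Np is not stated in this section, so the sketch assumes ηp = (M/MG)·Np/N, i.e. that the M/MG antenna groups use separate pilot resources; this assumption is ours and is flagged in the code.

```python
N = 4096         # DFT size
FS_HZ = 10e6     # system bandwidth fs
NG = 64          # guard interval length in samples
M, MG = 64, 32   # total antennas and antennas per group

# Maximum delay spread covered by the guard interval, in microseconds.
max_delay_us = NG / FS_HZ * 1e6


def avg_pilots_per_antenna(eta_p, n=N, m=M, m_g=MG):
    """Average pilots per transmit antenna, Np / MG.

    ASSUMPTION (not stated in the text): eta_p = (M / MG) * Np / N.
    """
    n_p = eta_p * n / (m / m_g)
    return n_p / m_g


np_avg = avg_pilots_per_antenna(0.1709)
p_eff = 5
enough_pilots = np_avg > 2 * p_eff
```

Under that assumption, ηp = 17.09% gives roughly 10.9 pilots per antenna, which matches the value quoted above and exceeds 2Peff = 10.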
The channel sparsity level of the proposed ASSP algorithm against SNR and pilot overhead ratio is depicted in the simulations, where the vertical axis and the horizontal axis represent the used pilot overhead ratio and the adaptively estimated channel sparsity level, respectively, and the chroma represents the probability of the estimated channel sparsity level. We consider R = 1 and fp = 1 without exploiting the temporal channel correlation in the simulations. Comparisons between the MSE performance of the introduced pilot placement scheme and conventional random pilot placement scheme are made where the introduced ASSP algorithm and the oracle LS algorithm are exploited (Fig. 1). We consider R = 1, fp = 1, and ηp = 19.53% in the simulations. It is clear that both the schemes yield a very similar performance. The proposed uniformly spaced
Table 2 Comparison of OMP and the proposed CS-based JCE algorithms
Parameters            SNR      OMP        Proposed CS-based JCE algorithms
Min                   5        0.02586    0.02407
Max                   20       0.1158     0.068
Mean                  12.5     0.06335    0.04332
Median                12.5     0.05897    0.04152
Mode                  5        0.02586    0.02407
Standard deviation    4.761    0.02828    0.01412
Range                 15       0.08991    0.04393
Fig. 1 Sparsity
pilot placement scheme can be more easily implemented in practical systems due to the regular pilot placement. Hence, the uniformly spaced pilot placement scheme is used in LTE-Advanced systems to make massive MIMO compatible with current cellular networks. The MSE of the proposed ASSP algorithm is compared with (R = 4) and without (R = 1) exploiting the temporal channel correlation for time-varying channels of massive MIMO systems. The SCS algorithm does not function well due to the smaller number of pilots. The downlink bit error rate (BER) performance and the average achievable throughput per user are also evaluated in the simulations, where the BS using zero-forcing (ZF) pre-coding is assumed to know the estimated downlink channels. The BS with M = 64 antennas simultaneously serves K = 8 users using 16-QAM, and the ZF pre-coding is based on the estimated channels under the same setup. It can be noted that the proposed channel estimation scheme performs better than its counterparts (Table 3). Comparisons between the average achievable throughput per user of different pilot decontamination schemes are made. For a multi-cell massive MIMO system with L = 7, M = 64, K = 8 sharing the same bandwidth, the
Table 3 Comparison of SP and joint channel estimation
Parameters            SNR      SP         Joint channel estimation
Min                   5        0.01326    0.008641
Max                   20       0.2019     0.0991
Mean                  12.5     0.08267    0.02937
Median                12.5     0.06776    0.01292
Mode                  5        0.01326    0.008641
Standard deviation    4.761    0.05938    0.03541
Range                 15       0.1886     0.09046
average achievable throughput per user in the central target cell suffering from pilot contamination is analyzed. We consider R = 1, fd = 7, a path loss factor of 3.8 dB/km, a cell radius of 1 km, a distance D between the BS and its users from 100 m to 1 km, an SNR of 10 dB for the cell-edge user (the power of the un-pre-coded signal from the BS is considered in the SNR), and a user mobile speed of 3 km/h. The BSs using zero-forcing (ZF) pre-coding are assumed to know the estimated downlink channels obtained by the proposed ASSP algorithm. For the FDM scheme, the pilots of the L = 7 cells are orthogonal in the frequency domain (Fig. 2). In the TDM scheme, the pilots of the L = 7 cells are transmitted in L = 7 successive time slots, and the channel estimation of users in the central target cell suffers from the pre-coded downlink data transmission of other cells, where two cases are considered. The “cell-edge” case indicates that when users in the central target cell estimate the channels, the pre-coded downlink data transmission in other cells can guarantee SNR = 10 dB for their cell-edge users. While the “ergodic” case indicates
Fig. 2 SNR versus MSE
that when users in the central target cell estimate the channels, the pre-coded downlink data transmission in other cells can guarantee SNR = 10 dB for their users with the ergodic distance D from 100 m to 1 km.
4 Conclusion In this paper, we have introduced a new SCS-based spatial–temporal joint channel estimation scheme for massive MIMO systems in FDD. The spatial–temporal common sparsity of wireless MIMO channels is exploited to decrease the pilot overhead. With the non-orthogonal pilot scheme at the BS and the ASSP algorithm, the users can estimate the channels with decreased pilot overhead. The adaptive space–time pilot scheme further reduces the pilot overhead according to the mobility of the user. Additionally, the non-orthogonal pilot design and the proposed ASSP algorithm were discussed under the framework of compressive sensing theory to achieve accurate channel estimation. The simulation results show that the modified SCS-based spatial channel estimation scheme gives better results than the existing channel estimation schemes.
References
1. Zhang R, Zhao H, Zhang J (2018) Distributed compressed sensing aided sparse channel estimation in FDD massive MIMO system. IEEE Access 6:18383–18397. https://doi.org/10.1109/ACCESS.2018.2818281
2. Peng W, Li W, Wang W, Wei X, Jiang T (2019) Downlink channel prediction for time-varying FDD massive MIMO systems. IEEE J Sel Top Sign Process 13(5):1090–1102. https://doi.org/10.1109/JSTSP.2019.2931671
3. Liu K, Tao C, Liu L, Lu Y, Zhou T, Qiu J (2018) Analysis of downlink channel estimation based on parametric model in massive MIMO systems. In: 2018 12th International Symposium on Antennas, Propagation and EM Theory (ISAPE), Hangzhou, China, 2018, pp 1–4. https://doi.org/10.1109/ISAPE.2018.8634083
4. Gu Y, Zhang YD (2019) Information-theoretic pilot design for downlink channel estimation in FDD massive MIMO systems. IEEE Trans Sign Process 67(9):2334–2346. https://doi.org/10.1109/TSP.2019.2904018
5. Huang W, Huang Y, Xu W, Yang L (2017) Beam-blocked channel estimation for FDD massive MIMO with compressed feedback. IEEE Access 5:11791–11804. https://doi.org/10.1109/ACCESS.2017.2715984
6. Chen J, Zhang X, Zhang P (2020) DDL-based sparse channel representation and estimation for downlink FDD massive MIMO systems. In: ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 2020, pp 1–6. https://doi.org/10.1109/ICC40277.2020.9148996
7. Gao Z, Zhang C, Dai C, Han Q (2014) Spectrum-efficiency parametric channel estimation scheme for massive MIMO systems. In: 2014 IEEE international symposium on broadband multimedia systems and broadcasting, Beijing, China, 2014, pp 1–4. https://doi.org/10.1109/BMSB.2014.6873562
8. Gao Z, Dai L, Dai W, Shim B, Wang Z (2016) Structured compressive sensing-based spatiotemporal joint channel estimation for FDD massive MIMO. IEEE Trans Commun 64(2):601–617. https://doi.org/10.1109/TCOMM.2015.2508809
A Survey on Image Steganography Techniques Using Least Significant Bit Y. Bhavani , P. Kamakshi, E. Kavya Sri, and Y. Sindhu Sai
Abstract Steganography is the technique in which information is hidden within objects so that a viewer cannot track it down and only the intended recipient is able to see it. The data can be concealed in different media such as text, audio and video files. Hiding the information in image or picture files is called image steganography. This steganography method helps in protecting the data from malicious attacks. The image chosen for steganography is known as the cover image and the resulting image as the stego image. A digital image can be described using pixel values, and those values are modified using the least significant bit (LSB) technique. To increase the security, various LSB techniques have been proposed. We compare various image steganography techniques based on parameters such as robustness, imperceptibility, capacity and security. Based on the comparisons, we suggest a few image steganography algorithms.
Keywords Steganography · Spatial domain · Least significant bit (LSB) · Cover image · Stego image
1 Introduction Internet technology gives numerous advantages to humans, especially in communication. In the era of data communication, the security and privacy of data are areas that need the most consideration. To solve these security issues, different methods such as steganography, cryptography, watermarking and digital signatures are used. Steganography and cryptography [1] are used to conceal the data, watermarking is used to protect copyright, and digital signatures are used to authenticate the data [2, 3].
Y. Bhavani (B) · P. Kamakshi · E. Kavya Sri · Y. Sindhu Sai, Kakatiya Institute of Technology & Science, Warangal, India
E. Kavya Sri e-mail: [email protected]
Y. Sindhu Sai e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_20
Steganography (from the Greek steganos, covered, and graphia, writing) [4] is the study of invisible communication. It protects the confidentiality of two communicating parties. In image steganography, secrecy is accomplished by inserting information into a cover image and creating a stego image. Spatial domain and frequency domain are the two domains used for hiding data in an image. In the frequency domain [5], the message is hidden by transforming the cover image. Commonly used transformations include the discrete cosine transform (DCT), the discrete wavelet transform (DWT) and singular value decomposition (SVD). In the spatial domain [6], the secret image is directly inserted by changing the pixel values of the cover image using the least significant bit (LSB) and most significant bit (MSB) techniques.
The software tools used in image steganography are
• Quick stego: This software hides the message in images using the AES algorithm [7]. It is very simple and fast and executes even complex security processes. It can be used to secure more than one file and encrypts all types of file formats such as audio, video, image and document.
• Hide In Picture (HIP): This software can hide any type of file inside bitmap pictures [7] using the Blowfish algorithm. Users can protect the hidden files with a password, so that only people who know the password can access the file hidden in the picture. The user can also make a specific colour transparent, where nothing will be stored.
• Chameleon: This software uses the LSB algorithm [4, 8], in which the LSBs of the pixel values of the image are replaced with the data bits of the message to be hidden. It uses an encryption algorithm which enhances the use of the hiding space in a particular cover image.
As technology advances, more research is being conducted to develop better techniques for steganography and cryptography that provide more security for the data. The different applications [2] of steganography are
• E-Commerce
• Media
• Database systems
• Digital watermarking
• Secret data storing
• Access control system for digital content distribution.
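The basic LSB replacement mentioned above for tools such as Chameleon can be sketched in a few lines. This is a generic illustration of LSB substitution on 8-bit pixel values, not the implementation of any of the listed tools; the function names and sample pixel values are hypothetical.

```python
def lsb_embed(pixels, bits):
    """Replace the least significant bit of each pixel with a message bit."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover capacity")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it
    return stego


def lsb_extract(pixels, n_bits):
    """Read the message back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [154, 37, 200, 91]   # hypothetical 8-bit pixel values
message = [1, 0, 1, 1]
stego = lsb_embed(cover, message)
```

Each pixel changes by at most one grey level, which is why plain LSB substitution is visually imperceptible but offers no secrecy on its own once an attacker suspects its use.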
2 Related Work Fridrich and Goljan [6] proposed a high-precision steganalytic technique which estimates the length of a hidden message inserted with the LSB technique. The image is split into groups of n consecutive or disjoint pixels in this method. The changed pixel values are employed to work out the length of the hidden message. This method provides advantages such as more stability and accuracy for a wide range of natural images.
Dumitrescu et al. [5] introduced another steganalysis procedure for detecting LSB steganography in digital signals such as image and audio. This method depends on the statistical analysis of sample pairs. The length of a secret message inserted using LSB steganography can be estimated with high accuracy using this method. The detection algorithm is very simple and fast compared to other algorithms. Bounds on the estimation errors are derived to assess the robustness of the proposed steganalytic approach. In addition, the vulnerability to potential attacks is examined, and countermeasures are offered. Ker [8] used the histogram characteristic function (HCF) introduced by Harmsen, which detects steganography in colour images but cannot be used in the same way on grayscale images. The HCF is applied in two novel ways: the output is modified using a selected image, and the adjacency histogram is computed instead of the normal histogram. The results of this approach reveal that the new detectors are far more dependable than previously known detectors. The adjacency histogram was not helpful for secrecy, and it may lead to the detection of the secret message by attackers. Yang et al. [9] proposed a flexible LSB substitution method for image steganography. This method focuses on the noise-sensitive areas of the stego image in order to obtain better visual quality. The proposed method distinguishes and utilizes normal texture and edge areas for insertion. The approach calculates the number k of LSBs used for inserting the data to be hidden. The k value is high in the non-sensitive areas of the image and modest in the sensitive areas, to balance the overall visual quality of the image. The number of LSBs (k) used for insertion is calculated from the high-order bits of the image. This approach also employs a pixel correction method to improve the stego image quality. However, this process is carried out only on a limited data set. Joshi et al. 
[10] proposed different steganographic methods in the spatial domain which mostly use LSB techniques and perform XOR operations on different bits of the pixel values of a particular cover image. Data are embedded by performing two XOR operations: the first XOR is on the first and eighth bits, and the second XOR is on the second and seventh bits. The obtained values are then compared and used as a rule for embedding the data into the image. A grayscale image is used as the cover image, and three message images of different sizes were used. After completing the whole process, the PSNR value is obtained for the largest message length. Irawan et al. [11] combined steganography and cryptography techniques. In this approach, the message is encrypted using the OTP method before it is inserted on the LSB. The data are inserted in the edge areas of the image, which are located with the Canny method, to improve the undetectability and security of the data. This method also assesses the quality of the stego image using a histogram. Swain [12] proposed two different techniques in the spatial domain of digital steganography. The bits are categorized into two groups of equal length, and the pixel values of both groups are exchanged to conceal the data. One of the techniques uses a single bit to conceal data, while the other uses two bits. During this process of replacement, the change in the pixel value will not
exceed two. These techniques increase security compared to PVD schemes and LSB methods. But the security is still found to be increased after evaluation. Islam et al. [7] used a variant of the LSB technique with a status bit, to provide productive filtering, together with the AES algorithm for providing more security. A bitmap image is used for the LSB technique; the hidden data are encoded first, and the encoded data are inserted into the image. This method has a higher embedding limit than the normal LSB calculation because the status bit is used to guide the encoding and extraction of hidden messages. Since the PSNR values are high, the quality of the stego image is high. Channalli and Jadhav [13] suggested an image hiding technique using LSB. To conceal data, common pattern bits (stego-key) are employed. Based on those pattern bits and the hidden data bits, the LSBs of the pixels are rearranged. The pattern bits are made up of M × N rows and columns (of a block) with a random key value. During the insertion process, each pattern bit is checked against a data bit; if they match, the second LSB bits of the cover image are rearranged, and otherwise they remain the same. This method provides more security in hiding the data by using a common pattern key. Its disadvantage is a low hiding capacity, because a single hidden data bit requires a block of (M × N) pixels. Dhaya [14] used a Kalman filter function in extracting the message image, which performs the process with more accuracy. This approach decreases the complexity of the extraction process and maintains the intensity of the images. Manoharan et al. [15, 16] proposed a technique which uses the contourlet transform to maintain robustness in medical images, as they contain sensitive data. In that work, the PSNR and the correlation coefficient were also calculated to measure accuracy. 
A watermarking method [16] that uses the contourlet transform, singular value decomposition and the discrete cosine transform was also proposed to increase the robustness. Astuti et al. [2] proposed a method to hide messages using the LSB of the pixel values in an image. Steganography and cryptography are combined: image steganography uses the LSB algorithm, and the contents of the message are changed through cryptography by performing XOR operations with the three most significant bits. The LSB method is the most widely used and simplest method in image steganography, and hiding data with it does not affect the visible properties of the image. To increase the security, the XOR operation is performed three times to encrypt the message before it is inserted on the LSB, and the three MSB bits serve as keys for message encryption and decryption. The combination of steganography and cryptography techniques provides more security for the data and more stability in the transmission of data. The PSNR value is above 50 dB. The method has two main processes, the embedding process and the extraction process.
2.1 Embedding Process In this process, as shown in Fig. 1, a cover image and a message in the form of a binary image are taken as input. The output of this process is the stego image.
• First, the cover image and the message image are read.
• The pixel values of the images are converted into binary format.
• An XOR operation is performed between the seventh and sixth bits of the binary format of the cover image.
• Once again, an XOR operation is performed between the result of the above operation and the eighth bit of the binary format of the cover image.
• In this way, the message image bits are XORed with the three MSB bits, i.e. the eighth, seventh and sixth bits.
• The obtained result is saved in the message bits and inserted on the LSB of the cover pixel. By converting this result into uint8, the pixel value of the stego image is obtained.
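The embedding steps listed above can be sketched for a single 8-bit pixel and a single message bit. This is a minimal illustration written from the description in this section, not the authors' code from [2]; the function names are hypothetical, and bits are numbered so that the eighth bit is the MSB and the first bit is the LSB.

```python
def msb_key(pixel):
    """XOR of the 8th, 7th and 6th bits of an 8-bit pixel (8th = MSB)."""
    b8 = (pixel >> 7) & 1
    b7 = (pixel >> 6) & 1
    b6 = (pixel >> 5) & 1
    return (b7 ^ b6) ^ b8  # 7th XOR 6th first, then XOR with the 8th


def embed_bit(cover_pixel, message_bit):
    """Encrypt one message bit with the MSB key and place it on the LSB."""
    encrypted = message_bit ^ msb_key(cover_pixel)
    return (cover_pixel & 0xFE) | encrypted


# Hypothetical cover pixels: 200 = 11001000b, 154 = 10011010b.
stego_a = embed_bit(200, 1)
stego_b = embed_bit(154, 1)
```

Because only the LSB is modified, the three MSBs that form the key are identical in the cover and stego pixels, so no separate key needs to be sent to the recipient.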
Fig. 1 Embedding process (Source ICOIACT, pp. 191–195)
2.2 Extraction Process In this process, as shown in Fig. 2, the input is the stego image and the output is the recovered message image.
• First, the stego image is read.
• The pixel values of the image are converted into binary format.
• An XOR operation is performed between the seventh bit and the sixth bit of the binary format of the stego image.
• Once again, an XOR operation is performed between the result of the above operation and the eighth bit of the binary format of the stego image.
Fig. 2 Extraction process (Source ICOIACT, pp. 191–195)
• In this way, the LSB is XORed with the three MSB bits, i.e. the eighth, seventh and sixth bits.
• The obtained result is saved on the LSB. By converting this result into uint8, the pixel value of the message image is obtained.
This technique [2] is safe and simple, and it gives a high PSNR and a low MSE, so that the hidden information is undetectable. The process is completed quickly and easily using the XOR operation. Secrecy is maintained because the inserted bits cannot be read directly without applying the XOR operator. Furthermore, the XOR operation is performed three times, using three keys. Since the keys are embedded in the cover image, the stego file keeps the same size and there is no need to distribute keys to the recipient, which increases the speed of communication without changing the size of the file.
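The extraction steps can be sketched in the same way for a single stego pixel. As before, this is an illustration based on the description in this section rather than the implementation of [2]; the function name and the sample pixel values are hypothetical.

```python
def extract_bit(stego_pixel):
    """Recover one message bit from the LSB using the three MSBs as key."""
    b8 = (stego_pixel >> 7) & 1
    b7 = (stego_pixel >> 6) & 1
    b6 = (stego_pixel >> 5) & 1
    key = (b7 ^ b6) ^ b8           # same key as on the embedding side
    return (stego_pixel & 1) ^ key


# Hypothetical stego pixels: 201 = 11001001b, 154 = 10011010b, 155 = 10011011b.
recovered = [extract_bit(p) for p in (201, 154, 155)]
```

Applying XOR with the same key a second time cancels the encryption, which is why the receiver needs nothing beyond the stego image itself.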
3 Critical Analysis Performance metrics of image steganography techniques are the peak signal-to-noise ratio (PSNR) and the mean square error (MSE). PSNR is mainly used to measure the robustness of the image, and MSE is used to measure the accuracy of the technique. The PSNR and MSE values calculated using Eqs. 1 and 2 for some of the techniques are given in Table 1.
$\mathrm{PSNR} = 10 \log_{10} \left( \frac{(256 - 1)^2}{\mathrm{MSE}} \right)$ (1)
Table 1 Performance metrics of image steganography techniques
Literature references    Technique                                              PSNR value (in dB)    MSE value
Yang et al. [9]          Texture, brightness and edge-based detective LSB       40.62                 0.04756
Joshi et al. [10]        Using XOR operation                                    75.2833               0.0019
Irawan et al. [11]       Uses OTP encryption to hide on edge areas of image     80.5553               0.0006
Swain [12]               Digital image steganography                            51.63                 0.0011
Islam et al. [7]         Using status bit along with AES cryptography           60.737                0.054
Astuti et al. [2]        Using LSB and triple XOR on MSB                        54.616                0.225
Bhardwaj et al. [17]     Inverted bit LSB substitution                          59.0945               0.0647
Bhuiyan et al. [4]       LSB replacement through XOR substitution               70.8560               0.0053
$\mathrm{MSE} = \frac{1}{H \cdot G} \sum_{h=1}^{H-1} \sum_{g=1}^{G-1} \left( A_f(h, g) - S_f(h, g) \right)^2$ (2)
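Equations 1 and 2 can be evaluated with a short sketch. It assumes that A and S denote the cover and stego images as H × G arrays of 8-bit pixel values, which is the usual reading but is not spelled out in the text; for simplicity the sum runs over every pixel. The function names are hypothetical.

```python
import math


def mse(cover, stego):
    """Mean square error between two equally sized images (lists of rows)."""
    h, g = len(cover), len(cover[0])
    total = sum((cover[i][j] - stego[i][j]) ** 2
                for i in range(h) for j in range(g))
    return total / (h * g)


def psnr(mse_value, levels=256):
    """Peak signal-to-noise ratio in dB for images with the given levels."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10 * math.log10((levels - 1) ** 2 / mse_value)


# Hypothetical 2 x 2 images that differ in a single LSB.
cover = [[10, 20], [30, 40]]
stego = [[10, 21], [30, 40]]
example_mse = mse(cover, stego)
example_psnr = psnr(example_mse)
```

As a consistency check, an MSE of 0.225 gives a PSNR of about 54.6 dB with this formula, in line with the entry for Astuti et al. [2] in Table 1.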
The different types of algorithms in image steganography, as shown in Table 2, are compared based on the following characteristics:
• Robustness: Maintenance of data consistency after converting the cover image to the stego image.
• Imperceptibility: Preservation of the quality of the image after the embedding process.
• Capacity: Size of the data inserted into an image.
• Security: Confidentiality of the data.
Table 2 Content from image steganography techniques
Literature references        Domain       Technique                                                         Robustness   Imperceptibility   Capacity   Security
Fridrich [6]                 Spatial      Estimation of secret message length                               Y            N                  Y          Y
Dumitrescu et al. [5]        Frequency    Detection via sample pair analysis                                Y            Y                  N          Y
Ker [8]                      Spatial      Steganalysis of LSB matching                                      Y            N                  N          Y
Yang et al. [9]              Spatial      Texture, brightness and edge-based detective LSB                  N            Y                  Y          N
Joshi et al. [10]            Spatial      Using XOR operation                                               Y            Y                  N          Y
Irawan et al. [11]           Spatial      Uses OTP encryption to hide on edge areas of image                N            Y                  Y          N
Swain [12]                   Spatial      Digital image steganography                                       N            N                  N          Y
Islam et al. [7]             Spatial      Using status bit along with AES cryptography                      N            Y                  Y          Y
Channalli and Jadhav [13]    Spatial      Combine pattern bits (stego-key) with secret message using LSB    Y            N                  N          N
Astuti et al. [2]            Spatial      Using LSB and triple XOR on MSB                                   Y            Y                  N          Y
4 Conclusion In this paper, different image steganography techniques are analysed on the basis of the characteristics of the image. The methods currently used in image steganography are highly secure, as they do not allow unauthorized users to detect the presence of the message or to retrieve it. The combination of steganographic and cryptographic techniques results in an accurate process for maintaining the secrecy of information. The majority of these techniques use the LSB algorithm to maintain the confidentiality of the data and the quality of the image, since it is simple and provides imperceptibility and robustness for the data.
References
1. Ardy RD, Indriani OR, Sari CA, Setiadi DRIM, Rachmawanto EH (2017) Digital image signature using triple protection cryptosystem (RSA, Vigenere, and MD5). In: IEEE International conference on smart cities, automation & intelligent computing systems (ICON-SONICS), pp 87–92
2. Astuti YP, Setiadi DRIM, Rachmawanto EH, Sari CA (2018) Simple and secure image steganography using LSB and triple XOR operation on MSB. In: International conference on information and communications technology (ICOIACT), pp 191–195
3. Bhavani Y, Sai Srikar P, Spoorthy Shivani P, Kavya Sri K, Anvitha K (2020) Image segmentation based hybrid watermarking algorithm for copyright protection. In: 11th IEEE international conference on computing, communication and networking technologies (ICCCNT)
4. Bhuiyan T, Sarower AH, Karim R, Hassan M (2019) An image steganography algorithm using LSB replacement through XOR substitution. In: IEEE international conference on information and communications technology (ICOIACT), pp 44–49
5. Dumitrescu S, Wu X, Wang Z (2003) Detection of LSB steganography via sample pair analysis. IEEE Trans Sign Process 51(7):1995–2007
6. Fridrich J, Goljan M (2004) On estimation of secret message length in LSB steganography in spatial domain. In: Delp EJ, Wong PW (eds) IS&T/SPIE electronic imaging: security, steganography, and watermarking of multimedia contents VI. SPIE, San Jose, pp 23–34
7. Islam MR, Siddiqa A, Uddin MP, Mandal AK, Hossain MD (2014) An efficient filtering based approach improving LSB image steganography using status bit along with AES cryptography. In: IEEE international conference on informatics, electronics & vision (ICIEV), pp 1–6
8. Ker AD (2005) Steganalysis of LSB matching in gray scale images. IEEE Sign Process Lett 12(6):441–444
9. Yang H, Sun X, Sun G (2009) A high-capacity image data hiding scheme using adaptive LSB substitution. J Radio Eng 18:509–516
10. Joshi K, Dhankhar P, Yadav R (2015) A new image steganography method in spatial domain using XOR. In: Annual IEEE India conference (INDICON), pp 1–6, New Delhi
290
Y. Bhavani et al.
11. Irawan C, Setiadi DRIMC, Sari A, Rachmawanto EH (2017) Hiding and securing message on edge areas of image using LSB steganography and OTP encryption. In: International conference on informatics and computational sciences (ICICoS), Semarang 12. Swain G ((2016)) Digital image steganography using variable length group of bits substitution. Proc Comput Sci 85:31–38 13. Channalli S, Jadhav A (2009) Steganography an art of hiding data. J Int J Comput Sci Eng (IJCSE) 1(3) 14. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman filter approach based on intensity parameter. J Innov Image Process (JIIP), pp 7–20 15. Manoharan JS (2016) Enhancing robustness of embedded medical images with a 4 level Contourlet transform. Int J Sci Res Sci Eng Technol pp 149–154 16. Mathew N, Manoharan JS (2012) A hybrid transform for robustness enhancement of watermarking in medical images. Int J Digital Image Process 4(18):989–993 17. Bhardwaj R, Sharma V (2016) Image steganography based on complemented message and inverted bit LSB substitution. Proc Comput Sci 93:832–838
Efficient Multi-platform Honeypot for Capturing Real-time Cyber Attacks S. Sivamohan, S. S. Sridhar, and S. Krishnaveni
Abstract In today’s world, cyber-attacks are becoming highly complicated. The hacker intends to expose sensitive information or potentially change the operation of the targeted machine. Cybersecurity has become a major bottleneck to the growth of on-demand services, since they are widely accessible to hackers for any type of attack. Traditional intrusion detection systems are proving unreliable due to heavy traffic and its dynamic nature. A honeypot is a device that exposes a vulnerable server or network to the internet and collects attack information by monitoring and studying the techniques used by attackers. In this paper, we set up an effective active protection architecture by integrating Docker container-based technologies with an enhanced honeynet-based IDS. The T-Pot platform is used to host a honeynet of different honeypots in the real-time AWS cloud environment. The development of this honeynet methodology is essential to improve threat identification and to secure the cloud environment. Moreover, the experimental results reveal that this defence mechanism can detect and log an attacker’s behavior, which can expose new attack techniques and even zero-day exploits. Keywords Cyber security · Intrusion detection system · Honeynet · AWS cloud · Docker
S. Sivamohan (B) · S. S. Sridhar · S. Krishnaveni Department of Computer Science and Engineering, SRMIST, Kattankulathur, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_21
1 Introduction Nowadays, the number of cyber-attacks is growing at a rapid pace, and the existing detection techniques are becoming increasingly ineffective, demanding the development of more relevant detection systems [1]. Statistics show that security vulnerabilities in the virtual network layer of cloud computing have increased dramatically in recent years. Security breaches have become more difficult to handle and more pervasive as a result of the massive increase in network traffic. Combating these assaults with traditional network-based intrusion detection systems (IDS) has proven ineffective. In most cases, an IDS is used in conjunction with a firewall to provide a complete security solution. One of the main challenges in securing cloud networks is the appropriate design and use of an intrusion detection system that can track network traffic and, ideally, detect network breaches [2]. Despite many investigations over recent decades, the cybersecurity challenge remains unsolved. One factor in the increase is attackers’ access to more processing power and resources, which allows them to perform more complicated attacks [3]. IDS and firewall systems are widely used for the detection and prevention of malicious threats [4]. Deep knowledge about malicious code and its target destination is required for improving security. The honeypot was established to provide this information, which it captures and stores. Benchmark datasets such as KDD 99, ISCX, DARPA, and CAIDA, among others, have constraints, such as obsolete traffic, simulated traffic that does not match real-world network traffic, and a lack of typical normal data [5]. The general architecture of a honeypot system is depicted in Fig. 1.
Fig. 1 The general architecture of honeypot system
In this study, a honeynet system was used to learn more about attackers, their motivations, and their techniques. These systems let attackers engage with them while monitoring the attacks by posing as actual machines with sensitive information [6]. We set up an effective active protection architecture by integrating Docker container-based technologies with an enhanced honeynet-based IDS. The T-Pot platform is used to host a honeynet of different honeypots in the real-time AWS cloud environment [7]. A network attack benchmark should include a wide range of cyber-attacks created by various tools and methodologies, in addition to representative normal data. Detection models built and assessed using representative data yield realistic outcomes, which can bridge the gap between machine learning-based identification algorithms and their actual deployment in real-world cloud networks. Information collected by honeypots could be integrated with a firewall and the IDS to reduce the occurrence of false positives and to improve security. There are two types of honeypots, namely research and production honeypots. A research honeypot collects information relating to blackhats [8]. This is done by giving full access to the system without any filtration. Production honeypots act as a filter between the state and the blackhat and prevent malicious attacks [9]. Honeypots are characterized based on their design, deployment, and deception technology. Figure 2 illustrates the various types of honeypots. The complexity of attacks and changes in attack patterns and techniques are all factors that should be considered in the cloud environment. The inability to resolve these security breaches has always had serious impacts and has made the environment more susceptible [10]. From the attacker’s perspective, there has been an increase in cyber-related events, as well as greater complexity [11, 32]. To address the concerns, the
Fig. 2 Various types of honeypots
study presented in this paper suggests the use of honeypots to analyze anomalies and patterns based on honeypot data. This study intends to create a prototype as a proof of concept to find appropriate attack detection approaches via honeypots. The outcomes of this work will be leveraged to further intrusion attack detection analytical tools and techniques. The main objectives of the current work are as follows:
• To detect attacks in a cloud environment by developing an intrusion detection system based on honeypots.
• To develop a prototype as a proof of concept.
• To learn from the attacker’s actions in a virtual environment.
• To evaluate and interpret cyber-attacks.
The main goal of this paper is to deploy multiple honeypots in a cloud environment that capture attacker patterns and then to evaluate the collected data for intrusion detection functionality. This paper presents attack detection approaches based on the use of honeypots in a cloud environment to create an intrusion detection system. The key contributions are as follows:
• An improved honeynet-based IDS that is used to identify attacks and anomalies in a cloud environment. It provides a multi-honeypot platform for analyzing and understanding the behavior of attackers.
• Identification of abnormalities and intrusion attempts by using anomaly detection.
• Analysis and recognition of anomalies in attacks in a cloud environment by learning from an attacker’s behaviors.
• The development of a honeynet data collection system, which is the major contribution of this study.
• A rapidly deployable, pre-configured network of honeypots capable of detecting active threats in the public cloud, which is a unique component of this system.
The rest of the paper is structured as follows: An overview of relevant honeypot work for intrusion detection systems is included in Sect. 2.
Section 3 describes the proposed framework for detecting intrusion attacks in the cloud and offers a methodology for data collection, and Sect. 4 presents the findings of the data analysis and the experiments. Finally, Sect. 5 concludes the paper.
2 Related Works This section presents the relevant honeypot work for intrusion detection systems. Honeypots are a widely used type of network traffic and malware investigation tool. Lance Spitzner [12], the Honeynet Project’s creator, defines a honeypot as “a security resource” whose usefulness is contingent on being targeted or compromised.
Majithia et al. [13] ran three types of honeypots on a Docker server, with a log management mechanism built on top of the ELK framework, and discussed the issues and security concerns associated with each honeypot. The honeypots used were HoneySMB7; Honey WEB-SQLi, an HTTP protocol honeypot that includes an SQL injection vulnerability; and HoneyDB, a honeypot built for MySQL database vulnerabilities. The work presented an analysis of the attacks using unique IPs and their distribution among the honeypots. Adufu et al. [14] investigated and compared running the molecular modeling simulation software AutoDock on container-based and hypervisor-based virtualization systems. They concluded that the container-based systems managed memory resources efficiently even when the memory allocated to instances was higher than the physical resources, in addition to reducing the execution times for multiple containers running in parallel. Seamus et al. [15] built a research honeypot aimed at Zigbee device attackers. Zigbee devices are typically used in MANETs. Since IoT devices are becoming more extensively used, their risks are being more generally recognized, which motivated the development of this honeypot. As a result, a risk evaluation of these devices is critical. They used the honeypot in their implementation. To catch hackers’ unethical behavior, Jiang et al. [16] used an open-source honeynet setup. During the study, nearly 200,000 hits were recorded. This test explored the targets that intruders aimed at, such as a web server, FTP server, or database server. Sokol et al. [24] created a distribution of honeypots and honeynets based on OS-level virtualization, a method that was largely unexplored in research at the time.
The most significant contribution of this research is the automation of honeypots, and their technique for generating and evaluating their honeynet solution is notable. According to the study, OS-level virtualization has very little performance or maintenance overhead when compared to other virtualization technologies or bare-metal systems. They also point out that using containers to disguise honeypots adds an extra element of obfuscation. Even though they are confined environments sharing the kernel of a legitimate operating system, when fingerprinted, they are more likely to appear as a valid system [17, 30]. Alexander et al. [25] investigated the use of Linux containers, as an alternative to virtualization, to circumvent a variety of virtual environment and monitoring tool detection methods. The goal was to see whether container environments could host honeypots without being identified by malware in the long run [18]. Chin et al. [26] proposed a system called HoneyLab, a public infrastructure for hosting and monitoring honeypots with a distributed computing resource structure. Its development was prompted by the observation that combining data collected from honeypots in diverse deployment scenarios allows attack data to be correlated, allowing for the detection of expanding outbreaks of related attacks. This system collects data from a huge number of honeypots throughout the globe in order to
identify attack occurrences. Their approach, on the other hand, is based on two low-interaction honeypots, which restricts the amount of data acquired from an attack [20]. To gain a better knowledge of attacker motivations and techniques, an improved system would need to gather more data on attack occurrences [28]. Table 1 summarizes the comparative analysis of five different honeypots in tabular form.
Table 1 Comparison of various honeypots
Feature | ManTrap | BOF | Specter | Honeyd | Honeynet
Interaction level | High | Low | High | Low | High
Freely available | No | No | No | Yes | Yes
Open source | No | No | No | Yes | Yes
Log file support | Yes | No | Yes | Yes | Yes
OS emulation | Yes | No | Yes | Yes | Yes
Supported service | Unrestricted | 7 | 13 | Unrestricted | Unrestricted
3 Methodology This work proposes a new honeynet-based intelligent system for detecting cyber-attacks in the cloud environment. It demonstrates the system configuration of container-based multiple honeypots that are able to investigate and discover attacks on a cloud system. This section covers the complete implementation of all honeypots created and deployed throughout the investigation, as well as a centralized logging and monitoring server based on the Elasticsearch, Logstash, and Kibana (ELK) stack. This tracking system is also capable of monitoring live traffic. Elasticsearch was chosen because it can provide quick search results by searching an index rather than searching the text directly. Elasticsearch is a scalable and distributed search engine [19]. Kibana is a freely available client-side analytics and search dashboard that visualizes data for easier understanding. It is used to display logs from honeypots that have been attacked [21]. The information was acquired over a period of a month, during which time all of the honeypots were placed in various locations across the globe. The honeypots’ capabilities can considerably assist in attaining the recommended method of reducing threats to critical service infrastructures. Many of these tasks have been recognized as being provided by containers. The simplicity with which identically configured environments may be deployed is one of the major advantages of container technologies. Container technologies, on the other hand, cannot provide the same simplicity of deployment for a fully networked system [18]. This motivated the development of a deployment mechanism for the whole system, allowing it to be reconditioned in a limited span of time. Figure 3 shows the detailed system overview of the model framework.
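The remark that Elasticsearch answers queries by searching an index rather than the text itself can be illustrated with a toy inverted index. The sketch below is only a simplified illustration and not how Elasticsearch is implemented; the log lines, the `build_index` and `search` helpers and the whitespace tokenizer are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


# Hypothetical honeypot log lines, invented for illustration only.
logs = [
    "ssh login attempt root from 203.0.113.5",
    "http get /admin from 198.51.100.7",
    "ssh login attempt admin from 198.51.100.7",
]
index = build_index(logs)
print(search(index, "ssh login"))     # ids of the two SSH log lines
print(search(index, "198.51.100.7"))  # ids of the lines from that address
```

A query touches only the token entries it names, so the cost depends on the size of the matching posting sets rather than on the total amount of stored text.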
Fig. 3 A detailed system overview of the deployed honeynet framework
Several existing techniques, both in research and development, have made significant contributions to such a solution. However, there is no single method that gives a viable, workable way to deploy active network defense as a single networked deployable unit [22]. The preliminary development approach used a network design that would help achieve the main goal of creating a flexible honeynet in a cloud scenario, which can manage any illegal entry or untrusted connection and open a separate Docker container for each attacker’s remote IP address. Figure 4 describes the dataflow of the proposed solution. The proposed system was designed to be scalable and adaptable, allowing new features to be added rapidly and the platform to adapt to the unique requirements of a given infrastructure. It is made up of three primary components that were created independently using the three-tier application approach, as follows:
• DCM: A data collection module that gathers essential information from a variety of physical and virtual data sources.
Fig. 4 Data flow for the honeynet-based attack detection framework
• DAM: A data analysis module that provides the user with a set of advanced analyses to produce physical, cyber, or mixed intelligence information (for example, cyber threat evaluation, facility classification by criticality, and pattern detection from social interactions) by processing the stored raw data.
• DVM: A data visualization module that provides true awareness of the physical and cyber environments through a combined and geospatial representation of the security information.
This experimental design tested two types of containers (SSH and HTTP). The Suricata container was added as an example of the different types of honeypots utilized in the model. To overcome the limits and the difficult setup of the honeypot network, virtualized systems on cloud infrastructure were used, and the AWS cloud provider was chosen for this purpose. The architecture of the honeypot system is illustrated in Fig. 5. The route of the attackers in the attack scenario is depicted in Fig. 6. Within the set-up, the following attack scenario was carried out: Using SSH or Telnet, an attacker was able to obtain access. Any root credentials would grant access to the SSH session when the attacker was prompted to log in. The attacker would then try to find further weaknesses on the computer. When satisfied, the attacker would try to download and run malicious programs on the machine. In this approach, a purposeful vulnerability is presumed, the goal of which is to fool the attacker into thinking the system has a flaw, which allows the attacker’s path and attack tactics to be studied. The experimental setup comprises five honeypots and a supplementary system for collecting the generated logs [23]. Two types of scenarios (SSH and HTTP) were tested in this experimental design.
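The three-module split (DCM, DAM, DVM) can be pictured as a small collect, analyze, visualize pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the event fields `src_ip` and `honeypot`, and the sample events are hypothetical.

```python
from collections import Counter


def collect(sources):
    """DCM stand-in: merge raw event dictionaries from several sources."""
    events = []
    for source in sources:
        events.extend(source)
    return events


def analyze(events):
    """DAM stand-in: count events per source IP and per honeypot type."""
    return {
        "by_ip": Counter(e["src_ip"] for e in events),
        "by_honeypot": Counter(e["honeypot"] for e in events),
    }


def visualize(summary):
    """DVM stand-in: render the per-IP counts as plain text lines."""
    lines = []
    for ip, count in summary["by_ip"].most_common():
        lines.append(f"{ip}: {count} event(s)")
    return lines


# Hypothetical events; field names and values are invented for illustration.
ssh_events = [
    {"src_ip": "203.0.113.5", "honeypot": "ssh"},
    {"src_ip": "203.0.113.5", "honeypot": "ssh"},
]
http_events = [{"src_ip": "198.51.100.7", "honeypot": "http"}]

summary = analyze(collect([ssh_events, http_events]))
for line in visualize(summary):
    print(line)
```

Keeping the three stages as separate functions mirrors the idea that each tier can be developed and replaced independently.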
Fig. 5 Architecture of the honeypots system
Fig. 6 Attacker’s path in the attack scenario
Testing Scenarios In this experiment, we applied three test case scenarios to verify the functionality of the model.
SSH Scenario The SSH scenario created an SSH connection from a simulated attacker, and the following was observed:
• An instance of the Kippo container was created, and the traffic was forwarded to it, as shown in Fig. 7.
• The attacker was able to navigate through the Kippo interactive terminal with a fake file system, as observed in Fig. 8.
• A fingerprinting attack easily detected a well-known fingerprint indicator for the Kippo honeypot using the command ‘vi’.
• Kippo honeypot container logs were saved and forwarded to syslog, recording all the interactive communication with the attacking session.
Fig. 7 SSH container established session
Fig. 8 Client browsing through Kippo fake file-system
Fig. 9 HTTP container established session
HTTP Scenario Sending an HTTP request to the reverse proxy address resulted in the following:
• An HTTP honeypot was created from the Glastopf Docker image with the specified naming convention, as shown in Fig. 9.
• The attacker browsed a fake web server page where different attacks could be attempted while trying to authenticate, as shown in Fig. 10.
• Container logs were collected and sent to syslog, highlighting the source IP of the original attacker.
The honeynet behaved as expected, creating one container per attacking session (unique IP), named by combining the image name with the IP address of the attack source so that the container is exclusive to that session. The attacker was directed to a fake website to apply different attacks, which were recorded and confined inside the dedicated Glastopf container [24].
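The one-container-per-attacking-IP behavior can be sketched as a small bookkeeping routine. The exact name format used in the experiment is not given in the text, so the `image_ip` pattern, the `container_name` helper and the `SessionRegistry` class below are hypothetical; the sketch only tracks names and does not start any container.

```python
def container_name(image, src_ip):
    """Build a per-session name from the image name and the source IP."""
    # Keep only the bare image name (drop any registry path or tag).
    safe_image = image.split("/")[-1].split(":")[0]
    # Replace separators so the name stays a simple identifier.
    safe_ip = src_ip.replace(".", "-").replace(":", "-")
    return f"{safe_image}_{safe_ip}"


class SessionRegistry:
    """Remember which container serves which (image, source IP) pair."""

    def __init__(self):
        self._sessions = {}

    def container_for(self, image, src_ip):
        """Return (name, created); reuse the container of a known session."""
        key = (image, src_ip)
        if key in self._sessions:
            return self._sessions[key], False
        name = container_name(image, src_ip)
        self._sessions[key] = name
        return name, True


# Example addresses are from documentation ranges, not from the dataset.
registry = SessionRegistry()
print(registry.container_for("glastopf", "198.51.100.7"))  # new session
print(registry.container_for("glastopf", "198.51.100.7"))  # reused session
print(registry.container_for("kippo", "203.0.113.5"))      # new session
```

In a real deployment the `created` flag would decide whether a new container has to be launched before the traffic is forwarded.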
4 Data Analysis and Results Discussion The experimental study was performed over a period of six months, during which over 5,195,499 log entries from attackers were acquired for further analysis. The real-time data was gathered from August 19, 2020, to February 19, 2021, and the findings were compiled from the dataset using the Kibana interface, which allows for data aggregation across multiple fields of the whole database. The main task is to find solutions
Fig. 10 Client browsing through Glastopf fake web server
to particular investigation queries, such as the source, target, and attack technique. The observations reflect a legitimate implementation of the honeynet system. A honeynet with a flexible and dynamic transition between honeypots can reveal some of the future and potential attacks on a cloud environment by allowing an attacker to strike a decoy system with the same potential vulnerabilities [29]. The intrusion data was investigated by examining the counts, ratios, the statistical chi-squared (χ²) test and the P-value …
… = 0 is scale parameter, ϕ(t) denotes mother wavelet. Two types of sensors were used, a goniometer and an EMG electrode [14]. Good classification results with accurate readings were obtained. Table 1 lists the findings and the sensors used. Table 2 explains the accuracy rate of the system, and Table 3 enumerates the methods to find muscle condition. Several studies show that an increase in electromyography signal amplitude and a shift of the spectrum to the low-frequency band accompany an increasing degree of muscle fatigue during isometric contraction. Different methodologies and their correlations have been discussed in many papers.
Table 3 Methods to find muscle condition
Methodology | Function | Formulas
Short-term Fourier transform | To extract MDF and MF | $X(t, f) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau$
Median frequency (MDF), Mean frequency (MF) | Indicators of spectrum shifting | $\int_{0}^{\mathrm{MDF}} p(f)\, df = \int_{\mathrm{MDF}}^{f_0} p(f)\, df$; $\mathrm{MF} = \dfrac{\int_{0}^{f_0} f\, p(f)\, df}{\int_{0}^{f_0} p(f)\, df}$
Root mean square | Determine the amplitude of surface EMG | $W_x(a, \tau) = \dfrac{1}{\sqrt{|a|}} \int x(t)\, \phi^{*}\!\left(\dfrac{t - \tau}{a}\right) dt$
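The MDF and MF definitions in Table 3 can be approximated on sampled data by replacing the integrals over p(f) with sums over a discrete power spectrum. The sketch below is one possible discretization, assuming that f0 is the Nyquist frequency and using a synthetic two-tone signal in place of EMG data; the function names and the sampling rate are illustrative choices, not taken from the paper.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided power spectrum via a naive DFT (fine for short signals)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def mean_frequency(freqs, power):
    """MF: power-weighted average frequency."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    """MDF: first frequency where cumulative power reaches half the total."""
    half = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


# Synthetic signal (not EMG data): a 50 Hz sine of amplitude 2 plus a
# 150 Hz sine of amplitude 1, sampled at an assumed rate of 1000 Hz.
fs = 1000
n = 200
signal = [
    2 * math.sin(2 * math.pi * 50 * i / fs) + math.sin(2 * math.pi * 150 * i / fs)
    for i in range(n)
]
freqs, power = power_spectrum(signal, fs)
print(round(mean_frequency(freqs, power), 1))  # 70.0
print(median_frequency(freqs, power))          # 50.0
```

Because the 50 Hz component carries four times the power of the 150 Hz component, the median frequency sits at 50 Hz while the mean frequency is pulled up to 70 Hz. A shift of either indicator toward lower values over time is what the table calls spectrum shifting.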
A Study on Surface Electromyography in Sports Applications …
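1.2 Correlation Coefficient
Variables X and Y:
$$X = \{X_i\}, \quad i = 1, 2, \ldots, N_1 \tag{7}$$
$$Y = \{Y_j\}, \quad j = 1, 2, \ldots, N_2 \tag{8}$$
$$r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \, \sum_{i=1}^{N} (Y_i - \bar{Y})^2}} \tag{9}$$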
$\bar{X}$ and $\bar{Y}$ denote the mean values of $X_i$ and $Y_i$. The muscle fatigue experiment was performed on the right forearm of an athlete under isometric contractions [15]. The deltoid and biceps muscles of athletes were monitored. The surface electrodes were placed on the deltoid muscle at a distance of one finger thickness distal and anterior. On the biceps muscle, electrodes were placed at the medial acromion, which extends over the shoulder joint, and the fossa cubit. These muscles were analyzed, and it was found that the repetitive task of post-stroke rehabilitation is the most suitable methodology for muscle weakness recovery [16]. Wearable devices are used to detect biceps muscle fatigue that occurs during gym activities. During elbow flexion, electromyographic information from the upper limbs was monitored, normalized and filtered during maximal isometric tasks based on the amplitudes of the EMG signal [17].
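The correlation coefficient of Eq. (9) can be computed directly from its definition. The sketch below assumes that both sequences have the same length N; the function name and the sample values are made up for illustration and are not measurements from the study.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation r of two equal-length sequences, as in Eq. (9)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of the deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers: a rising amplitude paired with a falling median frequency.
rms_amplitude = [0.10, 0.12, 0.15, 0.19, 0.24]
median_freq = [92.0, 88.0, 83.0, 79.0, 72.0]
print(correlation_coefficient(rms_amplitude, median_freq))
```

A value close to -1 for such a pair would indicate that the two fatigue indicators move in opposite directions, while a value near 0 would indicate no linear relationship.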
1.3 Lower Limb Muscle fatigue reduces the performance of the metabolic and neuromuscular systems, which results in persistent muscle contraction and decreases steady activity. Postural control plays an important role in an appropriate biomechanical stance, but the main factor that affects postural control is fatigue. Recent research has found that the lower extremity muscles play an important role in maintaining and balancing postural control [18]. Athletes who mostly use their lower limbs are affected by reduced balance in postural control and by muscle fatigue in their lower limbs. The activities of the lower extremity muscles were analyzed for fatigue using surface EMG before and after activity. The results indicate that the muscle activity level of the rectus femoris, hamstrings and gastrocnemius muscles changes significantly before and after fatigue. An important relationship was found between postural control, the rectus femoris muscle and the tibialis anterior muscles [19]. Investigating the activity of major muscles of lower limbs
N. Nithya et al.
during a soccer sport was undertaken with the help of 10 soccer players. The electromyographic activities of the lower limb muscles were recorded. Muscles such as the rectus femoris, biceps femoris, tibialis anterior and gastrocnemius were monitored before and after exercise. The EMG data were then analyzed, and the root mean square was computed over ten gait cycles. The results showed that after the intensive soccer-play simulation exercise, the electromyographic activity in most of the lower limb muscles was lower than before [20]. A real-time fatigue monitoring system to detect muscle fatigue during cycling exercise has been developed, which provides online fatigue monitoring and also discusses the analysis of the lower limb. It contains a physical bicycle with a number of peripheral devices, a wireless EMG sensor set and a computer that provides visual feedback. The cyclists pedaled at constant speed, and the EMG signals of the lower limb muscles, velocity and time were recorded. Once fatigue occurs, the cycling speed shows a larger deviation in velocity, and a reference was used to judge the cycling stability. This method can be applied on a bicycle ergometer to monitor the real-time onset and activities of lower limb muscle fatigue. Kinesiological and kinematical data are measured using this system [21].
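Computing the root mean square over gait cycles, as mentioned above, can be sketched as follows. The example assumes that the cycle boundaries are already known as index pairs; how the cycles were segmented in the cited study is not described here, and the sample values are invented.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_per_cycle(signal, cycle_bounds):
    """RMS for each (start, end) index pair, with end exclusive."""
    return [rms(signal[start:end]) for start, end in cycle_bounds]


# Invented samples and cycle boundaries, for illustration only.
signal = [0.0, 3.0, -3.0, 0.0, 1.0, -1.0, 1.0, -1.0]
bounds = [(0, 4), (4, 8)]
per_cycle = rms_per_cycle(signal, bounds)
print(per_cycle)                        # one RMS value per cycle
print(sum(per_cycle) / len(per_cycle))  # average RMS across the cycles
```

Comparing the averaged value before and after exercise gives the kind of amplitude comparison that the paragraph describes.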
1.4 Lumbar Region Lumbar muscle function is considered an important factor among the physical deficiencies that cause long-term lower back trouble [22]. It is difficult to monitor the back muscles because they have several fascicles that together generate the trunk tasks [23]. The literature has demonstrated the detection of lumbar muscle fatigue in sports players. One study shows the estimation of frequency compression of the surface EMG signal during cyclical lifting. Surface EMG techniques play an important role in monitoring the normal functional interaction among active trunk muscles during exercise and specific movements. Activities such as cyclic lifting and isometric trunk extension were performed, and the paraspinal EMG signals were observed and noted. The signals were extracted from the paraspinal muscles using surface EMG electrodes. This method is called back analysis, and the signals were extracted with isometric muscle contraction. EMG signals were accurately recorded from six bilateral lumbar paraspinal regions, which demonstrated that static and dynamic tasks result in different patterns of EMG spectral changes and recorded metabolic fatigue processes [24].
2 Prevention of Injuries Observing the muscle condition of athletes during their activities is very important to prevent injury. Swimming is a sport in which the arms and legs play a vital role, creating successive movements that propel the body through the water. A system was therefore developed that is capable of measuring the stress level of a swimmer's muscles and also indicates the muscle fatigue level. This device is
designed using an EMG sensor. It consists of EMG electrodes and a microcontroller unit and is used to analyze the deltoid muscle of a swimmer, because this muscle is the most involved in the swimming activity. Once muscle movement is detected, the device triggers and gives an alert signal when the measured EMG signal exceeds the reference muscle fatigue level. By doing so, it helps to prevent injury [25]. Another application of EMG is its normalization techniques, which are used to detect alterations in the neuromuscular system. These alterations were assessed in trainers with an anterior cruciate ligament (ACL) knee injury who do heavy treadmill walks [26]. The Internet of things (IoT) is the interconnection of computational devices that are built into everyday objects, permitting them to transmit and receive data over the Internet. Recent work has found that muscle fatigue prevention can also be designed with the help of IoT. Muscle fatigue can be determined and recovered with the help of pulse modulation techniques such as pulse width modulation (PWM) together with the ESP8266. With the combination of PWM, the ESP8266 and surface electromyographic signals, muscle fatigue can be monitored and detected in real time through a wireless network. This technique consists of a power supply, an EMG transducer, an infrared transducer, an ESP8266 Wi-Fi module, a vibration motor and a motor drive module. The ESP8266 is not only a Wi-Fi adapter but also a processor that can run independently. The main function of this wireless fatigue detection system is to prevent the injuries caused by muscle fatigue during heavy training activities [27]. Table 4 gives the components of the IoT-based muscle fatigue detection system. Table 5 lists the various layers and their functions in the architecture of IoT.
Table 4 Components of IoT-based muscle fatigue detection system
Components | Functions
Infrared transducer | Detects infrared signals to find if someone is using the system
EMG transducer | Detects muscle activation through potential and transmits EMG pulse signals
ESP8266 | Acts as a sole communicator and a processor, processing monitored data from the sensors, and serves as a wireless access point
Table 5 Architecture of IoT
Layers | Functions
Perceptual bottom-level layer | Consists of the EMG transducer and IR transducer
Network middle-level layer | Processes and sends data given by the ESP8266 to the intelligent mobile terminals
Upper-level application layer | Information from the perceptual layer is analyzed and displayed
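The alert logic described for the swimmer monitoring device, where an alert is raised when the measured EMG signal exceeds a reference fatigue level, can be sketched as a windowed threshold check. The text does not say which amplitude measure the device uses, so the mean absolute value per window, the window size and the reference level below are all assumptions chosen for illustration.

```python
def mean_absolute_value(samples):
    """Mean of the rectified (absolute) samples in one window."""
    return sum(abs(s) for s in samples) / len(samples)


def fatigue_alerts(signal, window_size, reference_level):
    """Return the start index of every full window above the reference."""
    alerts = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        if mean_absolute_value(window) > reference_level:
            alerts.append(start)
    return alerts


# Made-up samples and threshold, not values from the cited device.
signal = [0.1, -0.1, 0.2, -0.2, 0.9, -1.0, 1.1, -0.9, 0.1, 0.0, -0.1, 0.1]
print(fatigue_alerts(signal, window_size=4, reference_level=0.5))  # [4]
```

Only the second window exceeds the reference level, so a single alert is reported at sample index 4. On an embedded device the same comparison would drive the alert output instead of building a list.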
3 Performance Assessing In the field of sports, surface EMG can analyze and monitor different situations, which makes it of special interest. Improvement in the efficiency of a movement is mainly determined by the economy of effort, its effectiveness and injury prevention [28]. The main goal of performance monitoring systems is to prevent overtraining of athletes, to reduce injuries caused by muscle fatigue, and to monitor training activities as well as to ensure that performance is maintained [29]. In sports, movement strategy is critical, and surface EMG is used to evaluate the activation of muscles in sports applications, which include performance, recovery and evaluating the risk parameters of injuries. The Athos wearable garment system integrates surface electromyography electrodes into the construction of compression athletic apparel. It decreases the complexity and increases the portability of EMG data collection and also delivers processed data. A portable device that clips into the apparel collects the surface EMG signal, processes it and sends it wirelessly to a client device that presents it to a trainer or coach. It monitors and provides a consistent measure of surface EMG [30]. The performance of muscles is calculated in terms of strength, that is, the ability to generate force during contraction [31]. As we live in a highly competitive world, we need a monitoring system that analyzes body functions at a high performance level, especially for athletes who train their bodies vigorously. Monitoring of the fatigue condition is therefore necessary to measure the fatigue stress level accurately and to maximize their performance [32]. Table 6 discusses the various applications of EMG sensors in sports.
4 Signal Processing
In the past few years, electromyogram signals have become increasingly important in fields such as human-machine interaction, rehabilitation devices, clinical use, biomedical applications, sports applications and many more [33]. EMG signals acquired from the muscle need advanced techniques for detection, processing, decomposition and classification. These signals are complicated because they are controlled by the nervous system and depend on the physiological and anatomical properties of the muscles. If the EMG transducer is placed on the skin surface, it collects the signals from all the motor units at a given time, which can cause the various signals to interact [34]. The EMG signals collected from the muscles using electrodes contain noise, so removing noise from the signal is an important step. The noise originates from the skin-electrode interface, from hardware sources and from other external sources. Internal noise generated by semiconductor devices also affects the signal. Noise types include motion artifact, ambient noise, ECG noise, crosstalk and so on [35]. The noise in EMG signals may be
Table 6 Application of EMG sensor in various sports fields
Sports | Muscles | Methods/Parameters | Findings
Wheelchair basketball | Trunk muscles | Wearable EMG device | Provides a study on the use of EMG wearable sensors in sports persons with disability
Body builders | Biceps brachii muscles | 1D spectral analysis, Sun SPOT (small programmable object technology), wearable surface EMG with goniometer | Found 90.37% accuracy in monitoring and detecting muscle fatigue
Students who completed sport science | Lower extremity muscles, like rectus femoris, tibialis anterior muscles, lateral hamstrings, gastrocnemius muscles | Y balance test evaluates dynamic balance and to standardize SEBT (star excursion balance test). Paired T-test to find relationship between postural control and muscle fatigue in lower extremity muscles | Dynamic balance is evaluated accurately. The paired T-test shows the activity level of lower extremity muscles which changes after and before fatigue
Soccer | Rectus femoris, tibialis anterior muscles, gastrocnemius muscles, biceps femoris | Custom written software to compute RMS | EMG activities in lower limb muscles were reduced after a soccer match play
Cyclical lifting | Paraspinal muscles | Time–frequency analysis | Static and dynamic tasks from different pattern of EMG spectrum changes
Swimming | Deltoid muscles | EMG device with Arduino Uno Rev3, Bluetooth and EMG sensor | Measures muscle stress level and indicates muscle fatigue level in athletes
of high or low frequency. The direct current offsets of the amplifiers produce low-frequency noise, which can be removed using high-pass filters, whereas nerve conduction produces high-frequency noise. High-frequency interference also comes from radio broadcasts and computers and can be removed using a low-pass filter [36]. Because only a specific band of frequencies should be passed on in EMG processing, both the low and the high frequencies need to be removed. This is achieved with a band-pass filter, which suits EMG signals well because it passes only the band defined by the range that a trainer fixes [37]. EMG signal processing involves three procedures: filtration, rectification and smoothing. Advanced signal processing methods are used in muscle fatigue detection systems. The surface EMG signal processing methods suitable for muscle fatigue evaluation and detection are listed below.
1. Time domain methods: estimation of surface EMG amplitude, zero crossing rate of the signal, spike analysis.
2. Frequency domain methods: Fourier-based spectral analysis, parametric-based spectral analysis.
3. Combined analysis of the spectrum and amplitude of the EMG signal, which indicates the fatigue and the force involved.
4. Time-frequency and time-scale methods: general time–frequency representations (also known as the Cohen class), short-time Fourier transform and spectrogram, Wigner distribution, time-varying autoregressive approach, wavelets, Choi-Williams distribution.
5. Spectral shape indicators and other mathematical methods such as the frequency band method, logarithmic power–frequency representation, fractal analysis, recurrence quantification analysis, Hilbert-Huang transform [38].
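Before any of the methods listed above are applied, the filtration, rectification and smoothing steps described earlier in this section can be sketched as follows. This is only a minimal illustration and not the pipeline of any cited system: the first-order filters, the 20 Hz and 450 Hz cut-offs, the 1000 Hz sampling rate, the synthetic test signal and all function names are assumptions chosen for the example.

```python
import math

def high_pass(x, fs, fc):
    """First-order RC high-pass: attenuates content below fc (Hz), e.g. a DC offset."""
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    a = rc / (rc + dt)
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y

def low_pass(x, fs, fc):
    """First-order RC low-pass: attenuates content above fc (Hz)."""
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    a = dt / (rc + dt)
    y = [0.0] * len(x)
    y[0] = a * x[0]
    for n in range(1, len(x)):
        y[n] = y[n - 1] + a * (x[n] - y[n - 1])
    return y

def band_pass(x, fs, low_fc=20.0, high_fc=450.0):
    """Band-pass built by cascading the high-pass and low-pass stages (assumed cut-offs)."""
    return low_pass(high_pass(x, fs, low_fc), fs, high_fc)

def rectify(x):
    """Full-wave rectification: take the absolute value of every sample."""
    return [abs(v) for v in x]

def smooth(x, window):
    """Trailing moving average over at most `window` samples."""
    out = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= window:
            acc -= x[i - window]
        out.append(acc / min(i + 1, window))
    return out

def emg_envelope(raw, fs=1000.0, window=50):
    """Filtration, then rectification, then smoothing."""
    return smooth(rectify(band_pass(raw, fs)), window)

# Synthetic stand-in for an sEMG recording (assumption): a 100 Hz tone on a 0.5 DC offset.
fs = 1000.0
raw = [0.5 + math.sin(2 * math.pi * 100 * n / fs) for n in range(2000)]
envelope = emg_envelope(raw, fs)
```

The high-pass stage removes the constant offset, so the envelope reflects only the oscillating component. A practical system would more likely use higher-order filters from a signal processing library.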
Many sports require heavy physical training, and the vigorous workouts that athletes undergo lead to muscle fatigue and sometimes cause injuries [39]. With these advanced signal processing techniques, muscle fatigue can be detected and analyzed, which helps to prevent the injuries it causes.
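To make the time-domain and frequency-domain indicators mentioned above more concrete, the following sketch computes the RMS amplitude, the zero-crossing rate and the median frequency of a signal. It is an illustration under assumptions, not a method taken from the cited works: the naive DFT, the function names and the two synthetic signals are invented for the example. A shift of the median frequency toward lower values is a commonly used fatigue indicator, which the two signals merely imitate.

```python
import cmath
import math

def rms(x):
    """Root-mean-square amplitude (time domain)."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ (time domain)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return crossings / (len(x) - 1)

def power_spectrum(x, fs):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (adequate for short windows)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2)
    return freqs, power

def median_frequency(x, fs):
    """Lowest frequency at which the cumulative power reaches half of the total."""
    freqs, power = power_spectrum(x, fs)
    half = sum(power) / 2.0
    acc = 0.0
    for f, p in zip(freqs, power):
        acc += p
        if acc >= half:
            return f
    return freqs[-1]

# Two synthetic one-second windows (assumption): the "fatigued" one has its
# dominant component at the lower frequency.
fs = 200.0
t = [n / fs for n in range(200)]
fresh = [math.sin(2 * math.pi * 80 * s) + 0.5 * math.sin(2 * math.pi * 40 * s) for s in t]
fatigued = [math.sin(2 * math.pi * 40 * s) + 0.5 * math.sin(2 * math.pi * 80 * s) for s in t]

mf_fresh = median_frequency(fresh, fs)
mf_fatigued = median_frequency(fatigued, fs)
```

In this toy case the median frequency of the second window is lower than that of the first, which is the direction of change that fatigue studies typically look for.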
5 Discussion
This paper focuses on the role of surface EMG sensors and their contribution to monitoring and detecting muscle fatigue in different body parts, such as the lower limb, upper limb and lumbar region, during sports activities. The study also analyzes the associated signal processing methods. The analysis shows that only prototypes and samples built under certain conditions exist so far. The challenges faced with surface EMG are: (1) the signal received from the surface EMG must be accurate, because any noise that gets mixed in may lead to wrong interpretation; (2) the wearables are portable and battery powered, so the operating hours should be long enough to avoid repeated battery replacement; (3) users are concerned about their data privacy, so data security must be ensured during transfer; (4) if the electrodes are displaced on the muscles, the spatial relationship cannot be maintained, which affects the amplitude of the signal; (5) the variation between surface EMG and the power loss is higher before and after the activity, so EMG models might not give proper values of muscle fatigue after intense training. Sufficiently advanced technologies for evaluating muscle fatigue are not yet available. In future work, researchers can concentrate on adopting efficient machine learning and artificial intelligence technologies with secured IoT data transfer to give instant updates about the strain encountered in the muscles.
6 Conclusion
The aim of this study was to analyze the surface EMG techniques used to monitor muscle fatigue in different sports activities. The paper also reviewed topics such as injury prevention in athletes with the use of surface EMG, performance monitoring based on muscle activity and the associated signal processing techniques. EMG signals can be transmitted over the Internet for further analysis, and cloud infrastructure provides storage and processing resources to support EMG monitoring systems. Researchers should focus on making wearable devices available on the market that reliably monitor the signals and derive valuable information in real time. Efficient machine learning algorithms can be introduced to classify the signals based on the activity. In future, technologically advanced and compact muscle fatigue detection systems based on surface EMG can be implemented.
References
1. Nithya N, Nallavan G (2021) Role of wearables in sports based on activity recognition and biometric parameters: a survey. In: International conference on artificial intelligence and smart systems (ICAIS), pp 1700–1705
2. Chaudhari S, Saxena A, Rajendran S, Srividya P (2020) Sensors to monitor the muscular activity: a survey. Int J Sci Res Eng Manage (IJSREM) 4(3):1–11
3. Yousif H, Ammar Z, Norasmadi AR, Salleh A, Mustafa M, Alfaran K, Kamarudin K, Syed Z Syed Muhammad M, Hasan A, Hussain K (2019) Assessment of muscle fatigue based on surface EMG signals using machine learning and statistical approaches: a review. In: IOP conference series: materials science and engineering, pp 1–8
4. Adam DEEB, Sathesh P (2021) Survey on medical imaging of electrical impedance tomography (EIT) by variable current pattern methods. J IoT Soc Mob Anal Cloud 3(2):82–95
5. Liu SH, Lin CB, Chen Y, Chen W, Hsu CY (2019) An EMG patch for real-time monitoring of muscle-fatigue conditions during exercise. Sensors (Basel) 1–15
6. Taborri J, Keogh J, Kos A, Santuz A, Umek A, Urbanczyk C, Kruk E, Rossi S (2020) Sport biomechanics applications using inertial, force, and EMG sensors: a literature overview. Appl Bionics Biomech 1–18
7. Fernandez-Lazaro D, Mielgo-Ayuso J, Adams DP, Gonzalez-Bernal JJ, Fernández Araque A (2020) Electromyography: a simple and accessible tool to assess physical performance and health during hypoxia training. Syst Rev Sustain 12(21):1–16
8. Worsey MTO, Jones BS, Cervantes A, Chauvet SP, Thiel DV, Espinosa HG (2020) Assessment of head impacts and muscle activity in soccer using a T3 inertial sensor and a portable electromyography (EMG) system: a preliminary study. Electronics 9(5):1–15
9. Gonzalez-Izal M, Malanda A, Gorostiaga E, Izquierdo M (2012) Electromyographic models to assess muscle fatigue. J Electromyogr Kinesiol 501–512
10. Boyas S, Guevel A (2011) Neuromuscular fatigue in healthy muscle: underlying factors and adaptation mechanisms. Ann Phys Rehabil Med 88–108
11. Al-Mulla MR, Sepulveda F, Colley M (2012) Techniques to detect and predict localised muscle fatigue 157–186
12. Rum L, Sten O, Vendrame E, Belluscio V, Camomilla V, Vannozzi G, Truppa L, Notarantonio M, Sciarra T, Lazich A, Manniini A, Bergamini E (2021) Wearable sensors in sports for persons with disability. Sensors (Basel) 1–25
13. Chang KM, Liu SH, Wu XH (2012) A wireless sEMG recording system and its application to muscle fatigue detection. Sensors (Basel) 489–499
14. Al-Mulla MR, Sepulveda F, Colley M (2011) An autonomous wearable system for predicting and detecting localised muscle fatigue. Sensors (Basel) 1542–1557
15. Ming D, Wang X, Xu R, Qiu S, Zhao Xin X, Qi H, Zhou P, Zhang L, Wan B (2014) SEMG feature analysis on forearm muscle fatigue during isometric contractions 139–143
16. Cahyadi BN, Khairunizam W, Zunaidi I, Lee Hui L, Shahriman AB, Zuradzman MR, Mustafa WA, Noriman NZ (2019) Muscle fatigue detection during arm movement using EMG signal. In: IOP conference series: materials science and engineering, pp 1–6
17. Angelova S, Ribagin S, Raikova R, Veneva I (2018) Power frequency spectrum analysis of surface EMG signals of upper limb muscles during elbow flexion: a comparison between healthy subjects and stroke survivors. J Electromyogr Kinesiol 1–29
18. Filipa A, Byrnes R, Paterno MV, Myer GD, Hewett TE (2010) Neuromuscular training improves performance on the star excursion balance test in young female athletes. J Orthop Sports Phys Ther 551–558
19. Fatahi M, Ghesemi GHA, Mongasthi Joni Y, Zolaktaf V, Fatahi M (2016) The effect of lower extremity muscle fatigue on dynamic postural control analysed by electromyography. Phys Treatments 6(1):37–50
20. Rahnama N, Lees A, Reilly T (2006) Electromyography of selected lower-limb muscles fatigued by exercise at the intensity of soccer match-play. J Electromyogr Kinesiol 16(3):257–263
21. Chen SW, Liaw JW, Chan HL, Chang YJ, Ku CH (2014) A real-time fatigue monitoring and analysis system for lower extremity muscles with cycling movement. Sensors (Basel) 14(7):12410–12424
22. Elfving B, Dedering A, Nemeth G (2003) Lumbar muscle fatigue and recovery in patients with long-term low-back trouble: electromyography and health-related factors. Clin Biomech (Bristol, Avon) 18(7):619–630
23. Coorevits P, Danneels L, Cambier D, Ramon H, Vandeerstraeten G (2008) Assessment of the validity of the Biering-Sorensen test for measuring back muscle fatigue based on EMG median frequency characteristics of back and hip muscles. J Electromyogr Kinesiol 18(6):997–1005
24. Roy SH, Bonato P, Knaflitz M (1998) EMG assessment of back muscles during cyclical lifting. J Electromyogr Kinesiol 8(4):233–245
25. Helmi M, Ping C, Ishak N, Saad M, Mokthar A (2017) Assessment of muscle fatigue using electromyographm sensing. In: AIP conference proceedings, pp 1–8
26. Benoit DL, Lamontage M, Cerulli G, Liti A (2003) The clinical significance of electromyography normalisation techniques in subjects with anterior cruciate ligament injury during treadmill walking. Gait Posture 18(2):56–63
27. Yousif HA, Zakaria A, Rahim NA, Salleh AF, Mahmood M, Alfran KA, Kamarudin L, Mamduh SM, Hsan A, Hussain MK (2019) Assessment of muscle fatigue based on surface EMG signal using machine learning and statistical approaches: a review. In: IOP conference series: materials science and engineering, pp 1–8
28. Masso N, Rey F, Remero D, Gual G (2010) Surface electromyography application in the sport. Apunts Med Esport 45(165):121–130
29. Taylor KL, Chapman D, Cronin J, Newton M, Gill N (2012) Fatigue monitoring in high performance sport: a survey of current trends. J Aust Strength Conditioning 12–23
30. Lynn SK, Watkins CM, Wong MA, Balfany K, Feeney DF (2018) Validity and reliability of surface electromyography measurements from a wearable athlete performance system. J Sports Sci Med 17(2):205–215
31. Kuthe C, Uddanwadiker R, Ramteke A (2018) Surface electromyography based method for computing muscle strength and fatigue of biceps brachii muscle and its clinical implementation. Inf Med Unlocked 34–43
32. Austruy P (2016) Neuromuscular fatigue in contact sports: theories and reality of a high performance environment. J Sports Med Doping Stud 6(4):1–5
33. Chowdhury RH, Reaz RH, Ali MA, Bakar AA, Chellapan K, Chang TG (2013) Surface electromyography signal processing and classification techniques. Sensors (Basel) 13(9):12431–12466
34. Raez MB, Hussain MS, Mohd-Yasin F (2006) Techniques of EMG signal analysis: detection, processing, classification and application. Biol Proced Online 11–35
35. Shair EF, Ahmad S, Marhaban MH, Tamrin SM, Abdullah AR (2017) EMG processing based measures of fatigue assessment during manual lifting. BioMed Res Int 1–12
36. Senthil Kumar S, Bharath Knnan M, Sankaranarayanan S, Venkatakrishnan A (2013) Human hand prosthesis on surface EMG signals for lower arm amputees. Int J Emerg Technol Adv Eng 3(4):199–203
37. De Luca CJ, Gilmore LD, Kuznetsov M, Roy SH (2010) Filtering the surface EMG signal: movement artifacts and baseline noise contamination. J Biomech 43(8):1573–1579
38. Cifrek M, Medved V, Tonkovic S, Ostojic S (2009) Surface EMG based muscle fatigue evaluation in biomechanics 24(4):327–340
39. Ahmad Z, Jamaudin MN, Asari MA, Omar A (2017) Detection of localised muscle fatigue by using wireless surface electromyogram (sEMG) and heart rate in sports. Int Med Devices Technol Conf 215–218
Detection of IoT Botnet Using Recurrent Neural Network
P. Tulasi Ratnakar, N. Uday Vishal, P. Sai Siddharth, and S. Saravanan
Abstract The Internet of Things (IoT) is one of the most widely used technologies today, and the number of DDoS attacks generated using IoT devices has risen accordingly. Conventional anomaly detection methods, such as signature-based and flow-based methods, cannot be used to detect IoT anomalies because the user interface of IoT devices is inadequate or of little help. This paper proposes a solution for detecting botnet activity within IoT devices and networks. Deep learning is currently a prominent technique for detecting attacks on the Internet. Hence, we developed a botnet detection model based on a bidirectional gated recurrent unit (BGRU). The developed BGRU detection model is compared with a gated recurrent unit (GRU) model for detecting four attack vectors, Mirai, UDP, ACK, and DNS, generated by the Mirai malware botnet, and both are evaluated for loss and accuracy. The dataset used for the evaluation is traffic data created by performing a Mirai malware attack on a target server using a C&C server and a scan server.
Keywords Internet of Things (IoT) · Botnet · Gated recurrent unit (GRU) · Bidirectional gated recurrent unit (BGRU) · Deep learning
1 Introduction
1.1 Internet of Things (IoT)
The Internet of Things (IoT) is a recent communication paradigm [1]. In the near future, everyday objects are expected to be equipped with microcontrollers, microprocessors for virtual communication [2], and suitable protocol stacks so that they can communicate with one another and with users and become an integral part of the Internet.
P. Tulasi Ratnakar · N. Uday Vishal · P. Sai Siddharth · S. Saravanan (B) Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_63
1.2 Distributed Denial of Service (DDoS) Attacks
DoS and DDoS attacks have become widespread and pose substantial hazards to network security and to the efficiency of online services. Because machines are interconnected through the World Wide Web, they are convenient targets for denial of service (DoS) attacks [2]. A DoS attack attempts to prevent legitimate users from accessing a computer or network resource by disrupting or permanently stopping the service of an Internet host. A distributed DoS (DDoS) attack occurs when several hosts work together to flood a victim with an excess of attack packets, so that the attack originates from several locations at the same time.
1.3 Botnet
A bot is a computer program that carries out complex tasks. Bots are automated: they can run according to their own set of instructions without assistance from a human user. Bots often try to imitate or replace human activities [3]. Botnets are networks of computers used to steal information, send phishing emails, perform distributed denial of service attacks, and allow a hacker to access and extract information from a particular system. Botnet detection refers to the various techniques used to identify botnets of IoT devices [4]. Botmasters send commands to all the bots in a network using command and control (C&C) tools. With the increased use and deployment of IoT devices in recent years, multiple DoS and DDoS attacks have emerged as threats. These attacks take place at different levels of the IoT network protocol stack, whose layers are the physical, MAC, 6LoWPAN, network, and device layers [5]. On October 21, 2016, a DDoS attack against the DNS provider Dyn became the largest DDoS attack recorded up to that time. Linux.Mirai created a massive botnet (a network of infected devices) by infecting millions of connected devices, including webcams, routers, and digital video recorders. The attack reached a rate of approximately 1.1 terabits per second (Tbps).
1.4 Deep Learning
Deep learning is a field of machine learning designed to model high-level abstractions in data, with the objective of moving closer to artificial intelligence. It replaces typical handcrafted features with unsupervised or semisupervised feature learning algorithms and hierarchical feature extraction. Deep learning algorithms use multiple layers to extract progressively higher-level features from raw data. In image processing, for instance, lower layers identify edges, whereas higher layers identify concepts such as humans or animals, digits, letters, or faces [6]. The main benefits of deep learning are that it leverages unstructured data, achieves higher quality results, reduces costs, and eliminates the need for data labeling.
1.5 Role of Deep Learning in Detecting Botnet
Deep learning is crucial in networking and cyber security because networks are prone to security risks including IP spoofing, replay attacks, SYN flooding, and jamming, as well as resource restrictions such as running out of memory, insecure software, etc. [7]. The self-learning capability of deep learning has improved accuracy and processing speed, allowing it to be used effectively to detect Mirai botnet attacks in IoT devices. This paper proposes a way of detecting botnet activity among IoT devices and networks. Detection models are created using recurrent neural networks (RNN), specifically the gated recurrent unit (GRU) and its bidirectional variant (BGRU). Detection is performed at the packet level, with an emphasis on text recognition within features rather than on flow-based approaches. A method called word embedding is used for text recognition and conversion. The BGRU-based detection model is then compared with the GRU-based detection model on the evaluation metrics of accuracy and loss.
The main contributions of this paper are:
• To develop GRU and BGRU recurrent neural network (RNN)-based botnet detection models.
• To compare the performance of the GRU and BGRU models with respect to loss and accuracy.
The rest of the paper is organized as follows: Section 2 deals with the related work. Section 3 outlines the design of the system used to develop the GRU and BGRU detection models. Section 4 explains the implementation of the GRU and BGRU models in detail. Section 5 presents results comparing the accuracy and loss of the GRU and BGRU models. Section 6 concludes the paper and makes recommendations for future studies.
2 Related Work
Torres et al. [8] analyzed the viability of recurrent neural networks for modeling network traffic as a sequence of states that change over time. The recent success of RNNs on data sequence problems makes them a viable candidate for sequence analysis. The performance of the RNN is evaluated with respect to two important issues, optimal sequence length and network traffic imbalance, both of which can have a real impact on implementation. The evaluation is performed by means of stratified k-fold cross-validation, followed by a separate test on previously unseen traffic from another botnet. The RNN model achieved an accuracy of 99.9% on unseen traffic.
Sriram et al. [9] proposed a botnet detection system based on deep learning (DL) that works with network flows. Their paper compares and analyzes the performance of machine learning models and deep neural network (DNN) models for P2P botnet detection on various datasets. They employ the t-distributed stochastic neighbor embedding (t-SNE) visualization technique to understand the characteristics of the datasets used in the study. On the DS-1 V3 dataset, their DNN model achieved 100% accuracy. A recently introduced DNN approach is used to detect malware efficiently. A key strength of DNN methods is their ability to achieve a high detection rate while generating a low false positive rate.
Ahmed et al. [10] proposed a strategy for identifying botnet attacks that relies on a deep learning ANN. The developed model is compared with other machine learning techniques. The performance of the ANN model is evaluated for different numbers of neurons in the hidden layers: the accuracy is 95% for six neurons, 96% for eight neurons, and 96.25% for ten neurons.
Yerima et al. [11] proposed a deep learning approach based on convolutional neural networks (CNN) to detect Android botnets. The proposed system implements a CNN model that distinguishes between botnet applications and normal applications using 342 static app features. The trained model is evaluated on a set of 6802 real apps, 1929 of which are botnet apps from the ISCX botnet dataset. The results are examined for different filter settings. The best result, an accuracy of 98.9%, is achieved with 32 filters.
IoT devices are now widely used to form botnets. McDermott et al. [12] therefore proposed a solution for detecting IoT-based botnet attack packets using deep learning algorithms, namely long short-term memory (LSTM) and bidirectional long short-term memory (BLSTM). They used word embedding to map text data to vectors of real numbers. As LSTM is a recurrent neural network, it stores past data in order to predict future results. To retain this memory, it uses three gates, the forget gate, input gate, and output gate, whereas bidirectional LSTM uses these gates to store both past and future context. Both LSTM and BLSTM achieved an accuracy of 0.97.
Glitches in collected data can be detected by comparing the data received from lower-level fog network devices with the expected data. The glitches that affect performance might take the form of a single data point, a set of data points, or even data from sensors of the same type or from many different components. Shakya et al. [13] proposed a deep learning approach that learns from the expected data to identify such glitches. The proposed model achieved an accuracy close to 1.0.
IoT devices are exposed to network attacks because they are connected to the network so that accumulated data can be analyzed via the Internet. To detect IoT attacks, it is necessary to develop a security solution that takes into account the characteristics of various types of IoT devices. Developing a custom-designed security solution for every sort of IoT device is, however, a challenge, and traditional rule-based detection techniques would generate a large number of false alarms. Hence, Kim et al. [14] proposed a deep learning-based model using LSTM and a recurrent neural network (RNN) for detecting IoT-based attacks. The N-BaIoT dataset is used to train this model. In detecting BASHLITE scan botnet data, LSTM achieved the highest accuracy of 0.99.
A massively connected environment such as the Internet of Things (IoT) generates a tremendous amount of network traffic, and detecting malicious traffic in such a large volume takes a long time. Detection time can be decreased considerably if detection is done at the packet level. Hwang et al. [15] proposed a word embedding technique to extract the semantic value of a data packet and used LSTM to learn the temporal relationship between the fields in the packet header and determine whether an incoming packet belongs to a normal flow or a malicious flow. The model was trained on four datasets: ISCX-IDS-2012, USTC-TFC2016, Mirai-RGU, and Mirai-CCU. The highest accuracy, 0.9999, was achieved on the ISCX-IDS-2012 dataset.
The proliferation of IoT devices has attracted hackers. Detecting IoT traffic anomalies is necessary to mitigate these attacks and protect the services provided by smart devices. Anomaly detection systems that are not scalable fail when dealing with the large amounts of data generated by IoT devices. Hence, to achieve scalability, Bhuvaneswari Amma et al. [16] proposed an anomaly detection framework for IoT using a vector convolutional deep learning (VCDL) approach. The proposed framework includes device, fog, and cloud layers.
As the network traffic is sent to the fog layer nodes for processing, this anomaly detection system is scalable. The framework has a precision of 0.9971.
The dependability of IoT-connected devices depends on the security model employed to safeguard user data and prevent devices from participating in malicious activity. Many DDoS and botnet attacks are identified using technologies that target devices or network backends. Parra et al. [17] proposed a cloud-based distributed deep learning framework to detect and defend against botnet and phishing attacks. The model contains two security mechanisms that work in tandem: (i) a distributed convolutional neural network (DCNN) model embedded in a micro-security plug-in on IoT devices to detect application-level phishing and DDoS attacks and (ii) a temporal long short-term memory (LSTM) network model hosted in the cloud that detects botnet attacks and ingests the CNN embeddings. The CNN component achieved an accuracy of 0.9430, whereas the LSTM component achieved an accuracy of 0.9784.
Many of the botnet identification methods in the above-mentioned works employ deep learning algorithms. According to [18], deep learning algorithms perform better than basic machine learning algorithms. As our detection is performed at the packet level, most of the packet information is present as a sequential pattern in the Info feature. A recurrent neural network is more efficient than an ordinary artificial neural network on sequential data [19]. We developed a gated recurrent unit (GRU)-based botnet detection model, which runs faster than LSTM [20] because it has fewer training parameters, and used the word embedding technique to map text data to vectors of real numbers.
3 System Architecture
This section provides the blueprint for developing the GRU- and BGRU-based recurrent neural networks (RNN) for detecting IoT-based botnet attack vectors. The architecture is illustrated in Fig. 1.
3.1 Feature Selection
The network traffic dataset contains the following features: (1) No., (2) Time, (3) Source, (4) Destination, (5) Protocol, (6) Length, (7) Info, and (8) Label. Some features may not affect classification performance or may even make the results worse, so they need to be removed. We therefore selected the Protocol, Length, and Info features from the dataset [10], with Label as the target feature.
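The selection step described above can be sketched as follows, assuming the capture has been exported as CSV with the eight column names listed in this section. The two sample rows, the label strings and the function name are invented for illustration and do not come from the dataset used in the paper.

```python
import csv
import io

# Column names follow the feature list in Sect. 3.1; everything else is hypothetical.
SELECTED = ["Protocol", "Length", "Info"]
TARGET = "Label"

def select_features(csv_text):
    """Keep only the selected input columns and return the target column separately."""
    rows = []
    labels = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({name: record[name] for name in SELECTED})
        labels.append(record[TARGET])
    return rows, labels

# Invented sample rows, used only to exercise the function.
sample = (
    "No.,Time,Source,Destination,Protocol,Length,Info,Label\n"
    "1,0.000,10.0.0.2,10.0.0.9,TCP,60,[ACK] Seq=1 Ack=1 Win=512,ack\n"
    "2,0.001,10.0.0.3,10.0.0.9,DNS,74,Standard query A example.com,normal\n"
)
features, labels = select_features(sample)
```

All values are still strings at this point; converting them to numbers is the job of the word embedding step.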
Fig. 1 System architecture
3.2 Word Embedding
Computers, scripts, and deep learning models cannot read and understand text in any human sense, so text data must be represented numerically. The numerical values should capture as much of a word's linguistic meaning as possible. Choosing an informative input representation can have a significant impact on model performance, and word embeddings are the most commonly used technique for this purpose. Hence, after feature selection, all of the text characters in the network traffic data must be converted to vector or real-number format. We adopted word embedding to transform the text characters in the Info field into real-number format.
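As a rough illustration of the idea, the sketch below maps the tokens of an Info string to integer indices and then looks up one dense vector per index. It rests on several assumptions: whitespace tokenization, index 0 shared by padding and unknown tokens, an 8-dimensional table and random initial values are all choices made for the example. In a trained model the table entries would be learned rather than random.

```python
import random

def build_vocabulary(texts):
    """Give each distinct whitespace-separated token an index starting at 1."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def encode(text, vocab):
    """Replace each token by its index; unknown tokens fall back to index 0."""
    return [vocab.get(token, 0) for token in text.lower().split()]

def make_embedding_table(vocab_size, dim, seed=0):
    """One small random vector per index (row 0 is for padding/unknown tokens)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size + 1)]

def embed(indices, table):
    """Look up the dense vector for every index in the sequence."""
    return [table[i] for i in indices]

# Invented example strings in the style of an Info field.
vocab = build_vocabulary(["Standard query A", "Standard query response"])
indices = encode("Standard query response", vocab)
table = make_embedding_table(len(vocab), dim=8)
vectors = embed(indices, table)
```

The order of the indices is preserved, which matters because the recurrent models consume the sequence token by token.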
3.3 Building GRU and BGRU Models
Gated recurrent units simplify the training of a model by improving the memory capacity of recurrent neural networks, and they mitigate the vanishing gradient problem of such networks. Among other uses, they can be applied to speech signal modeling, machine translation, and handwriting recognition. Given these advantages, we employed GRU for botnet detection. Once feature selection and word embedding are complete, the data are split into training data and test data. The GRU and BGRU models are built and trained on the training data. The trained models are then tested on the test data, and the required metrics are evaluated to compare the models.
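For readers unfamiliar with the gating mechanism, the following sketch shows a single GRU step and a bidirectional pass in plain Python. It is not the model built in this paper: the weights are random and untrained, the dimensions are tiny, and the update convention h' = (1 - z) * h + z * candidate is one of two equivalent conventions found in the literature. All names are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]

def init_params(input_dim, hidden_dim, seed):
    """Random, untrained weights for the update (z), reset (r) and candidate (h) parts."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_dim, input_dim)
        params["U" + gate] = mat(hidden_dim, hidden_dim)
        params["b" + gate] = [0.0] * hidden_dim
    return params

def gru_step(p, x, h):
    """One GRU time step: gates decide how much of the old state to keep."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(p, sequence, hidden_dim):
    """Process the sequence from first to last element and return the final state."""
    h = [0.0] * hidden_dim
    for x in sequence:
        h = gru_step(p, x, h)
    return h

def run_bgru(p_fwd, p_bwd, sequence, hidden_dim):
    """Bidirectional pass: one GRU reads forward, another reads the reversed sequence."""
    forward = run_gru(p_fwd, sequence, hidden_dim)
    backward = run_gru(p_bwd, list(reversed(sequence)), hidden_dim)
    return forward + backward  # concatenated final states

# Tiny invented sequence of 2-dimensional inputs.
seq = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.9]]
p_fwd = init_params(2, 3, seed=1)
p_bwd = init_params(2, 3, seed=2)
h_gru = run_gru(p_fwd, seq, 3)
h_bgru = run_bgru(p_fwd, p_bwd, seq, 3)
```

The bidirectional variant produces a state twice as long because it concatenates what was learned reading the packet text in both directions.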
4 Implementation
The developed models use GRU and BGRU recurrent neural networks, together with word embedding, which converts the string data found in the captured packets into a form that can be used as GRU and BGRU input.
4.1 Dataset and Algorithm The dataset [21] used in this work includes both normal network traffic and botnet attack network traffic. Its features are No., Time, Source, Destination, Protocol, Length, Info, and Label. No., Time, Source, and Destination are omitted, as they are not useful for data processing. The Info feature contains most of the captured information. Algorithm 1 shows the detailed steps of our implementation.
P. Tulasi Ratnakar et al.
Algorithm 1: Algorithm for Detecting Botnet
1: Read the training dataset
2: Extract length, protocol, info, label features from dataset
3: Set vocabulary size ← 50000
4: repeat
5:   for row ← 1, rows do
6:     Convert text data into tokenized integer format through hashing (hash values range from 0 to 49999 as vocabulary size is set to 50000)
7:     Pad data arrays with 0s to max 35
8:   end for
9: until return training dataset
10: set model ← sequential()
11: add 3 GRU hidden layers and 3 BGRU hidden layers with each layer of size 50 units (neurons) to the model
12: add dense layer i.e. output layer with activation function that is sigmoid to the model
13: compile the model by setting the following parameters:
14: optimizer ← adam, loss ← categorical_crossentropy, metrics ← accuracy
15: Fit the model to the training data by dividing 10% of the data as validation data to check for overfitting.
16: run the model for 50 epochs
17: after running all the epochs return loss, validation loss, accuracy, validation accuracy
18: Read the test data and perform the steps 1 to 9
19: Predict the results of test data using the trained model and evaluate the accuracy
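Steps 3 to 7 of Algorithm 1 (hashing text into integers below the vocabulary size, then zero-padding to length 35) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the whitespace tokenizer, the MD5-based hash, the padding at the end of the sequence, and the names `hash_token` and `encode` are all hypothetical choices, since the paper does not specify them.

```python
import hashlib

VOCAB_SIZE = 50000  # vocabulary size from Algorithm 1
MAX_LEN = 35        # padded length from Algorithm 1


def hash_token(token: str, vocab_size: int = VOCAB_SIZE) -> int:
    """Map a token to a stable integer in [0, vocab_size - 1]."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % vocab_size


def encode(text: str, max_len: int = MAX_LEN) -> list:
    """Hash each whitespace-separated token, then truncate or zero-pad."""
    ids = [hash_token(tok) for tok in text.lower().split()]
    ids = ids[:max_len]                       # truncate long rows
    return ids + [0] * (max_len - len(ids))   # pad short rows with 0s


row = encode("GET /index.html HTTP/1.1")
print(len(row))  # every row has the same length
```

Because hashing needs no stored vocabulary, the same function can be applied unchanged to the test data in step 18; the trade-off is that distinct tokens may occasionally collide on the same integer.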
4.2 Feature Selection As explained in 4.1, No., Time, Source, Destination are not useful; hence, we omitted them. The remaining features Protocol, Length, and Info are selected for further processing.
4.3 Word Embedding The data in our dataset's Info feature follows a sequential pattern. Hence, we built our solution by converting each letter into a token and storing it in binary format. A vocabulary dictionary of all tokenized words is produced, and each word in the Info column is replaced by its index in that dictionary. To understand each type of attack, the order of the indices in a series must be maintained, and hence, an array of the indices is generated. Since the protocol and length of the captured packet are related to each attack, the protocol and length features are both included in the array generated previously. Word embedding is also used to generate a dictionary of tokenized protocols together with their indices. The length features, as well as the tokenized protocols, are added to the array. The target feature is converted from string to integer to classify each type of captured packet. We used one hot
function to transform strings into indexes while simultaneously creating a 2D list and a dictionary. Finally, because deep neural networks require arrays of equal length, we find the maximum length of the text in the Info feature and use the pad_sequences function to pad all arrays to a common length of 35. The resulting arrays are converted into 3D NumPy arrays, as required by the GRU layer.
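The dictionary-based indexing described in this section, for the Protocol feature and the Label target, can be illustrated with a small sketch. The helper `build_index` and the sample values are hypothetical and only show the idea of mapping each distinct string to an integer; they are not the authors' code.

```python
def build_index(values):
    """Map each distinct value to a 1-based index, in order of first appearance."""
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(index) + 1
    return index


# Illustrative sample values (assumed, not taken from the dataset).
protocols = ["TCP", "UDP", "TCP", "DNS"]
labels = ["normal", "attack", "attack", "normal"]

protocol_index = build_index(protocols)
label_index = build_index(labels)

encoded_protocols = [protocol_index[p] for p in protocols]
# Shift labels to start at 0 so they can be used as class ids.
encoded_labels = [label_index[label] - 1 for label in labels]

print(protocol_index)
print(encoded_protocols, encoded_labels)
```

Index 0 is left unused for protocols here so that it can stay reserved for padding.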
4.4 Building GRU and BGRU Models After feature selection and word embedding, the data is split into train and test data. The IoT-based botnet detection models are built using GRU and BGRU and trained on the train data. The detection model incorporates an output layer with sigmoid activation. The models are trained for 50 epochs using the categorical cross-entropy loss function and the Adam optimizer. We evaluated metrics such as loss, accuracy, validation loss, and validation accuracy, and compared the results of GRU and BGRU to assess their efficiency.
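As a reminder of what the reported accuracy and loss measure, the sketch below computes both from predicted probabilities. It assumes a two-class setting and uses binary cross-entropy as a stand-in for the categorical cross-entropy named above; the function names and the 0.5 threshold are illustrative assumptions.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.8]
print(accuracy(y_true, y_prob), binary_cross_entropy(y_true, y_prob))
```

Two models can reach the same accuracy while differing in loss, because the loss also reflects how confident each prediction is; this is the distinction the later GRU and BGRU comparison relies on.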
5 Results Six experiments assess the overall performance of the GRU and BGRU models. Python is the programming language used to build these models. We used Anaconda (a Python distribution), Keras (a Python library for building deep learning models), Scikit-learn (a Python library for data preprocessing), and Pandas and NumPy (Python libraries for working with data frames and arrays) to build the models.
5.1 Model Comparison Six experiments are conducted on each model to compare GRU and BGRU. The first four experiments each use a train dataset and a test dataset containing normal network traffic and the traffic of a single attack vector. Both models are trained using the train data and then tested using the test data. For each attack vector, evaluation metrics such as accuracy and loss are calculated. The fifth experiment uses a train dataset containing normal network traffic and multi-attack vector [Mirai, UDP, DNS, ACK] network traffic. Both models are trained using the train data and then tested using the test data, and evaluation metrics such as accuracy and loss are calculated for the multiple attack vectors. The sixth experiment uses a train dataset containing normal network traffic and multi-attack vector [excluding ACK attack] network traffic. Again, both models are trained using the train data and then tested using the test data, and the same evaluation metrics are calculated. The validation data used
Table 1 Evaluation metrics
Model  Experiment  Train accuracy  Train loss      Validation accuracy  Validation loss  Test accuracy
GRU    EXPT-1      1.0             8.0517 × 10^−6  1.0                  5.2180 × 10^−6   0.999717
GRU    EXPT-2      1.0             2.3431 × 10^−5  1.0                  1.4151 × 10^−5   0.999999
GRU    EXPT-3      1.0             2.3432 × 10^−4  1.0                  4.2185 × 10^−4   0.999981
GRU    EXPT-4      1.0             3.0320 × 10^−5  1.0                  1.8424 × 10^−5   0.999992
GRU    EXPT-5      1.0             0.0011          1.0                  5.5934 × 10^−4   1.0
GRU    EXPT-6      0.9988          0.0013          0.9990               8.3498 × 10^−4   0.999317
BGRU   EXPT-1      1.0             8.0868 × 10^−6  1.0                  1.1475 × 10^−6   0.999717
BGRU   EXPT-2      1.0             4.6347 × 10^−6  1.0                  2.9357 × 10^−6   0.999980
BGRU   EXPT-3      1.0             4.6718 × 10^−6  1.0                  2.4362 × 10^−6   0.999687
BGRU   EXPT-4      1.0             6.0714 × 10^−6  1.0                  3.8378 × 10^−6   1.0
BGRU   EXPT-5      1.0             1.3253 × 10^−4  1.0                  4.3010 × 10^−6   0.999987
BGRU   EXPT-6      1.0             6.5985 × 10^−5  1.0                  3.0802 × 10^−5   0.939178
in these experiments is 10% of the train data; it is used to check whether the model overfits. Table 1 shows the evaluation metrics for all six experiments, including accuracy, validation accuracy, test accuracy, loss, and validation loss. According to Table 1, BGRU is more efficient than GRU: the accuracy of both algorithms is almost equal, but the loss for BGRU is lower than that of GRU in nearly every case (the only exception is the training loss in EXPT-1, where the two values are almost identical). While detecting ACK attacks in conjunction with other attack vectors, the accuracy is reduced; however, the GRU model used in this paper performs commendably when predicting ACK attacks. Table 1 shows that the accuracy of the experiments that include the ACK attack vector (EXPT-3, EXPT-5) is nearly equal to 1.0. Table 1 also shows that the validation accuracy in all experiments is nearly equal to 1.0, which indicates that our model does not exhibit overfitting. Table 2 displays the number of training and testing tuples used in each of the six experiments, as well as the average time per epoch (Avgtime/Epoch). Since BGRU is bidirectional, it takes more time to train than GRU, as shown in Table 2. Although BGRU takes more time than GRU, we can infer from Table 1 that it has a lower loss, which makes it more effective than GRU. As mentioned in Sect. 4.4, both models are trained for 50 epochs, and two graphs per model are plotted for each experiment to show how the accuracy and loss vary across epochs. The graphs obtained from each experiment are shown in Figs. 2, 3, 4, 5, 6, and 7. Once the highest accuracy is reached, the variation of the accuracy and loss across the epochs for GRU and BGRU in experiments 1, 2, 3, and 4 (single-attack-vector network traffic) is linear, as shown in Figs. 2, 3, 4, and 5. However, in experiments 5 and 6 (multi-attack-vector network traffic), the graphs of accuracy and loss across epochs show slight deviations in the case of GRU, as shown in Figs. 6 and 7, whereas the
Table 2 Training and testing tuples
Model  Experiment  Train tuples  Test tuples  Avgtime/Epoch (s)
GRU    EXPT-1      462,174       586,615      216
GRU    EXPT-2      444,672       205,957      213
GRU    EXPT-3      518,770       214,300      274
GRU    EXPT-4      489,552       139,798      269
GRU    EXPT-5      521,446       211,585      271
GRU    EXPT-6      510,861       193,548      357
BGRU   EXPT-1      462,174       586,615      459
BGRU   EXPT-2      444,672       205,957      443
BGRU   EXPT-3      518,770       214,300      550
BGRU   EXPT-4      489,552       139,798      513
BGRU   EXPT-5      521,446       211,585      567
BGRU   EXPT-6      510,861       193,548      526
Fig. 2 a–d Graphs of experiment-1 (Mirai attack) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
Fig. 3 a–d Graphs of experiment-2 (UDP attack) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
Fig. 4 a–d Graphs of experiment-3 (ACK attack) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
Fig. 5 a–d Graphs of experiment-4 (DNS attack) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
Fig. 6 a–d Graphs of experiment-5 (Multi-attack with ACK) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
Fig. 7 a–d Graphs of experiment-6 (Multi-attack without ACK) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss
BGRU model behaves the same as in experiments 1, 2, 3, and 4. For this reason, despite the additional overhead, we consider BGRU a better model than GRU.
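The training-time overhead of BGRU can be quantified directly from the Avgtime/Epoch values in Table 2. The short sketch below computes the BGRU-to-GRU time ratio per experiment; the dictionary names are illustrative, and the numbers are copied from Table 2.

```python
# Average seconds per epoch, copied from Table 2.
gru_time = {"EXPT-1": 216, "EXPT-2": 213, "EXPT-3": 274,
            "EXPT-4": 269, "EXPT-5": 271, "EXPT-6": 357}
bgru_time = {"EXPT-1": 459, "EXPT-2": 443, "EXPT-3": 550,
             "EXPT-4": 513, "EXPT-5": 567, "EXPT-6": 526}

# Ratio > 1 means BGRU is slower than GRU for that experiment.
ratios = {k: bgru_time[k] / gru_time[k] for k in gru_time}

for name, ratio in ratios.items():
    print(f"{name}: BGRU takes {ratio:.2f}x the GRU time per epoch")
```

By these figures, a BGRU epoch takes roughly 1.5 to 2.1 times as long as a GRU epoch, which is in line with a bidirectional layer processing each sequence in both directions.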
6 Conclusion This paper presents an implementation of GRU and BGRU, combined with word embedding, for the detection of IoT-based botnet attacks. The GRU and BGRU models are compared on evaluation metrics such as accuracy, loss, validation accuracy, validation loss, and test accuracy. The attack vectors "Mirai," "UDP," "ACK," and "DNS" resulted in test accuracies of 0.999717, 0.999999, 0.999981, and 0.999992 for the GRU model and 0.999717, 0.999980, 0.999687, and 1.0 for the BGRU model. These results demonstrate the strength of our IoT botnet detection model, which works at the packet level and applies text recognition to the features. The bidirectional approach adds overhead to every epoch during training and increases processing time compared to the single-direction approach, but it appears to be the better model, producing efficient results owing to its bidirectional layers. The client-server architecture used to form botnets in IoT networks has the problem of a single point of failure. Hence, botnet attackers have started using peer to peer
architecture for designing botnets. Hence, a P2P botnet detection method needs to be developed to detect P2P botnets within IoT networks.
References
1. Ullas S, Upadhyay S, Chandran V, Pradeep S, Mohankumar TM (2020) Control console of sewage treatment plant with sensors as application of IOT. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7
2. Mahjabin T, Xiao Y, Sun G, Jiang W (2017) A survey of distributed denial-of-service attack, prevention, and mitigation techniques. Int J Distrib Sens Netw 13(12):1550147717741463
3. Vinayakumar R, Soman KP, Poornachandran P, Alazab M, Jolfaei A (2019) DBD: deep learning DGA-based botnet detection. In: Deep learning applications for cyber security. Springer, Cham, pp 127–149
4. Thejiya V, Radhika N, Thanudhas B (2016) J-Botnet detector: a java based tool for HTTP botnet detection. Int J Sci Res (IJSR) 5(7):282–290
5. Džaferović E, Sokol A, Almisreb AA, Norzeli SM (2019) DoS and DDoS vulnerability of IoT: a review. Sustain Eng Innovation 1(1):43–48
6. Harun Babu R, Mohammed, Soman KP (2019) RNNSecureNet: recurrent neural networks for cyber security use-cases. arXiv e-prints (2019): arXiv-1901
7. Vinayakumar R, Soman KP, Prabaharan Poornachandran (2017) Applying deep learning approaches for network traffic prediction. In: 2017 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 2353–2358
8. Torres P, Catania C, Garcia S, Garino CG (2016) An analysis of recurrent neural networks for botnet detection behavior. In: 2016 IEEE biennial congress of Argentina (ARGENCON). IEEE, pp 1–6
9. Sriram S, Vinayakumar R, Alazab M, Soman KP (2020) Network flow based IoT botnet attack detection using deep learning. In: IEEE INFOCOM 2020-IEEE conference on computer communications workshops (INFOCOM WKSHPS). IEEE, pp 189–194
10. Ahmed AA, Jabbar WA, Sadiq AS, Patel H (2020) Deep learning-based classification model for botnet attack detection. J Ambient Intell Humanized Comput 1–10
11. Yerima SY, Alzaylaee MK (2020) Mobile botnet detection: a deep learning approach using convolutional neural networks. In: 2020 international conference on cyber situational awareness, data analytics and assessment (CyberSA). IEEE, pp 1–8
12. McDermott CD, Majdani F, Petrovski AV (2018) Botnet detection in the internet of things using deep learning approaches. In: 2018 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
13. Shakya S, Pulchowk LN, Smys S (2020) Anomalies detection in fog computing architectures using deep learning. J Trends Comput Sci Smart Technol 1:46–55, 1 Mar 2020
14. Kim J, Won H, Shim M, Hong S, Choi E (2020) Feature analysis of IoT botnet attacks based on RNN and LSTM. Int J Eng Trends Technol 68(4):43–47, Apr 2020
15. Hwang R-H, Peng M-C, Nguyen V-L, Chang Y-L (2019) An LSTM-based deep learning approach for classifying malicious traffic at the packet level. Appl Sci 9(16):3414
16. Bhuvaneswari Amma NG, Selvakumar S (2020) Anomaly detection framework for Internet of things traffic using vector convolutional deep learning approach in fog environment. Future Gener Comput Syst 113:255–265
17. Parra GDLT, Rad P, Choo K-KR, Beebe N (2020) Detecting Internet of Things attacks using distributed deep learning. J Net Comput Appl 163:102662
18. Kumar V, Garg ML (2018) Deep learning as a frontier of machine learning: a review. Int J Comput Appl 182(1):22–30, July 2018
19. Apaydin H, Feizi H, Sattari MT, Colak MS, Shamshirband S, Chau K-W (2020) Comparative analysis of recurrent neural network architectures for reservoir inflow forecasting. Water 12(5):1500
20. Yang S, Yu X, Zhou Y (2020) LSTM and GRU neural network performance comparison study: taking yelp review dataset as an example. In: 2020 international workshop on electronic communication and artificial intelligence (IWECAI). IEEE, pp 98–101
21. Dataset link: https://drive.google.com/drive/folders/148XD5gU7cAIlOGzF98N2uC42Bf74LID?usp=sharing
Biomass Energy for Rural India: A Sustainable Source Namra Joshi
Abstract Energy plays a crucial role in the socio-economic development of developing nations like India. To address issues such as the depletion of fossil fuels and increasing concern about environmental pollution, the Government of India is promoting the use of renewable energy sources, which are clean and green. Biomass energy is one such effective source of energy. This paper focuses on the Indian potential of biomass energy and on grid integration opportunities for biomass energy-based plants. Various methodologies for utilizing biomass energy on a large scale are also addressed. Keywords Renewable energy · Biomass energy · Grid interconnection
1 Introduction India is a developing nation, and its population is rising year by year. As per the 2011 census, the Indian population is 1.21 billion, and it is expected to rise by 25% by the year 2036. With such a rise in population, the power demand is increasing tremendously [1]. In the upcoming two decades, worldwide power consumption will rise by 60–70%. According to the world outlook magazine, India will have peak energy demand, and emissions will also increase in fulfilling it. India is therefore looking toward clean sources of energy, i.e., renewable sources of energy. It accounts for around 17% of the entire GDP. Energy sources that can be replenished are termed renewable sources of energy. They include solar [2], wind [3], geothermal, and any other type of energy obtained from natural resources that are inexhaustible or constantly renewed. The classification of renewable energy sources is illustrated in Fig. 1. India is about to achieve its aim of 10 GW of bioenergy-based generation by the year 2022. India ranks fourth in renewable energy capacity. The Government of India is promoting a waste-to-energy program with financial support from the Ministry of Petroleum and Natural Gas.
N. Joshi (B) Department of Electrical Engineering, SVKM's Institute of Technology, Dhule, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_64
Fig. 1 Types of renewable energy sources (diagram: renewable energy divided into geothermal, biomass, wind, hydel, and solar energy)
Agriculture plays a crucial role in the Indian economy. About 60.43% of India's land is agricultural. Agricultural waste can be used for power generation in biomass-based plants. In India, MNRE [4] is promoting biomass-based power plants and cogeneration plants. The basic target is to extract as much power as possible from sugarcane bagasse and agricultural waste. A special scheme was introduced by the ministry in 2018 to promote this type of generation. The estimated potential is around 18,000 MW. As per the MNRE annual report 2020–21, more than 550 cogeneration plants had been installed in India up to December 2020 [5]. The major states with potential for this type of generation are Maharashtra, Chhattisgarh, Tamil Nadu, Uttar Pradesh, West Bengal, Punjab, and Andhra Pradesh. Biomass-based power plants are a very good option for rural areas located away from the central grid. Currently, around 200 biomass-based plants with a capacity of 772 MW are installed in the country. The MNRE has launched the New National Biogas and Organic Manure Program (NNBOMP) to promote biogas plant installation in rural India. As of 3 June 2021, the installed capacity of such power plants in India is 10170 MW. Sugar mill bagasse-based plants are a major contributor to achieving the set bioenergy target [6]. The percentage-wise sources of generation in January 2021 are illustrated in Fig. 2 [7].
Fig. 2 Generation in India from RES in Jan 2021
(Pie chart values: Solar 42%, Wind 31%, Bagasse 17%, Small Hydro 5%, Biomass 3%, Others 2%)
2 Biomass-based power plant Biomass is a very effective source of energy. Wood, garbage, crops, and agricultural waste are categorized as biomass, as shown in Fig. 3. Biomass can be transformed into energy-rich fuel either chemically or biochemically. The energy can be extracted through various methods, as illustrated in Fig. 4. Either a dry process or a wet process can be adopted for extracting energy from biomass. The dry process is further classified into pyrolysis and combustion, whereas the wet process is further classified into
Fig. 3 Types of biomass
Fig. 4 Methods of extracting energy from biomass (diagram: biomass energy divided into a wet process and a dry process, with anaerobic digestion, pyrolysis, gasification, combustion, and fermentation as the individual methods)
anaerobic digestion, gasification and fermentation [8]. The energy obtained from biomass can be further utilized to either produce electrical power or heat.
2.1 Types of biomass-based power plant The major types of biomass power generation modes are as follows:
• Combustion Based Plant: In this type of plant, biomass is fed into a boiler to produce steam, which is then converted into electricity. This type of plant has a low running cost. It has low efficiency at small scale and requires a large investment, so it is suitable for large scale only.
• Gasification Combustion Based: In this type of plant, the solid parts of the biomass are split into a flammable gas. The biomass is first gasified, and the fuel gas is then burned.
• Mixed Burning Based: In this type of plant, biomass is burned in a boiler along with coal. Its mode of operation is very simple and convenient, and low investment is needed. It is most appropriate for timber biomass [9].
• Gasification Mixed Burning Based: In this type of plant, both solid and liquid biomass materials are burned in the boiler. Low-energy-density and liquid-based biomass are suitable. The biomass is gasified, and the fuel gas is then burned with coal in the boiler. It has good economic advantages, although metal erosion issues are also observed in such plants. It is suitable for applications such as power generation from mass biomass [10].
2.2 Working of biomass-based power plant Biomass-based power plants are more popular in rural areas. As illustrated in Fig. 5, the operation of a biomass-based power plant begins with gathering biomass materials such as agricultural waste, garbage, wood, and animal dung cakes. Suitable sorting is then carried out [11]. Once sorting is done, the gathered biomass materials are treated to make them suitable for the gasification process. After gasification, the gas obtained is checked to determine whether it is suitable to run turbines. If it is, the gas is fed to the turbines, which in turn drive the generator shaft, and electrical power is obtained. If the gas is not capable of driving a turbine, it is supplied to a biofueled engine. The engine drives the shaft of the generator, and power is thus generated [12].
Fig. 5 Working of biomass-based power plant
3 Biomass-based power plant: Indian Scenario Biomass is a crucial source of energy, particularly in rural regions. It is non-conventional in nature, and it is capable of providing firm energy [13]. Around 32% of the entire usage of energy is fulfilled through this source. MNRE is promoting the use of biomass energy through several schemes. As on June 30, 2021, the installed capacity of biomass power is 10170 MW. As illustrated in Fig. 6, bagasse cogeneration plants account for 74% of this capacity (7562 MW), biomass-based independent power producers (IPP) for 18% (1836 MW), and non-bagasse cogeneration plants for 8% (772 MW). The Indian government is providing a subsidy of 25 lakh for bagasse cogeneration plants and 50 lakh for non-bagasse cogeneration plants. Figure 7 shows the 5 MW biomass-based power plant located in Punjab. Major industries that can contribute their waste for biomass-based power generation are:
• Sugar industries
• Corn industries
• Palm oil industries
• Food processing industries
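The percentage shares quoted for Fig. 6 can be checked against the capacity figures printed in the same figure. The following sketch recomputes them; the dictionary name is illustrative, and the capacities are the values given in Fig. 6.

```python
# Installed capacities in MW, as labelled in Fig. 6.
capacity_mw = {
    "Biomass IPP": 1836,
    "Bagasse cogeneration": 7562,
    "Non-bagasse cogeneration": 772,
}

total_mw = sum(capacity_mw.values())
# Share of each category, rounded to the nearest whole percent.
shares = {name: round(100 * mw / total_mw) for name, mw in capacity_mw.items()}

print(total_mw)
print(shares)
```

The three categories sum to the 10170 MW total stated in the text, and the rounded shares correspond to the 18%, 74%, and 8% segments of the figure.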
3.1 Advantages of biomass-based power plant The advantages of biomass-based power plant are as follows:
Fig. 6 Installed capacity of biomass power in India (pie chart values: Bagasse Cogeneration 7562 MW, 74%; Biomass IPP 1836 MW, 18%; Non-Bagasse Cogeneration 772 MW, 8%)
Fig. 7. 6 MW Biomass power plant at Birpind, Punjab
• Reliability: The power generated by a biomass-based power plant is reliable, and it reduces dependence on the central power plant.
• Economic Feasibility: The cost per unit generated and the capital cost of biomass are much lower than those of a thermal power plant. Thus, power generated by such a plant is economically competitive.
• Ecofriendly: In a biomass-based power plant, the process of power generation is environmentally sustainable.
• Local Availability: As the power generated is available locally, dependence on foreign sources is reduced.
• Less Residue: The cost required for disposal of residue material is very low.
• Employment Opportunities: It creates employment opportunities for people residing in rural regions.
• Cost-Effective: In a biomass-based power plant, the transmission cost, labor cost, and overall running cost are low compared to a thermal power plant.
• Residue Utilization: The waste obtained from such plants can be used as organic fertilizer.
• Barrier to Pollution: It helps in minimizing water and soil pollution.
3.2 Challenges for biomass-based power plant Biomass-based power plants have proven to be a good option for fulfilling energy needs, but many challenges are associated with them [14]. The major challenges are as follows:
• Seasonal agricultural waste is used as biomass fuel, so it is difficult to maintain a constant supply, as agriculture depends on climatic conditions [15].
• The cost per unit may not remain sustainable throughout the year in a competitive power market.
• Such plants require more space.
• They are not suitable for densely populated areas.
• They are affected by temperature variation.
4 Future Scope The number of biomass power plants in rural areas is increasing day by day worldwide. In India [16], however, more emphasis is needed on the use of this useful mode of power generation. The Government of India is promoting biomass plants through several schemes and a policy framework, as discussed in this paper. Several research projects are under way to improve the effectiveness of generation through biomass power plants. Investment from outside the country will also help to promote biomass plant installation. Power production through biomass plants is a good step toward sustainable development.
5 Conclusion India is the second largest producer of agricultural waste and has very good potential for biomass energy. As of now, around 30% of the available potential is used for generation. The Government of India has an excellent policy framework for implementing biomass-based power generation plants. It can be concluded that, in rural regions of India, biomass energy has proven to be one of the best available options for fulfilling power requirements. As power is generated locally, the cost of constructing a huge transmission and distribution network is saved. At the same time, T & D losses are also minimized. The feed-in tariff will also motivate the use of biomass-based power generation plants.
References
1. Paul S, Dey T, Saha P, Dey S, Sen R (2021) Review on the development scenario of renewable energy in different country. In: 2021 Innovations in energy management and renewable resources (52042), pp 1–2
2. Khandelwal A, Nema P (2021) A 150 kW grid-connected roof top solar energy system: case study. In: Baredar PV, Tangellapalli S, Solanki CS (eds) Advances in clean energy technologies. Springer Proceedings in Energy. Springer, Singapore
3. Joshi N, Sharma J (2020) Analysis and control of wind power plant. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA), pp 412–415
4. Annual Report MNRE year 2020–21
5. Tyagi VV, Pathak AK, Singh HM, Kothari R, Selvaraj J, Pandey AK (2016) Renewable energy scenario in Indian context: vision and achievements. In: 4th IET clean energy and technology conference (CEAT 2016), pp 1–8
6. Joshi N, Nagar D, Sharma J (2020) Application of IoT in Indian power system. In: 2020 5th international conference on communication and electronics systems (ICCES), pp 1257–1260
7. www.ireda.in
8. Rahil Akhtar Usmani (2020) Potential for energy and biofuel from biomass in India. Renew Energy 155:921–930
9. Patel S, Rao KVS (2016) Social acceptance of a biomass plant in India. In: 2016 biennial international conference on power and energy systems: towards sustainable energy (PESTSE), pp 1–6
10. Parihar AKS, Sethi V, Banerjee R (2019) Sizing of biomass based distributed hybrid power generation systems in India. Renew Energy 134:1400–1422
11. Sharma A, Singh HP, Sinha SK, Anwer N, Viral RK (2019) Renewable energy powered electrification in Uttar Pradesh State. In: 2019 3rd international conference on recent developments in control, automation and power engineering (RDCAPE), pp 443–447
12. Khandelwal A, Nema P (2020) Harmonic analysis of a grid connected rooftop solar energy system. In: 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), pp 1093–1096
13. Sen GP, Saxena BK, Mishra S (2020) Feasibility analysis of community level biogas based power plant in a village of Rajasthan. In: 2020 international conference on advances in computing, communication and materials (ICACCM), pp 385–389
14. Saidmamatov O, Rudenko I, Baier et al (2021) Challenges and solutions for biogas production from agriculture waste in the aral sea basin. Processes 9:199
15. Ghosh S (2018) Biomass-based distributed energy systems: opportunities and challenges. In: Gautam A, De S, Dhar A, Gupta J, Pandey A (eds) Sustainable energy and transportation. Energy, environment, and sustainability. Springer, Singapore
16. Seth R, Seth R, Bajpai S (2006) Need of biomass energy in India. Prog Sci Eng Res J PISER 18, 3(02/06):13–17
Constructive Approach for Text Summarization Using Advanced Techniques of Deep Learning Shruti J. Sapra, Shruti A. Thakur, and Avinash S. Kapse
Abstract Text summarization is a popular field, and demand for it is high because of the large amount of text available on the Internet through social media sites, blogs, and other Web sites. The need to shorten information is therefore increasing for various reasons. Plenty of data resources are now available, and the number of tools for reducing the amount of information is growing in response to this requirement. This paper discusses the various methods and techniques that are effective in shortening text using advanced technologies and algorithms such as deep learning, machine learning, and artificial intelligence. These algorithms can also be combined with other technologies to resolve various issues in text summarization, in other words, the reduction of information. The main requirement when reducing the amount of information is that the reduced text must retain the information that is essential from the user's or application's point of view and must remain consistent with the information available from the different sources. Keywords Text summarization · Deep learning · Information · Machine learning
S. J. Sapra Research Scholar, Department of Computer Science and Engineering, Sant Gadge Baba Amravati University, Amravati, India S. A. Thakur (B) Assistant Professor, Department of Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, India e-mail: [email protected] A. S. Kapse Head of Department, Information Technology, Anuradha College of Engineering, Amravati University, Chikhli, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_65
1 Introduction Although there are various challenges in reducing the size or amount of information from different social media or Web site sources, many effective methods and techniques can efficiently reduce the text without changing its meaning [1]. Text summarization is the restating of the actual information in a text as briefly as possible; in other words, it expresses a large amount of information in as few words or sentences as possible [2]. This research also studies and analyzes various tools for maintaining the integrity of the data to be reduced and identifies the parameters that must be handled efficiently when dealing with such large amounts of data. It mainly studies how redundant information can be removed from the main content and replaced by short, summarized text [3]. The most important outcome of summarization is that the content is shortened, which significantly reduces the memory required to store it compared to the original text [4]. Deep learning gives better results when plenty of quality data is available, and this effect grows as the amount of available data increases. If quality data is not available, however, useful data may be lost, which can be severe or may damage the whole system. In another notable example, researchers fooled Google's deep learning system by introducing errors, changing the useful data, and adding noise. Such errors were deliberately introduced on a trial basis in image recognition algorithms, and the performance was found to be greatly hampered by the changes in the quality and quantity of the data supplied to the system [5]. Even very small changes in the data were found to change the results greatly. It is therefore suggested not to alter the input data, and it has become essential to add constraints to deep learning algorithms that improve the accuracy, and hence the efficiency, of such systems [6] (Fig. 1). After the text is reduced, different analytical methods are applied to judge the performance of the method and technology used. Performance measurement is thus another important aspect of text summarization [7]. Often, the text contains non-essential data, which must be removed efficiently to shorten the text. Metadata, that is, data about the data, is important; it must be preserved when the text is shortened and must be represented differently.
Constructive Approach for Text Summarization Using Advanced …
Fig. 1 Summarization scenario to reduce a large amount of data to short data
Another significant parameter when reducing text is the application domain in which the reduced information will be used. It plays a very important role because the appropriate method or technology changes with the application domain [8].
2 Literature Survey 2.1 Automated Text Summarization Techniques Early researchers designed systems based on neural networks modeled on human intelligence, combining a range of mathematical tools and algorithms to build the processes described below. Researchers around the world continue to work on smarter summarization techniques and have succeeded in many cases, but more efficient and effective techniques are still needed. This section reviews studies in the field of text summarization and analyzes their advantages and disadvantages with respect to the crucial parameters [7]. A method may have disadvantages in general yet be useful in particular scenarios, applications, and domains, so the characteristics of each method need to be studied [9]. In automated text summarization, a machine produces short summaries of different kinds of documents using statistical or heuristic techniques. A summary in this sense is a shortened form of the text that captures the most crucial and relevant information contained in the document or documents being summarized. Numerous tried-and-tested automated text summarization methods are currently applied in different domains and fields of information.
Fig. 2 Automated text summarization approaches
Automated text summarization methods can be classified in different ways. This article presents them from the point of view of the summarization approach, under which there are two kinds of methods: extractive and abstractive [10] (Fig. 2).
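To make the extractive category concrete, the following is a minimal sketch of a frequency-based extractive summarizer. It is not the method proposed in this paper: the scoring rule (average word frequency per sentence) and the function name `extractive_summary` are assumptions chosen only for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document-level frequency of the words in the sentence.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in keep]
```

An extractive method of this kind only copies sentences from the source, whereas an abstractive method generates new wording.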
3 Proposed Methodology The proposed approach is applicable across different data and information domains and uses artificial intelligence methods, with deep learning algorithms for better and faster output. These algorithms increase efficiency and effectiveness, which may speed up the proposed method, yield short, abstract summaries that preserve the context of the information, and save a large amount of time [11]. The operations of the proposed method can be explored in different ways, as shown in Fig. 3. The method is useful in many domains and applications. It is subject to constraints arising from the summarization process and offers benefits for specific applications in terms of parameters such as summarization time, summarization speed, and summarization accuracy, as shown in Fig. 4. The architecture addresses a known weakness of abstractive summarization: accurate details in the source document, such as dates, locations, or phone numbers, are often reproduced erroneously in the summary [12].
Fig. 3 Automatic summarization using ranking algorithm for text summarization
Fig. 4 Graph of ROUGE measure performance for text summarization
A finite vocabulary prevents some word patterns, such as similar or repeated words and proper names, from being taken into account. Unnecessary repetitions of source fragments or sentences can often be removed from a paragraph, or from the complete text, regardless of the information it contains (Fig. 5). The proposed method has been analyzed carefully with respect to crucial parameters such as summarization time, effectiveness, and efficiency. All of these parameters matter for presenting a large amount of information in a very short space, that is, for making the text concise and readable for users and applications [13]. The proposed methodology was implemented on the CNN/DailyMail dataset.
Fig. 5 Seq2Seq attention model
4 Scope The proposed model deals efficiently with many of the challenges faced when shortening a text into a new, short text, so its scope extends to a large number of business applications [14]. The method is suitable for real-time applications in various domains and gives better efficiency than the other methods studied. The resulting text is of better quality and can be used directly in applications where information must be represented in very few words. The method thus helps reduce redundant information and produce meaningful information in very few words [1]. • Precision-target: Precision-target (prec(t)) is computed in the same way as prec(s), but with respect to the real summary. The idea is to measure how many of the entities the model produces in the hypothesis summary are also present in the real summary. Mathematically, it is set as
prec(t) = N(h ∩ t)/N(h)
Here, N(h) and N(t) refer to the sizes of the named-entity sets of the generated (hypothesis) summary and the real summary, respectively, and N(h ∩ t) to the number of entities they share. • Recall-target: Under recall-target (recall(t)), the idea is to measure what fraction of the entities in the real summary also appear in the model-generated hypothesis summary, which reveals how many are missing. Mathematically, it is set as
recall(t) = N(h ∩ t)/N(t)
where N(h) and N(t) are as defined above. To obtain a single number, prec(t) and recall(t) are combined into an F1-score. Mathematically, it is set as
F1 = 2 prec(t) recall(t)/(prec(t) + recall(t))
The following formula is useful for penalizing repeated words or sentences that carry the same meaning, so that the final text is short and informative for the application at hand:
L^t_{coverage} := \sum_i \min(a_i^t, c_i^t)
This term is combined with the main loss to give the overall objective used for summarization:
L^t := L^t_{ML} + \lambda L^t_{coverage}
Classification can be performed on different datasets to generate usage patterns of the necessary information; the results can be grouped into clusters to improve data quality for the application and to represent the notations used in a specific application domain [15].
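The coverage term can be illustrated with a short sketch. The text does not define a and c, so the code assumes the usual coverage-mechanism reading: a_i^t is the attention weight on source position i at decoding step t, and c_i^t is the attention accumulated on position i over all earlier steps. The function names and the default value of λ are hypothetical.

```python
import math


def coverage_losses(attention_steps):
    """Per-step coverage loss sum_i min(a_i^t, c_i^t) for a list of attention vectors."""
    coverage = [0.0] * len(attention_steps[0])  # nothing has been attended to yet
    losses = []
    for a_t in attention_steps:
        losses.append(sum(min(a, c) for a, c in zip(a_t, coverage)))
        # Accumulate this step's attention for use at the next step.
        coverage = [c + a for a, c in zip(a_t, coverage)]
    return losses


def step_loss(p_target, cov_loss, lam=1.0):
    """L^t = L^t_ML + lambda * L^t_coverage, taking L^t_ML = -log p(target word)."""
    return -math.log(p_target) + lam * cov_loss
```

The loss is zero at the first step and grows only when the decoder attends again to positions it has already covered, which is what discourages repetition.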
Dataset    Training data      ROUGE-1    ROUGE-2    ROUGE-L
CNNDM      Original           43.7±0.1   21.1±0.1   40.6±0.1
CNNDM      + filtering        43.4±0.2   20.8±0.1   40.3±0.2
CNNDM      + classification   43.5±0.2   20.8±0.2   40.4±0.3
CNNDM      JAENS              42.4±0.6   20.2±0.2   39.5±0.5
XSUM       Original           45.6±0.1   22.5±0.1   37.2±0.1
XSUM       + filtering        45.4±0.1   22.2±0.1   36.9±0.1
XSUM       + classification   45.3±0.1   22.1±0.0   36.9±0.1
XSUM       JAENS              43.4±0.7   21.0±0.3   35.5±0.4
Newsroom   Original           47.7±0.2   35.0±0.3   44.1±0.2
Newsroom   + filtering        47.7±0.1   35.1±0.1   44.1±0.1
Newsroom   + classification   47.7±0.2   35.1±0.1   44.2±0.2
Newsroom   JAENS              46.6±0.5   34.3±0.3   43.2±0.3

Entity-level metrics (three blocks of four values per metric):
Macro-prec_t     77.6±0.9 78.6±0.3 77.9±0.2 74.1±0.2 | 67.9±0.7 67.0±0.6 66.2±0.4 66.0±0.4 | 69.5±1.6 67.2±0.4 66.5±0.1 65.4±0.3
Micro-prec_t     77.1±0.6 78.0±0.3 77.3±0.2 73.3±0.2 | 68.4±0.6 67.5±0.5 66.6±0.3 66.5±0.4 | 67.3±1.2 64.2±0.4 63.8±0.1 62.9±0.4
Macro-recall_t   79.5±0.6 79.5±0.3 79.4±0.2 80.1±0.1 | 75.1±0.7 74.7±0.2 74.1±0.6 74.7±0.7 | 68.9±1.5 70.3±0.2 70.2±0.2 70.8±0.3
Micro-recall_t   80.0±0.5 79.8±0.4 79.6±0.2 80.3±0.3 | 76.4±0.7 75.5±0.1 74.9±0.6 75.4±0.6 | 66.8±1.6 67.8±0.4 67.7±0.3 68.5±0.2
Macro-F1_t       78.5±0.2 79.1±0.1 78.6±0.1 77.0±0.1 | 71.3±0.2 70.6±0.3 69.9±0.2 70.0±0.2 | 69.2±0.1 68.7±0.3 68.3±0.1 68.0±0.2
Micro-F1_t       78.5±0.1 78.9±0.1 78.4±0.2 76.6±0.2 | 72.2±0.3 71.3±0.3 70.5±0.2 70.7±0.3 | 67.0±0.2 65.9±0.4 65.7±0.1 65.6±0.3
Macro-prec_s and Micro-prec_s   98.2±0.1 98.2±0.1 99.0±0.1 98.2±0.0 98.3±0.1 99.0±0.1 99.9±0.0 99.9±0.0 93.6±0.2 99.9±0.0 99.9±0.0 93.9±0.1 99.9±0.0 98.3±0.1 98.3±0.1 99.9±0.0 98.0±0.0 98.1±0.1 99.4±0.1 98.0±0.0 98.1±0.1 99.5±0.1 97.0±0.1 97.2±0.1
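The entity-level scores named in the table headers can be sketched as follows. The per-document formulas follow the definitions of prec(t), recall(t), and F1 given earlier; the macro and micro averaging shown here uses the conventional meaning of those terms (mean of per-document scores versus pooled counts), which is an assumption, since the text does not spell it out. Entity extraction itself is outside the sketch: inputs are assumed to be already extracted entity strings.

```python
def entity_scores(hyp_entities, ref_entities):
    """prec(t), recall(t) and F1 for one hypothesis/reference pair of entity sets."""
    h, t = set(hyp_entities), set(ref_entities)
    overlap = len(h & t)                       # N(h ∩ t)
    prec = overlap / len(h) if h else 0.0      # N(h ∩ t) / N(h)
    rec = overlap / len(t) if t else 0.0       # N(h ∩ t) / N(t)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def macro_micro_precision(pairs):
    """Macro: mean of per-document prec(t). Micro: counts pooled over all documents."""
    per_doc = [entity_scores(h, t)[0] for h, t in pairs]
    macro = sum(per_doc) / len(per_doc)
    pooled_overlap = sum(len(set(h) & set(t)) for h, t in pairs)
    pooled_hyp = sum(len(set(h)) for h, _ in pairs)
    micro = pooled_overlap / pooled_hyp if pooled_hyp else 0.0
    return macro, micro
```

The two averages differ whenever documents contain different numbers of entities: the micro average weights every entity equally, while the macro average weights every document equally.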
The result will be generated:
L^i_{BIO}(\theta^{(enc)}, x^i, z^i) = -\sum_{t=1}^{ts(i)} \log p_{\theta^{(enc)}}(z_t^i \mid x^i)
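Read as a token-level negative log-likelihood over B/I/O labels, this loss can be sketched as below. The interpretation is an assumption: z is taken to be the sequence of gold tags for the source tokens x, and the encoder is assumed to output one probability per tag for each token. The function name `bio_loss` and the dictionary format are hypothetical.

```python
import math


def bio_loss(token_probs, gold_tags):
    """Negative log-likelihood of the gold tags; token_probs[t] maps tag -> probability."""
    return -sum(math.log(probs[tag]) for probs, tag in zip(token_probs, gold_tags))
```

The loss is zero only when the model assigns probability 1 to every gold tag and grows as the probability of any gold tag falls.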
5 Results The proposed method was found to summarize text accurately: it produces a short summary that is self-explanatory and can be used directly in an application to represent the data, which improves the performance of the overall system.
6 Conclusion This contribution is a multidimensional approach for interdisciplinary applications. The proposed method demonstrates a new strategy for generating a unique, short summary, which is needed in areas such as expert summarization of varied text and information. The approach performs well in complex domains where other summarization techniques may not work effectively, and it is well suited to summarizing text and information available from different Internet sources.
7 Future Scope Continued research on and improvement of the proposed model is expected to increase the usefulness of the architecture in text summarization and eventually to result in a variety of utilities and tools. These improvements should also make text summarization methods and technologies more effective and efficient to deploy in different domains, leading to an enhanced summarization approach.
References
1. Bhargava R, Sharma Y, Sharma G (2016) ATSSI: abstractive text summarization using sentiment infusion. Procedia Comput Sci 89:404–411. https://doi.org/10.1016/j.procs.2016.06.088
2. Zarrin P, Jamal F, Roeckendorf N, Wenger C (2019) Development of a portable dielectric biosensor for rapid detection of viscosity variations and its in vitro evaluations using saliva samples of COPD patients and healthy control. Healthcare 7(1):11. https://doi.org/10.3390/healthcare7010011
3. Shruti M, Thakur JS, Kapse AS, Analysis of effective approaches for legal texts summarization using deep learning 3307:53–59
4. Verma S, Nidhi V (2019) Extractive summarization using deep learning, arxiv.org, v2(1) arXiv:1708.04439
5. Roulston S, Hansson U, Cook S, McKenzie P (2017) If you are not one of them you feel out of place: understanding divisions in a Northern Irish town. Child Geogr 15(4):452–465. https://doi.org/10.1080/14733285.2016.1271943
6. Baxendale PB (2010) Machine-made index for technical literature—an experiment. IBM J Res Dev 2(4):354–361. https://doi.org/10.1147/rd.24.0354
7. Allahyari M et al (2017) Text summarization techniques: a brief survey. Int J Adv Comput Sci Appl 8(10). https://doi.org/10.14569/ijacsa.2017.081052
8. Sahoo D, Bhoi A, Balabantaray RC (2018) ScienceDirect hybrid approach to abstractive summarization. Procedia Comput Sci 132(Iccids):1228–1237. https://doi.org/10.1016/j.procs.2018.05.038
9. Song S, Huang H, Ruan T (2019) Abstractive text summarization using LSTM-CNN based deep learning. Multimed Tools Appl 78(1):857–875. https://doi.org/10.1007/s11042-018-5749-3
10. Widyassari AP et al (2020) Review of automatic text summarization techniques and methods. J King Saud Univ Comput Inf Sci xxxx. https://doi.org/10.1016/j.jksuci.2020.05.006
11. Martín C, Langendoerfer P, Zarrin PS, Díaz M, Rubio B (2020) Kafka-ML: connecting the data stream with ML/AI frameworks. (June):1–10. [Online]. Available: http://arxiv.org/abs/2006.04105
12. Anand D, Wagh R (2019) Effective deep learning approaches for summarization of legal texts. J King Saud Univ Comput Inf Sci xxxx. https://doi.org/10.1016/j.jksuci.2019.11.015
13. Barzilay R, McKeown KR, Elhadad M (1999) Information fusion in the context of multidocument summarization 550–557. https://doi.org/10.3115/1034678.1034760
14. Abualigah L, Bashabsheh MQ, Alabool H, Shehab M (2020) Text summarization: a brief review. Stud Comput Intell 874:1–15. https://doi.org/10.1007/978-3-030-34614-0_1
15. Khatri C, Singh G, Parikh N (2018) Abstractive and extractive text summarization using document context vector and recurrent neural networks [Online]. Available: http://arxiv.org/abs/1807.08000
Lane Vehicle Detection and Tracking Algorithm Based on Sliding Window R. Rajakumar, M. Charan, R. Pandian, T. Prem Jacob, A. Pravin, and P. Indumathi
Abstract Lane and vehicle detection is fundamental to vehicle driving systems and self-driving. The proposed concept uses the pixel difference between the lane line and its background to separate the lane from the road surface; a curve fitting model is then used to identify the lane in the image. Histogram of oriented gradients (HOG), histogram, and binary spatial features are extracted from vehicle and non-vehicle images, and a support vector machine classifier uses these features to separate the two classes for vehicle detection. Many existing methods are constrained by lighting and road conditions, such as weak light, fog, and rain, which may make lane lines invisible. Features are extracted from the lane images using various filters. Our work focuses on a lane detection technique based on the Sobel filter and a curve fitting model for lane line tracking in different conditions. Preprocessing mitigates noise and prepares the image for the subsequent steps; to this end, the image is converted to HLS color space, and the lane is identified by summing pixel values. The main aim is to increase accuracy and reduce computation time compared with existing methods. Keywords Sobel filter · Curve fitting model · Lane detection · Vehicle detection · Sliding window · Support vector machine
1 Introduction Most accidents occur because road lanes are not visible. Accidents can be reduced drastically by employing improved driving assistance, and a system that warns the driver can save a considerable number of lives. To increase safety and reduce road accidents, researchers have been working on better driving techniques that ensure safety. Accidents often occur because the driver is unaware of the lane, especially on curved lanes. If the road and vehicles can be inferred before such lane conditions arise, the proposed sliding window algorithm can assist the driver in advance to reduce speed and avoid accidents. In driving assistance, the challenging tasks are road lane detection and boundary detection, where lanes are marked by white and yellow lines on the road. Many researchers are working on lane detection, lane tracking, and lane departure warning, yet many systems are limited by shadows, changing illumination, worn road paint, and other image interference. The proposed algorithm is designed to overcome these problems. This paper develops a curve fitting model that strengthens lane detection and tracking for safe transportation. In our method, lane detection and tracking are performed with the curve fitting model and a related component function, and a support vector machine classifier, which is widely used for vehicle detection, detects the vehicles.
R. Rajakumar (B) · R. Pandian · T. P. Jacob · A. Pravin Sathyabama Institute of Science and Technology, Sholinganallur, Chennai 119, India e-mail: [email protected]
M. Charan · P. Indumathi Anna University, MIT Campus, Chromepet, Chennai 44, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_66
2 Related Works The literature [1] extracted the AROI to reduce computational complexity; a Kalman filter together with the progressive probabilistic Hough transform (PPHT) is then used to find the lane boundaries in the image. Depending on the lane and the position of the vehicle, the algorithm decides whether the vehicle is offset. Different lane conditions on both city roads and highways are used for detection and tracking. In the literature [2], lane marks are extracted from road images based on a multi-constraint model, and a clustering algorithm is proposed to detect the lane. Dividing the region of interest into sections makes it easy to track curved lane lines. The literature [3] used B-spline fitting based on the RANSAC algorithm for the front lane and the Hough transform for the rear lanes. The algorithm eliminates interference lines better than the RANSAC algorithm alone. The literature [4] improved the accuracy of lane recognition by minimizing the pixel-wise difference. The predicted lane contains both white and yellow pixels, which do not directly reflect the lane parameters needed to detect a straight line. To detect a lane, interference from fixed objects and other elements outside the lanes is avoided. After the pixels in the road area are selected and reorganized into a data matrix, a pre-trained deep neural network is employed to obtain information about moving vehicles. The literature [5] proposed a flexible road identification method that combines lane lines and obstacle boundaries and is applicable to lane detection. This algorithm uses an adaptive sliding window for lane extraction and the least-squares method for lane line fitting. The literature [6] employs a color threshold method to identify the lane edges, along with a perspective transform and the Hough transform to detect lane segments in the image; the conditions considered are a straight lane and sunny weather. The vision-based methodology of [7] performs well only in controlled weather conditions and uses the Hough transform to identify straight roads. That work uses edge-based detection with OpenStreetMap to detect lanes, which increases computation time. In [8], the Hough transform identifies straight lanes and curve fitting identifies curved lanes, which increases computation time. In [9], vehicles are modeled by their geometry and represented as a combination of small images that form one histogram, similar to the sliding window model. In the literature [10], a feature pairing elimination (FPE) filter is used only for feature extraction, and SVM, random forest, and K-nearest neighbor classifiers are compared. The Hough transform is also used for lane detection in [11, 12], but these algorithms increase computation time and processing complexity. It is essential to focus on the edge image derived from the response of the CenSurE algorithm; from the lane edges, the traffic lane can be identified through its geometry, and an SVM classifier is used to identify the blobs [13]. The literature [14] predicts the position of the vehicle using a Kalman filter and a histogram technique, with a mean detection accuracy of 94.05% during the day. Particle-filter-based tracking was used to learn road scene variations in [15].
3 Lane Detection Algorithm Lane detection is an image processing and computer vision task with many applications. Previous literature on lane detection dealt with curve detection methods, which detect the lane edges and use them to determine the vehicle position in the lane. A dataset captured with a single monocular camera is used for lane detection. This work helps determine the correct position of the vehicle within its lane. The system recognizes most white and yellow lane markings effectively under different conditions, including shadows, rain, snow, and road damage. The lane detection algorithm comprises a lane detection method and a lane tracking method. Regions of interest are identified by changing the input parameters. Both perspective and inverse perspective transforms are performed on the lanes to detect the region of interest. In the next step, the detected lanes are analyzed by the Sobel filter, and the future lanes are calculated using the polynomial curve model. This section explains lane detection using the HLS method and Sobel edge detection; the results obtained with this method are examined in Chap. 4. The following procedure is performed to detect the lane. The preprocessed image is perspective transformed, which converts the 3-dimensional image to a 2-dimensional image. Then, the Sobel filter is applied for noise reduction and to identify the pixels representing edges. The filtered image is converted into HLS color space, and its components (hue, lightness, and saturation) are used to detect the yellow lane. To detect the white lane, a maximum lightness value of 100% is selected. A histogram is computed by summing the pixel values; its maxima separate the left and right lanes and identify their positions. The sliding window method is applied from the bottom of the image, identifying lane pixels, and each subsequent window is constructed above the previous one based on it. A polynomial is then fitted to find both lanes, and the previous lane is used to estimate the search area for the next frame. Eventually, the fitted lane is drawn on the original image and an inverse perspective transform is performed.
Fig. 1 Lane detection and tracking flowchart: Captured lane image → Perspective transform → Sobel filter and HLS color transformation → Sliding window to detect the lane → Inverse perspective transform → Detected lane image
3.1 Lane Detection and Tracking Flowchart See Fig. 1.
3.2 Sobel Edge Detection The Sobel filter estimates the gradient of the image intensity at every pixel of the lane image, that is, the direction and rate of the change in brightness. Figure 2 shows how the lane image changes at each pixel and which pixels represent edges. The Sobel filter has two 3 × 3 kernels: one to identify changes in the horizontal direction and another to identify changes in the vertical direction. The two kernels are convolved with the original lane image to approximate the derivatives.
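The two kernels and their application can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the image is assumed to be a grayscale list of rows, border pixels are simply left at zero, and the function names are hypothetical. In practice an optimized library routine would normally be used instead.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical intensity changes


def apply_kernel(image, kernel, row, col):
    """Correlate a 3x3 kernel with the neighbourhood centred on (row, col)."""
    return sum(kernel[i][j] * image[row + i - 1][col + j - 1]
               for i in range(3) for j in range(3))


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out
```

A vertical lane marking produces a large horizontal response (gx) and almost no vertical response (gy), which is why thresholding the gradient picks out lane edges.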
Fig. 2 Sobel filter and perspective transform
A hue, lightness, and saturation (HLS) color image of the thresholded region of interest (ROI) is taken as input. In this step, the Sobel filter is used as the edge detection method to find the lane boundaries. The main objective is to detect edges that are close to the real lane edges. Sobel edge detection uses the gradient vector of the intensity image; lane boundary features are extracted from this gradient vector, through which the lane can be detected. Many edge operators are available, with different levels of efficiency, and Sobel edge detection is one of the most efficient. A notable feature of the method as used here is its low error rate, because the algorithm uses a double threshold for the yellow and white lanes; the detected edge is therefore close to the real-world lane. In the next step, the captured color image is transformed to HLS color space to speed up processing and reduce sensitivity to scene conditions. To detect the white lane, lightness is set to a value close to 100%. A combination of saturation and lightness values is then defined to detect the yellow lane. In the proposed method, captured images chosen from the directory of the Xi’an city database are processed. The camera is calibrated so that the vanishing point of the road lies at the top of the region of interest (ROI).
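The white and yellow tests in HLS space can be sketched with the standard-library `colorsys` module. All numeric thresholds below are assumptions chosen for illustration, as is the hue band for yellow; the text only states that white uses a lightness close to 100% and that yellow uses a combination of saturation and lightness.

```python
import colorsys


def classify_lane_pixel(r, g, b):
    """Label an RGB pixel (0-255 per channel) as 'white', 'yellow' or None."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    if l >= 0.90:                      # lightness close to 100 %: white paint
        return "white"
    # Assumed yellow band: hue 40-70 degrees, reasonably saturated and light.
    if 40.0 <= hue_deg <= 70.0 and s >= 0.40 and l >= 0.35:
        return "yellow"
    return None
```

For example, `classify_lane_pixel(255, 220, 0)` falls in the yellow band, while a dark gray asphalt pixel such as `(60, 60, 60)` matches neither rule.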
3.3 Histogram Computation A histogram summarizes the numerical values of an image and provides a large amount of information. It indicates the frequency of the different gray levels in a lane image. Lane images consist of a series of pixel values, each representing a specific color intensity. Histogram computation is an important step in segmentation and reduces the computation time. At the lower level of the image, only the lane is present; scanning upward, other structures appear. A peak is drawn for every band. The histogram contains the sums of pixel values along the horizontal axis of the image; the left and right lanes are identified as the positions with the largest sums, as shown in Fig. 3.
Fig. 3 Histogram computation
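The histogram step can be sketched as a column-wise sum over a binary lane image, followed by a peak search in each half. Restricting the sum to the lower half of the frame is an assumption made here because the text notes that only the lane is present near the bottom; the function name is hypothetical.

```python
def lane_base_columns(binary_image):
    """Return (left_base, right_base): the columns with the largest sums in each image half."""
    rows, cols = len(binary_image), len(binary_image[0])
    bottom_half = binary_image[rows // 2:]            # lower part of the frame
    histogram = [sum(row[c] for row in bottom_half) for c in range(cols)]
    mid = cols // 2
    left_base = max(range(mid), key=lambda c: histogram[c])
    right_base = max(range(mid, cols), key=lambda c: histogram[c])
    return left_base, right_base
```

The two returned columns serve as starting positions for the left and right lane searches.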
4 Lane Tracking Algorithm Lane tracking is mainly employed to reduce computation by storing information used to estimate future states. The algorithm includes a prediction step and a measurement step. In the prediction step, the detected lanes are shifted by a specific amount in the image, based on the polynomial fit. In the measurement step, the radius of curvature and the vehicle offset are computed. Much research has already been done on lane tracking, and the most widely used approach is the curve fitting model. Lane tracking uses information from the past state to estimate the current detection. Once a lane is detected, it is identified by points on both lane boundaries. Lane tracking takes the previously identified lane points, updates them depending on the vehicle projection, and then adjusts them based on the values of the left and right edge points of the lane. A curve fitting model was selected for efficient tracking because it requires less computational time and is less vulnerable to distortion. Higher-degree equations can provide a closer fit to the lane. In this work, a curve fitting model is employed that can trace curved roads and is robust to noise, shadows, and weak lane markings. It can also give details about lane orientation and curvature. Lane tracking involves two parameters, which are discussed in the next section.
4.1 Sliding Window To construct the sliding windows, the initial point of the windows must be known. To find it, a histogram of the bottom part of the image is calculated. The initial window is selected based on the peak value of the histogram, and the mean of the nonzero points inside the window is determined. The left half of the image gives the peak of the left lane, and the right half gives the peak of the right lane. The left and right starting windows are formed in this way, and the left and right lane centers are then calculated. This selection works well when the two lanes lie on the left and right sides of the image. In some cases, for example when the vehicle is gradually steered toward the right, the right lane may appear in the left half, and improper detection is possible. To avoid this, a cache variable is defined to save the starting window positions of previous lanes. The histogram is calculated only for the first few frames rather than throughout the detection process; afterward, the starting points are tracked dynamically using the cache. For each initial sliding window, the mean of the points inside the window is calculated. Two windows to the left and right of the mean point and three more windows above it are selected as the next sliding windows. This selection helps detect sharp curves and dashed lines. The selection of sliding windows is shown in Fig. 4. The window width and height are fixed depending on the input dataset, and the width should be adjusted to the distance between the two lanes. The windows on top help track lane points turning left and right, respectively. The window size must be well tuned so that the left- and right-curved lanes are not tracked interchangeably when the lanes turn sharply and become horizontally parallel to each other.
The detected points inside each sliding window are saved. Finding the mean point and the next set of sliding windows from the valid points inside the current windows continues, for the left and right lanes separately, until no new lane points are detected. Points already detected in previous sliding windows are discarded when searching the next set of windows.
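The window search can be sketched in a simplified form. The sketch below stacks one window per level and re-centers it on the mean column of the pixels found; it deliberately omits the side windows, the three upper windows, and the cache described above, so it should be read as an illustration of the re-centering idea only. Parameter names and default values are assumptions.

```python
def sliding_window_search(binary_image, base_col, n_windows=4, margin=2, min_pixels=1):
    """Collect (row, col) lane pixels by stacking windows from the bottom of the image upward."""
    rows, cols = len(binary_image), len(binary_image[0])
    window_height = rows // n_windows
    centre = base_col
    lane_pixels = []
    for w in range(n_windows):
        row_hi = rows - w * window_height             # exclusive lower edge of this window
        row_lo = row_hi - window_height
        col_lo = max(centre - margin, 0)
        col_hi = min(centre + margin + 1, cols)
        found = [(r, c) for r in range(row_lo, row_hi)
                 for c in range(col_lo, col_hi) if binary_image[r][c]]
        lane_pixels.extend(found)
        if len(found) >= min_pixels:                  # re-centre the next window on the mean column
            centre = round(sum(c for _, c in found) / len(found))
    return lane_pixels
```

Because each window follows the mean of the previous one, the search can track a lane that drifts sideways as it moves up the image, provided the drift per window stays within the margin.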
Fig. 4 Sliding window output
4.2 Polynomial Fit Curve Lane Once the left and right lane points are detected, a polynomial is fitted to each set of points to model the respective lane. The polynomial coefficients are averaged over the past few frames to smooth out intermittent frames that may contain unreliable lane information. The lane starting points are retrieved from the fitted polynomial, which increases confidence in the starting points compared with relying on the starting sliding windows alone. The deviation of the vehicle from the center of the lanes is estimated. Then, the image is inverse perspective transformed, and the lanes are drawn onto the input image. The sliding window output is shown in Fig. 4.
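The frame averaging and the vehicle deviation can be sketched as follows. The number of frames kept, the class and function names, the sign convention, and the assumption that the camera sits on the vehicle's center line are all illustrative choices that the text does not specify.

```python
from collections import deque


class LaneSmoother:
    """Average the last n polynomial fits (A, B, C) to damp unreliable frames."""

    def __init__(self, n_frames=5):
        self.history = deque(maxlen=n_frames)   # oldest fit is dropped automatically

    def update(self, coeffs):
        self.history.append(tuple(coeffs))
        n = len(self.history)
        return tuple(sum(c[k] for c in self.history) / n for k in range(3))


def vehicle_offset(left_x, right_x, image_width, metres_per_pixel):
    """Signed distance of the image centre from the lane centre (positive: right of centre)."""
    lane_centre = (left_x + right_x) / 2.0
    return (image_width / 2.0 - lane_centre) * metres_per_pixel
```

Calling `update` once per frame returns the smoothed coefficients, so a single poorly fitted frame shifts the drawn lane only slightly.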
4.3 Lane Design Parameters The curve model is obtained for the lane curve, and the quadratic equation is used to analyze and compare the merits and demerits of the different model structures. The equation of the curve model is given as

$$Ax^2 + Bx + C = 0 \qquad (1)$$

where A, B, and C are the constants of the quadratic curve.
Lane Vehicle Detection and Tracking Algorithm …
5 Vehicle Detection and Tracking Algorithm 5.1 Vehicle Detection Algorithm Vehicle detection is implemented with a support vector machine (SVM) classifier. To extract features, histogram of oriented gradients (HOG), histogram, and binary spatial features are computed on the training images and the input images. The processed image is then converted into the YCbCr color space, which increases the brightness. The training images are fed into the SVM, and the data are normalized to approximately the same scale. The GTI vehicle image dataset, comprising 8792 vehicle images and 8968 non-vehicle images, is used to train the SVM classifier, and the trained model is stored in a pickle file. For vehicle detection, a sliding window technique is applied at the pixel level, and the trained classifier searches for vehicles in each window. After training is completed, the support vector machine classifier is applied to the lane images. According to [13], the support vector machine classifier is a simple and efficient technique for classifying vehicles based on their features. To eliminate false positives, a heat map function with a higher threshold value is used. The algorithm was simulated using PyCharm software.
5.2 Histogram of Oriented Gradients A histogram of oriented gradients is a representation of an image that simplifies it by discarding unimportant information while keeping the edge structure. The histogram of oriented gradients (HOG) is employed in image processing to detect objects. The technique counts the occurrences of gradient orientations in localized regions of the image. Vehicle appearance and shape can be characterized by the position and direction of edges. Figure 6 shows the extracted HOG features.
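The counting of gradient orientations can be sketched for a single cell as follows. This is a simplified illustration, assuming a grayscale cell stored as nested lists, central-difference gradients, nine unsigned orientation bins, and magnitude-weighted votes; a full HOG descriptor additionally interpolates votes between bins and normalizes over blocks of cells, which is omitted here. The name `hog_cell_histogram` is hypothetical.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against a rounding result of exactly 180.0.
            hist[min(int(angle // bin_width), bins - 1)] += magnitude
    return hist


# A vertical edge votes only into the first (0-20 degree) bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
edge_hist = hog_cell_histogram(vertical_edge)
```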
5.3 Histogram Graph The histogram graph counts the number of pixels at each intensity value in the vehicle image. A color histogram gives this distribution for every color channel. The luminance histogram shows the distribution of brightness levels from black to white. The highest peak on the graph indicates the luminance level at which the most pixels occur. Figure 7 shows the histogram output.
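As a small illustration of the histogram feature, the sketch below counts pixels per intensity bin for one channel. The bin count of 32 and the 0 to 255 value range are assumptions, and `intensity_histogram` is a hypothetical name; for a color histogram the same function would be applied to each channel and the results concatenated.

```python
def intensity_histogram(channel, bins=32, max_value=256):
    """Count the pixels of one image channel that fall into each intensity bin."""
    hist = [0] * bins
    bin_size = max_value / bins
    for row in channel:
        for value in row:
            hist[min(int(value / bin_size), bins - 1)] += 1
    return hist


# Tiny assumed channel: three dark pixels, one mid-gray, one near-white.
channel = [[0, 0, 255], [128, 7, 8]]
hist = intensity_histogram(channel)
peak_bin = max(range(len(hist)), key=hist.__getitem__)
```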
5.4 Binary Spatial Feature Our technique encodes the spatial variation between a reference pixel and its neighboring pixels, which depends on abrupt gray-level changes in the horizontal, vertical, and oblique directions. The difference between the center pixel and its surrounding neighbors is calculated to reflect the amplitude information of the entire image. We used a support vector machine classifier that uses spatial information to classify the lane images, and the necessary features are identified for each pixel. The features are then quantized to train the support vector machine model. After that, the resulting regions are modeled using statistical summaries of their textural and shape properties, and the support vector machine model is used to calculate the classification maps. Figure 8 shows the binary spatial graph.
5.5 Vehicle Detection Flowchart See Figs. 5, 6, 7, and 8.
Fig. 5 Vehicle detection flowchart: Captured image → Features extraction → YCbCr color transformation → Support vector machine classifier → Sliding window to detect vehicle → Detected vehicle image
Fig. 6 HOG features extraction
Fig. 7 Histogram graph
Fig. 8 Binary spatial graph
5.6 Support Vector Machine A support vector machine classifier is a machine learning approach that separates data into two classes [11]. The SVM classifier is trained on a set of labeled data from each category and is then used to classify the vehicle. The support vector machine algorithm finds a hyperplane in N-dimensional space that classifies the data points. Support vector machines, also called large-margin separators, are supervised learning methods formulated to solve classification problems. The SVM technique is a two-class classifier that separates positive values from negative values. It is based on a hyperplane that separates the different values so that the margin is as large as possible. Training the SVM consists of selecting the support vectors, which are the discriminating vectors, from which the hyperplane is estimated.
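Once a linear SVM has been trained, classification reduces to evaluating which side of the hyperplane a feature vector lies on. The sketch below assumes a hyperplane w·x + b = 0 that is already given (the weights are made up for illustration, not learned from the GTI dataset) and shows the decision rule, the margin width 2/‖w‖, and how support vectors can be recognized as the samples lying exactly on the margin. All function names are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, samples, tol=1e-9):
    """Samples (x, y) that lie on the margin, i.e. y * (w.x + b) == 1."""
    return [x for x, y in samples if abs(y * decision_value(w, b, x) - 1.0) <= tol]


# Assumed toy problem: class +1 has first coordinate >= 4, class -1 has <= 2.
w, b = (1.0, 0.0), -3.0
samples = [((2.0, 1.0), -1), ((4.0, 5.0), 1), ((6.0, 0.0), 1), ((0.0, 0.0), -1)]
on_margin = support_vectors(w, b, samples)
```

Only the two samples on the margin determine the hyperplane here; moving the other two farther away would leave `w` and `b` unchanged.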
5.7 Heat Map Function In the given image, overlaps are detected for each of the two vehicles, and two frames exhibit a false positive detection on the center of the road. We intend to build a heat map combining overlapping detections and removing false positives. For this purpose, a heat map of a higher threshold limit is used.
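The heat map step can be sketched as an accumulation of detection boxes followed by a threshold. The box format (x0, y0, x1, y1) with exclusive upper bounds, the threshold value, and the function names are assumptions made for this illustration; the source does not state the threshold that was used.

```python
def build_heatmap(height, width, boxes):
    """Add one unit of heat for every detection box covering a pixel."""
    heat = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:  # box covers columns x0..x1-1 and rows y0..y1-1
        for y in range(y0, y1):
            for x in range(x0, x1):
                heat[y][x] += 1
    return heat


def apply_threshold(heat, threshold):
    """Zero out pixels whose heat does not exceed the threshold."""
    return [[v if v > threshold else 0 for v in row] for row in heat]


# Two overlapping detections of one vehicle plus one isolated false positive.
boxes = [(0, 0, 4, 4), (2, 2, 6, 6), (8, 8, 10, 10)]
heat = build_heatmap(10, 10, boxes)
hot = apply_threshold(heat, threshold=1)
hot_pixels = sum(1 for row in hot for v in row if v > 0)
```

With a threshold of 1, only the region where at least two detections overlap survives, so the isolated box is discarded as a false positive.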
6 Experimental Result 6.1 Lane Vehicle Detection for Video Frames from Xi’an City Dataset The present section details the experimental results of our lane vehicle detection method with two sets of various video frames obtained from the Xi’an city dataset. Frames in this dataset have shadows from trees and cracks on the surface of the roads. Figure 9a, b, c, d shows some sample frames marked with lanes and vehicles for the dataset. When all frames in the dataset are processed, we see that our holistic detection and tracking algorithm has 95.83% accuracy in detecting the left lane and vehicle. The proposed sliding window model was tested using a dataset with different driving scenes to check the adaptiveness and effectiveness. The results showed that the proposed sliding window algorithm can easily identify the lanes and vehicles in various situations and it is possible to avoid wrong identifications. In analyzing the parameters, different window sizes were found to make improvements on the performance of lane and vehicle detection.
6.2 Time Calculation of Xi’an City Database To assess the computational complexity of the proposed hybrid lane vehicle detection and tracking algorithm, we first computed the time required to fully process a single frame of size (1280 × 720). For a (1280 × 720) frame, processing time was found to be around 3 to 4 s/frame. To operate in real time, time for calculation is an important parameter (Table 1).
Table 1 Time calculation

| Images | Computation time (seconds) |
| --- | --- |
| Figure 9a | 3.64 |
| Figure 9b | 3.75 |
| Figure 9c | 3.79 |
| Figure 9d | 3.45 |
Fig. 9 a Output frame of Xi’an city database. b Output frame of Xi’an city database. c Output frame of Xi’an city database. d Output frame of Xi’an city database
Table 2 Accuracy calculation comparison

| Performance (%) | Dataset |
| --- | --- |
| Total frames | 975 |
| MLD | 1.8 |
| ILD | 2.37 |
| Accuracy | 95.83 |

Table 3 Accuracy comparison

| Source | Accuracy (%) |
| --- | --- |
| Literature [8] | 93 |
| Literature [10] | 95.35 |
| Literature [13] | 94.05 |
| Proposed algorithm | 95.83 |
6.3 Accuracy Formula
Missed Lane Vehicle Detection: $\mathrm{MLD} = (MD/N) \times 100\%$
Incorrect Lane Vehicle Detection: $\mathrm{ILD} = (ID/N) \times 100\%$
Detection Rate: $\mathrm{DR} = (C/N) \times 100\%$
where MD denotes the number of missed detections, ID the number of incorrect detections, C the number of images detected correctly in the dataset, and N the total number of dataset images.
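The three rates follow directly from the frame counts. The helper below is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the experiment.

```python
def detection_metrics(missed, incorrect, correct, total):
    """Return MLD, ILD and DR as percentages of the total number of images."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "MLD": 100.0 * missed / total,     # missed lane vehicle detection
        "ILD": 100.0 * incorrect / total,  # incorrect lane vehicle detection
        "DR": 100.0 * correct / total,     # detection rate
    }


# Hypothetical counts: 2 missed, 3 incorrect, 95 correct out of 100 frames.
metrics = detection_metrics(missed=2, incorrect=3, correct=95, total=100)
```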
6.4 Accuracy Calculation The GTI vehicle image dataset, comprising 8792 vehicle images and 8968 non-vehicle images, was used to train the SVM classifier, and its accuracy was calculated as 99%. In this paper, different scenes are selected from the video as samples to test the accuracy. A sequence of 975 frames was tested, 934 lane vehicle frames were correctly identified, and 95.83% accuracy was obtained by the curve fitting model (Tables 2 and 3).
7 Conclusion and Future Work Lane vehicle detection and tracking is an important application for reducing the number of accidents. The algorithm was tested under different conditions to make the transport system more robust and effective.
For lane detection, we described and implemented the HLS color space and edge detection using the Sobel filter. Then, we analyzed the curve fitting algorithm for efficient lane detection. For vehicle detection and tracking, a support vector machine classifier and sliding window techniques were used. For our dataset, the accuracy was calculated as 95.83%, and the computation time was 3–4 s/frame. In the future, we will improve the lane and vehicle detection system by reducing the computation time of the proposed algorithm, so that lanes and vehicles can be detected efficiently in real time. This algorithm can be further developed for self-driving vehicles.
References
1. Marzougui M, Alasiry A, Kortli Y, BailI J (2020) A lane tracking method based on progressive probabilistic Hough transform. IEEE Access 8:84893–84905, 13 May 2020
2. Xuan H, Liu H, Yuan J, Li Q (2018) Robust lane-mark extraction for autonomous driving under complex real conditions. IEEE Access 6:5749–5766, 9 Mar 2018
3. Xiong H, Yu D, Liu J, Huang H, Xu Q, Wang J, Li K (2020) Fast and robust approaches for lane detection using multi-camera fusion in complex scenes. IET Intell Trans Syst 14(12):1582–1593, 19 Nov 2020
4. Wang X, Yan D, Chen K, Deng Y, Long C, Zhang K, Yan S (2020) Lane extraction and quality evaluation: a hough transform based approach. In: 2020 IEEE conference on multimedia information processing and retrieval (MIPR), 03 Sept 2020
5. Li J, Shi X, Wang J, Yan M (2020) Adaptive road detection method combining lane line and obstacle boundary. IET Image Process 14(10):2216–2226, 15 Oct 2020
6. Stević S, Dragojević M, Krunić M, Četić N (2020) Vision-based extrapolation of road lane lines in controlled conditions. In: 2020 zooming innovation in consumer technologies conference (ZINC), 15 Aug 2020
7. Wang X, Qian Y, Wang C, Yang M (2020) Map-enhanced ego-lane detection in the missing feature scenarios. IEEE Access 8:107958–107968, 8 June 2020
8. Wang H, Wang Y, Zhao X, Wang G, Huang H, Zhang J (2019) Lane detection of curving road for structural high-way with straight-curve model on vision. IEEE Trans Veh Technol 68(6):5321–5330, 26 Apr 2019
9. Vatavu A, Danescu R, Nedevschi S (2015) Stereovision-based multiple object tracking in traffic scenarios using free-form obstacle delimiters and particle filters. IEEE Trans Intell Trans Syst 16(1):498–511
10. Lim KH, Seng KP, Ang LM et al (2019) Lane detection and Kalman-based linear parabolic lane tracking. In: International conference on intelligent human-machine systems and cybernetics, pp 351–354
11. Kang DJ, Choi JW, Kweon IS (2018) Finding and tracking road lanes using line-snakes. In: Proceedings of the conference intelligent vehicles, pp 189–194
12. Wang Y, Teoh EK, Shen D (2014) Lane detection and tracking using B-snake. Image Vis Comput 22(4):269–280
13. Cortes C, Vapnik V (2020) Support vector networks. Mach Learn 20(3):273–297
14. Zhang X, Huang H (2019) Vehicle classification based on feature selection with anisotropic magnetoresistive sensor. IEEE Sens J 19(21):9976–9982, 15 July 2019, 1 Nov 2019
15. Gopalan R, Hong T, Shneier M et al (2019) A learning approach toward detection and tracking of lane markings. IEEE Trans Int Transp Syst 13(3):1088–1098
A Survey on Automated Text Summarization System for Indian Languages P. Kadam Vaishali, B. Khandale Kalpana, and C. Namrata Mahender
Abstract Text summarization is the process of finding specific information after reading the document text and generating a short summary of it. Text summarization has various applications. It is important when quick access to information is needed instead of reading the whole text. It has become an essential tool for many applications, such as newspaper reviews, search engines, market demands, medical diagnosis, and quick reviews of the stock market. It provides the required information in a short time. This paper is an attempt to summarize and present the state of text summarization for Indian regional languages. The two major approaches to automatic text summarization, i.e., extractive and abstractive, are discussed in detail. The techniques for summarization range from structured to linguistic approaches. Work has been done for various Indian languages, but the resulting systems are not yet efficient at generating powerful summaries. Summarization has not yet reached a mature stage. Research in this area has progressed strongly for the English language. However, research on Indian language text summarization is scarce and still in its beginning. This paper provides the present research status, or an abstract view, of automated text summarization for Indian languages. Keywords Automated text summarization · Natural language processing (NLP) · Extractive summary · Abstractive summary
P. K. Vaishali (B) · B. K. Kalpana · C. N. Mahender Department of Computer Science and I.T, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India C. N. Mahender e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_67
P. K. Vaishali et al.
1 Introduction The need for automatic summarization increases as the amount of textual information increases. Unlimited information is available on the Internet, but sorting out the required information is difficult. Automated text summarization is the process of developing a computerized system that can generate an extract or abstract from an original document and present that information in the form of a summary. The need for summarization has increased due to these unlimited sources. Summarization is useful in information retrieval, for example for news article summaries, email summaries, mobile messages, business and office information, and online search. There are numerous online summarizers accessible, such as Microsoft News, Google, and Columbia Newsblaster [1]. For biomedical summarization, BaseLine, FreqDist, SumBasic, MEAD, AutoSummarize, and SWESUM are utilized [2]. Online tools include Text Compacter, Sumplify, Free Summarizer, WikiSummarizer, and Summarize Tool. Open-source summarizing tools include Open Text Summarizer, Classifier4J, NClassifier, and CNGL Summarizer [3]. As the need for knowledge in abstract form has grown, so has the necessity for automatic text summarization. The first summarizing method was introduced in the late 1950s. The automatic summarizer chooses key sentences from the source text and condenses them into a concise form for the general subject. It reduces the time needed to comprehend the information in a huge document [4]. Automatic text summarization is a well-known application in the field of Natural Language Processing (NLP). The majority of the work in this field is focused on sentence extraction and statistical analysis. However, recent research trends focus on cue phrases and discourse structure. Text summarization is broadly classified into two types: extractive summarization and abstractive summarization.
The extractive approach takes essential lines or phrases from the original text and puts them together to provide a summary that retains the original meaning [5]. Reading and comprehending the source text is required for abstractive summarization. It employs linguistic components and grammatical rules from the language. The abstractive system can produce additional sentences, which improves the summary’s quality or standard.
1.1 Need of Automatic Text Summarization in NLP Manual summarization of a large text document is a difficult task and requires considerable time. A text summarizer is an essential tool for understanding the text and then generating the summary. For this reason, an automatic summarizer is very much required to provide a quick view in the form of a concise summary. It is the need of the current era of information overload. The automatic summarizer converts a large document into a shorter version while preserving its overall content and meaning.
A Survey on Automated Text Summarization System …
1.2 Challenges in Abstractive Summarization The purpose of abstractive summarization in Natural Language Processing is to provide a short summary of a source text. Representing the summary with suitable linguistic components is the most difficult aspect of abstractive summarization. Sentence structures must be developed with appropriate words and phrases in order to convey accurate meaning. However, representing such a large volume of text is a challenging task and a constraint. Proper representations of significant items can be created with the help of linguistic norms and expertise. In practice, however, language is used in a variety of ways and generally depends on domain semantics.
1.3 Challenges in Extractive Summarization Extractive text summarization is used to choose the most important sentences from the original source. The relevant sentences are extracted by combining statistical and language-dependent characteristics of sentences. Extractive summaries are chosen in most instances around the world since they are simple to implement. The problem with extractive systems is that the summaries are long and may contain information that isn’t necessary for the summary. The crucial information is dispersed across the document or in several text sections [6].
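A purely statistical extractive summarizer can be sketched in a few lines, which also shows why such systems are simple to implement. The sketch assumes English text, a tiny hand-written stop word list, and average word frequency as the only sentence feature; a system for an Indian language would need its own tokenizer, stop words, and language-dependent features. The names `split_sentences`, `content_words`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "it"}  # assumed list


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


text = "Cats chase mice. Dogs chase cats and mice. Birds sing."
summary = summarize(text, max_sentences=2)
```

Because whole sentences are copied unchanged, the output can still carry redundant detail, which is the limitation noted above.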
2 Literature Survey To study automatic text summarization systems, a survey of past literature was carried out to gain specific knowledge and to identify the scope of the application. Table 1 gives a brief history of the past literature.
2.1 Types of Summarization Different applications require different forms of summaries. Summarizer systems can be classified by the type of summary the application requires. There are two types of summarizer systems: extractive and abstractive. The table below summarizes the key concepts of extractive and abstractive summarization in brief (Table 2). In addition to extractive and abstractive, various other types of summaries exist. Different summarization methods are used based on the type of summary
Table 1 Automatic text summarization systems for Indian languages

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Sivaganesan et al. [7] | Social networks influence analysis | Interest based parallel algorithm, semantics structure based, network, partitioned graph with page rank | Interactive behavior of the user is weighted dynamically | Parallelism in large networks is a difficult task | Social influence analysis algorithm enables identifying influential users, implementing the machines with CPU architecture and community structure |
| Valanarasu et al. [8] | Summarization of social media data for personality prediction using machine learning and A.I | Prediction of personality of job applicants, Naïve Bayes, and SVM probability prediction models | Job applicants data, collection of the various dataset from different social media sites | If job applicants are non-social media users, proposed model cannot be used | Digital footprint used for prediction of people through communication, sentiments, emotions, and expectations to their data |
| Sinha et al. [9] | Extractive multi-document summarization (Malayalam) | Single document, sentence encoding, TextRank, MMR, sentence scoring algorithm | Data set with 100 document sets, each set with three news articles, TF-IDF, Word2Vec and Smooth Inverse Frequency (SIF), TextRank | Lack of standard Malayalam NLP tools, problem in multi-document summarization | ROUGE-1 and ROUGE-2 based evaluation calculated at 0.59, 0.56, 0.57% for Precision Recall F-Score, respectively |
| Malagi et al. [10] | Survey on automatic text summarization | Extraction and abstraction, LSA, HMM, SVM, DT models, clustering algorithms | Own data sets. Sentence scoring features | Lack of multi-document summarizers due to tools and sources | It is an effort to bridge the gap in researches in the development of text summarizers |
Table 1 (continued)

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Manju et al. [11] | Abstractive text summarization for Sanskrit prose | Abstractive machine learning approaches, graph-based method | Structure, semantic features, word significance, compounds and sandhis, verb usage diversity | Predefined structures may not result in a coherent or usable summary | Semantic approach improves better analyzed summary |
| Verma et al. [12] | Extractive and abstractive summarization methods and NLP tools for Indian languages | Stop words list, stemmer, NER sentiment analyzer, wordNet, word vector, segmentation rules, corpus | 100 news articles, NLP tools as stemmer, PoS tagger, parser, named entity recognition system, etc. | Unavailability of resource for language understanding and generation | NLP tools are essential for summarizing the text accurately |
| Mamidala et al. [13] | Automatic text summarization techniques, text mining | Extraction and abstraction, Text Rank Algorithm, TF and IDF, OCR, K-Nearest Neighbor and Naïve Bayes Classifier | Own datasets, sentence length, title similarity, semantic similarities in sentences, ANN, Fuzzy logic | Extractive summaries are not convenient, abstractive sometimes not able to represent meaning | Combination of the preprocessing and processing techniques could give good model for all relevant features |
| Sarkar et al. [14] | A light-weight text summarization system for fast access to medical evidence (English) | Extraction, extrinsic evaluation techniques, MMR | Own corpus of evidence-based medicine with 456 queries. Similarity-based and structural features | Domain knowledge essential | A ROUGE-L F1-score of 0.166 |
| Bhosale et al. [15] | Automatic keyword extraction for e-newspaper text (Marathi) | Keyword extraction algorithm, summarization module | Online e-newspaper article, highest scored words | Limited to Marathi language | Average article length calculated at average of 30% to 40% size of article |
Table 1 (continued)

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Rathod [16] | News articles summarization (Marathi) | Extractive, Text rank for sentence extraction, Graph-based ranking model | Own collection of news article, similarity-based features | Language specific domain dependent | File1 Score is 0.84 File 2 Score is 0.56 Average ROUGE-2 score 0.70 |
| Mohamed et al. [17] | Document clustering (Tamil) | LSA, Multiple document summarization, similarity, clustering | Own database, word weight, sentence feature, length, position, centrality, proper nouns | Language specific | Clustering of Tamil text. Gives good results to generate cohesive summaries |
| Dalwadi et al. [18] | Text summarization using fuzzy logic and LSA | Extractive and abstractive summarization techniques | Various own designed document dataset | Current systems not so efficient to produce summary | Study concludes most researchers used rule-base approaches |
| Kanitha et al. [19] | Comparison of extractive text summarization models | Word and phrase frequency algorithm, Machine learning, HMM, Cluster-based algorithm | Own datasets, sentence ranking methods | Comparing manual summary with machine summary not appropriate | Domain independent generic summary. LSA based systems summarize the large datasets within the limited time |
| Gaikwad et al. [20] | Text summarization overview for Indian languages | Abstractive as well extractive approaches | Researcher used own datasets, news articles, story documents, linguistic, statistical features | Abstraction requires more learning and reasoning | Study gives all about text summarization with its importance |
Table 1 (continued)

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Sarda et al. [21] | Text summarization using neural networks and Rhetorical structure theory | Neural network model, back propagation technique, rhetorical structure theory | Own document collections, sentence ranking, clustering | Difficulties in Neural network training | Numerical data feature and rhetorical structure theory helps to select highly ranked summary sentences |
| Gulati et al. [22] | Study of text summarization techniques | Machine learning techniques, text mining algorithms and semantic technologies | Most of the researchers used their own collection of text corpus as database | Separation of important contents from text is difficult | Two main techniques extraction and abstraction studied for text summarization |
| Ragunath et al. [23] | Ontology based document summarization | Concept extraction algorithm, ontology model | Own database collection, Domain specific features | Genre specific | Accuracy is calculated at 87.03% |
| Deshmukh et al. [24] | Query Dependent Multi-Document Summarization | Multi document, using Feature based and Cluster based Method | News document dataset, Clustering K-means, i.e., Hierarchical, partitioned | Issues regarding limitations of feature and clustering algorithm | Study gives all detail on multi-document summarization |
| Patil et al. [25] | Text summarization using fuzzy logic | Summarization with feature extraction, use of fuzzy rule sets | Own database. Title, Sentence length, position, numerical data, Term weight, sentence similarity features | Transformation of knowledge base into fuzzy rule set is difficult task | Fuzzy logic improves quality of summary. Proposed model given better results as compared to online summary |
Table 1 (continued)

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Babar et al. [26] | Text summarization using Fuzzy Logic and LSA | Extractive summarization, Feature vector algorithm, Fuzzy, Inference model | Own database, direct word matching and sentence feature score | The focus of this paper is narrow | Precision of fuzzy based summary is 86.91% average recall is 41.64% average f-measure is 64.66% |
| Gupta [27] | Survey on summarizer for Indian languages (Punjabi) | Weight learning algorithm and regression | Topic identification, statistical and language dependent features | Lack of techniques of text summarization | Study observed research on summarization is at initial state for Indian languages |
| Deshpande et al. [28] | Text summarization using Clustering | Extractive, Multi-document summarization, document, sentence clustering by K-means | Own collection, sentence scoring, document clustering by cosine similarity | Lack of simplification technique for large and complex sentences | Result compared using precision, recall and F-measure. Clustering reduces redundancy |
| Dhanya et al. [29] | Comparison of text summarization technique for eight different languages | Extractive, Tf-Idf, sentence scoring, graph-based sentence weights | Own collection of documents, LSW similarity weight, sentence score, features | Same set of sentences in English are used for comparing all the methods | Feature selection is important in summary generation |
| Dixit et al. [30] | Automatic text summarization using fuzzy logic | Feature based extraction of important sentences using fuzzy logic, sentence scoring, fuzzy inference rule | 30 documents from news based URLs. Compared with Copernic and MS Word 2007 summarizer | The system is tested only with 30 news document | 81% resemblance with human summary. And similarity in sentence position has got 79% resemblance |
Table 1 (continued)

| Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Prasad et al. [31] | Feature based text summarization | Extraction, sentence scoring, fuzzy algorithm, feature decision module | Own collected dataset, utilizes a combination of nine features to achieve feature scores of each sentence | Limited dataset or documents | Module with 9 and 5 features has better accuracy for precision, recall and f-measure as compared to MS Word |
| Jayashree et al. [32] | Text summarization using sentence ranking (Kannada) | Extractive, key word based summary | Database obtained from Kannada Webdunia news articles. GSS coefficients and IDF, TF for extracting key words | Requirement of human summary from expert | Machine summary compared with Human summary average at 0.14%, 0.11%, and 0.12% for sports, Entertainment, Religious article respectively |
| Siva Kumar et al. [33] | Query-based summarizer | Multi-document, topic-driven summarizer, sentence similarity, word frequency, VSM model | Newswire articles from AQUAINT-2 IR Text Research Collections, TAC 2009 datasets | Need of simplification techniques for very complex and large sentences | Summary can be evaluated using N-gram co-occurrences |
| Das et al. [34] | Opinion summarization (Bengali) | Extractive, single document, theme clustering, relational graph representation | Own dataset, Theme identification using lexical, syntactic, discourse level features | Issue related to sentence ordering. It is important in summarization | Result calculated at precision, recall and F-score which is 72.15%, 67.32%, and 69.65% |
| Agrawal et al. [35] | Resources and techniques development for Marathi, Hindi, Tamil, Gujarati, Kannada | Corpus development for multi-lingual and Multi-document summarization | Own developed corpora using news articles. Linguistic features | More language expertise is the requirement | It introduced different techniques of corpus development |
Table 2 Comparison of extractive and abstractive technique [15, 20, 32]

| Extractive technique | Abstractive technique |
| --- | --- |
| Summary is a collection of extracted sentences | Summary is a collection of meaningful phrases or sentences |
| The extracted sentences follow the order in which they have appeared in the text | For the summary, new sentences or paraphrases are generated |
| It is unnecessary to develop domain knowledge and features | It is necessary to develop domain knowledge and features |
| It produces a summary with specific important sentences as a result | It produces a summary with new sentences showing the theme of the source as a result |
| It is easier to produce expected results | Difficult to achieve the desired results |
| It has a great demand for early research | It has great scope in the present NLP applications |
| Results are based on statistical approach | Results based on linguistic and semantic approach |
| It does not use a framework | It uses encoder and decoder frameworks |
| Most of the work has been based on sentence extraction and statistical analysis | Current research is underway to use cue phrases and discourse structure |
| It extracts important sentences, phrases from the text and groups them to produce a summary | It uses linguistic components, grammatical rules, and significance of the language to write the summary |
| It does not need interpretation | It needs study and analysis for the text interpretation |
| It does not involve reading and understanding of the text | It involves reading and understanding of the text for its meaning |
| Unable to generate new sentences | Ability to generate new meaningful sentences |
| Generated summary is not so standard; it contains repeated sentences | It raises the quality of the summary by reducing sentence redundancy |
| The issue with extraction is coherence | The issue with abstraction is the separation of main content from the text |
and applications. The table below shows the classification of summarization systems by category (Table 3).
2.2 Observed Techniques for Feature Identification Text summarizers identify and extract key sentences from the source and group them properly to generate a concise summary. A list of features must be selected for analysis and for a better understanding of the theme. Some of the features used to select the important content of the text, on which the meaning depends, are given in the table below (Table 4).
Table 3 Text summarization classification [13, 20, 28]
Content | Type | Scope of the summary
Technique | Supervised | Training data or predefined datasets are needed for selecting the main contents from the document
Technique | Unsupervised | No training data is needed. The system summarizes the text automatically
Approach | Statistical | It computes sentence weights from parameters such as Tf-Idf, term frequency, and word count
Approach | Linguistic | It is related to the lexical, semantic, and syntactic features of words. Word dictionaries, POS taggers, word patterns, and n-grams are used for lexical analysis of words
Approach | Machine learning | It uses training datasets and is based on linguistic features. It finds the relevance of the sentence in the summary. Naive Bayes, Decision Trees, HMM, Neural Networks, SVM, etc., are used to extract relevant sentences
Approach | Hybrid | Combination of the features of statistical, lexical, and machine learning based models
Summary information | Extractive | The summary consists of sentences extracted from the source. Methods use text mining approaches
Summary information | Abstractive | The summary consists of the overall meaning or theme of the source and is presented with newly generated sentences. It uses natural language generation tools to derive the summary
Summary information | Real-time | It produces a relevant real-time summary. When new contents are added to the source, the system updates the summary
Details | Informative | Concise information given to the user as a summary
Details | Indicative | Provides the main idea and a quick view of a lengthy document
Contents | Generic | The summary is subject independent or generic in nature
Contents | Query-based | The summary is the result of some question
Limitations | Domain/genre dependent | It only accepts special input such as newspaper articles, stories, manuals, or medical reports. The summary is limited to that input
Limitations | Domain independent | It can accept different types of text for summary generation
Input | Single document | It involves summarization of a single document
Input | Multi-document | Several documents are summarized at a time
Language | Mono-lingual | Input documents are in one specific language, and the output is in the same language
Language | Multi-lingual | It accepts input documents in different languages and generates summaries in different languages
2.3 Observed Preprocessing Methods for Text Summarization
Preprocessing performs basic operations that prepare and simplify the data for further processing. Unstructured data are transformed into a structured form as required by the summarization application. Table 5 lists the preprocessing techniques most widely used in recent years.
2.4 Observed Methods for Text Summarization
Summarizer systems implement different methods that identify and extract the important sentences from the source text and group them to generate the final summary. Tables 6 and 7 list the important methods of extractive and abstractive summarization, respectively.
3 Dataset
From the early days of summarization, most of the work has been done in English. A number of standard English datasets are available for research, such as DUC (Document Understanding Conference), CL-SciSumm (Computational Linguistics scientific document summarization), TAC (Text Analysis Conference), and the TIPSTER Text Summarization Evaluation Conference (SUMMAC). These datasets are used to test systems and to perform experimental research. For Indian languages, however, no proper datasets are available to researchers. Most of the data are collected from newspapers and medical documents, or researchers collect their own datasets in the respective languages. The researchers design the corpus according to the specifications of their system.
Table 4 Text summarization features [19, 20, 30]
Feature | Description
Term frequency | The number of times a word is repeated in the sentence
Word location | The importance of a word is found from its position
Title word | Identification of the title or theme of the source
Sentence location | The first and last sentences of a paragraph are important for inclusion in the summary
Numerical data | Presence of numerical data in the sentence
Sentence to sentence similarity | For each sentence S, the similarity between S and every other sentence is computed by token matching
Sentence length | It measures the size of sentences to flag very long or very short sentences
Cue phrases | Indicative words that show a positive or negative sense
Upper-case word | Sentences containing acronyms or proper names are included in the summary. Some languages are an exception to this
Keywords | Most important words from the source document
Similarity ratio | The similarity between the sentence and the title of the document
Clauses | The key textual elements present in the source
Paragraphs | Each paragraph is used to discuss a single point
Sentence weight | The ratio of the size of paragraphs
Proximity | Determining factor in the formation of relationships between entities
Sentence boundary | A dot, comma, or semicolon indicates the end of a sentence
Stop words | Frequently occurring words that do not affect the meaning of the sentence
Proper noun | A lexicon representing the name of a person, a place, or an organization
Thematic words | Domain specific words with maximum possible relativity in the sentence
Term weight | The ratio of the summation of term frequencies of all terms in a sentence over the maximum of the summation values of all sentences in a document
Information density | Factors for determining the relevance of sentences and their meaning
Font based feature | Words appearing in upper case, bold, italic, or underlined fonts are usually important
Biased word feature | If a word in a sentence is from the biased word list, then that sentence is important. These are domain specific words
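As a rough illustration of how a few of the features in Table 4 could be turned into numbers, the following Python sketch computes the term weight (following the definition in the table), a title similarity ratio, the sentence location flag, and the numerical data flag. The function names, the regex tokenizer, and the use of Jaccard overlap for the similarity ratio are assumptions made for this example and are not taken from the surveyed systems.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens from a simple regex tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def sentence_features(title, sentences):
    """Return one feature dictionary per sentence."""
    tokens = [tokenize(s) for s in sentences]
    # Term frequencies counted over the whole document.
    tf = Counter(w for toks in tokens for w in toks)
    # Summed term frequency of each sentence.
    sums = [sum(tf[w] for w in toks) for toks in tokens]
    max_sum = max(sums) if sums and max(sums) > 0 else 1
    title_words = set(tokenize(title))
    features = []
    for i, toks in enumerate(tokens):
        words = set(toks)
        union = words | title_words
        features.append({
            # Term weight: summed tf of the sentence over the maximum sum in the document.
            "term_weight": sums[i] / max_sum,
            # Similarity ratio: token overlap with the title (Jaccard, an assumption).
            "title_similarity": len(words & title_words) / len(union) if union else 0.0,
            # Sentence location: first and last sentences are flagged.
            "is_first_or_last": i in (0, len(sentences) - 1),
            # Numerical data: the sentence contains at least one digit.
            "has_number": any(ch.isdigit() for ch in sentences[i]),
        })
    return features
```

A real system would combine such feature values into a single sentence score, for example as a weighted sum; the weights would have to be tuned per language and domain.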
Table 5 Preprocessing techniques for text summarization [19, 29]
Procedure | Purpose
Stop word removal | Removal of frequently occurring words that have little linguistic importance in the sentence
Text validation | Confirmation that the text is in the required language script and is correct in grammar and spelling
Sentence selection | Selection of sentences for generation of the summary
Sentence segmentation | Sentences are broken into individual words
Term weight | A statistical measure of the weight of a word considered for the summary
Word frequency | Counting the number of occurrences of a word in the sentence
Stemming | Removal of suffixes from a word to get the stem or base word
Sentence tokenization | Paragraph text is split into individual sentences
Word tokenization | Sentences are split into individual tokens or words
Lemmatization | Removal of suffixes to generate the base word or lemma
Morphological analysis | Investigation of the structure of the language in its linguistic and semantic aspects
Text normalization | Simplification of text by stemming or lemmatization to improve its quality
Paragraph segmentation | Paragraph text is divided into individual sentences
POS tagging | Labeling of words with their parts of speech
Chapter segmentation | It separates the text document into chapters
Set of proper nouns | A collection of all the proper nouns extracted from the source
Bag of words/tags | A vector space model in which each sentence is described by its tokens and each appearance of a word is counted regardless of its order
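Several of the procedures in Table 5 can be chained into a small pipeline. The sketch below is a minimal Python illustration of sentence tokenization, word tokenization, stop word removal, and stemming. The stop word list, the suffix list, and all function names are hypothetical choices for this example; practical systems use language-specific resources and a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop word list (an assumption); real lists are language specific.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "it"}
# Naive suffix list for stemming (an assumption); this is not a full stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def sentence_tokenize(paragraph):
    """Split a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    """Split a sentence into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(paragraph):
    """Return a list of stemmed, stop-word-free token lists, one per sentence."""
    return [
        [stem(t) for t in remove_stop_words(word_tokenize(s))]
        for s in sentence_tokenize(paragraph)
    ]
```

The output of `preprocess` is the structured form (one token list per sentence) on which feature computation and sentence scoring can operate.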
4 Proposed Methodology
The literature review shows that various methodologies are followed in the development of text summarization systems. Figure 1 shows a general architectural view of text summarization. Most systems follow a few important steps to reach the target summary after the textual content is selected.
4.1 Major Steps for Text Summarization
Text summarization can be done by following the steps shown in Fig. 1.
(a) Input Text Document: The source document is given as input to the system.
Table 6 Extractive text summarization methods [20, 23, 29]
Technique | Description | Features | Example
Tf-Idf model | It is based on term frequency and inverse document frequency | Term frequency and inverse document frequency counts for sentence scoring | A document has 100 words, and the word cat appears 3 times. The term frequency (tf) for cat is then 3/100 = 0.03. Suppose there are 10 million documents and the word cat appears in 1000 of them. Then Idf = log(10,000,000/1000) = 4. Thus, the Tf-Idf product is 0.03 × 4 = 0.12
LSA (Latent semantic analysis) method | Semantic representation of terms, sentences, or documents | Semantic features | Useful for selection of sentences on the basis of their contextual use
Neural network-based model | Summary by selecting the most important words from the input sentence | Linguistic features to generate a grammatical summary. It requires advanced language modeling techniques | Artificial semantic networks have been modeled using graph theory
Fuzzy logic based model | Fuzzy model, fuzzy rules, and triangular membership function | The fuzzy rules are in the form of IF–THEN. It fuzzifies at low, medium, and high values | Models are useful to determine whether a sentence is important, unimportant, or average
Query-based model | Sentences in a given document are scored based on the query | Sentence feature extraction | Used in frequency counts of terms
Machine learning model | Training the machine to learn using experience | Linguistic and statistical features for relevance of sentence | Useful naive Bayes algorithm
Graph theoretic model | It is based on the identification of the themes | Semantic relationship or features | Useful for knowledge representation
Cluster-based model | Document clustering based model | Document clustering to generate a meaningful summary | Cluster models generally used for grouping and classification, like the k-means algorithm
Statistical-based | Words and sentences are counted for number of occurrences | Statistical features, word, phrase, keywords, and sentence length | It is useful to find the concept relevance
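The Tf-Idf example in Table 6 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the base-10 logarithm is assumed because it reproduces the value Idf = 4 given in the table.

```python
import math

def term_frequency(term_count, total_words):
    """tf: occurrences of the term divided by the number of words in the document."""
    return term_count / total_words

def inverse_document_frequency(total_docs, docs_with_term):
    """idf with a base-10 logarithm, matching the worked example in Table 6."""
    return math.log10(total_docs / docs_with_term)

def tf_idf(term_count, total_words, total_docs, docs_with_term):
    """Product of tf and idf for one term in one document."""
    return (term_frequency(term_count, total_words)
            * inverse_document_frequency(total_docs, docs_with_term))

# Values from Table 6: "cat" appears 3 times in a 100-word document
# and in 1000 of 10 million documents; the result is approximately 0.12.
score = tf_idf(3, 100, 10_000_000, 1000)
```

For sentence scoring, such term scores would typically be summed or averaged over the words of each sentence.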
Table 7 Abstractive text summarization methods [20]
Techniques | Description | Features | Advantages
Machine learning | A popular method that trains the machine by experience | Linguistic and statistical features | It has a predefined training dataset
Topic modeling | The important information in the text is identified | Statistical and structure features like position, cue phrases, word frequency | Useful for discourse segmentation
Rule-based method | This method is based on grammar rules and predefined data sets | Linguistic features like part-of-speech, suffix removal, etc. | Rule sets for correct identification of word type
Fuzzy based | It is based on properties of a text such as similarity to the title and keyword similarity | Fuzzy inference rules, sentence length | Used as a semantic analysis model
Neural network model | Training the neural networks to learn the types of sentences that should be included in the summary | Sentence scoring | Used as vector space model
Ontology-based | Domain specific systems designed by domain experts | It gives a cosine distance between the feature vectors of the sentence and its category | Useful for identification of the word which has high weight for a particular domain
Tree based | It uses a dependency tree for representation of text | It creates nodes of important sentences for the given text document | Easy to generate summary
Template based | Templates are used for the representation of the whole document | Linguistic features or extraction rules matched to identify text that is mapped into template slots | It enhances the quality of the summary
Lead and body phrase | It is based on the operations of phrases that have the same syntactic head chunk in the lead and body sentences | Semantic features used to rewrite the lead sentence | It is easy to identify the important sentences
Information-item-based method | It is an abstract representation of source documents | Linguistic features to generate abstract summary | System produces short, coherent, content rich summary
Multimodal semantic model | It is a semantic model that captures concepts and their relationships | Semantic features, useful to generate an abstract summary | Represents the contents of multimodal documents
Semantic graph based method | It summarizes a document by creating a rich semantic graph (RSG) | Semantic features | Used for single document summarization
Query-based | It generates a summary of the text based on the query | Sentence scoring based on the frequency counts | Useful to generate precise summary
Fig. 1 General architectural view of the text summarization system. Stages shown: Input Text Document; Preprocessing (special characters removal, removing stop words, stemming, sentence tokenization, word tokenization, POS tagging); Processing (feature extraction, feature weight computation, theme identification, selection of salient content, compute similarity and rank sentences); Generate Extract or Abstract Summary
(b) Preprocessing: Basic operations are performed to normalize the text. This step is important for selecting text in a particular script.
(c) Processing: The text is processed to select content and to extract the most important sentences using the features.
(d) Theme Identification: The most important information in the text is identified, using techniques such as word position, sentence length, and cue phrases.
(e) Interpretation: This step is important for abstractive summarization. Different techniques are employed to form generalized content from the source text.
(f) Generate Extract or Abstract Summary: The extract or abstract summary is generated as output.
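The steps listed above can be sketched end to end for the extractive case. The following Python example is a minimal, frequency-based illustration and not the method of any surveyed system: the stop word list, the scoring rule (average word frequency per sentence), and the function name `summarize` are all assumptions. The interpretation step, which applies to abstractive summarization, is not covered.

```python
import re
from collections import Counter

# Illustrative stop word list (an assumption).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "it"}

def summarize(text, max_sentences=2):
    """Return an extract made of the highest scoring sentences, in source order."""
    # Preprocessing: sentence tokenization, word tokenization, stop word removal.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [
        [w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    # Processing: word frequency over the document is the only feature used here.
    freq = Counter(w for toks in tokenized for w in toks)
    # Theme identification: score each sentence by its average word frequency.
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in tokenized
    ]
    # Rank sentences; ties are broken in favor of the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:max_sentences])
    # Generate the extract by joining the chosen sentences in their original order.
    return " ".join(sentences[i] for i in chosen)
```

Keeping the selected sentences in source order is one simple way to reduce the coherence problem noted for extractive summaries.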
5 Evaluation of the Summarization Systems
Evaluating the summary is important for measuring the quality of a text summarization system. Intrinsic and extrinsic methods are widely used to measure the quality of summaries, and evaluation is carried out for both extracts and abstracts. For extractive summarizers, evaluation uses the popular measures precision, recall, and F-score. For abstractive or content-based systems, evaluation uses ROUGE or N-gram similarity matching methods, which compare the human-generated summary with the machine-generated summary by word or sentence similarity, using WordNet or synonym lists, word paraphrasers, and dictionary tools.
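The measures named above can be sketched as follows. In this Python example, precision, recall, and F-score are computed over the indices of the selected sentences, and a simplified ROUGE-N recall is computed from n-gram overlap between a candidate and a single reference summary. The function names and the whitespace tokenization are assumptions; the sketch omits stemming, synonym matching, and multiple references, which full evaluation toolkits may apply.

```python
from collections import Counter

def precision_recall_f1(selected, reference):
    """Compare system-selected sentence ids with reference sentence ids."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total
```

For example, a system that selects sentences 0, 2, and 3 when the reference extract contains sentences 0, 1, and 2 obtains precision, recall, and F-score of 2/3 each.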
5.1 Result and Discussion
For our literature study, we reviewed the research papers of the last 10 years and their trends. The study shows that extraction methods are the most widely used for summarization. Extraction is easier than abstraction, extractive systems give good results, and they currently have good scope in industry. Abstractive systems are difficult to implement. The most useful features for generating a summary are word frequency, word length, sentence scoring, sentence position, keywords and phrases, semantics, and linguistic or structural features. Abstractive summaries are sometimes not clear enough to express the meaning, and developing them is a challenging task.
5.2 Observed Issues in Text Summarization for Indian Languages
Based on the study of the literature from 2008 to 2021, various challenges are observed in the development of automated text summarization. These problems remain difficult for present technologies to resolve.
• It is difficult to summarize the original content by selecting the significant contents from the rest of the text.
• No standard metric is available for evaluation of the summary.
• Ambiguity of words is the main problem with Indian languages.
• Language expertise is essential for text analysis and interpretation.
• Language variants make the meaning more complex to understand.
• Machine-generated automatic summaries can be incoherent within the sentences.
• Abstractive summarizers mainly depend on internal tools to perform interpretation and language generation.
• Abstraction requires an expert system for linguistic or semantic analysis.
• Designing a generic standard for evaluating a summary is a great challenge.
• It is difficult to achieve similarity between a machine-generated summary and an ideal summary.
• Accuracy and efficiency in the results are hard to achieve because of the human interface.
6 Conclusion
Text summarization is an important NLP application. The demand for summarization comes from the large amounts of information produced by the internet and related services. It helps users search more effectively and is needed by professionals, marketing agencies, government and private organizations, research students, and institutions. Summarization is a powerful way to provide the required information in a short time. This paper covers the details of both the extractive and abstractive approaches, along with the techniques, features, and language specifications for Indian languages. Text summarization is important in both the commercial and research fields. An abstract summary requires proper learning and linguistic reasoning. Implementing abstractive systems is more complex than implementing extractive systems. Compared with extraction, abstraction provides a more meaningful and appropriate summary based on knowledge. The study shows that very little work has been done using abstractive methods in Indian languages. There is considerable scope for research into methods for more appropriate and efficient summarization, which makes the study of automated summarization both exciting and challenging.
Acknowledgements The authors would like to acknowledge and thank the CSRI DST Major Project, sanction No. SR/CSRI/71/2015 (G), the Computational and Psycholinguistics Research Lab Facility for supporting this work, and the Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India. The authors are also thankful to the SARATHI organization for providing financial assistance in the form of a Ph.D. research fellowship. I would like to express my sincere thanks to my research guide, Dr. C. Namrata Mahender (Asst. Professor) of the Computer Science and IT Department, Dr. B.A.M.U., Aurangabad, for providing research facilities and constant technical and moral support.
References
1. Sindhu CS (2014) A survey on automatic text summarization. Int J Comput Sci Inf Technol 5(6)
2. Reeve Lawrence H, Hyoil H, Nagori Saya V, Yang Jonathan C, Schwimmer Tamara A, Brooks Ari D (2006) Concept frequency distribution in biomedical text summarization. In: ACM 15th conference on information and knowledge management (CIKM), Arlington, VA, USA
3. Mashape.com/list-of-30-summarizer-apis-libraries-and-software
4. Atif K, Naomie S (2014) A review on abstractive summarization methods. J Theor Appl Inf Technol 59(1)
5. Manne S, Mohd ZPS, Fatima SS (2012) Extraction based automatic text summarization system with HMM tagger. In: Proceedings of the international conference on information systems design and intelligent applications, vol 132, pp 421–428
6. Sarwadnya VV, Sonawane SS (2018) Marathi extractive text summarization using graph based model. In: Fourth international conference on computing communication control and automation (ICCUBEA). IEEE
7. Sivaganesan D (2021) Novel influence maximization algorithm for social network behavior management. J ISMAC 03(1):60–68. http://irojournals.com/iroismac/. https://doi.org/10.36548/jismac.2021.1.006
8. Valanarasu R (2021) Comparative analysis for personality prediction by digital footprints in social media. J Inf Technol Digital World 03(02):77–91. https://www.irojournals.com/itdw/. https://doi.org/10.36548/jitdw.2021.2.002
9. Sinha S, Jha GN (2020) Abstractive text summarization for Sanskrit prose: a study of methods and approaches. In: Proceedings of the WILDRE5–5th workshop on Indian language data: resources and evaluation, language resources and evaluation conference (LREC 2020), Marseille, 11–16 May 2020. European Language Resources Association (ELRA), licensed under CC-BY-NC, pp 60–65
10. Malagi SS, Rachana, Ashoka DV (2020) An overview of automatic text summarization techniques. In: International journal of engineering research and technology (IJERT), www.ijert.org, NCAIT—2020 Conference proceedings, vol 8(15)
11. Manju K, David Peter S, Idicula SM (2021) A framework for generating extractive summary from multiple Malayalam documents. Information 12:41. https://doi.org/10.3390/info12010041. https://www.mdpi.com/journal/information
12. Verma P, Verma A (2020) Accountability of NLP tools in text summarization for Indian languages. J Sci Res 64(1)
13. Mamidala KK, Sanampudi SK (2021) Text summarization for Indian languages: a survey. Int J Adv Res Eng Technol (IJARET) 12(1):530–538. Article ID: IJARET_12_01_049, ISSN Print: 0976-6480 and ISSN Online: 0976-6499
14. Sarker A (2020) A light-weight text summarization system for fast access to medical evidence. https://www.frontiersin.org/journals/digital-health
15. Bhosale S, Joshi D, Bhise V, Deshmukh RA (2018) Marathi e-newspaper text summarization using automatic keyword extraction. Int J Adv Eng Res Dev 5(03)
16. Rathod YV (2018) Extractive text summarization of Marathi news articles. Int Res J Eng Technol (IRJET) 05(07), e-ISSN: 2395-0056
17. Mohamed SS, Hariharan S (2018) Experiments on document clustering in Tamil language. ARPN J Eng Appl Sci 13(10), ISSN 1819-6608
18. Dalwadi B, Patel N, Suthar S (2017) A review paper on text summarization for Indian languages. IJSRD Int J Sci Res Dev 5(07), ISSN (online): 2321
19. Kanitha DK, Muhammad Noorul Mubarak D (2016) An overview of extractive based automatic text summarization systems. AIRCC’s Int J Comput Sci Inf Technol 8(5). http://www.i-scholar.in/index.php/IJCSIT/issue/view/12602
20. Gaikwad DK, Mahender CN (2016) A review paper on text summarization. Int J Adv Res Comput Commun Eng 5(3)
21. Sarda AT, Kulkarni AR (2015) Text summarization using neural networks and rhetorical structure theory. Int J Adv Res Comput Commun Eng 4(6)
22. Gulati AN, Sarkar SD (2015) A pandect of different text summarization techniques. Int J Adv Res Comput Sci Softw Eng 5(4), Apr 2015, ISSN: 2277 128X
23. Ragunath SR, Sivaranjani N (2015) Ontology based text document summarization system using concept terms. ARPN J Eng Appl Sci 10(6), ISSN 1819-660
24. Deshmukh YS, Nikam RR, Chintamani RD, Kolhe ST, Jore SS (2014) Query dependent multidocument summarization using feature based and cluster based method 2(10), ISSN (Online): 2347-2820
25. Patil PD, Kulkarni NJ (2014) Text summarization using fuzzy logic. Int J Innovative Res Adv Eng (IJIRAE) 1(3), ISSN: 2278-2311. http://ijirae.com
26. Babar SA, Thorat SA (2014) Improving text summarization using fuzzy logic and latent semantic analysis. Int J Innovative Res Adv Eng (IJIRAE) 1(4) (May 2014). http://ijirae.com, ISSN: 2349-2163
27. Gupta V (2013) A survey of text summarizer for Indian languages and comparison of their performance. J Emerg Technol Web, ojs.academypublisher.com
28. Deshpande AR, Lobo LMRJ (2013) Text summarization using clustering technique. Int J Eng Trends Technol (IJETT) 4(8)
29. Dhanya PM, Jethavedan M (2013) Comparative study of text summarization in Indian languages. Int J Comput Appl (0975-8887) 75(6)
30. Dixit RS, Apte SS (2012) Improvement of text summarization using fuzzy logic based method. IOSR J Comput Eng (IOSRJCE) 5(6):05–10 (Sep-Oct 2012). www.iosrjournals.org, ISSN: 2278-0661, ISBN: 2278-8727
31. Prasad RS, Uplavikar Nitish M, Sanket W (2012) Feature based text summarization. Int J Adv Comput Inf Res Pune. https://www.researchgate.net/publication/328176042
32. Jayashree R, Srikanta KM, Sunny K (2011) Document summarization for Kannada, soft computing and pattern, 2011, ieeexplore.ieee.org
33. Siva Kumar AP, Premchand P, Govardhan A (2011) Query-based summarizer based on similarity of sentences and word frequency. Int J Data Min Knowl Manage Process (IJDKP) 1(3)
34. Das A (2010) Opinion summarization in Bengali: a theme network model. Soc Comput (Social Com), ieeexplore.ieee.org
35. Agrawal SS (2008) Developing of resources and techniques for processing of some Indian languages
36. Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. Int J Emerg Technol Web Intell 2:258–268
A Dynamic Packet Scheduling Algorithm Based on Active Flows for Enhancing the Performance of Internet Traffic Y. Suresh, J. Senthilkumar, and V. Mohanraj
Abstract Because the Internet is a large-scale network, the packet scheduling scheme must be highly scalable. This work develops a new packet scheduling mechanism to enhance the performance of today’s Internet communication. Per-flow control techniques face a scalability challenge because of the vast number of flows in a large network. The proposed method, G-DQS, is based on aggregated flow scheduling and can be used to manage huge networks. Packets are divided into two categories in this study: short TCP flows and long TCP flows. A scheduling ratio is determined based on edge-to-edge bandwidth and the maximum number of flows that can be accepted in the path. This ratio varies dynamically and minimizes the packet drop for short flows, while ensuring that the long flows are not starved. This is required for today’s Internet communication, as recent Internet traffic consists largely of short flows. The simulation results show that the suggested technique outperforms algorithms that use a constant packet scheduling ratio, such as RuN2C and DRR-SFF.
Keywords TCP flows · Internet · Scheduling · Edge-to-edge bandwidth
1 Introduction
The Web has transformed the Internet into the world’s largest public network. In recent years, the Web has acted as a platform for delivering innovative applications in the domains of education, business, entertainment, and medicine. Banking and multimedia teleconferencing are just two examples of business applications. The practice of storing multimedia data on servers and allowing users to access it via the Internet has become increasingly common. Other applications include distance education provided by colleges via video servers and interactive games that are revolutionizing the entertainment sector. The quality of Internet communication will have a big impact on the utility value of these applications. The network should be able to handle large volumes of this type of traffic in a scalable manner. To enable this service, the network must be able to meet quality-of-service (QoS) standards.
Y. Suresh (B) · J. Senthilkumar · V. Mohanraj Department of Information Technology, Sona College of Technology, Salem, TamilNadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_68
2 Literature Review
Today’s Internet is still based on the best-effort service model [1]. In this model, all data packets are treated equally, and the network tries to ensure reliable delivery. The design of such a model is simple and easily scalable, but it does not guarantee the delivery of traffic flows from end to end. During network congestion, the FIFO queue drops packets regardless of their priority, and the transmission control protocol (TCP) retransmits the dropped packets. The work in [2, 3] shows that Internet traffic has a heavy-tailed property. Several researchers have attempted to increase system performance by exploiting the considerable variability of Internet traffic. This research includes high-speed switching [4], dynamic routing [5], and scheduling [6]. The short-flow and long-flow concept has been applied to data centers [7]. Determining whether Internet traffic fits heavy-tailed distributions is difficult [8]. According to studies of Internet traffic, the majority of flows are short, while long flows make up less than 5% of flows but carry more than 50% of the total bytes. The average size of a short flow is only 10–20 packets. In [9], short flows and long flows are referred to as mice and elephants. Long flows have been observed specifically in P2P data transfers [10] and in Web servers [11]. Short flows are primarily attributable to online data transfers initiated by interactivity [12]. The authors of [13] showed that preferential treatment of short flows reduces Web latency. Most of the long flows originate from P2P applications. The MMPTCP transport protocol introduced in [14] benefits short flows by randomly scattering packets. The protocol then switches to multi-path TCP (MPTCP), which is a very efficient mode for long flows.
FDCTCP [13] dramatically improves performance and decreases flow completion times compared to DCTCP, particularly for small and medium-sized flows. As discussed above, recent Internet traffic measurements show that it is necessary to classify Internet flows as short and long to achieve a service guarantee. The proposed scheduling algorithm applies flow-classified service to improve the performance of the Internet. In today’s Internet routers, a basic scheduling technique known as first in first out (FIFO) or first come first served (FCFS) is extensively used. FIFO provides only best-effort service. Because flows are not categorized, it is not suited for delivering guaranteed service. The scheduling technique does not take into account any special requirements of a flow, such as latency and throughput. In FIFO, for example, a high-throughput data transfer can starve a low-throughput real-time link such as voice traffic.
Existing algorithms such as weighted fair queuing (WFQ), deficit round robin (DRR), deficit round robin-short flow first (DRR-SFF), and least attained service (LAS) execute per-flow scheduling, which involves a complicated mechanism for flow identification as well as flow state maintenance. With the significant expansion in Internet communication, it is impracticable to keep all of the flow state in routers. The major advantage of the proposed approach is that there is no need to maintain flow state in the routers. In the RuN2C scheduling technique [15], packets with a low running number (class-1) are placed in one queue, whereas packets with a high running number (class-2) are placed in another. The class-2 packets are served only when the class-1 packets are completely scheduled, which starves the class-2 packets. LAS [16] is used in packet networks to interact effectively with TCP in support of short flows. This is accomplished by placing the first packet of a newly arriving flow, which has received the least amount of service, at the top of the queue. LAS prioritizes this packet and reduces the round trip time (RTT) of a slow-starting flow. Short-flow transfer times are reduced as a result. The authors of [17] compared the performance of round robin-based scheduling algorithms. Various network performance attributes of WRR/SB are compared with those of WRR and PWRR, and WRR/SB outperforms the other algorithms. The authors of [18] proposed that the network time delay might be reduced by using a neural network approach. This is accomplished by collecting weights that influence network speed, such as the number of nodes in the path and the congestion on each path. The study demonstrates that an efficient technique for determining the shortest path can be used to improve existing methods that use weights.
3 Calculation of Fmax
The maximum number of active flows that can be allowed on the edge-to-edge network path is represented by Fmax. In [19], a method is described to determine the flow completion time (FCT). For a packet flow of size $S_f$, the FCT is determined using Eq. (1).
$$\mathrm{FCT} = 1.5 \times \mathrm{RTT} + \log_2\left(\frac{S_f}{\mathrm{MSS}}\right) \times \mathrm{RTT} \qquad (1)$$
where MSS is the maximum segment size and RTT is the round trip time.
Relating $S_f$ and FCT, the throughput $T_f$ and Fmax are determined based on network bandwidth using Eqs. (2) and (3).
$$T_f = \frac{S_f \times (\mathrm{MSS} + H)}{\mathrm{FCT} \times \mathrm{MSS}} \qquad (2)$$
Fmax = BWPr F Ai(dk) ⊗ Tf    (3)
Fmax is related to the scheduling ratio in the proposed algorithm, which is detailed in Sect. 4, for effective scheduling of packets.
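The calculation in this section can be illustrated with a minimal sketch. It reads Eq. (1) as FCT = 1.5 × RTT + log2(S_f / MSS) × RTT and uses Eq. (2) for the throughput. Two points are assumptions and not taken from the text: H is treated as a per-packet header size in bytes, and, because Eq. (3) is not fully legible in the source, Fmax is approximated as the path bandwidth divided by the per-flow throughput. The function names are hypothetical.

```python
import math


def flow_completion_time(flow_size, mss, rtt):
    """Eq. (1) as read here: FCT = 1.5*RTT + log2(S_f / MSS) * RTT."""
    return 1.5 * rtt + math.log2(flow_size / mss) * rtt


def flow_throughput(flow_size, mss, header, fct):
    """Eq. (2): T_f = S_f * (MSS + H) / (FCT * MSS).

    `header` stands for H; treating it as header bytes is an assumption.
    """
    return flow_size * (mss + header) / (fct * mss)


def max_active_flows(bandwidth, throughput):
    """ASSUMPTION: Fmax ~ floor(path bandwidth / per-flow throughput).

    This stands in for Eq. (3), which could not be recovered from the source.
    """
    return math.floor(bandwidth / throughput)


# Illustrative values only: an 8-segment flow, 1460-byte MSS, 100 ms RTT,
# 40-byte headers, and a 1.25 MB/s (10 Mbit/s) path.
fct = flow_completion_time(8 * 1460, 1460, 0.1)
t_f = flow_throughput(8 * 1460, 1460, 40, fct)
f_max = max_active_flows(1_250_000, t_f)
```

With these illustrative numbers, the flow needs three RTTs of slow start on top of the 1.5 RTT setup term, so larger flows lower the per-flow throughput estimate only logarithmically.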
4 Proposed Algorithm
The proposed algorithm identifies short flows using a threshold th and treats the remaining flows as long flows. The flows are inserted into two queues, SFQ and LFQ. The total number of flows in SFQ is used to initialize the variable counter DC(r). Using the number of flows in SFQ and LFQ, the algorithm derives the dynamic scheduling ratio Q(r). It also determines the maximum number of flows that can be active on the path, Fmax, which is compared with Q(r). To schedule the flows from SFQ and LFQ, the conditions Fmax > Q(r) and Fmax < Q(r) are tested.
Algorithm
Begin
Using a threshold th, divide flows into short and long flows, and place them in two queues, namely SFQ and LFQ, respectively.
S:
• Total flows in SFQ = $\sum_{i=1}^{n} B_{SFQ}(i)$
• Total flows in LFQ = $\sum_{i=1}^{n} B_{LFQ}(i)$
• Variable counter initialization DC = $\sum_{i=1}^{n} B_{SFQ}(i)$
D:
• Scheduling ratio Q(r) for any round r: $Q(r) = \frac{\sum_{i=1}^{n} B_{SFQ}(i) + \sum_{i=1}^{n} B_{LFQ}(i)}{\sum_{i=1}^{n} B_{LFQ}(i)}$
• Estimate Fmax using Eq. (3).
If Fmax > Q(r)
• Flows served in SFQ = Q(r)
• Flows served in LFQ = 1
• Perform DC(r) = DC(r) − Q(r)
• If DC(r) > Q(r) then return to D: else return to S: for the calculation of Q(r) and Fmax for the next round.
• When $\sum_{i=1}^{n} B_{SFQ}(i) = 0$ then flows served in LFQ = Q(r)
If Fmax < Q(r)
• Flows served in SFQ = Fmax, and no flow is served in LFQ
• Perform DC(r) = DC(r) − Q(r)
• If DC(r) > Q(r) then return to D: else return to S: for the calculation of Q(r) and Fmax for the next round.
End
When Fmax is greater than the number of flows to be scheduled, the algorithm schedules both short and long flows from SFQ and LFQ. When Fmax is limited, it prioritizes only short flows. The proposed algorithm matches the characteristics of the Internet, where traffic consists largely of short flows. This method provides a more reliable service than the best-effort method used in the Internet.
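The two branches of the algorithm above can be sketched as a single scheduling round. This is a simplified illustration and not the authors' implementation: it recomputes Q(r) in every round and omits the DC(r) counter that decides between the D: and S: labels. The integer rounding of Q(r), the handling of an empty LFQ, and the treatment of Fmax = Q(r) (a case the text leaves open) are assumptions, and the function names are hypothetical.

```python
from collections import deque


def scheduling_ratio(n_short, n_long):
    """Q(r) = (flows in SFQ + flows in LFQ) / flows in LFQ."""
    if n_long == 0:
        # ASSUMPTION: with no long flows, the ratio is the short-flow count.
        return n_short
    # ASSUMPTION: the ratio is rounded down to a whole number of flows.
    return (n_short + n_long) // n_long


def schedule_round(sfq, lfq, fmax):
    """Serve one round from the two queues and return the flows served."""
    q = scheduling_ratio(len(sfq), len(lfq))
    served = []
    if fmax > q:
        if sfq:
            # Serve Q(r) short flows, then one long flow.
            for _ in range(min(q, len(sfq))):
                served.append(sfq.popleft())
            if lfq:
                served.append(lfq.popleft())
        else:
            # SFQ is empty: serve Q(r) long flows.
            for _ in range(min(q, len(lfq))):
                served.append(lfq.popleft())
    else:
        # Fmax is limited (ASSUMPTION: also used when Fmax == Q(r)):
        # serve Fmax short flows and no long flow.
        for _ in range(min(fmax, len(sfq))):
            served.append(sfq.popleft())
    return served


sfq = deque(["s1", "s2", "s3", "s4"])
lfq = deque(["l1", "l2"])
first_round = schedule_round(sfq, lfq, fmax=10)
second_round = schedule_round(sfq, lfq, fmax=10)
limited_round = schedule_round(deque(["s1", "s2", "s3", "s4"]), deque(["l1", "l2"]), fmax=2)
```

With four short and two long flows, Q(r) is 3, so an unconstrained round serves three short flows and one long flow, while a round with Fmax = 2 serves two short flows only.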
5 Performance Analysis
The performance of the proposed scheduling algorithm is measured on a dumbbell topology. It is evaluated in terms of mean transmission time, packet loss, and throughput. Using network simulation, the proposed algorithm (G-DQS) is compared with other algorithms such as LAS, RuN2C, and FIFO. The results recorded from the simulation are presented and analyzed. All simulations are run in ns-2.
Fig. 1 Dumbbell topology
In Fig. 1, R0 and R1 are the edge routers. S1–S5 are source nodes, which transmit packets, and C1–C5 are sink nodes, which receive them. In our simulation, the packet size is set to 500 bytes; short flows are taken to have sizes of 1 to 20 packets, and long flows sizes of 1 to 500 packets. Transmission time, packet drop, and throughput are analyzed here.
5.1 Transmission Time of Short Flows
The transmission time of short flows is depicted in Fig. 2. In comparison with FIFO, the proposed algorithm G-DQS and RuN2C greatly lower the transmission time of short flows. FIFO, a single-queue discipline, does not make any distinction between the two flow types, which increases the transmission time considerably. In FIFO, long flows can obtain priority over short flows, increasing transmission time as shown in Fig. 2. The proposed method gives short flows preference over long flows and reduces the mean transmission time of short flows by 30.7% in comparison with FIFO.
Fig. 2 Transmission time (sec) versus flow size (packets)
5.2 Packet Drop Analysis
Figures 3 and 4 show packet drops for various flow sizes. They demonstrate that short flows of fewer than 25 packets incur no packet loss under the proposed technique, whereas FIFO flows of the same size do. Packet loss for short flows is lower in the proposed approach than in RuN2C because of the dynamic scheduling ratio. The proposed algorithm schedules packets from both SFQ and LFQ, whereas the RuN2C technique schedules long flows only if short flows are completely served. As a result, in addition to short flows, long flows are serviced in G-DQS.
Fig. 3 Number of packets dropped versus flow size (packets)
Fig. 4 Number of packets dropped versus flow size (packets); zoomed version of Fig. 3
Fig. 5 Throughput (packets/sec) versus simulation time (sec)
5.3 Throughput
Figure 5 depicts flow throughput as the number of packets received per second. During the simulation, the throughput of FIFO drops abruptly, and long flows are penalized and starved as a result. The figure also shows that the FIFO and RuN2C throughputs are not constant across the simulated duration. The proposed G-DQS maintains a nearly constant throughput.
6 Conclusion
The proposed algorithm reduces the transmission time of all flows in comparison with other protocols. The transmission time of short flows has been analyzed, and no packet loss was found for flows up to the threshold th. The proposed algorithm reduces packet loss as it schedules the flows based on the Fmax of the path. Since packet loss is reduced, the number of retransmissions decreases, which results in a reduction of the mean transmission time. The throughput analysis shows that G-DQS maintains almost constant throughput and performs better than the other protocols.
References
1. Anand A, Dogar FR, Han D, Li B, Lim H, Machado M, Wu W, Akella A, Andersen DG, Byers JW, Seshan S, Steenkiste P (2011) XIA: an architecture for an evolvable and trustworthy internet. In: Proceedings of the HotNets ’11, tenth ACM workshop on hot topics in networks, vol 2
2. Bansal N, Harchol-Balter M (2001) Analysis of SRPT scheduling: investigating unfairness. In: Proceedings of the SIGMETRICS 2001/Performance 2001, pp 279–290
3. Crovella M (2001) Performance evaluation with heavy tailed distributions. In: Proceedings of the JSSPP 2001, 7th international workshop, job scheduling strategies for parallel processing, pp 1–10
4. Harchol-Balter M, Downey A (1997) Exploiting process lifetime distributions for dynamic load balancing. Proc ACM Trans Comput Syst 15(3):253–285
5. Shaikh A, Rexford J, Shin KG (1999) Load-sensitive routing of long-lived IP flows. Proc ACM SIGCOMM 215–226
6. Qiu L, Zhang Y, Keshav S (2001) Understanding the performance of many TCP flows. Comput Netw 37(4):277–306
7. Carpio F, Engelmann A, Jukan A (2016) DiffFlow: differentiating short and long flows for load balancing in data center networks. Washington, DC, USA, pp 1–6
8. Gong W, Liu Y, Misra V, Towsley DF (2005) Self-similarity and long range dependence on the internet: a second look at the evidence, origins and implications. Comput Netw 48(3):377–399
9. Guo L, Matta I (2001) The war between mice and elephants. In: Proceedings of the ICNP 2001, IEEE international conference on network protocols, pp 180–191
10. Brownlee N, Claffy KC (2004) Internet measurement. IEEE Internet Comput 8(5):30–33
11. Bharti V, Kankar P, Setia, Gürsun G, Lakhina A, Crovella M (2010) Inferring invisible traffic. In: Proceedings of the CoNEXT 2010, ACM conference on emerging networking experiments and technology, vol 22, ACM, New York
12. Chen X, Heidemann J (2003) Preferential treatment for short flows to reduce web latency. Comput Netw: Int J Comput Telecommun Netw 41:779–794
13. Wang M, Yuan L (2019) FDCTCP: a fast data center TCP. In: IEEE international conference on computer science and educational informatization (CSEI). China, pp 76–80
14. Kheirkhah M, Wakeman I, Parisis G (2015) Short versus long flows. ACM SIGCOMM Comput Commun Rev 45:349–350
15. Avrachenkov K, Ayesta U, Brown P, Nyberg E (2004) Differentiation between short and long TCP flows: predictability of the response time. In: Proceedings IEEE INFOCOM, 04(2):762–733
16. Rai IA, Urvoy-Keller G, Biersack EW (2004) LAS scheduling to avoid bandwidth hogging in heterogeneous TCP networks. In: HSNMC 2004, high speed networks and multimedia communications, 7th IEEE international conference, Toulouse, France, pp 179–190
17. Balogh T, Luknarva D, Medvecky M (2010) Performance of round robin-based queue scheduling algorithms. In: Proceedings of the CTRQ’10, third international conference on communication theory, reliability, and quality of service. Athens, Greece, pp 156–161
18. Zhang B, Liao R (2021) Selecting the best routing traffic for packets in LAN via machine learning to achieve the best strategy. Complexity 1–10
19. Jiang Y, Striegel A (2009) Fast admission control for short TCP flows. In: Proceedings of the global communications conference, 2009. GLOBECOM 2009, Hawaii, USA, pp 1–6
Automated Evaluation of Short Answers: a Systematic Review
Shweta Patil and Krishnakant P. Adhiya
Abstract Automated short answer grading (ASAG) of free-text responses is a field of study wherein a student’s answer is evaluated against the baseline concepts required by the question. It concentrates on evaluating the content written by the student more than its grammatical form. In the educational domain, assessment is a tedious and time-consuming task. If the time spent on this task is reduced, the instructor can focus more on teaching and learning activities and can help students in their overall growth. Many researchers are working in this field to provide a solution that can assign scores to student responses that are close to the scores assigned by a human tutor. The goal of this paper is to provide insight into the ASAG domain by presenting a concise review of existing ASAG research work. We have included research work carried out using machine learning and deep learning approaches. We have also proposed our methodology to address this problem.
Keywords Automated evaluation · Short answer grading · Feature engineering · Natural language processing · Machine learning · Deep learning
S. Patil · K. P. Adhiya
Department of Computer Engineering, SSBT’s College of Engineering and Technology, Bambhori, Jalgaon, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_69
1 Introduction
Research in the fields of natural language processing (NLP), machine learning, and deep learning has opened doors for solving complex problems. One such problem is automated short answer grading (ASAG). It is the field wherein a student’s short answer, which comprises a few sentences or one paragraph, is evaluated and assigned a score that is close to the grade assigned by a human evaluator. In the education domain, along with teaching and learning, evaluation is one of the important tasks. Evaluation helps to assess student understanding of the course being taught. Evaluation mainly comprises multiple choice, true or false, fill in the blanks, short answer, and essay type questions [1]. We are interested in assessing short, closed-ended answers that comprise 2–3 sentences or a few paragraphs, as they help to analyze the overall understanding of the course. In short closed-ended answers, students are usually expected to include specific concepts related to the question, which ultimately helps them get a good score. Automated evaluation will also reduce the time devoted to checking student answers and will provide students with immediate detailed feedback, which assists them in their overall growth. The system will also produce an unbiased score.
Example 1: What is Stack?
Model Answer: Stack is a linear data structure which has PUSH and POP operations for inserting and deleting an element in an array, respectively.
Student Response: Stack is a data structure where elements are arranged sequentially and which has majorly 2 operations PUSH for inserting element and POP for deleting an element from an array.
In Example 1, the underlined words are important concepts that must be included in the student response for it to be evaluated. However, a student may often express the same concepts using synonyms, paraphrases, or polysemous words. The system should therefore be developed so that it can recognize all such surface representations and assess student responses correctly. Our main motivations for taking up this research are:
• To evaluate content rather than concentrating on grammar and style.
• To provide unbiased evaluation.
• To save the instructor’s evaluation time so that it can be used for the overall progress of students.
• To provide immediate, detailed feedback to students, which will help them in their future progress.
The studies conducted so far have clearly shown that the ASAG problem can be addressed by three state-of-the-art methodologies: rule-based approaches, machine learning, and deep learning [1, 2]. In this paper, we have mainly studied, analyzed, and reported machine learning and deep learning approaches. The remainder of this article is organized as follows. First, we review the existing ASAG systems. Then, we illustrate the general approach of ASAG and our proposed methodology. Finally, we present the discussion and conclusion.
2 Related Study
Many researchers have addressed the problem of automated short answer grading and provided solutions by applying various feature extraction techniques, machine learning, and deep learning approaches. In this study, we have reviewed automated grading via machine learning and deep learning.
2.1 Automated Grading by Applying Machine Learning Approach
The authors of [3] proposed a method in which the feature vector for a student answer is generated from the part-of-speech (POS) tag of each word, acquired from the Penn Treebank, along with the POS tags of the preceding and following words. The term frequency-inverse document frequency (TF-IDF) value and entropy are also included in the feature vector. Finally, using an SVM classifier, the student answers were labeled as +1 and −1. The authors report an average precision of 68% for the proposed model. Kumar et al. [4] proposed the AutoSAS system, which incorporates various feature vector generation techniques such as lexical diversity, word2vec, prompt, and content overlap. The authors represented all student answers using the features described above and then used a random forest for regression analysis. They employed quadratic weighted kappa to measure the level of agreement between their proposed model and human-annotated scores, which comes to 0.79. The authors tested their proposed model on the publicly available ASAP-SAS dataset. Galhardi et al. [5] presented an approach for the well-known ASAG datasets Beetle and SciEntsBank, which consist of electricity and electronics questions and physics, life, earth, and space science questions, respectively. They utilized the best of distinct features such as text statistics (student answer and question length ratio, word count, average word length, and average words per sentence), lexical similarity (token based, edit based, sequence based, and compression based), semantic similarity (LC, Lin, Resnik, Wu & Palmer, Jiang and Conrath, and shortest path), and bag of n-grams. Once the features were generated, they experimented with random forest and extreme gradient boosting classifiers. The system was evaluated using macro-averaged F1-score, weighted-average F1-score, and accuracy. They reported the overall accuracy between 0.049 and 0.083.
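Since quadratic weighted kappa is used as an agreement measure in several of the works reviewed here, a minimal sketch of its standard definition may help. It is not code from [4]; the function name and the toy score lists are illustrative, and scores are assumed to be integers in a known range containing at least two levels.

```python
def quadratic_weighted_kappa(human, system, min_score, max_score):
    """Agreement between two integer score lists, penalising large gaps more."""
    n = max_score - min_score + 1  # number of score levels (must be >= 2)
    # Observed confusion matrix: rows are human scores, columns system scores.
    observed = [[0] * n for _ in range(n)]
    for h, s in zip(human, system):
        observed[h - min_score][s - min_score] += 1
    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_system = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)
            # Expected count if the two raters were independent.
            expected = hist_human[i] * hist_system[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 0, 1)
```

A value of 1 indicates complete agreement with the human scores, and a value near 0 indicates agreement no better than chance.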
In [6], an automated assessment system is proposed for engineering assignments, which mainly comprise textual and mathematical data. The authors used the TF-IDF method to extract features from textual data after initial preprocessing such as stop word removal, case folding, and stemming. They then used a support vector machine (SVM) to assign scores to student answers. They reported an accuracy of 84% for textual data.
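As an illustration of the TF-IDF features mentioned above, the following is a minimal sketch that builds one vector per answer. It is not the pipeline of [6]: tokenization is a plain lowercase whitespace split, the smoothed IDF formula is one common variant chosen here as an assumption, and the function name and sample answers are hypothetical. The resulting vectors could then be passed to a classifier such as an SVM.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    # ASSUMPTION: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty answers
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["stack push pop", "queue push"])
```

In this toy corpus, "push" occurs in both answers and therefore receives a lower weight than "stack", which occurs in only one.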
2.2 Automated Grading by Applying Deep Learning Approach
In the following, we present various existing ASAG systems based on deep learning. Zhang et al. [7] developed word embeddings from domain-general (Wikipedia) and domain-specific (student responses) information using CBOW. Student responses are then evaluated using an LSTM classifier. Ichida et al. [8] deployed a measure to compute semantic similarity between sentences using a Siamese neural network with two symmetric recurrent networks, namely GRUs, chosen for their capability to handle the vanishing/exploding gradient problem. They also studied LSTM and showed that their GRU approach is superior to LSTM as it has fewer parameters to train. The system can be improved by utilizing sentence embeddings instead of word embeddings. They reported a Pearson correlation of 0.844, which is far better than the baseline approaches studied by the authors. Kumar et al. [9] proposed a model that cascades three neural building blocks: a Siamese bidirectional LSTM unit applied to both the student and model answers, a pooling layer based on earth mover’s distance applied over the LSTM network, and finally a regression layer comprising support vector ordinal regression for predicting scores. The evaluation of LSTM-earth mover’s distance with support vector ordinal regression showed an RMSE of 0.83, which is better than the scores generated by softmax. Kwong et al. [10] proposed ALSS, which checks the content, grammar, and delivery of speech using three Bi-LSTMs. The end-to-end attention of context is learned through a MemN2N network. The system can be enhanced by the utilization of GAN.
The systems studied so far used word embedding techniques, which are limited in capturing context. Therefore, the approaches in [11, 12] utilized the sentence embedding techniques skip-thought and Sentence-BERT, respectively. The authors of [11] deployed a model wherein vectors for both the student and model answers are generated using the skip-thought sentence embedding technique. The component-wise product and absolute difference of both vectors are then computed. A logistic linear classifier is used to predict the final score. In contrast, [12] proposed a method that provides automated scoring of short answers using the SBERT language model. The model searches through all reference answers provided during training, determines the semantically closest answer, and provides the rating. Limitation: false negative scores are also generated, which need to be checked manually, a very tedious job. Hassan et al. [13] employed paragraph embedding using two approaches: (1) the sum of pretrained word vectors from models such as word2vec, GloVe, ELMo, and fastText, and (2) pretrained deep learning models for computing paragraph embeddings, such as skip-thought, doc2vec, and InferSent, for both student and reference answers. Once the vectors are generated, the authors used the cosine similarity metric to compute the similarity between both vectors. Yang et al. [14] utilized a deep encoder model that has two encoding and decoding layers. In the encoding layer, student answers are represented in lower dimensions, and later, their label information is encoded using softmax regression. In the decoding layer, the outputs of the encoding are reconstructed. Riordan et al. [15] and Gong and Yao [16] utilized bidirectional LSTM with attention mechanism. In [15], word embeddings are first fed to a CNN to extract the relevant features, which are then given as input to an LSTM layer. The hidden states of the LSTM are aggregated in either a mean-over-time or an attention layer, which gives a single vector. That single vector is passed through a fully connected layer to compute a scalar value or a label. In [16], student responses and reference answers are segmented into sentences, which in turn are tokenized. Each feature is fed into a bidirectional RNN to generate sentence vectors. An attention mechanism is then applied to each sentence vector, and the final answer vectors are generated. Finally, the answer vector is passed through a logistic regression function to predict the scores. Tan et al. [17] introduced a new approach for ASAG by utilizing a graph convolutional network (GCN). The authors deployed a three-step process: (1) Graph building: an undirected heterogeneous graph of sentence-level nodes and word bi-gram-level nodes is constructed with edges between them. (2) Graph representation: a two-layer GCN model encodes the graph structure. (3) Grade prediction.
Table 1 gives a summary of the ASAG systems studied by us.
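Two of the vector comparisons described in this section, the component-wise product with absolute difference used in [11] and the cosine similarity used in [13], can be sketched as follows. The sketch assumes that the answer embeddings are already available as plain lists of floats; producing them with skip-thought or a similar model is outside its scope, and the function names and toy vectors are hypothetical.

```python
import math


def pair_features(student_vec, model_vec):
    """Concatenate the component-wise product and the absolute difference."""
    product = [a * b for a, b in zip(student_vec, model_vec)]
    abs_diff = [abs(a - b) for a, b in zip(student_vec, model_vec)]
    return product + abs_diff


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


features = pair_features([1.0, 2.0], [3.0, 0.5])
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The feature vector from `pair_features` would be the input to a classifier, whereas the cosine value can be used directly as a similarity score between a student answer and a reference answer.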
3 Methodology
3.1 General ASAG Architecture
The architecture employed by most of the ASAG systems studied so far is shown in Fig. 1. It mainly comprises four modules: preprocessing, feature engineering, model building, and evaluation.
3.1.1 Preprocessing
Although it is not a compulsory phase, some sort of preprocessing such as stop word removal, case folding, and stemming/lemmatization is employed in many works to extract content-rich text for generating vectors.
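The preprocessing steps named above can be sketched in a few lines. This is a toy illustration only: the stop word list is a small hand-picked set, and `naive_stem` is a crude suffix stripper written for this example, not the Porter stemmer or any library routine. Both names are hypothetical.

```python
import re

# ASSUMPTION: a tiny illustrative stop word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "are", "which", "and", "for", "in", "of", "has", "to"}


def naive_stem(token):
    """Strip a few common suffixes, keeping at least three characters."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case folding, tokenization, stop word removal, and naive stemming."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Stack is a linear data structure which has PUSH and POP operations")
```

Applied to the model answer from Example 1, the sketch keeps the content words and maps "operations" to "operation".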
3.1.2 Feature Engineering
It is a task in which domain-specific as well as domain-general information is extracted and vectors are generated that can be fed into the model to generate a more accurate score for the student answer (SA). Various feature engineering techniques are utilized in the studies carried out so far. Some of them are TF-IDF, n-gram, word embedding [8], and sentence embedding [11, 12].
Table 1 Summary of existing ASAG systems studied
Ref. No. and Year | Approach | Technique | Evaluation metrics
[6] and 2009 | Machine learning | TF-IDF and SVM | Accuracy
[3] and 2010 | Machine learning | POS, TF-IDF, entropy and SVM | Precision
[18] and 2011 | Machine learning | Dependency graph and SVM | RMSE and Pearson correlation
[9] and 2017 | Deep learning | Siamese Bi-LSTM | MAE and RMSE
[15] and 2017 | Deep learning | Bi-LSTM with attention mechanism | QWKappa
[8] and 2018 | Deep learning | word2vec and Siamese GRU model | Pearson correlation, Spearman, and MSE
[13] and 2018 | Deep learning | Paragraph embedding | Cosine similarity, Pearson correlation, and RMSE
[14] and 2018 | Deep learning | Deep autoencoder grader | Accuracy and QWKappa
[5] and 2018 | Machine learning | Text statistics, lexical similarity, semantic similarity, n-gram, random forest, and extreme gradient boosting | Accuracy, macro F1-score, and weighted F1-score
[2] and 2019 | Deep learning | Bi-LSTM with attention mechanism | MSE and MAE
[7] and 2019 | Deep learning | LSTM | QWKappa
[10] and 2019 | Deep learning | 3-attention based Bi-LSTM and MemN2N | Pearson correlation
[4] and 2019 | Deep learning | Lexical diversity, word2vec, prompt and content overlap, and random forest | QWKappa
[11] and 2020 | Deep learning | Skip thought | Pearson correlation and RMSE
[17] and 2020 | Deep learning | Graph convolution network | QWKappa
[12] and 2020 | Deep learning | Sentence-BERT | QWKappa and accuracy
Fig. 1 General system architecture for automated short answer grading system
3.1.3 Model Building
In this phase, researchers have incorporated various machine learning techniques such as SVM [3, 6, 18] and logistic regression, and deep learning techniques such as LSTM [7], Bi-LSTM [9, 10], Bi-LSTM with attention mechanism [15, 16], and many more for predicting correct class labels or scores for the SA.
3.1.4 Evaluation
This phase measures the similarity and variance between the system-generated labels/scores and the labels/scores assigned by a human evaluator. Evaluation methods employed include root mean square error (RMSE), Pearson correlation, QWKappa, and accuracy.
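Two of the evaluation measures listed above, RMSE and Pearson correlation, follow directly from their textbook definitions. The sketch below is illustrative only; the function names and the toy score lists are hypothetical, and the Pearson function assumes that neither list is constant.

```python
import math


def rmse(human, predicted):
    """Root mean square error between human and system scores."""
    squared = [(h - p) ** 2 for h, p in zip(human, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


error = rmse([0.0, 0.0], [3.0, 4.0])
positive = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
negative = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A lower RMSE means the predicted scores are numerically closer to the human scores, while a Pearson value close to 1 means the two sets of scores rise and fall together even if their scales differ.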
3.2 Proposed Methodology
Our main intention in carrying out this work is to recognize the level of semantic similarity between the student answer and the model answer. As per the study conducted, there are many ways through which semantic equivalence between terms can be recognized, such as TF-IDF, LSA, and embeddings. Many of the ASAG works studied utilized word embedding-based features for recognizing the semantic similarity between terms, but the corpus on which the word embeddings are trained usually consists of the model answers and the collected student responses, which are often limited in context. Domain-specific and domain-general corpora can therefore be utilized to train word embeddings. The use of sentence embedding techniques can also overcome the problem of understanding context and intention in the entire text. The proposed methodology concentrates on utilizing the word2vec skip-gram model for feature extraction and a Siamese Bi-LSTM with attention mechanism for predicting scores for student answers. The work will be limited to the data structures course of an undergraduate engineering program. Instead of using pretrained word embeddings, we are going to generate domain-specific word vectors by utilizing the knowledge available for data structures. Once the domain-specific word embeddings are generated, word vectors for the concepts used in the reference answer and the student answer will be extracted from them, and we will feed these vectors to the Siamese networks to predict the similarity between SA and RA.
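To make the skip-gram idea concrete, the following minimal sketch generates the (center word, context word) training pairs on which a skip-gram model is trained. It covers only pair generation, not the training of word2vec or the Siamese network, and it is not the authors' implementation; the function name, the window size, and the sample tokens are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["stack", "push", "pop"], window=1)
```

Training on pairs drawn from a data structures corpus is what would let words such as "push" and "insert" end up with nearby vectors when they occur in similar contexts.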
For the purpose of this research, we have created our own dataset by conducting two assignments with a class of undergraduate students. In assignment-1, ten general questions related to data structures were asked to more than 200 students, who were expected to answer each question in 3–4 sentences. Assignment-2 has four application-oriented, open-ended questions on real-world situations, which students were asked to answer using more than 150 words. About 1820 answers have been collected so far, and we have had the collected answers graded by a human evaluator so that the reliability of the scores predicted by the model can be checked in the near future. Table 2 shows sample questions asked in assignment-1 and assignment-2 and sample student answers collected.
Table 2 Sample question and answer collected for future study
Assignment-1
Sample question: What do you understand from stack overflow and stack underflow condition?
Sample student answers collected:
SA1: Stack overflow-A stack overflow is an undesirable condition in which a particular computer program tries to use more memory space than the call stack has available. Stack Underflow-An error condition that occurs when an item is called for from the stack, but the stack is empty
SA2: Stack overflow condition means user cannot be able to insert or push any element as the stack is already been filled by elements i.e. if top = max − 1 then it is overflow. Stack underflow condition means there is no element in the stack as top is pointing to null
Assignment-2
Sample question: Suppose we want to implement a navigation option in a web browser. Now we have two options for this particular purpose, a circular queue array based and doubly linked list. Which option you will select and compare both the options
Sample student answers collected:
SA1: The most recommended among both of them is the doubly linked list because we can easily go back and forth with the doubly linked list as it hast 2 pointers aiming to front and previous and it will be easy to navigate back and forth which is a common feature in our file managers navigation bar whereas if we consider the circular queue array we can easily navigate in forward direction but to go back we have to cover a loop and hence it’ll take more time and also will destroy the users experience. Also the circular queue array don’t have end nodes hence creating an overall mess of how the navigation will be executed
SA2: Circular Queue is a linear data structure in which the operations are performed based on FIFO (First In First Out) principle and the last position is connected back to the first position to form a circle. Where as doubly linked list is the linear data structure in which data is sequentially stored. It has one extra pointer(*prev) along with *next to point to the previous node. Hence in doubly linked list we can traverse back and forth through the list. Main purpose of the navigation option is to help user quickly switch between the pages. If it is implemented using circular queue then if the user wants to traverse to first page from somewhere in middle, he will have to first traverse till the end and then only he will be able to access the first page. Hence more time will be required. Where as if the same was implemented using doubly linked list the user could have back traversed easily to the first page, which he cannot do in circular queue. Hence navigation option should be implemented using doubly linked list
4 Discussion and Conclusion
The main objective of this work was to study, analyze, and present the developments in the field of automated short answer grading related to machine learning and deep learning approaches. We discovered that many researchers utilized only the reference answers (RA) provided by the instructor for rating student answers (SA). There is therefore a chance that concepts which are left out or presented in a different way in RA and SA may lead to incorrect label/score assignment. Also, the generation of word vectors contributes more towards assigning labels/scores to SA than model building, because word vectors capture semantic relationships, which helps categorize SA more accurately even if it does not contain the exact terms or concepts of the RA. Therefore, there is a need to identify textual entailment between SA and RA and to compute the correct level of similarity between them. In the near future, we will implement the proposed model and test it on the data collected by us. We will also compare the accuracy of our proposed model with already available popular ASAG systems. The proposed method concentrates on implementing an assessment method for evaluating textual answers; mathematical expressions and diagrammatic representations will not be evaluated by the proposed model. We conclude that the latest advancements in natural language processing, machine learning, deep learning, and feature extraction methods will surely contribute to the short answer grading task.
Interactive Agricultural Chatbot Based on Deep Learning S. Suman and Jalesh Kumar
Abstract The goal of technological innovation is to make human lives easier. This is especially true in the field of natural language processing (NLP), which is why conversational systems, often known as chatbots, have gained popularity in recent years. Chatbots have been used in different domains. They are intelligent systems developed using machine learning algorithms and NLP. Although technology has advanced significantly in the agricultural sector, farmers still do not have easy access to this knowledge, and finding it requires extensive online searches. A chatbot can help by answering their queries more quickly and easily than traditional methods. In this paper, the chatbot has been built to recognize grammatically poor statements, misspelled words, and unfinished phrases. The system uses natural language processing to read user queries and keywords, match them against the knowledge base, and return an answer with correct results, making it easier for users to communicate with the bot. To make the responses more intelligible, classification algorithms are used to provide non-textual responses that farmers can easily view.
Keywords Natural language processing · Chatbot · Natural language toolkit · Knowledge base · Machine learning
S. Suman (B) · J. Kumar J N N College of Engineering, Shivamogga, India J. Kumar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_70
1 Introduction
Agriculture plays a significant role in employing people in many parts of the world and is the main source of income for the majority of the population. In India, 70% of people reside in rural areas and depend primarily on agriculture, and 82% of farmers are small and marginal. The GDP growth of many countries is still based on agriculture [1].
Farming is advancing at a fast pace. However, much of the resulting information is still not accessible to farmers, because finding it requires many steps and searches often fail to return answers to their queries. A chatbot is a conversational assistant that lets users communicate as easily as if they were conversing with a human being. The user's requests are processed and interpreted, and the appropriate responses are sent [2]. A vital task for the chatbot is to identify and interpret the intent of a user's request and to extract the relevant entities. Farmers face low yields due to the lack of information. Many advanced agriculture-related techniques are discussed in [3–5]. In the proposed work, querying techniques that help farmers obtain agricultural information are designed and implemented. NLP [6] techniques are used to take natural human language as input. They also help the system interpret the user query even if it contains an incomplete sentence or grammatical mistakes. The objectives of the project are as follows:
(i) Creating a user interface that allows people to engage successfully and attain the required results in fewer steps.
(ii) Processing the extracted data into a suitable format using machine learning algorithms.
(iii) Responding quickly to the user query and suggesting a response for it.
(iv) Building a system that responds to users in real time.
The paper is organized into the different sections as follows. The literature review is covered in Sect. 2. Section 3 describes the system architecture. In Sect. 4, the methodology is explained. The results and analysis are presented in Sect. 5. Section 6 is the conclusion.
2 Literature Survey
This section reviews the work that has been done so far in the field of chatbots. Kannagi et al. [1] give insight into the Farmbot application, which helps farmers resolve queries related to their agricultural farmland. Farmbot uses natural language processing (NLP) to identify keywords and to respond with accurate results. NLP is used to interpret natural human language as input. A neural network is constructed from the training dataset, and the gradient descent algorithm is used to minimize the error. The test dataset goes through certain preprocessing steps, classification, and finally the construction of a neural network. The system output is shown as text in the user interface, and the text is converted to speech using the Web Speech API. The ARIMA prediction method is used to forecast the future cost of agricultural products. The study by Karri et al. [7] discusses a chatbot that was successful in answering queries. It follows two steps.
(1) Bag-of-Words algorithm
(2) The Seq2seq model (training model).
This model is then used by a recurrent neural network, which takes two inputs at the same time, one from the prior state and the other from the user, making it recurrent. A state-space tree is constructed using the breadth-first search strategy: at each level, successors are generated from all current states and sorted in ascending order of their heuristic values. If no response is found, the procedure is repeated with a wider beam. The chatbot is trained with the help of the Corpus dataset.
Sawant et al. [8] propose an intelligent system that uses analytics and data mining to guide farmers on different agricultural approaches so that they can choose appropriate crops based on the meteorological, geographical, and soil conditions of their location. The implementation uses a K-nearest neighbor algorithm, and the precision of the model is determined by the value of k. In the forest-based classifier, each tree outputs a single class, and the forest selects the class with the highest number of votes. The algorithms are compared with one another in terms of training and testing accuracy. AgriBot's role is not limited to suggesting crops to farmers; it also helps them gain a better understanding of their crops so that they can extend their shelf life.
Vijayalakshmi et al. [9] describe how machine learning and artificial intelligence are transforming the IT industry. Talkbot is a virtual conversational assistant that uses natural language processing to parse inquiries, identify keywords, match them against the knowledge base, and respond to the user with correct results. If the query is classification-based, it is passed to a naïve Bayes classifier, which retrieves related results from the knowledge base. The highest-scoring responses are iterated over to find equivalent responses. The textual responses are then sent to the API for speech synthesis.
The API accepts text as input, converts it to speech, and outputs it.
An overview of the numerous question-answering systems that have been used to resolve farmer-related queries is given in [10–12]. A chatbot is constructed using the Kisan Call Center dataset to answer farmers' questions. AgriBot uses the Sen2Vec model. The model's outputs, together with the weights of its weight matrix, are learned as word embeddings. The output for a query is derived from the most similar query in the training data, with cosine similarity used to compare the embedding vectors. Candidate responses are ranked, and the best-ranked response is returned. AgriBot helps farmers resolve queries related to agriculture, animal husbandry, and horticulture using natural language technology.
Kohli et al. [13] note that although several chatbots have been created, data-driven systems remain problematic because it is difficult to handle the large quantity of data required for development. A chatbot is therefore developed with the help of a Python pypy source, taking user requirements into account. In the Run.py file, the main buffer stores the message that the chatbot is to print to the user, and a socket connection is set up. Once initialization is complete, the message is sent through the socket to the joinroom function. One hundred interactions are mined for the test domain to see how well the chatbot understands user inquiries; the chatbot is then run in the chatroom with questions that it should be able to identify and examined to see whether it replies credibly or misleadingly.
Arora et al. [14] propose a chatbot that helps farmers by providing solutions to their queries and supporting their decision-making. The bot not only answers queries but also handles frequently asked questions, with an emphasis on weather forecasting and crop disease detection. A sequence-to-sequence model, also known as an encoder-decoder and described as a multilayer perceptron RNN, is used to build the conversational system. It belongs to the generative class of models, meaning that the model learns from the data and generates the response word by word. The next step is the creation of a classification model, which can be built from scratch or through transfer learning. The model is trained for 50 epochs with a batch size of 20. Weather prediction is included as one of the features of AgriBot: OpenWeatherMap is a service that provides climate information, including current climatic conditions. The chatbot can guide farmers in crop disease detection and weather prediction.
The creation of chatbots using natural language approaches, an initiative to annotate and observe the interaction between humans and chatbots, is described in [15]. The proposed system analyzes machine learning parameters that will help farmers increase their yield. The analysis covers the rainfall, season, weather, and soil type of a particular area, based on historical data.
The chatbot is trained using NLP. The K-nearest neighbors algorithm is used, which stores the available cases and classifies new cases based on a similarity measure. The data has been collected from various government websites and repositories. The model is trained using the TensorFlow framework and the KNN algorithm, and NLP is used in training and validating the data. Once the system has gone through data collection, cleaning, preprocessing, training, and testing, it is deployed to the server for use. The system helps farmers in remote places with limited connectivity to understand which crop to grow based on the atmospheric conditions, and it also suggests answers to their queries.
Based on the literature survey, a chat platform is required that uses the Internet to make the discussion process more accessible and automated. In addition, the system should provide real-time outputs and a user-friendly interface for farmers. A system like this could help farmers bridge the knowledge gap and develop a more productive market.
Fig. 1 System architecture of the chatbot
3 System Architecture
This section gives an overview of the system architecture employed in the project. The proposed model is divided into three stages: processing of the query, training and development of the chatbot, and retrieval of responses. The system architecture of the chatbot application is shown in Fig. 1. The user enters a query as text through the user interface, which sends it to the chatbot application. The textual query then goes through a preprocessing stage, in which the query sentence is tokenized into words, stopwords are removed, and the remaining words are stemmed to their root forms. The query is then classified using a neural network classifier, and the appropriate result is returned to the user as text.
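The preprocessing stage described above (tokenization, stopword removal, stemming) and a binary bag-of-words encoding can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the crude suffix-stripping `stem` function, and the small vocabulary are assumptions made for the example, and a real system would more likely rely on a library such as NLTK.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"the", "is", "for", "a", "an", "of", "to", "what", "which", "in"}

def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(sentence):
    """Tokenize, drop stopwords, and stem the remaining words."""
    return [stem(tok) for tok in tokenize(sentence) if tok not in STOPWORDS]

def bag_of_words(tokens, vocabulary):
    """Binary feature vector: 1 if the vocabulary word is present, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical vocabulary; in practice it would be built from the dataset.
vocabulary = ["soil", "maize", "pesticide", "crop"]
tokens = preprocess("Which soil is best for maize crops?")
vector = bag_of_words(tokens, vocabulary)
```

Here `vector` marks which vocabulary words occur in the query; a vector of this kind is what a classifier would receive as input.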
4 Methodology
The proposed methodology focuses on responding to farmers' queries so that they can benefit from the answers. It comprises three steps:
(A) Processing of the query
(B) Training and development of a chatbot
(C) Retrieval of responses

(A) Processing of the query
The primary goal of NLP is to understand human language as input. This helps the system comprehend the input even if it contains grammatical errors or incomplete phrases. As a result, the classification algorithm's efficiency improves.
(1) Segmentation of sentences: The initial step in NLP is sentence segmentation, which divides a paragraph into individual sentences.
(2) Tokenization: Tokenization is a technique for separating sentences into tokens or words.
(3) Noise removal: Noise removal is the process of removing stopwords that are not related to the context. Stopwords are omitted so that they do not influence the classification likelihood.
(4) Normalization of the lexicon: Lexicon normalization is the process of transforming various input representations into a single representation. One such technique is stemming, in which the suffixes of a word are removed.
(5) Bag of words or vector space: The extracted words are converted into a feature vector, with a binary value serving as the weight of each feature (0 indicates that the feature is absent, 1 indicates that it is present).

(B) Training and development of a chatbot
The dataset file, which comprises 200 agricultural questions and responses for various crops (rice, groundnut, cotton, wheat, bajra, and sugarcane), has been created and imported. The data is then processed and converted to a vector format. The chatbot is trained by developing a neural network, a computational model that consists of an input layer, two hidden layers, and an output layer. Each hidden layer transforms its inputs into a form that the output layer can use. In the input layer, the number of nodes (neurons) reflects the number of words in the dataset. Each node is given a random initial weight. The weighted value is then combined with a bias and passed through an activation function. The second hidden layer performs the same task as the first, but its input is the output of the first hidden layer. The output layer applies its weights and passes the result to the activation function. The nodes in the output layer represent the features or classes. The activation function used is softmax, which outputs a value between 0 and 1. This deep neural network is used for classification.

(C) Retrieval of responses
A neural network classification model is built using the training dataset. The built model is used to generate class probabilities for the test data. The output of the system is provided to the user as text via the user interface.
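The forward pass of the classifier described in steps (B) and (C) can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the layer sizes, the ReLU activation in the hidden layers, the random seed, and all function names are hypothetical, and the weights are untrained. Only the overall shape (bag-of-words input, two hidden layers, softmax output) comes from the text.

```python
import math
import random

def relu(values):
    """Hidden-layer activation (an assumption; the paper does not name one)."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random initial weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, layer):
    """Weighted sum of the inputs plus bias for every node in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def predict(bow_vector, layers):
    """Two hidden layers followed by a softmax output layer."""
    hidden1 = relu(dense(bow_vector, layers[0]))
    hidden2 = relu(dense(hidden1, layers[1]))
    return softmax(dense(hidden2, layers[2]))

rng = random.Random(0)
n_words, n_hidden, n_classes = 4, 8, 3  # hypothetical sizes
layers = [init_layer(n_words, n_hidden, rng),
          init_layer(n_hidden, n_hidden, rng),
          init_layer(n_hidden, n_classes, rng)]

probabilities = predict([1, 1, 0, 1], layers)
predicted_class = probabilities.index(max(probabilities))
```

In a trained system, the class with the highest probability would select the stored response; training the weights (for example, by gradient descent) is omitted from this sketch.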
5 Result and Analysis
The chatbot was created based on the research and the methodologies described above. After the model produced reasonable predictions, a Google Colaboratory setup was created to make the chatbot interactive. The developed chatbot can help farmers in areas such as soil detection, pesticide recommendation, and details about the Kisan call center, as shown in Fig. 2. Testing can be done to assess the chatbot's quality. The procedures involved in conducting chatbot tests are:
(i) Gathering an overview of the questions that can be asked, and
(ii) Determining whether the responses are correct or incorrect.
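The two testing steps above amount to comparing the chatbot's answers with the expected answers for a set of questions. A minimal sketch of that bookkeeping follows; the `evaluate` helper, the stand-in `toy_bot`, and the test cases are hypothetical and are not taken from the authors' experiments.

```python
def evaluate(chatbot, test_cases):
    """Return (accuracy, failures) for a list of (query, expected) pairs."""
    failures = []
    for query, expected in test_cases:
        answer = chatbot(query)
        if answer != expected:
            failures.append((query, expected, answer))
    accuracy = 1 - len(failures) / len(test_cases)
    return accuracy, failures

def toy_bot(query):
    """Stand-in for the trained chatbot (assumption for illustration)."""
    canned = {"soil for maize crop": "clay loam"}
    return canned.get(query.lower(), "out of bound")

# Hypothetical test questions with the answers a reviewer expects.
test_cases = [
    ("Soil for maize crop", "clay loam"),
    ("Do u have bf?", "out of bound"),
    ("Sugar cane pesticide", "sugarcane pesticide advice"),
]
accuracy, failures = evaluate(toy_bot, test_cases)
```

The `failures` list collects the queries that would need more training data or a corrected knowledge-base entry.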
Table 1 gives a sample of the responses retrieved from the chatbot application. In the first query, there is a spelling mistake, but the chatbot still returns the correct answer. In the second query, the chatbot has not been trained on that particular input, but it has learned from training on similar data. In the third query, the user asks an unrelated question, so the chatbot responds that the question is out of bounds. A false-negative scenario is shown in Table 2, where the chatbot predicts an incorrect response to the user query. This can be overcome by training the model on a larger dataset.
Fig. 2 Chatbot interface
Table 1 Chatbot responses
User query | Chatbot response
Soil for maize/maiz crop | The best soil for maize crop is clay loam
Hiiiiiiiiiiiii | Good to see you
Do u have bf/boyfriend? | This is out of bound question I can't answer

Table 2 Incorrect response of the chatbot
User query | Chatbot response
Sugar cane pesticide | The pesticides for groundnut crop is Carbaryl 10 D-25 kg/ha
Kisan call center number | Please visit our website pmkisan.gov
6 Conclusion
The expansion and popularity of chatbots justify the need for them in numerous industries. The performance of chatbots is shown to be relatively high when compared to traditional approaches. The typical time spent interacting with a chatbot is fairly brief, and it helps farmers get quick responses to their questions. A chatbot has been shown to suit the needs of users by responding quickly and offering services and information. By using natural language to answer questions about agriculture, our chatbot has benefited neglected communities. The chatbot provides agricultural facts to the farmer, who can send a direct message to get an answer. Our approach allows a farmer to ask any number of questions at any moment, which would aid the faster and more widespread adoption of current farming technology. Future advancements are possible: because most farmers interact in their native languages, there is a need for a solution that connects the model with those languages, as well as for the estimation of rainfall, production, and other factors.
7 Future Scope
With a speech recognition capability, farmers could ask their questions verbally and receive answers from the bot. Because most farmers interact in their native languages, there is a need for a solution that connects the model with those languages. A weather prediction module can be added that accesses the user's location and suggests crops accordingly. To support farmers, integration with different channels such as phone calls, SMS, and various social media platforms can be used.
References
1. Kannagi L, Ramya C, Shreya R, Sowmiya R (2018) Virtual conversational assistant - the FARMBOT. In: International journal of engineering technology science and research, vol 5, pp 520–527
2. Akma N, Hafiz M, Zainal A, Fairuz M, Adnan Z (2018) Review of chatbots design techniques. Int J Comput Appl 181:7–10
3. Talaviya T, Shah D, Patel N, Yagnik H, Shah M (2020) Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif Intell Agric 4:58–73
4. Jha K, Doshi A, Patel P, Shah M (2019) A comprehensive review on automation in agriculture using artificial intelligence. Artif Intell Agric 2:1–12
5. Ganatra N, Patel A (2021) Deep learning methods and applications for precision agriculture. In: Joshi A, Khosravy M, Gupta N (eds) Machine learning for predictive analysis. Lecture notes in networks and systems, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-15-7106-0_51
6. Nagarhalli TP, Vaze V, Rana NK (2020) A review of current trends in the development of chatbot systems. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), pp 706–710
7. Karri SPR, Kumar (2020) Deep learning techniques for implementation of chatbots. In: 2020 international conference on computer communication and information (ICCCI-2020)
8. Sawant D, Jaiswal A, Singh J, AgriBot P (2019) An intelligent interactive interface to assist farmers in agricultural activities. In: 2019 IEEE Bombay section signature conference (IBSSC)
9. Vijayalakshmi K, Meena P (2019) Agriculture TalkBot using AI. Int J Recent Technol Eng (IJRTE) 8:186–190
10. Jain N, Jain P, Kayal P, Sahit PJ, Pachpande S, Choudhari J, Singh M (2019) AgriBot: agriculture-specific question answer system. In: 2019 IEEE Bombay section signature conference (IBSSC)
11. Niranjan PY, Rajpurohit VS, Malgi R (2019) A survey on chatbot system for agriculture domain. In: 2019 international conference on advances in information technology, pp 99–103
12. Jain N, Jain P, Kayal P, Sahit J, Pachpande S, Choudhari J (2019) AgriBot: agriculture-specific question answer system
13. Kohli B, Choudhury T, Sharma S, Kumar P (2018) A platform for human-chatbot interaction using python. In: Second international conference on green computing and internet of things, pp 439–444
14. Arora B, Chaudhary DS, Satsangi M, Yadav M, Singh L, Sudhish PS (2020) Agribot: a natural language generative neural networks engine for agricultural applications. In: 2020 international conference on contemporary computing and applications (IC3A), pp 28–33
15. Yashaswini DK, Hemalatha R, Niveditha G (2019) Smart chatbot for agriculture. Int J Eng Sci Comput 9:22203–22205, May 2019
Analytical Study of YOLO and Its Various Versions in Crowd Counting Ruchika, Ravindra Kumar Purwar, and Shailesh Verma
Abstract Crowd counting is one of the main concerns of crowd analysis. Estimating the density map and crowd count in crowd videos and images has a large application area, including traffic monitoring, surveillance, crowd anomaly detection, congestion analysis, public safety, urbanization, and planning and development. Crowd counting faces many difficulties, such as occlusion and inter- and intra-scene variations in perspective and size. Nonetheless, in recent years, crowd count analysis has progressed from earlier approaches, which were typically restricted to minor changes in crowd density, to recent state-of-the-art systems that perform successfully in a broad variety of circumstances. The recent success of crowd counting methods can be credited mostly to deep learning and to the different datasets that have been published. In this paper, a CNN-based technique named You Only Look Once (YOLO) and its various versions are studied, and its latest version, YOLOv5, is analyzed in the crowd counting application. The technique is studied on three benchmark datasets with different crowd densities. It is observed that YOLOv5 gives favorable results in crowd counting applications with densities ranging from low to medium, but not in very dense crowds.
Keywords CNN · YOLO · YOLOv5 · Crowd counting · Density estimation · Crowd analysis
Ruchika (B) · R. K. Purwar · S. Verma Guru Gobind Singh Indraprastha University, Delhi, India e-mail: [email protected] R. K. Purwar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_71
1 Introduction
Crowd counting is the estimation of the number of persons in an image or a video sequence. It is extensively used in application domains such as public safety, traffic monitoring, surveillance, smart city strategies, crowd abnormality detection, etc. [1]. Crowd counting has remained a persistent challenge in the computer vision and machine learning fields due to perspective distortions, severe occlusions, diverse densities, and other issues. Existing research has primarily focused on crowds of consistent density, i.e., sparse or dense. In the real world, however, an image may include uneven densities due to the camera perspective and the changing distribution of people in the crowd. As a result, accurately counting individuals in crowds requires handling all density levels.
Crowd counting can be done in a number of ways. The fundamental approach is to count manually, but this is infeasible in moderately and highly congested scenes. Another approach is to count the people in part of a video frame and extrapolate to the whole frame to estimate the total. No such algorithm gives a precise count, but computer vision techniques can produce notably accurate estimates. Five broadly used methods of crowd counting are given below [2]:
• Detection-based methods: Detectors act like moving windows that identify and count the people in an image. These methods can be further classified as:
  - Monolithic detection: a direct approach to counting people in an image or video. Pedestrian detection uses this technique by training the classifier on the appearance of the full human body [3–5].
  - Part-based detection: classifiers are trained on body parts of partially occluded humans, such as the head, shoulders, and face, to count the people [6–8].
  - Shape-based detection: realistic shape prototypes are used to identify the people in images.
  - Multi-sensor detection: this technique incorporates multi-view data generated by multiple surveillance cameras deployed in an area. However, multi-camera setups suffer from varied resolutions and viewpoints and from variations in illumination and background. Proposed remedies exploit spatio-temporal occurrences, scene structures, and object size [9, 10].
• Clustering-based methods: These methods use the relative uniformity of visual features and individual motion fields. Coherent feature trajectories are clustered to reveal the independently moving entities. The Kanade-Lucas-Tomasi (KLT) tracker [11], Bayesian clustering to track and group local features into clusters [12], and the generation of head-detection-based person hypotheses [13] are some of the techniques used in these methods.
• Regression-based methods: In these methods, patches are cropped from the image, and low-level features are extracted from each patch. Density is estimated based on the collective and holistic description of crowd patterns [14].
• Density estimation-based methods: A density map is formed for the objects in the image. The extracted features are linearly mapped to their object density maps. Random forest regression is also used for learning a non-linear mapping.
• CNN-based methods: CNNs are used to build an end-to-end regression model to analyse an image. Crowd counting is done on the whole image rather than only on a particular part of it. CNNs give remarkable results on regression and classification tasks for crowd counting.
For sparse crowds, the authors of [11, 15] used sliding window detectors, and hand-crafted features are used in [16, 17] for regression-based techniques. These techniques are not effective for dense crowds due to occlusions. CNN-based approaches have been used to predict density and give better results, as reported in [18–22].
YOLOv5 [23] was introduced for detecting different types of objects in videos and images. In this paper, it is analysed exclusively for crowd counting in video sequences ranging from low to high density. Weights are obtained by training the model on the COCO dataset [24]. With these pre-trained weights, the model is tested on three different datasets. The results show that the model works well for low to medium density crowds, but its performance degrades in densely crowded scenarios.
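Counting a crowd with an object detector reduces to counting the detections labelled as persons whose confidence exceeds a threshold. The sketch below assumes the detections are already available as (label, confidence) pairs; this format, the `count_people` helper, and the 0.25 threshold are assumptions for illustration and do not reproduce any particular YOLOv5 API.

```python
def count_people(detections, conf_threshold=0.25):
    """Count detections labelled 'person' at or above the confidence threshold."""
    return sum(1 for label, conf in detections
               if label == "person" and conf >= conf_threshold)

# Hypothetical detector output for one frame.
frame_detections = [
    ("person", 0.91),
    ("person", 0.48),
    ("bicycle", 0.77),
    ("person", 0.12),  # below the default threshold, ignored
]
crowd_count = count_people(frame_detections)
```

Lowering the threshold admits more low-confidence detections, which trades missed people for false positives; in dense scenes, heavily occluded people may not be detected at any threshold.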
2 Literature Survey
YOLO (You Only Look Once) is a fast and easy-to-use model designed for object detection. YOLO was first introduced by Joseph Redmon et al. [25] in 2016. Later versions, YOLOv2, v3, v4, and v5, were published with improvements over the previous releases. Table 1 describes the basic working and year of publication of all the YOLO versions. Table 2 shows the features and limitations of all the versions of YOLO.

2.1 YOLO
Before YOLO, classifiers were used for object detection; in YOLO, the full image is passed directly through a neural network to predict class probabilities and bounding boxes. YOLO processes images in real time at 45 fps, and its modified version, Fast YOLO, processes at 155 fps. The detailed architecture of YOLO can be found in [25]. The steps involved in object detection are as follows:
1. The input image is divided into an A × A grid.
2. The grid cell that contains the center of an object is responsible for detecting it.
3. Bounding boxes B and confidence scores C are predicted for each grid cell.
4. The model's confidence that the bounding box contains an object is predicted as $C = \Pr(\text{object}) \times \mathrm{IoU}^{\text{truth}}_{\text{pred}}$. If the cell contains no object, $C = 0$; otherwise, $C$ equals the IoU between the ground-truth box and the predicted box.
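The IoU term in step 4 can be computed for axis-aligned boxes as sketched below. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, which is a convention chosen for this example rather than one stated in the text; the function names are likewise hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(object_present, predicted_box, truth_box):
    """Target confidence: 0 without an object, otherwise the IoU."""
    if not object_present:
        return 0.0
    return iou(predicted_box, truth_box)

truth = (0, 0, 2, 2)
pred = (1, 1, 3, 3)
score = confidence(True, pred, truth)
```

For these two boxes the overlap is a unit square and the union covers seven unit squares, so the score is 1/7.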
Table 1 YOLO, its versions, and their working procedures

YOLO version: YOLO [25]
Publication year: 2016
Working:
• Divide the image into an A × A grid and detect bounding boxes (BB)
• Calculate the object center coordinates (x, y), height and width (h, w), confidence score C, and conditional class probabilities Pr for the different classes
• Select boxes depending on the thresholds for confidence and Intersection over Union (IoU)
• Non-max suppression removes duplicate detections of an object in the image by accepting the prediction with the maximum confidence level among all detected BB
• A detection window is accepted when it exceeds the thresholds for both IoU and confidence score

YOLO version: YOLOv2/YOLO9000 [26]
Publication year: 2017
Working:
• Batch normalization layers are used after each convolutional layer
• The model consists of 30 layers
• Anchor boxes are added to the model architecture

YOLO version: YOLOv3 [27]
Publication year: 2018
Working:
• The network has 106 layers
• Small to tiny objects are detected at three different scales
• Nine anchor boxes are used, corresponding to three boxes per scale
• Classification is treated as a multi-label problem with modified error functions

YOLO version: YOLOv4 [28]
Publication year: 2020
Working:
• The model uses Weighted-Residual-Connections (WRC) and Cross-Stage-Partial connections (CSP) to improve the learning capability of the CNN
• Self-adversarial training (SAT) and data augmentation techniques operate in both the forward and backward stages of the network
• A new self-regularized non-monotonic activation function is used
• Mosaic data augmentation mixes four training images instead of using a single image
• The DropBlock regularization method is applied to the CNN

YOLO version: YOLOv5 [23]
Publication year: 2020
Working:
• YOLOv5 has four variants, YOLOv5-s, -m, -l, and -x: small, medium, large, and extra-large, respectively
• A single-stage detector that uses Cross Stage Partial Network (CSPNet) [29] as the backbone and Path Aggregation Network (PANet) [30] for feature aggregation (neck)
Analytical Study of YOLO and Its Various Versions …
Table 2 Features and limitations of YOLO and all of its versions
YOLO version | Features | Limitations

YOLO
Features:
• YOLO treats detection as a fast regression problem. It trains and tests on the whole image, so it implicitly encodes contextual information about classes and their appearance
• YOLO makes almost half as many background errors as the existing method Fast R-CNN
• YOLO learns general representations of objects, e.g., when trained on real images and tested on artwork. The probability of failure is low even when it is applied to new domains or unexpected inputs [31, 25]
• The whole model is trained jointly on a loss function that is directly linked to detection accuracy
Limitations:
• Accuracy of YOLO is comparatively low
• YOLO cannot efficiently localize small objects that appear in groups
• It is not able to detect objects with unusual aspect ratios
• YOLO uses an A × A grid, and each grid cell can predict only one class of objects, which limits the number of detections, so it misses various objects
• It can detect a maximum of A × A objects
• The YOLO classifier network is trained on images with a resolution of 224 × 224, while detection uses 448 × 448, so the network has to switch between two different resolutions

YOLOv2/YOLO9000
Features:
• It has better speed and accuracy than YOLO
• It can process small-sized, low- and high-resolution images, high-frame-rate videos, multiple video streams, and real-time videos
• The network learns a generalized object representation, so model training on real-world images is easy
Limitations:
• Dimensions of anchor boxes are selected manually
• Model instability occurs because the location coordinates (x, y) of the anchor box are predicted during the initial iterations
• It takes longer to stabilize the prediction due to random initialization

YOLOv3
Features:
• It can detect tiny objects
• It can detect objects at three different scales
• YOLOv3 predicts more bounding boxes, so it makes better predictions
• Multilabel classification of detected objects is possible
Limitations:
• Reduced speed due to the Darknet-53 architecture
• Dissimilar-sized objects are difficult to detect
• It is not easy to detect objects that are very close to one another
• It is not applicable in sensitive domains like autonomous driving, surveillance, security, etc.

YOLOv4
Features:
• For accuracy and convergence speed, the Complete IoU loss is better than bounding box regression
• It can be trained even on a single GPU
Limitations:
• YOLOv4 is incompatible with mobile device integration with virtual reality [32]
Table 2 (continued)
YOLO version | Features | Limitations

YOLOv5
Features:
• YOLOv5 is blazingly fast and accurate
• It can detect objects with inconsistent aspect ratios
• The YOLOv5 architecture is small, so it can be deployed to embedded devices easily
• PyTorch weights of YOLOv5 can be translated to Open Neural Network Exchange (ONNX) weights, then to Core Machine Learning (CoreML), and then to iOS
Limitations:
• YOLOv5 has limited performance on highly dense images and videos

5. For each bounding box, five parameters are predicted: the center coordinates of the box relative to the grid cell boundary (c_x, c_y), the height and width of the bounding box relative to the image size (h, w), and the predicted confidence C, i.e., the IoU between the ground truth and the predicted box.
6. A single set of conditional class probabilities, Pr(class_x | object), is predicted for each grid cell.
7. For testing, the class-specific confidence score for each bounding box is computed as:
\[ \Pr(\text{class}_x \mid \text{object}) \cdot \Pr(\text{object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}} = \Pr(\text{class}_x) \cdot \text{IoU}^{\text{truth}}_{\text{pred}} \tag{1} \]
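Equation (1) and the non-max suppression step listed in Table 1 can be sketched together as follows. This is a simplified, greedy illustration under stated assumptions: boxes are (x1, y1, x2, y2) tuples, the IoU threshold of 0.5 is an illustrative default, and the helper names are hypothetical rather than taken from any YOLO implementation.

```python
def box_iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_scores(class_probs, box_confidence):
    """Eq. (1): Pr(class_x | object) multiplied by the box confidence C."""
    return [p * box_confidence for p in class_probs]


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs.

    The highest-scoring box is kept, every remaining box that overlaps it by
    more than iou_threshold is dropped, and the process repeats.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if box_iou(best[0], d[0]) <= iou_threshold]
    return kept
```

In a crowd-counting setting, the number of boxes that survive suppression for the person class is the count for that frame, which is also why heavy overlap between people in dense scenes tends to reduce the count.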
2.2 YOLOv2/YOLO9000
YOLOv2 is designed to run at various image resolutions with improved speed and accuracy. A hierarchical view of object classification is used in YOLOv2, so object detectors can be trained on different datasets for detection and classification. In YOLOv2, a batch normalization layer is used after each convolutional layer, and the dropout layers are removed without causing overfitting. The classification network is fine-tuned at 448 × 448 resolution for ten epochs on ImageNet, so that the network filters adjust to higher-resolution images and give better results. Bounding boxes are predicted using anchor boxes [28], which replace the fully connected layers of YOLO. Pooling layers are also removed to obtain higher-resolution outputs. So that the feature map has a single center cell, the network operates on 416 × 416 input images rather than 448 × 448. IoU is the main factor in predicting objects in the image. The network predicts five terms for every bounding box: coordinate terms b_x, b_y; width and height terms b_w, b_h; and an objectness term b_o. The offset of the cell from the top left corner of the image is (c_x, c_y), and the prior width and height of the bounding box are o_w, o_h. YOLOv2 computes the predictions as below:
\[ P_x = \sigma(b_x) + c_x \tag{2} \]
\[ P_y = \sigma(b_y) + c_y \tag{3} \]
\[ P_w = o_w e^{b_w} \tag{4} \]
\[ P_h = o_h e^{b_h} \tag{5} \]
\[ \Pr(\text{obj}) \cdot \text{IOU}(P, \text{obj}) = \sigma(b_o) \tag{6} \]
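Equations (2)–(6) can be checked numerically with a short Python sketch. It assumes the form used in the YOLOv2 paper [26], in which the sigmoid is applied to the raw coordinate term before the cell offset is added; the function name `decode_box` is hypothetical.

```python
import math


def sigmoid(v):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(b_x, b_y, b_w, b_h, b_o, c_x, c_y, o_w, o_h):
    """Turn raw network outputs into a box, following Eqs. (2)-(6).

    (c_x, c_y) is the cell offset from the top left corner of the image and
    (o_w, o_h) is the prior (anchor) width and height.
    """
    p_x = sigmoid(b_x) + c_x        # Eq. (2): center stays inside its cell
    p_y = sigmoid(b_y) + c_y        # Eq. (3)
    p_w = o_w * math.exp(b_w)       # Eq. (4): prior width is rescaled
    p_h = o_h * math.exp(b_h)       # Eq. (5): prior height is rescaled
    objectness = sigmoid(b_o)       # Eq. (6): Pr(obj) * IOU(P, obj)
    return p_x, p_y, p_w, p_h, objectness
```

With all raw outputs equal to zero, the box center falls in the middle of its cell and the prior dimensions are returned unchanged, which is why zero-initialized predictions start from the anchor shape.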
A variation of YOLOv2 built on DarkNet-19 results in a faster network. Another variant of YOLOv2 is YOLO9000, in which classification and detection are performed jointly. It is capable of detecting 9000 different object classes in images and videos using the WordTree method. YOLO9000 is trained on the ImageNet classification dataset and the COCO detection dataset simultaneously. The network can also detect objects for which no detection labels are available.
2.3 YOLOv3
YOLOv3 replaced YOLO9000 because of its higher accuracy. Due to the complex architecture of DarkNet-53, YOLOv3 is slower than YOLO9000, but it is more accurate. For each bounding box, the network predicts four coordinates, labeled n_x, n_y, n_w, n_h. The cell offset from the top left corner of the image is (o_x, o_y), and the initial height and width of the bounding box are (i_h, i_w). The method to predict the next location is shown in Fig. 1.
Fig. 1 Initial dimension and next location prediction of bounding box [27]
The predictions are derived as:
\[ P_x = \sigma(n_x) + o_x \tag{7} \]
\[ P_y = \sigma(n_y) + o_y \tag{8} \]
\[ P_w = i_w e^{n_w} \tag{9} \]
\[ P_h = i_h e^{n_h} \tag{10} \]
The overlap threshold is fixed at 0.5. The objectness score of a bounding box is 1 when the predicted box covers the maximum area of the ground truth object, i.e., when it overlaps the ground truth more than any other box. YOLOv3 predicts across three different scales, i.e., the detection layers operate on feature maps of three different sizes. Three anchor boxes are assigned to each scale, giving nine anchor boxes in total, which performs better than both YOLO and YOLO9000.
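The effect of predicting at three scales can be made concrete with a small arithmetic sketch. The input size of 416 × 416 and the strides 32, 16, and 8 are assumptions taken from the common YOLOv3 configuration in [27]; they are not stated in the text above.

```python
def prediction_counts(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Grid sizes and number of predicted boxes per scale.

    input_size and strides are assumed values from the usual YOLOv3 setup.
    """
    grids = [input_size // s for s in strides]
    boxes = [anchors_per_scale * g * g for g in grids]
    return grids, boxes, sum(boxes)


grids, boxes, total = prediction_counts()
print(grids)   # [13, 26, 52]
print(boxes)   # [507, 2028, 8112]
print(total)   # 10647
```

Under these assumptions the finest scale contributes most of the candidate boxes, which is consistent with the claim that YOLOv3 handles small objects better than its predecessors.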
2.4 YOLOv4
YOLOv4 is a fast and accurate real-time network. Its structure is built using CSPDarknet53 [30] as the backbone; the Path Aggregation Network (PAN) [2] and Spatial Pyramid Pooling (SPP) [33] as the neck; and YOLOv3 [27] as the head. The authors group several universal features for the backbone and detector into two categories: Bag of Freebies (BoF) and Bag of Specials (BoS). BoF covers techniques that increase only the training cost and improve the detector’s accuracy without increasing the inference time. BoS comprises several plugin modules and post-processing methods that improve detection accuracy with a small increase in inference cost.
A number of crowd counting methods have been proposed in the literature. Some of them are suitable for low-density videos, while others are suitable for moderate-density videos. Very little work is applicable to crowd videos of all densities. Therefore, there is a need to design general methods that work for crowd videos of all densities.
3 Proposed Model
A detailed explanation of the proposed model is given in this section. The main aim of this study is to count the number of people in images and videos using the object detection technique YOLOv5. Glenn Jocher from Ultralytics LLC [23] designed YOLOv5 and published it on GitHub in May 2020. YOLOv5 is an improvement on YOLOv3 and YOLOv4 and is implemented in PyTorch. To date, the developer is updating and
Table 3 Variations of YOLOv5 model
YOLOv5 | Number of parameters | Number of frames per second on GPU
YOLOv5s (smallest) | 7.5 million | 416
YOLOv5m | 21.8 million | 294
YOLOv5l | 47.8 million | 227
YOLOv5x (largest) | 89.0 million | 145
improving YOLOv5 by adding new features related to data augmentation, activation functions, and post-processing to attain the best possible performance in detecting objects. The new features and enhancements in YOLOv5 mainly concern state-of-the-art data augmentation and activation functions for deep learning networks. They were partly adapted from YOLOv4, such as CSPNet (Wang et al. 2020), and partly derived by the YOLOv5 maintainer prior to the YOLOv4 contributions. YOLOv5 uses mosaic augmentation, i.e., combining N images to get a new image, which makes it fast and more accurate in detecting small objects. This enables the detection of objects outside their typical context and at smaller scales, which decreases the need for large mini-batch sizes. The YOLOv5 architecture incorporates the extraction of final prediction boxes, features, and object classification into a single neural network. It simultaneously extracts prediction boxes and detects objects from the whole image. YOLOv5 reports a compact model size and high inference speed, allowing easy deployment to mobile use cases through model export. All the versions of YOLOv5 are pre-trained on the COCO dataset.
YOLOv5 comprises four distinct models, as given in Table 3. The models also vary in the width and depth of the model and the layer channels, with values of 1.23 and 1.33, respectively, for the YOLOv5x model. The pre-trained YOLOv5x model is used in this study. The list of layers with the number of filters and their size is given in Table 4. YOLOv5x is a single-stage detector trained on MS COCO [24]; it consists of a Cross Stage Partial Network (CSPNet) [29] as the backbone and a Path Aggregation Network (PANet) [30], originally proposed for instance segmentation, in the head of the model. The BottleneckCSP unit is composed of two convolutional layers with filter sizes 1 × 1 and 3 × 3.
The Spatial Pyramid Pooling (SPP) network [2] is part of the backbone of the architecture; it allows input images of variable size and is robust against object distortions, i.e., it can detect objects with different aspect ratios. Figure 2 shows the overall architecture of YOLOv5x. The YOLOv5 repository uses a V100 GPU along with PyTorch FP16 inference for person identification in images and videos. The model is pre-trained on the COCO dataset with a batch size of 32 labeled images, and the corresponding weights are used for object detection. The steps of crowd counting using YOLOv5 are as follows:
1. Train the YOLOv5 model using the COCO pre-trained weights.
2. The ResNet classifier is used for classification of objects using the pre-trained weights.
3. For testing the model, the image or video is loaded into the inference directory.
Table 4 Layers in YOLOv5x [33]
Layers | Number of filters | Filter size
Backbone
Focus | 12 | 3 × 3
Convolutional | 160 | 3 × 3
BottleneckCSP (4 layers) | 160 | 1 × 1 + 3 × 3
Convolutional | 320 | 3 × 3
BottleneckCSP (12 layers) | 320 | 1 × 1 + 3 × 3
Convolutional | 640 | 3 × 3
BottleneckCSP (12 layers) | 640 | 1 × 1 + 3 × 3
Convolutional | 1280 | 3 × 3
SPP | - | -
BottleneckCSP (4 layers) | 1280 | 1 × 1 + 3 × 3
Head
Convolutional | 640 | 1 × 1
Upsample | 2 | -
BottleneckCSP (4 layers) | 640 | 1 × 1 + 3 × 3
Convolutional | 320 | 1 × 1
Upsample | 2 | -
BottleneckCSP (4 layers) | 320 | 1 × 1 + 3 × 3
Convolutional | 320 | 3 × 3
BottleneckCSP (4 layers) | 640 | 1 × 1 + 3 × 3
Convolutional | 640 | 3 × 3
BottleneckCSP (4 layers) | 1280 | 1 × 1 + 3 × 3
Detection | - | -
Fig. 2 Overview of YOLOv5 architecture [33]
4. The model weights are optimized using the strip optimizer for input images and videos.
5. Images are rescaled and reshaped along bounding boxes using the normalization gain.
6. The boxes are then labeled according to the classes of the pre-trained weights.
7. The weights are applied to the model, which generates bounding boxes on the input image or video frames and produces the person count as output.
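The final counting step can be sketched independently of the detector. The snippet below is a minimal illustration, not the authors' code: it assumes each detection is a tuple (x1, y1, x2, y2, confidence, class_id), that class index 0 is "person" as in the 80-class COCO label list, and that 0.25 is an illustrative confidence threshold.

```python
PERSON_CLASS_ID = 0  # assumed: index of "person" in the COCO class list


def count_people(detections, conf_threshold=0.25):
    """Count person detections in one image or video frame.

    detections: iterable of (x1, y1, x2, y2, confidence, class_id) tuples,
    i.e., the boxes that remain after non-max suppression.
    """
    return sum(
        1
        for *_box, conf, cls in detections
        if int(cls) == PERSON_CLASS_ID and conf >= conf_threshold
    )


# Hypothetical detections for a single frame.
frame = [
    (10, 20, 50, 120, 0.91, 0),   # confident person
    (60, 25, 95, 118, 0.12, 0),   # person below the threshold
    (200, 40, 260, 90, 0.80, 2),  # a different class
]
print(count_people(frame))  # 1
```

For a video, the same function would be applied frame by frame, and the per-frame counts can then be compared with the ground truth.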
4 Experimental Results
The model is tested on three benchmark datasets: the AVENUE dataset [34], the PETS dataset [31], and the Shanghaitech dataset [35]. The first two datasets represent low to medium crowd density, whereas the last one represents high-density crowds.
AVENUE Dataset: The dataset consists of 35 videos. The 30,652 frames corresponding to these videos are used for testing.
PETS Dataset: The Performance Evaluation of Tracking and Surveillance (PETS) program holds regular workshops that generate a benchmark dataset. This dataset addresses group activities in public spaces, such as crowd counting, tracking individuals in a crowd, and detecting different flows and specific crowd events. Multiple cameras are used to record different incidents, and several actors are involved. The dataset consists of four subsections with different density levels.
Shanghaitech Dataset: It is a large crowd counting dataset that contains 1198 annotated crowd images. The dataset is divided into two sections, Part-A and Part-B, containing 482 and 716 images, respectively. In all, 330,165 persons are marked in the dataset. Part-A images are collected from the Internet, while Part-B images are taken on Shanghai’s bustling streets.
Figures 3, 4, and 5 show the detected humans in bounding boxes for the AVENUE, PETS2009, and Shanghaitech datasets. Average Precision (AP) and Mean Absolute Error (MAE) are the evaluation parameters. Table 5 shows the AP for the three datasets, and Table 6 gives the resulting MAE for all datasets. The AP for the AVENUE and PETS2009 datasets is 99.5% and 98.9%, respectively, whereas it is 40.2% for the high-density Shanghaitech dataset. Further, the MAE is higher for the high-density dataset (55.28, compared with 5.62 and 4.89). Therefore, it is concluded that YOLOv5 works more efficiently for crowd detection in low- to medium-density videos, and its performance degrades for high-density videos.
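As a reminder of how the MAE in Table 6 is defined, the sketch below computes it from per-frame predicted and ground-truth counts. The function name and the sample counts are hypothetical and are not taken from the experiments.

```python
def mean_absolute_error(predicted_counts, true_counts):
    """MAE = (1/N) * sum(|predicted_i - true_i|) over N images or frames."""
    if len(predicted_counts) != len(true_counts) or not true_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(true_counts)


# Hypothetical counts for three frames.
predicted = [10, 20, 33]
truth = [12, 20, 30]
print(mean_absolute_error(predicted, truth))  # (2 + 0 + 3) / 3
```

Because the error is measured in persons per image, the same detector naturally yields a larger MAE on images that contain hundreds of people than on sparse scenes.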
5 Conclusion and Future Work
The proposed work analyses the performance of YOLOv5, a CNN-based architecture, for crowd counting in video sequences ranging from low to high density. The experimental results show that crowd counting works well only for low- to medium-density videos.
Fig. 3 Detection of humans in various frames of different video sequences of AVENUE dataset (i)–(ii) represent frame 1, and frame 51 of video sequence 2, (iii)–(iv) represent frame 20 and frame 101 of video sequence 7, (v)–(vi) represent frame 301 and 451 of video sequence 13
Fig. 4 Detection of humans in various frames of different video sequences of PETS2009 dataset. (i)–(ii) represent frame 1, 1001 of video sequence 2, (iii)–(iv) represent frame 101, 651 of video sequence 8, and (v)–(vi) represent frame 1, 251 of video sequence 13
As future work, authors are working on modifying the YOLOv5 architecture to make it suitable for dense video sequences.
Fig. 5 Detection of humans in different images of Shanghaitech Dataset
Table 5 Average precision for three datasets
Dataset | Average precision (AP, %)
AVENUE | 99.5
PETS2009 | 98.9
SHANGHAITECH | 40.2

Table 6 MAE for all the datasets
Dataset | MAE
AVENUE | 5.62
PETS2009 | 4.89
SHANGHAITECH | 55.28
Acknowledgements This work is sponsored by Visvesvaraya Ph.D. Scheme issued by the Ministry of Electronics and Information Technology, Govt. of India, as employed by Digital India Corporation.
References
1. Ford M (2017) Trump’s press secretary falsely claims: ‘Largest audience ever to witness an inauguration, period.’ The Atlantic 21(1):21
2. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018, pp 8759–8768
3. Cheng Z, Zhang F (2020) Flower end-to-end detection based on YOLOv4 using a mobile device. Wirel Commun Mob Comput 17:2020
4. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, pp 878–885, 20 June 2005
5. Tuzel O, Porikli F, Meer P (2008) Pedestrian detection via classification on riemannian manifolds. IEEE Trans Pattern Anal Mach Intell 30(10):1713–1727
6. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, 20 Jun 2005. IEEE, pp 886–893
7. Lin SF, Chen JY, Chao HX (2001) Estimation of number of people in crowded scenes using perspective transformation. IEEE Trans Syst Man Cybern-Part A Syst Hum 31(6):645–654
8. Wu B, Nevatia R (2007) Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. Int J Comput Vision 75(2):247–266
9. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
10. Wang M, Li W, Wang X (2012) Transferring a generic pedestrian detector towards specific scenes. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3274–3281, 16 Jun 2012
11. Wang M, Wang X (2011) Automatic adaptation of a generic pedestrian detector to a specific traffic scene. In: CVPR 2011. IEEE, 20 June 2011, pp 3401–3408
12. Lucas BD, Kanade T (1981) An iterative image registration technique with an application to stereo vision
13. Brostow GJ, Cipolla R (2006) Unsupervised Bayesian detection of independent motion in crowds. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 1, 17 Jun 2006. IEEE, pp 594–601
14. Tu P, Sebastian T, Doretto G, Krahnstoever N, Rittscher J, Yu T (2008) Unified crowd segmentation. In: European conference on computer vision. Springer, Berlin, pp 691–704, 12 Oct 2008
15. Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In: Tenth IEEE international conference on computer vision (ICCV’05), vol 1. IEEE, pp 90–97, 17 Oct 2005
16. Chan AB, Vasconcelos N (2009) Bayesian poisson regression for crowd counting. In: 2009 IEEE 12th international conference on computer vision, 29 Sep 2009. IEEE, pp 545–551
17. Ryan D, Denman S, Fookes C, Sridharan S (2009) Crowd counting using multiple local features. In: 2009 digital image computing: techniques and applications. IEEE, pp 81–88, 1 Dec 2009
18. Ruchika, Purwar RK (2019) Crowd density estimation using hough circle transform for video surveillance. In: 2019 6th international conference on signal processing and integrated networks (SPIN). IEEE, 7 Mar 2019, pp 442–447
19. Kampffmeyer M, Dong N, Liang X, Zhang Y, Xing EP (2018) ConnNet: A long-range relation-aware pixel-connectivity network for salient segmentation. IEEE Trans Image Process 28(5):2518–2529
20. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using cnn with attention mechanism. IEEE Trans Image Process 28(5):2439–2450
21. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans Image Process 26(7):3142–3155
22. Chao H, He Y, Zhang J, Feng J (2019) Gaitset: regarding gait as a set for cross-view gait recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp 8126–8133, 17 Jul 2019
23. Jocher G, Changyu L, Hogan A, Changyu LY, Rai P, Sullivan T (2020) Ultralytics/yolov5. Init Release. https://doi.org/10.5281/zenodo.3908560
24. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: ECCV, 2014. ISBN 978-3-319-10601-4. https://doi.org/10.1007/978-3-319-10602-1_48
25. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2016, pp 779–788
26. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2017, pp 7263–7271
27. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767. 8 Apr 2018
28. Bochkovskiy A, Wang CY, Liao HY (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. 23 Apr 2020
29. Davies AC, Yin JH, Velastin SA (1995) Crowd monitoring using image processing. Electron Commun Eng J 7(1):37–47
30. Wang CY, Liao HY, Wu YH, Chen PY, Hsieh JW, Yeh IH (2020) CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops 2020, pp 390–391
31. Lu C, Shi J, Jia J (2020) Abnormal event detection at 150 fps in MATLAB. In: Proceedings of the IEEE international conference on computer vision 2013, pp 2720–2727. Available at: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html [Accessed 15 Nov 2020]
32. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2016, pp 589–597
33. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
34. Oh MH, Olsen P, Ramamurthy KN (2020) Crowd counting with decomposed uncertainty. In: Proceedings of the AAAI conference on artificial intelligence, vol 34(07), pp 11799–11806, 3 Apr 2020
35. Ferryman J, Shahrokni A (2009) Pets2009: dataset and challenge. In: 2009 twelfth IEEE international workshop on performance evaluation of tracking and surveillance. IEEE, 7 Dec 2009, pp 1–6. Available at: http://www.cvg.reading.ac.uk/PETS2009/a.html [Accessed 15 July 2021]
IoT Enabled Elderly Monitoring System and the Role of Privacy Preservation Frameworks in e-health Applications
Vidyadhar Jinnappa Aski, Vijaypal Singh Dhaka, Sunil Kumar, and Anubha Parashar
Abstract Healthcare IoT (HIoT), or electronic health (e-health), is an emerging paradigm of IoT in which multiple bio-sensors capture body vitals and disseminate the captured information to the nearest data center through the underlying wireless infrastructure. Despite rapid research and development in the e-health field, with its key facets (i.e., sensing, communication, data consolidation, and delivery of information) and its inherent benefits (e.g., error reduction, homecare, and better patient management), it still faces several challenges. These challenges range from the development of an interoperable e-health framework to the design of an attack-free security model for both data and devices. In this article, an overview of recent technological trends in designing HIoT privacy preservation frameworks is provided, and the corresponding security challenges are discussed. We also propose an architectural framework for monitoring the health vitals of differently abled persons or patients with degenerative chronic disorders. The interaction of application components is illustrated with the help of different use-case scenarios.
Keywords Healthcare IoT (HIoT) · Interoperability · e-health · Access control · Authentication
V. J. Aski (B) · V. S. Dhaka · S. Kumar
Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
A. Parashar
Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_72
V. J. Aski et al.
1 Introduction
IoT has gained massive ground in the day-to-day work of researchers and practitioners because it offers advanced connectivity and can uniquely identify every physical object on the planet. The IPv4 and IPv6 address spaces make this possible by providing seamless addressing schemes through which devices can be reached remotely. These technological evolutions offer new ways for smart objects to communicate with things (M2T), machines (M2M), and humans (M2H) [1]. With the rapid growth in demand for remote monitoring, IoT-enabled healthcare devices are now commonly seen in many homes. These devices help in monitoring body vitals, mainly temperature, heart rate, blood pressure, and glucose level, thereby enabling patients to perform self-assisted diagnosis and avoiding overcrowding at hospitals. The captured vitals are constantly uploaded to cloud-centric data centers such as iDoctor, so that patients can seek professional medical advice on a regular basis [2]. In addition, such self-assisted medical services are becoming more prominent and produce more accurate results because of recent advancements in wireless communication techniques (WCTs) and information technologies [3, 4].
Privacy protection (of data or devices) is one of the key problems that IoT has faced since its inception. A universal solution to such issues is badly needed if these technologies are to be widely accepted in critical application domains such as healthcare and allied sectors. Researchers have recently witnessed a sharp rise in medical information leakage that compromises security goals, as malware is becoming more virulent and resistant [5, 6].
For instance, the database of the second-largest health insurance corporation in the US was once targeted by hackers, which led to the leakage of the personal information, including health data, of approximately 80 million customers. Therefore, privacy protection strategies play a vital role in designing any e-health application. In this article, the authors provide a holistic overview of recent trends and techniques that are employed in the development of healthcare privacy preservation frameworks, together with the associated security concerns. The following scenario helps us understand how privacy breaches have become a day-to-day reality in all our lives.
Akshata looks at her smart wristband during her regular workout and observes that her heart rate is a little higher than the target rate (the target heart rate during exercise is normally derived from “220 minus your age” [7]). After the workout, she asks Alexa (a smart speaker) to book her an appointment with the nearest cardiologist for a general heart health checkup. The next day, after finishing her office work, she visits the cardiologist and feels relieved when the doctor reassures her that nothing is wrong and that her heart rate went up because of the intensified workout. The next time Akshata uses her browser, she is irritated because annoying ads about heart medications, heart complications, and tutorials on identifying a heart attack keep popping up. Things get worse when she receives a telephone call from a health insurance agency recommending her a plan. This could be only one of several such incidents where modern technology brings high privacy risks that have become unavoidable in daily life.
2 Related Work
Our primary objective is to carry out a joint assessment of surveys on privacy preservation protocols for healthcare IoT (from both data and device perspectives). Secondly, we aim to investigate research studies that discuss, jointly or separately, the implementation of privacy preservation protocols in HIoT application scenarios. Accordingly, we have divided the literature review into the following subsections.
2.1 Survey Articles on HIoT Security and Privacy Aspects
There exist several survey articles in the domain of HIoT security and privacy concerns [8–13]. Most of these survey articles give an overview of security issues and privacy aspects, along with proposed solutions, from a healthcare IoT perspective, as shown in Table 1. In this article, we focus on providing a holistic overview of privacy protection for healthcare data and devices. In addition, we provide a conceptual architecture for designing an application that monitors the health status of the elderly (patients with degenerative disorders) and differently abled persons.
2.2 Research Study on Implementation of Security Protocols and Algorithms
In this subsection, we review some of the prominent security algorithms proposed in the recent past. In [15], the authors proposed a data labeling scheme to manage privacy in healthcare IoT applications. Using techniques that control the flow of information, events representing the data are tagged with several private attributes, such as date of birth and place. These tags enable data privacy in the applications. The model is not suitable for large-scale IoT applications because it is extremely difficult to handle a large set of user attributes on a small computing platform. In [16], the authors proposed an access control algorithm, based on an anonymous privacy policy, for preserving the privacy of IoT users. In this scheme, users control their data and define policies that specify which system user has which access rights; these policies can be changed in real time. In [14], the authors proposed an algorithm for continuously naming the dataflow with the help of adaptive clusters. This cluster-based approach guarantees the novelty in naming and imposes the
Survey on healthcare Not mentioned cloud implementation
Comprehensive on security issues of e-health
A review on security and privacy issues in healthcare
Nuaimi et al. [10]
Idoga et al. [11]
Pankomera et al. [12]
Proposed study
Biometric security functions
Not mentioned
AES
Attribute-based encryption
A holistic overview Access control and on privacy authentication preservation schemes
Olaronke et al. [14] A survey on bigdata challenges in healthcare
Survey on health clouds
Abbas et al. [9]
Access controlling rules and policies
An exhaustive literature review on medical IoT
Luis et al. [8]
Security concerns discussed
Aim of the article
Author and year of publication
Research questions ✓
✕ ✕ ✕
✕
✕
✓
Architecture ✕
✕ ✕ ✕
✓
✕
✓
✓
✕
✓
✓
✓
✓
✕
Open issues
Table 1 Comparative analysis of existing survey articles in the domain of HIoT and implementation issues
✓
✕
✕
✕
✕
✓
✕
Challenges
NA
Incomplete information
Lack of public health concerns
Application scope limitation
Limited scope variation
Centralized architecture
Limited frameworks
Drawbacks
994 V. J. Aski et al.
IoT Enabled Elderly Monitoring System and the Role of Privacy …
995
latency restrictions on continuous dataflow. In addition, there are several research studies published in recent past which centralizes their discussion on HıoT privacy [17–23]
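The tagging idea of [15] and the user-defined, real-time adjustable policies of [16] can be illustrated together. The following Python sketch is a minimal illustration under our own assumptions and is not the algorithm of either paper: the class names `HealthEvent` and `PolicyStore`, the tag names, and the requester labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HealthEvent:
    """A sensor reading labeled with the private attributes it reveals."""
    payload: dict
    tags: frozenset  # hypothetical tag names, e.g. {"dob", "place"}


@dataclass
class PolicyStore:
    """Patient-defined policy: which tags each requester may read."""
    allowed: dict = field(default_factory=dict)

    def grant(self, requester: str, tag: str) -> None:
        # Policies can be extended at any time by the data owner.
        self.allowed.setdefault(requester, set()).add(tag)

    def revoke(self, requester: str, tag: str) -> None:
        # Revocation takes effect on the next access check.
        self.allowed.get(requester, set()).discard(tag)

    def may_read(self, requester: str, event: HealthEvent) -> bool:
        # Release an event only if every tag on it is permitted.
        return event.tags <= self.allowed.get(requester, set())
```

In this sketch an event is released only when the requester holds every tag attached to it, so revoking a single tag immediately blocks access. An event without tags is readable by anyone, which is a design choice of the sketch rather than a recommendation.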
3 Privacy Preservation in HIoT and Its Implications

IoT gives healthcare consumers a high degree of control over day-to-day tasks, ranging from capturing data from the patient's body to disseminating the captured information to remote servers for further analysis. It also provides a way to saturate patient environments with smart things. Smart things denote a broad spectrum of low-power computing platforms, such as microcontrollers, microprocessors, sensor area networks, and wireless communication entities, which help data reach a cloud platform. Figure 1 describes a generic healthcare environment comprising a sensor network, data processing platforms, wireless infrastructure, a cloud platform, and the base station from which multiple health workers benefit. HIoT can be implemented in both static and dynamic environments. In a static environment, the patient's movement is confined to a place such as an ICU or a physician's examination hall (in-hospital monitoring use case of Fig. 1).
Fig. 1 Generic healthcare IoT architecture. Use cases shown: in-hospital patient monitoring, out-patient monitoring, ambulance monitoring (road and air ambulance), smart rehabilitation, enhanced drug management, report analysis, and improved hospital resource utilization. Components shown: indoor and outdoor patients, local processing unit (LPU) of sensor nodes, Wi-Fi gateway and outdoor Wi-Fi access points, network gateway, cellular services (GPRS/UMTS/CDMA), IP network, and medical server. Backbone wireless protocols for HIoT systems: 2G/3G/4G, LPWAN (Sigfox, LoRa), WiMAX, ZigBee, Wi-Fi, BLE, 6LoWPAN, NFC, RFID
However, in dynamic environments, the patient can wear the device while performing daily activities such as jogging and walking (out-patient monitoring use case of Fig. 1). In this article, we discuss privacy concerns related to healthcare applications. In addition, the risk exposure of these devices is much higher at the place of development than at the place of deployment, and a safeguard technique shall be used to protect these devices from security threats at the place of deployment. Given the highly vulnerable nature of IoT devices, it is essential to know the risks and challenges such devices pose to the privacy of patient data. Moreover, one has to obtain a satisfactory answer to the following question before opting for an IoT-enabled healthcare device from a hospital: is it possible to get a device that fully supports privacy preservation and a safe environment, as the traditional Internet does? To answer this question precisely, one must understand the logical differences between trust, privacy, and confidentiality. Privacy in healthcare IoT terms means that an individual's health data must be protected from third-party access. It also means that the information should not be exposed to others without the explicit consent of the patient. It is a fundamental right of patients to decide with whom to share their data. For instance, in our previous example, only Akshata decides whether to share the data with the insurance company. Trust, in turn, is the consequential product of transparency and consistency. Finally, confidentiality is the factor that ensures the right person is accessing the right data and prevents data from being accessed by unauthorized entities. If I say that my data is confidential, it means that the data is accessible only by me and that no one is authorized to access it without my permission.
4 Taxonomy of Security and Privacy Preservation in Healthcare

In this section, we discuss the various privacy concerns and security aspects. We categorize the security and privacy concerns under three heads: process based, scheme based, and network and traffic based. The detailed classification is given in the taxonomy diagram shown in Fig. 2.
4.1 Process-Oriented Schemes

The modern lifestyle has generated a need to incorporate a huge number of smart devices around us. These smart devices are made of multiple sensors and actuators comprising a data acquisition system that captures numerous physical parameters such as temperature and heart rate. These devices create a massive amount of data, which can only be handled with the specific set of algorithms provided by big data technologies. This massive amount of data travels over open channels such as the Internet. The vulnerable nature of the Internet needs no mention; the more challenging job is to handle the issues of process-based techniques such as cloud computing, fog computing, and edge computing.

Fig. 2 Security and privacy taxonomy in HIoT
4.1.1 Distributed Approaches

In centralized systems, user data is stored in one central database and is queried from that database as and when required by the end user. If the central server fails, the whole system freezes and it is difficult to recover the lost data; this is the biggest drawback of such systems. Distributed systems can handle such issues easily. Table 2 shows the state-of-the-art security protocols in the distributed computing field.
4.1.2 Centralized Approaches

Traditional Internet technologies are benefitting rapidly from the latest advancements in WCTs and ICTs. Owing to these advancements, enormous amounts of data are being generated in all sectors. Storing and processing such volumes of data on traditional Internet devices is difficult. This problem can be solved by using centralized cloud-based systems. Table 3 discusses the state of the art in the centralized computing area.
Table 2 State-of-the-art security protocols in distributed computing field
State-of-the-art | Contribution | Schemes proposed | Observations
Hamid et al. [24] | Designed an authenticated key agreement protocol for distributed systems | Bilinear matching cryptography | It does not provide security against MIM attack, key theft, and brute force attack
Zhou et al. [25] | Designed an authenticated key agreement scheme for distributed systems overcoming the drawbacks of [24] | Hybrid real-time cryptography algorithm | More vulnerable to threats and also patients' confidentiality may be compromised and not resistive toward replay attack
Kaneriya et al. [26] | Presented a scheme that handles the replay attack | Multi-authority ABE | Not resistive towards MIM attack
Mutlag et al. [27] | They highlighted the limitations of computation, storage, and networking resources in a distributed environment | Vision and key properties of FC | Non suitable for bandwidth sensitive IoT applications

4.1.3 Decentralized Approaches
Multiple entities are involved in managing the healthcare industry as a whole, such as pharmacies, patient groups, doctors' communities, and emergency response teams. IoT provides a unique way to bind these entities together on a single platform. When they are centrally connected by means of a network, it is essential to establish trust among all of them. Distributed computing approaches are used to manage trusted relationships in multiparty environments. Table 4 discusses the state of the art in the decentralized computing area.
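Several of the decentralized schemes listed in Table 4 build on blockchain, where trust among parties comes from records that cannot be altered unnoticed. The following Python sketch shows only the underlying hash-chaining idea, under our own assumptions; it omits consensus, signatures, and networking, and the function names and record fields are hypothetical rather than taken from any cited scheme.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index: int, prev_hash: str, record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps({"index": index, "prev": prev_hash, "record": record},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "record": record,
                  "hash": block_hash(index, prev_hash, record)})


def is_valid(chain: list) -> bool:
    """Recompute every hash; any edited record or broken link fails."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["record"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each block stores the hash of its predecessor, a pharmacy, a doctor, or an emergency team holding a copy of the chain can detect a modified entry by calling `is_valid` without trusting the party that supplied the copy.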
4.2 Authentication Oriented Schemes

IoT exposes several Internet paradigms to security vulnerabilities because of its highly open nature. In healthcare, data needs to be securely captured, transferred, stored, and processed. Sensitive or critical data, such as the body vitals of patients, can be protected from unauthorized access with password-based mechanisms, cryptographic algorithms, or biometric authentication schemes.
Table 3 State-of-the-art security protocols in centralized computing area
State-of-the-art | Contribution | Schemes proposed | Observations
Zhou et al. [28] | Designed a privacy-preserving dynamic medical text mining and image feature extraction approach in the cloud healthcare system | Medical text mining and image feature extraction approach | It is more secure for input and output data; it also reduced both the communication and computational cost
Ziglari et al. [29] | Analyzed the security aspects in the deployment models for the healthcare system | Performed a security analysis between service providers and cloud service providers | They presented an architecture for the deployment of information technology systems that are generated on the cloud by multiple cloud providers
Requena et al. [30] | Proposed a design for a cloud-assisted radiological gateway | Scheme to permit patients to access the health images and diagnosis reports from the cloud | NA
Huang et al. [31] | Proposed a secure PHR system to collect health data | Biometric-based secure health data collection (BBC) and attribute-based health record accessing | It protects against known attacks such as replay, and authors claimed that their scheme is efficient in terms of storage, computational, and communication needs
4.2.1 Password-Based Authentication Schemes
The primary concern in healthcare IoT technology is to prevent unauthorized access. Password-based mechanisms provide an efficient way to protect data from such attacks. Here, the password needs to be changed periodically and should be a complex combination of alphanumeric characters. The authors in [36] proposed a lightweight technique that can be implemented on miniaturized computers such as the Raspberry Pi. The scheme uses a centrally shared key agreement scheme.
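The two password requirements mentioned above, complexity and periodic change, can be sketched with the Python standard library. This is a minimal illustration under our own assumptions and is not the scheme of [36]: the minimum length, the 90-day rotation period, and the PBKDF2 iteration count are hypothetical values chosen for the example.

```python
import hashlib
import hmac
import secrets
import string
import time
from typing import Optional, Tuple

MIN_LENGTH = 12                      # assumed minimum length
MAX_AGE_SECONDS = 90 * 24 * 3600     # assumed rotation period of 90 days
ITERATIONS = 100_000                 # assumed PBKDF2 work factor


def is_complex(password: str) -> bool:
    """Require a mix of lower-case, upper-case, and digit characters."""
    return (
        len(password) >= MIN_LENGTH
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.ascii_uppercase for c in password)
        and any(c in string.digits for c in password)
    )


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Constant-time comparison of the recomputed digest with the stored one."""
    return hmac.compare_digest(hash_password(password, salt)[1], digest)


def needs_rotation(set_at: float, now: Optional[float] = None) -> bool:
    """True when the password is older than the assumed rotation period."""
    current = time.time() if now is None else now
    return current - set_at > MAX_AGE_SECONDS
```

Only the salt and the digest would be stored on the device or server. On a constrained board such as a Raspberry Pi, the iteration count may need tuning so that verification stays responsive.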
4.2.2 Biometric-Based Authentication Schemes

In biometric-based authentication schemes, biometric features such as fingerprint, iris, gait, and facial feature sets can be used to verify the legitimacy of the user. This is the most effective mechanism for preventing unauthorized access. Figure 3 compares different biometric features in terms of security level and accuracy.
Table 4 State-of-the-art security protocols in decentralized computing area
State-of-the-art | Contribution | Schemes proposed | Remarks
Banerjee et al. [32] | Proposed a blockchain-based decentralized system | Model to detect and prevent current threats in IoT systems | The proposed system has the capability to predict possible threats and attacks
Gordon et al. [33] | Presented a healthcare framework, which can be used in different areas such as patient-driven and institution driven | Information exchange model | They analyzed the blockchain security considering five dimensions: data aggregation, digital access rules, data liquidity, data immutability, and patient identity
Kshetri et al. [34] | Did a comprehensive analysis of blockchain characteristics concerning security in IoT-based supply chains | Comparative model between blockchain and IoT | The authors examined the activities of associations between hierarchical systems and the healthcare industry and discussed several privacy suggestions
Kumar et al. [35] | Encountered the possible security and privacy problems in IoT and suggested a distributed ledger-based blockchain technology | Not specified | The authors highlighted the requirements of BC in IoT and its broad scope of services in various fields
Fig. 3 Comparison matrix of various biometric authentication mechanisms
4.3 Network Traffic Oriented Schemes

4.3.1 Distributed Approaches
Figure 4 depicts the proposed HIoT layered architectural framework for monitoring elderly and differently abled people. Here, we have organized the different application scenarios and components into three layers according to their functionalities and requirements. The object layer, or perception layer, is where all the physical objects such as sensors and actuators work toward the common goal of capturing data from the patient's body in dynamic environments. The network or gateway layer is responsible for transporting data from the DAQs to storage infrastructures such as clouds and fog nodes. Further, the application layer is responsible for performing data analytics tasks, such as creating graphs and flow charts, to improve business processes. The application layer is sometimes also called the business layer. The attack vectors of the layered architecture are also shown in Fig. 4. Several researchers have worked on model-based, attack-oriented schemes to prevent unauthorized access. For instance, the authors in [37] presented a model-based, attack-oriented algorithm to safeguard healthcare data, which works on the basic principle of the Markov model. Further, the authors in [38] designed a model to prevent information breaches in healthcare applications.

Fig. 4 Proposed HIoT layered architectural framework for monitoring elderly and differently abled people. Panels: (a) real-time monitoring of foot pressure and heart activity of a diabetic patient; (b) a real-time assistive smart wheelchair for a Parkinson patient; (c) real-time monitoring of an ICU patient. Attack vectors shown per layer: application layer: SQL injection, phishing, privacy threat, replay, data corruption, malware, and DDoS attacks; network/gateways layer: Sybil, eavesdropping, MIM, replay, DDoS, spoofing, and routing attacks; object layer: hardware tampering, eavesdropping, DoS, physical, and malware attacks
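To illustrate what a Markov model of attacks on such a system can look like, the sketch below computes the long-run share of time a device spends in each state of a small discrete-time Markov chain. It is an illustration under our own assumptions and is not the model of [37]: the three states and all transition probabilities are hypothetical.

```python
# Hypothetical device states for one HIoT node.
STATES = ("normal", "attacked", "recovering")

# Hypothetical per-step transition probabilities; each row sums to 1.
P = [
    [0.95, 0.05, 0.00],   # normal     -> normal / attacked / recovering
    [0.00, 0.60, 0.40],   # attacked   -> normal / attacked / recovering
    [0.50, 0.00, 0.50],   # recovering -> normal / attacked / recovering
]


def step(dist, matrix):
    """One step of the chain: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def steady_state(matrix, iterations=1000):
    """Approximate the stationary distribution by repeated stepping."""
    dist = [1.0] + [0.0] * (len(matrix) - 1)   # start in the first state
    for _ in range(iterations):
        dist = step(dist, matrix)
    return dist


if __name__ == "__main__":
    for name, share in zip(STATES, steady_state(P)):
        print(f"{name}: {share:.3f}")
```

With these assumed probabilities the chain settles at roughly 0.816 for the normal state, which can be read as the availability of the node. Changing the attack or recovery probabilities shows how strongly each one affects that figure.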
5 Proposed HIoT Architecture

The proposed HIoT architectural framework is shown in Fig. 4. It is a three-layered architecture, and the functionalities of each layer are briefly explained in Sect. 4.3.1.
5.1 Use Cases

We consider various chronic health issues of both the elderly and the differently abled community as use cases; they are briefly discussed in the subsections below.
5.1.1 Real-Time Monitoring of Foot Pressure and Heart Activity of a Diabetic Patient
Here, sensors such as force sensitive resistive (FSR) pressure sensors are installed in the foot sole of a patient. A diabetic patient tends to develop wounds easily because the foot skin is highly sensitive to rough surfaces. A wound may lead to gangrene and may therefore cause permanent disability followed by amputation of the infected body parts. It is therefore important for patients to know their foot pressure variations. These variations are continuously monitored by a medical professional through the FSRs, so that any abnormal variation can be easily tracked and further medical attention can be sought. Multiple other sensors, such as a pulse rate sensor, an ECG sensor, and IR sensors, are interfaced to a microcontroller, and the data is transferred to the medical health server (MHS) through wireless technologies such as WiMAX. At cloud level, the data is segregated according to the nature of the application. For instance, data from the diabetic patient and the ICU patient is stored in the mobile sensor database and the monitoring database, respectively. Generally, LoRa gateways are used at cloud level for further data distribution.
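One simple way to flag abnormal foot pressure variations is to compare each new FSR reading with a rolling baseline of recent readings. The sketch below is a minimal illustration under our own assumptions, not a clinically validated method: the window length, the threshold factor `k`, the noise floor `min_sigma`, and the sample values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def abnormal_indices(samples, window=5, k=3.0, min_sigma=1.0):
    """Return indices of readings far from the rolling baseline.

    A reading is flagged when it deviates from the mean of the last
    `window` accepted readings by more than k standard deviations.
    """
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            # min_sigma avoids a zero threshold on perfectly flat signals.
            sigma = max(pstdev(history), min_sigma)
            if abs(value - mu) > k * sigma:
                flagged.append(i)
                continue  # keep outliers out of the baseline
        history.append(value)
    return flagged


if __name__ == "__main__":
    readings = [100, 101, 99, 100, 102, 100, 180, 101, 99]  # arbitrary units
    print(abnormal_indices(readings))
```

For the sample readings above, only the spike at index 6 is flagged. In a deployment along the lines described in the text, the flagged indices would be what the medical professional is alerted to, while the raw stream continues to the medical health server.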
5.1.2 Real-Time Assistive Smart Wheelchair for Parkinson's Disease
Here, the patient is equipped with a motion sensor (to detect motion), a GPS sensor (to determine location), and a motor controller. The wheelchair is smart enough to capture patient data and transfer it to the nearest cloud database through the microcontroller. In Parkinson's disease, patients cannot move their arms as they wish, so the smart joystick takes care of the patient's movements. The data captured from this patient is stored in a separate cloud database, called the mobile sensor database, for further evaluation.
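The cloud-level segregation described in the use cases can be sketched as a small routing table. This is only an illustration of the idea: the application labels, the database identifiers, and the packet fields are hypothetical, and a real deployment would write to actual databases rather than in-memory lists.

```python
# Hypothetical application labels mapped to the cloud databases named in the text.
ROUTES = {
    "diabetic_foot": "mobile_sensor_db",
    "smart_wheelchair": "mobile_sensor_db",
    "icu": "monitoring_db",
}


def route(packet: dict, stores: dict) -> str:
    """Append the packet to the database chosen by its application label."""
    application = packet.get("application")
    target = ROUTES.get(application)
    if target is None:
        raise ValueError(f"unknown application: {application!r}")
    stores.setdefault(target, []).append(packet)
    return target
```

Rejecting packets with an unknown label, instead of storing them in a default database, keeps data from unregistered devices out of the patient records.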
5.2 Research Challenges and Future Work

In this section, we discuss the research challenges that are common in designing IoT frameworks for monitoring the vitals of elderly and disabled people. The major challenge observed is the customization of healthcare devices so that they fit disabled people comfortably, as every disabled individual has specific needs and different circumstances. Context-aware environments are created in smart workflows, which take intelligent decisions based on the context information received from the bio-sensing devices. Another important challenge is the self-management of IoT devices. It is always preferable to design a device that is free of human intervention and updates itself automatically, as it is difficult for elderly or disabled people to carry out regular updates. Standardization is another key problem that every healthcare IoT designer needs to address. Incorporating globally accepted standards into IoT devices is essential to avoid interoperability problems. Finally, the future goal is to enhance and envision the evolution of technologies in IoT and allied fields that help in creating devices for disabled and elderly people. Advances in brain–computer interfaces (BCI) have made it possible to create control environments for artificial limbs such as arms and legs. BCI technologies are continuously being transformed around the globe to address these research challenges. It is expected that the disabled community will benefit greatly from such advancements in BCI.
6 Conclusion

The paper offers a detailed overview of the numerous privacy preservation concerns and security issues seen in the day-to-day functions of HIoT applications. We have discussed the key security aspects through the taxonomy diagram and reviewed in detail a heterogeneous variety of recent state-of-the-art authentication and access control schemes and their implications. In addition, we have presented insights into the different policy-based, process-based, and authentication-based security and privacy preserving schemes used in the HIoT application domain. An IoT-based healthcare architectural framework has been discussed. Multiple use cases, such as real-time monitoring of the foot pressure and heart activity of a diabetic patient and a real-time assistive smart wheelchair for Parkinson's disease, are discussed with the diagram. Various sensors, such as force sensitive resistive (FSR) pressure sensors, ECG sensors, GPS sensors, and pulse rate sensors, are explained along with their usage implications.
References
1. Bahga A, Madisetti VK (2015) Healthcare data integration and informatics in the cloud. Comput (Long Beach Calif) 48(2):50–57
2. Zhang Y, Chen M, Huang D, Wu D, Li Y (2017) iDoctor: personalized and professionalized medical recommendations based on hybrid matrix factorization. Future Gener Comput Syst 66:30–35
3. Yu K, Tan L, Shang X, Huang J, Srivastava G, Chatterjee P (2020) Efficient and privacy-preserving medical research support platform against COVID-19: a blockchain-based approach. IEEE Consum Electron Mag
4. Yu K-P, Tan L, Aloqaily M, Yang H, Jararweh Y (2021) Blockchain-enhanced data sharing with traceable and direct revocation in IIoT. IEEE Trans Ind Inf
5. Sriram S, Vinayakumar R, Sowmya V, Alazab M, Soman KP (2020) Multi-scale learning based malware variant detection using spatial pyramid pooling network. In: IEEE INFOCOM 2020 - IEEE conference on computer communications workshops (INFOCOM WKSHPS). IEEE, pp 740–745
6. Vasan D, Alazab M, Venkatraman S, Akram J, Qin Z (2020) MTHAEL: cross-architecture IoT malware detection based on neural network advanced ensemble learning. IEEE Trans Comput 69(11):1654–1667
7. Target heart rates chart | American Heart Association. [Online]. Available: https://www.heart.org/en/healthy-living/fitness/fitness-basics/target-heart-rates. [Accessed: 28 May 2021]
8. Fernández-Alemán JL, Señor IC, Lozoya PÁO, Toval A (2013) Security and privacy in electronic health records: a systematic literature review. J Biomed Inf 46(3):541–562
9. Abbas A, Khan SU (2014) A review on the state-of-the-art privacy-preserving approaches in the e-health clouds. IEEE J Biomed Health Inform 18(4):1431–1441
10. Al Nuaimi N, AlShamsi A, Mohamed N, Al-Jaroodi J (2015) e-Health cloud implementation issues and efforts. In: 2015 international conference on industrial engineering and operations management (IEOM). IEEE, pp 1–10
11. Idoga PE, Agoyi M, Coker-Farrell EY, Ekeoma OL (2016) Review of security issues in eHealthcare and solutions. In: 2016 HONET-ICT. IEEE, pp 118–121
12. Pankomera R, van Greunen D (2016) Privacy and security issues for a patient-centric approach in public healthcare in a resource constrained setting. In: 2016 IST-Africa week conference. IEEE, pp 1–10
13. Olaronke I, Oluwaseun O (2016) Big data in healthcare: prospects, challenges and resolutions. In: 2016 future technologies conference (FTC). IEEE, pp 1152–1157
14. Lee I, Lee K (2015) The Internet of Things (IoT): applications, investments, and challenges for enterprises. Bus Horiz 58(4):431–440
15. Zhang D, Zhang D, Xiong H, Hsu C-H, Vasilakos AV (2014) BASA: building mobile Ad-Hoc social networks on top of android. IEEE Network 28(1):4–9
16. Sharma G, Bala S, Verma AK (2012) Security frameworks for wireless sensor networks-review. Procedia Technol 6:978–987
17. Wang K, Chen C-M, Tie Z, Shojafar M, Kumar S, Kumari S (2021) Forward privacy preservation in IoT enabled healthcare systems. IEEE Trans Ind Inf
18. Hassan MU, Rehmani MH, Chen J (2019) Privacy preservation in blockchain based IoT systems: integration issues, prospects, challenges, and future research directions. Future Gener Comput Syst 97:512–529
19. Bhalaji N, Abilashkumar PC, Aboorva S (2019) A blockchain based approach for privacy preservation in healthcare IoT. In: International conference on intelligent computing and communication technologies. Springer, Singapore, pp 465–473
20. Du J, Jiang C, Gelenbe E, Lei X, Li J, Ren Y (2018) Distributed data privacy preservation in IoT applications. IEEE Wirel Commun 25(6):68–76
21. Ahmed SM, Abbas H, Saleem K, Yang X, Derhab A, Orgun MA, Iqbal W, Rashid I, Yaseen A (2017) Privacy preservation in e-healthcare environments: state of the art and future directions. IEEE Access 6:464–478
22. Xu X, Fu S, Qi L, Zhang X, Liu Q, He Q, Li S (2018) An IoT-oriented data placement method with privacy preservation in cloud environment. J Net Comput Appl 124:148–157
23. Bhattacharya P, Tanwar S, Shah R, Ladha A (2020) Mobile edge computing-enabled blockchain framework: a survey. In: Proceedings of ICRIC 2019. Springer, Cham, pp 797–809
24. Al Hamid HA, Rahman SMM, Hossain MS, Almogren A, Alamri A (2017) A security model for preserving the privacy of medical big data in a healthcare cloud using a fog computing facility with pairing-based cryptography. IEEE Access 5:22313–22328
25. Zhou J, Cao Z, Dong X, Lin X (2015) TR-MABE: white-box traceable and revocable multi-authority attribute-based encryption and its applications to multi-level privacy-preserving e-healthcare cloud computing systems. In: 2015 IEEE conference on computer communications (INFOCOM). IEEE, pp 2398–2406
26. Kaneriya S, Chudasama M, Tanwar S, Tyagi S, Kumar N, Rodrigues JJPC (2019) Markov decision-based recommender system for sleep apnea patients. In: ICC 2019 - 2019 IEEE international conference on communications (ICC). IEEE, pp 1–6
27. Mutlag AA, Abd Ghani MK, Arunkumar NA, Mohammed MA, Mohd O (2019) Enabling technologies for fog computing in healthcare IoT systems. Future Gener Comput Syst 90:62–78
28. Zhou J, Cao Z, Dong X, Lin X (2015) PPDM: a privacy-preserving protocol for cloud-assisted e-healthcare systems. IEEE J Sel Top Sign Process 9(7):1332–1344
29. Ziglari H, Negini A (2017) Evaluating cloud deployment models based on security in EHR system. In: 2017 international conference on engineering and technology (ICET). IEEE, pp 1–6
30. Sanz-Requena R, Mañas-García A, Cabrera-Ayala JL, García-Martí G (2015) A cloud-based radiological portal for the patients: IT contributing to position the patient as the central axis of the 21st century healthcare cycles. In: 2015 IEEE/ACM 1st international workshop on technical and legal aspects of data privacy and security. IEEE, pp 54–57
31. Huang C, Yan K, Wei S, Hoon Lee D (2017) A privacy-preserving data sharing solution for mobile healthcare. In: 2017 international conference on progress in informatics and computing (PIC). IEEE, pp 260–265
32. Banerjee M, Lee J, Choo K-KR (2018) A blockchain future for internet of things security: a position paper. Digital Commun Net 4(3):149–160
33. Gordon WJ, Catalini C (2018) Blockchain technology for healthcare: facilitating the transition to patient-driven interoperability. Comput Struct Biotechnol J 16:224–230
34. Kshetri N (2017) Blockchain's roles in strengthening cybersecurity and protecting privacy. Telecommun Policy 41(10):1027–1038
35. Kumar NM, Mallick PK (2018) Blockchain technology for security issues and challenges in IoT. Procedia Comput Sci 132:1815–1823
36. Li X, Ibrahim MH, Kumari S, Sangaiah AK, Gupta V, Choo K-KR (2017) Anonymous mutual authentication and key agreement scheme for wearable sensors in wireless body area networks. Comput Netw 129:429–443
37. Strielkina A, Kharchenko V, Uzun D (2018) Availability models for healthcare IoT systems: classification and research considering attacks on vulnerabilities. In: 2018 IEEE 9th international conference on dependable systems, services and technologies (DESSERT). IEEE, pp 58–62
38. McLeod A, Dolezel D (2018) Cyber-analytics: modeling factors associated with healthcare data breaches. Decis Support Syst 108:57–68
Hybrid Beamforming for Massive MIMO Antennas Under 6 GHz Mid-Band
Kavita Bhagat and Ashish Suri
Abstract Mid-bands and massive MIMO together have become a game-changer for 5G technology in wireless systems. The mid-band is considered a sweet spot that offers opportunities for handling new operations covering several miles, delivering large throughput and spectral efficiency together with massive MIMO systems. Operating massive MIMO systems at mid-band frequencies helps to boost speed, capacity, and coverage area. Hybrid beamforming is an efficient solution for designing our model at 6 GHz, resulting in a smaller number of RF chains while maximizing throughput and antenna gain. In our experimental model, we use a massive multiple-input multiple-output (MIMO) OFDM ray launching design that splits the precoding between the digital baseband and the radio frequency analog components at the transceiver site. A smaller number of RF chains reduces system complexity and is time efficient. The experiment is carried out using the scattering and MIMO propagation channel models, and the simulated link uses OFDM with 16-, 64-, and 256-QAM modulation.

Keywords Mid band · Massive MIMO · TDD · Hybrid beamforming · OFDM · QAM
K. Bhagat (B) · A. Suri
School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University of Katra, Katra, Jammu and Kashmir 182320, India
e-mail: [email protected]
A. Suri
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_73

1 Introduction

The ongoing evolution of wireless technologies has made them an indispensable part of our everyday life. Present systems use RF signals, that is, electromagnetic (EM) waves, to forward data from the source to the destination. 5G redefines the network with new global wireless standards for the fastest communications. The use of macrocells forms the foundation of 5G technology by serving thousands of mobile users simultaneously over longer distances. The higher frequencies of 5G technology will eventually allow even more technologies to connect on a massive scale. The 5G frequency spectrum is divided into three bands: low band, mid-band, and millimeter wave. The low band offers a capacity similar to that of advanced 4G, with an air latency of 8–10 ms. The millimeter wave band is the fastest of the three, supporting high data transfer speeds under line-of-sight (LOS) conditions [1]. The mid-band is the sweet spot that overcomes the shortcomings of both the low band and the millimeter-wave band, because it balances speed, capacity, and coverage over longer distances under both LOS and NLOS conditions. It offers more adaptability in 5G networks because of its long reach and its ability to cover large areas and penetrate obstacles. The major challenge in using mid-band frequencies is path loss [2], which can be compensated for by using massive MIMO antennas at an appropriate scale. To increase data transfer speeds, massive MIMO is a key solution for transferring multiple data streams in parallel, and this can be achieved through beamforming. Hybrid beamforming is an efficient solution in which beamforming is performed in both the radio frequency and the baseband domains, resulting in fewer RF chains than transmit array elements [3–5]. The technology combines analog beamforming and digital beamforming in the RF and baseband domains, and smartly forms the patterns transmitted from a large antenna array. To send more than one data stream over a propagation channel, we apply precoding weights at the transmitter and combining weights at the receiver to the channel impulse response matrix [6]. Each user's data stream can then be recovered independently at the receiver.
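The interplay between path loss and array size mentioned above can be made concrete with a short calculation. The sketch below uses the standard free-space path loss formula and the ideal coherent gain of an N-element array; it is an illustration under simplifying assumptions (free space, perfect phase alignment, no hardware losses) and not the channel model used in this work. The function names are our own.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def ideal_array_gain_db(num_elements: int) -> float:
    """Ideal coherent beamforming gain of N identical elements, 10*log10(N)."""
    return 10.0 * math.log10(num_elements)


if __name__ == "__main__":
    loss = fspl_db(1000.0, 6e9)        # 1 km link at 6 GHz
    gain = ideal_array_gain_db(64)     # 64-element array
    print(f"FSPL: {loss:.1f} dB, array gain: {gain:.1f} dB")
```

Under these assumptions a 1 km link at 6 GHz loses about 108 dB, and a 64-element array recovers about 18 dB of it. Each doubling of the array size adds roughly 3 dB, which is why large arrays are attractive at mid-band frequencies.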
The SNR and complexity can be improved with hybrid beamforming in a multiuser MIMO setup [7]. Additionally, it shows the formulation of the transmit-end channel matrix and the use of ray launching to trace rays from every single user. For wideband OFDM systems, the analog beamformer weights are single complex weights averaged across the subcarriers [8]. The hybrid beamforming technique balances antenna gain, cost, and power consumption to maximize spectral efficiency. This technology can reduce the computational time and complexity of massive MIMO systems even as the numbers of users and base stations increase, because it reduces the number of RF chains [9]. However, it is more challenging to design the transmit vectors with which base station transmitters communicate [10–13] with several receivers simultaneously using the same TR resources. Allowing multiple base stations to serve many users therefore increases the number of data streams available to every user in a single cell and supports the smooth flow of the network.
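The split between analog and digital weights is commonly written as a per-subcarrier signal model. The formulation below is a standard textbook-style model given as an assumption for illustration, not necessarily the exact model used in this work; the symbols $N_t$, $N_r$, $N_{\mathrm{RF}}$, $N_s$, and $K$ are introduced here.

```latex
\mathbf{y}[k] = \mathbf{W}_{\mathrm{BB}}^{H}[k]\,\mathbf{W}_{\mathrm{RF}}^{H}
\bigl(\mathbf{H}[k]\,\mathbf{F}_{\mathrm{RF}}\,\mathbf{F}_{\mathrm{BB}}[k]\,\mathbf{s}[k] + \mathbf{n}[k]\bigr),
\qquad k = 1,\dots,K
```

Here $\mathbf{H}[k] \in \mathbb{C}^{N_r \times N_t}$ is the channel on subcarrier $k$, $\mathbf{s}[k] \in \mathbb{C}^{N_s}$ carries the data streams, and $\mathbf{n}[k]$ is noise. The analog precoder $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_t \times N_{\mathrm{RF}}}$ and combiner $\mathbf{W}_{\mathrm{RF}}$ are built from phase shifters, so their entries have constant modulus and they are shared by all $K$ subcarriers, whereas the digital matrices $\mathbf{F}_{\mathrm{BB}}[k] \in \mathbb{C}^{N_{\mathrm{RF}} \times N_s}$ and $\mathbf{W}_{\mathrm{BB}}[k]$ can differ per subcarrier. The saving in hardware comes from $N_s \le N_{\mathrm{RF}} \ll N_t$.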
2 Literature Review
The literature shows that hybrid beamforming combined with massive MIMO systems has become a hot topic in this research area. Article [10]
Hybrid Beamforming for Massive MIMO Antennas Under 6 GHz Mid-Band
proposed hybrid beamforming with selection (HBWS), which reduces cost and enhances system performance by adapting to channel statistics, enabling better user separability and beamforming gains. Another work [14], on interference mitigation in MIMO radar, reduces the dimensionality of the covariance matrix to improve the suppression of jamming and interference; the complexity of radar signal processing can be decreased using space–time adaptive processing. In [15], the HBF-PDVG (predefined virtual grouping) algorithm is adopted, which reduces system complexity and feedback overhead when beamforming under hardware constraints. A hybrid precoder design based on a joint algorithm is proposed for cross-polarization to enhance overall performance [16]. The authors of [17] derived a hybrid energy beamforming (EBF) architecture model under phase shifter impairments and proposed an optimal least-squares estimator for the channel between the energy source and the energy harvesting (EH) user, maximizing the average energy. An optimal selection framework [18] conserves energy by deactivating parts of the beamformer structure, which keeps typical power consumption low. Different methods have been adopted for tracing rays. For example, [19] studied MIMO indoor propagation using the 3D shooting and bouncing rays (SBR) ray-tracing technique for 802.11n Wi-Fi in the 2.4 and 5 GHz bands. Both the imaging method and the ray launching method were used for the channel calculations, with rays launched from the base station. That study uses two dual-band MIMO antennas in a 2 × 3 configuration to predict the signal, which should be reliably receivable at both nearby and distant locations and distributed evenly among all users. The DeepMIMO dataset was designed for machine learning applications of hybrid beamforming with massive MIMO and millimeter-wave channels, and it covers both indoor and outdoor environments [20].
The dataset parameters include the number of active base stations, the active users, the antenna spacing, the system bandwidth, the OFDM parameters, and the channel paths. The work shows how the dataset can be applied to deep learning for massive MIMO and millimeter-wave beam prediction. Ref. [21] presents a beamforming neural network for deep learning in millimeter-wave applications, optimizing the beamformer design under channel state information constraints. Ref. [22] analyzes the downlink performance of massive MIMO hybrid beamforming for millimeter-wave applications and reports better throughput and spectral efficiency using the OMP algorithm. In this situation, spectral efficiency must be improved with a minimum number of RF chains while the channels are still estimated accurately [23]. Antenna gains are increased to counter high path loss. The author of [24] proposed low-complexity hybrid beamforming for downlink millimeter-wave applications that operates on separate analog and digital precoding stages and reports higher transmission rates than analog-only beamforming solutions. Ref. [25] combines large-dimensional analog precoding with small-dimensional digital precoding to reduce complexity, hardware cost, and hardware constraints. It explains how this combination can be arranged, based on average CSI, for designs with better SNR in massive MIMO systems. Ref. [26] considers a new system model that incorporates general transceiver hardware impairments at both the base stations (equipped with large antenna arrays) and the single-antenna user equipment (UEs). In Ref. [27], a
K. Bhagat and A. Suri
survey was conducted of macro-cell millimeter-wave systems, discussing the performance of MIMO systems and showing that scattering models give better multidimensional accuracy in outdoor scenarios. The rest of this paper is organized as follows. Section 3 presents the proposed model and its mathematical representation. Section 4 describes the measurement setup, Section 5 presents the results and discussion, and Section 6 gives the conclusion and future scope.
3 Proposed Model
Starting with basic beamforming, an analog beamformer produces a single beam per antenna array, which makes it difficult to form multiple beams. A digital beamformer needs a dedicated baseband channel and transceiver for every antenna, which increases cost, power consumption, and system complexity. Hybrid beamforming is a good choice for overcoming these problems [28]. Combining beamforming in the RF and baseband domains shapes the patterns transmitted from a large antenna array. In a hybrid beamforming system, transmission proceeds as in other beamforming systems. To send more than one data stream over a propagation channel, the transmission is expressed with precoding weights at the transmitter and combining weights at the receiver, applied over the channel impulse response matrix. Each data stream can then be recovered independently at the receiver. For signal propagation, the model applies the shooting and bouncing rays (SBR) ray-tracing method, which estimates the paths of the rays launched from the transmitter. Figure 1 shows the transmission and reception of the signal in the MU-MIMO OFDM system.
Fig. 1 Block diagram of data transmission in MU-MIMO OFDM system [23]
In the transmitter section, the data of one or more users are channel encoded with convolutional codes and sent through the transmitter antenna array. The channel-encoded bits are mapped to complex quadrature amplitude modulation (QAM) symbols of different orders (2, 16, 64, 256), which produces the mapped symbols for every user [29]. The QAM symbols of each user are then distributed into multiple data streams for transmission. In the next phase, the output is passed to digital baseband precoding, which assigns precoding weights to the data streams. In our proposed model, these weights are computed using hybrid beamforming with the Orthogonal Matching Pursuit (OMP) algorithm and the Joint Spatial Division Multiplexing (JSDM) algorithm for single users and multiple users. JSDM is used because it performs better for the maximum array response vector. It also allows many base stations to transmit. Channel sounding and estimation are performed at both the transmitter and the receiver, which helps reduce the number of radio frequency chains [30]. The base station sounds the channel by transmitting a reference signal that the mobile station receiver can easily detect and use to estimate the channel. The mobile stations then send this information back to the base station so that it can calculate the precoding required for the upcoming data transmission [31]. After the precoding weights are assigned, the MU-MIMO system combines these weights at the receiver, resulting in complex weights. The digital signal is then modulated using orthogonal frequency-division multiplexing (OFDM) with pilot mapping, followed by radio frequency analog beamforming for every transmitter antenna.
The modulated signal is then fed into a scattering MU-MIMO channel, and demodulation is performed at the destination to recover the original signal [32]. Table 1 lists the parameters assumed for our experiments, which consider different numbers of users, different numbers of data streams allotted to those users, and the OFDM system settings.
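As a rough illustration of how the OFDM settings in Table 1 fit together, the sketch below derives the subcarrier spacing, the OFDM symbol duration, and the number of coded bits carried per OFDM symbol of one data stream. It assumes that the sampling rate of 100 × 10^6 is in samples per second and equals the FFT sampling rate, and that the 234 carriers all carry data; the function name `ofdm_numerology` is illustrative and not part of the original model.

```python
def ofdm_numerology(sample_rate_hz, fft_len, cp_len, data_carriers, bits_per_subcarrier):
    """Derive basic OFDM timing and payload figures from Table 1 style settings.

    Assumption: the sampling rate equals the FFT sampling rate and every
    listed carrier carries data.
    """
    subcarrier_spacing_hz = sample_rate_hz / fft_len
    # One OFDM symbol lasts the FFT window plus the cyclic prefix.
    symbol_duration_s = (fft_len + cp_len) / sample_rate_hz
    # Coded bits carried by one OFDM symbol of one data stream.
    coded_bits_per_symbol = data_carriers * bits_per_subcarrier
    return {
        "subcarrier_spacing_hz": subcarrier_spacing_hz,
        "symbol_duration_s": symbol_duration_s,
        "coded_bits_per_symbol": coded_bits_per_symbol,
        "coded_bit_rate_bps": coded_bits_per_symbol / symbol_duration_s,
    }


# Table 1 values: 100e6 sampling rate, FFT length 256, cyclic prefix 64,
# 234 carriers, and 2, 4, 6, or 8 bits per subcarrier.
for bits in (2, 4, 6, 8):
    print(bits, ofdm_numerology(100e6, 256, 64, 234, bits))
```

The coded bit rate here is per data stream and before the rate-0.33 channel code and tail bits are accounted for, so the usable information rate would be lower.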
3.1 Mathematical Model Representation
The channel matrix of the MIMO system is shown below, where H is the channel impulse response.

$$
H = \begin{bmatrix}
h_{11} & h_{21} & \cdots & h_{31} \\
h_{12} & h_{22} & \cdots & h_{32} \\
\vdots & \vdots & & \vdots \\
h_{1M} & h_{2M} & \cdots & h_{3M}
\end{bmatrix}
$$
We assume downlink transmission from the first base station, which acts as the transmitter, to the mobile user. In each transmitter section, the baseband digital
Table 1 Parameters for proposed model

| Serial no | Parameter | Value |
|---|---|---|
| 1 | Users | 4, 8 |
| 2 | Data streams per user | 3, 2, 1, 2; 3, 2, 1, 2, 1, 2, 2, 3 |
| 3 | Base stations | 64 |
| 4 | Receiving antennas | 12, 8, 4, 8 |
| 5 | Bits per subcarrier | 2, 4, 6, 8 |
| 6 | OFDM data symbols | 8, 10 |
| 7 | Position of MS | 180 |
| 8 | Position of BS | 90 |
| 9 | Maximum range of MS | 250 m, 500 m, 1 km |
| 10 | Carrier frequency | 6 GHz |
| 11 | Sampling rate | 100 × 10^6 |
| 12 | Channel type | Scattering, MIMO |
| 13 | Noise figure | 4, 6, 8 |
| 14 | Rays | 500 |
| 15 | FFT length | 256 |
| 16 | Cyclic prefix | 64 |
| 17 | Carriers | 234 |
| 18 | Carrier indices | 1:7, 129, 256 |
| 19 | Code rate | 0.33 |
| 20 | Tail bits | 6 |
| 21 | Modulation schemes | 2-, 16-, 64-, 256-QAM |
precoder $F_{BB}$ processes the $N_S$ data streams to produce its outputs. These outputs are converted to RF through the analog precoder $F_{RF}$, which feeds the $N_T$ base station antenna elements for propagation over the channel. At the receiver, the analog beamformer $W_{RF}$ is combined with the RF chains from the user's antennas to create the output. Mathematically, this can be written as:

$$F = F_{BB} F_{RF} \quad \text{and} \quad W = W_{RF} W_{BB} \qquad (1)$$

where $F_{BB}$ is an $N_S \times N_T^{RF}$ matrix, $F_{RF}$ is an $N_T^{RF} \times N_T$ matrix, $W_{BB}$ is an $N_R^{RF} \times N_S$ matrix, and $W_{RF}$ is an $N_R \times N_R^{RF}$ matrix. The precoding weights at the transmitter antennas and the combining weights at the receiver antennas are calculated as follows. The precoding weight matrix, of size $N_S \times N_T$, is

$$F = F_{BB} F_{RF} \qquad (2)$$
where $F_{RF}$ is the analog precoder, $F_{BB}$ is the digital precoder, $N_S$ is the number of signal streams, and $N_T$ is the number of transmitter antennas. The combining weight matrix, of size $N_R \times N_S$, is written as:

$$W = W_{RF} W_{BB} \qquad (3)$$
where $W_{RF}$ is the analog combiner, $W_{BB}$ is the digital combiner, $N_R$ is the number of receiver antennas, and $N_S$ is the number of signal streams. The downlink signal for user $k$ is calculated as:

$$y_k = H_k w_k s_k + H_k \sum_{\substack{n=1 \\ n \neq k}}^{K} x_n + n_k \qquad (4)$$
where $K$ is the number of users, $k$ is the user index, $x_k$ is the signal allotted to user $k$, $H_k$ is the channel from the transmitter to user $k$, and $n_k$ is the noise. With the precoding weights applied,

$$x_k = w_k s_k \qquad (5)$$

Then the downlink signal for user $k$ is:

$$y_k = H_k w_k s_k + H_k \sum_{\substack{n=1 \\ n \neq k}}^{K} x_n + n_k \qquad (6)$$
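The relations above can be illustrated with a small numerical sketch. It first checks that a baseband precoder of size $N_S \times N_T^{RF}$ multiplied by an analog precoder of size $N_T^{RF} \times N_T$ yields an $N_S \times N_T$ matrix, and then evaluates the downlink signal of one user as a desired term plus multi-user interference plus noise, with $x_n = w_n s_n$. All numerical values are toy assumptions, each user is assumed to have a single receive antenna so that $H_k$ is a row vector, and the helper names are hypothetical rather than taken from the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dot(row, col):
    """Inner product of a channel row and a weight vector (no conjugation)."""
    return sum(x * y for x, y in zip(row, col))


def downlink_signal(H, W, s, noise, k):
    """Return (y_k, desired, interference) for single-antenna user k.

    H[k] is the 1 x Nt channel row of user k, W[n] holds the Nt precoding
    weights of user n, and s[n] is the symbol of user n, so x_n = w_n * s_n.
    """
    desired = dot(H[k], W[k]) * s[k]
    interference = sum(dot(H[k], W[n]) * s[n]
                       for n in range(len(s)) if n != k)
    return desired + interference + noise[k], desired, interference


# Shape check for F = F_BB F_RF with toy sizes Ns = 2, NtRF = 3, Nt = 4.
F_bb = [[1, 0, 1], [0, 1, 1]]
F_rf = [[1, 1j, -1, -1j], [1, -1, 1, -1], [1, 1, 1, 1]]
F = matmul(F_bb, F_rf)
print(len(F), "x", len(F[0]))

# Two users, two transmit antennas, noise-free toy channel.
H = [[1 + 0j, 0.5j], [0.5 + 0j, 1 + 0j]]
W = [[1, 0], [0, 1]]
s = [1 + 1j, 1 - 1j]
y0, desired0, interference0 = downlink_signal(H, W, s, [0j, 0j], k=0)
print(y0, desired0, interference0)
```

In this toy case the precoding weights do not cancel the cross-channel term, so user 0 sees interference from user 1; a practical precoder would choose the weights to suppress that term.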
4 Measurement Units
Our channel measurement results are obtained for an indoor scenario in MATLAB using the Communications Toolbox and the Phased Array System Toolbox. The Communications Toolbox plays a major role in designing the model and provides algorithms that make it easier to analyze the system and obtain outputs. The Phased Array System Toolbox handles the positioning of the transmitter and receiver antennas when they are deployed in large numbers. The performance of the software tool is analyzed first to confirm that it gives accurate results when used to characterize indoor radio channels, and the MIMO hybrid beamforming is designed with the same aim of accuracy. The implementation part of the work is to plan a suitable environment for conducting measurements with the channel sounder. The environment considered in our model is an indoor environment operating in the 6 GHz frequency range, serving broadband applications and the low band for enhanced network capacity. The equipment used for channel measurement includes a channel sounder, antennas such as
Fig. 2 Channel sounder with transmitter unit and receiver unit [30]
an isotropic antenna, which radiates power equally in all directions, mobile users at ranges of 250 m, 500 m, and 1 km from the base station, an uninterruptible power supply, and a laptop for recording measurements. The number of rays is set to 500. The modulation scheme is OFDM with 10 or 8 data symbols. There are four or eight users, who are assigned multiple data streams in the order 3, 2, 1, 2, 2, 2, 1, 3. The next step is to calibrate the channel sounder equipment for the spatial multiplexing system shown in Fig. 2, which consists of two main units: a transmitter and a receiver. The function of the channel sounder is to apply the maximum power to the signal in the desired direction [27]. A preamble signal is generated for sounding all channels and is sent from the transmitter through the selected MIMO system so that the channel can be processed at the receiver (Rx) section. The receiver section then performs pre-amplification and OFDM demodulation for all established links. The experiment is carried out by varying parameters such as the transmission distance, propagation channel model, number of data symbols, and noise figure. We evaluate the simulation results using error vector magnitude (EVM) values, beam patterns, and ray patterns. In what follows, different cases are considered that vary the number of users, data symbols, range from the base station, noise figure, propagation channel model, and modulation scheme.
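To give a feel for the path loss involved at the ranges considered here, the following sketch evaluates the standard free-space path loss formula at 6 GHz for 250 m, 500 m, and 1 km. Free-space propagation is an assumption made only for illustration; the scattering and ray-traced channels used in this work will generally show different losses.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss 20*log10(4*pi*d*f/c) in dB (assumed model)."""
    return 20 * math.log10(4 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


# Ranges and carrier frequency taken from Table 1.
for d in (250, 500, 1000):
    print(f"{d} m: {free_space_path_loss_db(d, 6e9):.2f} dB")
```

Each doubling of the distance adds about 6 dB of loss under this model, which is the kind of loss that the array gain of a massive MIMO antenna is meant to offset.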
5 Results and Discussions
The multi-user massive MIMO (MU-mMIMO) communication link between the BS and the UEs is validated using a scattering-based MIMO spatial channel model with a “single bounce ray tracing” approximation. The users are placed randomly at different locations. Different cases are studied, and experiments are performed and analyzed on that basis.
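A single-bounce scattering channel of the kind named above can be sketched as follows: every ray travels from a transmit element to one scatterer and then to the receiver, and the channel coefficient is the sum over scatterers of a complex gain whose phase depends on the total path length. The amplitude law (free-space-like decay over the total path length with a unit reflection coefficient), the two-dimensional geometry, and the scatterer positions are all assumptions made for this illustration and are not taken from the simulation used in this work.

```python
import cmath
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def single_bounce_channel(tx_elements, rx_position, scatterers, carrier_hz):
    """Channel coefficients from each transmit element to one receive antenna.

    Assumed model: each path bounces once off a scatterer, has amplitude
    wavelength / (4*pi*path_length) and phase -2*pi*path_length/wavelength.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    channel = []
    for tx in tx_elements:
        h = 0j
        for sc in scatterers:
            path_len = math.dist(tx, sc) + math.dist(sc, rx_position)
            gain = wavelength / (4 * math.pi * path_len)
            h += gain * cmath.exp(-2j * math.pi * path_len / wavelength)
        channel.append(h)
    return channel


carrier_hz = 6e9
wavelength = SPEED_OF_LIGHT / carrier_hz
# Eight-element array with half-wavelength spacing along the y axis (toy size).
tx_elements = [(0.0, n * wavelength / 2) for n in range(8)]
rng = random.Random(0)  # fixed seed so the scatterer layout is repeatable
scatterers = [(rng.uniform(5, 50), rng.uniform(-20, 20)) for _ in range(10)]
h = single_bounce_channel(tx_elements, (60.0, 0.0), scatterers, carrier_hz)
print([round(abs(x), 6) for x in h])
```

Repeating the calculation for every user position gives one channel row per user, which is the quantity that the precoding weights are designed against.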
Case 1 In this first case, we change the distance between the base station and the users and then increase the number of users to check the impact on the bits received by the users, the RMS EVM values, and the antenna patterns. Here, the number of data symbols is set to 10 with a noise figure of 10 dB. The number of rays is set to 500 and remains fixed. The propagation channel model selected is MIMO, as it performs better and is more efficient than the scattering channel model. Figure 3 clearly shows that increasing the distance has no effect on the EVM values or on the received bits. They remain constant when the distance is increased from 250 to 1000 m and beyond. It is observed that for users with a high number of data streams the RMS EVM value remains low, and for users with a single data stream the RMS EVM value is high. This value increases as the number of base stations is decreased for users with multiple data streams. In other words, increasing the distance has no impact on the bits or the EVM values (Fig. 3).
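For readers unfamiliar with the metric, the RMS EVM used throughout these cases can be computed as sketched below. The sketch assumes the common definition in which the error power is normalised by the average power of the reference constellation points; the exact normalisation used by the simulation tool is not stated in this section.

```python
import math


def rms_evm_percent(received, reference):
    """RMS EVM in percent, normalised by the average reference power (assumed)."""
    if len(received) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(error_power / reference_power)


# Toy example: four ideal constellation points, each received with the same
# small offset of 0.1 on the in-phase axis.
reference = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [s + 0.1 for s in reference]
print(round(rms_evm_percent(received, reference), 3))
```

A lower value means the received points sit closer to their ideal positions, which is why users with low RMS EVM are the ones whose streams are recovered most reliably.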
Fig. 3 Comparison of EVM values using 2-QAM modulation by varying distances
Fig. 4 Comparison of all modulations using four users by varying distances
Fig. 5 Comparison of EVM values using the 16-QAM, 64-QAM scheme by considering eight users
In Case 2, we increase the number of users from four to eight and keep all other parameters the same as in Case 1. We find very little effect on the error vector magnitude values, on which the performance relies. As Fig. 5 shows, the EVM values rise slightly as the number of users increases, the bit error rate remains low, and the output bits received are the same for the first four users. The EVM value increases as the number of base stations is decreased for users with multiple data streams. The number of bits increases slightly for multiple data streams, as shown in Fig. 5. The transfer bit rate for multiple data streams remains the same as for 64-QAM. The root mean square value is minimized when the number of base station antennas is increased. More data streams lower the root mean square value of every single data stream.
Case 3 In the third case, we change the number of data symbols from 10 to 8 and compare the results with the outputs of Fig. 4. Varying the data symbols affects both the output bits and the error vector magnitude values. The output bits of this case are compared with the output bits of Case 2. The EVM values are low when the number of data symbols is low and increase as the number of data symbols increases. To increase the output bits, the number of data symbols is set to the higher value, as shown in Fig. 6. Comparing 64-QAM and 256-QAM for eight users, we find that the RMS EVM values of the two modulations are roughly the same. The EVM value is very high for two data streams in 64-QAM.
Case 4 As seen in Figs. 7, 8, 9, and 10, the noise figure is varied from 4 to 6 and then from 6 to 8, and there is a prominent difference in the RMS EVM values, yet the received bits remain constant. There is again a direct relation between the noise figure and the EVM values: the EVM decreases as the noise figure decreases.
Case 5 Fig. 11 shows that, when the propagation channel model is changed from the scattering model to the MIMO channel model, the RMS EVM value is high with the scattering channel,
Fig. 6 Comparison of values of 64-QAM and 256-QAM schemes by changing its data symbols at eight users
Fig. 7 Comparison of EVM values at different noise levels using the 2-QAM scheme for four users
whereas it is lower with the MIMO channel. There is no effect on the bit error rate, which remains the same for both propagation channels. Figures 12 and 13 show the 3D antenna radiation patterns for the MIMO and scattering channel models with 256-QAM modulation. More lobes are formed with the MIMO model because it works with multiple antennas; by comparison, fewer lobes are formed with the scattering channel model, as seen in Fig. 13. The lobe on the right side of the diagram represents the data streams of the users. The pointer shows that hybrid beamforming is achieved and that the data streams for every user are separated. It is quite clear from the diagram that the radiated beam pattern grows sharper as the number of antennas at the base station is increased, which increases the throughput of the signal.
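The sharpening of the beam with a growing number of antennas can be reproduced with a simple array-factor calculation. The sketch below assumes a uniform linear array with half-wavelength spacing, equal weights, and a broadside beam, which is a simplification of the hybrid-beamformed array studied here; it scans outward from broadside until the power drops to half and reports the resulting half-power beamwidth.

```python
import cmath
import math


def array_factor(n_elems, theta_rad, spacing_wl=0.5):
    """Normalised array factor magnitude of a uniform linear array (assumed geometry)."""
    psi = 2 * math.pi * spacing_wl * math.sin(theta_rad)
    return abs(sum(cmath.exp(1j * psi * n) for n in range(n_elems))) / n_elems


def half_power_beamwidth_deg(n_elems, step_deg=0.01):
    """Numerically estimate the broadside half-power beamwidth in degrees."""
    theta = 0.0
    # Walk away from broadside until the power falls below one half.
    while theta < 90.0 and array_factor(n_elems, math.radians(theta)) ** 2 >= 0.5:
        theta += step_deg
    return 2 * theta


for n in (8, 16, 32, 64):
    print(n, round(half_power_beamwidth_deg(n), 2))
```

The beamwidth shrinks roughly in inverse proportion to the number of elements, so a 64-element array produces a far narrower main lobe than an 8-element one, which is consistent with the sharper patterns described above.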
Fig. 8 Comparison of EVM values at different noise levels using the 16-QAM scheme for four users
Fig. 9 Comparison of EVM values at different noise levels using the 64-QAM scheme for four users
The constellation diagrams for eight users are shown in Figs. 14, 15, 16, and 17. They reveal the point-tracing blocks of every data stream for the higher-order modulation schemes in the working model. The ray-tracing blocks show that the rate of retrieved streams is high for users with fewer data streams. The positions of the points indicate the stream recovery: blocks in which the points lie close together have a high rate of retrieved streams, for users with multiple data streams, whereas blocks in which the points are spread further apart have a lower rate of retrieved streams, for users with single data streams. More recovered data streams for users with multiple data streams result in a lower SNR, and fewer retrieved data streams for single-stream users result in a higher SNR.
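The link between noise figure, SNR, and EVM discussed in the cases above can be made concrete with two standard relations. The sketch assumes that the EVM is dominated by additive noise and normalised to the average reference power, so that SNR in dB is approximately −20 log10(EVM), and it assumes a noise bandwidth equal to the 100 × 10^6 sampling rate at a reference temperature of 290 K. Neither assumption is stated in the original text.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def evm_to_snr_db(evm_percent):
    """Approximate SNR in dB from RMS EVM, assuming noise-dominated EVM."""
    return -20.0 * math.log10(evm_percent / 100.0)


def noise_floor_dbm(bandwidth_hz, noise_figure_db, temperature_k=290.0):
    """Receiver noise floor: thermal noise k*T*B in dBm plus the noise figure."""
    thermal_w = BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(thermal_w / 1e-3) + noise_figure_db


# Noise figures from Table 1, bandwidth assumed equal to the sampling rate.
for nf in (4, 6, 8):
    print(nf, round(noise_floor_dbm(100e6, nf), 2))
print(round(evm_to_snr_db(10.0), 2))
```

Under these assumptions each 2 dB increase in noise figure raises the noise floor by 2 dB, which lowers the SNR by the same amount and raises the EVM, matching the direct relation reported in Case 4.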
Fig. 10 Comparison of EVM values at different noise levels using the 256-QAM scheme for four users
Fig. 11 Comparison using scattering and MIMO channel model of 256-QAM scheme
6 Conclusion and Future Work
We have analyzed hybrid beamforming with the ray-tracing method, in which each user can use multiple data streams. The spectral efficiency improves significantly with a high number of data streams. For users with a high number of data streams the RMS EVM values are low, and for users with a single data stream the RMS EVM value is comparatively high. The possibility of errors is reduced, and the bit errors can be assessed by comparing the bits actually transmitted with the bits received at the decoder for each user. The number of antennas required decreases only if the users transmit their information using multiple data streams. If the users transmit their data using a single data stream, the antenna requirement increases, which results in more system complexity. For higher throughput and a lower bit error rate, multiple data streams
Fig. 12 Antenna pattern design using the MIMO channel model at 256-QAM
Fig. 13 Antenna pattern design using the scattering channel model at 256-QAM
Fig. 14 2-QAM ray-tracing blocks
Fig. 15 16-QAM ray-tracing blocks
Fig. 16 64-QAM ray-tracing blocks
Fig. 17 256-QAM ray-tracing blocks
are more advantageous to the users for every higher-order modulation scheme, as seen from the diagrams. In the future, this experiment can be extended to study other parameter settings. The environment can be changed from indoor to outdoor, and different environmental conditions can then be compared. Additionally, we will compare the ray-tracing results across different channel models so that the analysis can be carried out precisely and accurately. Complexity is reduced by using the minimum number of RF chains in the uplink conversions. The MU-MIMO hybrid beamforming should be designed with the aim of reducing the RMS EVM values for users with single data streams.
References
1. Larsson EG, Edfors O, Tufvesson F, Marzetta TL (2014) Massive MIMO for next generation wireless systems
2. Gupta A, Jha RK (2015) A survey of 5G network: architecture and emerging technologies
3. Ahmed I, Khammari H, Shahid A, Musa A, Kim KS, De Poorter E, Moerman I (2018) A survey on hybrid beamforming techniques in 5G: architecture and system model perspectives
4. Lizarraga EM, Maggio GN, Dowhuszko AA (2019) Hybrid beamforming algorithm using reinforcement learning for millimeter-wave wireless systems
5. Zou Y, Rave W, Fettweis G (2015) Analog beamsteering for flexible hybrid beamforming design in mm-wave communications
6. Palacios J, Gonzalez-Prelcic N, Mosquera C, Shimizu T, Wang C-H (2021) Hybrid beamforming design for massive MIMO LEO satellite communication
7. Choi J, Lee G, Evans BL (2019) Two-stage analog combining in hybrid beamforming systems with low-resolution ADCs
8. Lee JH, Kim MJ, Ko YC (2017) Based hybrid beamforming design in MIMO interference channel
9. Hefnawi M (2019) Hybrid beamforming for millimeter-wave heterogeneous networks
10. Ratnam VV, Molisch AF, Bursalioglu OY, Papadopoulos HC (2018) Hybrid beamforming with selection for multi-user massive MIMO systems
11. Chiang H-L, Rave W, Kadur T, Fettweis G (2018) Hybrid beamforming based on implicit channel state information for millimeter-wave links
12. Yoo J, Sung W, Kim I-K (2021) 2D-OPC subarray structure for efficient hybrid beamforming over sparse mmWave channels
13. Zhang D, Wang Y, Xiang W (2017) Leakage-based hybrid beamforming design for downlink multiuser mmWave MIMO systems
14. Chahrour H, Rajan S, Dansereau R, Balaj B (2018) Hybrid beamforming for interference mitigation in MIMO radar. IEEE
15. Aldubaikhy K, Wu W, Shen X (2018) HBF-PDVG: hybrid beamforming and user selection for UL MU-MIMO mmWave systems
16. Satyanarayana K, Ivanescu T, El-Hajjar M, Kuo P-H, Mourad A, Hanzo L (2018) Hybrid beamforming design for dual-polarised millimeter wave MIMO systems
17. Mishra D, Johansson H (2020) Optimal channel estimation for hybrid energy beamforming under phase shifter impairments
18. Vlachos E, Thompson J, Kaushik A, Masouros C (2020) Radio-frequency chain selection for energy and spectral efficiency maximization in hybrid beamforming under hardware imperfections
19. Sohrabi F, Yu W (2017) Hybrid analog and digital beamforming for mmWave OFDM large-scale antenna arrays
20. ELE Times Bureau (2016) Hybrid-beamforming design for 5G wireless communications. Published December 12, 2016
21. Dama YAS, Abd-Alhameed RA, Salazar-Quiñonez F, Zhou D, Jones SMR, Gao S (2011) MIMO indoor propagation prediction using 3D shoot-and-bounce ray (SBR) tracing technique for 2.4 GHz and 5 GHz
22. Alkhateeb A (2019) DeepMIMO: a generic deep learning dataset for millimeter-wave and massive MIMO applications
23. Dilli R (2021) Performance analysis of multi-user massive MIMO hybrid beamforming systems at millimeter-wave frequency bands
24. Jiang X, Kaltenberger F (2017) Channel reciprocity calibration in TDD hybrid beamforming massive MIMO systems
25. Alkhateeb A, Leus G, Heath R (2015) Limited feedback hybrid precoding for multi-user millimeter-wave systems. IEEE Trans Wireless Commun 14(11):6481–6494
26. Liu A, Lau V (2014) Phase only RF precoding for massive MIMO systems with limited RF chains. IEEE Trans Signal Process 62(17):4505–4515
27. Björnson E, Hoydis J, Kountouris M, Debbah M (2014) Massive MIMO systems with non-ideal hardware: energy efficiency, estimation, and capacity limits. IEEE Trans Inf Theory 60(11):7112–7139
28. Eisenbeis J, Pfaff J, Karg C, Kowalewski J, Li Y, Pauli M, Zwick T (2020) Beam pattern optimization method for subarray-based hybrid beamforming systems
29. Alkhateeb A, El Ayach O, Leus G, Heath RW (2014) Channel estimation and hybrid precoding for millimeter wave cellular systems. IEEE J Selected Topics Signal Process 8(5):831–846
30. open example (‘phasedcomm./MassiveMIMOHybridBeamformingExample’)
31. Zhu Y, Zhang Q, Yang T (2018) Low-complexity hybrid precoding with dynamic beam assignment in mmWave OFDM systems. IEEE Trans Vehicular Technol 67(4):3685–3689
32. Foged LJ, Scialacqua L, Saccardi F, Gross N, Scannavini A (2017) Over the air calibration of massive MIMO TDD arrays for 5G applications. In: 2017 IEEE international symposium on antennas and propagation & USNC/URSI national radio science meeting, pp 1423–1424, San Diego, CA, USA
33. Gonzalez J (2021) Hybrid beamforming strategies for secure multicell multiuser mmWave MIMO communication
34. Eisenbeis J, Tingulstad M, Kern N et al (2020) MIMO communication measurements in small cell scenarios at 28 GHz. IEEE Trans Antennas Propag
Multi-Class Detection of Skin Disease: Detection Using HOG and CNN Hybrid Feature Extraction
K. Babna, Arun T. Nair, and K. S. Haritha
Abstract It is essential to monitor and analyse skin problems early in order to prevent them from spreading and turning into deadly skin cancers. Because of artefacts, poor contrast and the similar appearance of scars, moles and other skin lesions, it is difficult to tell skin diseases apart. Automated skin lesion identification is therefore performed using lesion detection methods that have been optimised for accuracy, efficiency and performance. The proposed technique is demonstrated on photographs of skin lesions, and the dataset includes skin lesions of various kinds. To assist in the early detection of skin lesions, the proposed method uses CNN, GLCM and HOG feature extraction. It includes a pre-processing step that aims to improve the quality and clarity of the skin lesion image and to remove artefacts, skin colour and hair, amongst other things. Segmentation is then performed using geodesic active contours (GAC). The segmentation step isolates each skin lesion, which is beneficial for subsequent feature extraction. Score features are extracted using the CNN technique, whilst texture features are extracted using the GLCM and HOG methods. After the features are collected, a multi-class SVM classifier is used to categorise the skin lesions. Using ResNet-18 transfer learning for feature extraction, many skin diseases, including malignant lesions, may be rapidly classified.
Keywords Geodesic active contour · Grey-level co-occurrence matrix · Histogram of oriented gradients · Convolution neural network · ResNet-18 · SVM classifier
K. Babna (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala, India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
College of Engineering, Kannur, Kerala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_74
K. Babna et al.
1 Introduction
The skin, which acts as the body’s outer layer, is the largest organ of the human body. It is made up of up to seven layers of ectodermal tissue that serve as a protective covering over the underlying muscles, bones, ligaments and internal organs. The skin protects the body from harmful substances and viruses, aids in temperature control and provides the sensations of cold, heat and touch. A skin lesion is a patch of skin that is abnormal in comparison with the surrounding skin. Infections in or on the skin are the main cause of skin lesions. Skin lesions may be categorised as primary (present at birth or developed over time) or secondary (resulting from poor treatment of the original skin lesion), and both can progress to skin cancer. Manual skin cancer diagnosis is not optimal, because the skin lesion is assessed with the naked eye, which can result in mistreatment and ultimately death. Accurate detection of skin cancer at an early stage may significantly increase the chances of survival. Automated detection is therefore more reliable, increasing accuracy and efficiency. In the proposed method, the dataset includes three types of skin lesions. The training sets pass through four major steps: pre-processing, segmentation, feature extraction and classification. We propose a novel feature extraction stage that combines HOG and GLCM features with ResNet-18 transfer learning for better output in the classification process.
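To make the GLCM texture features mentioned above more concrete, the following sketch builds a normalised grey-level co-occurrence matrix for a single pixel offset and derives three commonly used statistics from it. The offset, the number of grey levels, the choice of statistics, and the tiny test image are assumptions for illustration only; this section does not specify which settings the proposed method uses.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    image is a list of rows of integer grey levels in range(levels);
    offset is (row step, column step), here the right-hand neighbour.
    """
    d_row, d_col = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, and homogeneity of a normalised GLCM."""
    n = len(p)
    contrast = sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))
    energy = sum(v * v for row in p for v in row)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Hypothetical 2 x 3 image quantised to three grey levels.
toy_image = [[0, 0, 1],
             [1, 2, 2]]
print(glcm_features(glcm(toy_image, levels=3)))
```

In a full pipeline these statistics would be computed on the segmented lesion region and concatenated with the HOG and CNN features before classification.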
2 Literature Review
Dermoscopy methods are being developed to produce a clear view of the skin lesion site, improving visibility by removing reflections. Automatic skin lesion identification, however, is difficult owing to artefacts, poor contrast, skin colour, hairs [1] and the visual similarity between melanoma and non-melanoma [2]. These problems can be reduced to a minimum by pre-processing. The exact location of the skin lesion is determined by segmenting the pre-processed skin lesion image. Available segmentation methods include the wavelet algorithm, basic global thresholding, region-based segmentation, the watershed algorithm, the snakes approach, the Otsu method, active contours and geodesic active contours. Geodesic active contours [3] are used to segment the data. There are a variety of methods for extracting features from a segmented skin lesion image [4, 5], including the CASH rule, the ABCD rule and the ABCDE rule, as well as GLCM, HOG, LBP and HLIFS. The ABCD rule is a scoring method that collects asymmetry, border, colour and diameter information [6]; the authors describe how to compute the total dermoscopic score and distinguish melanoma from non-melanoma using wavelet analysis. Using HOG, it is possible to extract the shape and edges of a skin lesion [7]. In that study, the extracted feature is passed directly to an SVM classifier, which yields an accuracy of 97.32%. The classifier is the last step in
Multi-Class Detection of Skin Disease …
1027
the process of identifying skin lesions and is responsible for categorising them. This method consists of two parts: teaching and testing. Unknown patterns are fed into the system, and the information acquired during the training process is utilised to categorise the unknown patterns. There are many different kinds of classifiers, such as SVM, KNN, Naive Bayes, and neural networks, amongst others. Author Khan [8] applied features to the SVM, KNN, and Naive Bayes classifiers and achieved accuracy rates of 96%, 84%, and 76%, respectively, for the three classifiers. In their article [9], Victor, Akila, and M. Ghalib describe pre-processing as the first and most significant step of image processing, which helps in the elimination of noise. Pre-processing is the first and most essential stage of image processing, according to the authors. The output of the median filter is supplied as an input to the histogram equalisation phase of the pre-processing stage, and the input of the histogram equalised picture is provided as an input to the segm stage after that. The use of segmentation aids in the identification of the desired area. Area, mean, variance and standard deviation calculations for feature extraction are now carried out using the extracted output from the segmentation phase, and the output is fed into classifiers such as support vector machine (SVM), k-nearest neighbour (KNN), decision tree (dt) and boosted tree (bt). The categorisations are compared one to another. Kasmi and colleagues showed that glcm extracts textural characteristics [10], and that the extracted feature may then be passed straight to a neural network, resulting in a success rate of 95.83%. Skin lesion segmentation is the essential step for most classification approaches. Codella et al. proposed a hybrid approach, integrating convolutional neural network (CNN), sparse coding and support vector machines (SVMs) to detect melanoma [11]. Yu et al. 
applied a very deep residual network to distinguish melanoma from non-melanoma lesions [12]. Schaefer used an automatic border detection approach [13] to segment the lesion area and then assembled the extracted features, i.e. shape, texture and colour, for melanoma recognition. Moataz et al. practised upon a genetic algorithm with an artificial neural network technique for early detection of the skin cancers and obtained a sensitivity of 91.67% and a specificity of 91.43%. [14]. Kamasak et al. classified dermoscopic images by extracting the Fourier identifiers of the lesion edges after dividing the dermoscopic images. They obtained an accuracy of 83.33% in diagnosing of the melanoma [15] (Table 1).
3 Proposed Methodology
The detection of skin lesions proceeds in stages, as illustrated in Fig. 1: data acquisition, pre-processing, segmentation, feature extraction and classification.
K. Babna et al.
Table 1 Review of conventional methods
| Sl No | Author (citation) | Methodology | Features | Challenges |
|---|---|---|---|---|
| 1 | Jaisakthi et al. | GrabCut and k-means algorithms | Automatic recognition of skin lesion | Recognition of skin lesions involves difficulties such as artefacts, low contrast, skin colour and hairs |
| 2 | Chung et al. | PDE-based method | The pre-processed skin lesion image is segmented to get the accurate position of the skin lesion | Accuracy is low; only melanoma and non-melanoma detection |
| 3 | Hemalatha et al. | Active contour-based segmentation | Preparation of pixels of interest for different image processing; decomposes the image into parts for future analysis | The sample size is relatively low |
| 4 | Salih et al. | Active contour modelling, FCM | Fuzzy clustering based on region growing | A large number of images and different algorithms are needed for better classification |
| 5 | Li et al. | FCRN | A straightforward CNN is proposed for the dermoscopic feature extraction task | Accuracy is comparatively low |
| 6 | Kasmi R et al. | ABCD rule | 92.8% sensitivity and 90.3% specificity reported | The accuracy is low |
| 7 | Bakheet et al. | HOG, SVM | Evaluations on a large dataset of dermoscopic images have demonstrated | Only two types of classification |
| 8 | Khan et al. | K-means clustering, FCM | Extraction of textural and colour features from the lesion | Only two classifications |
| 9 | Victor et al. | SVM, KNN | Detects and classifies the benign and the normal image | KNN is 92.70%; SVM is 93.70% |
| 10 | Goel et al. | GLCM, back propagation neural network | GLCM matrix characterises the features of the image | Accuracy is low |
| 11 | Celebi et al. | Survey on lesion border detection | Lesion border detection in dermoscopy images | It is a comparison with various criteria |
| 12 | Yu et al. | Very deep residual networks | Automated melanoma recognition in dermoscopy images | Accuracy is low and only two classifications |
| 13 | Schaefer et al. | An ensemble classification approach | Ensemble classification | Accuracy is 93.83% |
| 14 | Moataz et al. | Artificial intelligence techniques | Image classification using ANN and AI | Sensitivity 91.67% and specificity 91.43% |
| 15 | Kamasak et al. | ANN, SVM, KNN and decision tree | Classification with different machine learning methods | Comparison of different classifiers |
Fig. 1 Block diagram
3.1 Data Acquisition
3.1.1 Dataset
The initial phase of this work is the collection of skin lesion images from the databases of the International Skin Imaging Collaboration (ISIC). Three types of skin lesion are represented in this experiment: actinic keratosis, basal cell carcinoma and melanoma. The images were taken from the ISIC 2017 dataset and are in JPEG format. The skin lesion images were divided into three groups: 69 actinic keratosis, 80 basal cell carcinoma and 60 melanoma images were used for training and testing. Figure 2a, b and c shows actinic keratosis, basal cell carcinoma and melanoma, respectively.
Fig. 2 Examples for skin lesion images
3.2 Pre-Processing
In the second phase, the skin lesion dataset is pre-processed. Pre-processing removes everything from the sample except the lesion, so that the lesion can be detected reliably in later stages. Undesirable components include artefacts, poor contrast, hairs, veins, skin tones and moles. They are removed as follows: (i) the RGB image is converted to greyscale, so that the intensity information in the image is directly available; (ii) median filtering is applied to the greyscale image to reduce noise, which enhances the image of the skin lesion; (iii) hair is detected in the median-filtered image using bottom-hat filtering, which isolates small, thin elements such as hair; (iv) the detected hair is removed by a region-filling morphological operation, which interpolates pixels from the outside in.
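The pre-processing chain can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors’ implementation: the greyscale weights, the 3 × 3 and 5 × 5 window sizes, the threshold of 30 and all function names are assumptions, and the final region-filling step is replaced by a simpler stand-in that assigns hair pixels the morphologically closed value.

```python
import numpy as np

def to_grey(rgb):
    """Luminance-weighted greyscale conversion (BT.601 weights, an assumption)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def _windows(img, k):
    """Stack the k*k shifted views of an edge-padded image along axis 0."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(k) for j in range(k)])

def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood."""
    return np.median(_windows(img, k), axis=0)

def bottom_hat(img, k=5):
    """Morphological closing minus the image; large where thin dark hairs lie."""
    dilated = _windows(img, k).max(axis=0)
    closed = _windows(dilated, k).min(axis=0)
    return closed - img

def remove_hair(rgb, k=5, thresh=30.0):
    """Return the cleaned greyscale image and the detected hair mask."""
    grey = median_filter(to_grey(rgb.astype(float)))
    hat = bottom_hat(grey, k)
    hair = hat > thresh
    # Simplified stand-in for region filling: hair pixels take the closed value.
    return np.where(hair, grey + hat, grey), hair
```

Calling `remove_hair(image)` on an H × W × 3 array returns the hair-free greyscale image together with a boolean mask of the pixels that were treated as hair.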
3.3 Segmentation
The third step is the segmentation of the pre-processed images, which pinpoints the exact site of the skin lesion. Geodesic active contours (GAC) were used in this study to segment the dataset. In general, GAC identifies the most significant changes in the overall skin lesion, which are usually seen near the lesion’s borders. The Otsu thresholding technique is used to binarise the pre-processed skin images, and the GAC technique is then applied to the binarised image.
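As an illustration of the binarisation step only (the GAC evolution itself is not covered here), the following sketch computes an Otsu threshold by maximising the between-class variance of the grey-level histogram. The function name and the assumption of 8-bit integer images are ours.

```python
import numpy as np

def otsu_threshold(grey, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    `grey` is assumed to be an integer image with values in [0, levels).
    """
    hist = np.bincount(grey.ravel(), minlength=levels).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the lower class
    mu = np.cumsum(prob * np.arange(levels))   # cumulative mean of the lower class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(levels)
    valid = denom > 0                          # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))
```

For a dark lesion on lighter skin, `grey <= otsu_threshold(grey)` gives a binary lesion mask that could serve as the starting contour for an active-contour method.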
3.4 Feature Extraction
The fourth step is the extraction of features from the segmented skin lesion. Feature extraction gathers information on the lesion’s border [16], colour, diameter, symmetry and texture, so that the skin lesion is described accurately and skin cancer is easier to identify. Three distinct feature extraction methods were employed: GLCM, HOG and CNN. GLCM is the most commonly used of these.
3.4.1 GLCM (Grey-Level Co-occurrence Matrix)
In texture analysis, the GLCM is commonly used to describe the spatial distribution of intensities in an object. The GLCM [17, 18] considers pairs of pixels: a reference pixel and a neighbouring pixel. Features that can be derived from it include contrast, correlation, energy, entropy, homogeneity, prominence and shade. The main features are described below:
• Contrast: the spatial frequency of texture in the skin lesion, i.e. the local intensity variation between a pixel and its neighbour.
• Correlation: the linear dependency of grey levels in the skin lesion.
• Energy: the uniformity of the grey-level distribution in the skin lesion; it is low when the texture is disordered.
• Homogeneity: the closeness of the distribution of GLCM elements to the GLCM diagonal.
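The features listed above can be computed directly from a normalised co-occurrence matrix. The sketch below is a minimal NumPy version under stated assumptions: a single non-negative pixel offset, a symmetric matrix, energy defined as the sum of squared entries, and function names of our own choosing. The paper does not state which offsets or definitions were used.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for a non-negative offset.

    `img` must hold integer grey levels in [0, levels).
    """
    h, w = img.shape
    ref = img[:h - dy, :w - dx]          # reference pixels
    nbr = img[dy:, dx:]                  # their neighbours at offset (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T                          # count each pair in both directions
    return m / m.sum()

def glcm_features(p):
    """Texture features of a normalised GLCM `p` (undefined correlation if constant)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    nonzero = p[p > 0]
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
    }
```

In practice the image would first be quantised to a small number of grey levels, and features from several offsets (for example horizontal, vertical and diagonal) are often averaged.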
3.4.2 HOG (Histogram of Oriented Gradients)
HOG is used to extract information about the shape and edges of objects. It uses histograms of gradient orientation to assess the intensity of a lesion’s edges. The descriptor has two basic components: the cell, over which an orientation histogram is accumulated, and the block, a group of cells over which the histograms are normalised.
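The roles of the cell and the block can be seen in the simplified sketch below. It is an assumption-laden illustration rather than the descriptor used in the paper: the 8-pixel cells, 9 unsigned orientation bins, 2 × 2-cell blocks and L2 normalisation are common defaults, and bin interpolation is omitted for brevity.

```python
import numpy as np

def hog(img, cell=8, bins=9, eps=1e-6):
    """Simplified HOG: per-cell orientation histograms, L2-normalised per block."""
    gy, gx = np.gradient(img.astype(float))           # derivatives along rows, columns
    mag = np.hypot(gx, gy)                            # edge strength
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0      # unsigned orientation in [0, 180)
    idx = np.minimum((ang * bins / 180.0).astype(int), bins - 1)

    ny, nx = img.shape[0] // cell, img.shape[1] // cell
    hists = np.zeros((ny, nx, bins))
    for r in range(ny):
        for c in range(nx):
            rows = slice(r * cell, (r + 1) * cell)
            cols = slice(c * cell, (c + 1) * cell)
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hists[r, c] = np.bincount(idx[rows, cols].ravel(),
                                      weights=mag[rows, cols].ravel(),
                                      minlength=bins)

    # Blocks are overlapping 2 x 2 groups of cells, normalised independently.
    blocks = [hists[r:r + 2, c:c + 2].ravel()
              for r in range(ny - 1) for c in range(nx - 1)]
    return np.concatenate([b / np.sqrt((b ** 2).sum() + eps ** 2) for b in blocks])
```

The resulting vector has `(ny - 1) * (nx - 1) * 4 * bins` entries and can be concatenated with other features before classification.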
3.4.3 ResNet-18 Convolution Neural Network
A residual network (ResNet) is an artificial neural network that makes deeper networks practical by using skip connections, or shortcuts, to bypass particular layers. Skipping allows deeper networks to be built whilst avoiding the vanishing-gradient problem. ResNet is available in a number of variants, including ResNet-18, ResNet-34 and ResNet-50. The design of the variants is similar, and the number indicates the number of layers: ResNet-18 is a convolutional neural network that is 18 layers deep. Adding a shortcut to the main path of a conventional neural network produces a residual block, as shown in Fig. 3. The architecture of ResNet-18 is shown in Fig. 4.
Fig. 3 Residual blocks in ResNet
Fig. 4 Architecture ResNet-18
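The idea of a residual block can be illustrated with a toy example. The sketch below uses two dense layers in place of the convolution and batch-normalisation layers of a real ResNet block, purely to keep it short; the function names and weight shapes are assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: output = relu(F(x) + x).

    F(x) is the residual branch (two dense layers here); the shortcut adds
    the unchanged input back before the final activation.
    """
    residual = relu(x @ w1) @ w2
    return relu(residual + x)
```

With all-zero weights the block simply passes a non-negative input through unchanged, which is why stacking such blocks does not make a deeper network harder to optimise than a shallower one.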
3.5 Classification
Many models are available for distinguishing between malignant and non-cancerous skin lesions. SVM, KNN, Naive Bayes and neural networks are the most frequently used machine learning methods for lesion classification. In this study, a multi-SVM classifier is used, and the extracted features are passed directly to it. Training and testing are both based on SVMs: the support vector machine is trained on the feature vectors (colour and texture) of the images. The colour and texture attributes of each image in the database are recorded and used in the classification phase. To classify a query image, its features are compared with those of the database images; multiple distance metrics are used to measure feature similarity between images. Based on these values, the SVM classifier decides which class the input image belongs to.
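One common way to build a multi-class SVM is the one-vs-rest scheme, sketched below with a linear SVM trained by subgradient descent on the hinge loss. The paper does not say which multi-class strategy, kernel or solver was used, so the scheme, the hyperparameters and all names here are assumptions for illustration; the hybrid feature vector is assumed to be the concatenation of the GLCM, HOG and CNN features.

```python
import numpy as np

def train_linear_svm(x, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels +1/-1) via subgradient descent on the hinge loss."""
    w, b = np.zeros(x.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (x @ w + b) < 1.0                  # samples violating the margin
        grad_w = lam * w - (y[viol][:, None] * x[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def train_one_vs_rest(x, labels):
    """Train one binary SVM per class (that class against all others)."""
    classes = np.unique(labels)
    models = [train_linear_svm(x, np.where(labels == c, 1.0, -1.0)) for c in classes]
    return classes, models

def predict(x, classes, models):
    """Assign each row of x to the class whose SVM gives the largest score."""
    scores = np.stack([x @ w + b for w, b in models], axis=1)
    return classes[np.argmax(scores, axis=1)]
```

Adding a further skin disease class in this scheme only requires training one more binary model, which matches the extensibility argument made for the multi-SVM classifier.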
4 Experimental Results
The proposed method is applied to skin lesion images collected from the ISIC database and yields excellent results. The dataset contains 69 images of actinic keratosis, 80 images of basal cell carcinoma and 60 images of melanoma. The classifiers are trained and evaluated using a number of different training and testing sets. As specified in the method, three different feature extractors are used: GLCM [19], HOG and CNN. The stages of the process, including training, pre-processing, segmentation, feature extraction and classification, are automated using the proposed algorithms, and a multi-SVM classifier is used for classification. We created five push buttons to give easy access to the different stages of the process, and the relevant results are shown for each step. Figure 5 shows the selected image and its processed stage. The processed image then enters the noise removal stage; the hair-removed version of the selected image is shown in Fig. 6. The image then undergoes segmentation, and the segmented image is shown in Fig. 7. Because of class imbalance, a confusion matrix is needed to gain a thorough understanding of the proposed models: it shows where the models may be inaccurate and is used to assess the performance of the architecture. Table 2 compares the accuracy and precision obtained with the different feature extractors [20, 21]. Combining CNN, HOG and GLCM features raises the accuracy to 95.2%. Table 3 compares the classifiers and shows the better sensitivity and specificity of the SVM classifier.
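For reference, a multi-class confusion matrix and the accuracy and per-class precision derived from it can be computed as in the following sketch. The function names and the convention that rows are true classes and columns are predicted classes are our assumptions.

```python
def confusion_matrix(true, pred, classes):
    """Rows index the true class, columns the predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

def precision_per_class(matrix):
    """For each class: correct predictions / all predictions of that class."""
    result = []
    for k in range(len(matrix)):
        predicted = sum(row[k] for row in matrix)
        result.append(matrix[k][k] / predicted if predicted else 0.0)
    return result
```

With imbalanced classes, the off-diagonal entries show which lesion types are confused with each other, which overall accuracy alone does not reveal.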
Fig. 5 Selected image and processed stage
Fig. 6 Hair removed image
Fig. 7 Image after segmentation
Table 2 Classification accuracy of proposed method
| Features | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| GLCM | 82 | 81 | 82.3 |
| HOG | 87 | 86.4 | 87.15 |
| GLCM + HOG | 94 | 93.2 | 95 |
| GLCM + HOG + CNN | 95.2 | 94.8 | 95.13 |
Table 3 Comparison of different classifier methods
| Classifier | Sensitivity (%) | Specificity (%) | Positive predicted value (%) | Negative predicted value (%) |
|---|---|---|---|---|
| KNN | 86.2 | 85 | 87 | 13 |
| Naïve Bayes | 72 | 82 | 85.2 | 14.8 |
| SVM | 95.13 | 95.13 | 89 | 11 |
Specificity (SP) and sensitivity (SE) are used to evaluate the performance of the classifier models. They are defined as follows:
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
where
TP is the number of positive samples correctly classified as positive (true positives),
TN is the number of negative samples correctly classified as negative (true negatives),
FP is the number of negative samples incorrectly classified as positive (false positives),
FN is the number of positive samples incorrectly classified as negative (false negatives).
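In a multi-class setting, these quantities are usually computed per class by treating that class as positive and all others as negative. The sketch below shows one way to do this from a confusion matrix whose rows are true classes and whose columns are predicted classes; the function names and this one-vs-rest convention are assumptions, as the paper does not state how the counts were obtained.

```python
def one_vs_rest_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class k of a multi-class confusion matrix."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class-k samples predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # other samples predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def specificity(tn, fp):
    return tn / (tn + fp)

def sensitivity(tp, fn):
    return tp / (tp + fn)
```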
5 Advantages and Future Scope
By using hybrid feature extraction that combines HOG and GLCM features with convolutional neural network features, the proposed method becomes more accurate. The classifier achieves high sensitivity and specificity compared with other methods. Because a multi-SVM classifier is used, more skin disease classes can be added in the future, so that the system works like a skin specialist who can identify any skin disease. Further investigation of deeper convolutional networks for classification may increase the accuracy.
6 Conclusion
In this study, skin lesions were classified using hybrid feature extraction. The proposed technique is applied to Kaggle images of skin lesions taken with a digital camera, covering three distinct kinds of skin disease, including melanoma. The skin lesion was segmented with the GAC method, achieving a JA of 0.9 and a DI of 0.82. CNN features are extracted by ResNet-18 transfer learning, whilst texture features are obtained with the GLCM and HOG methods. The multi-SVM classifier categorises the images into three categories of skin disease with 95.2% accuracy and 94.8% precision. Using a multi-SVM classifier also makes it possible to add further skin disease classes in the future, so that the system can act like a skin specialist capable of detecting any skin condition. From the results, we infer that accuracy improves when augmentation is applied. It may also be possible to enhance accuracy by using this technique on a neural network platform.
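Assuming that JA and DI denote the Jaccard index and the Dice index (the abbreviations are not expanded in the text), the two segmentation scores can be computed from a predicted mask and a ground-truth mask as in the sketch below. The function name is ours.

```python
import numpy as np

def jaccard_dice(pred, truth):
    """Overlap scores between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    jaccard = inter / union                           # |A ∩ B| / |A ∪ B|
    dice = 2 * inter / (pred.sum() + truth.sum())     # 2|A ∩ B| / (|A| + |B|)
    return jaccard, dice
```

For any pair of masks the two scores are linked by D = 2J / (1 + J), so the Dice value is never smaller than the Jaccard value computed on the same masks.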
References
1. Jaisakthi SM, Mirunalini P, Aravindan C (2018) Automated skin lesion segmentation of dermoscopic images using GrabCut and k-means algorithms. IET Comput Vis 12(8):1088–1095
2. Chung DH, Sapiro G (2000) Segmenting skin lesions with partial-differential-equations-based image processing algorithms. IEEE Trans Med Imaging 19(7):763–767
3. Hemalatha RJ, Thamizhvani TR, Dhivya AJ, Joseph JE, Babu B, Chandrasekaran R (2018) Active contour based segmentation techniques for medical image analysis. Med Biolog Image Anal 4:17
4. Salih SH, Al-Raheym S (2018) Comparison of skin lesion image between segmentation algorithms. J Theor Appl Inf Technol 96(18)
5. Li Y, Shen L (2018) Skin lesion analysis towards melanoma detection using deep learning network. Sensors 18(2):556
6. Kasmi R, Mokrani K (2016) Classification of malignant melanoma and benign skin lesions: implementation of automatic ABCD rule. IET Image Proc 10(6):448–455
7. Bakheet S (2017) An SVM framework for malignant melanoma detection based on optimized HOG features. Computation 5(1):4
8. Khan MQ, Hussain A, Rehman SU, Khan U, Maqsood M, Mehmood K, Khan MA (2019) Classification of melanoma and nevus in digital images for diagnosis of skin cancer. IEEE Access 7:90132–90144
9. Victor A, Ghalib M (2017) Automatic detection and classification of skin cancer. Int J Intell Eng Syst 10(3):444–451
10. Goel R, Singh S (2015) Skin cancer detection using GLCM matrix analysis and back propagation neural network classifier. Int J Comput Appl 112(9)
11. Kawahara J, Hamarneh G (2017) Fully convolutional networks to detect clinical dermoscopic features. arXiv:1703.04559
12. Jerant AF, Johnson JT, Sheridan CD, Caffrey TJ (2000) Early detection and treatment of skin cancer. Am Fam Phys 62:381–382
13. Binder M, Schwarz M, Winkler A, Steiner A, Kaider A, Wolff K, Pehamberger H (1995) Epiluminescence microscopy. A useful tool for the diagnosis of pigmented skin lesions for formally trained dermatologists. Arch Dermatol 131:286–291
14. Celebi ME, Wen Q, Iyatomi H, Shimizu K, Zhou H, Schaefer G (2015) A state-of-the-art survey on lesion border detection in dermoscopy images. In: Dermoscopy image analysis. CRC Press, Boca Raton, FL, USA
15. Erkol B, Moss RH, Stanley RJ, Stoecker WV, Hvatum E (2005) Automatic lesion boundary detection in dermoscopy images using gradient vector flow snakes. Skin Res Technol 11:17–26
16. Celebi ME, Aslandogan YA, Stoecker WV, Iyatomi H, Oka H, Chen X (2007) Unsupervised border detection in dermoscopy images. Skin Res Technol 13
17. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005 (29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S0219519421500056
18. Nair AT, Muthuvel K (2020) Blood vessel segmentation and diabetic retinopathy recognition: an intelligent approach. Comput Methods Biomech Biomed Eng Imaging Vis. Taylor & Francis. https://doi.org/10.1080/21681163.2019.1647459
19. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diagnosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030 (29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S0219467820500308
20. Nair AT, Muthuvel K (2021) Effectual evaluation on diabetic retinopathy. Lecture notes in networks and systems, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-16-0739-4_53
21. Nair AT, Muthuvel K (2021) Blood vessel segmentation for diabetic retinopathy. J Phys Conf Ser 1921:012001
DeepFake Creation and Detection Using LSTM, ResNext Dhruti Patel, Juhie Motiani, Anjali Patel, and Mohammed Husain Bohara
Abstract Technology was created as a means to make our lives easier, and few fields advance as quickly. Decades ago, virtual assistants were a far-fetched idea; now they are a reality. Machines have started to recognize speech and predict stock prices, and self-driving cars are anticipated in the near future. The underlying technology behind all these products is machine learning, which is ingrained in our lives in ways we cannot fathom. It has many good sides, but it is also misused for personal and base motives. For example, forged videos, images, and other content termed DeepFakes go viral in a matter of seconds. Such videos and images can now be created with deep learning, which is a subset of machine learning. This article discusses the mechanism behind the creation and detection of DeepFakes. The term DeepFake is derived from “deep learning” and “fake.” As the name suggests, it refers to the creation of fabricated content, distributed in the form of videos and images. Deep learning is a burgeoning field that has helped to solve many intricate problems. It has been applied to fields like computer vision, natural language processing, and human-level control. However, in recent years, deep learning-based software has accelerated the creation of DeepFake videos and images that leave no traces of falsification, which can engender threats to privacy, democracy, and national security. The motivation behind this research article is to spread awareness among the digitally influenced youth of the twenty-first century about the amount of fabricated content circulating on the internet. The article presents one algorithm used to create DeepFake videos and, more significantly, covers the detection of DeepFake videos by recapitulating the results of proposed methods. In addition, we discuss the positive aspects of DeepFake creation and detection, where they can be used beneficially without causing harm.
D. Patel (B) · J. Motiani · A. Patel · M. H. Bohara, Department of Computer Science and Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Changa 388421, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_75
Keywords DeepFake · DeepFake creation · DeepFake detection · Generative Adversarial Networks
1 Introduction
Fake images and videos produced by DeepFake methods have become a great public concern. The term “DeepFake” refers to swapping the face of one person with the face of another. The first DeepFake video was generated by a Reddit user in 2017, who used machine learning algorithms to morph celebrities’ faces into pornographic content. Other harmful uses of DeepFakes include fake news and financial fraud. Due to these factors, research traditionally devoted to general media forensics is being revitalized and is now dedicating growing efforts to detecting facial manipulation in images and videos [1].
The increasing sophistication of smartphones and the growth of social networks have led to an enormous increase in new digital content in recent times. This extensive use of digital images has been accompanied by a rise in techniques for altering image content [2]. Until recently, such techniques were out of reach for most users because they were time-consuming and tedious and required a high level of computer vision domain expertise. Those constraints have gradually faded away, thanks to recent advances in machine learning and access to vast quantities of training data. Consequently, the time required to produce and manipulate digital content has decreased significantly, allowing unskilled individuals to modify content at their leisure.
Deep generative models, in particular, have recently been widely used to create fabricated images that appear natural. These models are based on deep neural networks, which can approximate the real-data distribution of a given training dataset. Variations can then be produced by sampling from the learned distribution. Two of the most frequently used and effective techniques are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). GAN techniques in particular have recently been pushing the limits of state-of-the-art results, improving the resolution and quality of the images generated. Deep generative models are therefore ushering in a new era of AI-based fake image generation, paving the way for the rapid dissemination of high-quality tampered image content [2].
Face manipulation is broken down into four categories [1]:
(i) Entire face synthesis
(ii) Identity swap (DeepFakes)
(iii) Attribute manipulation
(iv) Expression swap
Illustrations of these face manipulation categories are given in Fig. 1.
Fig. 1 Examples of facial manipulation groups of real and fake images [1, 3, 4]
One of the mechanisms that can manipulate digital content is “DeepFake,” a word derived from “Deep Learning” and “Fake.” It is a mechanism through which one can morph an image onto a video, thereby creating a new fabricated video that may appear to be real. The underlying mechanisms for DeepFake development are autoencoders and Generative Adversarial Networks (GAN), which are deep learning models widely used in the computer vision field. These models are used to analyze a person’s facial expressions and movements and to synthesize facial images of someone else with similar expressions and movements. Through the DeepFake mechanism, we can thus create a video of a target person saying or doing what a source person is doing, using just an image of the target person and a video of the source person.
2 Methods In the following section, we describe our approach toward DeepFake creation and DeepFake detection algorithms.
2.1 DeepFake Creation
The popularity of DeepFakes can be attributed to creative users who target celebrities and politicians to generate fake and humorous content. DeepFakes have burgeoned over the past 3–4 years owing to the quality of the tampered videos and the ease of use of the applications for a broad range of users, from professional to amateur. These applications are built on deep learning techniques. One such application is Faceswap, captioned as “the leading free and open-source multi-platform Deepfakes software.” Deep autoencoders constitute the blueprint of this application. Autoencoders are used for dimensionality reduction and image compression, because deep learning is well known for extracting higher-level features from raw input. A brief introduction to the techniques used is given below:
1. CNN: A Convolutional Neural Network (CNN or ConvNet) is a category of deep neural network primarily used for image recognition, image classification, object detection, etc. Image classification is the task of taking an input image and outputting a class, or a probability over classes, that best describes the image. A CNN takes an image as input, assigns importance to various aspects/objects in the image, and is able to distinguish one from another. The preprocessing required by a CNN is much lower than for other classification algorithms [5, 6].
2. RNN: RNN is short for Recurrent Neural Network. An RNN remembers the past, and its decisions are influenced by what it has learned from the past. It can take one or more input vectors and produce one or more output vectors. These outputs are governed not only by the weights applied to the input but also by a “hidden” state vector, which represents the context based on previous input(s)/output(s) [7].
3. GAN: GAN stands for Generative Adversarial Network. As the name implies, GANs are largely used for generative purposes: they generate new, fake outputs based on a particular input. A GAN comprises two sub-models, the generator and the discriminator. The generator is trained to create new examples, whereas the discriminator is a binary classification model that tries to identify an example as real or generated. The two are trained together until the discriminator is fooled about half the time, which means the generator is producing plausible output.
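The role of the hidden state in an RNN, described in item 2 above, can be made concrete with a minimal sketch of a vanilla recurrent layer. The tanh activation, the weight names and the shapes are assumptions chosen for illustration.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a vanilla RNN over a sequence and return every hidden state.

    h_t = tanh(w_x @ x_t + w_h @ h_{t-1} + b), starting from h_0 = 0, so each
    state depends on the current input and on the context carried over from
    all earlier inputs.
    """
    h = np.zeros(w_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(w_x @ x + w_h @ h + b)
        states.append(h)
    return np.array(states)
```

If a non-zero input is followed by zero inputs, the later hidden states remain non-zero: the network still “remembers” the earlier input through the recurrent term.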
DeepFake creation uses two autoencoders, one trained on the face of the target and the other on the source. Once the autoencoders are trained, their decoders are swapped, and a DeepFake is created. The encoder extracts the latent features from a face image, and the decoder is used to reconstruct the face image. Two encoder-decoder pairs are needed to swap faces between source images and target images; each pair is trained on one image set, and the encoder’s parameters are shared between the two network pairs. This strategy helps the common encoder to find and learn the similarity between the two sets of face images, since faces generally have similar features such as eyes, nose, and so forth. The encoder thus represents the data in a lower dimension, performing dimensionality reduction, and the job of the decoder is to reconstruct the face from the compressed latent features. Figure 2 shows the DeepFake creation process. Note that the diagram in Fig. 2 uses the same encoder but two different decoders. Since latent features are common to all faces, the job of the encoder is the same for all inputs. To generate a morphed image, however, the source image must be passed through the decoder of the driving image.
Fig. 2 A diagram depicting the working of two encoder-decoder pairs [8]
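The wiring of the shared encoder and the two decoders can be sketched as follows. This is a structural illustration only: the layers are single random (untrained) linear maps, the sizes `D` and `K` are toy values, and all names are hypothetical rather than taken from Faceswap or any other tool.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 64, 8   # flattened face size and latent size (toy values)

enc = rng.normal(scale=0.1, size=(K, D))     # encoder shared by both identities
dec_a = rng.normal(scale=0.1, size=(D, K))   # decoder trained on faces of identity A
dec_b = rng.normal(scale=0.1, size=(D, K))   # decoder trained on faces of identity B

def encode(face):
    """Compress a face into K latent features (dimensionality reduction)."""
    return np.tanh(enc @ face)

def decode(latent, decoder):
    """Reconstruct a face from latent features with the given decoder."""
    return decoder @ latent

def reconstruct_a(face_a):
    """Training-time path for identity A: shared encoder, then decoder A."""
    return decode(encode(face_a), dec_a)

def swap_a_to_b(face_a):
    """Face swap: encode a face of A, but decode it with B's decoder."""
    return decode(encode(face_a), dec_b)
```

During training each decoder would only ever see latent codes of its own identity; the swap works because the shared encoder maps both identities into the same latent space.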
2.2 DeepFake Detection
Creating DeepFakes and spreading them over social media platforms, for example by swapping the faces of celebrities and politicians onto bodies in pornographic images or videos, can threaten a person’s privacy. DeepFakes can also threaten world security through videos of world leaders giving fake speeches, and they have even been used to generate fake satellite images. They can therefore be menacing to privacy, democracy, and national security, which raises the need to distinguish DeepFake videos from genuine ones. A brief introduction to the techniques used for the detection of DeepFakes is given below:
1. LSTM: Long Short-Term Memory (LSTM) networks are artificial recurrent neural networks that are capable of learning order dependence in sequence prediction problems. LSTMs are mainly used in complex machine learning domains such as machine translation and speech recognition [9].
2. ResNext: A network architecture that is mainly used for image classification. It is built from repeated building blocks that aggregate a set of transformations with the same topology.
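The phrase “aggregate a set of transformations with the same topology” in item 2 can be illustrated with a toy block in which several identically shaped branches are summed and added to a shortcut. Dense layers stand in for the grouped convolutions of a real ResNext block, and the names and shapes are assumptions.

```python
import numpy as np

def resnext_block(x, branches):
    """Toy aggregated-transformation block: y = x + sum_i T_i(x).

    Every branch T_i has the same topology (reduce -> ReLU -> expand) but its
    own weights; the number of branches is the block's cardinality.
    """
    x = np.asarray(x, dtype=float)
    aggregated = np.zeros_like(x)
    for w_down, w_up in branches:
        aggregated += w_up @ np.maximum(w_down @ x, 0.0)
    return x + aggregated
```

For an input of size 4, each branch could be a pair of weight matrices of shapes (1, 4) and (4, 1); adding more such pairs increases the cardinality without changing the shape of the output.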
DeepFake detection can be considered a binary classification problem between authentic videos and tampered ones. Detection techniques differ for fake images and fake videos; we have primarily focused on DeepFake video detection. Fabricated videos can be identified by temporal features across the frames [8]. Temporal features are features related to time; these time-domain features are simple to extract and easy to interpret physically. A video is composed of coherent frames. Because any manipulation of a video is performed frame by frame, the unevenness between contiguous frames manifests as temporal discrepancies across frames. Metaphorically, it is like replacing a puzzle piece with another random piece that does not fit properly. Current DeepFake detection methods rely on the drawbacks of the DeepFake generation pipeline. The detection method using LSTM and ResNext parallels the method used to create a DeepFake with a generative adversarial network. It exploits a characteristic of DeepFake videos: because generation is limited by computational resources and production time, the creation algorithm synthesizes face images of a fixed size only. The next step is to subject these images to affine warping. The idea is to check whether collinearity is preserved after an affine transformation. Affine warping reveals any resolution inconsistencies between the warped face area and the surrounding context. The target video is divided into frames, and the corresponding features are extracted by a ResNext Convolutional Neural Network (CNN). The aforementioned temporal inconsistencies are captured by a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The process is simplified by directly simulating the resolution inconsistency in affine face warpings, which are then used to train the ResNext CNN model [10].
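The collinearity property mentioned above can be checked numerically. The sketch below is only an illustration of the geometry, not part of the detection pipeline: the matrix, the offset, and the non-affine `bend` warp are arbitrary assumptions.

```python
def affine(point, matrix, offset):
    """Apply the affine map x' = A x + t to a 2-D point."""
    x, y = point
    (a, b), (c, d) = matrix
    tx, ty = offset
    return (a * x + b * y + tx, c * x + d * y + ty)


def bend(point):
    """A non-affine warp, used only for contrast."""
    x, y = point
    return (x, y + 0.1 * x * x)


def cross(p, q, r):
    """Twice the signed area of triangle pqr; zero for collinear points."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def collinear(p, q, r, tol=1e-9):
    return abs(cross(p, q, r)) <= tol


points = [(0.0, 0.0), (1.0, 2.0), (3.0, 6.0)]  # three points on one line
matrix = ((1.5, 0.5), (-0.25, 2.0))            # arbitrary scale and shear
offset = (4.0, -1.0)                           # arbitrary translation

warped = [affine(p, matrix, offset) for p in points]
bent = [bend(p) for p in points]
print(collinear(*points), collinear(*warped), collinear(*bent))  # True True False
```

Points on a line stay on a line under the affine map, whereas the quadratic warp breaks the alignment.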
The prediction flow of DeepFake detection is given in Fig. 3.
Fig. 3 Architecture of DeepFake detection
3 Experiments In this section, we present the tools and experimental setup used to design and develop the model. We then present the results obtained from running the DeepFake detection model and interpret the experimental results [11].
3.1 Dataset The detection model has been trained with three datasets, each containing real and manipulated videos. Using a variety of datasets exposes the model to diverse data and yields a more general model. The datasets are described below:
1. FaceForensics++: This dataset largely consists of manipulated videos. It has 1000 original videos tampered with four automated face manipulation techniques: DeepFakes, Face2Face, FaceSwap, and NeuralTextures [3]. The dataset itself has been derived from 977 distinct YouTube videos. These videos primarily contain frontal faces without occlusions, which allows the automated tampering methods to generate realistic forgeries. The data can be used for both image and video classification.
2. Celeb-DF: This is a large-scale dataset for DeepFake forensics. It stands apart from other datasets because its synthesized DeepFake videos have a visual quality on par with those circulated online [4]. It contains 590 original videos collected from YouTube. The dataset was created with care to maintain diversity; it thus contains subjects of different ages, ethnic groups, and genders. It also contains 5639 corresponding DeepFake videos.
3. DeepFake Detection Challenge [12]: This dataset is provided through Kaggle. It contains files in the '.mp4' format, split and compressed into sets of 10 GB apiece. The files are labeled REAL or FAKE, and the model is trained accordingly.
The prepared dataset used to train the model consists of 50% real videos and 50% manipulated DeepFake videos. The dataset is split in a 70–30 ratio, i.e., 70% for training and 30% for testing.
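A balanced 70–30 split of this kind can be sketched as follows. The function `balanced_split`, the per-class split, the fixed seed, and the placeholder file names are assumptions for illustration; the source does not state how its split was drawn. The 3000 + 3000 selected videos are chosen so that the total matches the 6000 videos listed in Table 1.

```python
import random


def balanced_split(real_videos, fake_videos, train_fraction=0.7, seed=0):
    """Draw equal numbers of real and fake videos, then split each class."""
    rng = random.Random(seed)
    per_class = min(len(real_videos), len(fake_videos))
    cut = round(per_class * train_fraction)
    train, test = [], []
    for label, videos in (("REAL", real_videos), ("FAKE", fake_videos)):
        sample = rng.sample(videos, per_class)  # random subset without repeats
        train += [(name, label) for name in sample[:cut]]
        test += [(name, label) for name in sample[cut:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder file names; the real dataset paths are not given in the text.
real = [f"real_{i:04d}.mp4" for i in range(3000)]
fake = [f"fake_{i:04d}.mp4" for i in range(3500)]

train, test = balanced_split(real, fake)
print(len(train), len(test))  # 4200 1800
```

Splitting each class separately keeps the 50/50 balance in both the training and the test portion.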
3.2 Proposed System First, the dataset is split in a 70–30 ratio. In the preprocessing phase, the videos in the dataset are split into frames. A face is then detected in each frame and cropped from it; frames with no detected face are ignored. The preprocessed videos of cropped faces are divided into training and test sets. The model consists of a ResNext CNN followed by one LSTM layer. ResNext is used to extract the frame-level features, and the LSTM is used for sequence processing so that temporal analysis can be performed across the frames. Finally, a video is passed to the trained model, which predicts whether the video is fake or real [10].

Table 1 Models and their respective accuracies [10]. Source: https://github.com/abhijitjadhav1998/Deepfake_detection_using_deep_learning

Model   No. of videos   No. of frames   Accuracy (%)
1       6000            10              84.21461
2       6000            20              87.79160
3       6000            40              89.34681
4       6000            60              90.59097
5       6000            80              91.49818
6       6000            100             93.58794
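The preprocessing steps of Sect. 3.2 (split into frames, detect the face, crop it, skip frames without a face) can be sketched as below. Frames are represented as nested lists of pixel values, and `toy_face_detector` is a hypothetical stand-in for a real face detector; both are assumptions made to keep the example self-contained.

```python
def preprocess_video(frames, detect_face, max_frames=20):
    """Return up to `max_frames` cropped faces, skipping frames without a face."""
    faces = []
    for frame in frames:
        box = detect_face(frame)
        if box is None:
            continue  # frames with no detected face are ignored
        top, bottom, left, right = box
        faces.append([row[left:right] for row in frame[top:bottom]])
        if len(faces) == max_frames:
            break
    return faces


def toy_face_detector(frame):
    """Stand-in detector: the bounding box of all non-zero pixels."""
    rows = [i for i, row in enumerate(frame) if any(row)]
    cols = [j for row in frame for j, value in enumerate(row) if value]
    if not rows:
        return None
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1


face_frame = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
blank_frame = [[0] * 4 for _ in range(4)]
video = [face_frame, blank_frame, face_frame, face_frame]

faces = preprocess_video(video, toy_face_detector, max_frames=2)
print(len(faces), faces[0])  # 2 [[1, 1], [1, 1]]
```

The resulting list of cropped faces is the sequence that a feature extractor and a sequence model would consume; `max_frames` plays the role of the "No. of frames" column in Table 1.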
3.3 Evaluation The trained models and their respective accuracies [10] are listed in Table 1. For this project, we have used the second model, which has an accuracy of 87.79160%. Accuracy increases steadily with the number of frames: a larger number of frames makes it easier for the algorithm to identify distortions between adjacent frames and thus yields a higher accuracy. However, the larger the number of frames, the slower the algorithm responds.
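The trend in Table 1 can be checked directly from the reported figures. The short script below uses only the values in the table; the variable names are illustrative.

```python
frames = [10, 20, 40, 60, 80, 100]
accuracy = [84.21461, 87.79160, 89.34681, 90.59097, 91.49818, 93.58794]

# Accuracy rises with every increase in the number of frames.
monotonic = all(a < b for a, b in zip(accuracy, accuracy[1:]))

# Strict proportionality would require accuracy / frames to be constant.
ratios = [a / f for a, f in zip(accuracy, frames)]

# Accuracy gained per additional frame between consecutive models.
pairs = list(zip(frames, accuracy))
gains = [(a2 - a1) / (f2 - f1) for (f1, a1), (f2, a2) in zip(pairs, pairs[1:])]

print(monotonic)
print([round(r, 2) for r in ratios])
print([round(g, 3) for g in gains])
```

Accuracy increases monotonically, but the ratio of accuracy to frame count falls from about 8.4 to about 0.94, so the relationship is an increasing one rather than a strict proportionality. Going from Model 2 to Model 6 adds roughly 5.8 percentage points at five times the number of frames.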
4 Results This section illustrates the working of our model. DeepFake creation is shown in Fig. 4. An image and a driver video are passed to the model to create the resulting DeepFake. The generated DeepFake is of low resolution, because a high-resolution input requires an extremely accurate GAN to generate fake videos that are hard to detect. The poor resolution makes this DeepFake easily identifiable by the naked eye; however, advancements in DeepFake technology are making it increasingly difficult to identify DeepFakes even with the help of detection algorithms. DeepFake detection results are shown in Figs. 5 and 6. With the help of LSTM and ResNext, we were able to build a model that detects fabricated videos based on the inherent inconsistencies between frames. These results were obtained by passing test data to a pre-trained model. The model was trained on a dataset of low-resolution videos, using 20 frames per video.
Fig. 4 Output of DeepFake creation
Fig. 5 Output of DeepFake detection showing that the provided video is REAL along with the confidence of prediction
Fig. 6 Output of DeepFake detection showing that the provided video is FAKE along with the confidence of prediction
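A label with a confidence value, as displayed in Figs. 5 and 6, can be derived from the two raw scores of a binary classifier. The sketch below assumes a softmax over two logits and a FAKE/REAL class order; neither detail is confirmed by the source, and the input scores are hypothetical.

```python
import math

LABELS = ("FAKE", "REAL")  # assumed class order


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the predicted label and its confidence in percent."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[index], 100.0 * probs[index]


label, confidence = predict([0.3, 2.1])  # hypothetical model outputs
print(f"{label} ({confidence:.2f}%)")
```

The confidence is simply the probability assigned to the winning class, so it always lies between 50% and 100% in the two-class case.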
DeepFake detection algorithms need to catch up with the constantly improving creation algorithms. As more and more fabricated content floods the internet, a growing amount of it goes undetected.
5 Challenges Although the quality of DeepFake video creation and, especially, the performance of DeepFake video detection have greatly increased [13], current detection methods still face several challenges. Some of the challenges in DeepFake detection are as follows:
(1) Quality of DeepFake Datasets: Developing DeepFake detection methods requires the availability of large datasets. However, the available datasets have certain shortcomings, notably a significant difference in visual quality from the actual fake videos circulated on the internet. The imperfections in these datasets include color discrepancies, parts of the original face remaining visible, low-quality synthesized faces, and inconsistencies in face orientation [13].
(2) Performance Evaluation: Current DeepFake detection is treated as a binary classification problem, in which each video is classified as real or fake. The detection methods are helpful only when the fabricated videos are created by the corresponding DeepFake creation algorithms. However, many factors affect the detection methods when they are applied in the real world, e.g., videos fabricated by means other than DeepFake, videos with multiple faces, and murkier pictures. Therefore, binary classification needs to be extended to multi-class and multi-label classification to handle the complexities of the real world [13].
(3) Social Media Laundering: A large number of DeepFake videos spread through social media, e.g., Instagram, Facebook, and Twitter. To reduce network bandwidth and to protect users' privacy, these platforms usually strip metadata, reduce the video size, and compress the video before it is uploaded; this is commonly known as social media laundering. As a result, the traces of manipulation cannot be recovered, and there is a high chance of classifying a fake video as a real one. The robustness of DeepFake detection methods should therefore be improved to cope with such issues [13].
A limitation of the proposed DeepFake detection system is that the method does not consider audio. DeepFakes in which the audio has been manipulated will therefore not be detected [10].
6 Current Systems In this part, we describe current systems that can be used to generate DeepFake videos. Currently, applications such as FakeApp, Zao, and DeepFaceLab are used to create DeepFake videos and images. The first DeepFake application to appear on the internet was called DeepFaceLab. This application is very useful for understanding the step-by-step process of DeepFake creation. DeepFaceLab allows users to swap faces, replace entire faces, change people's age, and change lip movements, so one could easily morph an image or video and create a phony one. Zao is a Chinese application that allows users to create DeepFake videos, but it has been observed that Zao cannot create natural images of Indian faces because it is mainly trained on Chinese facial data; for Indian faces, it is easy to tell that a Zao output is fake. Faceswap is another DeepFake application that is free and open source. It is powered by TensorFlow, Keras, and Python. The active online forum of Faceswap allows interested individuals to gain useful insights into the process of creating DeepFakes. The forum accepts questions and also provides tutorials on how to create DeepFakes [14].
7 Discussion In this paper, we have described and assessed the mechanism of face manipulation (DeepFake). We have explained the methods for creating a fake identity-swap video and for detecting such videos. We were able to create only a low-resolution DeepFake video, as the accessible frequency spectrum is much smaller. Nevertheless, we can both create a DeepFake video and detect one. DeepFake is a technology that has many negative aspects; if not applied wisely, it may pose a threat to society and turn out to be dangerous. Since most online users believe content on the internet without verifying it, such DeepFakes can create rumors. On the positive side, DeepFakes can be used creatively: for example, a person who is unable to speak or communicate properly can swap their face onto the video of a good orator and thereby create their own video. The technology can also be used in the film industry for updating episodes without reshooting them. Face-manipulated videos can be created for entertainment purposes as long as they pose no threat to society or to anyone's privacy. The DeepFake detection method can be applied in courtrooms to check whether evidence provided in digital form is real or fake, where it could be very beneficial. Every coin has two sides: technology has its pros and cons, and if used wisely it can be a boon for society.
8 Summary and Conclusion Many real-life problems have been solved as a result of technological advancements, but certain technologies have more negative aspects than positive ones. One such example is face manipulation, also known as DeepFake or identity swap. In this article, we have discussed the concept of DeepFakes in detail and, briefly, the DeepFake creation and detection algorithms. In addition, we have implemented the algorithms and displayed the results. We were able to create a low-resolution DeepFake video, as the available frequency spectrum is much smaller. We were able to create a DeepFake detection model with an accuracy of 87%. DeepFake has more negative aspects than positive ones; hence, more research is being carried out on various detection methods. There is a gap between DeepFake creation and detection technologies, and the latter is lagging. It is important to educate society about the malicious intent behind the creation of DeepFakes.
Acknowledgements Every work that one accomplishes relies on the constant motivation, benevolence, and moral support of the people around us. We therefore take this opportunity to show our appreciation to the people who extended their precious time, support, and assistance in the completion of this research article. This research article has given us a wide opportunity to think and to expand our knowledge about new and emerging technologies. Through it, we were able to explore more of the current research and related experiments. We would like to show our gratitude to our mentors for their guidance throughout the process and for encouraging us to look forward to learning and implementing new emerging technologies.
References
1. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) DeepFakes and beyond: a survey of face manipulation and fake detection. Inf Fusion 64. https://doi.org/10.1016/j.inffus.2020.06.014
2. Durall R et al (2020) Unmasking DeepFakes with simple features. arXiv:1911.00686
3. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) FaceForensics++: learning to detect manipulated facial images
4. Li Y, Sun P, Qi H, Lyu S (2020) Celeb-DF: a large-scale challenging dataset for DeepFake forensics. In: IEEE conference on computer vision and pattern recognition (CVPR)
5. Modi S, Bohara MH (2021) Facial emotion recognition using convolution neural network. In: 2021 5th International conference on intelligent computing and control systems (ICICCS). IEEE
6. Parekh M (16 July 2019) A brief guide to convolutional neural network (CNN). Medium. https://medium.com/nybles/a-brief-guide-to-convolutional-neural-network-cnn-642f47e88ed4
7. Venkatachalam M (1 March 2019) Recurrent neural networks. Towards data science. https://towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce
8. Nguyen T, Nguyen CM, Nguyen T, Nguyen D, Nahavandi S (2019) Deep learning for DeepFakes creation and detection: a survey
9. Brownlee J (17 June 2019) A gentle introduction to generative adversarial networks (GANs). Machine learning mastery. https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/
10. Jadhav A et al (2020) DeepFake video detection using neural networks. IJSRD - Int J Sci Res Dev 8(1):4. https://github.com/abhijitjadhav1998/DeepFake_detection_using_deep_learning/blob/master/Documentation/IJSRDV8I10860.pdf
11. Wodajo D, Solomon A (2021) DeepFake video detection using convolutional vision transformer, p 9. arXiv. https://arxiv.org/pdf/2102.11126.pdf
12. DeepFake detection challenge dataset. Kaggle. https://www.kaggle.com/c/DeepFake-detection-challenge/data
13. Lyu S (2020) DeepFake detection: current challenges and next steps. In: 2020 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE Computer Society, pp 1–6
14. Zhukova A (24 August 2020) 7 Best DeepFake apps and websites. OnlineTech Tips. https://www.online-tech-tips.com/cool-websites/7-best-DeepFake-apps-and-websites/
Classification of Plant Seedling Using Deep Learning Techniques K. S. Kalaivani, C. S. Kanimozhiselvi, N. Priyadharshini, S. Nivedhashri, and R. Nandhini
Abstract Agriculture is an important livelihood in many nations. The growth of the world's population leads to an increasing demand for food and cash crops. Growing crops under changing climate conditions is a major challenge, so it is important to increase agricultural production at low cost. Weeds are a big issue for farmers: they take up the nutrients and water intended for the crops, which causes huge losses. Chemical herbicides are used to kill weeds, but they are harmful to the ecosystem and raise costs. To protect the environment and save money, an automated machine vision system that can identify crops and remove weeds in a safe and cost-effective manner is needed. To classify plant species, a dataset was downloaded from the Kaggle platform; it consists of 12 different species of plants, namely black-grass, charlock, cleavers, common chickweed, common wheat, fat hen, loose silky-bent, maize, scentless mayweed, shepherds purse, small-flowered cranesbill, and sugar beet. A convolutional neural network (CNN) was first used to classify the plant species, and the accuracy obtained was 74%. To improve the accuracy, CNN variants such as VGG-19 and ResNet-101 are used in this work. The accuracies obtained for VGG-19 and ResNet-101 are 87% and 94%, respectively. The results show that the ResNet-101 model outperforms VGG-19 and the basic CNN for classifying plant species. In addition, hyperparameters such as batch size, learning rate, and optimizer are tuned for ResNet-101.
Keywords Deep learning · Plant seedling classification · CNN · ResNet-101 · VGG-19
K. S. Kalaivani (B) · C. S. Kanimozhiselvi · N. Priyadharshini · S. Nivedhashri · R. Nandhini Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Erode, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_76
1 Introduction Food and oxygen are necessities for the living organisms of the world. In countries such as India, where agriculture is a major occupation, proper automation of the farming process will help to maximize crop yield while also ensuring long-term productivity and sustainability [1, 2]. Crop yield in agriculture is threatened by weed invasion of farmland. In general, weeds are unwanted plants on farmland; they provide no valuable benefits such as nutrition, food, or medication. Weeds grow faster than crops and hence impede crop growth: they take the nutrients and the space that crops require to grow. To obtain better productivity, it is necessary to remove weeds from farmland at an early stage of growth. Manual removal of weeds is neither easy nor efficient. In precision agriculture, a decision-making system is employed to save resources, control weeds, and minimize cost. Robots are used to remove weeds from the field, which requires accurate detection of weeds in the field through machine vision [3–6]. In this work, the dataset is taken from the Kaggle platform and consists of 12 species of plants. The dataset contains 5545 images in total. The basic CNN is widely used to classify plant species. To improve the classification accuracy, the VGG-19 and ResNet-101 architectures are used. VGG-19 is a nineteen-layer deep network, and ResNet-101 has 101 layers. The proposed architecture helps the machine vision system to classify plant species more accurately than existing work. Ashqar [7] implemented a CNN architecture for classifying plant seedlings. The implemented algorithms are used extensively for image recognition tasks. On a held-out test set, the implemented model classifies eighty percent of the images correctly, demonstrating the feasibility of the method. Nkemelu [8] compared the performance of two traditional algorithms and a CNN.
From the results obtained, it is found that the basic CNN architecture achieves higher accuracy than the traditional algorithms. Elnemr [9] developed a CNN architecture to differentiate between crop and weed in plant seedling images at an early growth stage. Owing to the combination of normalization layers, pooling layers, and the filters used, the performance of this system is increased. With the help of the elaborated CNN, this work achieved higher precision. Complexity is reduced, and the CNN helps the system to achieve accurate classification. A segmentation phase is involved in order to classify the plant species. This work can be combined with IoT for controlling the growth of weeds by spraying herbicides. The system achieved an accuracy of 90%. Alimboyong [10] proposed deep learning approaches based on CNN and RNN architectures. The dataset used for classification contains 4234 images belonging to 12 plant species, taken from the Aarhus University Signal Processing group. The system achieves low memory consumption and high processing capability. Performance metrics such as sensitivity, specificity, and accuracy are considered for evaluation. The system involves three phases. First, the data are augmented and then compared with the existing data. The second phase combines RNN and CNN using various other plant seedling datasets. Finally, a mobile application for plant seedling images is created using the developed model. This work produced an accuracy of 90%. Dyrmann [11] worked on deep learning techniques for the classification of crop and weed species from different datasets. The dataset contains 10,413 images of 22 different crop and weed species; the images are taken from six different datasets and combined. The proposed convolutional neural network recognizes plant species in color images. The model achieves approximately eighty-six percent accuracy in classifying the species. Rahman [12] developed deep learning techniques to identify plant seedlings at an early stage. The quality, quantity, F1-score, and accuracy were measured for the proposed architectures, and these measurements were used for comparison with previously implemented architectures. In this work, ResNet-50 performs well compared with the previous models, producing an accuracy of 88%. Haijian [13] proposed the use of CNN variants such as VGG-19 for the classification of pests in vegetables. The fully connected layers of VGG-19 have been optimized. The analysis shows that VGG-19 performs better than existing work. The accuracy obtained is 97%. Sun [14] designed a twenty-six-layer deep learning model with eight residual building blocks. The prediction is done in a natural environment. The implemented model predicts 91% accurately.

2 Materials and Methods The dataset is taken from the Kaggle platform. The total number of images present in this dataset is 5550. The training dataset contains 4750 images, and the test dataset contains 790 images. The dataset contains 12 species of plants (Table 1).

Table 1 Number of images present in each species

Type of species              Number of images
Black-grass                  263
Charlock                     390
Cleavers                     287
Common chickweed             611
Common wheat                 221
Fat hen                      475
Loose silky-bent             654
Maize                        221
Scentless mayweed            516
Shepherds purse              231
Small-flowered cranesbill    496
Sugar beet                   385
Fig. 1 CNN architecture
2.1 Convolutional Neural Network CNNs are widely used to classify plant seedlings in order to distinguish between crop and weed species at an early stage of development. The CNN has three types of layers: an input layer, hidden layers, and an output layer. Before being passed to the input layer, the images are resized to a common size. The hidden part comprises five stages of learning layers. The convolutional layers at each stage use filters with a kernel size of 3 × 3, and the number of filters per stage is 32, 64, 128, 256, and 1024, respectively (Fig. 1).
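The size of such a network can be estimated from the filter counts alone. The sketch below assumes RGB input (3 channels), one convolutional layer per stage, and one bias per filter; the text does not state these details, so the totals are indicative only.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a square-kernel 2-D convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


filters = [32, 64, 128, 256, 1024]  # filters per stage, as listed in the text
in_channels = 3                     # assumption: RGB input images

counts = []
for out_channels in filters:
    counts.append(conv_params(in_channels, out_channels))
    in_channels = out_channels      # the next stage sees this stage's output

print(counts)       # [896, 18496, 73856, 295168, 2360320]
print(sum(counts))  # 2748736
```

Under these assumptions the last stage, with 1024 filters, accounts for most of the convolutional parameters.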
2.2 VGG-19 VGG-19 is a 19-layer network with 16 convolution layers, interleaved with pooling layers, and three fully connected layers (Fig. 2). The 16 convolution layers are organized in five blocks that use 64, 128, 256, 512, and 512 filters. Blocks 1 and 2 each have two convolution layers, with 64 and 128 filters, respectively. Block 3 has four convolution layers with 256 filters. Blocks 4 and 5 each have four convolution layers with 512 filters. The input of VGG-19 has a fixed size of 224 × 224 × 3. The kernel size of the filters is 3 × 3. The first two fully connected layers have 4096 channels each and are activated by the ReLU activation function; the third fully connected layer has 1000 channels and acts as the output layer with softmax activation.

Fig. 2 VGG-19 architecture

2.3 ResNet-101 ResNet-101 is made up of residual blocks and has 101 layers. It uses an approach called the residual network to address the problem of vanishing and exploding gradients. In this network, a technique called skip connection is used: the skip connection bypasses a few layers and adds their input directly to their output. The advantage of adding skip connections is that layers which hurt the performance of the architecture can effectively be skipped by regularization. This allows deep neural networks to be trained without the problem of vanishing or exploding gradients (Fig. 3).

Fig. 3 Skip connection in ResNet
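The effect of a skip connection on gradient flow can be seen in a scalar toy model. This is an illustration of the idea y = F(x) + x with F(x) = weight · x, not the ResNet-101 block itself; the depth and weight values are arbitrary assumptions.

```python
def plain_layer(x, weight):
    """A layer without a skip connection: y = F(x) = weight * x."""
    return weight * x


def residual_layer(x, weight):
    """A layer with a skip connection: y = F(x) + x."""
    return weight * x + x


def end_to_end_gradient(depth, weight, residual):
    """d(output)/d(input) of `depth` stacked scalar layers (chain rule)."""
    per_layer = weight + 1.0 if residual else weight
    return per_layer ** depth


# With F(x) = 0 the residual layer reduces to the identity.
print(residual_layer(2.5, 0.0))  # 2.5

plain = end_to_end_gradient(depth=20, weight=0.01, residual=False)
skip = end_to_end_gradient(depth=20, weight=0.01, residual=True)
print(plain < 1e-30, round(skip, 2))  # True 1.22
```

With small weights, the gradient through twenty plain layers all but vanishes, while the identity path of the skip connections keeps it close to 1.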
2.4 Optimizer An optimizer is a method or algorithm that reduces the loss by changing attributes of the neural network such as its weights and learning rate. It solves the optimization problem by minimizing the loss function. The following optimizers are used.
AdaGrad. It is an optimization method that adapts the learning rate to the parameters. It applies low learning rates to parameters associated with frequently occurring features and high learning rates to parameters associated with infrequently occurring features. It uses a default learning rate of 0.01 and removes the need to tune the learning rate manually.
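The per-parameter adaptation of AdaGrad can be sketched with the standard update rule, in which each parameter's step is divided by the square root of its accumulated squared gradients. The two scalar parameters and the gradient schedule below are illustrative assumptions.

```python
import math


def adagrad_update(weight, grad, cache, lr=0.01, eps=1e-8):
    """Return the new weight, the new squared-gradient sum, and the step taken."""
    cache += grad * grad
    step = lr * grad / (math.sqrt(cache) + eps)
    return weight - step, cache, step


w_common, cache_common = 0.0, 0.0  # feature present in every sample
w_rare, cache_rare = 0.0, 0.0      # feature present in every 10th sample

for t in range(1, 101):
    w_common, cache_common, step_common = adagrad_update(w_common, 1.0, cache_common)
    if t % 10 == 0:
        w_rare, cache_rare, step_rare = adagrad_update(w_rare, 1.0, cache_rare)

print(f"{step_common:.5f} {step_rare:.5f}")  # 0.00100 0.00316
```

After 100 iterations the rarely updated parameter still takes a step about three times larger than the frequently updated one, even though both receive the same gradient value.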
Stochastic Gradient Descent (SGD). The model parameters are updated in each iteration; that is, the loss function is evaluated and the model is updated after each training sample. The advantage of this technique is that it requires little memory.
Root-Mean-Square Propagation (RMSProp). It balances the step size by normalizing the gradient itself. It uses an adaptive learning rate, which means that the learning rate changes over time.
Adam. It is a replacement optimization method for SGD for training deep learning models. It combines the properties of AdaGrad and RMSProp to provide an optimization that can handle large and noisy problems. It is efficient because the default parameters perform well on most problems.
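The Adam update can be sketched on a one-dimensional problem. The code follows the standard Adam rule (running means of the gradient and of its square, with bias correction); the quadratic objective, the starting point, and the learning rate of 0.1 are assumptions chosen so that the toy converges quickly, and they are unrelated to the settings used for ResNet-101.

```python
import math


def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function given its gradient, using the Adam rule."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def gradient(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_final = adam_minimize(gradient, w=0.0)
print(round(w_final, 3))
```

Dividing by the root of the running squared gradient is the RMSProp-style normalization, while the running mean of the gradient adds momentum; the result approaches the minimizer w = 3.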
3 Results and Discussion To improve the accuracy, CNN variants such as VGG-19 and ResNet-101 are used in this work. The accuracies obtained for VGG-19 and ResNet-101 are 87% and 94%, respectively. The results show that the ResNet-101 model outperforms VGG-19 and the basic CNN for classifying plant species. Figure 4 compares the accuracy of the VGG-19 and ResNet-101 models for different numbers of epochs. Figure 5 shows the accuracy of ResNet-101 for different batch sizes; the accuracy increases as the batch size is increased. Figure 6 shows the accuracy for learning rates of 0.1, 0.01, 0.001, and 0.0001; the highest accuracy is obtained with a learning rate of 0.0001.
Fig. 4 Epoch-wise accuracy
Fig. 5 Batch-wise accuracy
Fig. 6 Accuracy for different learning rate
4 Conclusion The main aim of this project is to classify plant species in order to remove weeds from farmland. Removing weeds helps the plants to get enough nutrients and water, which in turn makes them grow healthier. This increases productivity and gives a good yield to farmers. In this paper, we proposed the VGG-19 and ResNet-101 models for plant seedling classification. A dataset containing images of 12 different species is used in this project. The total number of images present in this dataset is 5550. The model can detect and differentiate a weed from other plants. Comparing the accuracy of VGG-19 and ResNet-101 shows that the ResNet-101 model outperforms the VGG-19 model. Hyperparameter tuning is further performed on ResNet-101 using different optimizers, namely Adam, AdaGrad, SGD, and RMSProp. Among these, the Adam optimizer with a batch size of 128 and a learning rate of 0.0001 performs best on ResNet-101. The accuracy obtained is 94%. The proposed system can be extended to work with robotic arms for performing the actual weeding operation in large farmlands.
References
1. Chaki J, Parekh R, Bhattacharya S (2018) Plant leaf classification using multiple descriptors: a hierarchical approach. J King Saud Univ - Comput Inf Sci 1–15
2. Prakash RM (2017) Detection of leaf diseases and classification using digital image processing
3. Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: a survey. Comput Electron Agric 147:70–90
4. Mohanty SP, Hughes D, Salathé M (2016) Using deep learning for image-based plant disease detection
5. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification using vein morphological patterns. Comput Electron Agric 127:418–424
6. Lecun Y, Bengio Y, Hinton G (2015) Deep learning
7. Ashqar BAM, Bassem S, Abu Nasser, AbuNaser SS (2019) Plant seedlings classification using deep learning
8. Nkemelu DK, Omeiza D, Lubalo N (2018) Deep convolutional neural network for plant seedling classification. arXiv preprint arXiv:1811.08404
9. Elnemr HA (2019) Convolutional neural network architecture for plant seedling classification. Int J Adv Comput Sci Appl 10
10. Alimboyong CR, Hernandez AA (2019) An improved deep neural network for classification of plant seedling images. In: 2019 IEEE 15th international colloquium on signal processing & its applications (CSPA). IEEE
11. Dyrmann M, Karstoft H, Midtiby HS (2016) Plant species classification using deep convolutional neural network. Biosyst Eng 151:72–80
12. Rahman NR, Hasan MAM, Shin J (2020) Performance comparison of different convolutional neural network architectures for plant seedling classification. In: 2020 2nd International conference on advanced information and communication technology (ICAICT), Dhaka, Bangladesh, pp 146–150. https://doi.org/10.1109/ICAICT51780.2020.93333468
13. Xia D et al (2018) Insect detection and classification based on an improved convolutional neural network. Sensors 18(12):4169
14. Sun Y et al (2017) Deep learning for plant identification in natural environment. Comput Intell Neurosci 2017
A Robust Authentication and Authorization System Powered by Deep Learning and Incorporating Hand Signals Suresh Palarimath, N. R. Wilfred Blessing, T. Sujatha, M. Pyingkodi, Bernard H. Ugalde, and Roopa Devi Palarimath Abstract Hand gesture recognition signals have several uses. Communication for visually challenged people, such as the elderly or the handicapped, health care, automobile user interfaces, security, and surveillance are just a few of the possible applications. A deep learning-based edge computing system is designed and implemented in this article, and it is capable of authenticating users without the need of a physical token or physical contact. To authenticate, the sign language digits are represented by hand gestures. Deep learning is used to categorize digit hand motions in a language based on signs. The suggested deep learning model’s characteristics and bottleneck module are based on deep residual networks. The collection of sign language digits accessible online shows that the model achieves a classification accuracy of 97.20%, which is excellent. Model B+ of the Raspberry Pi 3 is used as an edge computing device, and the model is loaded on it. Edge computing is implemented in two phases. First, the gadget collects and stores initial camera data in a buffer. The model then calculates the digit using the first photograph in the buffer as input and an inference rate of 280 ms.
S. Palarimath (B) · N. R. W. Blessing · B. H. Ugalde
Department of IT, University of Technology and Applied Sciences, Salalah, Oman
T. Sujatha
Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India
M. Pyingkodi
Department of Computer Applications, Kongu Engineering College, Erode, India
R. D. Palarimath
Faculty of Computer Science, Mansarovar Global University, Bhopal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_77
Keywords Artificial intelligence · Edge computing · Machine learning · MobileNetV2 · Convolutional neural network · Deep learning · Hand signal · Raspberry Pi
1 Introduction
Contactless biometric identification is considered more sanitary, secure, and efficient than traditional biometric identification. Biometrics fall into two types: physiological and behavioural. Physiological biometrics include fingerprints, facial characteristics, palm prints, retinas, and ears. Common behavioural biometrics include keystrokes and signatures. Hand gestures are an element of body language conveyed by the centre of the palm, the position of the fingers, and the shape formed by the hand. They fall into two categories: static and dynamic. A static gesture is defined by its fixed form, whereas a dynamic gesture consists of a sequence of hand motions such as waving [1].
Prior to deep learning, biometric identification relied on handcrafted features extracted using methods such as SIFT [2]. Over the past decade, deep learning has transformed the field, and most modern biometric identification systems use convolutional neural networks (CNNs) or variations of them. A CNN is a deep multilayer artificial neural network built from convolutional layers. The convolutional approach builds a good representation of the input images directly from raw pixels with little to no pre-processing and readily recognizes visual patterns [3], which also makes CNNs popular in medical imaging applications. The representations learned by CNN models are visual features that are more effective than handcrafted features [2].
Recent deep learning-based biometric identification systems include the following: Zulfiqar et al. [4] use a CNN for face biometric authentication, and Aizat et al. [5] use deep neural networks (DNNs) to identify and authenticate users based on speech. Other work combines graph neural networks and CNNs for palm print identification and applies DNNs to fingerprint recognition.
While deep learning has made significant advances in biometric identification, many difficulties remain. They include the need for more challenging datasets to train the models, interpretable deep learning models, real-time model deployment, memory efficiency, and security. We present an authentication system that validates a person by verifying an 'authentication code' produced by a memory-efficient CNN model. The system generates an authentication code for each user consisting of digits ranging from 0 to 9.
Image classification is the task of automatically assigning images to categories, and CNN-based deep learning models now perform well on image classification datasets. On the ILSVRC-2012 dataset, a CNN trained with the sharpness-aware minimization technique [6] obtains 86.73% top-1 accuracy, whereas the EnAET model obtains a 1.99% top-1 error rate on the CIFAR-10 dataset. All of these models are large and need a lot of memory and inference time. This is problematic, especially when inference must run on a computing device at the network edge rather than in the cloud (a central processing system). State-of-the-art performance has typically been attained with complex models that require substantial computing resources [7]. We present a memory-efficient CNN model for use in edge computing systems and compare it with MobileNetV2, an existing state-of-the-art memory-efficient CNN model.
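The top-1 figures quoted above can be read with the help of a small sketch. It is only an illustration of the metric, not code from the study: the function name `top1_accuracy` and the toy scores are assumptions made for this example.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the largest score
        correct += predicted == label
    return correct / len(labels)


# Toy example with three samples and three classes (made-up scores).
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 0, 1]
accuracy = top1_accuracy(scores, labels)
print(f"top-1 accuracy: {accuracy:.2%}, top-1 error: {1 - accuracy:.2%}")
```

Top-1 error is simply one minus top-1 accuracy, which is why an accuracy near 87% and an error near 2% are not directly comparable numbers.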
2 Literature Review
Many papers on processing hand gestures have been published in the last decade, and the topic has attracted researchers across a range of applications. Hand gesture interaction systems depend on the recognition rate, which is affected by factors including the type and resolution of the camera, the hand segmentation technique, and the recognition algorithm. This section summarizes some key papers on hand gestures. The authors of [8] discussed hand gesture recognition for standard Indonesian sign language, using the Myo armband as the hand gesture sensor. The authors of [9] reviewed various hand gesture techniques with their merits and demerits. The authors of [1] used the Kinect V2 depth sensor to identify hand gestures and suggested three different scenarios to obtain effective outcomes. The authors of [10] used inertial measurement unit (IMU) sensors with a hand gesture recognition (HGR) algorithm for human–machine interface (HMI) applications. The authors of [11] discussed hands-free presentations using hand gesture recognition, describing the design of a wearable armband for this purpose. The authors of [12] addressed the end-to-end development and deployment of a deep learning-based edge computing system for gesture recognition authentication. The authors of [13] described in detail a technique to correct the rotational orientation of the Myo bracelet's sensors. The method identifies the channel with the highest energy for a given set of samples of the WaveOut synchronization gesture, and the researchers report that it can improve the recognition and classification of hand gestures. The authors of [14] combined hand segmentation masks with RGB frames for real-time hand gesture recognition. Furthermore, the authors of [15] discussed a practical use of hand gesture recognition: communicating through hand gestures in emergencies. Recognition was achieved with support vector machine-based classification as well as deep learning-based classification techniques.
3 Proposed Model
3.1 Dataset
We used the Sign Language Digits Dataset [16] to train the proposed CNN and MobileNetV2 systems. The dataset has 2500 samples in ten classes, numbered 0–9. Figure 1 illustrates four of the classes. Each sample is 150 × 150 pixels. Table 1 lists the dataset's statistics, including the number of samples in each subset. The dataset is separated into three parts: training, validation, and testing. The test data is split evenly among the ten classes: the test set contains 630 samples, or 25.20% of the whole dataset, with 63 instances from each class chosen at random. The remaining data is divided into training and validation sets, comprising 59.60% and 15.20% of the total dataset, respectively. The images were upscaled from 150 × 150 to 256 × 256 pixels using bicubic interpolation [17] because of the real-time testing restrictions described in Sect. 3.4. The scaled samples are then used as input images for the deep learning system. Section 3.2 details the proposed authentication mechanism.

Fig. 1 Hand signals and the decoded numbers (source: dataset [16])

Table 1 Number of samples from dataset [16] for training, testing, and validation
Class   No. of samples   No. of training   No. of validation   No. of testing
0       250              149               38                  63
1       250              149               38                  63
2       250              149               38                  63
3       250              149               38                  63
4       250              149               38                  63
5       250              149               38                  63
6       250              149               38                  63
7       250              149               38                  63
8       250              149               38                  63
9       250              149               38                  63
Total   2500             1490              380                 630

Fig. 2 Classification (digit recognition) task of the proposed authentication system
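The per-class counts in Table 1 can be reproduced with a short sketch of a class-balanced random split. This is a minimal illustration under stated assumptions, not the authors' data pipeline: the function name `split_class`, the fixed seed, and the use of plain index lists in place of image files are all hypothetical.

```python
import random

SAMPLES_PER_CLASS = 250   # from Table 1
TEST_PER_CLASS = 63       # from Table 1
VAL_PER_CLASS = 38        # from Table 1
NUM_CLASSES = 10          # digits 0-9


def split_class(indices, rng):
    """Shuffle one class's sample indices and cut them into train/val/test."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    test = shuffled[:TEST_PER_CLASS]
    val = shuffled[TEST_PER_CLASS:TEST_PER_CLASS + VAL_PER_CLASS]
    train = shuffled[TEST_PER_CLASS + VAL_PER_CLASS:]
    return train, val, test


rng = random.Random(0)  # fixed seed, chosen only so the sketch is repeatable
totals = {"train": 0, "val": 0, "test": 0}
for digit in range(NUM_CLASSES):
    train, val, test = split_class(range(SAMPLES_PER_CLASS), rng)
    totals["train"] += len(train)
    totals["val"] += len(val)
    totals["test"] += len(test)

dataset_size = NUM_CLASSES * SAMPLES_PER_CLASS
for name, count in totals.items():
    print(f"{name}: {count} samples ({100 * count / dataset_size:.2f}%)")
```

Splitting each class separately keeps every subset balanced, which matches the equal per-class rows of Table 1 and the 59.60%, 15.20%, and 25.20% shares of the 2500 samples.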
3.2 Proposed Authentication System
The authentication system presented in this article uses CNNs to produce authentication codes from hand gestures showing sign language digits. It consists of two steps: image capture and classification (digit recognition). Figure 2 depicts the system's complete classification task. Before feeding the input image into the deep learning model for digit recognition, the system resizes it to 256 × 256 pixels.
3.3 Deep Learning Models for Hand Gesture Recognition Using MobileNetV2
This section addresses deep learning-based CNNs for hand gesture identification using MobileNetV2. The MobileNetV2 architecture is a state-of-the-art CNN that outperforms other models in applications such as object recognition [18]. The network uses efficient depthwise separable convolutional layers, and its premise is to encode intermediate inputs and outputs efficiently across bottleneck layers. We train the MobileNetV2 model in two stages using two transfer learning methods: feature extraction followed by fine-tuning. Both are briefly described below.
3.3.1 Feature Extraction
This method uses a MobileNetV2 model pre-trained on the ImageNet dataset. The classification layer of the pre-trained model is not included, since the ImageNet dataset has more classes than the Sign Language Digits Dataset we use. The base model consists of 53 pre-trained layers of MobileNetV2. Its learned features form a four-dimensional tensor of size [None, 8, 8, 1280]. A 2D global average pooling layer [19] reduces the base model output to a two-dimensional matrix of size [None, 1280]. We then add a dense layer, which is the classification layer for our dataset. In the feature extraction approach, we do not train the base model (MobileNetV2 without its final layer) but use it to extract features from the input sample and feed them into the dense layer. Training uses the RMSprop optimizer, a gradient-based optimization method for neural networks [20].
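The shape change from [None, 8, 8, 1280] to [None, 1280] and the size of the added classification layer can be made concrete with a framework-free sketch. It is an illustration only: the function names are hypothetical, the tiny 2 × 2 × 3 feature map stands in for the real 8 × 8 × 1280 one, and the parameter count assumes a dense layer with one bias per output, which the text does not state.

```python
def global_average_pool(feature_map):
    """Average an [H][W][C] feature map over H and W, giving C values."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                pooled[c] += value
    return [total / (height * width) for total in pooled]


def dense_parameter_count(inputs, outputs):
    """Weights plus biases of a fully connected layer (bias term assumed)."""
    return inputs * outputs + outputs


# Tiny 2 x 2 x 3 feature map standing in for one 8 x 8 x 1280 sample.
toy_map = [[[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
           [[5.0, 0.0, 1.0], [3.0, 4.0, 3.0]]]
print(global_average_pool(toy_map))     # one averaged value per channel
print(dense_parameter_count(1280, 10))  # trainable parameters in the new head
```

Under these assumptions, feature extraction trains only the 1280 × 10 weights and 10 biases of the new head, while the frozen base model contributes no trainable parameters.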
3.3.2 Fine-Tuning
In this method, the MobileNetV2 base model (minus its classification layer), pre-trained on the ImageNet dataset, is fine-tuned together with an additional dense layer (the classification layer for our dataset). We train 53 layers, including the last dense layer, on the Sign Language Digits Dataset. Compared with feature extraction, this method has more trainable parameters. The RMSprop optimizer was again used to train the MobileNetV2 model [20].
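For readers unfamiliar with RMSprop, the update rule can be sketched on a single scalar weight. This is the commonly stated form of the algorithm, written here as an illustration; the learning rate, decay rate, and toy objective are assumptions for the example and are not the hyperparameters used in the study.

```python
import math


def rmsprop_step(weight, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-7):
    """One RMSprop update for a single scalar weight.

    avg_sq is the running average of squared gradients; dividing by its
    square root rescales the step so that it is roughly of size lr.
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    weight = weight - lr * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
weight, avg_sq = 0.0, 0.0
for _ in range(2000):
    grad = 2.0 * (weight - 3.0)
    weight, avg_sq = rmsprop_step(weight, grad, avg_sq)
print(weight)  # ends close to the minimum at 3
```

In fine-tuning, the same per-parameter rule would be applied to every trainable weight in the network, which is why the number of trainable parameters matters for training cost.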
3.4 Deployment on Edge Computing Device
The Raspberry Pi 3 Model B+ is used as the edge computing device. It is a single-board computer with a faster CPU and better connectivity than the Raspberry Pi 3 Model B. We used a Raspberry Pi Camera V2 module to capture images in real time. The trained model predicts hand gestures on the Raspberry Pi 3. Running the proposed and MobileNetV2 models in their original form introduces prediction delay, so TensorFlow Lite (TFL) versions of these models are created and deployed to address the latency problem. During real-time testing, the camera images are the system's input. Before being sent to the deep learning model, they are downscaled to 256 × 256 pixels. Images with a resolution below 256 × 256 were distorted during real-time prediction. Figure 3 depicts the complete authentication process. The counter variable 'i' keeps track of the number of iterations, and the loop runs 'n' times, where n is the authentication code length. The loop has two fundamental stages. The system first reads the live camera feed from the Pi Camera and saves it in the frame buffer. The input image (the first frame in the buffer) is then scaled to 256 × 256 pixels and classified, and the predicted digit class is displayed on the screen. After 2 s, during which the user changes the digit sign, the frame buffer is cleared and the cycle repeats. This pause for changing the sign can be customized to meet specific application requirements. When the loop finishes, the authentication code is displayed and printed for verification.

Fig. 3 Process flow for generating authentication codes (n denotes the length of the code; here n = 5)
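The loop of Fig. 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed code: `generate_code`, `verify_code`, `capture_frames`, and `classify` are hypothetical names, the camera and model are replaced by stand-in functions, `classify` is assumed to include the resize to 256 × 256 pixels, and the constant-time comparison with `hmac.compare_digest` is an addition for the example rather than something the chapter describes.

```python
import hmac
import time


def generate_code(capture_frames, classify, n=5, pause_s=2.0, sleep=time.sleep):
    """Build an n-digit code: one classified frame per loop iteration."""
    digits = []
    for i in range(n):                          # i counts the iterations
        frame_buffer = list(capture_frames())   # stage 1: fill the frame buffer
        digit = classify(frame_buffer[0])       # stage 2: classify the first frame
        digits.append(str(digit))
        sleep(pause_s)                          # the user changes the hand sign
        frame_buffer.clear()                    # empty the buffer before the next sign
    return "".join(digits)


def verify_code(entered, expected):
    """Compare the generated code with the stored one in constant time."""
    return hmac.compare_digest(entered.encode(), expected.encode())


# Stand-ins for the Pi Camera and the TensorFlow Lite model (both hypothetical).
signs = iter([4, 0, 9, 2, 7])
fake_capture = lambda: [next(signs)] * 3   # three identical "frames" per sign
fake_classify = lambda frame: frame        # pretend the model is always right
code = generate_code(fake_capture, fake_classify, sleep=lambda s: None)
print(code, verify_code(code, "40927"))
```

Passing the camera, the classifier, and the sleep function in as arguments keeps the loop testable without hardware; on the device they would be replaced by the real capture and inference calls, and `pause_s` corresponds to the customizable sign changeover pause.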
4 Results
Figure 4 shows a few examples of labels predicted by the proposed model on the test data, and Fig. 5 shows real-time predictions of the proposed system. The model predicts the authentication code accurately under both uniform and non-uniform illumination.
4.1 Discussion
The findings show that authentication using hand gestures and deep learning on a Raspberry Pi is feasible. The system can generate an authentication PIN without the user touching a keyboard. Because ATMs have enclosures on both sides, the digit signs (hand gestures) are hidden from view, and the entire authentication setup is low cost. As demonstrated in Fig. 5, code generation is robust to illumination conditions and achieves excellent accuracy under both uniform and non-uniform lighting. The Raspberry Pi 3 Model B+, used in this study as the edge computing device, can also offer different outputs for entering the security code.
Fig. 4 Predictions of the proposed CNN model on samples taken from the dataset [16]
Fig. 5 Real-time prediction of five digits under a variety of natural lighting conditions (both uniform and non-uniform illumination), shown with the captured sign images. Each prediction is processed on the edge computing device. The figure shows the final code produced by the proposed CNN model
4.2 Limitations
The dataset used provides only a limited number of samples for each class (0–9) of hand gesture signs. Nevertheless, the proposed deep learning model's performance is promising. The dataset may be extended in future with new classes. The model's performance may be somewhat reduced by motion artefacts that occur during image capture. It may also be adversely affected by the camera's restricted field of view and the practical placement of the hands in front of the camera.
5 Conclusions
This study designed a comprehensive system that uses sign language digit hand gestures to authenticate users in public and commercial locations, including ATMs, information desks, railways, and shopping malls. A convolutional neural network (CNN) generates an authentication code from camera input, making the system genuinely contactless. All deep learning model inference was performed on a Raspberry Pi 3 Model B+ with a connected camera, making the system suitable for large-scale deployment. The proposed CNN obtained 97.20% accuracy on the test dataset. The system operates in real time with a model inference time of 280 ms per image frame and may replace traditional touchpad and keypad authentication techniques. In future, the dataset could be expanded to include classes such as 'accept', 'close', 'home', 'ok', and 'go back' to further reduce the need for surface interaction in these secure systems. Deep learning techniques may also be applied to a variety of other applications, including water quality calculation, medical illness prediction, and instructional computing [20–25]. The authors plan future studies in these fields.
References
1. Oudah M, Al-Naji A, Chahl J (2021) Elderly care based on hand gestures using Kinect sensor. Computers 10:1–25
2. Zhou R, Zhong D, Han J (2013) Fingerprint identification using SIFT-based minutia descriptors and improved all descriptor-pair matching. Sensors (Basel) 13(3):3142–3156. https://doi.org/10.3390/s130303142
3. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J et al (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377. https://doi.org/10.1016/j.patcog.2017.10.013
4. Zulfiqar M, Syed F, Khan M, Khurshid K (2019) Deep face recognition for biometric authentication. In: International conference on electrical, communication, and computer engineering (ICECCE), Swat, Pakistan, pp 24–25. https://doi.org/10.1109/ICECCE47252.2019.8940725
5. Aizat K, Mohamed O, Orken M, Ainur A, Zhumazhanov B (2020) Identification and authentication of user voice using DNN features and i-vector. Cogent Eng 7:1751557
6. Foret P, Kleiner A, Mobahi H, Neyshabur B (2020) Sharpness-aware minimization for efficiently improving generalization. arXiv:2010.01412 [cs.LG]
7. Deng BL, Li G, Han S, Shi L, Xie Y (2020) Model compression and hardware acceleration for neural networks: a comprehensive survey. Proc IEEE 108:485–532
8. Anwar A, Basuki A, Sigit R (2020) Hand gesture recognition for Indonesian sign language interpreter system with Myo armband using support vector machine. Klik - Kumpul J Ilmu Komput 7:164
9. Oudah M, Al-Naji A, Chahl J (2020) Hand gesture recognition based on computer vision: a review of techniques. J Imaging 6
10. Kim M, Cho J, Lee S, Jung Y (2019) IMU sensor-based hand gesture recognition for human-machine interfaces. Sensors (Switzerland) 19:1–13
11. Goh JEE, Goh MLI, Estrada JS, Lindog NC, Tabulog JCM, Talavera NEC (2017) Presentation-aid armband with IMU, EMG sensor and bluetooth for free-hand writing and hand gesture recognition. Int J Comput Sci Res 1:65–77
12. Dayal A, Paluru N, Cenkeramaddi LR, Soumya J, Yalavarthy PK (2021) Design and implementation of deep learning based contactless authentication system using hand gestures. Electron 10:1–15
13. López LIB et al (2020) An energy-based method for orientation correction of EMG bracelet sensors in hand gesture recognition systems. Sensors (Switzerland) 20:1–34
14. Benitez-Garcia G et al (2021) Improving real-time hand gesture recognition with semantic segmentation. Sensors (Switzerland) 21:1–16
15. Adithya V, Rajesh R (2020) Hand gestures for emergency situations: a video dataset based on words from Indian sign language. Data Brief 31:106016
16. Mavi A (2020) A new dataset and proposed convolutional neural network architecture for classification of American sign language digits. arXiv:2011.08927 [cs.CV]. https://github.com/ardamavi/Sign-Language-Digits-Dataset
17. Dengwen Z (2010) An edge-directed bicubic interpolation algorithm. In: 3rd International congress on image and signal processing, pp 1186–1189. https://doi.org/10.1109/CISP.2010.5647190
18. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 18–22 June 2018, pp 4510–4520
19. Lin M, Chen Q, Yan S (2013) Network in network. arXiv:1312.4400 [cs.CV]
20. Blessing NRW, Benedict S (2017) Computing principles in formulating water quality informatics and indexing methods: an ample review. J Comput Theor Nanosci 14(4):1671–1681
21. Sangeetha SB, Blessing NRW, Yuvaraj N, Sneha JA (2020) Improved training pattern in back propagation neural networks using Holt-Winters' seasonal method and gradient boosting model. Appl Mach Learn. ISBN 978-981-15-3356-3, Springer, pp 189–198
22. Blessing NRW, Benedict S (2014) Extensive survey on software tools and systems destined for water quality. Int J Appl Eng Res 9(22):12991–13008
23. Blessing NRW, Benedict S (2016) Aquascopev 1: a water quality analysis software for computing water data using aquascope quality indexing (AQI) scheme. Asian J Inf Technol 15(16):2897–2907
24. Haidar SW, Blessing NRW, Singh SP, Johri P, Subitha GS (2018) EEapp: an effectual application for mobile based student centered learning system. In: The 4th international conference on computing, communication & automation (ICCCA 2018), December 14–15, India. IEEE, pp 1–4
25. Pyingkodi M, Blessing NRW, Shanthi S, Mahalakshmi R, Gowthami M (2020) Performance evaluation of machine learning algorithm for lung cancer. In: International conference on artificial intelligence & smart computing (ICAISC-2020), Bannari Amman Institute of Technology, Erode, India, Springer, October 15–17, 92
Author Index
A Abdullah, S. K., 129 Abirami, A., 491 Adhiya, Krishnakant P., 953 Agarwal, Ritu, 251 Ahamed, A. Fayaz, 49 Ahmed, Md Apu, 345 Aishwarya, A., 529 Alam, Sanim, 345 Ambika, G. N., 453 Anand, Garima, 709 Anithaashri, T. P., 93 Anudeep, D. S. V. N. S. S., 825 Anu Keerthika, M. S., 373 Anusha, C., 529 Aparna, R., 801 Arefin, Md. Rashedul, 345 Arivarasan, S., 463 Arthi, R., 321 Arun Raj Kumar, P., 599 Aski, Vidyadhar Jinnappa, 991 Athira, M., 359
B Babna, K., 1025 Baby Shalini, V., 373 Balaji, A. Siva, 825 Banerjee, Archita, 491 Baranidharan, V., 265 Barik, Kousik, 491 Bedekar, Gayatri, 441 Benchaib, Imane, 671 Bhagat, Kavita, 1007 Bhatia, Sajal, 785
Bhavani, Y., 281 Bhingarkar, Sukhada, 731 Bindu, R., 477 Blessing, N. R. Wilfred, 1061 Bodavarapu, Pavan Nageswar Reddy, 1 Bohara, Mohammed Husain, 659, 697, 1039 Bose, Soumalya, 627 C Chandana, C. L., 801 Chandrika, C. P., 683 Channavar, Manjunath, 837 Charan, M., 905 Chhabra, Gunjan, 385 D Das, Saptarshi, 491 deRito, Christopher, 785 Deshpande, Girish R., 837 Dhaka, Vijaypal Singh, 991 Dhanushmathi, S. K., 75 Dharaniga, M., 75 Dharsheeni, R., 75 Dinesh Kumar, J. R., 75 Duvvuru, Jahnavi, 331 E Ebenezer, V., 541 F Ferdous, Tafannum, 345
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9
Firke, Shital N., 193
G Gadige, Sunil Prasad, 643 Gautam, Binav, 709 Gayathri, K., 743 George, Joseph, 777 Ghoumid, Kamal, 671 Gopinath, Nimitha, 359 Goudar, R. H., 441
H Habib, Md. Rawshan, 345 Halsana, Aanzil Akram, 411 Haritha, K. S., 173, 359, 1025 Hasan, S. K. Rohit, 129 Hegde, Suchetha G., 801 Hossain, Sk Md Mosaddek, 411
I Indira, D. N. V. S. L. S., 519 Indrani, B., 139 Indumathi, P., 905 Isaac Samson, S., 265 Islam, Saiful, 345 Iswarya, M., 373
J Jacob, T. Prem, 905 Jain, Amruta, 103 Jain, Ayur, 385 Jain, Ranjan Bala, 193 Jayalakshmi, V., 17 Jayashree, H. N., 801 Jeyakumar, M. K., 777 Jeyanthi, D. V., 139 Joshi, Brijendra Kumar, 399 Joshi, Namra, 885
K Kalaivani, K. S., 425, 1053 Kaliraj, S., 207 Kallimani, Jagadish S., 683 Kalpana, B. Khandale, 921 Kamakshi, P., 281 KanimozhiSelvi, C. S., 425, 1053 Kanisha, B., 207 Kapse, Avinash S., 895 Karthik, N., 541
Kaushik, Sunil, 385 Kavya Sri, E., 281 Keerthana, M., 49 Keerthivasan, S. N., 265 Kejriwal, Shubham, 505 Kitawat, Parth, 505 Konar, Karabi, 491 Krishnaveni, S., 219, 291, 321 Kumar, Badri Deva, 331 Kumar, Jalesh, 965 Kumar, Sunil, 991 Kurhade, Swapnali, 505
L Lakshmi, K. S. Vijaya, 825 Lalithadevi, B., 219 Lamrani, Yousra, 671 Laxmi, M., 565 Lokhande, Unik, 115
M Mahender, C. Namrata, 921 Mandara, S., 309 Mane, Sunil, 103 Manjunathachari, K., 643 Manohar, N., 309 Meghana, B., 529 Miloud Ar-Reyouchi, El, 671 Mohammed, S. Jassem, 33 Mohanraj, V., 943 Mollah, Ayatullah Faruk, 129 Monika, 709 Motiani, Juhie, 1039
N Nair, Arun T., 173, 239, 359, 585, 1025 Nallavan, G., 855 Namboothiri, Kesavan, 173, 359 Namritha, M., 425 Nandhini, R., 1053 Narayan, T. Ashish, 1 Naveen, B., 837 Naveen Kumar, S., 265 Nayak, Amit, 697 Nikhil, Chalasani, 331 Nithya, N., 855 Nivedhashri, S., 1053 Niveetha, S. K., 425
P Palarimath, Roopa Devi, 1061 Palarimath, Suresh, 1061 Pandian, R., 905 Pandit, Ankush, 627 Parashar, Anubha, 991 Patel, Anjali, 1039 Patel, Dhruti, 1039 Patel, Nehal, 613 Patel, Ritik H., 613 Patel, Rutvik, 613 Patel, Sandip, 613 Patel, Vivek, 659 Patel, Yashashree, 697 Patil, Rudragoud, 441 Patil, Shweta, 953 Pavithra, K., 425 Phaneendra, H. D., 477 Prajeeth, Ashwin, 709 Prakash, S., 463, 565 Prathik, A., 541 Prathiksha, R., 49 Pravin, A., 905 Priyadharshini, N., 1053 Priyadharsini, K., 75 Priya, D. Mohana, 49 Punjabi, Zeel, 659 Purwar, Ravindra Kumar, 975 Pyingkodi, M., 1061
R Radhika, N., 33 Rahul, 709 Rajakumar, R., 905 Rajaram, Kanchana, 759 Rajendran, P. Selvi, 93 Raju, C., 265 Ramya Sri, S., 373 Rathod, Kishansinh, 659 Ravichandran, G., 93 Rekha, K. S., 477 Remya Krishnan, P., 599 Ruchika, 975 Rupesh, M., 529
S Saadhavi, S., 477 Sadhana, S. Ram., 477 Sai Siddharth, P., 869 Santhosh Kumar, S., 159 Santhosh, T. C., 837
Sapra, Shruti J., 895 Saravanan, S., 869 Saritha Haridas, A., 173 Sasirekha, R., 207 Selvakumar, S., 759 Sen, Anindya, 627 Senthilkumar, J., 943 Shadiya Febin, E. P., 239 Shah, Jaini, 115 Shah, Panth, 697 Shah, Rashi, 115 Shamna, P. A., 585 Sharma, Pankaj Kumar, 759 Shawmee, Tahsina Tashrif, 345 Shidaganti, Ganeshayya, 565 Shivamurthy, G., 565 Shukla, Mukul, 399 Sikha, O. K., 59 Sindhu Sai, Y., 281 Singh, Aakash, 505 Singh, Bhupinder, 251 Singh, Manoj Kumar, 643 Singh, Rishi Raj, 385 Sivamohan, S., 291 Sobhana, M., 331 Sriabirami, V., 855 Sridhar, Gopisetti, 331 Sridhar, S. S., 291 Srihari, K., 59 Srilalitha, N. S., 477 Srinivas, P. V. V. S., 1 Sujatha, T., 1061 Suman, S., 965 Surendran, S., 463 Suresh Babu, Ch., 519 Suresh, Yeresime, 453, 943 Suri, Ashish, 1007 Swarup Kumar, J. N. V. R., 519 Swathi Chandana, J., 529
T Talegaon, Naveen S., 837 Tanvir, Md Shahnewaz, 345 Tanwar, Priya, 115 Tayyab, Md., 825 Tergundi, Parimal, 441 Thakral, Manish, 385 Thakur, Shruti A., 895 Thangavelu, S., 743 Thiyagu, T., 321 Tulasi Ratnakar, P., 869
U Uday Vishal, N., 869 Ugalde, Bernard H., 1061 Uma, M., 553
V Vaishali, P. Kadam, 921 Venkateswara Rao, Ch., 519 Verma, Shailesh, 975
Vignan, N. Gunadeep, 825 Vijayalakshmi, K., 17 Vijetha, N., 801 Vinodhini, M., 541 Viswanadham, Y. K., 519 Viswasom, Sanoj, 159
Y Yohapriyaa, M., 553