English Pages 889 [855] Year 2022
Lecture Notes in Networks and Systems 436
V. Suma Zubair Baig Selvanayaki Kolandapalayam Shanmugam Pascal Lorenz Editors
Inventive Systems and Control Proceedings of ICISC 2022
Lecture Notes in Networks and Systems Volume 436
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure, which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose.
More information about this series at https://link.springer.com/bookseries/15179
Editors
V. Suma, Department of Computer Science and Design, Dayananda Sagar College of Engineering, Bengaluru, India
Zubair Baig, School of Information Technology, Deakin University, Geelong, VIC, Australia
Selvanayaki Kolandapalayam Shanmugam, Department of Mathematics and Computer Science, Ashland University, Wilmington, NC, USA; Department of Mathematics and Computer Science, Concordia University Chicago, River Forest, IL, USA
Pascal Lorenz, University of Haute Alsace, Alsace, France
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-1011-1 ISBN 978-981-19-1012-8 (eBook) https://doi.org/10.1007/978-981-19-1012-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honoured to dedicate the proceedings to all the participants, organizers, technical program chairs, technical program committee members, and editors of ICISC 2022.
Foreword
On behalf of the conference committee of ICISC 2022, it is my pleasure to write a foreword to the 6th International Conference on Inventive Systems and Control [ICISC 2022], organized by the JCT College of Engineering and Technology on 6–7 January 2022. ICISC continues the research tradition that has been established over the past four years, bringing together the scientific and research community interested in sharing research ideas and developments along with technological breakthroughs in the areas of computing, communication, and control. The main aim of ICISC 2022 is to promote discussions about recent research achievements and trends by covering a wide range of topics, from data communication and computing networks to control technologies, that encompass all the relevant topics in design, modelling, simulation, and analysis. It acts as a platform to encourage the exchange of technological and scientific information. The proceedings of ICISC 2022 include a total of 61 papers. The submissions originated from various countries, validating the international nature of the conference. The quality of papers accepted for presentation at the conference and publication in the proceedings is guaranteed by the Paper Review Board and Technical Program Committee members. ICISC constantly encourages research scholars’ participation and their interaction with leading professional experts and scientists from research institutes, universities, and industries. The technical sessions of ICISC 2022 complement a special keynote session from renowned research experts.
I thank all the authors, technical program committee members, review board members, and session chairs for their research contributions and effort to support the 6th edition of ICISC. I hope that the conference programme provides you with valuable research insights and will further stimulate research in the fields of computing, communication, and control.

Conference Chair
Dr. K. Geetha
Dean, Academics and Research
JCT College of Engineering and Technology
Coimbatore, India
Preface
We are pleased to introduce you to the proceedings of the 6th International Conference on Inventive Systems and Control [ICISC 2022], successfully held on 6–7 January 2022. ICISC 2022 gathered research experts, scholars, and industrialists from across the globe to disseminate and explore recent advanced research in the field of intelligent systems and their control, and to establish an innovative academic exchange among computing and communication researchers. The geographically distributed ICISC 2022 committee consists of various experts, reviewers, and authors from the areas of computing, communication, and control in different parts of the world. With its professional and significant research influence, ICISC was honored to invite three renowned research experts as keynote speakers. We are glad that, out of 262 submissions, 61 submissions of very high quality were selected. The selected submissions were compiled into proceedings after rigorous review of each manuscript. Moreover, the committee ensured that every paper went through the peer review process to meet international research publication standards. We wish to extend our gratitude to the organizing committee members, distinguished keynote speakers, internal/external reviewers, and all the authors for their continued support towards the conference. We are also very thankful to Springer publications for publishing these proceedings. Readers will benefit greatly from the state-of-the-art research knowledge in the ICISC 2022 proceedings. We also expect the same overwhelming response from scholars and experts across the globe to the international conference that will be organized in the upcoming years.

Guest Editors, ICISC 2022
Dr. V. Suma, Bengaluru, India
Dr. Selvanayaki Kolandapalayam Shanmugam, Wilmington/River Forest, USA
Dr. Pascal Lorenz, Alsace, France
Acknowledgments
We, the ICISC 2022 team, extend our gratitude to all who made this 6th edition of the conference possible amidst the pandemic situation across the globe. Moreover, we are immensely pleased to thank the institution JCT College of Engineering and Technology, Tamil Nadu, India, for its support and timely assistance during the conference event. The Guest Editors are very grateful to all the internal/external reviewers, faculties, and committee members for delivering extended support to the authors and participants, who dedicated their research efforts to submitting high-quality manuscripts to the conference. Furthermore, we particularly acknowledge the efforts made by Dr. K. Geetha, who lent immense support at all stages of the conference event, from submission and review to publication. We are also grateful to the conference keynote speaker who delivered valuable research insights and expertise to assist the future research directions of the conference attendees. We also extend our gratitude to the advisory and review committee members for their valuable suggestions and timely reviews, which improved the quality of the manuscripts submitted to ICISC 2022. Further, we extend our appreciation to all the authors who contributed their research work to enhance the publication quality of the 6th ICISC 2022. We would also like to thank all the session chairs and organizing committees for their tireless and continuous efforts towards this successful conference event. Finally, we are pleased to thank Springer Publications for their guidance throughout this publication process.
Contents
Wind Turbine Alarm Management with Artificial Neural Networks
Isaac Segovia, Pedro José Bernalte, and Fausto Pedro García Márquez

Convolutional Neural Networks as a Quality Control in 4.0 Industry for Screws and Nuts
Diego Ortega Sanz, Carlos Quiterio Gómez Muñoz, and Fausto Pedro García Márquez

Proposal of an Efficient Encryption Algorithm for Securing Communicated Information
Jouri Alharbi, Shatha Alluhaybi, Rahaf Alsaiari, and Raja Aljalouji

Discernment of Unsolicited Internet Spamdexing Using Graph Theory
Apeksha Kulkarni, Devansh Solani, Preet Sanghavi, Achyuth Kunchapu, M. Vijayalakshmi, and Sindhu Nair

Unique Authentication System in End-To-End Encrypted Chat Application
S. Pandey, T. Rajput, R. Parikh, A. Gandhi, and K. Rane

Vaccination Reminder System
Vemulakonda Gayatri, Sunkollu Sai Chandu, Sreya Venigalla, Raavi Dinesh Kumar Reddy, and J. Ebenezer

A 60 GHz CMOS VCO Adapting Switchable High Q Inductors
S. K. Hariesh, M. Reshma, O. K. C. Sivakumar Bapu, U. Vijay Gokul, and Karthigha Balamurugan

A Comprehensive Survey on Predicting Dyslexia and ADHD Using Machine Learning Approaches
Pavan Kumar Varma Kothapalli, V. Rathikarani, and Gopala Krishna Murthy Nookala

A Decentralised Blockchain-Based Secure Authentication Scheme for IoT Devices
Effra Altaf Haqani, Zubair Baig, and Frank Jiang

Prediction of Type II Diabetes Using Machine Learning Approaches
Tasmiah Rahman, Anamika Azad, and Sheikh Abujar

Authentication and Security Aspect of Information Privacy Using Anti-forensic Audio–Video Embedding Technique
Sunil K. Moon

Mobile Edge Computing: A Comprehensive Analysis on Computation Offloading Techniques and Modeling Schemes
I. Bildass Santhosam, Immanuel Johnraja Jebadurai, Getzi Jeba Leelipushpam Paulraj, Jebaveerasingh Jebadurai, and Martin Victor

Enhanced Learning Outcomes by Interactive Video Content—H5P in Moodle LMS
S. Rama Devi, T. Subetha, S. L. Aruna Rao, and Mahesh Kumar Morampudi

Anti-cancer Drug Response Prediction System Using Stacked Ensemble Approach
P. Selvi Rajendran and K. R. Kartheeswari

Smart Agent Framework for Color Selection of Wall Paintings
Mallikarjuna Rao Gundavarapu, Abhinav Bachu, Sai Sashank Tadivaka, G. Saketh Koundinya, and Sneha Nimmala

Brain MR Image Segmentation for Tumor Identification Using Hybrid of FCM Clustering and ResNet
P. S. Lathashree, Prajna Puthran, B. N. Yashaswini, and Nagamma Patil

A Review on Automated Algorithms Used for Osteoporosis Diagnosis
Gautam Amiya, Kottaimalai Ramaraj, Pallikonda Rajasekaran Murugan, Vishnuvarthanan Govindaraj, Muneeswaran Vasudevan, and Arunprasath Thiyagarajan

Supervised Deep Learning Approach for Generating Dynamic Summary of the Video
Mohammed Inayathulla and C. Karthikeyan

Design and Implementation of Voice Command-Based Robotic System
Md. Rawshan Habib, Kumar Sunny, Abhishek Vadher, Arifur Rahaman, Asif Noor Tushar, Md Mossihur Rahman, Md. Rashedul Arefin, and Md Apu Ahmed

Study on Advanced Image Processing Techniques for Remote Sensor Data Analysis
Md. Rawshan Habib, Abhishek Vadher, Fahim Reza Anik, Md Shahnewaz Tanvir, Md Mossihur Rahman, Md Mahmudul Hasan, Md. Rashedul Arefin, Md Apu Ahmed, and A. M. Rubayet Hossain

Implementation and Performance Evaluation of Binary to Gray Code Converter Using Quantum Dot Cellular Automata
Uttkarsh Sharma, K. Pradeep, N. Samanvita, and Sowmya Raman

QoS-Based Classical Trust Management System for the Evaluation of the Trustworthiness of a Cloud Resource
P. Kumar, S. Vinodh Kumar, and L. Priya

A Novel Approach for Early Intervention of Retinal Disorders Using Machine Learning Techniques
P. B. Dhanusha, A. Muthukumar, and A. Lakshmi

Deep Learning-Based Efficient Detection of COVID-19
Abdul Razim and Mohd Azhan Umar Kamil

Handwritten Character Recognition for Tamil Language Using Convolutional Recurrent Neural Network
S. Vijayalakshmi, K. R. Kavitha, B. Saravanan, R. Ajaybaskar, and M. Makesh

Liver Tumor Detection Using CNN
S. Vijayalakshmi, K. R. Kavitha, M. Tamilarasi, and R. Soundharya

Vehicle Spotting in Nighttime Using Gamma Correction
Shaik Hafeez Shaheed, Rajanala Sudheer, Kavuri Rohit, Deepika Tinnavalli, and Shahana Bano

Application of UPFC for Enhancement of Voltage Stability of Wind-Based DG System
Namra Joshi and Pragya Nema

Video Keyframe Extraction Based on Human Motion Detection
C. Victoria Priscilla and D. Rajeshwari

Designing a Secure Smart Healthcare System with Blockchain
Neelam Chauhan and Rajendra Kumar Dwivedi

Vehicle Classification and Counting from Surveillance Camera Using Computer Vision
A. Ayub khan, R. S. Sabeenian, A. S. Janani, and P. Akash

Classification of Indian Monument Architecture Styles Using Bi-Level Hybrid Learning Techniques
Srinivasan Kavitha, S. Mohanavalli, B. Bharathi, C. H. Rahul, S. Shailesh, and K. Preethi

Sign Language Recognition Using CNN and CGAN
Marrivada Gopala Krishna Sai Charan, S. S. Poorna, K. Anuraj, Choragudi Sai Praneeth, P. G. Sai Sumanth, Chekka Venkata Sai Phaneendra Gupta, and Kota Srikar

A Systematic Review on Load Balancing Tools and Techniques in Cloud Computing
Mohammad Haris and Rafiqul Zaman Khan

TDMA-Based Adaptive Multicasting in Wireless NoC
Smriti Srivastava, Adithi Viswanath, Krithika Venkatesh, and Minal Moharir

Barriers to the Widespread Adoption of Processing-in-Memory Architectures
B. Mohammed Siyad and R. Mohan

A Deep Multi-scale Feature Fusion Approach for Early Recognition of Jute Diseases and Pests
Rashidul Hasan Hridoy, Tanjina Yeasmin, and Md. Mahfuzullah

Assessment of Cardiovascular System Through Cardiovascular Autonomic Reflex Test
E. S. Selva Priya, L. Suganthi, R. Anandha Praba, and R. Jeyashree

Detection of Glaucoma Using HMM Segmentation and Random Forest Classification
Chevula Maheswari, Gurukumar Lokku, and K. Nagi Reddy

A UAV-Based Ad-Hoc Network for Real-Time Assessment and Enforcement of Vehicle Standards in Smart Cities
Vijay A. Kanade

ARTFDS–Advanced Railway Track Fault Detection System Using Machine Learning
Lakshmisudha Kondaka, Adwait Rangnekar, Akshay Shetty, and Yash Zawar

Recommendation System for Agriculture Using Machine Learning and Deep Learning
K. SuriyaKrishnaan, L. Charan Kumar, and R. Vignesh

A Machine Learning Approach for Daily Temperature Prediction Using Big Data
Usha Divakarla, K. Chandrasekaran, K. Hemanth Kumar Reddy, R. Vikram Reddy, and Manjula Gururaj

Keyword Error Detection on Product Title Data Using Approximate Retrieval and Word2Vec
Duc-Hong Pham

AI-Based Stress State Classification Using an Ensemble Model-Based SVM Classifier
Dongkoo Shon, Kichang Im, and Jong-Myon Kim

Designing a Secure Vehicular Internet of Things (IoT) Using Blockchain
Atul Lal Shrivastava and Rajendra Kumar Dwivedi

Survey on Machine Learning Algorithm for Leaf Disease Detection Using Image Processing Techniques
A. Dinesh, M. Maragatharajan, and S. P. Balakannan

Deep Neural Networks and Black Widow Optimization for VANETS
Shazia Sulthana and B. N. Manjunatha Reddy

An Efficient Automated Intrusion Detection System Using Hybrid Decision Tree
B. S. Amrutha, I. Meghana, R. Tejas, Hrishikesh Vasant Pilare, and D. Annapurna

A Medical Image Enhancement to Denoise Poisson Noises Using Neural Network and Autoencoders
V. Sudha, K. Kalyanasundaram, R. C. S. Abishek, and R. Raja

Hand Gesture Recognition Using 3D CNN and Computer Interfacing
Hammad Mansoor, Nidhi Kalra, Piyush Goyal, Muskan Bansal, and Namit Wadhwa

Analysis of Prediction Accuracies for Memory Based and Model-Based Collaborative Filtering Models
C. K. Raghavendra and K. C. Srikantaiah

Emotion Recognition Using Speech Based Tess and Crema Algorithm
P. Chitra, A. Indumathi, B. Rajasekaran, and M. Muni Babu

Wireless Data Transferring of Soldier Health Monitoring and Position Tracking System Using Arduino
K. SuriyaKrishnaan, Gali Mahendra, D. Sankar, and K. S. Yamuna

Analysing and Identifying COVID-19 Risk Factors Using Machine Learning Algorithm with Smartphone Application
Shah Siddiqui, Elias Hossain, S. M. Asaduzzaman, Sabila Al Jannat, Ta-seen Niloy, Wahidur Rahman, Shamsul Masum, Adrian Hopgood, Alice Good, and Alexander Gegov

Twitter Bot Detection Using One-Class Classifier and Topic Analysis
Anupriya Rajkumar, C. Rakesh, M. Kalaivani, and G. Arun

Analysis on Skew Detection and Rectification Techniques for Offline Handwritten Scripts
Krupashankari S. Sandyal and Y. C. Kiran

Building an Intrusion Detection System Using Supervised Machine Learning Classifiers with Feature Selection
Aamir S. Ahanger, Sajad M. Khan, and Faheem Masoodi

Smart Attendance with Real Time Face Recognition
G. Aparna and S. Prasanth Vaidya

Development of Hybrid Algorithms Using Neural Networks for Early Detection of Glaucoma in Humans and Its Hardware Implementation
Mahesh B. Neelagar, K. A. Balaji, T. C. Manjunath, and G. Pavithra

A Smart Irrigation System Using Plant Maintenance Bot
R. Seetharaman, M. Tharun, K. Anandan, and S. S. Sreeja Mole

Author Index
Editors and Contributors
About the Editors

Dr. V. Suma has obtained her B.E., M.S. and Ph.D. in Computer Science and Engineering. In addition to being Professor and Head of Computer Science and Design, she holds the position of Dean in the Research and Industry Incubation Centre. She is associated with leading software companies such as Wipro, and was previously associated with Infosys and Exilant Technologies, training their software developers in the development of customer-satisfying products. She is also a Consultant and Research Advisor at the National Foundation for Entrepreneurship Development (NFED), an NGO whose objective is to uplift women and downtrodden communities. She has been associated with Hochschule Hof University, Germany, as a mentor for foreign exchange programmes since 2007. She was the Single Point of Contact for the Infosys Campus Connect programme from February 2011 to June 2016. She was also a recognized mentor in ACM Internet, where she mentored an international student from Mississippi State University for the doctoral degree. She has guided several doctoral students from universities such as VTU, JNTU, Bharatiyar and Jain, and serves on the panel of Ph.D. examiners at universities such as Jain, Anna, JNTU, University of Madras, University of North Maharashtra, University of Botswana, University of Johannesburg, Dr. M. G. R. University, Periyar University, Karpagam University and so on. She has successfully guided 11 Ph.D. scholars from various universities so far, and many more are in the pipeline. She is responsible for several MoUs for DSCE with industries such as Aero IT, QSIT, Envids Technologies LLP and Avam Consulting, and with other universities.

Zubair Baig (Senior Member, IEEE) is a Senior Lecturer at the School of Information Technology and Division Lead, IoT and CPS Cyber Security, Strategic Centre for Cyber Security Research and Innovation (CSRI) at Deakin University. He has authored over 95 journal and conference articles and book chapters, as well as 5 white papers. He is also an inventor of 2 Cyber Security Technologies patented by the USPTO. He is currently serving as the editor of 3 international journals: the IET Wireless Sensor Systems Journal, the PSU—A Review Journal, Emerald Publishing House, and the Journal of Information and Telecommunication, Taylor and Francis. Baig has served on numerous technical program committees of international conferences and has delivered over 20 keynote talks on cyber security. His research interests are in the areas of cyber security, artificial intelligence, critical infrastructures, and the Internet of Things. He has a broad skillset for conducting risk assessments for the IoT, critical infrastructures and sensor networks. He has been successfully funded by the Cyber Security Cooperative Research Centre (CSCRC), the Australian Department of Home Affairs, and the Australian Department of Defence to conduct cutting-edge research in the field of cyber security. Baig has been listed in the ‘Top 2% of Scientists in the Stanford University Global Academic Ranking List for 2021’.

Dr. Selvanayaki Kolandapalayam Shanmugam holds a Bachelor’s degree in Mathematics and a Master’s in Computer Applications from Bharathiar University, a Master of Philosophy in Computer Science from Bharathidasan University, and a Ph.D. in Computer Science from Anna University. Since 2002 she has held positions as Teaching Faculty, Research Advisor, and Project Co-ordinator at various reputed academic institutions. She has been associated with the IT industry for more than 5 years in the role of Business Analyst Consultant. Her primary research interests are in the application of computing and information technologies to problems that impact societal benefits.

Pascal Lorenz received his M.Sc. (1990) and Ph.D. (1994) from the University of Nancy, France. Between 1990 and 1995 he was a research engineer at WorldFIP Europe and at Alcatel-Alsthom. He has been a professor at the University of Haute-Alsace, France, since 1995. His research interests include QoS, wireless networks and high-speed networks. He is the author/co-author of 3 books, 3 patents and 200 international publications in refereed journals and conferences. He was Technical Editor of the IEEE Communications Magazine Editorial Board (2000–2006), IEEE Networks Magazine since 2015, IEEE Transactions on Vehicular Technology since 2017, Chair of IEEE ComSoc France (2014–2020), Financial chair of IEEE France (2017–2022), Chair of Vertical Issues in Communication Systems Technical Committee Cluster (2008–2009), Chair of the Communications Systems Integration and Modeling Technical Committee (2003–2009), Chair of the Communications Software Technical Committee (2008–2010) and Chair of the Technical Committee on Information Infrastructure and Networking (2016–2017). He has served as Co-program Chair of IEEE WCNC’2012 and ICC’2004, Executive Vice-Chair of ICC’2017, TPC Vice Chair of Globecom’2018, Panel sessions co-chair for Globecom’16, tutorial chair of VTC’2013 Spring and WCNC’2010, track chair of PIMRC’2012 and WCNC’2014, and symposium Co-chair at Globecom 2007–2011, Globecom’2019, ICC 2008–2010, ICC’2014 and ’2016. He has served as Co-Guest Editor for special issues of IEEE Communications Magazine, Networks Magazine, Wireless Communications Magazine, Telecommunications Systems and LNCS. He is Associate Editor for the International Journal of Communication Systems (IJCS-Wiley), the Journal on Security and Communication Networks (SCN-Wiley), the International Journal of Business Data Communications and Networking, and the Journal of Network and Computer Applications (JNCA-Elsevier). He is a senior member of the IEEE, an IARIA fellow, and a member of many international program committees. He has organized many conferences, chaired several technical sessions and given tutorials at major international conferences. He undertook an IEEE ComSoc Distinguished Lecturer Tour during 2013–2014.
Contributors R. C. S. Abishek Sona College of Technology, Salem, India Sheikh Abujar Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh Aamir S. Ahanger Department of Computer Science, University of Kashmir, Srinagar, India Md Apu Ahmed Chemnitz University of Technology, Chemnitz, Germany R. Ajaybaskar Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India P. Akash Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India Sabila Al Jannat Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh Jouri Alharbi Department of Cyber Security and Forensic Computing, University of Prince Mugrin, Madinah, Saudi Arabia Raja Aljalouji Department of Cyber Security and Forensic Computing, University of Prince Mugrin, Madinah, Saudi Arabia Shatha Alluhaybi Department of Cyber Security and Forensic Computing, University of Prince Mugrin, Madinah, Saudi Arabia Rahaf Alsaiari Department of Cyber Security and Forensic Computing, University of Prince Mugrin, Madinah, Saudi Arabia Gautam Amiya Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India B. S. Amrutha Department of Computer Science and Engineering, PES University, Bengaluru, India K. Anandan PS Division of E&E, TUV SUD South Asia Pvt. Ltd., Chennai, India R. Anandha Praba BioMedical Engineering, SSN College of Engineering, OMR Road, Chennai, Tamilnadu, India
Fahim Reza Anik Ahsanullah University of Science and Technology, Dhaka, Bangladesh D. Annapurna Department of Computer Science and Engineering, PES University, Bengaluru, India K. Anuraj Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India G. Aparna Aditya Engineering College, Surampalem, India G. Arun Solartis Technology Services Pvt. Ltd., Madurai, India S. L. Aruna Rao Department of Information Technology, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India S. M. Asaduzzaman Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh A. Ayub khan Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India Anamika Azad Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh M. Muni Babu Department of CSE, IIIT R. K. Valley, Rajiv Gandhi University of Knowledge Technology (RGUKT), Idupulapaya, Andha Pradesh, India Abhinav Bachu Department of CSE, GRIET, Hyderbad, India Zubair Baig School of IT, Deakin University, Geelong, VIC, Australia K. A. Balaji Electronics and Communication Engineering, Presidency University, Bangalore, India S. P. Balakannan Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil, India Karthigha Balamurugan Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Shahana Bano Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India Muskan Bansal Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India Pedro José Bernalte Ingenium Research Group, University of Castilla-La Mancha, Ciudad Real, Spain B. Bharathi Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
K. Chandrasekaran National Institute of Technology Karnataka, Surathkal, Karnataka, India Sunkollu Sai Chandu Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Marrivada Gopala Krishna Sai Charan Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India Neelam Chauhan Department of Information Technology and Computer Application, MMMUT, Gorakhpur, India P. Chitra Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram, Dindigul, Tamil Nadu, India P. B. Dhanusha Department of ECE, Kalasalingam Academy of Research and Education, Srivilliputhur, Tamilnadu, India A. Dinesh Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil, India Usha Divakarla NMAM Institute of Technology, Nitte, Karkala, Karnataka, India Rajendra Kumar Dwivedi Department of Information Technology and Computer Application, MMMUT, Gorakhpur, India J. Ebenezer Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India A. Gandhi Department of Computer Engineering, K. J. 
Somaiya College of Engineering, Mumbai, India Vemulakonda Gayatri Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Alexander Gegov Faculty of Technology, University of Portsmouth (UoP), Lion Terrace, Portsmouth, UK Alice Good Faculty of Technology, University of Portsmouth (UoP), Lion Terrace, Portsmouth, UK Vishnuvarthanan Govindaraj Department of Biomedical Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India Piyush Goyal Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India Mallikarjuna Rao Gundavarapu Department of CSE, GRIET, Hyderbad, India Chekka Venkata Sai Phaneendra Gupta Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
Manjula Gururaj NMAM Institute of Technology, Nitte, Karkala, Karnataka, India Carlos Quiterio Gómez Muñoz HCTLab Research Group, Electronics and Communications Technology Department, Universidad Autónoma de Madrid, Madrid, Spain Effra Altaf Haqani School of IT, Deakin University, Geelong, VIC, Australia S. K. Hariesh Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Mohammad Haris Aligarh Muslim University, Aligarh, UP, India Adrian Hopgood Faculty of Technology, University of Portsmouth (UoP), Lion Terrace, Portsmouth, UK Elias Hossain Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh Rashidul Hasan Hridoy Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh Kichang Im ICT Convergence Safety Research Center, University of Ulsan, Ulsan, South Korea Mohammed Inayathulla Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India A. Indumathi Department of Computer Applications, Kongunadu Arts and Science College, Coimbatore, India A. S. Janani Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India Immanuel Johnraja Jebadurai Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India Jebaveerasingh Jebadurai Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India R. Jeyashree Electronics and Communication Engineering, Meenakshi College of Engineering, Chennai-78, Tamilnadu, India Frank Jiang School of IT, Deakin University, Geelong, VIC, Australia Namra Joshi SVKM’s Institute of Technology, Dhule, Maharashtra, India M. 
Kalaivani Nippon Telegraph and Telephone Public Corporation, Chennai, India Nidhi Kalra Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
K. Kalyanasundaram Sona College of Technology, Salem, India Mohd Azhan Umar Kamil Department of Computer Engineering, Aligarh Muslim University, Aligarh, India Vijay A. Kanade Pune, India K. R. Kartheeswari Department of Computer Science & Engineering, School of Computing, Hindustan Institute of Technology and Science, Chennai, India C. Karthikeyan Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India K. R. Kavitha Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India Srinivasan Kavitha Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India Rafiqul Zaman Khan Aligarh Muslim University, Aligarh, UP, India Sajad M. Khan Department of Computer Science, University of Kashmir, Srinagar, India Jong-Myon Kim Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan, Korea Y. C. Kiran Global Academy of Technology, Bangalore, India Lakshmisudha Kondaka Department of Information Technology, SIES Graduate School of Technology, Navi Mumbai, India Pavan Kumar Varma Kothapalli Department of Computer Science and Engineering, FEAT, Annamalai University, Chidambaram, India G. Saketh Koundinya SAP Consultant, Bangalore, India Apeksha Kulkarni Vivekanand Education Society’s Institute of Technology, Mumbai, India L. Charan Kumar Sona College of Technology, Salem, Tamil Nadu, India P. Kumar Rajalakshmi Engineering College, Chennai, India Achyuth Kunchapu Vellore Institute of Technology, Vellore, India A. Lakshmi Department of ECE, Ramco Institute of Technology, Rajapalayam, Tamilnadu, India P. S. Lathashree Department of Information Technology, National Institute of Technology, Surathkal, Karnataka, India Gurukumar Lokku J.N.T. University Ananthapuramu, Anantapur, A.P., India Gali Mahendra Sona College of Technology, Salem, Tamil Nadu, India
Chevula Maheswari Department of ECE, N.B.K.R. Institute of Science and Technology, S.P.S.R., Nellore, A.P., India Md. Mahfuzullah Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh Md Mahmudul Hasan South Dakota School of Mines and Technology, Rapid City, USA M. Makesh Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India T. C. Manjunath ECE Department, Dayananda Sagar College of Engineering, Bangalore, India Hammad Mansoor Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India M. Maragatharajan School of Computing Science and Engineering, VIT Bhopal University, Bhopal, India Faheem Masoodi Department of Computer Science, University of Kashmir, Srinagar, India Shamsul Masum Faculty of Technology, University of Portsmouth (UoP), Lion Terrace, Portsmouth, UK I. Meghana Department of Computer Science and Engineering, PES University, Bengaluru, India B. Mohammed Siyad Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India R. Mohan Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India S. Mohanavalli Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India Minal Moharir Department of CSE, RVCE, Bengaluru, India S. S. Sreeja Mole Department of ECE, Christu Jyothi Institute of Technology & Science, Jangaon, India Sunil K. Moon Department of Electronics and Telecommunication, SCTRs Pune Institute of Computer Technology, (PICT), Pune, India Mahesh Kumar Morampudi Department of Information Technology, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India; Department of Computer Science and Engineering, SRM University AP, Andhra Pradesh, India
Pallikonda Rajasekaran Murugan Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India A. Muthukumar Department of ECE, Kalasalingam Academy of Research and Education, Srivilliputhur, Tamilnadu, India Fausto Pedro García Márquez Ingenium Research Group, University of CastillaLa Mancha, Ciudad Real, Spain K. Nagi Reddy Department of ECE, N.B.K.R. Institute of Science and Technology, S.P.S.R., Nellore, A.P., India Sindhu Nair Dwarkadas J. Sanghvi College of Engineering, Mumbai, India Mahesh B. Neelagar ECE, Department of PG Studies (VLSIDES), VTU Campus, Belagavi, Karnataka, India Pragya Nema Oriental University, Indore, Madhya Pradesh, India Ta-seen Niloy Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh Sneha Nimmala Department of CSE, GRIET, Hyderbad, India Gopala Krishna Murthy Nookala SRKR Engineering College, Bhimavaram, India Diego Ortega Sanz HCTLab Research Group, Electronics and Communications Technology Department, Universidad Autónoma de Madrid, Madrid, Spain S. Pandey Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India R. Parikh Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India Nagamma Patil Department of Information Technology, National Institute of Technology, Surathkal, Karnataka, India Getzi Jeba Leelipushpam Paulraj Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India G. Pavithra ECE Department, Dayananda Sagar College of Engineering, Bangalore, India Duc-Hong Pham Faculty of Information Technology, Electric Power University, Hanoi, Vietnam Hrishikesh Vasant Pilare Department of Computer Science and Engineering, PES University, Bengaluru, India S. S. Poorna Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
K. Pradeep Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Choragudi Sai Praneeth Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India S. Prasanth Vaidya Aditya Engineering College, Surampalem, India K. Preethi Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India L. Priya Rajalakshmi Engineering College, Chennai, India Prajna Puthran Department of Information Technology, National Institute of Technology, Surathkal, Karnataka, India C. K. Raghavendra Department of Computer Science and Engineering, SJB Institute of Technology, Bangalore, Visvesvaraya Technological University, Belagavi, Karnataka, India; Department of Computer Science and Engineering, B N M Institute of Technology, Bangalore, Karnataka, India Arifur Rahaman University of Science and Technology Chittagong, Chittagong, Bangladesh Md Mossihur Rahman Islamic University of Technology, Gazipur, Bangladesh Tasmiah Rahman Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh Wahidur Rahman Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh C. H. Rahul Freshworks, Chennai, India R. Raja Muthayammal Engineering College, Rasipuram, India B. Rajasekaran Department of Electronics and Communication Engineering, Vinayaka Mission’s Kirupananda Variyar Engineering College, Salem, Vinayaka Mission’s Research Foundation (Deemed to be University), Salem, India P. Selvi Rajendran Department of Computer Science & Engineering, School of Computing, Hindustan Institute of Technology and Science, Chennai, India D. Rajeshwari Department of Computer Science, Shrimathi Devkunvar Nanalal Bhatt Vaishnav College for Women, University of Madras, Chennai, TamilNadu, India Anupriya Rajkumar CSE Department, Dr. Mahalingam College of Engineering and Technology, Pollachi, India T. 
Rajput Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India C. Rakesh Cognizant Technology Solutions, Coimbatore, India
S. Rama Devi Department of Information Technology, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India Sowmya Raman Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Kottaimalai Ramaraj Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India K. Rane Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India Adwait Rangnekar Department of Information Technology, SIES Graduate School of Technology, Navi Mumbai, India Md. Rashedul Arefin Ahsanullah University of Science and Technology, Dhaka, Bangladesh V. Rathikarani Department of Computer Science and Engineering, FEAT, Annamalai University, Chidambaram, India Md. Rawshan Habib Murdoch University, Murdoch, Australia Abdul Razim Department of Electronics Engineering, Aligarh Muslim University, Aligarh, India B. N. Manjunatha Reddy Department of Electronics and Communication Engineering, Global Academy of Technology, Bengaluru, India K. Hemanth Kumar Reddy NMAM Institute of Technology, Nitte, Karkala, Karnataka, India R. Vikram Reddy NMAM Institute of Technology, Nitte, Karkala, Karnataka, India Raavi Dinesh Kumar Reddy Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India M. Reshma Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Kavuri Rohit Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India A. M. Rubayet Hossain Ahsanullah University of Science and Technology, Dhaka, Bangladesh R. S. Sabeenian Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India
N. Samanvita Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Krupashankari S. Sandyal Dayananda Sagar College of Engineering, Bangalore, India Preet Sanghavi Dwarkadas J. Sanghvi College of Engineering, Mumbai, India D. Sankar Sona College of Technology, Salem, Tamil Nadu, India I. Bildass Santhosam Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India B. Saravanan Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India R. Seetharaman Department of ECE, CEG Campus, Anna University, Chennai, India Isaac Segovia Ingenium Research Group, University of Castilla-La Mancha, Ciudad Real, Spain E. S. Selva Priya BioMedical Engineering, SSN College of Engineering, OMR Road, Chennai, Tamilnadu, India Shaik Hafeez Shaheed Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India S. Shailesh Ajira Tech of Symbion Technologies, Chennai, India Uttkarsh Sharma Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Akshay Shetty Department of Information Technology, SIES Graduate School of Technology, Navi Mumbai, India Dongkoo Shon Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan, Korea Atul Lal Shrivastava Department of Information Technology and Computer Application, MMMUT, Gorakhpur, India Shah Siddiqui Time Research and Innovation (Tri), Southampton, UK; Time Research and Innovation (Tri), Khilgaon Dhaka, Bangladesh; Faculty of Technology, University of Portsmouth (UoP), Lion Terrace, Portsmouth, UK O. K. C. Sivakumar Bapu Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Devansh Solani Nagindas Khandwala College, Mumbai, India
R. Soundharya Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India K. C. Srikantaiah Department of Computer Science and Engineering, SJB Institute of Technology, Bangalore, Visvesvaraya Technological University, Belagavi, Karnataka, India Kota Srikar Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India Smriti Srivastava Department of CSE, RVCE, Bengaluru, India T. Subetha Department of Computer Science and Engineering, SRM University AP, Andhra Pradesh, India V. Sudha Sona College of Technology, Salem, India Rajanala Sudheer Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India L. Suganthi BioMedical Engineering, SSN College of Engineering, OMR Road, Chennai, Tamilnadu, India Shazia Sulthana Department of Electronics and Communication Engineering, Global Academy of Technology, Bengaluru, India P. G. Sai Sumanth Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India Kumar Sunny University of Science and Technology Chittagong, Chittagong, Bangladesh K. SuriyaKrishnaan Sona College of Technology, Salem, Tamil Nadu, India Sai Sashank Tadivaka Department of CSE, GRIET, Hyderbad, India M. Tamilarasi Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India Md Shahnewaz Tanvir South Dakota School of Mines and Technology, Rapid City, USA R. Tejas Department of Computer Science and Engineering, PES University, Bengaluru, India M. Tharun Department of ECE, CEG Campus, Anna University, Chennai, India Arunprasath Thiyagarajan Department of Biomedical Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India Deepika Tinnavalli Department of Computer Science, George Mason University, Fairfax, USA
Asif Noor Tushar University of Science and Technology Chittagong, Chittagong, Bangladesh Abhishek Vadher Murdoch University, Murdoch, Australia Muneeswaran Vasudevan Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India Sreya Venigalla Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Krithika Venkatesh Department of CSE, RVCE, Bengaluru, India Martin Victor Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India C. Victoria Priscilla Department of Computer Science, Shrimathi Devkunvar Nanalal Bhatt Vaishnav College for Women, University of Madras, Chennai, TamilNadu, India R. Vignesh Sona College of Technology, Salem, Tamil Nadu, India U. Vijay Gokul Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India M. Vijayalakshmi Dwarkadas J. Sanghvi College of Engineering, Mumbai, India S. Vijayalakshmi Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India S. Vinodh Kumar Rajalakshmi Engineering College, Chennai, India Adithi Viswanath Department of CSE, RVCE, Bengaluru, India Namit Wadhwa Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India K. S. Yamuna Sona College of Technology, Salem, Tamil Nadu, India B. N. Yashaswini Department of Information Technology, National Institute of Technology, Surathkal, Karnataka, India Tanjina Yeasmin Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh Yash Zawar Department of Information Technology, SIES Graduate School of Technology, Navi Mumbai, India
Wind Turbine Alarm Management with Artificial Neural Networks

Isaac Segovia, Pedro José Bernalte, and Fausto Pedro García Márquez
Abstract Wind turbines require suitable maintenance management operations. Supervisory control and data acquisition (SCADA) systems are widely applied in the industry, and the large volumes of data they produce make advanced analysis techniques necessary. The analysis of the alarms is a critical phase in the industry. This work proposes a novel approach to analyse the alarm activations. It combines statistical methods and deep learning algorithms to increase the reliability of the analysis. Initial data filtering, principal component analysis and correlations are applied to increase the reliability of the neural network. A case study based on data from a real wind turbine is presented, and the results demonstrate an increase in the reliability of the artificial neural network.

Keywords Wind turbines · Alarm · Neural network · SCADA
I. Segovia (✉) · P. J. Bernalte · F. P. G. Márquez
Ingenium Research Group, University of Castilla-La Mancha, Ciudad Real, Spain

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_1

1 Introduction

Nowadays, wind energy is one of the most relevant renewable energy sources. Wind energy is expected to play a fundamental role in energy generation due to the increasing size of wind turbines (WTs) and a significant growth in nominal power generation capacity. Wind energy has been growing significantly in recent years, and it is projected to reach 5800 GW by 2050 [1].

The WT is formed by several subsystems, e.g. blades, pitch system, main shaft bearing, gearbox, generator and different electrical and security systems. WTs operate under extreme environmental conditions due to variable loads and hard weather conditions that can lead to mechanical stress or different failures. The gearbox, generator, brake, hydraulics and yaw system present the highest failure rate, with 60% of the total failures [2]. These faults produce high costs due to maintenance activities and downtimes. The operation and maintenance costs are estimated at between 15 and 22% of the total costs of power generation, reaching 35% in some scenarios [3]. WTs are designed to operate for 20 years, and it has been demonstrated that WTs suffer a regular loss of production, so a high level of maintenance is required to ensure cost-effective energy generation.

Reliability-centred maintenance based on monitoring is widely implemented in the industry, and it is focused on reducing maintenance activities while improving the reliability of critical components of the WT [4, 5]. It requires the development of suitable maintenance management plans to increase the life cycle with the minimum consumption of resources [6, 7]. The real state of the WT is determined with several types of condition monitoring systems (CMS) that acquire data from critical components for fault diagnosis [8, 9]. Commercial industrial CMSs are based on thermography, vibration and acoustic signals, ultrasonic testing, oil analysis and electric monitoring, among others [10–12]. Data acquisition systems provide different variables to determine the condition of the system.

Supervisory control and data acquisition (SCADA) has been introduced in WTs due to its high effectiveness in health monitoring. SCADA acquires data from different CMSs, obtaining two types of data: alarms and signals [13, 14]. An alarm is an operational warning message designed to be activated when certain predetermined types of failures or anomalies occur [15, 16]. The generation of false alarms is one of the most critical factors in WT maintenance management, since the unnecessary downtimes increase the maintenance costs [17, 18]. False alarms are triggered although the WT shows normal behaviour without real failures [19, 20].
They are produced by issues in the connection of the sensors or inefficient SCADA models. It has been demonstrated that around 12% of the alarms are reset automatically, so the classification and identification of the alarms are fundamental to detect false alarms [21]. Due to the number of alarms activated, several approaches and algorithms have been introduced to increase the useful information obtained from SCADA [22]. New methods are required due to the current need to automate diagnostic techniques and obtain faster and more reliable diagnostics [23].

Machine learning algorithms are widely employed in the industry due to their ability to process large amounts of data and to find patterns by means of several types of computational methods [24, 25]. Support vector machine (SVM) is a supervised machine learning algorithm that reduces probabilities employing data regression with statistical learning [26]. Artificial neural network (ANN) is one of the most applied machine learning methods, due to adaptive learning, development of their own structure, capability to work with noisy datasets and easy implementation in several application fields, e.g. image processing or pattern recognition [27]. ANNs are computational methods based on processing units called neurons. They are formed by several layers with weighted connections between layers and neurons, whose computational nodes are called hidden neurons.
The output $O_k$ is defined in (1):

$$O_k = \sigma\left(\sum_{s=1}^{i} W_{rs}\, x_s + T_r^{\text{hidden}}\right) \tag{1}$$
where $\sigma(\cdot)$ is the transfer function; $i$ is the number of input neurons; $W_{rs}$ is the matrix of the connection weights; $x_s$ defines the inputs and $T_r^{\text{hidden}}$ is the tolerance of the hidden neurons.

The main characteristics of ANNs are the training method, the connection between input and output data and the topology of the network. The training is supervised when the output of the network is known and defined by the user. ANNs receive the data, and the training process adjusts the weights of the interconnections. The multilayer perceptron (MLP) shown in Fig. 1 is one of the most commonly used ANNs. This network has small training requirements and a high operation rate. MLP is a feedforward ANN with three different layers: input, hidden and output [28]. MLP needs supervised learning with variations in the connection weights of each neuron.

Fig. 1 MLP ANN diagram

In the wind energy industry, ANNs are employed for pattern classification, fault detection and diagnosis, forecasting, and design and control optimization [29]. The detection of false alarms with ANNs has been studied by several authors. Marugán and García [30] used ANNs to predict the alarm activations. False alarms were reduced in reference [31] employing ANNs to develop models for predicting
of bearing faults. Zhao et al. developed an ANN with an adaptive threshold to filter false alarms. Marugán et al. [32] developed an ANN formed by three different MLPs for false alarm detection, obtaining a precision around 80%. Kavaz and Barutcu [33] developed a method with ANNs to detect calibration drifts and prevent false alarms. A new ANN architecture is developed in reference [34] to increase the reliability of false alarm detection.

The analysis of the literature shows that the reliability of ANNs is influenced by large datasets, which increase the computational load of the operations [35]. This work proposes a novel approach to reduce the dataset from the WT by combining statistical algorithms and ANNs. The aim is to reduce the redundant information and increase the reliability of the ANN. The initial phase is based on the critical alarm definition and the analysis of alarm activations to select suitable periods for the study. The initial signal dataset is then reduced with correlation and principal component analysis (PCA) according to the critical alarm defined in the previous phase. The methodology is tested with a case study defined with data from a real WT. The results obtained by ANNs with the original and the filtered datasets are compared, demonstrating the reliability of the procedure.

The remainder of this work is organized as follows: Sect. 2 characterizes the approach with the definition of the algorithms and procedures. In Sect. 3, the real case study with the analysis of the results is presented.
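As an illustration of the neuron output in (1) and of the three-layer MLP described above, the following minimal sketch evaluates one layer of neurons and chains two layers into a feedforward pass. The logistic transfer function, the function names (`layer_output`, `mlp_forward`) and all numerical weights are assumptions chosen only for the example; they are not taken from the case study.

```python
import math


def sigmoid(z):
    """Logistic transfer function, an assumed choice for sigma in (1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(weights, inputs, thresholds, transfer=sigmoid):
    """Evaluate (1) for every neuron r of a layer.

    weights[r][s] plays the role of W_rs, inputs[s] of x_s and
    thresholds[r] of T_r.
    """
    outputs = []
    for w_row, t_r in zip(weights, thresholds):
        z = sum(w_rs * x_s for w_rs, x_s in zip(w_row, inputs)) + t_r
        outputs.append(transfer(z))
    return outputs


def mlp_forward(x, hidden_w, hidden_t, out_w, out_t):
    """Feedforward pass: input layer -> hidden layer -> output layer."""
    hidden = layer_output(hidden_w, x, hidden_t)
    return layer_output(out_w, hidden, out_t)


# Hypothetical example: 3 input signals, 2 hidden neurons, 1 output neuron.
x = [0.5, -1.0, 2.0]
hidden_w = [[0.2, 0.4, -0.1], [-0.3, 0.1, 0.5]]
hidden_t = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_t = [0.0]

alarm_score = mlp_forward(x, hidden_w, hidden_t, out_w, out_t)[0]
```

With a logistic output neuron, `alarm_score` lies between 0 and 1 and could be read as an alarm activation score. In supervised training, the weights and thresholds would be adjusted so that this score matches the known alarm labels.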
2 Methodology This work presents a novel approach to increase the reliability of ANNs in the SCADA analysis by means of selecting reliable ranges of the dataset. The flow chart of the methodology is proposed in Fig. 2. The SCADA dataset is divided into alarm and signals. The initial procedure is the critical alarm determination. The critical alarm is defined by three key factors although operators may introduce their own criteria: the total number of alarm activations, the maximum and the average alarm period and the mean period without alarms. These variables are displayed graphically with Pareto chart to identify the potential alarms as critical alarms. The next phase is the filtering process based on the period between activations. The alarm deactivation period is divided into two periods: the safe and possible alarm activation period. The safe period presents reduced probability to cause the alarm activation. The period just before the alarm activation has higher probability to show the real causes of the triggering. Different studies [36] consider one day before the failure as a suitable period to analyse the failures, and this range is applied in this work. This value will be a threshold to filter the alarm activations: if the alarm activation does not accomplish this range, it is not considered into the study. This filtering process is included because it is possible to find several activations in succession without enough time to be analysed. These activations are
Wind Turbine Alarm Management with Artificial Neural Networks
Fig. 2 Flow chart of the approach (blocks: SCADA data; critical alarm definition; initial alarm dataset; filtering process with the decision "Enough range in the alarm?", YES: filtered alarm dataset, NO: alarm discarded; initial signal dataset; P-values and correlations; signals related to the alarms; regions of alarm causes; filtered signal dataset; PCA; ANN. Temporal scale: alarm activated, non-interest period, safe period with the alarm deactivated, possible alarm activation period, alarm activated)
considered as false alarms. This procedure yields the definitive filtered alarm dataset. The signals obtained with SCADA provide numerical values of the temperature of critical components, mechanical indicators or electrical behaviour, among others. The selection of the signals related to the critical alarm is fundamental to determine the real causes of the alarm activation and to detect possible patterns. This
I. Segovia et al.
approach applies P-values and correlation to reduce the number of signals. The P-value measures how compatible the data are with the hypothesis being tested; small P-values indicate that the data are not close to the hypothesis. The test statistic $T^{*}$ is shown in (2):
$$T^{*} = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{\dfrac{s_p^{2}}{n_1} + \dfrac{s_p^{2}}{n_2}}} \tag{2}$$
where $\bar{y}_2$ and $\bar{y}_1$ are the sample means, $s_p^{2}$ is the pooled sample variance, and $n_1$ and $n_2$ are the sizes of the two independent samples. The P-value is defined by (3):
$$P_{\text{value}} = 2 \cdot \mathrm{Prob}\left(t_{n_1+n_2-2} > T^{*}\right) \tag{3}$$
where $t_{n_1+n_2-2}$ is a random variable from the t distribution with $n_1+n_2-2$ degrees of freedom. The P-value threshold is set to 0.05, and this value is employed in this work [37]. PCA is applied to reduce the dimensions of the data while maintaining the relevant information and the patterns. This method performs a multivariate analysis to reduce the dimensions of the dataset [38]. The new dataset is structured with the principal components, formed by weighted averages of the initial variables. The covariance matrix $S_{ij}$, given by (4), defines the matrix $W$ of weights:
$$S_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{n-1} \tag{4}$$
The initial dataset $X$ has $n \times p$ dimensions, where the $n$ observations are arranged in rows and the $p$ variables in columns. The principal component is established with (5):
$$Y_{rq} = w_{1r} X_{1q} + w_{2r} X_{2q} + \cdots + w_{pr} X_{pq} \tag{5}$$
where $w_{1r}, w_{2r}, \ldots, w_{pr}$ are the weights of the linear combination defined by PCA. Once this phase is completed, the dataset is ready to be introduced into the ANN to analyse the alarm activations. This work applies an MLP neural network with ten hidden layers.
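The covariance matrix of Eq. (4) and the projection of Eq. (5) can be illustrated with a minimal sketch in plain Python. The toy signal values, the function names and the use of power iteration to obtain the first weight vector are assumptions made for illustration only; they are not taken from the case study, and the data are centred before projection, which is a common convention assumed here.

```python
import math

def covariance_matrix(rows):
    """Sample covariance S_ij of Eq. (4); rows are observations, columns are variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred

def first_component(cov, iterations=500):
    """Dominant eigenvector and eigenvalue of S by power iteration (illustrative only)."""
    p = len(cov)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        v = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    eigenvalue = sum(w[i] * sum(cov[i][j] * w[j] for j in range(p)) for i in range(p))
    return w, eigenvalue

# Hypothetical toy signals: 5 observations (rows) of 3 variables (columns).
signals = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.4], [3.0, 6.2, 0.6],
           [4.0, 7.8, 0.5], [5.0, 10.1, 0.5]]
cov, centred = covariance_matrix(signals)
weights, eigenvalue = first_component(cov)

# Eq. (5): each score is the weighted sum of the (centred) variables of one observation.
scores = [sum(w * x for w, x in zip(weights, row)) for row in centred]

# Share of the total variance carried by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

In practice a library routine for eigen-decomposition or singular value decomposition would normally replace the power iteration, and further components would be added until the desired share of variance is reached.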
3 Real Case Study and Analysis of the Results
The real case study employs data from the SCADA system of a 2 MW WT with an acquisition rate of 1 min. The initial dataset is formed by 215 alarms and 98 signals.
Fig. 3 Maximum alarm activations (Pareto chart; horizontal axis: alarms 8, 5, 7, 9 and 16; left vertical axis: total activations; right vertical axis: cumulative percentage, 0–100%)
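The ranking behind a Pareto chart such as Fig. 3 can be sketched as follows. This is a minimal illustration in which the alarm counts are hypothetical and are not the values of the case study; the function name `pareto_table` is also an assumption.

```python
def pareto_table(activations):
    """Sort alarms by activation count and add the cumulative share in percent."""
    total = sum(activations.values())
    ranked = sorted(activations.items(), key=lambda kv: kv[1], reverse=True)
    table, running = [], 0
    for alarm, count in ranked:
        running += count
        table.append((alarm, count, 100.0 * running / total))
    return table

# Hypothetical activation counts (NOT the values of the case study).
counts = {"Alarm 5": 300, "Alarm 8": 500, "Alarm 7": 120, "Alarm 9": 60, "Alarm 16": 20}
table = pareto_table(counts)

# The alarm with the most activations is the first candidate for the critical alarm.
critical_alarm = table[0][0]
```

The same table could be built for the other factors named in Sect. 2 (alarm period and period without alarms) when operators prefer a different criterion.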
The data were collected for 1 year, and the total number of values is around 200 million. A Pareto chart is applied to determine the critical alarm. For this case study, the total number of activations, see Fig. 3, is the most significant factor in the critical alarm definition. The critical alarm for this case study is alarm 8, related to issues in the rotation of the generator. This alarm was activated around 11,000 times. The threshold to filter the alarm activations is set to one day, and only 12 alarm activations met this condition. The signal dataset is obtained by applying correlation and P-values. For this case, the new signal dataset is defined with 11 signals related to the critical alarm. PCA is used to reduce the dimensionality of this dataset, and two principal components are needed to represent 99% of the data. The validity of the approach is tested by comparing the results of the MLP neural network with the original dataset and with the dataset filtered by the methodology developed in this work. The MLP neural network is formed by 20 hidden layers. The training set comprises 70% of the total amount of data, and the rest is divided in equal parts between validation and test. The training, validation and test performance of the ANN with the original dataset is shown in Fig. 4a, and the performance with the filtered dataset is shown in Fig. 4b. Both ANNs were trained for a similar number of cycles, called epochs, but the ANN with the filtered dataset provides better performance, with a more stable validation process, owing to the removal of redundant data. The number of cases properly identified by the network increases with the filtered dataset in comparison with the original dataset. It is concluded that around 7% of the activations analysed are false alarms.
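The P-value screening of Eqs. (2) and (3) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the function names and sample values are hypothetical, the tail probability is obtained by Simpson integration of the t density instead of a statistics library, and the absolute value of $T^{*}$ is used so that the test is two-sided, which is an assumption about how Eq. (3) is applied.

```python
import math

def pooled_t_statistic(sample_1, sample_2):
    """T* of Eq. (2) with the pooled variance, plus the degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = sum(sample_1) / n1, sum(sample_2) / n2
    ss1 = sum((y - m1) ** 2 for y in sample_1)
    ss2 = sum((y - m2) ** 2 for y in sample_2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)          # pooled sample variance
    t_star = (m2 - m1) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t_star, n1 + n2 - 2

def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log(1 + t * t / dof))

def two_sided_p_value(t_star, dof, steps=2000):
    """2 * Prob(t_dof > |T*|), Eq. (3), via Simpson's rule on [0, |T*|]."""
    upper = abs(t_star)
    if upper == 0:
        return 1.0
    h = upper / steps                           # steps must be even
    area = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    area *= h / 3
    return max(0.0, 2 * (0.5 - area))

# Hypothetical samples of one signal in two periods.
t_star, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
p_value = two_sided_p_value(t_star, dof)
below_threshold = p_value < 0.05               # 0.05 is the threshold used in this work
```

With a numerical library available, the tail probability would normally come from its t-distribution routines rather than from manual integration.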
Fig. 4 a ANN performance with original dataset. b ANN performance with the application of the approach
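The one-day filtering rule applied to the alarm activations in this case study can be sketched as follows. It is a minimal sketch under stated assumptions: activations are given as (start, end) pairs in minutes, matching the 1 min acquisition rate; the first activation is measured from the start of the record; and the function name and example values are hypothetical.

```python
ONE_DAY_MIN = 24 * 60  # one day expressed in minutes (1 min acquisition rate)

def filter_activations(activations, min_gap=ONE_DAY_MIN):
    """Keep activations preceded by at least `min_gap` minutes without the alarm.

    `activations` is a time-ordered list of (start_minute, end_minute) pairs.
    The first activation is measured from minute 0, an assumption of this sketch.
    """
    kept, discarded = [], []
    previous_end = 0
    for start, end in activations:
        if start - previous_end >= min_gap:
            kept.append((start, end))
        else:
            discarded.append((start, end))      # treated as a false alarm
        previous_end = end
    return kept, discarded

# Hypothetical activations: two of them follow the previous one within minutes.
activations = [(2000, 2010), (2015, 2020), (2030, 2100), (4000, 4005)]
kept, discarded = filter_activations(activations)

# Possible alarm activation period: the day before each retained activation.
analysis_windows = [(start - ONE_DAY_MIN, start) for start, _ in kept]
```

The signals inside each analysis window would then be passed to the correlation, P-value and PCA steps.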
4 Conclusions
Wind energy systems require new techniques to obtain reliable information. The supervisory control and data acquisition system provides large amounts of data generated by several types of condition monitoring systems. New efficient algorithms are required to increase the reliability of the data analysis. This work proposes an approach to reduce the dataset based on the combination of statistical algorithms and artificial neural networks. A Pareto chart is applied to select the critical alarm, and the signal dataset is defined with correlations and P-values. The dimensionality of the signals is reduced with principal component analysis. The validity of
the approach is tested with a real case study based on real data from wind turbines. The performance of the artificial neural network with the initial and filtered datasets is compared. The results demonstrate an increase in the reliability of the neural network.
Acknowledgements The work reported herewith has been financially supported by the Dirección General de Universidades, Investigación e Innovación of Castilla-La Mancha, under Research Grant ProSeaWind project (Ref.: SBPLY/19/180501/000102).
References
1. Arshad M, O’Kelly B (2019) Global status of wind power generation: theory, practice, and challenges. Int J Green Energy 16:1073–1090
2. Arabian-Hoseynabadi H, Oraee H, Tavner P (2010) Failure modes and effects analysis (FMEA) for wind turbines. Int J Electr Power Energy Syst 32:817–824
3. Tchakoua P, Wamkeue R, Ouhrouche M, Slaoui-Hasnaoui F, Tameghe TA, Ekemb G (2014) Wind turbine condition monitoring: state-of-the-art review, new trends, and future challenges. Energies 7:2595–2630
4. Márquez FPG, Chacón AMP (2020) A review of non-destructive testing on wind turbines blades. Renew Energy
5. García Márquez FP, Bernalte Sánchez PJ, Segovia Ramírez I (2021) Acoustic inspection system with unmanned aerial vehicles for wind turbines structure health monitoring. Struct Health Monitor, 14759217211004822
6. Marquez FG (2006) An approach to remote condition monitoring systems management
7. Garcia Marquez FP, Gomez Munoz CQ (2020) A new approach for fault detection, location and diagnosis by ultrasonic testing. Energies 13:1192
8. Márquez FPG (2010) A new method for maintenance management employing principal component analysis. Struct Durability Health Monitor 6:89
9. Gómez Muñoz CQ, Zamacola Alcalde G, García Márquez FP (2020) Analysis and comparison of macro fiber composites and lead zirconate titanate (PZT) discs for an energy harvesting floor. Appl Sci 10:5951
10. Liu Z, Xiao C, Zhang T, Zhang X (2020) Research on fault detection for three types of wind turbine subsystems using machine learning. Energies 13:460
11. Jimenez AA, Muñoz CQG, Márquez FPG (2019) Dirt and mud detection and diagnosis on a wind turbine blade employing guided waves and supervised learning classifiers. Reliab Eng Syst Saf 184:2–12
12. Butt AH, Akbar B, Aslam J, Akram N, Soudagar MEM, García Márquez FP, Younis M, Uddin E (2020) Development of a linear acoustic array for aero-acoustic quantification of camber-bladed vertical axis wind turbine. Sensors 20:5954
13. Pérez JMP, Márquez FPG, Tobias A, Papaelias M (2013) Wind turbine reliability analysis. Renew Sustain Energy Rev 23:463–472
14. Garcia Marquez FP, Pliego Marugan A, Pinar Pérez JM, Hillmansen S, Papaelias M (2017) Optimal dynamic analysis of electrical/electronic components in wind turbines. Energies 10:1111
15. García Márquez FP, Segovia Ramírez I, Mohammadi-Ivatloo B, Marugán AP (2020) Reliability dynamic analysis by fault trees and binary decision diagrams. Information 11:324
16. Standard, I. (2012) Condition monitoring and diagnostics of machines - data interpretation and diagnostics techniques - part 1: general guidelines
17. Gómez Muñoz CQ, García Márquez FP, Hernández Crespo B, Makaya K (2019) Structural health monitoring for delamination detection and location in wind turbine blades employing guided waves. Wind Energy 22:698–711
18. Chacón AMP, Ramírez IS, Márquez FPG (2020) False alarms analysis of wind turbine bearing system. Sustainability 12:7867
19. Pliego Marugán A, García Márquez FP (2019) Advanced analytics for detection and diagnosis of false alarms and faults: a real case study. Wind Energy 22:1622–1635
20. Segovia Ramirez I, Mohammadi-Ivatloo B, Garcia Marquez FP (2021) Alarms management by supervisory control and data acquisition system for wind turbines. Eksploatacja I Niezawodnosc-Maintenance and Reliability 23:110–116
21. Qiu Y, Feng Y, Tavner P, Richardson P, Erdos G, Chen B (2012) Wind turbine SCADA alarm analysis for improving reliability. Wind Energy 15:951–966
22. Ramirez IS, Marquez FPG (2020) Supervisory control and data acquisition analysis for wind turbine maintenance management. In: Proceedings of the international conference on management science and engineering management, pp 470–480
23. García Márquez FP, Segovia Ramírez I, Pliego Marugán A (2019) Decision making using logical decision tree and binary decision diagrams: a real case study of wind turbine manufacturing. Energies 12:1753
24. Jiménez AA, Zhang L, Muñoz CQG, Márquez FPG (2020) Maintenance management based on machine learning and nonlinear features in wind turbines. Renew Energy 146:316–328
25. Dey B, García Márquez FP, Basak SK (2020) Smart energy management of residential microgrid system by a novel hybrid mgwoscacsa algorithm. Energies 13:3500
26. Chacón AMP, Ramirez IS, Márquez FPG (2021) Support vector machine for false alarm detection in wind turbine management. In: Proceedings of the 2021 7th international conference on control, instrumentation and automation (ICCIA), pp 1–5
27. Van Gerven M, Bohte S (2017) Artificial neural networks as models of neural information processing. Front Comput Neurosci 11:114
28. Orhan U, Hekim M, Ozer M (2011) EEG signals classification using the K-means clustering and a multilayer perceptron neural network model. Expert Syst Appl 38:13475–13481. https://doi.org/10.1016/j.eswa.2011.04.149
29. Marugán AP, Márquez FPG, Perez JMP, Ruiz-Hernández D (2018) A survey of artificial neural network in wind energy systems. Appl Energy 228:1822–1836. https://doi.org/10.1016/j.apenergy.2018.07.084
30. Marugán AP, Márquez FPG (2017) SCADA and artificial neural networks for maintenance management. In: Proceedings of the international conference on management science and engineering management, pp 912–919
31. Kusiak A, Verma A (2012) Analyzing bearing faults in wind turbines: a data-mining approach. Renew Energy 48:110–116
32. Marugán AP, Chacón AMP, Márquez FPG (2019) Reliability analysis of detecting false alarms that employ neural networks: a real case study on wind turbines. Reliab Eng Syst Saf 191:106574
33. Kavaz AG, Barutcu B (2018) Fault detection of wind turbine sensors using artificial neural networks. J Sens
34. Adouni A, Chariag D, Diallo D, Ben Hamed M, Sbita L (2016) FDI based on artificial neural network for low-voltage-ride-through in DFIG-based wind turbine. ISA Trans 64:353–364. https://doi.org/10.1016/j.isatra.2016.05.009
35. Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Proceedings of the advances in neural information processing systems, pp 1135–1143
36. Schlechtingen M, Ferreira Santos I (2011) Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection. Mech Syst Signal Process 25:1849–1875. https://doi.org/10.1016/j.ymssp.2010.12.007
37. Feise RJ (2002) Do multiple outcome measures require p-value adjustment? BMC Med Res Methodol 2:8
38. Marquez FG (2006) An approach to remote condition monitoring systems management. IET Int Conf Railway Condition Monitor 2006:156–160. https://doi.org/10.1049/ic:20060061
Convolutional Neural Networks as a Quality Control in 4.0 Industry for Screws and Nuts Diego Ortega Sanz, Carlos Quiterio Gómez Muñoz, and Fausto Pedro García Márquez
Abstract Artificial intelligence has been implemented in different fields in recent years, especially in the field of image classification. Traditional pattern-based techniques present severe problems for achieving efficient algorithms with a high success rate and are usually strongly influenced by environmental factors such as illumination, dust and movement. Convolutional neural networks are particularly effective as image classifiers, but this field is still under research and new methodologies are emerging. An image classifier indicates the class with the highest probability to which the object belongs. A neural network is a dynamic entity that can be retrained to be more robust to environment changes. This implies that the behaviour of the same neural network under different training sets may be different. This difference can result in a worse prediction of the classes of the model, which can lead to complete model failure. In this work, a study of the performance of different classification neural networks is carried out, controlling the different variables in order to obtain comparable results and to extrapolate the performance of these networks for a specific data set. The results of these classifiers are studied to decide on a real classification tool for industry.
Keywords Convolutional neural network · Linear classifier · Hybrid neural network system · Quality control · Computer vision
D. Ortega Sanz · C. Q. Gómez Muñoz (B), HCTLab Research Group, Electronics and Communications Technology Department, Universidad Autónoma de Madrid, 28049 Madrid, Spain
F. P. García Márquez, Ingenium Research Group, University of Castilla-La Mancha, Ciudad Real, Spain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_2
1 Introduction
Detection of defects with image processing and its classification has aroused the interest of industry in recent years. Thanks to pattern recognition, a solution highly demanded by industry can be provided to detect a defect in a part in the production
chain. Traditional methods establish a simple pattern that is compared with each piece and use techniques such as pixel counting, geometric locators, contrast shifting, masks, edge and shape detection and filtering algorithms. In certain circumstances, it is necessary to discriminate certain elements according to the image orientation, which makes the task difficult. One way to reduce the computational cost is to delimit the working area: the algorithm acts only on the region of interest, or a global algorithm works locally on several areas. However, these methods may offer poor accuracy in the detection of images with components of complex geometry or with displaced elements, and when working on very specific areas, a minimal change in the boundary conditions may cause the algorithm to fail. This means that if the part moves or rotates, traditional algorithms will experience detection problems. In addition, these systems are sensitive to external disturbances such as illumination changes and do not tolerate a change in the component to be analysed. For example, for two references with the same element to be detected in different areas, two algorithms should be implemented, one per fault. Several solutions for detecting parts in this type of component are available in the neural networks field. One approach involves using classification neural networks, which output only a single class per image. This leads to the loss of valuable information when objects of different classes appear within the same image. Recently, alternative solutions have been studied, such as the implementation of convolutional neural networks that are capable of detecting multiple objects (in this case defects inside the part). The advantage of this approach is that the output is not a single class, but any combination of them.
Nevertheless, there remains a problem in processing these outputs, as they cannot be discriminated in a simple and efficient way. The Neocognitron, designed by Kunihiko Fukushima, is the basis of convolutional neural networks (CNNs) [1]. CNNs are used in edge detection, image segmentation and motion detection [2]. A new convolutional neural network architecture was developed in 2006 to obtain a generic segmentation of the shape and edges of an image [3, 4]. One advantage of this specific neural network, besides the multiple variants that emerge from its main structure, is that it is more suitable for implementation in hardware. For this reason, it is common to find many developments and deployments of CNN algorithms in hardware and graphics processing units (GPUs) [4]. A further feature of that CNN is its ability to detect edges, so the results of classical image processing techniques were commonly compared with those of a CNN architecture. CNNs are specifically designed to delimit contours with low noise [5]. A neural network with a hybrid architecture is formed by the integration of several intelligent subsystems, each of which maintains its own representation language as well as different solution inference mechanisms, e.g., condition monitoring [6, 7]. These subsystems can work and process signals individually; the signals of the subsystems can even be incompatible with each other, but not at a general level, which greatly increases the efficiency and robustness of these models [8, 9]. The goal of hybrid
systems is to improve the efficiency and performance of the overall system [10, 11]. The model’s performance may be improved through the use of monitoring mechanisms that identify the subsystems to be used at each point in time [12–14]. Overall, hybrid architectures in the convolutional neural networks field attempt to improve the development of global systems and tools by combining simpler, local subsystems into a whole system at a lower computational cost, obtaining more efficient and robust results. Hybrid systems offer the ability to solve some of the problems that are currently difficult to address with a single approach, by combining several subsystems for specific problems. This approach is similar to having qualified personnel for specific tasks rather than one person having to perform all the tasks [15–19]. One of the main advantages of hybrid systems is their versatility: since they consist of several neural networks, each of them can be focused on a specific task with high performance. This means that many more cases can be treated than with a single, more complex neural network that would have greater global losses [20–22]. The architectures with the greatest potential for hybrid systems are cascade architectures, in which the output of one network is the input of the next. The main problem for these systems is to build an architecture with a combination of networks that is efficient and achieves an acceptable computational cost. Figure 1 shows an example of a hybrid model designed by Guo et al. This work analyses several CNN architectures to create a CNN model that meets the needs of image classification in manufacturing environments [24]. The modelling of the system is presented below, followed by a case study where the results are discussed [16].
2 Modelling and Approach
Figure 2 shows the information flow from the input to the system (images) to the output (classification/detection). The workflow uses a camera as the input device, the Jetson Nano hardware as the training and processing unit, and an output in which the result of the network is visualized. The different variables affecting the model must be controlled so that an equivalence can be established between the variables and the results. The camera shown in Fig. 3 captures the images and sends them to the system. The lighting should be controlled so that the model is not affected by this variable. The size of the frames should be controlled so that the computational cost remains stable. The camera has an IMX322 processing chip, with 8 MPx and 1080p resolution, and a varifocal lens of 2.8–12 mm. The Nvidia Jetson Nano is the hardware element that performs the rest of the tasks. An image of the hardware and overall specifications is shown in Fig. 4. The generation of the data set and its typology is crucial depending on the type of architecture or model to be trained. To make a reliable comparison, the data sets must contain the
Fig. 1 Hybrid CNN—RNN example scheme [23]
Fig. 2 Flow chart of image inspection system (blocks: Camera, image input; NVIDIA Jetson Nano, data sets folders, images weight, data sets parameters, number of images, training parameters, training epochs; Result, accuracy)
Fig. 3 ELP webcam
Fig. 4 Jetson Nano hardware
Table 1 Parameters
Variable | Status
Image input | RGB
Images dimensions | 640 × 320
Data sets folders | Established by Jetson Nano API
Data sets parameters | 20% validation, 5% test
Training parameters | Defined in .txt file
Training epochs | 30–1000
same number of images, the same lighting conditions and the same rotations and backgrounds, so that the result is not affected by these conditions. Finally, a test is performed in which new images are run on the networks and the accuracy on these images is evaluated. The accuracy of each image is recorded to obtain an overall average accuracy for each model and to allow a comparison. Table 1 lists the main variable parameters during training and the values chosen to achieve comparable results. It can be observed that the input size of the images is significantly smaller than what the camera can provide. This is due to the specifications of the hardware on which the data set is created and the networks are trained: the images are resized, and this resizing is the preprocessing applied to the input images before they are fed to the different network architectures.
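The data set partition of Table 1 (20% validation, 5% test) can be sketched in plain Python as follows. This is a minimal illustration, not the Jetson Nano tooling: the file names, the fixed seed and the assumption that the remaining 75% of the images form the training set are all hypothetical.

```python
import random

def split_dataset(items, val_fraction=0.20, test_fraction=0.05, seed=0):
    """Shuffle once with a fixed seed and cut into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]           # assumed: the rest is used for training
    return train, val, test

# Hypothetical image file names for one class.
images = [f"img_{i:04d}.jpg" for i in range(200)]
train, val, test = split_dataset(images)
```

Applying the same function with the same seed to every class keeps the number of images per split identical across classes, which supports the comparability requirement stated above.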
3 Case Study
Figure 5 shows the components used, where the similarity between the components can be appreciated. Table 2 contains the description of each component. The first step to develop a custom CNN with Nvidia hardware is to acquire the image data set. A specific tool can be used for acquiring images and saving
Fig. 5 Components used in test
Table 2 Specifications of each component
No | Part
1 | M6 nut
2 | M8 nut
3 | 250 mm M8 screw
4 | 200 mm M6 screw
5 | 350 mm M6 screw
6 | 300 mm M6 screw
7 | 250 mm M6 screw
them in specific folders. Once the data set is available, the training of the designed network is carried out specifying the most appropriate training parameters for the type of network chosen. During the training, the accuracy of the network will be displayed during the different epochs. The weights of each “neuron” are saved for the best results. Once the weights have been obtained, they are exported to an ONNX model. Now the model, the network and the data set are loaded, and the trained model is executed. The system will acquire images from the camera and introduce them to the neural network to process them in real time. Figure 6 shows the steps followed to train and validate each CNN architecture. The process is carried out in the embedded software mentioned above. Figure 7 shows the hybrid CNN flow chart proposed, which consists of three artificial neural networks. The first network performs a fast detection of the components, differentiating between nuts and bolts. The output of this network is the input to two other convolutional models. Depending on whether the output is a nut or bolt, a classification network is run to differentiate between classes of that particular type. It is necessary to analyse different neural network architectures to decide which is the most suitable before carrying out the definitive training of the data set. In this work, several trainings have been carried out with architectures specialized in
Fig. 6 CNN flow chart
real-time video processing in order to implement it in industrial equipment [25]. The architectures whose performance has been compared after training are:
• AlexNet contains eight layers. There are five convolutional layers, some of which are followed by max-pooling layers. The other three layers are fully connected layers. This network architecture uses the non-saturating ReLU activation function, which has a lower computational cost than the sigmoid function [26]. A similar CNN architecture is SqueezeNet [25]. These CNNs are similar; therefore, both have obtained similar results in the tests carried out. The size of the input image should be 227 × 227 instead of 224 × 224, as Andrej Karpathy points out in his well-known CS231n course [27, 28].
Fig. 7 Hybrid CNN flow chart
Fig. 8 AlexNet architecture
Figure 8 shows the layers contained in the AlexNet architecture. The first layer is the input image, whose size is 224 × 224 × 3. Each convolutional layer uses a ReLU activation function.
• The Visual Geometry Group (VGG) network is a CNN architecture that followed AlexNet and stacks multiple convolutional layers with smaller kernels. This avoids the use of a large convolutional layer. It also includes three ReLU layers instead of a single layer [29]. The main difference with AlexNet is that VGG brought with it a massive improvement in speed [30]. As a result, VGG provides an advantage in terms of processing speed, due to the reduction of nonlinearity. Other VGG architectures are built using these concepts, which gives more options to fit a custom application [31, 32]. Figure 9 shows the CNN architecture. This architecture has a higher number of convolutional and max-pooling layers.
• Residual network (ResNet) is a CNN architecture whose design can have thousands of convolutional layers. Having thousands of layers does not prevent this network from achieving a strong performance [31, 33]. ResNet was a novel approach for solving the “vanishing gradient” problem [29]. A variant of this CNN introduces the “cardinality” hyper-parameter, the number of parallel paths in each block [30, 34]. By adjusting the cardinality, it is possible to increase the precision without having to increase the depth and width of the CNN. This makes it very easy to adjust, since it is possible to adjust a single parameter, unlike Inception, which needs multiple hyper-parameters to be adjusted [35, 36]. The authors demonstrated that this architecture is robust and exhibits ensemble-like behaviour by removing layers, observing its performance and discovering a correlation [37, 38]. Figure 10 shows the CNN architecture. This architecture is substantially modified from the previous ones. In this case, the architecture is more complex and larger.
The convolutional layers have been grouped, and their size and quantity have been indicated.
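The routing logic of the hybrid cascade of Fig. 7, in which a first network separates nuts from screws and a type-specific network then selects the class, can be sketched as follows. The three models are stand-in functions on hand-made feature dictionaries, purely hypothetical, so that the control flow can be run without any trained network; passing the same image to the second stage is also an assumption of this sketch.

```python
def cascade_classify(image, type_model, nut_model, screw_model):
    """Two-stage cascade: coarse part type first, then the type-specific classifier."""
    part_type = type_model(image)
    if part_type == "nut":
        return part_type, nut_model(image)
    if part_type == "screw":
        return part_type, screw_model(image)
    raise ValueError(f"unexpected part type: {part_type!r}")

# Stand-in "models" (hypothetical): real ones would be trained CNNs.
def toy_type_model(img):
    return "nut" if img["aspect_ratio"] < 2 else "screw"

def toy_nut_model(img):
    return "M8 nut" if img["thread_mm"] >= 8 else "M6 nut"

def toy_screw_model(img):
    return f"{img['length_mm']} mm M{img['thread_mm']} screw"

nut_result = cascade_classify(
    {"aspect_ratio": 1.0, "thread_mm": 6, "length_mm": 0},
    toy_type_model, toy_nut_model, toy_screw_model)
screw_result = cascade_classify(
    {"aspect_ratio": 20.0, "thread_mm": 8, "length_mm": 250},
    toy_type_model, toy_nut_model, toy_screw_model)
```

Only one second-stage network runs per image, which is where the computational saving of a cascade comes from.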
4 Results
Figures 11 and 12 show a comparison of the different neural networks tested. Each CNN is first compared across its training epochs to analyse the behaviour of the network under short and long training. Figure 11a shows the accuracy of the AlexNet CNN with 30 epochs. With a longer training (300 epochs), the result improves substantially (Fig. 11b), but the network still confuses classes, especially the components that are very similar, where the accuracy does not exceed 60%. It can be concluded that the level of detection of this architecture is quite poor. Even if the classes were reduced to those with the greatest differences, the network would still confuse them and yield a low level of detection.
Fig. 9 VGG architecture
Fig. 10 ResNet architecture
Figure 12a, b shows the accuracy of the VGG CNN. Figure 12a indicates that the short training gives low accuracy, as the success rate does not exceed 60% in the classes, but it is an improvement compared to the previous architecture. In this case, as shown in Fig. 12b, there is a substantial improvement between the two training sessions. The network does not confuse nuts and bolts, but still confuses very similar classes, and in the end, the level of accuracy is slightly higher than in the previous architecture. It can be concluded that the level of detection of this architecture is similar to that of the previous architecture. However, there is a significant improvement in the differentiation of classes that are significantly different from each other. The accuracy of the ResNet CNN architecture is analysed below. The improvement in performance over the previous architectures is evident. In short training, the network successfully distinguishes between nuts and bolts and shows confusion between similar classes, as shown in Fig. 13a. However, Fig. 13b shows that this problem is corrected in the long training, where the accuracy of the network per class reaches 90%. The accuracy of each class as a function of the neural network and its training type is shown in Fig. 14. It can be observed that the networks have difficulties in classifying the screws and nuts. They tend to confuse them since their shapes are similar and they differ only in size. From this graph, it is concluded that the network that offers the best result is ResNet with long training. It can be observed that the 250 mm bolt has a significantly higher accuracy in all classifiers, because its darker colour is easily distinguishable from the rest. Finally, the overall accuracy level of each network is shown in Fig. 15.
Each neural network improves its accuracy with longer training, and the ResNet CNN shows the greatest robustness and best general behaviour for this specific task.
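The per-class and overall accuracies compared in this section can be computed as in the following sketch. The labels and predictions are hypothetical and are not the results reported in the figures; the function name is also an assumption.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified images for each true class."""
    totals, hits = Counter(true_labels), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical test images: similar nuts are confused, the M8 screw is not.
truth = ["M6 nut"] * 4 + ["M8 nut"] * 4 + ["250 mm M8 screw"] * 2
preds = ["M6 nut", "M6 nut", "M8 nut", "M6 nut",
         "M8 nut", "M6 nut", "M8 nut", "M8 nut",
         "250 mm M8 screw", "250 mm M8 screw"]

accuracy = per_class_accuracy(truth, preds)
overall = sum(t == p for t, p in zip(truth, preds)) / len(truth)
```

Reporting both values matters here because a high overall accuracy can hide a low accuracy on the classes that differ only in size.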
Fig. 11 a AlexNet CNN comparison. 30 epochs. b AlexNet CNN comparison. 300 epochs (bar charts of accuracy, 0–100%, for each component class: M6 nut, M8 nut, 250 mm M8 screw, 200 mm M6 screw, 350 mm M6 screw, 300 mm M6 screw, 250 mm M6 screw)
5 Conclusions
A comparison of different models has been carried out to propose a solution to a problem of classification of components in an image. Each neural network has improved its accuracy with longer training, with the ResNet CNN generally showing better performance than the others. For the same training image data set, the rest of the classifiers have shown a similar precision to each other, while the ResNet CNN was the architecture that delivered the best results with the right training. This architecture classifies the proposed data set efficiently and significantly better than the other architectures compared, which has been crucial in distinguishing between similar parts.
Fig. 12 a VGG CNN comparison. 30 epochs. b VGG CNN comparison. 300 epochs (panel titles: VGG 30 EPOCHS, VGG 300 EPOCHS; accuracy axis from 0% to 100%; classes: M6 nut, M8 nut, 250mm M8 screw, 350mm M6 screw, 300mm M6 screw, 250mm M6 screw, 200mm M6 screw)
This network has proven to detect better than the rest, especially for metallic components whose surfaces vary in roughness and brightness. This can drastically reduce false positives and false negatives, which in turn can reduce costs in any defect detection process. The mentioned hybrid system proposes a novel technique for applying cascade networks, allowing robust and efficient detection at a low computational cost. Implementing this architecture in industrial environments will allow industrial processes to be accelerated and costs to be reduced. A future line of development of this work is to implement a neural network in a real production environment, to create an even more robust system that can cope with unfavourable situations such as changes in lighting and dust. One of the limitations of this work is that the images were obtained in a controlled environment; therefore, the results are expected to vary in real working
Fig. 13 a ResNet CNN comparison. 30 epochs. b ResNet CNN comparison. 300 epochs (panel titles: RESNET 30 EPOCHS, RESNET 300 EPOCHS; accuracy axis from 0% to 100%; classes: M6 nut, M8 nut, 250mm M8 screw, 350mm M6 screw, 300mm M6 screw, 250mm M6 screw, 200mm M6 screw)
conditions with new variables. For these cases, it is intended to add new sets of images to the existing image base. This will make it possible to adjust the network to a wide range of situations, such as changes in lighting or the appearance of dirt or oil. In these cases, the choice of the best-performing model should be re-evaluated for the new working conditions.
Fig. 14 Summary of per-class precision (series: ALEXNET-30, ALEXNET-300, VGG-30, VGG-300, RESNET-30, RESNET-300; classes: M6 nut, M8 nut, 250mm M8 screw, 200mm M6 screw, 350mm M6 screw, 300mm M6 screw, 250mm M6 screw; precision axis from 0% to 100%)
Fig. 15 Overall precision (bars: ALEXNET-30, ALEXNET-300, VGG-30, VGG-300, RESNET-30, RESNET-300; precision axis from 0% to 100%)
Acknowledgements The authors are thankful to Defta Spain SLU company.
Proposal of an Efficient Encryption Algorithm for Securing Communicated Information
Jouri Alharbi, Shatha Alluhaybi, Rahaf Alsaiari, and Raja Aljalouji
Abstract Cryptography protects data in storage, in transmission, and in processing. Communication channels are susceptible to security threats and may be compromised by attackers who can breach the privacy of confidential data. Most traditional symmetric algorithms use a single key to encrypt the data; if this key is compromised, the data is no longer secure. This paper proposes a new symmetric block cipher encryption algorithm that uses multiple keys instead of a single one, so that compromising one or more keys is not enough: the complete set of keys is required for a successful attack. The strength of the proposed algorithm relies on massive built-in database sets whose entries are represented as keys. The program picks from these keys based on random measures. The algorithm is programmed in Python, and a performance comparison with another symmetric algorithm, the Advanced Encryption Standard (AES), is provided, which shows promising results.
Keywords Cryptography · Symmetric · Algorithm · Keys · Block cipher · Security · Securing communication · Python
1 Introduction
Cryptography is the study of protecting data by transforming it, using complicated mathematical equations, into a format that is incomprehensible to unintended recipients [1]. Cryptography is used for data transport and storage, and to solve the challenges of information and data exchange [2]. Improper algorithms, weak encryption methods, and weak key handling can produce weak and vulnerable cryptographic algorithms. These vulnerabilities could expose sensitive data and information to attackers [3]. The main focus of this paper is to introduce a new symmetric block cipher encryption algorithm with new techniques and operations that encrypt and decrypt data efficiently and securely, including the use of multiple keys instead of a single one. Furthermore, it generates different ciphers for identical plaintext characters. It also uses a huge database instead of fixed substitution boxes, permutation boxes, shift rows, and maximum distance separable (MDS) code matrices. Additionally, the algorithm implements a random replacement of selected bits to generate secure ciphertexts.
J. Alharbi (B) · S. Alluhaybi · R. Alsaiari · R. Aljalouji Department of Cyber Security and Forensic Computing, University of Prince Mugrin, Madinah, Saudi Arabia e-mail: [email protected] R. Aljalouji e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_3
2 Related Work
Zhdanov and Sokolov [4] proposed an algorithm based on the principles of many-valued logic and variable block length. The encryption process is repeated for five rounds, although the number of rounds is not fixed and can be varied. The first round includes permutation and gamma procedures, while the remaining rounds consist of gamma and substitution procedures. The proposed algorithm can process binary data after representing it as ternary vectors. However, the concept of directly converting binary vectors to ternary vectors has not yet been developed.
Agha et al. [5] presented a symmetric key algorithm that uses two general encryption techniques: one for key generation and the other for cipher generation. The proposed key generation technique uses ASCII codes, XOR, and modulo operations. Cipher generation uses the same operations in addition to rotation 13 (ROT13). The algorithm's primary drawback is that the ciphertext is twice as large as the plaintext.
Bathla et al. [6] introduced two encryption algorithms based on ASCII characters and mathematical processes. The first algorithm produces the ciphertext by multiplying the key by two and then adding it to the ASCII characters. The second algorithm replaces multiplication with division. These algorithms are efficient in terms of time, but they may be readily broken because of the lack of random key security.
Qazi et al. [7] designed two different symmetric cryptography algorithms in their article. Both algorithms use ASCII characters and a numeric key that doubles for each sequential character. The first proposed algorithm uses 2's complement and right shifting, whereas the second algorithm employs a large reverse permutation step. So far, these algorithms are limited to 16 bits or 32 bits.
In addition, the generated ciphertext is quite long because each plaintext character is represented by more than four characters.
Usman et al. [8] presented a hybrid lightweight symmetric algorithm called SIT. This algorithm combines a uniform substitution–permutation network with a Feistel structure. The user enters a 64-bit key, which is used as the input of the key expansion block. The block performs substantial operations and generates five unique keys. However, this algorithm is limited to five rounds. Additionally, there is no detailed cryptanalysis of possible attacks, nor has a performance evaluation been performed yet.
Labbi et al. [9] proposed the Hybrid Symmetric Volatile Encryption Algorithm, which uses an intermediate Cipher Text
Block (ICT) to produce a dynamic key used for each data encryption block. This algorithm includes both a block cipher and a stream cipher. Furthermore, this hybrid algorithm contains key creation based on a stream cipher and two rounds of encryption using simple XOR operations. Each block of data is 128 bits, and each block is encrypted with a different key created by a stream cipher scheme. These keys are produced dynamically, with an integrity check included using the intermediate ciphertext. However, this technique is not yet available in a real-life environment.
Novelan et al. [10] introduced an SMS security system for mobile devices using the Tiny Encryption Algorithm (TEA). This block cipher employs the Feistel network technique and uses a block size of 64 bits and a key length of 128 bits. The key is split into four keys, each 32 bits long. This algorithm uses a one-time pad, and it has 32 rounds. The main issue with this algorithm is that, since it uses a one-time pad, it is inefficient for large files, as the key will be as large as the file.
Mathur et al. [11] presented a symmetric algorithm that uses ASCII values. This algorithm takes the ASCII value of each plaintext character and stores these values in an array called asciicontent. It then finds the minimum value in this array and performs a modulo operation. The user enters a string that is used as a secret key to encrypt the plaintext. The system stores the ASCII value of each character of the entered key in an array called asciikey, finds the minimum value in this array, and again performs a modulo operation. The binary results of this modulo operation go through n right circular shifts, where n is the length of the input. The primary drawback of this algorithm is that it only works when the plaintext and key lengths are the same.
Besides, if this algorithm is applied to a file, the key must be as long as the file, so for large files the key becomes equally large.
Rachmawati et al. [12] designed a hybrid cryptosystem for securing plaintext files that combines symmetric and asymmetric algorithms. In the hybrid cryptosystem, the data is protected using a symmetric algorithm, the Tiny Encryption Algorithm (TEA), while the symmetric key is protected using an asymmetric algorithm based on the Lucas sequence, known as the LUC algorithm. TEA splits the plaintext into two blocks, each with 32 bits of data. The 128-bit key is then divided into four blocks, each with 32 bits of data. After that, mathematical equations are performed to encrypt the data. Finally, the key is encrypted with the LUC public key and transmitted to the receiver along with the ciphertext. However, this hybrid cryptosystem uses the same key for all the encryption rounds, which reduces the algorithm's security.
Kako et al. [13] proposed a symmetric algorithm that uses two main techniques, horizontal and vertical, to encrypt the plaintext, depending on two concepts: the Spiral Matrix and the reverse Row-Wise matrix. The encryption block divides the plaintext into pairs of characters, and the concepts of the reverse Row-Wise matrix and the Spiral Matrix determine the position of these two characters in English alphabet order. The two generated characters are converted to binary and then into a 4 × 4 matrix. This matrix is divided into four sections: A, B, C, and D. The sections are combined in pairs and XORed with each other (AD XOR BC) to generate the ciphertext. The disadvantages are that this algorithm only accepts
16 bits per block, which is very small and insecure, and that it can only be used with English letters, since each character is 8 bits.
3 The Proposed Algorithm
The proposed symmetric block cipher encryption algorithm is named JSR, after the authors' initials. It contains four components: the permutation operation, the mixing operation, the substitution operation, and the unique operation. These four components depend on four database sets: the permutation array database, the mixing array database, the substitution sequence database, and the unique sequence database. Each chosen array or sequence is represented as a key. The program itself chooses each key at random, depending on various random measures, to make the keys unpredictable. The algorithm has two modes of encrypting the data: one for regular use and one for sensitive use. The first mode accepts blocks of 128 bits and performs the four operations for ten rounds. The second mode accepts blocks of 200 bits and performs the same four operations for 15 rounds, to achieve higher security for sensitive data. Each operation is associated with a key that links to an index in the databases. The size of the key is the same as that of the data block: 128 bits in the first mode and 200 bits in the second. Since each round uses one key per operation, the ten rounds of the first mode use 40 keys and the 15 rounds of the second mode use 60 keys. The keys are sent as digits, which makes the algorithm lightweight. Figure 1 shows how the encryption and key generation processes are carried out through the four main operations until the ciphertext is generated.
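The key counts of the two modes follow from using one key per operation in every round. The short Python sketch below only illustrates that bookkeeping; the names `Mode`, `REGULAR`, and `SENSITIVE` are assumptions made for this example and are not taken from the authors' program.

```python
from dataclasses import dataclass

# The four operations applied in every round, in the order given in the text.
OPERATIONS = ("permutation", "mixing", "substitution", "unique")


@dataclass(frozen=True)
class Mode:
    """Parameters of one encryption mode (illustrative names, not the authors')."""
    name: str
    block_bits: int
    rounds: int

    @property
    def block_bytes(self) -> int:
        # Number of 8-bit characters in one block.
        return self.block_bits // 8

    @property
    def keys_needed(self) -> int:
        # One key per operation per round.
        return self.rounds * len(OPERATIONS)


REGULAR = Mode("regular", block_bits=128, rounds=10)
SENSITIVE = Mode("sensitive", block_bits=200, rounds=15)

for mode in (REGULAR, SENSITIVE):
    print(mode.name, mode.block_bytes, "bytes per block,", mode.keys_needed, "keys")
```

Under these assumptions a regular block holds 16 characters and a sensitive block holds 25.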
4 Algorithm's Operations
4.1 Permutation Operation
The first operation is the permutation operation, which depends on a database called the permutation array database. This database is a set of random arrangements of the numbers 1 to 16. The array chosen from the database determines the new index of each character of the plaintext, so the operation moves the plaintext characters to new positions depending on the chosen key. This operation produces the first output. Figure 2 shows a few arrangements of the numbers from 1 to 16.
Fig. 1 Overview of the proposed algorithm
Fig. 2 Permutation array database example
Figure 3 shows how Array 4 assigns a new index position to each plaintext character.
Fig. 3 Example of the permutation operation
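The permutation operation can be sketched in a few lines of Python. This is a minimal illustration, assuming that entry i of the chosen array is the new 1-based position of character i; the function names and the example array are hypothetical and do not come from the authors' database.

```python
def permute(block: str, perm: list[int]) -> str:
    """Move the character at position i to position perm[i] (positions are 1-based)."""
    if sorted(perm) != list(range(1, len(block) + 1)):
        raise ValueError("perm must contain each position 1..len(block) exactly once")
    out = [""] * len(block)
    for i, target in enumerate(perm):
        out[target - 1] = block[i]
    return "".join(out)


def unpermute(block: str, perm: list[int]) -> str:
    """Undo permute(): read each character back from the position it was moved to."""
    return "".join(block[target - 1] for target in perm)


# Hypothetical database entry: send position 1 to 16, position 2 to 15, and so on.
reverse_array = list(range(16, 0, -1))
scrambled = permute("Capstone Project", reverse_array)
restored = unpermute(scrambled, reverse_array)
print(scrambled)
print(restored)
```

The same array serves for both directions, which is why only its index needs to be shared as a key.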
4.2 Mixing Operation
The second operation is the mixing operation, which depends on a database called the mixing array database. This database is a set of random combinations of ASCII characters. The array chosen from the database is XORed with the previous operation's output. Figure 4 shows a few combinations of ASCII characters. There are 256 possible 8-bit (extended ASCII) characters, and each array can contain any 16 of them. This produces a massive number of possible combinations, so guessing the chosen array is very challenging. Figure 5 shows how Array 3 is XORed with each character of the previous output.
Fig. 4 Mixing array database example
Fig. 5 Example of the mixing operation
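As a rough illustration of the mixing step, the sketch below XORs a block with an array of the same length, byte by byte. The function name `mix` and the example array are assumptions for this example. With 256 values per position and 16 positions, there are 256^16 = 2^128 possible arrays.

```python
def mix(block: bytes, array: bytes) -> bytes:
    """XOR each byte of the block with the byte at the same position in the array."""
    if len(block) != len(array):
        raise ValueError("block and mixing array must have the same length")
    return bytes(b ^ k for b, k in zip(block, array))


# Hypothetical 16-byte mixing array; a real one would come from the mixing array database.
example_array = bytes(range(0x10, 0x20))
mixed = mix(b"Capstone Project", example_array)
unmixed = mix(mixed, example_array)  # XOR with the same array restores the input
print(mixed.hex())
print(unmixed)
```

Because XOR is its own inverse, decryption reuses the same function with the same array.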
4.3 Substitution Operation
The third operation is the substitution operation, which depends on a database called the substitution sequence database. Each sequence in this database is a random selection of bits to replace; each selected bit is replaced by its opposite, so a 0 becomes 1 and vice versa. The chosen sequence applies this change to the previous operation's output. Figure 6 shows a few sequences of selected bit replacements. The hyphen marks which bit will be replaced. Even if only one bit of a character is flipped, a new character is generated, which makes deciphering the data difficult. Figure 7 shows how Sequence 1 transforms the characters of the previous output into new characters.
Fig. 6 Substitution sequence database example
Fig. 7 Example of the substitution operation
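Flipping selected bits can be sketched as an XOR with a bit mask. The example below assumes, purely for illustration, that a sequence is written as eight symbols per character, with a hyphen marking a bit to flip and a dot marking a bit to keep, and that the same sequence is applied to every character; the exact notation of the authors' database is not reproduced here.

```python
def pattern_to_mask(pattern: str) -> int:
    """Turn a pattern such as '......--' into a bit mask (hyphen = flip this bit)."""
    if len(pattern) != 8 or set(pattern) - {"-", "."}:
        raise ValueError("pattern must be 8 symbols, each '-' (flip) or '.' (keep)")
    mask = 0
    for symbol in pattern:  # leftmost symbol is the most significant bit
        mask = (mask << 1) | (1 if symbol == "-" else 0)
    return mask


def substitute(block: bytes, pattern: str) -> bytes:
    """Flip the selected bits in every byte of the block."""
    mask = pattern_to_mask(pattern)
    return bytes(b ^ mask for b in block)


flipped = substitute(b"A", "......--")  # 01000001 becomes 01000010
print(flipped)
```

Applying the same pattern a second time flips the bits back, so the operation needs no separate inverse.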
4.4 Unique Operation
The fourth operation is the unique operation, which depends on a database called the unique sequence database. Each sequence in this database is a unique replacement of selected bits. This operation is similar to the previous one; the difference is that every 8-bit group receives its own, unique change of selected bits. Thus, if repeated characters appear in the previous output, this operation generates a different output for each one. The chosen sequence applies this unique change of selected bits to the previous operation's output. Figure 8 shows a few sequences of unique selected bit replacements. The hyphen marks which bit will be replaced. Figure 9 shows how Sequence 2 transforms identical characters of the previous output into different characters.
Fig. 8 Unique sequence database example
Fig. 9 Example of the unique operation
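One way to picture the unique operation is as a bit flip whose mask changes from one character position to the next. The sketch below assumes that a sequence can be represented as a list of distinct 8-bit masks, one per position; this representation and the name `unique_transform` are illustrative assumptions.

```python
def unique_transform(block: bytes, masks: list[int]) -> bytes:
    """Flip a different set of bits at each byte position."""
    if len(masks) != len(block):
        raise ValueError("one mask is needed per byte of the block")
    if len(set(masks)) != len(masks):
        raise ValueError("masks must all differ so identical bytes map to different outputs")
    return bytes(b ^ m for b, m in zip(block, masks))


# Two identical input characters receive different masks and so become different outputs.
result = unique_transform(b"AA", [0x01, 0x02])
print(result)
```

As with the substitution step, applying the same masks again restores the input.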
The four operations are repeated for either 10 or 15 rounds, depending on the mode. Each operation needs a key, which the program chooses at random and which is associated with one of the four databases. After that, the ciphertext is generated along with 40 keys (60 keys in the second mode).
5 Example
5.1 Key Generation Process
The proposed program contains massive built-in databases; each array or sequence in a database is represented as a key. The program randomly picks the keys (K). These keys are sent along with the ciphertext for the decryption process.
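Key picking can be sketched as drawing one random index per database for every round. The example below is only an assumption about how this might look: it uses Python's standard `secrets` module as a stand-in for the "random measures" mentioned in the paper, and the database sizes are invented for illustration.

```python
import secrets

# Hypothetical number of entries in each built-in database.
DATABASE_SIZES = {
    "permutation": 1000,
    "mixing": 1000,
    "substitution": 1000,
    "unique": 1000,
}


def pick_round_keys() -> dict[str, int]:
    """Pick one random database index (a key) for each of the four operations."""
    return {name: secrets.randbelow(size) for name, size in DATABASE_SIZES.items()}


def pick_all_keys(rounds: int) -> list[dict[str, int]]:
    """Pick the keys for every round; ten rounds give 40 keys in total."""
    return [pick_round_keys() for _ in range(rounds)]


keys = pick_all_keys(rounds=10)
print(sum(len(round_keys) for round_keys in keys), "keys picked")
```

Since each key is just an index, the full key set for a message is a short list of digits.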
5.2 Encryption Process
The steps below explain the encryption process for one round:
• Step 1: Input the plaintext → Capstone Project.
• Step 2: The program randomly picks K1.
• Step 3: The chosen K1 is associated with an array from the permutation array database.
• Step 4: The chosen array determines the new index of each character of the plaintext.
• Step 5: Output 1 is generated (Fig. 10).
• Step 6: The program randomly picks K2.
• Step 7: The chosen K2 is associated with an array from the mixing array database.
• Step 8: The chosen array is XORed with output 1.
Fig. 10 Example of the permutation operation encryption
Fig. 11 Example of the mixing operation encryption
• Step 9: Output 2 is generated (Fig. 11).
• Step 10: The program randomly picks K3.
• Step 11: The chosen K3 is associated with a sequence from the substitution sequence database.
• Step 12: The chosen sequence performs an opposite change of bits on output 2.
• Step 13: Output 3 is generated (Fig. 12).
• Step 14: The program randomly picks K4.
• Step 15: The chosen K4 is associated with a sequence from the unique sequence database.
• Step 16: The chosen sequence performs a unique transfer of selected bits on output 3.
• Step 17: The ciphertext is generated (Fig. 13).
Fig. 12 Example of the substitution operation encryption
Fig. 13 Example of the unique operation encryption
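The seventeen steps of one encryption round can be sketched as follows. This is a toy illustration under stated assumptions: the four databases hold only two invented entries each, the keys are passed in as indices rather than picked at random so that the result is reproducible, and bit-flip sequences are stored as masks.

```python
# Toy stand-ins for the four built-in databases (assumed contents, two entries each).
PERMUTATION_DB = [
    list(range(16, 0, -1)),                                    # reverse the block
    [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15],   # swap neighbours
]
MIXING_DB = [bytes(range(16)), bytes([0xA5] * 16)]
SUBSTITUTION_DB = [0b00001111, 0b10000001]                     # same flip mask for every byte
UNIQUE_DB = [list(range(1, 17)), list(range(17, 33))]          # a different mask per byte


def encrypt_round(block: bytes, k1: int, k2: int, k3: int, k4: int) -> list[bytes]:
    """Return [output 1, output 2, output 3, ciphertext] for one round."""
    if len(block) != 16:
        raise ValueError("this sketch handles exactly one 16-byte block")
    # Steps 2-5: permutation, character i moves to position perm[i] (1-based).
    out1 = bytearray(16)
    for i, target in enumerate(PERMUTATION_DB[k1]):
        out1[target - 1] = block[i]
    # Steps 6-9: mixing, XOR with the chosen array.
    out2 = bytes(b ^ m for b, m in zip(out1, MIXING_DB[k2]))
    # Steps 10-13: substitution, flip the same selected bits in every byte.
    out3 = bytes(b ^ SUBSTITUTION_DB[k3] for b in out2)
    # Steps 14-17: unique operation, flip different bits at each position.
    cipher = bytes(b ^ m for b, m in zip(out3, UNIQUE_DB[k4]))
    return [bytes(out1), out2, out3, cipher]


steps = encrypt_round(b"Capstone Project", k1=0, k2=1, k3=0, k4=0)
for number, output in enumerate(steps, start=1):
    print(number, output.hex())
```

A full encryption would feed the ciphertext of one round into the next, with fresh keys, for 10 or 15 rounds.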
5.3 Decryption Process
The steps below explain the decryption process for one round. The received keys are entered in reverse order to recover the plaintext.
• Step 1: Input the ciphertext.
• Step 2: Input K4.
• Step 3: K4 is associated with a sequence from the unique sequence database.
• Step 4: The chosen sequence performs a unique transfer of selected bits on the ciphertext.
• Step 5: Output 1 is generated (Fig. 14).
Fig. 14 Example of the unique operation decryption
Fig. 15 Example of the substitution operation decryption
• Step 6: Input K3.
• Step 7: K3 is associated with a sequence from the substitution sequence database.
• Step 8: The chosen sequence performs an opposite change of bits on output 1.
• Step 9: Output 2 is generated (Fig. 15).
• Step 10: Input K2.
• Step 11: K2 is associated with an array from the mixing array database.
• Step 12: The chosen array is XORed with output 2.
• Step 13: Output 3 is generated (Fig. 16).
• Step 14: Input K1.
• Step 15: K1 is associated with an array from the permutation array database.
• Step 16: The chosen array determines the original index of each character of output 3.
• Step 17: The plaintext is returned (Fig. 17).
Fig. 16 Example of the mixing operation decryption
Fig. 17 Example of the permutation operation decryption
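Decryption applies the same four operations in reverse key order. The sketch below assumes the key material (array, masks) has already been looked up from the databases; all values and function names are illustrative. The three bit-level steps are their own inverses, while the permutation is undone by reading each character back from the position it was moved to.

```python
def encrypt_round(block: bytes, perm: list[int], mixing: bytes,
                  sub_mask: int, unique_masks: list[int]) -> bytes:
    """One round: permutation, then mixing, substitution, and unique bit flips."""
    moved = bytearray(len(block))
    for i, target in enumerate(perm):
        moved[target - 1] = block[i]
    return bytes(b ^ m ^ sub_mask ^ u for b, m, u in zip(moved, mixing, unique_masks))


def decrypt_round(cipher: bytes, perm: list[int], mixing: bytes,
                  sub_mask: int, unique_masks: list[int]) -> bytes:
    """Undo one round by applying the keys in reverse order (K4, K3, K2, K1)."""
    out1 = bytes(c ^ u for c, u in zip(cipher, unique_masks))  # undo unique operation
    out2 = bytes(b ^ sub_mask for b in out1)                   # undo substitution
    out3 = bytes(b ^ m for b, m in zip(out2, mixing))          # undo mixing
    return bytes(out3[target - 1] for target in perm)          # undo permutation


# Illustrative key material: rotate positions by one, constant mixing array, two mask sets.
PERM = list(range(2, 17)) + [1]
MIXING = bytes([0xA5] * 16)
SUB_MASK = 0b00001111
UNIQUE_MASKS = list(range(1, 17))

ciphertext = encrypt_round(b"Capstone Project", PERM, MIXING, SUB_MASK, UNIQUE_MASKS)
recovered = decrypt_round(ciphertext, PERM, MIXING, SUB_MASK, UNIQUE_MASKS)
print(ciphertext.hex())
print(recovered)
```

For a multi-round message, the receiver would call the decryption round once per round, starting with the keys of the last round.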
Fig. 18 Example of plaintext
6 Algorithm's Implementation
The proposed algorithm is programmed in Python. Figure 18 shows the plaintext file used. Figure 19 shows the Python program, which takes the plaintext file as input and generates random keys to encrypt it. Figure 20 shows two encryption rounds of the program, together with each operation's output. The program encrypts for ten rounds. Figure 21 shows the final round of encryption and the generation of the ciphertext file (Fig. 22).
7 Experiments and Results
The analysis measures the performance of the algorithm's code in terms of speed, central processing unit (CPU) usage, CPU load, memory usage, and memory profile, and applies the same measurements to a pure-Python Advanced Encryption Standard (AES) implementation for comparison. The AES source code was found on GitHub [14]. The measurements were made in Kali Linux in a virtual machine using PyCharm,
Fig. 19 Python inputs window of the proposed algorithm
Fig. 20 Encryption output of the proposed algorithm program
Fig. 21 Final encryption output of the proposed algorithm program
Fig. 22 Ciphertext file
Table 1 Used device specifications
Device name: HP Spectre x360 convertible 13-ac0xx
Processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz 2.90 GHz
Installed random access memory (RAM): 8.00 gigabyte (7.88 gigabyte usable)
System type: 64-bit operating system, x64-based processor
Table 2 Algorithms' test results in encryption
Test (encryption) | AES | JSR | Tool
Speed/time | 0.545 s | 0.545 s | cProfile (built-in Python profiling module, C extension)
CPU usage | 3.2% | 46.9% | Process and system utilities (psutil) cross-platform library
CPU load | None | 49% | Operating system (OS) Python module and psutil
Memory usage | 89% | 84.65% | psutil
Memory_Profiler | 14.72 MB | 19.4 MB | Memory_Profiler Python module
which is a professional integrated development environment. Table 1 lists the specifications of the device used for testing. Tables 2 and 3 show the test results of the two algorithms in encryption and decryption.
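A measurement of this kind can be approximated with the Python standard library alone. The sketch below is not the authors' test harness: it uses `cProfile` for timing, as in the paper, but substitutes the standard `tracemalloc` module for psutil and Memory_Profiler, and the function `toy_cipher` is a placeholder workload invented for the example.

```python
import cProfile
import pstats
import tracemalloc


def measure(func, *args):
    """Run func(*args) and return (result, profiled seconds, peak traced bytes)."""
    profiler = cProfile.Profile()
    tracemalloc.start()
    result = profiler.runcall(func, *args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    total_seconds = pstats.Stats(profiler).total_tt
    return result, total_seconds, peak_bytes


def toy_cipher(data: bytes) -> bytes:
    """Placeholder workload: XOR every byte with a constant."""
    return bytes(b ^ 0x5A for b in data)


output, seconds, peak = measure(toy_cipher, b"x" * 100_000)
print(f"time: {seconds:.4f} s, peak memory: {peak / 1024:.1f} KiB")
```

Running both implementations through the same `measure` function, with the same input file, keeps the comparison like for like.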
8 Conclusion
Today, most of us use cyberspace to communicate and transfer data at our own risk. All cyberspace users across the world should be concerned about information security because hackers may be able to access information in cyberspace [15]. Modern encryption is required to provide protection and guarantee that information
Table 3 Algorithms' test results in decryption
Test (decryption) | AES | JSR | Tool
Speed/time | 0.235 s | 0.415 s | cProfile (built-in Python profiling module, C extension)
CPU usage | 3.45% | 18.7% | Process and system utilities (psutil) cross-platform library
CPU load | None | 56.9% | Operating system (OS) Python module and psutil
Memory usage | 89% | 84.8% | psutil
Memory_Profiler | 14.6 MB | 19.6 MB | Memory_Profiler Python module
transferred in cyberspace stays intact and safe. Cryptography algorithms must be improved in order to give the highest level of security to legitimate Internet users [15]. This paper presented a new symmetric block cipher encryption algorithm that uses various efficient techniques for the secure encryption of data, including the following:
• Using multiple keys instead of one.
• Two modes of encryption, chosen according to the sensitivity of the communicated data, with different numbers of keys, block sizes, and numbers of rounds.
• Huge built-in databases instead of fixed tables and matrices.
• Random selection of keys that depends on random measures.
• Two techniques of unique replacement of selected bits.
• Generating different ciphers for identical plaintext characters.
• Implementing padding for short plaintext blocks.
• Lightweight keys that can be encrypted using asymmetric algorithms.
Moreover, the performance of the encryption algorithm has been tested and compared with that of the Advanced Encryption Standard (AES) algorithm in terms of speed, CPU usage, CPU load, and memory usage, which shows promising results.
9 Future Work
Future work on the project is to verify the soundness of the algorithm using formal methods of verification. In addition, proofs would be presented to show that the algorithm is free from security flaws. For better security, the proposed encryption algorithm would be extended with the following additional phases:
• Generating random numbers dependent on different measures: time and date, CPU usage, mouse movements, fan noise, and other unpredictable measurements.
• A huge built-in set of databases.
• The use of asymmetric algorithms, hash functions, and digital signatures to secure the transfer of the symmetric keys.
• Secure storage of the keys.
• Clearance of all symmetric keys from the sender's computer memory after the plaintext has been encrypted.
• Clearance of all encryption outputs before a new plaintext encryption starts, to free used space and improve machine performance.
• Ensuring that the algorithm can work with different languages.
References
1. Fruhlinger J (2020) What is cryptography? How algorithms keep information secret and safe. https://www.csoonline.com/article/3583976/what-is-cryptography-how-algorithms-keep-information-secret-and-safe.html/. Accessed on 19 Oct 2021
2. Murtaza A, Pirzada SJH, Jianwei L (2019) A new symmetric key encryption algorithm with higher performance. In: 2nd international conference on computing, mathematics and engineering technologies (iCoMET), pp 1–7
3. Owasp (2016) M5: insufficient cryptography. https://owasp.org/www-project-mobile-top-10/2016-risks/m5-insufficient-cryptography/. Accessed on 19 Oct 2021
4. Zhdanov ON, Sokolov AV (2016) Block symmetric cryptographic algorithm based on principles of variable block length and many-valued logic. Far East J Electron Commun 1–17
5. Agha D, Khan SA, Fakhruddin H, Rizvi HH (2017) Security enhancing by using ASCII values and substitution technique for symmetric key cryptography. Indian J Sci Technol 1–6
6. Bathla C, Kumar K (2018) Implementation of cipher text generation algorithm using ASCII values. Int J Res Appl Sci Eng Technol 1–9
7. Qazi F, Khan FH, Kiani KN, Ahmed S, Khan SA (2017) Enhancing the security of communication using encryption algorithm based on ASCII values of data. Int J Sec Appl 1–10
8. Usman M, Ahmed I, Aslam I, Khan S, Shah UA (2017) SIT: a lightweight encryption algorithm for secure internet of things. Int J Adv Comput Sci Appl (IJACSA), pp 1–10
9. Labbi Z, Senhadji M, Maarof A, Belkasmi M (2017) Symmetric encryption algorithm for RFID systems using a dynamic generation of key. Global J Comput Sci Technol ENetwork Web Sec 1–11
10. Novelan MS, Husein AM, Harahap M, Aisyah S (2018) SMS security system on mobile devices using tiny encryption algorithm. IOP Conf Ser J Phys 1–8
11. Mathur A, Riyaz A, Vyas J (2018) Encryption of text characters using ASCII values. Int J Eng Res Technol (IJERT) 1–4
12. Rachmawati D, Sharif A, Jaysilen AM (2018) A hybrid cryptosystem using tiny encryption algorithm and LUC algorithm. In: IOP conference series: materials science and engineering, pp 1–8
13. Kako NA, Sadeeq HT, Abrahim AR (2020) New symmetric key cipher capable of digraph to single letter conversion utilizing binary system. Indonesian J Electrical Eng Comput Sci 1–8
14. Kizhvatov I (2014) Pysca toolbox. Github, https://github.com/ikizhvatov/pysca/. Accessed on 15 Apr 2021
15. Gençoğlu MT (2019) Importance of cryptography in information security. IOSR J Comput Eng (IOSR-JCE) 1–5
Discernment of Unsolicited Internet Spamdexing Using Graph Theory
Apeksha Kulkarni, Devansh Solani, Preet Sanghavi, Achyuth Kunchapu, M. Vijayalakshmi, and Sindhu Nair
Abstract With the advent of technology, the spamming of web pages, a methodology by which imposter pages obtain a higher rank than true or genuine pages in a search engine's results, has become prevalent. It poses a major problem for search engines, making it essential for them to be able to identify spam web pages during crawling. One way to approach this problem is to consider the underlying web graph structure, that is, the directed URL links (edges) between different spam and genuine web hosts (nodes), together with the content attributes of each web page. Feeding this graph structure to an inductive graph neural network model for training enables the model to categorize new, previously unseen web hosts as web spam or genuine. A GraphSAGE model was developed for node classification; it leverages the web content attribute data to create node embeddings for entirely new, unseen web host nodes.
Keywords GraphSAGE · Node classification · Web graph · Web spam
A. Kulkarni (B) Vivekanand Education Society’s Institute of Technology, Mumbai, India e-mail: [email protected]
D. Solani Nagindas Khandwala College, Mumbai, India
P. Sanghavi · M. Vijayalakshmi · S. Nair Dwarkadas J. Sanghvi College of Engineering, Mumbai, India e-mail: [email protected]
A. Kunchapu Vellore Institute of Technology, Vellore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_4

1 Introduction
Spam is generally any medium that spreads irrelevant and useless information through various platforms, such as e-mails, social media sites, web content or even e-commerce reviews. Because of the enormous quantity of web pages available on the World Wide Web, users usually query search engines to find useful and important web pages. In recent years, many large search engine firms have identified adversarial information retrieval as a major priority, owing to the many negative consequences of spam and the novel difficulties that have emerged in the research sector [1]. Spam can degrade the quality of search engine results and deprive benign websites of profits that they might have earned had there been no spam platforms. In addition, an end-user's confidence in a search engine provider is drastically weakened, and it is very easy for the user to move to a different search engine, as the cost of switching between search sites is zero. Another major and very common issue with spam websites is that they expose users to malware, adult content and phishing attacks.
When the user types a query into the search bar, the search engine retrieves all the pertinent web pages and presents the user with hyperlinks to these pages, in batches of around ten hyperlinks. The user can then click on the resulting hyperlinks to access the web pages. This is how users commonly surf the Internet through search engines to retrieve the information they need on any subject. Website managers are motivated to rank or index their web pages higher within the search results, for the obvious reason of profiting from the large influx of traffic deriving from searches, which has a high probability of financial gain [2]. This practice came to be known as spamdexing or web spamming; it first appeared in 1996 [3] and subsequently became known as one of the most important hurdles for the search engine industry [4].
A notable portion of web page visits arises from links returned by search engines, and web users are in the habit of paying attention mostly to the top-ranked search results. Research statistics [5] showed that for 85% of queries, the user requests only the first page of results, and only the first few hyperlinks, in the range of 3–5 links [6], are clicked on. An increase in website traffic clearly brings monetary benefits, which motivates website managers to manipulate their search engine rankings. Such manipulation involves spam links that result in phishing attacks, irrelevant, vulgar or pornographic content on web pages, click fraud to increase website revenue or to exhaust a company's advertising budget, cloaking and tag spam [3]. It has been common practice to boost the rankings of web pages through search engine optimization methodologies, by revamping the web page content. There are many instances in which enormous numbers of web pages are created that link directly or indirectly to the target web pages. This came to be known as link stuffing, which makes it possible to improve the rankings of target spam pages in search engines that use link-based ranking techniques [7]. The keyword-stuffing technique has also been very popular: imposter web pages are filled with popular keywords and phrases that web surfers commonly search for. The main aim of such practices is for the page to be regarded by search engines as a pertinent and valid web page that satisfies the user's requirements [7].
2 Literature Review
Machine learning is one of the many techniques used to detect Internet spam. Because web spam strategies are always evolving, any categorization of these approaches should be considered provisional. There are, nevertheless, some important and stable principles to bear in mind [8]:
• Internet spamdexing detection is a classification problem. Search engines use machine learning algorithms to determine whether or not a page contains spam.
• Every successful spam technique targets one or more of the features used by the search engine's ranking algorithms.
• Spam is associated with statistical anomalies in some of the features that are visible to search engines.
Advancements in web spam recognition techniques have also been observed. Spam and nonspam pages differ statistically in many host attributes, and these differences are leveraged to classify spam and nonspam pages automatically. In this approach, a set of features is first proposed for spam pages; a classifier is then learned from these features, and search engines can use it to classify sites as spam or nonspam. Prieto et al. introduced SAAD, a system that uses web content to recognize spam on the World Wide Web; C4.5, boosting and bagging were used for classification in this technique [9]. Amitay et al. looked at categorization algorithms for determining a website's functionality. They discovered 31 clusters, each of which comprised a collection of web spam [10]. Karimpour et al. first applied PCA to reduce the number of features, and then used the EM-Naive Bayesian semi-supervised classification technique to detect web spam [11]. Ntoulas et al. approached web spam detection through content analysis [12]. To detect web spam, Tian et al.
introduced a machine learning mechanism that incorporated human ideas, beliefs, concepts and observations along with a semi-supervised algorithm [13]. To classify web spam, Becchetti et al. used link-based metrics such as TrustRank and PageRank [14]. To filter web spam, Rungsawang et al. used an ant colony algorithm; in comparison with SVM and decision trees, the findings showed that this method had a greater specificity and a lower Slip [15]. Silva et al. examined decision trees, SVM, KNN, LogitBoost and other classification techniques [16]. Castillo et al. used the C4.5 classifier to categorize web spam, taking into account link-based features and content analysis [17]. Dai et al. used two levels of classification to exploit temporal information: the first level uses several SVMlight classifiers, while the second uses logistic regression [18]. With the introduction of HITS and PageRank, and the success of web search engines in presenting optimized results with the help of connection-based ranking algorithms, web spammers attempted to alter the link structure in order to boost their own rankings. Page et al. proposed the PageRank algorithm in 1998, which incorporated a strategy that was regarded as among the most effective ways
to eliminate online clutter. In this system, links are not all given identical weights in establishing rank; rather, hyperlinks from high-ranking sites are worth more than links from lower-ranking sites. As a result, spammers' websites can rarely manipulate the criteria that determine their ranking. Because of this, the Google search engine has been preferred over the years [19].
Graph neural networks (GNNs) have become very popular recently. They are a category of deep learning techniques designed to perform inference on data described by graphs consisting of nodes and edges; applied directly to graphs, they provide an easy way to perform node-level, edge-level and graph-level prediction tasks. They are better suited to such data than convolutional neural networks (CNNs), as CNNs cannot capture the connectivity of graphs and their topology. A graph consists of nodes, which represent objects, and edges, which represent the connections or links between nodes. A very important concept in GNNs is the node embedding: graph nodes are mapped to a space of much lower dimension than that of the graph itself, such that nodes with similar characteristics are embedded close together. In contrast to a typical neural network, a GNN can gather information from neighbouring nodes at any depth around the current node [20]. We have therefore leveraged the salient characteristics of the GNN-based GraphSAGE algorithm to detect web spam.
3 Material and Experimental Data Processing
We evaluated the performance of our GraphSAGE model on WEBSPAM-UK2007 [21], a large public dataset intended mainly for web spam detection. The dataset comprises around one million hosts, of which approximately 4000 are labelled as 'spam', 'nonspam' or 'undecided' by a group of volunteers. We used these labelled hosts for our node classification problem (Fig. 1). The dataset also provides a set of web content features for each of the one million hosts. Since the GraphSAGE model is inductive and semi-supervised in nature, 20% of the host nodes were set aside to test the model, such that the model never comes across them during training. These host nodes are the 'unseen' nodes. The remaining 80% of the labelled nodes were split into train, validation and test sets, giving four sets in all. The web content features file consists of a set of 96 attributes for each web host. The most obvious features include the number of pages for each host and the length of the host name. The edges between the host nodes are the hyperlinks, which are provided along with the number of page-to-page connections between the source and destination hosts. The number of connections between the source node and the destination node is used as the edge weight in our web graph (Fig. 2).
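The four-way split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: only the 20% 'unseen' fraction comes from the text, while the validation and test fractions, the random seed and the function names `split_hosts` and `remove_unseen_edges` are assumptions made for the example.

```python
import random


def split_hosts(labelled_hosts, unseen_frac=0.2, val_frac=0.1, test_frac=0.1, seed=42):
    """Split labelled host IDs into train/validation/test sets plus an 'unseen' set.

    unseen_frac follows the 20% stated in the text; val_frac and test_frac
    (fractions of the remaining hosts) are assumed values for illustration.
    """
    hosts = list(labelled_hosts)
    random.Random(seed).shuffle(hosts)

    n_unseen = int(len(hosts) * unseen_frac)
    unseen, seen = hosts[:n_unseen], hosts[n_unseen:]

    n_val = int(len(seen) * val_frac)
    n_test = int(len(seen) * test_frac)
    return {
        "unseen": unseen,
        "validation": seen[:n_val],
        "test": seen[n_val:n_val + n_test],
        "train": seen[n_val + n_test:],
    }


def remove_unseen_edges(weighted_edges, unseen):
    """Drop every edge that touches an unseen host, so training never sees those nodes."""
    hidden = set(unseen)
    return [(s, d, w) for (s, d, w) in weighted_edges if s not in hidden and d not in hidden]


# Toy data: 4000 labelled host IDs and a few weighted edges
# (source host, destination host, number of page-to-page links).
splits = split_hosts(range(4000))
edges = [(0, 1, 3), (1, 2, 1), (2, 3, 7)]
train_edges = remove_unseen_edges(edges, splits["unseen"])
```

Removing the edges of the held-out hosts before training is what makes the later evaluation on them genuinely inductive: the model meets those nodes and their links only at prediction time.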
Fig. 1 Distribution of labelled spam and nonspam hosts in the dataset
Fig. 2 Subset of all the spam labelled nodes in the web graph dataset (WEBSPAM-UK2007) which represents the interconnectivity between the other spam host nodes
4 Model Workflow and Implementation
The model uses the processed data to extract knowledge from the URL relationships between the host nodes as well as from the web content features of these hosts. The main aim of the experimental analysis is to predict host node labels based on the links between different web hosts and on the web page attribute information. For a better understanding of the problem statement and the GraphSAGE model, the symbols used are listed in Table 1, and the implementation workflow is shown in Fig. 3. The following subsections break down the working of the GraphSAGE model.
Fig. 3 Implementation workflow
Table 1 Symbol representations
Notation         Description
G = (V, E, X)    The web graph network, where V is the set of nodes, E is the set of edges and X is the matrix of node attributes
|V|              The number of nodes in our web graph structure
|E|              The total number of edges or hyperlinks between the host nodes in the web graph
X                The feature attribute matrix for each of the nodes in V
L                The set of labels for the nodes in V
V_j              The jth node in G, V_j ∈ V
Z_i              Allocation of the labels of the jth node in G
4.1 Sampling
The fundamental idea behind GraphSAGE is to learn useful information from only a subsample of the neighbouring node attributes, instead of from the entire graph structure. After collecting and processing the data, we define the number of neighbouring nodes to be sampled at each depth layer of our model. We sample 20 adjacent host nodes in the first layer, followed by 10 neighbouring host nodes in the second layer. Instead of training a unique embedding for every node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighbourhood. Many previous studies have focused on obtaining node embeddings from a single fixed graph, whereas many real-world applications need node embeddings to be generated quickly and efficiently for unseen nodes or for entirely new graphs or subgraphs. This inductive capability is essential for high-throughput machine learning systems, which usually operate on continuously evolving graphs and encounter unseen nodes (e.g. social media posts, social media connections, streaming videos on YouTube). Node embeddings obtained inductively also generalize across graphs that share the same form of attributes: for instance, an embedding
generator can be trained on protein–protein interaction graphs obtained from a model organism, and the trained model can then easily generate node embeddings for data gathered on new or different organisms [22]. In this sense, we treat the 20% of hosts held out as 'unseen' nodes as if they had been set up by website managers and had appeared on the Internet shortly after we deployed our algorithm.
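The fixed-size neighbourhood sampling described in this subsection can be sketched as follows. The sample sizes of 20 and 10 are taken from the text; the adjacency-list representation, the choice to sample with replacement when a node has too few neighbours, and the function names are assumptions made for illustration.

```python
import random


def sample_neighbours(adjacency, node, num_samples, rng):
    """Return a fixed-size sample of `node`'s neighbours.

    When the node has fewer neighbours than requested, sampling is done with
    replacement so that every node still yields exactly `num_samples` entries.
    """
    neighbours = adjacency.get(node, [])
    if not neighbours:
        return []
    if len(neighbours) >= num_samples:
        return rng.sample(neighbours, num_samples)
    return [rng.choice(neighbours) for _ in range(num_samples)]


def sample_two_hop(adjacency, node, sizes=(20, 10), seed=0):
    """Sample 20 first-hop neighbours, then 10 neighbours of each sampled node."""
    rng = random.Random(seed)
    first_hop = sample_neighbours(adjacency, node, sizes[0], rng)
    second_hop = {u: sample_neighbours(adjacency, u, sizes[1], rng) for u in set(first_hop)}
    return first_hop, second_hop


# Toy web graph: host 0 links to hosts 1..30; each of those links back to 0 and to one other host.
toy_adjacency = {0: list(range(1, 31))}
for i in range(1, 31):
    toy_adjacency[i] = [0, (i % 30) + 1]

first_hop, second_hop = sample_two_hop(toy_adjacency, 0)
```

Because each node contributes at most a fixed number of neighbours per layer, the cost of embedding one node is bounded regardless of how many hyperlinks the host actually has.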
4.2 Aggregation
DeepWalk is an example of a transductive algorithm, which means that it requires access to the entire graph in order to learn a node's embedding. It is unsupervised in nature. As a result, whenever a new node is added to an existing graph, the process must be restarted in order to build an embedding for the new entrant. In this research, we use GraphSAGE, a dynamic graph representation learning technique. GraphSAGE can predict the embedding of a new node without retraining. To do so, GraphSAGE learns aggregator functions that can infer a new node's embedding from its attributes and its neighbourhood. This type of learning is called inductive learning. The aggregator takes the embeddings of a node's neighbours from the previous layer and combines them. Many operations can be used for aggregation, for instance simple averaging, pooling or an LSTM. The sampling procedure establishes a neighbourhood; we then define an approach for exchanging knowledge between adjacent nodes. The aggregator function takes a neighbourhood as input and combines the embeddings of the neighbouring nodes, along with weights, to generate an embedding of the neighbourhood [23]. In brief, the embedding of a node depends on the embeddings of the nodes it is connected to. For example, in the simple graph of Fig. 4, node 456 and node 457 have a higher probability of being similar than node 456 and node 687. This concept is implemented by first randomly assigning values to the node embeddings, and then updating each node's embedding using the
Fig. 4 Simple graph
Fig. 5 Input graph
Fig. 6 Final graph
mean of the embeddings of all nodes it is directly linked to. This concept is illustrated in Figs. 5 and 6. The cube that connects node 1788 with nodes 258, 736 and 827 represents a function applied to these connected nodes. This function can be sum, max, mean or any other operation. In our GraphSAGE model, we use the mean aggregator as the aggregator function. We now generalize the idea by using not only the neighbours of a node but also the neighbours of its neighbours. In the first step, each node's embedding is created using only its neighbours, as above; in the second step, these embeddings are used to generate new embeddings. This procedure is repeated on the graph, and new embeddings are generated for each aggregation layer. The process can be understood with the help of the equations below. Consider that there are a total of K aggregation layers. Initially, at time T = 0 (layer k = 0),

$$h_v^0 = x_v, \quad \forall v \in V \tag{1}$$

where $x_v$ is the attribute vector of node v (the corresponding row of X). For each of the K aggregation layers and every vertex v in V:

$$h_{N(v)}^k = \mathrm{AGGREGATE}_k\left(\left\{ h_u^{k-1},\ \forall u \in N(v) \right\}\right) \tag{2}$$
$h_{N(v)}^k$ now combines the embeddings of the adjacent host nodes of v, and thereby carries knowledge of the neighbouring nodes' attribute information. The core aggregation functions used in GraphSAGE are the mean aggregator, the pooling aggregator and the LSTM aggregator, and their mathematical formulas are as follows:
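$$\text{Mean: } \mathrm{AGG} = \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} \tag{3}$$

$$\text{Pool: } \mathrm{AGG} = \gamma\left(\left\{ Q h_u^{k-1},\ \forall u \in N(v) \right\}\right) \tag{4}$$

$$\text{LSTM: } \mathrm{AGG} = \mathrm{LSTM}\left(\left[ h_u^{k-1},\ \forall u \in \pi(N(v)) \right]\right) \tag{5}$$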
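As a rough illustration of Eqs. (3) and (4), the mean and pooling aggregators can be written in plain Python as below. The node IDs are those of the example in Figs. 5 and 6, but the embedding values and the matrix Q are invented for the example, and taking γ to be an element-wise maximum is an assumption, since the text does not fix γ.

```python
def mean_aggregate(neighbour_embeddings):
    """Eq. (3): element-wise mean of the neighbours' previous-layer embeddings."""
    n = len(neighbour_embeddings)
    dim = len(neighbour_embeddings[0])
    return [sum(h[d] for h in neighbour_embeddings) / n for d in range(dim)]


def pool_aggregate(neighbour_embeddings, Q):
    """Eq. (4), with gamma assumed to be an element-wise max over the neighbours."""
    # Apply the matrix Q to every neighbour embedding first.
    transformed = [
        [sum(q * x for q, x in zip(row, h)) for row in Q]
        for h in neighbour_embeddings
    ]
    return [max(t[d] for t in transformed) for d in range(len(Q))]


# Hypothetical previous-layer embeddings of the neighbours of node 1788.
h_prev = {258: [1.0, 0.0], 736: [0.0, 1.0], 827: [2.0, 2.0]}
neighbours_of_1788 = [258, 736, 827]

mean_vec = mean_aggregate([h_prev[u] for u in neighbours_of_1788])
pool_vec = pool_aggregate([h_prev[u] for u in neighbours_of_1788], Q=[[1.0, 0.0], [0.0, 1.0]])
```

Both aggregators are insensitive to the order of the neighbours, which is why they can be applied directly to an unordered neighbourhood; the LSTM aggregator of Eq. (5) instead needs a permutation π to impose an order.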
where the nodes $u \in N(v)$ are a sample of the neighbouring nodes: only a subsample of nodes can be considered for extremely large data sources, for instance social media data or e-commerce review data, where a node can have an enormous number of adjacent nodes. A potential issue is that two nodes, say node 1 and node 2, may have different embeddings initially but identical final embeddings once aggregation has been completed for all the aggregation layers, mostly because the two nodes share the same neighbours. GraphSAGE avoids this issue by passing only the neighbours of a node into the aggregating function and concatenating the resulting vector with the node's own vector, instead of feeding the node and its neighbours into the same aggregating function. This can be represented as:

$$h_v^k = \mathrm{CONCAT}\left(h_v^{k-1},\ \mathrm{AGGREGATE}_k\left(\left\{ h_u^{k-1},\ \forall u \in N(v) \right\}\right)\right) \tag{6}$$
The above equation can now be extended with a nonlinear activation function σ, which improves the approximation power of the network and makes it more expressive. Each layer has a weight matrix, denoted $W^{(k)}$. Since the above equation has no trainable parameters, the weight matrix is added to give the model parameters that it can learn from the data. This can be represented as follows:

$$h_v^k = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h_v^{k-1},\ \mathrm{AGGREGATE}_k\left(\left\{ h_u^{k-1},\ \forall u \in N(v) \right\}\right)\right)\right) \tag{7}$$
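One aggregation layer of Eq. (7) can be sketched in plain Python as follows, using the mean aggregator and a sigmoid for σ, as the paper reports for its model. The toy graph, the weight matrices and the handling of nodes without neighbours (aggregating to a zero vector) are assumptions made for illustration, not details taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_aggregate(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sage_layer(h_prev, neighbours, W):
    """h_v^k = sigma(W . CONCAT(h_v^{k-1}, mean of neighbour h_u^{k-1})) for every node v."""
    h_next = {}
    for v, h_v in h_prev.items():
        sampled = neighbours.get(v, [])
        if sampled:
            agg = mean_aggregate([h_prev[u] for u in sampled])
        else:
            agg = [0.0] * len(h_v)  # assumption: isolated nodes aggregate to zeros
        concat = list(h_v) + agg    # CONCAT(self, neighbourhood)
        h_next[v] = [sigmoid(sum(w * x for w, x in zip(row, concat))) for row in W]
    return h_next


# Toy graph with 2-dimensional node attributes (h^0 = x).
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

# Hypothetical weight matrices: each maps the 4-dimensional concatenation to 2 dimensions.
W1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
W2 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

h1 = sage_layer(h0, neighbours, W1)  # one-hop information
h2 = sage_layer(h1, neighbours, W2)  # two-hop information
```

Stacking two layers lets node "a" receive information from "c" even though the two are not directly linked, which corresponds to using the neighbours of the neighbours.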
The above calculation is executed for every node in the graph, so that each node gains knowledge about its adjacent nodes and the local structure. It is followed by another aggregation step and the creation of new node embeddings, which enables each node to incorporate knowledge from nodes farther away from it. After sampling and aggregation for each node up to a certain depth of neighbouring nodes, we obtain a final embedding vector $h_v^k$, which comprises the knowledge extracted from the neighbouring nodes and from the structure of these host nodes. For binary classification, the probability of the spam and nonspam classes, denoted $Z_i$, can be calculated using the standard sigmoid function.
This is represented by the equation below:

$$Z_i = \mathrm{sigmoid}\left(h_v^k\right) \tag{8}$$
Finally, we use the binary cross-entropy, or log loss, to calculate the loss of our model, which can be represented as follows:

$$B_q(m) = -\frac{1}{N} \sum_{i=1}^{N} \left[ x_i \cdot \log\left(p(x_i)\right) + (1 - x_i) \cdot \log\left(1 - p(x_i)\right) \right] \tag{9}$$
where x represents the label of a host node (1 for spam nodes and 0 for nonspam nodes), while p(x) represents the predicted probability that the node is a spam node, over all N nodes. From the above formula, it follows that for each spam node (x = 1), log(p(x)), the log probability of it being a spam host node, is added to the sum. Conversely, for each nonspam node (x = 0), log(1 − p(x)), the log probability of it being a nonspam host node, is added. The above log loss for graph nodes encourages adjacent nodes to have comparable characteristics and representations while making sure that the characteristics of distinct nodes are highly diversified (Fig. 7). Supervised learning can follow either of two courses: first, we can generate the embeddings for the nodes and then use those embeddings for downstream tasks; second, we can merge the stage that learns the embeddings and the stage that uses them into a single end-to-end model, apply the loss to the final stage, and backpropagate to learn the embeddings while solving the task concurrently.
Fig. 7 Binary cross-entropy loss graph
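The binary cross-entropy of Eq. (9) can be sketched as a short Python function. The optional per-class weighting is an illustrative extension and only one common way of applying class weights; the clipping constant `eps` and the function name are assumptions, and the example weights 0.54 and 11.22 are the tuned values that the paper reports in Table 2.

```python
import math


def binary_cross_entropy(labels, probs, class_weights=None, eps=1e-12):
    """Mean binary cross-entropy over N nodes, as in Eq. (9).

    labels: 1 for spam, 0 for nonspam; probs: predicted probability of spam.
    class_weights (optional, an extension of Eq. (9)) scales each node's term
    by the weight of its true class.
    """
    if class_weights is None:
        class_weights = {0: 1.0, 1: 1.0}
    total = 0.0
    for x, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        term = x * math.log(p) + (1 - x) * math.log(1.0 - p)
        total += class_weights[x] * term
    return -total / len(labels)


confident_loss = binary_cross_entropy([1, 0], [0.9, 0.1])
poor_loss = binary_cross_entropy([1, 0], [0.1, 0.9])

# With the class weights of Table 2, a missed spam host costs far more than a missed nonspam host.
weights = {0: 0.54, 1: 11.22}
missed_spam = binary_cross_entropy([1], [0.1], class_weights=weights)
missed_nonspam = binary_cross_entropy([0], [0.9], class_weights=weights)
```

Weighting the minority class more heavily counteracts the tendency of an unweighted loss to favour the majority nonspam class on a skewed dataset.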
5 Experiment and Results
Owing to the high class imbalance in the dataset, we tuned the class weights repeatedly and found the optimum weights to be those given in Table 2. The GraphSAGE model was trained for a total of 100 epochs using the Adam optimizer, an adaptive learning rate optimization algorithm for stochastic gradient descent, which we used to train the graph neural network [24]. It requires little memory and is computationally efficient. Whereas the RMSProp optimizer adapts the per-parameter learning rates using only a running average of the squared gradients (the second moment), Adam additionally uses a running average of the gradients themselves (the first moment). The learning rate of the optimizer was tuned experimentally, and the best results were obtained with a value of 0.005. For the sampling process, the number of adjacent nodes sampled was set to 20 for the first layer and to 10 for the second layer of the model. The size of each of the two dense classification layers used in our model was tuned and set to 64. The batch size was kept at 50 during training. As observed in Figs. 8 and 9, the training and validation accuracy gradually increased, while the training loss and the validation loss decreased over the 100 epochs. The sigmoid function was used as the activation function for both layers of the GraphSAGE model. To prevent overfitting and to regularize the model, the dropout parameter of the model was tuned to a value of 0.2, owing to the large number of nodes present in the dataset. The architecture of our tuned model is shown in Fig. 10.
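To make the role of the two gradient moments concrete, a single Adam update can be sketched for one scalar parameter. The learning rate of 0.005 is the value reported above; the decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, as is the toy objective f(x) = x².

```python
import math


def adam_step(param, grad, state, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    # Running average of the gradients (first moment).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Running average of the squared gradients (second moment, as in RMSProp).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.
x = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    x = adam_step(x, 2.0 * x, state)
```

Dividing by the square root of the second moment normalizes the step size per parameter, while the first moment acts like momentum and smooths noisy mini-batch gradients.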
Table 2 Tuned class weights
Class 0 (nonspam)    Class 1 (spam)
0.54                 11.22
Fig. 8 Model accuracy for training set and validation set
Fig. 9 Model loss for training set and validation set
Table 3 Average accuracy for training and validation data
Data               Average accuracy (%)
Training data      91.29
Validation data    88.78
Table 4 ROC AUC and F1-score for test and unseen host nodes
Data           ROC AUC score (%)    F1-score (%)
Test data      82.13                86.985
Unseen data    78.105               86.715
The average training and validation accuracies obtained over 100 epochs of training the model are given in Table 3. Using our model, we obtained an F1-score of around 87% on both the test data and the unseen labelled host nodes (Table 4). When measuring the performance of models on highly skewed datasets, depending only on the accuracy metric is inappropriate and incomplete. The primary reason for not relying on accuracy alone is that it is dominated by the large number of observations from the majority class, compared with the very few minority class examples. Classification accuracy can be formulated as follows:

$$\text{Accuracy} = \frac{\text{No. of correctly predicted samples}}{\text{Total no. of predictions}} \tag{10}$$
Precision is an important and useful metric that measures the proportion of positive class predictions whose ground truth is also the positive class, that is, the proportion of predicted positives that are true positives.
Fig. 10 Tuned GraphSAGE model architecture for training
Precision can be represented as follows:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{11}$$
Recall, on the other hand, measures the proportion of all positive class ground truth labels in the dataset that are predicted as positive. Recall can be represented as follows:

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{12}$$
The F1-score is calculated as the harmonic mean of precision and recall and combines both measures into a single metric that captures both properties. It summarizes performance better than either measure alone, as precision and recall individually cannot provide a complete analysis of the performance of our model. The F1-score can be represented as follows:

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{13}$$
There may be cases in which we obtain excellent recall with very poor precision, or perfect precision with extremely unsatisfactory recall. The F1-score can therefore be considered a suitable measure for binary classification on a highly skewed dataset, which has very few spam labels compared with the majority of nonspam labels. The results demonstrate the capability of the model to distinguish between spam and nonspam host nodes, which is the goal of this paper.
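The metrics of Eqs. (10)–(13) can be computed directly from the confusion counts. The sketch below uses invented toy labels, not the paper's data, to show why accuracy alone is misleading on a skewed dataset; the function name and the convention of returning 0 when a denominator is zero are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for binary labels (1 = spam, 0 = nonspam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention: 0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Skewed toy data: 95 nonspam hosts followed by 5 spam hosts.
y_true = [0] * 95 + [1] * 5

# A classifier that always answers "nonspam" looks accurate but finds no spam.
baseline = classification_metrics(y_true, [0] * 100)

# A classifier that flags 8 hosts, 4 of them correctly.
y_pred = [1] * 4 + [0] * 91 + [1] * 4 + [0] * 1
flagged = classification_metrics(y_true, y_pred)
```

The always-nonspam baseline reaches 95% accuracy with an F1-score of zero, whereas the second classifier has a lower accuracy but a much higher F1-score, which is the behaviour that motivates reporting F1 on this kind of dataset.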
6 Conclusion and Future Scope
The categorization of host nodes in a web graph has very useful applications. The structure of the web graph dataset shows that spammers tend to be connected to other spammers and try to fabricate their web page features to imitate genuine web pages in order to commit spamdexing. There are many graph-based algorithms that can be explored, but since most of them are transductive in nature, GraphSAGE has an edge over them: it does not need retraining when a completely new, unseen node is added to a graph. Real-world graphs are always evolving, and our dynamic model can help detect spam web pages, an area in which spammers create new spam pages and hyperlinks very frequently. In future, we will use a much larger dataset for our model and also improve the impact
and efficacy of our model. GraphSAGE is a very intuitive and powerful algorithm that can be used in many other use cases, such as link prediction, ensemble learning, graph regression, graph sampling, graph classification, recommendation systems and graph representation learning. In the future, we also aim to use GraphSAGE to predict links between different spam nodes, in order to determine whether two or more spam pages collaborate; the objective is to identify pairs of nodes that will or will not form a connection in the near future, essentially exploring the detection of link farming using an inductive approach.
Unique Authentication System in End-To-End Encrypted Chat Application S. Pandey, T. Rajput, R. Parikh, A. Gandhi, and K. Rane
Abstract Communication is part and parcel of our daily life. Advances in technology have improved the quality of communication and made it more effective. Consequently, chat applications have become the primary method of connecting and socializing with people. However, the rise of these applications has also led people to question the confidentiality of their services. In this paper, we propose a secure chat application with a unique authentication system, together with a methodology for end-to-end encryption. Our solution fulfills a set of requirements for a safe chatting experience. We present an in-depth procedure of our approach for developing a well-rounded, end-to-end encrypted chat application. Keywords Authentication · End-to-end encryption · Chat application · Cryptography
1 Introduction
With the strides technology has taken in communication toward developing utilitarian applications, consumers are faced with a plethora of options. Different platforms provide distinct and fundamentally different features. However, the most essential feature for trustworthy communication is privacy and how effectively it is implemented. Privacy not only upholds a code of ethics but also respects the consumer's personal space. Additionally, the storage and privacy of user credentials, including the private key, must be protected. Thus, there is a need for a distinctive authentication system. To improve the overall security of the application, the authentication should be supplemented by a strong encryption algorithm. There are plenty of methodologies one can employ to provide privacy. However, end-to-end encryption [1] is one of the most reliable approaches to protecting mass communication. End-to-end encryption [2] is a method of securely sending data from one host to another. Here, the data is encrypted using asymmetric encryption, so the encrypted data can be read only by the sender and the receiver. The transfer of data occurs via the signal protocol. While the data travels toward its destination, the message cannot be decrypted or tampered with by a third-party entity or service. The cryptographic keys (a public–private key pair) make the transfer nearly impenetrable and provide the user with a high level of security. E2E encryption protocols contain a digital certificate that authenticates both end points of the communication. To keep the transmitted data safe from unauthorized access, these keys must be accessible only to authorized users and not to external network operators or cloud services. All in all, our solution presents a modern and secure form of communication using Flutter, Provider [3], Node.js, and MongoDB.

S. Pandey · T. Rajput · R. Parikh · A. Gandhi · K. Rane
Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_5
2 Literature Review
In this section, we discuss asymmetric encryption and digital signatures.

1. Asymmetric Key Cryptography [4]:
Asymmetric encryption makes use of two keys: the public key and the private key. These keys are used for the encryption and decryption processes. The public key is known to everyone, while the private key is known only to the user. Some of the algorithms that follow asymmetric key cryptography are as follows:
I. Rivest–Shamir–Adleman (RSA)
II. Elliptic Curve Cryptography (ECC)

2. RSA Algorithm:
The public key is used only to encrypt messages. It is exposed to the public and can be seen or used by anybody, so it is not a secret key. Messages are decrypted using the private key, which is known as the secret key. The technique relies on a modulus "n", the product of two large prime numbers, which is difficult to factor back into those primes.
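The RSA description above can be illustrated with a toy example. This is a sketch for intuition only: the primes 61 and 53 and the exponent 17 are illustrative values chosen here, not parameters from the paper, and practical RSA uses far larger primes together with a padding scheme.

```python
# Toy RSA with tiny primes; illustrative only, never use such sizes in practice.
p, q = 61, 53                  # two small primes, kept secret
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: inverse of e mod phi (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer message m < n with the public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext c with the private key (d, n)."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
```

Anyone who knows (e, n) can call `encrypt`, but recovering `d` requires factoring `n`, which is what makes large moduli hard to attack.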
3. Digital Signature [5]:
A digital signature is a value computed over a document with the signer's private key, which is known only to the signer. It is similar to a fingerprint attached to the document and essentially guarantees the document's integrity and origin. A digital signature accomplishes the following:
I. Message Authentication: The receiver can verify the digital signature with the help of the sender's public key and can therefore be confident that the message came from the sender, since only the sender holds the confidential private key.
II. Data Integrity: If an attacker gains access to the data and alters it, the digital signature verification at the receiver's end fails, because the hash of the altered data does not match the output of the verification algorithm. The receiver can therefore detect the tampering and reject the data.
III. Non-repudiation: Since only the signer knows the private key, only the signer can create a valid signature on given data. If a dispute arises in the future, the receiver can present the data and the digital signature to a third party as evidence.
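The three properties listed above can be seen in a small hash-then-sign sketch. It uses textbook RSA built from two known Mersenne primes and SHA-256, all of which are illustrative choices made here rather than the paper's implementation; real signatures use randomly generated keys and a padding scheme.

```python
import hashlib

# Toy RSA signing key built from two known Mersenne primes (illustrative only).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                                   # public modulus
E = 65537                                   # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the data and map the digest to an integer below N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sign with the private key D; only the key holder can compute this."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verify with the public key: recover the hash and compare it."""
    return pow(signature, E, N) == digest(data)


document = b"transfer 100 to Bob"
signature = sign(document)
ok = verify(document, signature)                       # authentic message
tampered_ok = verify(b"transfer 900 to Bob", signature)  # altered message
```

Verification succeeds only for the exact bytes that were signed, which is the basis for both the integrity and the non-repudiation properties.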
3 Criteria
A successful chat application should be well-rounded and should contain all the essential tools that make it efficient and easy to use. Privacy supplements these features, making the application secure. Keeping these aspects in mind, we have listed our criteria for making a safe and strong application. We considered the nuances of well-known chat applications such as WhatsApp [6], Messenger, Telegram [7], and Viber [8], studied their respective criteria, and selected the features that would make our application more secure and convenient. Apart from that, we experimented with several techniques and methodologies to improve such pre-existing features for increased security. Additionally, our criteria reflect modern security protocols and provide increased ease of operation.

Criteria  Description
1         Password stored on the server must be hashed
2         The server must never know the user's full password
3         The public and private key pairs must not be generated on the server
4         The server must never know the user's private key
5         Message authentication and integrity should be done using digital signatures
Fig. 1 Component diagram
4 Proposed Methodology
4.1 Proposed Architecture
As shown in Fig. 1, the chat app is presented through a feature-packed chat UI responsible for displaying the essential features of a well-rounded chat application, i.e., the chat contact list, the chat dialog box for individual chats, the messages within every chat window, etc. This is achieved through communication between the chat client engine and the chat server engine. All requests between the client and the server are handled by chat REST APIs on the backend. The delivery and dispatch of messages from each chat instance happen in real time through the active connection between the client web socket and the server web socket. Finally, for the transfer and storage of media, we used Firebase as the repository; on request, media is passed through the media storage server to the main chat server engine.
4.2 Authentication
In any end-to-end encrypted system, generating the public and private keys is the most fundamental part. Moreover, the safety of the private key is of utmost importance and should be maintained even when the security of the server is compromised. Even if the encrypted private key stored on the server is obtained by an unauthorized party, the system should remain secure. In this module, we explain the generation and safe storage of the public and private key pair, as shown in Fig. 2.
Fig. 2 Authentication mechanism
A. Registration:
Step 1: The user enters the password, which is hashed using SHA-1, a one-way hashing algorithm.
Step 2: Divide the hashed password (the 160-bit SHA-1 digest is 40 characters, i.e., 40 bytes, when written in hexadecimal) into two parts so that the first part comprises 24 bytes and the second part comprises the remaining 16 bytes.
Step 3: Only the first part is sent to the server as the user's password. This satisfies criterion 2.
Step 4: The second part is stored in the local storage as the encryption key.
Step 5: Generate the public and private keys on the client side. This satisfies criterion 3.
Step 6: Encrypt the generated private key with the AES algorithm [9, 10], using the encryption key from step 4. Send the encrypted private key to the server along with the plaintext public key.

B. Login:
Step 1: Follow steps 1–4 of the above registration process.
Step 2: Login is successful when the first part of the hashed password matches the user's password stored on the server.
Step 3: Retrieve the encrypted private key and the plaintext public key for that user.
Step 4: Decrypt the encrypted private key with the encryption key (the second part of the user's hashed password).
Step 5: The encryption key, decrypted private key, and plaintext public key are stored on the user's device.
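The hash-splitting part of the registration and login steps above can be sketched as follows. This assumes the 24/16 split is applied to the 40-character hexadecimal form of the SHA-1 digest; the function names are hypothetical, and the AES encryption of the private key is omitted because it needs a third-party library.

```python
import hashlib
import hmac
from typing import Tuple


def split_password_hash(password: str) -> Tuple[str, str]:
    """Hash the password with SHA-1 and split the 40-character hex digest.

    Returns (server_part, encryption_key): the first 24 characters go to
    the server, the last 16 stay on the device as the encryption key.
    """
    hex_digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return hex_digest[:24], hex_digest[24:]


def login_matches(submitted_part: str, stored_part: str) -> bool:
    """Server-side check; compare_digest avoids timing side channels."""
    return hmac.compare_digest(submitted_part, stored_part)


# Registration: the server stores only the first part.
server_part, encryption_key = split_password_hash("correct horse battery staple")
# Login: the client recomputes the split and submits the first part.
login_part, login_key = split_password_hash("correct horse battery staple")
wrong_part, _ = split_password_hash("wrong password")
```

Because the server only ever sees the first part, it cannot reconstruct the encryption key that protects the private key, which is how the design targets criteria 2 and 4.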
4.3 Messaging
Figure 3 describes how messages are sent and received. The security of this system is strengthened by the public and private key pairs [11]. These keys are unique and are generated whenever a new user registers in the application. The receiver's public key, unlike the private key, is loaded on the sender's side. Users have access to their own private key, stored on their device, which is used to decrypt received messages. In other words, a message encrypted with a public key can be decrypted only by the private key of the same pair. In our application, encryption with the public key happens on the client side, where the message is converted into ciphertext. After the message is decrypted by the receiver using the corresponding private key, it is stored as plain text in the local storage [12] of the receiver and deleted from the server. The message is stored as encrypted ciphertext on the server until it is delivered to the receiver.

Fig. 3 Overview of communication between two clients

A. Sending Messages:
Step 1: Retrieve the recipient's public key.
Step 2: Store the plain text message in the local storage of the sender's device.
Step 3: Encrypt the plain message to be sent with the retrieved public key.
Step 4: Create a copy of the encrypted text and hash this copy using the MD5 hashing algorithm [13].
Step 5: Using the signature algorithm [14], with the hash and the sender's private key as inputs, generate a signature.
Step 6: Append the signature to the encrypted message from step 3 and send it to the server.
Step 7: The server connects with the receiver and sends the encrypted message.

B. Receiving Messages:
Step 1: The receiver receives the message from the server.
Step 2: Retrieve your own private key from the local storage.
Step 3: Separate the received message into the signature and the encrypted message, and use the sender's public key to recover the hash from the signature.
Step 4: Create a copy of the encrypted message.
Step 5: Hash this copy using the same one-way hashing algorithm (MD5) and compare the result with the hash recovered from the signature. This ensures data integrity.
Step 6: Decrypt the encrypted message from step 3 with the receiver's private key.
Step 7: Store the decrypted message in the receiver's local storage.
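The sending and receiving steps can be traced end to end in a short sketch. It uses textbook RSA with key pairs built from known Mersenne primes and a dictionary as the message envelope; these are illustrative assumptions made here, not the application's actual key generation or wire format. Verification recovers the hash from the signature with the sender's public key and compares it with a freshly computed MD5 hash of the ciphertext.

```python
import hashlib

E = 65537  # public exponent shared by both toy key pairs


def make_keypair(p: int, q: int) -> dict:
    """Build a toy RSA key pair from two primes (illustrative sizes only)."""
    return {"n": p * q, "e": E, "d": pow(E, -1, (p - 1) * (q - 1))}


# Known Mersenne primes stand in for randomly generated ones.
sender = make_keypair(2**61 - 1, 2**89 - 1)
receiver = make_keypair(2**107 - 1, 2**127 - 1)


def md5_int(ciphertext: int) -> int:
    """MD5 of the ciphertext as an integer (128 bits, below the sender's n)."""
    return int.from_bytes(hashlib.md5(str(ciphertext).encode()).digest(), "big")


def send(plaintext: bytes) -> dict:
    m = int.from_bytes(plaintext, "big")
    if m >= receiver["n"]:
        raise ValueError("toy RSA can only carry short messages")
    ciphertext = pow(m, receiver["e"], receiver["n"])        # encrypt for receiver
    signature = pow(md5_int(ciphertext), sender["d"], sender["n"])  # hash, then sign
    return {"ciphertext": ciphertext, "signature": signature}


def verify(payload: dict) -> bool:
    """Recover the hash with the sender's public key and compare."""
    recovered = pow(payload["signature"], sender["e"], sender["n"])
    return recovered == md5_int(payload["ciphertext"])


def receive(payload: dict) -> bytes:
    if not verify(payload):
        raise ValueError("signature mismatch: message was altered")
    m = pow(payload["ciphertext"], receiver["d"], receiver["n"])  # decrypt
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


payload = send(b"hello, world")
received = receive(payload)
```

Signing the ciphertext rather than the plaintext lets the receiver reject an altered message before attempting to decrypt it.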
5 Results and Analysis
5.1 Criteria Analysis
In Sect. 3, we listed the specific criteria that should be followed to make a chat application safe and secure. To give a wider view of the features we implemented and the criteria we fulfilled while designing the application, the table below summarizes our implementation.

Criteria  Description                                                                   Proposed solution
1         Password stored on the server must be hashed                                  Y
2         The server must never know the user's full password                           Y
3         The public and private key pairs must not be generated on the server          Y
4         The server must never know the user's private key                             Y
5         Message authentication and integrity should be done using digital signatures  Y

Note "Y" means that it meets the requirement
5.2 Time Analysis
The graph in Fig. 4 shows the time taken for the whole cycle, from when a message is sent to when it is acknowledged as read, for 14 different samples. The average time is approximately 1.776 s. The server was hosted on the AWS cloud in Ohio, USA, while the clients were connected to the internet in Mumbai, India.
Fig. 4 Analysis of sample vs time
Sample no  Message sent  Message read  Time (s)
1          17.19601      19.057355     1.861345
2          26.947913     28.377683     1.42977
3          31.345805     33.164336     1.818531
4          40.585376     42.098413     1.513037
5          52.25226      53.663863     1.411603
6          55.364202     57.557515     2.193313
7          0.528517      2.271412      1.742895
8          5.932325      7.802064      1.869739
9          13.361672     15.167323     1.805651
10         16.457714     18.44987      1.992156
11         20.240321     22.549015     2.308694
12         22.710228     24.315998     1.60577
13         25.917828     27.561292     1.643464
14         29.685675     31.353796     1.668121
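The reported average can be reproduced directly from the table. The sketch below copies the sent and read values from the table, assumes they are timestamps in seconds, and derives the per-sample time and the mean.

```python
from statistics import mean

# (message sent, message read) values copied from the table, assumed seconds.
samples = [
    (17.19601, 19.057355), (26.947913, 28.377683), (31.345805, 33.164336),
    (40.585376, 42.098413), (52.25226, 53.663863), (55.364202, 57.557515),
    (0.528517, 2.271412), (5.932325, 7.802064), (13.361672, 15.167323),
    (16.457714, 18.44987), (20.240321, 22.549015), (22.710228, 24.315998),
    (25.917828, 27.561292), (29.685675, 31.353796),
]

elapsed = [read - sent for sent, read in samples]  # send-to-read time per sample
average = mean(elapsed)                            # mean over the 14 samples
fastest, slowest = min(elapsed), max(elapsed)      # spread of the measurements
```

The spread between the fastest and slowest samples is under one second, so the mean is a reasonable summary of these 14 measurements.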
6 Conclusion
The primary goal of this implementation was to provide a prototype for building a versatile and secure chat application, as shown in Fig. 5, that follows the most important protocols for preserving user privacy. Hence, this venture was an in-depth analysis of the technical hurdles a developer and a chat service provider might face while developing and providing such a secure service. During the development phase, we did not encounter technical difficulties or roadblocks. The pre-existing services at our disposal made our analysis easier. We tested our application for the proposed features and the requirements mentioned earlier, but we did not test its scalability. The final product provided all the features of a fully featured chat application. The process of generating and sending messages is comparable to the service provided by well-known chat applications. However, we cannot guarantee the same for the chat server because we did not perform load tests to compare it with other mobile chat services. Such testing is beyond the scope of our project.
7 Future Work
As part of the future scope, we would like to test the scalability and analyze the performance of the chat server. This may reveal bottlenecks that require new research to solve. We would also like to compare the performance of different encryption algorithms in our chat application [15]. A new feature for scheduling messages can also be added to our chat application. The measures taken to secure the current application for normal chatting may not be enough, and a more intricate mechanism may be needed when a scheduled chat feature is added.

Fig. 5 Screenshots from the app
References
1. Giri K, Saxena N, Srivastava Y, Saxena P. End-to-end encryption techniques, pp 1089–1093
2. Ali A, Sagheer A (2017) Design of secure chatting application with end-to-end encryption for android platform, pp 22–27
3. Slepnev D (2020) State management approaches in flutter
4. Chandra S, Paira (2014) A comparative survey of symmetric and asymmetric key cryptography, pp 83–93
5. Goldwasser S, Micali S (2014) A digital signature scheme secure against adaptive chosen-message attacks, pp 281–308 (1988)
6. Endeley RE (2018) End-to-end encryption in messaging services and national security-case of WhatsApp Messenger
7. Saribekyan H, Margvelashvili (2017) Security analysis of telegram
8. Sutikno T, Handayani L, Stiawan (2016) WhatsApp, viber, and telegram: which is the best for instant messaging? 2088–8708
9. Mahajan P, Sachdeva A (2013) A study of encryption algorithms AES, DES and RSA for security
10. Heron S (2009) Advanced encryption standard (AES). Netw Sec 8–12
11. Karabey, Akman G (2016) A cryptographic approach for secure client-server chat application using public key infrastructure (PKI), pp 442–446
12. Waluyo AB, Srinivasan (2005) Research in mobile database query optimization and processing, pp 225–252
13. Gupta S, Goyal N, Aggarwal K (2014) A review of comparative study of md5 and ssh security algorithm
14. Stallings W (2011) Cryptography and network security principle and practice, 5th ed. Prentice Hall
15. Choudhary B, Pophale C, Gutte (2020) Case study: use of AWS lambda for building a serverless chat application, pp 237–244
Vaccination Reminder System Vemulakonda Gayatri, Sunkollu Sai Chandu, Sreya Venigalla, Raavi Dinesh Kumar Reddy, and J. Ebenezer
Abstract As rightly said, prevention is better than cure. Vaccines have been instrumental in preventing infectious diseases in humans, yet among seventy infectious diseases, only thirty have vaccines. Some basic vaccines have to be taken by every individual to reduce susceptibility to disease. The main motive of this project is to provide a virtual platform that gives detailed information about vaccines, along with their age limits, and sends reminders through short message service (SMS) and email. It recommends hospitals in the user's locality that provide the required vaccines. It also shows the availability of COVID-19 vaccines. Users can access the web application in their preferred language for easier understanding. It checks the user's age and matches it against the eligibility criteria for each vaccine. Keywords Vaccination · Reminder system · GPS · Translator · REST API · SMTP library · Django
1 Introduction
Every era introduces different challenges to healthcare organizations, and the start of the twenty-first century has been no different. Today there is unrivaled attention to the quality of health services. A robust healthcare system delivers quality services to all people, wherever and whenever they need them. Vaccination plays a vital role in the healthcare domain by preventing and controlling vaccine-preventable diseases like polio. Therefore, it is necessary to keep vaccination status up to date and accessible so that its benefits can be realized.
V. Gayatri · S. S. Chandu · S. Venigalla · R. D. K. Reddy · J. Ebenezer
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_6
Researchers have focused on vaccination reminder systems in the recent past for different diseases [1, 2].
1.1 Analysis of Vaccination
Vaccines help protect every individual from illness and defend against infectious diseases. Each vaccination has its own time period, schedule, and dosage. The dosage remains the same among newborn babies but may vary for adults according to their health conditions. Infants can fight germs with their immune systems, but some hazardous diseases cannot be handled [3]. That is why vaccination is needed to strengthen their immune system. These diseases once killed or harmed many infants, children, and adults. With vaccination, however, a child can gain immunity against these diseases without getting sick. Moreover, for some vaccines, being vaccinated can give a better immune response than contracting the disease. Over 53% of hospitalized people are eligible for preventive care through vaccination [4]. Vaccinating one person also helps prevent others from getting infected. In general, germs can travel quickly through a community, and many people fall prey to them. If enough people get sick, it can lead to an outbreak. But when enough people are vaccinated against a particular disease, it is difficult for it to spread to others. This means that the entire community is less likely to contract the disease [5].

Short message service (SMS) is an essential and efficient service on mobile phones. It is available on all mobile phones, is easy to use, and operates at minimal cost. SMS allows users to communicate non-verbally, expressing themselves through combinations of alphanumeric symbols, with a maximum of 160 characters per SMS message. SMS is used worldwide because it is a low-cost, fast, and efficient means of connecting people over any distance. Email plays a vital role in everyone's daily life; through email, we can send or receive various kinds of important information. In the reminder system, email alerts users about their vaccination schedule from time to time [6, 7].
1.2 Alerting System
A reminder system is a time management computer program designed to alert users of important events or schedules that they have entered into the program. Most programs provide a calendar or a list of events and schedules together with a reminder mechanism. Appointment reminders take various forms, such as phone calls, emails, and text messages, and may be made by people or sent automatically by a system.
2 Novelty
We developed a social welfare project that helps society get vaccinated on time. It helps everyone from infants to the elderly receive the vaccines for their respective age groups, including COVID vaccination. It educates users about vaccination and the need for vaccine intake. The system alerts users through SMS and email. It provides the details of hospitals near the user's locality. It is accessible in different languages, which makes the application easy for users to understand.
3 Related Work
The children vaccination reminder via SMS [8] is taken as a reference for our vaccination reminder system. It is service-based, helping ensure that children from all literacy backgrounds can be vaccinated, and it can be used through a mobile phone or by physically visiting the clinic. The reminder system sends a message to the user's registered mobile number two days ahead of the scheduled appointment. This makes it simple for parents to keep track of their child's vaccination [9] and helps them get the child vaccinated at the allotted time. It is based on the Pregnancy Progress System (PregProSyst). It uses Visual Basic .NET 2008, SQL Server 2008 to store the data, and the Net2sms.net server to send SMS reminders.

The vaccine tracker/SMS reminder system [10] is also used as a reference. Parents register their details along with their child's details. It is a manual process, and data is available only up to the child's 6th year. This reminder system regularly reminds the user of appointments and keeps track of vaccinations already taken in order to schedule the doses ahead. The user is reminded regularly through a text message sent to their registered mail id. This reminder system is constrained to a particular region, Kurdistan, Iraq. The website is developed using C# and follows a client–server model. Several SMS gateway types are integrated to generate the text messages. A general database is used to store and organize the data.

The ELasikevelapatti baby vaccination reminder system presents reminders of vaccines according to the child's age up to five years. It keeps track of the child's growth by storing height and weight information and presenting it graphically. It also sends a message regarding the polio vaccination date and the location of the nearest polio booth [11].

Vaccination adherence is designed to support the Arabic language and the Saudi vaccination schedule with customization features. It aims to fulfill the need for a functional children vaccination reminder for parents in Saudi Arabia [12]. An Android Based Application for Determine Nearest Hospital Nearest to Parents Location is an application that helps the user find the nearest hospital location within Karachi [13]. The majority of health centers still use a paper-based vaccination system to keep track of the vaccination uptake of each user [14], and delayed vaccination becomes a significant challenge [15]. Hence, there is a need for a reminder system that makes reminders and vaccine uptake easier.
4 Proposed Method
The existing systems have some drawbacks: they schedule vaccines only for children up to 6 years of age and only for a particular region. They are local reminder systems that are not accessible to everyone and are available only in a regional language. They also list a limited number of hospitals and locations. The proposed design for the vaccination reminder system overcomes these drawbacks through the following features. Once the information to be communicated to parents or other users had been collected, the system was ready to be developed. This section focuses on designing and developing a prototype of the proposed model. Both design and development of the system are discussed in the following subsections, and Fig. 1 depicts the flow of the proposed model.
4.1 Handy Web Interface
An interface plays a crucial role in engaging the user. We developed a user-friendly interactive environment that helps the user navigate the web application. It was developed using Django, a Python web framework: a collection of Python libraries that allows a quality web application to be created quickly and efficiently, suitable for both frontend and backend. The frontend was developed using the Bootstrap CSS framework, and for the backend we use SQLite to store the data. The vaccine details, including the name of the vaccine, price, effects, etc., are preloaded using the Django admin site. The admin imports the required datasets into the database and is the only one who can change the database by adding, deleting, or modifying datasets. SQLite, which can be faster than direct file system I/O for small records, is an open-source database that stores data in a single file on the device. It is embedded into the end program rather than running as a client–server database engine. Language is key to good communication and understanding, so there is a need for a translator. English is widely used but is not understood by everyone. Hence, we have used "Google Trans" to help the user navigate the website in their preferred language.

Fig. 1 Workflow
4.2 Reminder System
Nowadays, people are busy with hectic schedules, so they need reminders to stay up to date. The same applies to vaccination schedules. Therefore, we remind the user of the vaccination schedule, which helps them strengthen their immune system. Reminders are sent by email and SMS to the user's registered email address and mobile number. The reminder system was created using the SMTP library and the Fast2SMS REST API. SMTP (Simple Mail Transfer Protocol) is a widely used protocol for transmitting email messages and alerts. The Fast2SMS REST API is used to send reminders to registered mobile numbers. REST APIs follow a client–server model, which makes it feasible to link the service with the website and our database. Features such as Bulk SMS, Custom SMS, Multimedia SMS, Delivery Reports, Transactions, and API are available for developers to get the most out of the service. Through these features, important information can be quickly transferred to people in different locations. Generally, people and parents of this generation prefer to communicate via email and check their emails thoroughly. However, SMS is also a very effective tool, with a reported average open rate of 98%, and messages are typically read within 5 min of delivery. Hence, both kinds of reminder help users get vaccinated.
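The email side of such a reminder can be sketched with Python's standard `smtplib` and `email` modules. The function names, addresses, vaccine name, and the two-day lead time are hypothetical values chosen for illustration; the SMS side is left out because the Fast2SMS request format is not described in the paper.

```python
import smtplib
from datetime import date, timedelta
from email.message import EmailMessage


def is_due_soon(due: date, today: date, lead_days: int = 2) -> bool:
    """True when the due date falls within the next `lead_days` days."""
    return today <= due <= today + timedelta(days=lead_days)


def build_reminder(to_addr: str, vaccine: str, due: date, sender: str) -> EmailMessage:
    """Compose the reminder email for one upcoming vaccination."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = f"Vaccination reminder: {vaccine} due on {due.isoformat()}"
    msg.set_content(f"This is a reminder that the {vaccine} vaccine is due on {due.isoformat()}.")
    return msg


def send_reminder(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Deliver the message over SMTP with STARTTLS (not called in this sketch)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


today = date(2022, 1, 10)
due = date(2022, 1, 12)
if is_due_soon(due, today):
    reminder = build_reminder("parent@example.com", "Polio (OPV)", due, "noreply@example.com")
```

Keeping message construction separate from delivery makes the reminder text easy to test without contacting a mail server.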
4.3 Self-Tracking System and Hospital Location Activity
One of the main goals of the vaccination reminder system is self-analysis based on the status of completed vaccines. The self-tracking system helps the user quickly identify the completed and incomplete vaccines in the form of a bar graph. Users can individually manage their vaccination status, and the graph makes the inoculation status easy to read from the plotted values. The values are extracted from the user's records; the system analyzes the data and separates the vaccines according to the recorded status.
V. Gayatri et al.
With the significant development in technology, locating required places is no longer a big problem, but providing detailed information to users can save their time and energy. This can be done using geolocation, which allows users to provide their location to our web application. Because of privacy concerns, the user is asked for permission before location information is reported. Using the recorded location of the user, the details of nearby hospitals are suggested. The suggestions are obtained through the Mapbox REST API, which follows a client–server model.
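The idea of ranking hospitals by distance from the reported location can be illustrated without any map service. The sketch below does not call Mapbox; it assumes a hypothetical list of hospital records with `lat` and `lon` fields and uses the haversine formula as a stand-in for whatever distance measure the service applies.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_hospitals(user_location, hospitals, limit=3):
    """Return up to `limit` hypothetical hospital records, closest first."""
    lat, lon = user_location
    ranked = sorted(hospitals, key=lambda h: haversine_km(lat, lon, h["lat"], h["lon"]))
    return ranked[:limit]
```

In the application itself, the user location would come from the browser's geolocation prompt and the hospital list from the map service response.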
5 Architecture Diagram
Figure 2 depicts how the vaccination reminder system works. First, the user registers on the registration page by providing his/her details. After registration, the user details, which include username, password, registered mail id, mobile number, and age, are stored in the SQLite database, which keeps the data in a file. The user's passwords are hashed using the SHA256 algorithm. Django's CSRF middleware and template tag provide easy-to-use protection against cross-site request forgeries. The system then asks the user to provide the details of the child or of the person whose vaccination status is to be tracked, so that the respective vaccines are displayed. Email and SMS alerts are sent to the user's registered mail ID and mobile number with the help of the SMTP server and the Fast2SMS REST API. By accessing the user's location through GPS and with the help of the Mapbox REST API, the user can find nearby hospitals, which makes it easier to get vaccinated. The whole process can be accessed in different languages using Google Trans.
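How a SHA-256-based scheme protects stored passwords can be sketched as follows. This is an assumption-laden illustration rather than the system's actual code: it uses a salted, iterated PBKDF2 construction over SHA-256 from Python's standard library, and the iteration count and function names are chosen only for the example. Django's default password hasher is also based on PBKDF2 with SHA-256.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; only this digest is stored."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the digest for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected_digest)
```

Because the digest cannot be reversed, the original password is never recoverable from the database; verification works by repeating the derivation with the stored salt.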
Fig. 2 System architecture
Vaccination Reminder System
6 Evaluating Results
The user must first select the required vaccination type and is then redirected to the form page, where the user enters the details of the child, or his/her own details if elderly vaccination is selected. After registration, the details are stored in the database, making it easy for the user to track the vaccination schedule. When the user provides the details, the system calculates the child's age and displays the vaccines for that age group with detailed information. Once logged in, the user can check the vaccination schedule by selecting the links provided at “select type,” and the details of the vaccination schedule are displayed as shown in Fig. 3. The user must record the vaccination status, that is, whether the vaccination has been completed or not, by checking the box on the vaccination details page, so that the record is updated in the database. If the user acknowledges the status as “Yes,” meaning the respective vaccination is complete, the user is redirected to the upcoming vaccination page, which shows the details of the upcoming vaccines (Fig. 4). If the user acknowledges that the vaccination is pending, the system displays the vaccines that are currently due, as shown in Fig. 5.
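The age calculation and the split into pending and upcoming vaccines described above can be sketched as follows. The schedule entries, their names, and the week thresholds are hypothetical placeholders (not medical guidance), and the function names are assumptions made for this illustration.

```python
from datetime import date

# Hypothetical schedule: (vaccine name, minimum age in weeks). Placeholder values only.
SCHEDULE = [("Vaccine A", 0), ("Vaccine B", 6), ("Vaccine C", 10), ("Vaccine D", 14)]


def age_in_weeks(date_of_birth, today):
    """Whole weeks elapsed between the date of birth and today."""
    return (today - date_of_birth).days // 7


def split_schedule(date_of_birth, today, completed):
    """Return (pending, upcoming) vaccine names for the child's current age."""
    age = age_in_weeks(date_of_birth, today)
    pending = [name for name, weeks in SCHEDULE if weeks <= age and name not in completed]
    upcoming = [name for name, weeks in SCHEDULE if weeks > age]
    return pending, upcoming
```

Here `completed` plays the role of the check boxes on the vaccination details page: names marked as done drop out of the pending list.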
6.1 Alerts
Email alerts are generated using an SMTP server. An email reminder is sent to the registered mail id of the user (Fig. 6). The reminder lists the vaccines to be taken and suggests that the user visit the profile page for further details of the vaccines due at that particular age.
Fig. 3 Vaccination schedule
Fig. 4 Upcoming vaccines
Fig. 5 Available vaccines
Fig. 6 Email alert
Apart from email, an SMS with details of the vaccines to be taken is sent to the user's phone number using the Fast2SMS API (Fig. 7).
Fig. 7 SMS alert
Users can find details of nearby hospitals by clicking on the link provided at the bottom of the vaccines page. After clicking on the link, the user is asked to allow location access. The system identifies the user's location using geolocation and lists nearby hospitals and their details, making it easy for the user to reach a hospital and get vaccinated (Fig. 8). To see the status of completed vaccinations, the user can open the status page by clicking on the “status” button in the navigation bar. There, the number of completed and incomplete vaccinations is shown graphically, as in Fig. 9. The main feature of this reminder system is the provision of multiple languages. The user can access the application in a preferred language by selecting it from the “Select Language” drop-down menu in the navigation bar, as shown in Fig. 10.
7 Conclusion and Future Scope
To conclude, we have developed a multilingual reminder system that makes it easier for the user to understand the information. It guides users in keeping track of their vaccination details and sends email and SMS alerts about their vaccination schedules. It also helps users find nearby hospitals where vaccines are available, and it presents the vaccination status as a graph. As a future development, users will be able to register an appointment at a nearby hospital through the website.
Fig. 8 Displaying hospital details
Fig. 9 Graphical representation of status of infant vaccination
Further, vaccines for all diseases, along with their detailed descriptions, will be added to the preliminary vaccines. A machine learning algorithm that takes the user's health condition as input can then suggest which vaccines are safe to take and which are not. Users will also be able to find hospital locations across India through Google Maps [16].
Fig. 10 Selecting preferred language
References
1. Lehnert JD et al (2018) Development and pilot testing of a text message vaccine reminder system for use during an influenza pandemic. Human Vaccines Immunotherapeutics 14(7):1647–1653
2. McLean HQ et al (2017) Improving human papillomavirus vaccine use in an integrated health system: impact of a provider and staff intervention. J Adolescent Health 61(2):252–258
3. Wagner NM et al (2021) Addressing logistical barriers to childhood vaccination using an automated reminder system and online resource intervention: a randomized controlled trial. Vaccine
4. Dexter PR, Perkins S, Overhage JM, Maharry K, Kohler RB, McDonald CJ (2001) A computerized reminder system to increase the use of preventive care for hospitalized patients. N Engl J Med 345(13):965–970. https://doi.org/10.1056/NEJMsa010181 PMID: 11575289
5. Shastri S, Manostra V, Sharma A (2016) Child immunization coverage - a critical review. IOSR
6. Kempe A, Stockwell MS, Szilagyi P (2021) The contribution of reminder-recall to vaccine delivery efforts: a narrative review. Acad Pediatr 21(4):S17–S23
7. Hasan S et al (2021) e-vaccine: an immunization app. In: 2021 2nd international conference on intelligent engineering and management (ICIEM). IEEE
8. Yusof Y, Almohamed A (2011) Children vaccination reminder via SMS alert. Conference paper. https://doi.org/10.1109/ICRIIS.2011.6125750
9. Peck JL, Stanton M, George CCM, Reynolds ES (2014) Smartphone preventive health care: parental use of an immunization reminder system. J Pediatr Health Care
10. Sallow AB, Zeebaree SRM, Zebari R, Abdulrazzaq M (2020) Vaccine tracker/SMS reminder system: design and implementation. ISSN (Online): 2581–6187
11. Harshitha S, Umraz S, Jagadish SM, Sharuq Khan K (2017) E-Lasikevelapatti baby vaccination reminder application in Kannada. IISC Bangalore
12. Abahussina AA, Albarrak AI (2016) Vaccination adherence: review and proposed model. Elsevier
13. Munir MW, Omair SM, Zeeshan ul Haque M (2015) An android based application for determine a specialized hospital nearest to patient’s location. Int J Comput Appl 118(9):975–8887
14. Karkonasasi K, Cheah Y-N, Mousavi SA Intention to use SMS vaccination reminder and management system among health centers in Malaysia: the mediating effect of attitude. Universiti Sains Malaysia (USM), Penang
15. Mekonnen ZA, Nurhussien F, Tilahun B, Elaye KA (2019) Development of automated text-message reminder system to improve uptake of childhood vaccination. Online J Publ Health Inform. https://doi.org/10.5210/ojphi.v11i2.1024
16. Gatuha G, Jiang TK (2015) VACS, Improving vaccination of children through cellular network technology in developing countries. Interdiscipl J Inf Knowl Manage 10:37–46
A 60 GHz CMOS VCO Adapting Switchable High Q Inductors
S. K. Hariesh, M. Reshma, O. K. C. Sivakumar Bapu, U. Vijay Gokul, and Karthigha Balamurugan
Abstract An LC CMOS voltage-controlled oscillator (VCO) working in the V-band is presented in this paper. The objective is to provide a CMOS VCO at 60 GHz in a 65 nm process. The VCO circuit is designed using an NMOS cross-coupled pair with a source-follower PMOS load and is simulated in a standard 65 nm CMOS process. To improve phase noise, switchable variable inductors are proposed as tuning elements and included in the VCO circuit. Depending on the switch states, the inductance and its quality factor vary, and so does the phase noise. We have observed the phase noise of VCOs using four different values of inductance. The other parameters observed are oscillation frequency, tuning range, figure of merit, and power consumption. Using the proposed technique, the VCO achieves a phase noise of −115.6 dBc/Hz and a figure of merit of −147.6 dBc/Hz, the best results observed among the designed switchable inductors. A wide tuning range of 33.5 GHz/dB is also observed, which demonstrates the efficacy of the proposed work.
Keywords Voltage-controlled oscillator · Switchable inductor · Phase noise · Figure of merit · Tuning range
S. K. Hariesh · M. Reshma · O. K. C. Sivakumar Bapu · U. Vijay Gokul · K. Balamurugan (B)
Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_7

1 Introduction
The need for very high data rate radio communications with increased bandwidth requirements has spurred interest in the extremely high-frequency (EHF) bands. The 45–70 GHz frequency band is a viable option for providing high-speed wireless connectivity for gigabit services in scientific, industrial, and commercial fields [1]. The growing demand for higher frequencies has led a number of researchers to explore low-cost RF solutions at millimeter waves (mm wave), especially at 60 GHz, for high-speed data communications and networks. WirelessHD, also known as UltraGig, is the first 60 GHz standard protocol designed to allow high-definition video streaming. Though 60 GHz transmission is affected by signal attenuation due to oxygen absorption, advantages such as frequency reuse and inherent interference immunity make it suitable for short-range Gbps communications.
A vital component of the CMOS receiver chipset is the voltage-controlled oscillator (VCO), which belongs to the frequency synthesizer block used for modulation and demodulation. VCOs serve as clock sources in PLLs used for data synchronization and jitter reduction, and they generate the carrier-frequency signals for the mixer. The performance of the VCO, such as its phase noise and tuning range, determines the overall performance of the frequency synthesizer block. The challenges of VCO design at millimeter wave frequencies include low quality factor (Q-factor) passives, increased parasitics and noise levels, and non-availability of device models. These impose a design trade-off among the VCO parameters.
The phase noise is the ratio of the noise power in a unit bandwidth in the sidebands, at an offset frequency Δω from the center frequency ω0, to the carrier signal power, and it is an essential performance metric. Noise in the sidebands leads to phase noise in the mixer output following down-conversion. The phase noise of an LC tank VCO is modeled by Leeson [2] as
$$ L(\Delta\omega) = 10\log\left[\frac{4FKTR_p}{V_0^2}\left(\frac{\omega_0}{2Q\,\Delta\omega}\right)^2\right] \tag{1} $$
where K is the Boltzmann constant, F is the noise factor, T is the absolute temperature, Rp is the parallel resistance of the tank, V0 is the amplitude of the output oscillation, Q is the quality factor of the tank, ω0 is the oscillation frequency, and Δω is the offset frequency. From Eq. (1), the higher the Q-factor of the LC tank, the lower the phase noise. The Q-factors of series-tuned and parallel-tuned LC circuits are given in Eqs. (2) and (3), respectively.
$$ Q = \frac{1}{R}\sqrt{\frac{L}{C}} \tag{2} $$
$$ Q = R\sqrt{\frac{C}{L}} \tag{3} $$
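The dependence of phase noise on the tank Q in Eq. (1), together with the Q expressions in Eqs. (2) and (3), can be explored numerically. The sketch below is only an illustration of those formulas: the function names are assumptions, frequencies are passed in hertz (the ratio ω0/Δω equals f0/Δf), and any component values used with it are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def leeson_phase_noise_dbc(noise_factor, temp_k, r_p, v0, q, f0, delta_f):
    """Eq. (1): phase noise in dBc/Hz at offset delta_f from carrier f0."""
    thermal = 4 * noise_factor * BOLTZMANN * temp_k * r_p / v0 ** 2
    shaping = (f0 / (2 * q * delta_f)) ** 2
    return 10 * math.log10(thermal * shaping)


def q_series(r, l, c):
    """Eq. (2): Q of a series-tuned RLC circuit."""
    return math.sqrt(l / c) / r


def q_parallel(r, l, c):
    """Eq. (3): Q of a parallel-tuned RLC circuit."""
    return r * math.sqrt(c / l)
```

Because Q enters Eq. (1) squared, doubling Q lowers the predicted phase noise by 20·log10(2), about 6 dB, with all other parameters held fixed.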
In [2], a digitally controlled inductor (DCI) is used to tune the frequency of mm wave VCOs. The DCI provides a 14% frequency tuning range, with phase noise between −93.7 and −87.5 dBc/Hz at 1 MHz offset. In [3], a linear CMOS LC VCO using triple-coupled inductors is designed and implemented in a 40-GHz integer-N phase-locked loop (PLL) integrated in a standard 90-nm CMOS process. In [4], a 45 GHz mm wave VCO adopting digitally switchable metal-oxide-metal (MOM) capacitors as the tuning element is described. In [5], a CMOS VCO operating in the V-band with an admittance-transforming cross-coupled pair is proposed and studied; a 63 GHz VCO prototype is implemented in 0.18 μm CMOS, and a circuit topology suitable for millimeter wave CMOS VCOs is presented. In [6], compact, low-loss, dual-path phase-switched inductors suitable for the mm wave sub-blocks of a transceiver are proposed. The design of an active inductor used for a tunable voltage-controlled oscillator is presented in [7]. In a nutshell, high Q-factor LC-based oscillators have attracted much attention from researchers because of their direct impact on phase noise and their ease of tuning. The work in [8] demonstrates a CMOS current-reuse VCO that uses a ferrite-integrated inductor on a three-layered substrate to reduce phase noise and enhance the FOM of the VCO. That research focuses on the magnetic characteristics of the core, the spiral dimensions, and the substrate parameters, all of which influence the Q-factor. In [9], the effect of the substrate on the inductor selection criterion for the VCO circuit is studied, resolving the long-standing argument among circuit designers about whether large or small inductors should be used to reduce the phase noise of VCOs; an enhanced model is then used to study the selection criterion of on-chip inductors for the optimal LC VCO design. In [10], a CMOS oscillator using a substrate integrated waveguide (SIW)-based LC tank circuit is proposed. The SIW-based oscillator has lower power consumption and an increased tuning range with better phase noise. In [11], a direct-coupled open-ended spiral resonator (OESR) on a microstrip line is presented for band-stop filter applications. The parametric study resulted in filter design equations that forecast the resonance with an error of less than 3%. The objective of this paper is to design a 60 GHz CMOS VCO employing switchable inductors as tuning elements that operates effectively with an acceptable Q-factor.
The main component proposed for frequency tuning is the switchable inductor, which is designed with resistance strips for selecting various inductances. This greatly increases the Q-factor. The paper is organized into five sections. Section 2 describes the methodology of the LC tank for 60 GHz operation and presents the model for switchable inductors. Section 3 describes the LC tank-based oscillator design and its analysis. In Sect. 4, the results and comments are provided, and in Sect. 5, the study is concluded.
2 Methodology
2.1 VCO Design at mm Wave
The flowchart in Fig. 1 explains the steps involved in designing a VCO with an LC tank employing switchable inductors. The design steps are based on oscillator study, specifications, configurations, and tuning methods. To meet the specifications, different topologies, the tunable LC tank, the supply voltage, and the transconductance gain have been iterated [8]. To be compatible with mm wave circuit operation, the proposed work uses 65 nm CMOS technology for designing the VCO.
Fig. 1 Flowchart of VCO design
2.2 Model of a Switchable Inductor
Fig. 2 Loaded transformer
A loaded transformer, shown in Fig. 2, composed of a primary coil, L1, and two switchable secondary coils, L2, is used for designing the LC tank. The equivalent inductance Leff and the series resistance Reff seen at the primary coil can be derived [2, 6] as follows:
$$ L_{\mathrm{eff}} = L_1 + \frac{\omega^2 k^2 L_1 L_2 \left(C_2 R_2^2 - \omega^2 C_2^2 L_2 R_2^2 - L_2\right)}{R_2^2\left(1 - \omega^2 C_2 L_2\right)^2 + \omega^2 L_2^2} \tag{4} $$
$$ R_{\mathrm{eff}} = \frac{\omega^2 k^2 L_1 L_2 R_2}{R_2^2\left(1 - \omega^2 C_2 L_2\right)^2 + \omega^2 L_2^2} \tag{5} $$
The quality factor (Qeff) can be derived as follows:
$$ Q_{\mathrm{eff}} = \frac{\omega L_{\mathrm{eff}}}{R_{\mathrm{eff}}} = \left(\frac{1}{k^2} - 1\right)\frac{\omega L_2}{R_2} \tag{6} $$
Assuming that the resonance frequency of the secondary coil is well above the operating frequency and that the quality factor of L2 (Q2) is sufficiently greater than 1, so that ω²L2² ≫ R2²(1 − ω²C2L2)², the expressions in Eqs. (4) and (5) reduce to
$$ L_{\mathrm{eff}} = (1 - k^2)L_1 \tag{7} $$
$$ R_{\mathrm{eff}} = \frac{R_2 k^2 L_1}{L_2} \tag{8} $$
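The exact expressions in Eqs. (4) and (5) and their high-Q approximations in Eqs. (7) and (8) can be compared numerically. The following sketch simply evaluates both forms; the function names are assumptions, and the component values in the usage note are hypothetical and chosen only so that the stated condition holds.

```python
import math


def loaded_transformer_exact(l1, l2, k, r2, c2, freq_hz):
    """Evaluate Eqs. (4) and (5): (L_eff, R_eff) seen at the primary coil."""
    w = 2 * math.pi * freq_hz
    denom = r2 ** 2 * (1 - w ** 2 * c2 * l2) ** 2 + w ** 2 * l2 ** 2
    coupling = w ** 2 * k ** 2 * l1 * l2
    l_eff = l1 + coupling * (c2 * r2 ** 2 - w ** 2 * c2 ** 2 * l2 * r2 ** 2 - l2) / denom
    r_eff = coupling * r2 / denom
    return l_eff, r_eff


def loaded_transformer_approx(l1, l2, k, r2):
    """Evaluate Eqs. (7) and (8), valid when w^2 L2^2 >> R2^2 (1 - w^2 C2 L2)^2."""
    return (1 - k ** 2) * l1, r2 * k ** 2 * l1 / l2


def q_eff(l_eff, r_eff, freq_hz):
    """Eq. (6), first equality: Q_eff = w L_eff / R_eff."""
    return 2 * math.pi * freq_hz * l_eff / r_eff
```

For example, with hypothetical values L1 = L2 = 100 pH, k = 0.5, R2 = 1 Ω, C2 = 1 fF at 60 GHz, the exact and approximate results agree to well within 1%, and Q_eff is close to (1/k² − 1)·ωL2/R2.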
A switchable inductor formed by two secondary coils acts as the tuning element for setting the desired operating frequency. Overall, the inductor is situated on an oxide layer with a thickness of 8 μm on a 10 cm silicon wafer with a thickness of 200 μm. As per Fig. 2, the primary coil has a thickness of 6 μm with the inner radius starting at 4 μm. The secondary coils have a thickness of 2 μm with a gap of 2 μm between them, as shown in Fig. 3. The two secondary coils are shown as pale orange strips, while pale green denotes the primary. The 3D view of the same structure is presented in Fig. 4.
Fig. 3 2D view of loaded transformer type inductor coils
Fig. 4 3D view
Table 1 lists the switch states of SW1 and SW2 (10, 11, 01, and 00) together with the corresponding inductances, quality factors, and equivalent resistances. Each secondary coil has a resistor strip that models a switch for selecting inductances with different Q-factors. A resistance of 10 Ω is assigned to the ON state and 20 kΩ to the OFF state. The two switch models connected to the secondary coils give a 2-bit word, as listed in Table 1. In the ON condition, the circuit is closed and the switch is ideally considered to have a small ON resistance, RON, assumed to be 10 Ω in the simulation [2]. In the OFF condition, the circuit is open and the switch ideally acts like a resistor with infinite resistance, ROFF, which is assumed to be 20 kΩ in the simulation. Together, these two resistances determine the values of the switchable inductance. The 3D electromagnetic simulation uses two lumped ports (1 and 2) connected from the primary and secondary coils to their respective grounds. The ON–ON condition has two 10 Ω resistor strips, the ON–OFF and OFF–ON conditions have one 10 Ω and one 20 kΩ resistor strip, and the OFF–OFF condition has two 20 kΩ resistor strips [2, 7, 9]. From Table 1, an inductance variation of nearly 1.6 nH is achieved when switching from state 10 to state 00, and the Q-factor rises from 13.45 to 26.2. Thus, the high Q-factor of the designed switchable inductors results from using resistance strips for selection rather than active elements, which suffer from glitches and high ON resistance at mm wave frequencies.
Table 1 Switch states versus inductance, Q-factor, and resistance
Switch state (1-ON, 0-OFF) | Inductance at 60 GHz (nH) | Q value | Resistance (Ω)
10 | 24.33 | 13.45 | 10, 20 k
11 | 24.90 | 14.05 | 10, 10
01 | 25.17 | 17.2 | 20 k, 10
00 | 25.92 | 26.2 | 20 k, 20 k
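The mapping from the 2-bit switch word to the pair of resistor-strip values, and the spread of inductance and Q-factor reported in Table 1, can be written down as a small lookup. This is only a bookkeeping sketch: the names are assumptions, the resistances are the 10 Ω and 20 kΩ values quoted for the simulation, and the inductance figures are copied from Table 1 in the unit printed there.

```python
R_ON_OHM = 10.0        # ON-state strip resistance used in the simulation
R_OFF_OHM = 20_000.0   # OFF-state strip resistance used in the simulation

# Table 1 values: switch word -> (inductance as printed in Table 1, Q value)
TABLE_1 = {
    "10": (24.33, 13.45),
    "11": (24.90, 14.05),
    "01": (25.17, 17.2),
    "00": (25.92, 26.2),
}


def strip_resistances(switch_word):
    """Return the (SW1, SW2) strip resistances in ohms for a 2-bit switch word."""
    return tuple(R_ON_OHM if bit == "1" else R_OFF_OHM for bit in switch_word)


def spread(from_state, to_state):
    """Change in (inductance, Q) when switching between two Table 1 states."""
    l_a, q_a = TABLE_1[from_state]
    l_b, q_b = TABLE_1[to_state]
    return l_b - l_a, q_b - q_a
```

Switching from state 10 to state 00 changes the tabulated inductance by about 1.59 and the Q-factor by about 12.75, which matches the figures quoted in the text.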
The following equations are used for plotting Q, LS, RS, CP, and RP for all four states:
Zseries = -2/(Y(1, 2) + Y(2, 1))
Zp1 = 1/(Y(1, 1) + Y(1, 2))
Zp2 = 1/(Y(2, 2) + Y(2, 1))
Q = abs(im(Y(1, 1))/re(Y(1, 1)))
Ls = im(Zseries)/(2 * pi * freq) * 1e12
Rs = re(Zseries)
Cp = -1/(im(Zp1) * 2 * pi * freq) * 1e15
Rp = re(Zp)
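The extraction expressions above can be mirrored in a few lines of Python to see what they compute from a two-port Y-matrix. This is a sketch under assumptions: results are returned in SI units (without the 1e12 and 1e15 scale factors), the undefined `Zp` in the last expression is taken here to mean `Zp1`, and the function name is made up for the illustration.

```python
import math


def pi_model_from_y(y11, y12, y21, y22, freq_hz):
    """Extract pi-model elements from two-port Y-parameters (complex numbers)."""
    w = 2 * math.pi * freq_hz
    z_series = -2 / (y12 + y21)      # series branch between the two ports
    z_p1 = 1 / (y11 + y12)           # shunt branch at port 1
    z_p2 = 1 / (y22 + y21)           # shunt branch at port 2
    return {
        "Q": abs(y11.imag / y11.real),
        "Ls": z_series.imag / w,         # henries
        "Rs": z_series.real,             # ohms
        "Cp": -1 / (z_p1.imag * w),      # farads
        "Rp": z_p1.real,                 # ohms, assuming Zp refers to Zp1
        "Zp2": z_p2,
    }
```

Feeding the function a synthetic pi network (a series R-L branch with an R-C shunt branch at each port) returns the element values that were used to build it, which is a convenient sanity check before applying it to simulated data.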
3 LC Tank Oscillator Design
Figure 5 shows the negative-gm oscillator circuit, consisting of a cross-coupled NMOS pair, M3 and M4, connected to PMOS source-follower loads formed by M1 and M2. The cross-coupled pair is connected to the LC tank consisting of the L-block, C2, and C3. The transistors of the cross-coupled pair are sized to obtain the optimal width for a better VCO transconductance. This creates the negative resistance needed to cancel the parallel resistance, RP, of the tank so that oscillations can build up. The transconductance of the cross-coupled pair must satisfy
$$ g_m \ge \frac{2}{R_p} \tag{9} $$
The Rp value is obtained from the Z-parameter of the LC tank. The widths of the cross-coupled NMOS pair are adjusted accordingly to obtain the necessary gm value. The ‘L’ block in Fig. 5 is implemented with the custom-designed switchable inductors, which provide various inductance values with improved Q-factor, as listed in Table 1. They are imported from the 3D electromagnetic simulation environment into the circuit as a SPICE model. The switchable inductors and the equivalent capacitance C formed from C2 and C3 serve as tuning elements that set the desired oscillation frequency:
$$ F_r = \frac{1}{2\pi\sqrt{LC}} \tag{10} $$
Fig. 5 VCO design employing switchable inductors
Here, L is the switchable inductance and C is the equivalent capacitance formed by C2 and C3. The control voltage, Vctrl, is applied at the junction of C2 and C3 and changes the oscillation frequency. For comparison, a VCO is also designed using the available analog library inductors. The Q-factors and figures of merit (FOMs) have been recorded separately for the VCOs employing custom-designed inductors and analog library inductors. The phase noise of the designed circuits is plotted using the envelope and harmonic balance options in ADS. Equations (11) and (12) give the FOM without and with the tuning range included [2, 8, 10]:
$$ \mathrm{FOM} = PN - 20\log_{10}\left(\frac{f_0}{\Delta f}\right) + 10\log_{10}\left(\frac{P_{dc}}{1\,\mathrm{mW}}\right) \tag{11} $$
$$ \mathrm{FOM}_T = PN - 20\log_{10}\left(\frac{f_0}{\Delta f}\cdot\frac{FTR\%}{10}\right) + 10\log_{10}\left(\frac{P_{dc}}{1\,\mathrm{mW}}\right) \tag{12} $$
where PN is the phase noise at the offset frequency Δf, f0 is the oscillation frequency, FTR% is the frequency tuning range in percent, and Pdc is the DC power consumption.
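The two figures of merit in Eqs. (11) and (12) are straightforward to evaluate once the phase noise, offset, tuning range, and power are known. The sketch below assumes the standard form of these expressions (phase noise in dBc/Hz, power in mW, tuning range in percent); the function names and the round numbers in the usage note are illustrative and are not taken from the result tables.

```python
import math


def fom(phase_noise_dbc, f0_hz, delta_f_hz, p_dc_mw):
    """Eq. (11): figure of merit without the tuning range."""
    return (phase_noise_dbc
            - 20 * math.log10(f0_hz / delta_f_hz)
            + 10 * math.log10(p_dc_mw / 1.0))


def fom_t(phase_noise_dbc, f0_hz, delta_f_hz, ftr_percent, p_dc_mw):
    """Eq. (12): figure of merit including the frequency tuning range."""
    return (phase_noise_dbc
            - 20 * math.log10((f0_hz / delta_f_hz) * (ftr_percent / 10.0))
            + 10 * math.log10(p_dc_mw / 1.0))
```

As a round-number check, a phase noise of −100 dBc/Hz at 1 MHz offset from a 1 GHz carrier with 10 mW of DC power gives FOM = −150 dBc/Hz, and a 10% tuning range leaves FOM_T equal to FOM because the tuning-range factor is then 1.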
4 Results and Discussion
The Z-parameter of the LC tank is measured and shown in Fig. 6, from which the value of Rp is found to be 34.981 Ω. Based on Eq. (9), gm is tuned to be greater than 30 mA/V.
Rp = 34.981 Ω
gm ≥ 57.174 mA/V
Accordingly, the dimensions of the cross-coupled pair are chosen as follows:
Length of the MOSFET = 60 nm
Width of the MOSFET = 30 μm
Fig. 6 Z-parameter of LC tank
Figure 7 shows the transient analysis of the designed LC CMOS VCO including the switchable inductors. The oscillation frequency is close to 60 GHz, which is the desired objective. This shows that the switchable inductors presented in Sect. 2 serve as good tuning elements and are compatible with the VCO circuit.
Time period = 3.067 − 3.050 = 0.017 ns
Operating frequency = 1/0.017 ns ≈ 58.8 GHz
Fig. 7 Transient analysis
Figures 8, 9, 10, and 11 show the phase noise of the VCOs for the four switchable inductances selected by switch states 01, 10, 11, and 00, respectively. The same results are given in Table 2, which shows that the best phase noise is achieved for the 11 and 10 switch-state inductances, −115.6 dBc/Hz and −112.8 dBc/Hz, respectively. FOMs have been calculated for all VCOs without and with the tuning range and are presented in Table 2. The best and worst FOMs are observed over the frequency range 60–80 GHz and entered in Table 2. The DC power consumption (PDC) of the designed circuit, including ports, is 31.8 mW for a bias current of 2 mA and VDD of 1.2 V.
Fig. 8 Phase noise of VCO in state 01
Fig. 9 Phase noise of VCO in state 10
Fig. 10 Phase noise of VCO in state 11
Fig. 11 Phase noise of VCO in state 00
Figure 12 shows the tuning range of the designed VCO using the switchable inductance corresponding to state 11, which is considered the best case because it gives the best phase noise in our work. From this plot, the tuning range Δf/ΔV is observed to be 33.5 GHz/dB (for 1 dB of Vctrl), while it is around 26 GHz/dB for VCOs using analog library inductors. That is, an improvement of nearly 7.5 GHz/dB is obtained in VCO designs employing switchable inductors. This is achieved by varying Vctrl from 0.8 to 1.2 V. The wide tuning range results from the high Q-factor of the switchable inductors, and hence, they can also be applied to filter implementations [11]. Table 3 compares the results of the proposed work with literature reports implemented in 65 nm CMOS processes. The proposed work achieves a best phase noise of −115.6 dBc/Hz, which is better than that of [3] and [4], and its worst-case phase noise of −105.8 dBc/Hz is the best among the compared designs. Similarly, its tuning range is higher than that of [3] and [4]. The output is observed over a wide frequency range of nearly 20.1 GHz, which makes the design a real candidate to compete with the reported designs.
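The design arithmetic quoted at the start of this section follows directly from Eqs. (9) and (10) and can be re-checked with a few helper functions. This is only a numerical sanity check; the function names are assumptions, and the inductance used in the round-trip example is a hypothetical value, not a figure from the paper.

```python
import math


def min_transconductance(r_p_ohm):
    """Eq. (9): minimum gm (in siemens) needed to start oscillation."""
    return 2.0 / r_p_ohm


def frequency_from_period(period_s):
    """Oscillation frequency in hertz from one period of the transient waveform."""
    return 1.0 / period_s


def resonant_frequency(l_henry, c_farad):
    """Eq. (10): resonance frequency of the LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))


def tank_capacitance(l_henry, f_hz):
    """Eq. (10) solved for C: capacitance that resonates with L at f_hz."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_henry)
```

With Rp = 34.981 Ω, `min_transconductance` gives about 57.17 mA/V, and a 0.017 ns period corresponds to roughly 58.8 GHz. `tank_capacitance` can be used to see what equivalent capacitance a given inductance would need for resonance at 60 GHz.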
Table 2 Comparison of results of VCOs with switchable inductors with analog library inductors
Process | CMOS 65 nm (Analog L) | CMOS 65 nm (00) | CMOS 65 nm (01) | CMOS 65 nm (10) | CMOS 65 nm (11)
Frequency range (GHz) | 60.1–80.2 | 60.1–80.2 | 60.1–80.2 | 60.1–80.2 | 60.1–80.2
Frequency tuning range (%) | 26 | 33.5 | 33.5 | 33.5 | 33.5
Best phase noise (dBc/Hz) | −153 | −110.8 | −111 | −112.8 | −115.6
Worst phase noise (dBc/Hz) | −144 | −98.5 | −100 | −101.8 | −105.8
PDC (mW) | 33 | 31.8 | 31.8 | 31.8 | 31.8
Best FOM (dBc/Hz) | −185.2 | −142.8 | −143.4 | −145.2 | −147.6
Worst FOM (dBc/Hz) | −176.2 | −130.64 | −132.4 | −134.2 | −137.8
Best FOM (T) (dBc/Hz) | −177.2 | −134.98 | −135.2 | −137 | −139.8
Worst FOM (T) (dBc/Hz) | −168.2 | −122.7 | −124.2 | −126 | −130

5 Conclusion
In this work, switchable inductors using a loaded transformer structure are designed to serve as tuning elements. The proposed work achieves variable inductance by using resistance strips that act as switches. This greatly alleviates the effect of frequency-dependent passives. The inductance varies over a range of 1.6 nH, with the Q-factor raised by 12.75. The phase noise and tuning range of the VCO have been improved to −115.6 dBc/Hz and 33.5 GHz/dB, respectively. The results demonstrate that the proposed work forms a competent component in the design of 60 GHz receivers. The practical limitations were encountered during the integration of the HFSS tool files with the VCO in ADS. The high potential of the broad bandwidth in the 60 GHz unlicensed band, enabling multi-gigabit data transmission, has made 60 GHz Wi-Fi based on the IEEE 802.11ad standard attractive in recent years. However, commercialization of 60 GHz Wi-Fi has yet to take off, owing to range limitations as well as a lack of diverse applications. Why is the 60 GHz band getting so much attention when there are so many other WLAN activities in the well-known 2.4 and 5 GHz bands? First and foremost, in the USA, the 60 GHz industrial, scientific, and medical (ISM) band allows unlicensed access across 14 GHz of spectrum, from 57 to 71 GHz. The band from 64 to 71 GHz was added in 2016. Since the frequency is so high, propagation models are more difficult to develop, but modern technology is capable of providing high throughput for a wide range of applications.
Fig. 12 Vctrl versus frequency
Table 3 Comparison of our results with other literature reports
Process | CMOS 65 nm [2] | CMOS 65 nm [3] | CMOS 65 nm [4] | This work (11-state)
Frequency range (GHz) | 57.5–90 | 51.9–65.5 | 57–65.5 | 60.1–80.2
Frequency tuning range (%) | 41.1 | 25.8 | 14.2 | 33.5
Best phase noise (dBc/Hz) | −118.8 | −106 | −110.8 | −115.6
Worst phase noise (dBc/Hz) | −104.6 | −80 | −80 | −105.8
PDC (mW) | 10.8 | 5.4 | 6 | 31.8
Best FOM (dBc/Hz) | −180 | −175 | −178 | −147.6
Worst FOM (dBc/Hz) | −170 | −157 | −167 | −137.8
Best FOM (T) (dBc/Hz) | −192 | −183 | −181 | −139.8
Worst FOM (T) (dBc/Hz) | −184 | −166 | −170.8 | −130
References
1. Raj C, Suganthi S (2016) Survey on microwave frequency V band: characteristics and challenges. In: 2016 international conference on wireless communications, signal processing and networking (WiSPNET). IEEE
2. Jin JY, Wu L, Xue Q (2018) A V-band CMOS VCO with digitally-controlled inductor for frequency tuning. IEEE Trans Circuits Syst II Express Briefs 65(8):979–983. https://doi.org/10.1109/TCSII.2018.2795577
3. Chen Z et al (2017) Linear CMOS LC-VCO based on triple-coupled inductors and its application to 40-GHz phase-locked loop. IEEE Trans Microw Theory Tech 65(8):2977–2989. https://doi.org/10.1109/TMTT.2017.2663401
4. Huang G, Kim S, Gao Z, Kim S, Fusco V, Kim B (2011) A 45 GHz CMOS VCO adopting digitally switchable metal-oxide-metal capacitors. IEEE Microwave Wirel Compon Lett 21(5):270–272. https://doi.org/10.1109/LMWC.2011.2124449
5. Hsieh H, Lu L (2009) A V-band CMOS VCO with an admittance-transforming cross-coupled pair. IEEE J Solid-State Circuits 44(6):1689–1696. https://doi.org/10.1109/jssc.2009.2020203
6. Baylon J, Agarwal P, Renaud L, Ali SN, Heo D (2019) A Ka-band dual-band digitally controlled oscillator with −195.1-dBc/Hz FoM_T based on a compact high-Q dual-path phase-switched inductor. IEEE Trans Microw Theory Tech 67(7):2748–2758. https://doi.org/10.1109/TMTT.2019.2917671
7. Rajeshwari S, Vaithianathan V (2017) Design of active inductor based tunable voltage controlled oscillator. In: 2017 international conference on communication and signal processing (ICCSP), Chennai, pp 0879–0883. https://doi.org/10.1109/ICCSP.2017.8286495
8. Sarika MR, Balamurugan K (2018) High performance CMOS based LC-VCO design using high Q-factor, field shield layered substrate inductor. Int J Pure Appl Math 119(12):13759–13769
9. Zhan Y, Harjani R, Sapatnekar SS (2004) On the selection of on-chip inductors for the optimal VCO design. In: Proceedings of the IEEE 2004 custom integrated circuits conference (IEEE Cat. No. 04CH37571), Orlando, FL, pp 277–280. https://doi.org/10.1109/CICC.2004.1358797
10. Rao PS, Balamurugan K (2019) High performance oscillator design using high-Q substrate integrated waveguide (SIW) resonator. In: 6th international conference on signal processing and integrated networks (SPIN)
11. Vasudev G, Abhijith S, Akshay M, Menon SK (2020) Direct coupled spiral resonator for band stop filter applications. In: 2020 7th international conference on signal processing and integrated networks, SPIN 2020, pp 520–523
A Comprehensive Survey on Predicting Dyslexia and ADHD Using Machine Learning Approaches
Pavan Kumar Varma Kothapalli, V. Rathikarani, and Gopala Krishna Murthy Nookala
Abstract Attention deficit hyperactivity disorder (ADHD) and dyslexia are neurological disorders characterized by vague comprehension and are generally associated with poor reading and writing ability. They affect specific populations, i.e. school-aged children, particularly male children, and lead to risks and consequences such as low self-esteem and poor academic performance that can last a lifetime. A long-standing goal of researchers is to model appropriate ADHD and dyslexia prediction approaches to help the affected children. To this end, various machine learning (ML) approaches have been implemented on multiple datasets available online, attaining better prediction performance and classification accuracy. However, achieving clinical acceptability with these existing approaches raises specific research challenges, such as dataset privacy, the choice of an appropriate classifier model, good optimizer and feature selection, hyperparameter selection, and over-fitting. This work provides an extensive review of existing ML approaches and image processing approaches, and it analyses multiple performance metrics, research gaps, and research challenges. It offers a critical analysis of the ML approaches used for predicting ADHD and dyslexia and guides users of ML approaches in keeping model performance within an acceptable range. By addressing the potential research challenges, researchers can achieve higher prediction and classification performance with clinical relevance using ML approaches for ADHD and dyslexia prediction.
Keywords ADHD · Dyslexia · Machine learning · Prediction · Research challenges
P. K. V. Kothapalli (B) · V. Rathikarani
Department of Computer Science and Engineering, FEAT, Annamalai University, Chidambaram, India
e-mail: [email protected]
G. K. M. Nookala
SRKR Engineering College, Bhimavaram, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_8
P. K. V. Kothapalli et al.
1 Introduction
Attention deficit hyperactivity disorder (ADHD) and dyslexia are neurodevelopmental disorders with a high comorbidity rate (30–40%). In DSM-5, dyslexia is classified as a specific learning disorder. At the theoretical level, developmental neuropsychology has moved from single-deficit to multiple-deficit models [1]. The multiple deficit model posits probabilistic predictors at several levels of analysis and attributes the elevated comorbidity rate to risk factors that the disorders share [2]. The model is essential for advancing the science of comorbidity, which it does by integrating the neural, genetic and cognitive levels of analysis. Evidence for shared genetic and neuropsychological risk factors is reviewed in [3]. Gaps at the neural level, however, hinder the specification of an integrated model of ADHD and dyslexia comorbidity that spans the various levels of analysis. At the genetic level, there is substantial evidence for a correlated liabilities model of the comorbidity between ADHD and dyslexia, which holds that shared genetic influences lead both disorders to manifest in the same child [4]. The supporting evidence for the correlated liabilities model comes from multivariate behavioural genetic studies of twins, which can establish how far the genetic influences on one disorder overlap with those on a second disorder. The extent of this genetic overlap is usually quantified with a statistic termed the genetic correlation, which ranges from 0 (the genetic influences on one trait are unrelated to the second trait) to 1 (all of the genetic influences on one trait also act on the second trait).
One way to interpret the genetic correlation is as the probability that a gene associated with one trait is also associated with the other trait. Plonski et al. [5] estimate the genetic correlation between ADHD and dyslexia at about 0.50, extending to 0.70 in some studies. Neuropsychological analyses indicate that the shared risk factors most notably involve deficits in processing speed and in aspects of executive functioning such as sustained attention, inhibition and working memory. This exposes a wider gap in the research, because work on the comorbidity of ADHD and dyslexia has progressed mainly at the neuropsychological and genetic levels [6]. For instance, only a limited number of structural neuroimaging studies directly evaluate a comorbid ADHD and dyslexia group. Most neuroimaging designs either recruit pure groups without comorbidity or compare separate groups by comorbidity status, i.e. ADHD only, dyslexia only and both. Both designs are valuable for specific research questions, but neither directly resolves the cause of the disorders or supports their earlier prediction. Such designs determine what differentiates one disorder from another rather than identifying transdiagnostic regions with shared features [7]. The transdiagnostic approach has been rare in developmental neuroimaging samples to date; however, a notable meta-analytic study of adult psychiatric neuroimaging provides a guiding framework. Alsobhi et al. [8] examined structural neuroimaging studies that compare clinical disorders with controls. The clinical disorders covered a broad range: anxiety disorders, substance use disorders, bipolar disorder, schizophrenia, obsessive–compulsive disorder and major depressive disorder. Mohammed et al. [9] analysed existing voxel-based morphometry (VBM) studies of each condition and performed a conjunction analysis to identify regions common to multiple disorders. The results pointed to the bilateral insula and the cingulate cortex as regions with less grey matter across clinical disorders than in controls. These regions are associated with executive dysfunction, which is consistent with cognitive studies reporting that executive dysfunction is often a cross-cutting cognitive phenotype across neurodevelopmental and psychiatric disorders, including ADHD and dyslexia [10]. Together, these observations illustrate the value of transdiagnostic comparisons in comorbidity studies. For ADHD and dyslexia specifically, a previous meta-analysis tested the brain regions associated with both disorders but concentrated on the cerebellum, i.e. it was a meta-analysis of cerebellar VBM studies of ADHD and dyslexia. It found no overlap between the cerebellar clusters associated with ADHD and with dyslexia; however, there was a potential functional overlap in the ventral attention system, since the clusters identified in the cerebellum for both disorders are implicated in this attentional network [11]. Because analyses of the shared neural correlates of ADHD and dyslexia are sparse, it is useful to consider the neural systems associated with each disorder separately. For dyslexia, the commonly implicated neural correlates form a network of left temporoparietal regions, left occipitotemporal regions and the inferior frontal gyrus.
In contrast, the regions most commonly implicated in ADHD are the striatum and the prefrontal cortex. Although the canonical regions associated with the two disorders do not overlap, it is probable that some shared regions have received less attention because they lie outside these canonical regions [12]. A systematic quantitative analysis of common neural correlates is therefore required. More specifically, differences in grey matter volume are identified through voxel-based morphometry, an automated technique that is extensively used for examining structural brain images [13]. Although the two disorders also differ in structural and functional activation and in functional connectivity, this work concentrates on grey matter correlates because a sufficient number of VBM studies exist for both ADHD and dyslexia (N = 22 for ADHD studies and N = 15 for dyslexia studies). The analysis includes studies across the life span to increase the sample size while examining heterogeneity across ages. Importantly, the analytic strategy is designed to identify transdiagnostic grey matter correlates, in contrast to prevailing neuroimaging designs that differentiate the disorders. The overall objective of this analysis is to identify the overlap among the brain regions associated with ADHD and dyslexia in support of disease prediction models [14]. Such overlap helps in understanding the comorbidity of ADHD and dyslexia at the neural level and addresses a crucial gap between the etiological and neuropsychological levels of analysis.
2 Reviews on Learning Approaches to Predict Dyslexia
This section provides a detailed analysis of dyslexia prediction using various machine learning (ML) approaches. Dai et al. [15] modelled an approach that categorizes individuals as either dyslexic or non-dyslexic from EEG recordings. A support vector machine (SVM) is used for feature analysis and extraction, and an ensemble of SVMs is used for decision-making and classification. Features such as spectral flatness, positive area and maximal peak amplitude are extracted and fed to the classifier. The algorithm does not distinguish subtypes or related difficulties such as dysgraphia and dyscalculia, and the study does not report prediction accuracy. Ebrahimi et al. [16] perform an extensive analysis of dyslexia prediction using EEG, in which electrodes record the electrical activity of the brain. This model also adopts SVM for feature extraction and classification. The advantages and disadvantages of the methods are discussed, and an optimization approach is recommended for attaining higher prediction accuracy. Flag et al. [17] provide more accessible ways to assess students' performance in reading and writing. School children's reading and writing skills are evaluated with the R tool using a rule-based decision-making process, and the data are tested for professional analysis. The outcomes of the proposed model are compared with those of prevailing approaches such as random forest (RF) and decision tree (DT), mainly to show its significance relative to these standard models. The analysis indicates that assessing responses to intervention with ML approaches is superior to other systems. Galuschka et al. [18] modelled a statistical process that distinguishes dyslexic readers from other readers based on eye movements, which are recorded with an eye tracker. An SVM-based binary classifier is used for model construction. The model predicts better with features such as shorter saccades, longer fixations and regressions. Other classifiers such as perceptron learning and neural networks (NN) are also used for prediction. Gori et al. [19] use various screening-based approaches for the early prediction of dyslexia in children. Fuzzy unordered rule induction is used for categorization, based on the assessments of teachers and parents. The main disadvantage of the proposed model is the long execution time needed to classify the dataset. Author et al. [20] propose a computational analysis model based on dyslexia metrics. Results of the Gibson test of brain skills are pre-processed and analysed with the k-means algorithm, a fuzzy model and artificial neural network (ANN) approaches, whose outputs are used to categorize individuals as non-dyslexic or dyslexic. The reported prediction accuracy is 97%. Jorgensen et al. [21] presented a systematic review of dyslexia prediction using ANN and examined the primary causes of dyslexia with ANN. The results obtained on the test data are reasonable and play a substantial role in separating non-dyslexics from dyslexics. Koen et al. [22] provide
a practical classification approach for distinguishing children with dyslexia from typically developing children. An improved K-NN model is adopted for classification and gives superior prediction accuracy. Layes et al. [23] adopt an ANN-based perceptron model to predict reading disabilities from academic evaluations carried out by remediation teachers. The model comprises an input layer of 11 units connected to the output layer. It is easy to evaluate, achieves high prediction accuracy, and has low computational complexity and execution time. Its results also exceed expectations on metrics such as specificity, accuracy and sensitivity. Table 1 depicts the comparison of various ML approaches for dyslexia prediction.
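The perceptron described for [23], with 11 input units connected directly to an output unit, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the 11 binary inputs, the toy samples and the learning rate are hypothetical, chosen only so that the classic perceptron update rule can be followed by hand.

```python
def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Train a single-layer perceptron with a step activation.

    samples: list of feature vectors (here 11 hypothetical indicators each)
    labels:  list of 0/1 class labels
    Returns the learned (weights, bias).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # +1, 0 or -1
            if error != 0:
                # Classic perceptron rule: move the boundary towards the sample.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the weighted sum exceeds zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical toy data: 11 binary screening indicators per child.
# Label 1 = many indicators present, label 0 = none or a single one.
samples = [
    [1] * 11,
    [0] * 11,
    [1] * 10 + [0],
    [1] + [0] * 10,
    [0] + [1] * 10,
    [0] * 10 + [1],
]
labels = [1, 0, 1, 0, 1, 0]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

A single-layer model of this kind can only separate classes with a linear boundary, which is one reason the surveyed studies also report multi-layer ANN, SVM and K-NN models.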
3 Reviews on Image Processing Approaches to Predict Dyslexia
This section provides a detailed analysis of dyslexia prediction using various image processing approaches. Lofti et al. [24] examined abnormalities in brain anatomy using multivariate classification approaches. Grey matter is analysed, and brain MRI images collected from three different countries are fed as input to pre-processing, feature extraction and classification. Tenfold cross-validation (CV) is then performed for factor correction, feature selection and classification. The results show a curvature pattern with additional folds in the left hemisphere, with good prediction accuracy. Deepak et al. [25] propose methods for differentiating the brains of control and dyslexic subjects. Cerebral white matter is quantified from 3D MR images and used for classification: each individual sample is classified as dyslexic or non-dyslexic by quantifying the extracted gyrification volume. The quantitative results of this model are good in comparison with a general level-based segmentation approach. The process is further extended to examine brain structure and to help predict how dyslexia varies over time, after which grey matter is also analysed. Morris et al. [26] examined voxel-based morphometry of white and grey matter, exploring variations in local grey matter (GM) and white matter (WM) volume. Prediction and classification are performed on the available brain image datasets, and the images are segmented and smoothed with a Gaussian kernel. The cognitive profile of dyslexia is found to be significantly related to the anatomical variations. Figure 1 specifies the research dimensions for predicting dyslexia. Wang et al. [27] examined cerebral function and activation during phonological processing.
The images considered for analysis are MRI scans processed using software process management tools. The images are analysed with statistical approaches, and the affected children are found to show abnormally high activation during orthographic tasks. The results show readings with higher bilateral activation. The author recommends studying functional connectivity among
Table 1 Comparison of various ML approaches for dyslexia prediction

| References | Objectives | Methods | Pros | Cons | Improvements |
|---|---|---|---|---|---|
| [9] | Adopts SVM for distinguishing the normal from the abnormal dyslexia readers | SVM | Easier prediction process | It does not concentrate on other subtypes of dyslexia | Features connected with the subset of the disease are also extracted for accurate prediction |
| [9] | The EEG-based disease prediction process | SVM | Shows the benefits of using EEG for prediction purpose | No optimizer is used for enhancing the global solution | Here, mirror prox algorithm is used for prediction purpose |
| [10] | ML approaches are adopted for evaluating writing and reading difficulties | RF and DT approaches are used | Prediction accuracy is higher than the existing rule-based model | Accuracy needs to be improved using the intervention model | The rule-based decision model is considered for prediction purpose |
| [9] | The eye-tracking method is adopted for the prediction | SVM | The model attains 80% prediction accuracy | Accuracy needs to be improved more | The feature extraction process needs to be given more attention |
| [10] | Here, prediction is made with low-quality data | Fuzzy-based unordered rules are considered for prediction purpose | Teachers and parents are involved in the analysis to enhance prediction accuracy | Huge time is needed for analysing the provided dataset | Adopts distribution approach and needs to improve the execution time |
| [13] | Computational analysis is done for prediction purpose | k-means, ANN and fuzzy concepts are used for prediction | Prediction accuracy is 96% with all the three classifier models | EEG signals need to be considered in an efficient manner | EEG signal recording needs to be considered during the prediction process |
| [13] | ANN is applied for prediction purpose | ANN | Adopts screening tools | Needs to enhance the prediction accuracy | Accuracy needs to be higher |
| [14] | EEG is adopted for discriminating the normal children from the dyslexia person | K-NN | Lesser classification error | Data standardization needs to be concentrated | Other clustering approaches need to be analysed |
| [15] | Performs computational diagnosis with learning disability children | MLP-based ANN model | Lesser computational complexity | Evaluates the performance of teachers with ensemble ML approaches | Need to adopt standard ensemble model |
| [18] | Neuro-anatomical model for predicting the development of the disease | Multi-parametric curvature model is adopted for hemisphere brain region prediction | A typical model | Other parts of the brain are not involved for prediction purpose | Grey and white matter need to be considered for classification purpose |
Fig. 1 Research dimensionality to indicate dyslexia (diagram labels: Dyslexia; ML approaches to predict dyslexia; Image processing approaches to predict dyslexia; Modelling assistive tool to help the prediction of dyslexia; Perform test to predict dyslexia with a decision-making model; Evaluation tool for predicting writing and reading disabilities)
Table 2 Comparison of various image processing approaches for dyslexia prediction

| References | Objectives | Methods | Pros | Cons | Improvements |
|---|---|---|---|---|---|
| [23] | Adopts CAD system for earlier disease prediction | Quantitative analysis | Superior to existing level-based approaches | Grey matter is not given with higher significance | Understands the variations in dyslexia over a specific time |
| [22] | Neuro-image classification is performed to predict whether the image processing approaches to control the disease | Classification is done with Gaussian kernel | Dyslexia and its subset are examined | Some dyslexia types are not predicted during classification | Some features are extracted for superior classification |
| [6] | Examination of WM is connected with the modelling of ML approaches | SVM | Prediction accuracy is 84% | WM is not analysed | WM and GM are examined |
| [31] | Evaluating dyslexia among literacy people | ANOVA test | Predicting delay | A lesser number of samples are considered | Needs to validate classification accuracy |
| [31] | Neurological approaches are used for predicting dyslexia | Statistical analysis | Variations in activation are predicted | Features are predicted | Ensemble-based ML approaches need to be adopted |
the cerebellum and cerebral regions. Toste et al. [28] adopt MRI with diffusion tensor imaging to obtain WM features. The analysis is done with 29 schoolchildren and 33 matching attributes. SVM is adopted to differentiate dyslexic from non-dyslexic children and achieves 84% prediction accuracy. Szucs et al. [29] study cortical brain connectivity in school children, with samples drawn from three different literacy groups. A dynamic model is adopted to evaluate the connectivity measures, and an ANOVA test is performed to show the significance of the model. The results show a delay between the emergent and pre-emergent literacy phases. Ramsay et al. [30] examined the correlation between brain MR images and EEG signals for discriminating non-dyslexic from dyslexic subjects. The EEG signal is considered as a substitute for PET and MR images because it is non-invasive and less expensive. The brain activities of both dyslexia
and non-dyslexia are considered in the analysis, so that differences in brain activity between dyslexic and non-dyslexic subjects can be predicted. Table 2 depicts the comparison of diverse image processing approaches for dyslexia prediction.
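The Gaussian-kernel smoothing step mentioned for the voxel-based morphometry work [26] can be sketched in one dimension. This is a simplified illustration, not the pipeline used in the reviewed studies: real VBM smoothing is applied to 3D volumes, and the kernel radius of three standard deviations and the clamped border handling are assumptions made here for brevity.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalised 1D Gaussian kernel (weights sum to 1)."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed cut-off at 3 sigma
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1D signal with a Gaussian kernel, clamping at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat edge values
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


# Hypothetical intensity profile along one image row: a single bright voxel.
profile = [0.0] * 5 + [1.0] + [0.0] * 5
smoothed_profile = smooth(profile, sigma=1.0)
```

Smoothing spreads the isolated peak over its neighbours while preserving the total intensity, which is why it is commonly applied before voxel-wise statistical comparison.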
4 Reviews on Clinical Features of ADHD
This section provides an extensive analysis of the clinical features of ADHD. Heritability is evaluated from teacher and parent ratings of ADHD symptoms. Ramsay et al. [30] select the most relevant features using various approaches for predicting ADHD. Conventional methods such as RF and SVM are used for ADHD classification, and SVM in particular has been evaluated extensively on MRI data. A functional brain graph is built from Pearson's correlation between the time series of every pair of regions. Graph centrality measures are taken as the essential features, to which SVM is applied for classification, attaining 77% prediction accuracy. The statistical evaluation is performed based on [31]. Gropper et al. [32] analysed structural MRI features using local binary patterns (LBP), a technique from computer vision. LBP is computed in three steps: select a pixel and its neighbourhood, threshold the neighbouring pixels against the value of the selected pixel, and take the weighted sum of the resulting binary digits as the new pixel value. This model attains a prediction accuracy of 70%. In et al. [33] use SVM on MRI-derived features. The features include pairwise Pearson's correlations and global and nodal graph-theoretical measures, whereas the morphological information from structural MRI includes GM volume, surface area, surface vertices, thickness, etc. The model attained 55% prediction accuracy. Kim et al. [34] adopt a linear SVM on functional connectivity derived from MRI data and achieve a prediction accuracy of 66%. The author considers frequency and statistical features extracted from fMRI data along with demographic information; a decision tree (DT) is used for classification, and this model attains a prediction accuracy of 83%. Laasonen et al. [35] use a K-NN model for functional connectivity classification, generate the resting-state data using marginal criteria, and attain a prediction accuracy of 80%. The author adopts the K-NN model for classification together with similarity measures that evaluate the common factors among fMRI time series data. Figure 2 specifies ML approaches based on brain disorder classification model. Various other approaches are also used for classifying ADHD, and some gold standards are considered for evaluation. Author et al. [36] take the functional brain connectivity of different frequency bands as features and apply SVM for classification. Miranda et al. [37] adopt kernel SVM on dynamical and statistical functional connectivity features obtained with a sliding-window mechanism. The author applies SVM to the histogram of oriented gradients (HOG) features of fMRI data to achieve better prediction performance. Mucci et al. [38] adopt three different classification approaches, RF, SVM and gradient boosting machine, and attain an average
Fig. 2 Machine learning approaches based on brain disorder classification
prediction accuracy of 68%. Another method combines SVM and the K-NN model for ADHD classification by analysing a lower-level ranking representation of the input data. Other authors use RF on the functional connectivity among the regions of fMRI data. The authors also consider class imbalance strategies, since in medical datasets the patients form the minority class. This is consistent with the low prevalence of the disorder, and ADHD therefore poses substantial challenges for optimizing accuracy while reducing both false negatives (FNs) and false positives (FPs). Training on imbalanced data biases the model towards the majority class, and class imbalance is observed in the ADHD datasets. Various approaches are adopted to reduce the effect of the majority class on the final prediction. The ML community has addressed class imbalance in two ways: assigning distinct costs to the training samples, and resampling the original data, either by over-sampling the smaller minority class or by under-sampling the larger majority class. For ADHD classification, the approaches used to handle imbalanced datasets include k-fold CV that splits the data randomly while preserving the class distribution, resampling and bootstrapping. Paris et al. [39] adopt the SMOTE approach to over-sample the minority class of an imbalanced dataset. SMOTE adjusts the class distribution of a dataset by producing synthetic data, and it can be combined with under-sampling of the majority class to attain better prediction accuracy. The ROC curve is also generated and compared with the loss rate. SMOTE is successfully used on the MRI dataset for ADHD classification. Table 3 shows the comparison of various ML approaches for ADHD prediction.
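The idea behind SMOTE, as described above, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only this core idea under simplifying assumptions; it is not the implementation used in [39] or in any particular library, and the function name `smote_sketch`, the toy feature vectors and the parameter defaults are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    minority:    list of feature vectors from the minority class (at least two)
    n_synthetic: number of synthetic samples to create
    k:           number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g. ADHD subjects).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(minority, n_synthetic=10, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by the minority class rather than duplicating existing samples.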
Table 3 Comparison of various ML approaches for ADHD prediction

| Modality | Training data size | Test size | Classification | Test type | Prediction accuracy (%) | Identifications |
|---|---|---|---|---|---|---|
| fMRI | 640 | – | Probabilistic NN model | CV | 90 | Samples are under 20 years of age |
| | 296 | – | L2-Logistic regression model | CV | 75 | IQ and age-matched |
| | 240 | – | SVM | CV | 79 | The subset of samples is 12–19 years old |
| | 871 | – | SVC | LOSO | 68 | – |
| | 888 | 222 | SVM | Training and testing | 62 | – |
| | 200 | 52 | RF | Training and testing | 92 | The subset of provided dataset |
| sMRI | | | | | | |
| sMRI + fMRI | | | | | | |
| | 93 ± 40 | – | K-NN/SVM | Fivefold CV | 72 | |
| | 48 ± 31 | – | RF | 10 or 20-fold CV | 80 ± 9 | – |
| | 132 | – | SVM | CV | 79 | – |
| | 64 | – | SVM | Tenfold CV | 68 | Two sites from the provided dataset |
| | 650 | – | K-NN/SVM | Fivefold CV | 53 ± 7 | Subjects below 10 years of age are excluded |
| | 44 | – | SVM | CV | 77 | – |
| | 138 | 47 | Projection-based learning process | Training and testing | 70 | NYU dataset |
| | 85 | – | RF | Threefold CV | 81 ± 2 | – |
| | 40 | – | Projection-based learning process | Fivefold CV | 98 ± 2 | Here, samples are females of adult age |
| | 870 | – | Graph-based convolutional network model | Tenfold CV | 70 | |
| | 800 | 311 | SVM | Training and testing | | Samples from the NDAR dataset range from 5–10 |
| | 48 | – | FCN | Tenfold CV | 94 | |
| | 186 | – | DBN | Tenfold CV | 65 | |
| | 816 | – | MLP | Tenfold CV | 85 | |
| | 64 | – | | | | |
| | 810 | – | Multichannel ANN | Tenfold CV | 73 | |
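Many of the entries in Table 3 are evaluated with k-fold CV, and the text notes that the split should preserve the class distribution. A minimal sketch of such a stratified split is given below. It is illustrative only: the function name `stratified_k_fold`, the round-robin assignment and the toy label vector are assumptions, not the procedure of any study in the table.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions.

    labels: list of class labels, one per sample
    Returns a list of k lists of sample indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical imbalanced dataset: 10 patients (1) and 40 controls (0).
labels = [1] * 10 + [0] * 40
folds = stratified_k_fold(labels, k=5)

# Each fold serves once as the test set; the remaining folds form the training set.
for test_fold in folds:
    train_indices = [i for fold in folds if fold is not test_fold for i in fold]
```

With this split every fold contains the same 1:4 ratio of patients to controls as the full dataset, so no test fold ends up without minority-class samples.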
5 Reviews on Performance Metrics
Some performance metrics considered for prediction are accuracy, precision, recall and F-measure [40]. These indicators are derived from the confusion matrix, where TP is the number of instances predicted as positive that are actually positive; TN is the number of instances predicted as negative that are actually negative; FP is the number of instances that are actually negative but predicted as positive; and FN is the number of instances that are actually positive but predicted as negative. The overall prediction accuracy reflects the ability of the model to label every instance correctly as negative or positive. It is expressed as in Eq. (1):

\[ \text{Prediction accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \times 100\% \tag{1} \]

Precision is the proportion of correctly predicted positive instances out of all positive predictions made by the predictor model. It is expressed as in Eq. (2):

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

Recall is the proportion of actual positive instances that are correctly predicted as positive. It is expressed as in Eq. (3):

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

F1-score is a measure generally used for classification problems. It is the harmonic mean of recall and precision, with a maximal value of 1 and a minimum value of 0. It is mathematically expressed as in Eq. (4):

\[ \text{F1-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{4} \]
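Equations (1)–(4) can be checked with a short worked example. The sketch below computes the four metrics directly from confusion matrix counts; the function name and the counts used are hypothetical and serve only to illustrate the arithmetic.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy (%), precision, recall and F1-score from Eqs. (1)-(4)."""
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion matrix for 100 test subjects.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the accuracy is 85%, the precision is 40/45, the recall is 40/50 and the F1-score is 16/19, which shows how a model can have high precision while still missing a fifth of the positive cases.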
RMSE is the standard deviation of the residuals, i.e. the prediction errors. It is computed over quantitative data and mathematically expressed as in Eq. (5):

\[ \text{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2}{N}} \tag{5} \]

Here, i indexes the data points, N is the number of non-missing data points, x_i is the actual observed time series and \hat{x}_i is the estimated time series. MCC is a reliable statistical rate that yields a high score only when the prediction performs well in all four confusion matrix categories, TP, FP, TN and FN. It is mathematically expressed as in Eq. (6):

\[ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{6} \]

The coefficient of determination (R^2) is the proportion of the variation in the dependent variable that is predictable from the independent variables. It is expressed as in Eq. (7):

\[ R^2 = 1 - \frac{RSS}{TSS} \tag{7} \]

Here, RSS is the residual sum of squares and TSS is the total sum of squares.
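Equations (5)–(7) can likewise be illustrated with a brief sketch. The function names and the example values are hypothetical; returning 0 for MCC when the denominator vanishes is a common convention assumed here, not something stated in the text.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, Eq. (5)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, Eq. (6)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def r_squared(actual, predicted):
    """Coefficient of determination, Eq. (7): 1 - RSS / TSS."""
    mean = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - mean) ** 2 for a in actual)
    return 1.0 - rss / tss


# Hypothetical observed and estimated values.
observed = [1.0, 2.0, 3.0]
estimated = [1.0, 2.0, 5.0]
error = rmse(observed, estimated)
fit = r_squared(observed, estimated)
agreement = mcc(tp=50, tn=50, fp=0, fn=0)
```

In this example R^2 is negative because the estimates fit the observations worse than simply predicting their mean, while the MCC of a perfect classifier is 1.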
6 Research Gaps ML approaches are depicted as the general-purpose approaches of AI that learns data patterns devoid of expressing the data prior. These approaches have been widely used for the prediction of ADHD and dyslexia for the past few years. For prediction purposes, some typical ML workflows include preliminary steps like data acquisition, preparation, pre-processing, feature extraction and selection and classification. Finally, performance metrics are evaluated to show the significance of the model. Recently, both ML and deep learning approaches have been adopted for the prediction process due to the potency to give higher prediction accuracy. Some specific parts of the scientific code are re-runnable, reproducible, repeatable, reusable and replicable. The reproducibility of various ML approaches is considered as another essential factor that needs to be considered. Modelling an appropriate ML approach comprises multiple training and hyperparameters. It includes the number of nodes, hyperparameter tuning process, number of iterations and regularization methods. These are utilized to eliminate over-fitting issues. Without proper
118
P. K. V. Kothapalli et al.
implementation of the proposed model using publicly available information and other criteria, reconstructing the model and matching its prediction accuracy are unlikely. Sharing the model implementation with proper guidelines supports result reproduction and gives upcoming researchers a better research experience. Some recently proposed models, such as brain imaging data structures, help standardize data curation and storage, which streamlines the reproducibility and reliability of ML approaches. There is still room to improve present research so that it offers a better diagnostic experience. One significant issue overlooked in many studies is the execution time required to train the prediction model. ML approaches are not designed specifically for brain imaging data. For instance, the conventional CNN model is designed for classifying 2D images, whereas the reviewed studies concentrate on 3D or 4D MRI and fMRI data. Extending standard models from 2D to 3D/4D increases the number of parameters and the overall running time. Execution time is a hurdle for tools that assist disease diagnosis, and high-performance computing could be essential to making the ML approaches mentioned above mainstream. sMRI and fMRI features are mostly used individually as predictors for ML approaches, although integrating them can offer a richer source of information. Integrating fMRI and MRI data, particularly when merged with other details such as demographic features, is a possible way to enhance the interpretability and predictive power of ML approaches. There is also room for improving the reliability of predictive models by adopting transfer learning and data augmentation methods. The success of these methods in other fields such as computer vision encourages their use in building predictive models for diagnosing brain disorders.
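As a rough illustration of why moving from 2D to 3D inputs raises the parameter count, the following sketch counts the weights of a single convolutional layer. The kernel size and channel counts are illustrative assumptions, not values taken from any of the reviewed studies.

```python
def conv_params(kernel: int, dims: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one convolutional layer with an equal-sided kernel.

    kernel: kernel width along each spatial axis
    dims:   number of spatial axes (2 for images, 3 for volumes)
    """
    weights = (kernel ** dims) * c_in * c_out
    biases = c_out
    return weights + biases


# Illustrative layer (assumed sizes): 3-wide kernel, 16 input and 32 output channels.
params_2d = conv_params(kernel=3, dims=2, c_in=16, c_out=32)
params_3d = conv_params(kernel=3, dims=3, c_in=16, c_out=32)

print(params_2d)  # 4640
print(params_3d)  # 13856
```

With these assumed sizes the 3D layer holds roughly three times as many parameters as the 2D layer. The number of output positions also grows with the extra axis, which is consistent with the longer running time noted in this section.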
7 Research Challenges
At present, the available ML approaches are slow, which limits progress in classifying ADHD and dyslexia subsets. The current proliferation of ‘big data’ and the use of larger samples, specifically with MRI data, call for novel approaches that can examine the input data effectively and quickly to predict the specific cause of ADHD or dyslexia. Recent research needs carefully crafted parallel processing algorithms that consider both the GPU and CPU, or a CPU accelerator model, which are crucial for scalable solutions for multi-dimensional input data. Moreover, high-performance computing frameworks of this kind need two essential components: parallel processing of the ML approaches and parallel processing of the input data related to both dyslexia and ADHD. Adopting high-performance computing for faster execution and prediction is highly challenging, and the collection of large samples is also a significant research constraint noted in previous approaches. These research issues need to be resolved in future work while achieving superior prediction accuracy.
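The second component, parallel processing of the input data, can be sketched as a data-parallel map over independent samples. This is a minimal sketch under stated assumptions: `extract_features` is a hypothetical stand-in for a per-scan feature extractor, and a thread pool is used only to keep the example self-contained. CPU-bound or GPU workloads would normally use a process pool or an accelerator framework instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def extract_features(sample: Sequence[float]) -> List[float]:
    """Hypothetical per-sample feature extractor: mean and range of the values."""
    mean = sum(sample) / len(sample)
    return [mean, max(sample) - min(sample)]


def extract_parallel(samples: Sequence[Sequence[float]], workers: int = 4) -> List[List[float]]:
    """Apply extract_features to every sample; map() preserves the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_features, samples))


# Toy data standing in for per-subject measurements (assumed values).
samples = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [0.0, 10.0]]
features = extract_parallel(samples)
```

Because each sample is processed independently, the result matches a sequential loop, and the number of workers can be changed without touching the extractor.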
8 Conclusion
This review analyses the prevailing ML approaches that have been implemented explicitly for ADHD and dyslexia prediction. Both ADHD and dyslexia are considered exceedingly complicated developmental brain disorders. They have attracted considerable interest in modern ML research and neuroscience over the past few years. Even though many image processing and mining approaches have been applied to prediction during the past few decades, ML approaches in this area are still at an early stage. This review found SVM to be the most commonly used ML approach for ADHD and dyslexia prediction and diagnosis. The data for ADHD and dyslexia prediction have been collected from various heterogeneous sources. The outcomes of these ML approaches are promising and indicate that ADHD and dyslexia are heterogeneous disorders owing to variations in brain tissue features. The analysis also shows that some approaches complement the prevailing computer-based interventions. However, further research is needed on language-independent data collection for ML-based ADHD and dyslexia prediction, because learning disabilities are not specific to a particular culture, language or region. Further analysis could also measure the effect of ensemble approaches and of settings in which the predictions from various ML approaches are hybridized to enhance performance. Additionally, future ADHD and dyslexia studies should consider multi-modal ML frameworks that exploit diverse dyslexia and ADHD datasets from multiple heterogeneous sources, together with multitask approaches that handle additional tasks for enhanced performance.
References 1. Campbell T (2011) From aphasia to dyslexia, a fragment of genealogy: an analysis of a ‘medical diagnosis’ formation. Health Sociol Rev 20(4):450–461 2. Iwabuchi M, Hirabayashi R, Nakamura K, Dim NK (2017) Machine learning-based evaluation of reading and writing difficulties. Stud Health Technol Info 242:1001 3. Al-Barhamtoshy HM, Motaweh DM (2017) Diagnosis of dyslexia using computation analysis. In: Informatics, health & technology (ICIHT), pp 1–7 4. Zainuddin L, Mansor, Mahmoodin Optimized KNN classify rule for EEG based differentiation between capable dyslexic and normal children. In: Biomedical engineering and sciences (IECBES), 2016 IEEE, pp 685–688
120
P. K. V. Kothapalli et al.
5. Pło´nski G, Altarelli M, Marbach VE, Grande, Jednoróg (2017) Multi-parameter machine learning approach to the neuroanatomical basis of developmental dyslexia. Human Brain Map 38(2):900–908 6. Feng Z, Yang T, Xie D (2017) Dyslexic children show atypical cerebellar activation and cerebrocerebellar functional connectivity in orthographic and phonological processing. Cerebellum 16(2):496–507 7. Cui X, Gong G (2016) Disrupted white matter connectivity underlying developmental dyslexia: a machine learning approach. Hum Brain Mapp 37(4):1443–1458 8. Alsobhi YA, Khan N, Rahanu H (2015) Personalized learning materials based on dyslexia types: ontological approach. Proc Comput Sci 60:113–121 9. Mohamad S, Mansor W, Lee KY (2013) Review of neurological techniques of diagnosing dyslexia in children. In: 2013, 3rd international conference on system engineering and technology (ICSET), pp 389–393 10. Manghirmalani PZ, Jain K (2011) Learning disability diagnosis and classification—a soft computing approach. In: 2011 World congress on information and communication technologies (WICT). IEEE, pp 479–484 11. Nidhya R, Kumar M, Ravi RV, Deepak V (2020) Enhanced route selection (ERS) algorithm for IoT enabled smart waste management system. Environ Technol Innov 20:101116. ISSN 2352-1864 12. Bar-kochva I (2016) An examination of an intervention program designed to enhance reading and spelling through the training of morphological decomposition in word recognition. Sci Stud Reading 20(2):163–172 13. Bonacina S, Cancer A, Lanzi PL, Lorusso ML, Antonietti A (2015) Improving reading skills in students with dyslexia: the efficacy of a sublexical training with the rhythmic background. Front Psychol 6:1510 14. Cirino PT, Rashid FL, Sevcik RA, Lovett MW, Frijters JC, Wolf M, Morris RD (2002) Psychometric stability of nationally normed and experimental decoding and related measures in children with reading disability. J Learn Disabil 35(6):526–539 15. 
Dai L, Zhang C, Liu X (2016) A unique Chinese reading acceleration training paradigm: to enhance Chinese children’s reading fluency and comprehension with reading disabilities. Front Psychol 7:1937 16. Ebrahimi L, Pouretemad H, Khatibi A, Stein J (2019) Magnocellular based visual motion training improves reading in Persian. Sci Rep 9:1142 17. Flaugnacco E, Lopez L, Terribili C, Montico M, Zoia S, Schon D (2015) Music training increases phonological awareness and reading skills in developmental dyslexia: a randomized control trial. PLoS ONE 10(9):e0138715 18. Galuschka K, Ise E, Krick K, Schulte-Korne G (2014) Effectiveness of treatment approaches for children and adolescents with reading disabilities: a meta-analysis of randomized controlled trials. PloS one 9(8):e105843 19. Gori S, Seitz AR, Ronconi L, Franceschini S, Facoetti A (2016) Multiple causal links between magnocellular-dorsal pathway deficit and developmental dyslexia. Cereb Cortex 26(11):4356– 4369 20. Heth I, Lavidor M (2015) Improved reading measures in adults with dyslexia following transcranial direct current stimulation treatment. Neuropsychologia 70:107–113 21. Jorgensen TD, Pornprasertmanit S, Schoemann AM, Rosseel Y (2020) semTools: Useful tools for structural equation modelling. R package version 0.5-3 22. Koen BJ, Hawkins J, Zhu X, Jansen B, Fan W, Johnson S (2018) The location and effects of visual hemisphere-specific stimulation on reading fluency in children with the characteristics of dyslexia. J Learn Disabil 51(4):399–415 23. Layes S, Chouchani MS, Mecheri S, Lalonde R, Rebai M (2019) Efficacy of a visuomotorbased intervention for children with reading and spelling disabilities: a pilot study. Br J Spec Educ 46(3):317–339 24. Lofti S, Rostami R, Shokoohi-Yekta M, Ward RT, MotamedYeganeh N, Mathew AS, Lee HJ (2020) Effects of computerized cognitive training for children with dyslexia: an ERP study. J Neurolinguist 55:100904
A Comprehensive Survey on Predicting Dyslexia and ADHD …
121
25. Deepak V, Khanna MR, Dhanasekaran K, Prakash PGO, Babu DV (2021) An efficient performance analysis using collaborative recommendation system on big data. In: 2021 5th International conference on trends in electronics and informatics (ICOEI), pp 1386–1392 26. Morris SB (2008) Estimating effect sizes from pretest-posttest-control group designs. Organ Res Methods 11(2):364–386 27. Wang LC, Liu D, Xu Z (2019) Distinct effects of visual and auditory temporal processing training on reading and reading-related abilities in Chinese children with dyslexia. Ann Dyslexia 69:166–185 28. Toste JR, Capin P, Williams KJ, Cho E, Vaughn S (2019) Replication of an experimental study investigating the efficacy of a multisyllabic word reading intervention with and without motivational beliefs training for struggling readers. J Learn Disabil 52(1):45–58 29. Szucs D, Ioannidis JPA (2017) An empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. Plus Biol 15(3):e2000797 30. Ramsay MW, Davidson C, Ljungblad M, Tjamberg M, Brautaset R, Nilsson M (2014) Can vergence training improve reading in people with dyslexia? Strabismus 22(4):147–151 31. R Core Team (2020) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria 32. Gropper RJ, Tannock R (2009) A pilot study of working memory and academic achievement in college students with ADHD. J Atten Disord 12:574–581 33. In de Braek D, Dijkstra JB, Jolles J (2011) Cognitive complaints and neuropsychological functioning in adults with and without attention-deficit hyperactivity disorder referred for multidisciplinary assessment. Appl Neuropsychol 18:127–135 34. Kim S, Liu Z, Glizer D, Tannock R, Woltering S (2014) Adult ADHD and working memory: neural evidence of impaired encoding. Clin Neurophysiol 125:1596–1603 35. 
Laasonen M, Lehtinen M, Leppämäki S, Tani P, Hokkanen L (2010) Project DyAdd: phonological processing, reading, spelling, and arithmetic in adults with dyslexia or ADHD. J Learn Disabil 43:3–14 36. Mehren A, Özyurt J, Lam AP, Brandes M, Müller HHO, Thiel CM, Philipsen A (2019) Acute effects of aerobic exercise on executive function and attention in adult patients with ADHD. Front Psychiatry 10:132 37. Miranda A, Baixauli I, Colomer C (2013) Narrative writing competence and internal state terms of young adults clinically diagnosed with childhood attention deficit hyperactivity disorder. Res Dev Disabil 34:1938–1950 38. Mucci F, Avella MT, Marazziti D (2019) ADHD with comorbid bipolar disorders: a systematic review of neurobiological, clinical and pharmacological aspects across the lifespan. Curr Med Chem 26(38):6942–6969 39. Paris J, Bhat V, Thombs B (2015) Is adult attention-deficit hyperactivity disorder being overdiagnosed? Can J Psychiatry 60(7):324–328 40. Skodzik T, Holling H, Pedersen A (2017) Long-term memory performance in adult ADHD: a meta-analysis. J Atten Disord 21(4):267–283
A Decentralised Blockchain-Based Secure Authentication Scheme for IoT Devices
Effra Altaf Haqani, Zubair Baig, and Frank Jiang
Abstract The Internet of things (IoT) is a paradigm comprising smart and embedded devices that communicate with each other over communication channels. As connectivity is mostly through vulnerable channels, these devices may have authentication issues. With the exponential growth in the number of IoT devices on the market, alongside exposed vulnerabilities of constantly improving technology, adversaries have found numerous ways to threaten the IoT communication channel. Authentication, comprising the identification of both users and devices, is essential to ensure the security of IoT. The reliance on single servers and centralised authorities for network-wide registration and authentication of IoT devices exposes current authentication schemes to adversarial threats. We present a blockchain-based decentralised authentication scheme for peer-to-peer IoT networks. The scheme is designed to foster secure authentication for IoT devices in an integrated IoT-cloud network topology.
Keywords Blockchains · Mutual authentication · IoT · Smart homes · Cyber security
E. A. Haqani · Z. Baig (B) · F. Jiang
School of IT, Deakin University, Geelong, VIC 3216, Australia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_9
1 Introduction
With the rapid expansion of smart devices on the Internet of things (IoT), the security of these devices has become a significant concern, as attackers are able to exploit exposed vulnerabilities. These smart devices support varied applications in different domains, for instance, health care, medical systems, smart cities, transport, and home security. The use of smart devices is expanding progressively; being interconnected with many sensors, they collect and produce an enormous amount of sensitive information. It is therefore important to validate the authenticity of devices and data. In an exposed IoT perimeter, a remote user can establish a connection to any node by gaining unauthenticated access to the IoT network, and data can be pulled from the nodes associated with these unsolicited connections. As a result, mutual authentication is critical: in IoT networks, establishing resourceful gateway endpoints accelerates data transmission and processing, and doing so in an authenticated manner is a strong security measure [1]. Continuous and mutual authentication is pivotal for ensuring that the user is not being impersonated [2].
In sensor networks, there is an assortment of authentication schemes. For instance, lightweight security, key generation, mutual verification, and multi-factor verification are all critical requirements for the authentication framework [3]. Authentication prevents unauthorised individuals from entering the network and deploying malicious programmes. However, these authentication schemes are usually intended for systems with ample processing resources and are thus inappropriate for lightweight IoT devices that are limited in power, storage and processing capabilities [4].
Initially, only one factor was used to authenticate the subject. Because of its simplicity and user-friendliness, single-factor authentication has been widely adopted. An example is the use of a secret passcode to validate the authenticity of a user. User accounts can be compromised quickly if the secret key (password) is shared. Attackers can also gain access through social engineering or other methods such as a dictionary attack. A minimum password complexity requirement is usually considered when employing this type of authentication. To overcome the weaknesses of single-factor authentication, two-factor authentication was introduced. It combines a knowledge factor (username/password) with an ownership factor, such as a phone (text messaging).
Multi-factor authentication followed, integrating more than two types of credentials to give a higher degree of security and to enable continued protection of critical services from unauthorised access. Users are required to provide proof of their identity based on several distinct factors, which improves the level of security.
Due to its decentralised structure and cryptographic features, the emerging blockchain technology may be a viable option for providing identification and access control services in IoT. Blockchain technology was first used in the Bitcoin money-transfer system. Security specialists from around the globe are now focusing on blockchain to improve IoT privacy and security. Blockchain's properties, such as increased trustworthiness, unforgeability, and fault tolerance, make it a viable solution for authentication. Its internal functions rely on cryptographic techniques. According to the cryptographic standards of the blockchain, every access point in a network receives a public key (for encryption), which peer access points use to encrypt messages, and a private key (for decryption), which allows those messages to be read at the access point that owns the key. Device authentication is also impossible without agreement on IoT device fingerprint data, which can be used to generate authentication proofs and to verify device identities across a network's multiple IoT devices. Security vulnerabilities in published protocols might expose them to a range of cyber threats, including secret disclosure and message replay attacks, and to a lack of traceability methods. This paper's main goals can be summarised as follows:
• We propose a blockchain-based authentication system for a smart home network in a distributed environment.
• Platform measurements are recorded and incorporated in the blockchain, and users/devices can query the Ethereum blockchain to obtain platform information, ensuring platform integrity.
• Our scheme also proposes an asymmetric key-based protocol overlaid on an Ethereum blockchain, which is shared between users and devices to achieve mutual authentication, using a trusted third party in the distributed environment.
• We present a security analysis of our proposed authentication method, as well as how the scheme achieves its security requirements.
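The mutual authentication goal listed above can be illustrated with a challenge-response exchange in which each party signs a fresh nonce chosen by its peer. This is a sketch under stated assumptions, not the protocol proposed in this paper: the `ledger` dictionary is a hypothetical stand-in for public keys recorded on the blockchain, the identities are invented, and the textbook RSA keys use tiny primes that offer no security.

```python
import hashlib
import secrets
from typing import Dict, Tuple

# Toy textbook-RSA keys as (n, e, d). Tiny primes, for illustration only.
USER_KEY = (3233, 17, 2753)    # n = 61 * 53
DEVICE_KEY = (3337, 79, 1019)  # n = 47 * 71


def digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the range of the RSA modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, key: Tuple[int, int, int]) -> int:
    """Sign with the private exponent d."""
    n, _e, d = key
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, public: Tuple[int, int]) -> bool:
    """Check a signature with the public pair (n, e)."""
    n, e = public
    return pow(signature, e, n) == digest(message, n)


def mutual_authenticate(user_id: str, user_key: Tuple[int, int, int],
                        device_id: str, device_key: Tuple[int, int, int],
                        ledger: Dict[str, Tuple[int, int]]) -> bool:
    """Each side signs the peer's nonce; proofs are checked against the ledger."""
    if user_id not in ledger or device_id not in ledger:
        return False  # unregistered identities cannot be verified
    user_nonce = secrets.token_bytes(16)    # challenge chosen by the user
    device_nonce = secrets.token_bytes(16)  # challenge chosen by the device
    device_proof = sign(user_nonce, device_key)
    user_proof = sign(device_nonce, user_key)
    return (verify(user_nonce, device_proof, ledger[device_id])
            and verify(device_nonce, user_proof, ledger[user_id]))


# Hypothetical ledger: identity -> public key (n, e).
ledger: Dict[str, Tuple[int, int]] = {"user-1": (3233, 17), "lamp-7": (3337, 79)}
```

Fresh nonces mean that a recorded proof cannot simply be replayed in a later session. A real deployment would use a vetted signature scheme with full-size keys rather than this toy arithmetic.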
2 Related Work
The blockchain is a shared network that is fully decentralised. It is regarded as one of Bitcoin's most important advances, since it serves as a “trustless” proof mechanism for all transactions performed on the network. The blockchain is made up of blocks that are linked through their hash values. A public ledger in a blockchain network keeps track of the digital signatures of transactions in the network, and every node maintains a full copy of that ledger. Blockchain uses consensus techniques to validate transactions and update the whole ledger. When a new transaction is sent to the blockchain, all nodes in the network review it and, if it is authorised, it is approved and added to the existing record. Due to its decentralised design and unforgeable nature, blockchain is an excellent solution for IoT devices that are frequently deployed in dangerous regions with no physical security. The blockchain is an application layer that operates on top of the existing Internet protocol stack, creating a new tier for economic activity, including instantaneous digital payments and other financial transactions. IoT devices might use blockchain technology to generate a secure, unalterable, and auditable data record.
In [5], a challenge-response technique on the Ethereum blockchain is used to authenticate the server. The technique relies on a third-party authentication server. There are two types of nodes on the blockchain. A validating node is responsible for storing and reading block data but cannot create new blocks or initiate transactions. A miner node may both create and validate blocks. Proof-of-work (PoW) is one network consensus algorithm. In PoW, nodes must solve a mathematical puzzle for a block to be validated.
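The hash linking of blocks and the proof-of-work puzzle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual block format of Bitcoin or Ethereum: the block fields, the JSON serialisation, the example payloads and the leading-zeros difficulty rule are simplifications chosen for the example.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(block: Dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def mine(index: int, data: str, prev_hash: str, difficulty: int = 2) -> Dict:
    """Search for a nonce whose block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        block = {"index": index, "data": data, "prev_hash": prev_hash, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1


def is_valid(chain: List[Dict], difficulty: int = 2) -> bool:
    """Check every block's proof-of-work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


# Build a three-block chain with made-up payloads.
chain = [mine(0, "genesis", "0" * 64)]
for i, data in enumerate(["register lamp-7", "register lock-2"], start=1):
    chain.append(mine(i, data, block_hash(chain[-1])))
```

Changing the data of any earlier block changes its hash, so the `prev_hash` stored in the following block no longer matches and validation fails. Raising `difficulty` by one hex digit multiplies the expected number of hash attempts by 16.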
The puzzle's difficulty may be adjusted based on the miner nodes' computing power and the time it takes to validate new blocks. The PoW algorithm is used in settings where participation is unrestricted [6]. While a blockchain has many advantages, it is not ideal for resource-constrained IoT devices because of significant bandwidth overheads, delays, and the need for a considerable amount of computing resources. The processing power that blockchain transactions demand is a challenge for IoT, implying that general-purpose servers and processors may not be sufficient. With the advancement of IoT devices, a tremendous amount of data is sent to the cloud for processing and storage, which demands high cloud platform performance and network capacity and creates a possible concentration of risk. To address these challenges, edge computing is an emerging architecture that merges cloud computing with IoT [6, 7]. However, concerns such as authentication, intrusion detection, and access control are difficult to handle with edge computing, because varying software and applications are integrated into heterogeneous edge servers and service transfer between edge servers is prone to cyber threats. Combining edge computing with blockchain technology is a possible solution to these challenges: it may enable secure data storage, network access and management, and rapid search in IoT devices.
The centralised cloud computing architecture cannot adequately address the new challenges that the expanding IoT paradigm poses, such as service delay, inadequate computing capacity, resource-constrained devices, persistent services with connectivity, and enhanced security. Overcoming these challenges requires a more sophisticated cloud computing paradigm, one that breaks past the limitations of centralised design while also eliminating bandwidth and computing limits [8]. Edge computing is becoming a more essential part of the IoT architecture, especially when devices must respond quickly and autonomously to the data they receive. Edge computing decreases latency, saves money, consumes less bandwidth, and enhances security. With this technology, data is first stored and processed locally, as near as feasible to its source. Massive numbers of IoT devices generate enormous amounts of data at a rapid rate.
All these resource-restricted devices require computational resources, such as computation power and storage, to function properly. Cloud infrastructure can address these difficulties only to a limited extent. Edge computing is vulnerable to the same sorts of attacks as cloud computing. Because of virtualisation technologies, the edge is also vulnerable to co-tenancy risks. Other communication risks, such as sniffing and jamming, are also prevalent when a vulnerable network connection is used. In addition, because an edge node is not as resourceful as a cloud data centre, additional risks such as resource depletion attacks must be addressed. The edge adds another layer between the cloud and the user, giving attackers new attack surfaces. A cloud data centre is generally based in a single place, is monitored closely, and is generally well-equipped to counter cyber threats; securing edge nodes is significantly more difficult. An edge server functions as a cache between the cloud and the end-user, so cache-based attacks, which target a node with security flaws, are quite likely to occur. Edge services are primarily intended to serve the users situated closest to them. As a result, user privacy, particularly location privacy, is extremely vulnerable to hostile attacks [9].
Authentication means validating the identities of two communicating parties [10], whereas identity management uniquely identifies entities. Multiple individuals and devices must authenticate each other through trusted services; consequently, it is important to consider how to manage authentication in the IoT. IoT authentication is subject to a set of vulnerabilities that remain one of the most serious challenges in providing security in many applications. The authentication mechanisms employed are restricted in that they guard against only one kind of threat, such as a DoS or a replay attack. Information security is one of the most susceptible areas in IoT authentication because of the high prevalence of vulnerable applications, owing to the inherent multiplicity of data gathering in the IoT environment. For instance, credit cards can allow an adversary to obtain card data without IoT authentication, enabling them to buy items using the cardholder's bank account and identity [11]. Personal information acquired by IoT devices could be misused due to these vulnerabilities. If a device captures and stores personal, medical, and/or financial data, a hacker may use this information to commit identity theft. To protect personal data, information should be stored only as needed and routinely deleted, and the data collected should be confined to what the device and application need to run properly. Unprotected endpoints become entry points for cyber-attacks that attempt to take them offline or modify their settings. Inadequate security for IoT devices and the data they collect, store, and transmit can lead to data breaches in which personal information is stolen or compromised. The data-gathering capabilities of IoT devices and applications should therefore be protected from security breaches. Bitdefender is one example of software of this sort; it can defend typical devices connected to a home network against malware attacks. Standardised security processes and standards are also necessary to protect IoT [12]. This decreases not only the risk of data breaches but also the risk of data misuse that goes against user expectations (Fig. 1).
Fig. 1 An overview of blockchain-based authentication in IoT systems
One authentication method uses the OAuth 2.0 standard to secure access to the IoT network through mutual authentication. The OAuth 2.0 protocol serves as an authentication technique that admits only approved and genuine users: the security manager checks the user's information and access tokens against its local database and refuses access to the IoT network if they do not match. Multiple users can be registered or created with less effort across networks. By storing user credentials, the scheme can help reduce the network manager's burden. It also safeguards the IoT network from impersonation and replay attacks [13].
A lightweight authentication technique was developed that does not require a central server to store secret keys. The researchers identified a flaw in existing authentication methods: they demand that authentication credentials be stored on a secure server. The proposed approach uses physically unclonable functions (PUFs), which depend on the physical characteristics of devices and are nearly impossible to duplicate. One weakness of the method is its inefficiency, since it requires at least five exchanges of authentication messages. Additionally, the PUF-based authentication data is stored on the server node, making that node a single point of failure [14].
Sikdar [15] describes a novel two-factor authentication approach for IoT devices that protects privacy while allowing an IoT device to communicate anonymously with a server located at the data and control unit. The proposed technique was shown to be secure even if an attacker has physical access to an IoT device, which was previously considered to be impossible.
This method has several drawbacks, the most notable being that the authentication data is stored on a central server. The approach also requires a total of five message exchanges between the device and the server.
Another approach [16] identifies and authenticates devices in IoT networks by using blockchain technology to store information about approved devices. Intentional manipulation or forgery is exceedingly difficult due to the distributed structure of blockchain. Furthermore, every transaction on the network is signed with a private key, providing a high level of protection against forgery and fraud. The data saved by approved devices can be stored safely using blockchain technology. Based on its characteristics, blockchain has the potential to offer digital identification and authentication for IoT devices. This authentication is proposed to be carried out via the Authenticated Devices Configuration Protocol (ADCP).
The method in [17, 18] suits circumstances in which low latency and high performance are required. In this design, only the metadata of transactions between IoT peer devices is stored on the blockchain, with all other data sent directly between the peer devices. The design requires complex routing and discovery algorithms to ensure that data from IoT peer devices reaches other IoT devices in a timely and efficient manner. The approach works well when devices are members of the same domain or linked to the same network, which makes the discovery and routing procedures easier. In the alternative design, no IoT peer device has a direct link or means of communication with another, and the blockchain is used for all interactions. All data associated with a communication between two or more IoT peer devices may be recorded and preserved on the blockchain. Furthermore, the blockchain can be used to track and authenticate IoT transactions, which improves the traceability and transparency of device interactions. The drawback of this approach is that IoT peer devices need more bandwidth and data to operate and communicate in this manner.
Another work establishes an IoT design based on blockchain technology that minimises expense while maintaining most of the security and privacy features. A smart home application is used in that study to demonstrate broader IoT potential. To maintain privacy and security, an overlay network and cloud storage coordinate data flows with the blockchain. To achieve a decentralised topology, the design uses distributed trust mechanisms and multiple types of blockchains, depending on the network hierarchy. Three blockchains are linked: a private blockchain for each use case, a shared (private) blockchain, and a public blockchain. Its primary problem is that it is centralised rather than distributed, which runs counter to its premise and restricts its power and availability [19].
One blockchain-based approach, called bubbles of trust, offers a decentralised authentication mechanism. This technique, which is based on the Ethereum public blockchain, aims to establish secure virtual zones in which devices can interact safely. Its basic problem is that devices from different systems are unable to communicate with one another. It is therefore inapplicable to the wide range of distributed IoT applications in which communication between IoT devices belonging to different systems is important [20]. Table 1 summarises the benefits and drawbacks of the related works.
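The zone restriction noted for the bubbles-of-trust approach can be illustrated with a small membership check. This is a hedged sketch, not the smart contract used in [20]: `BubbleRegistry`, its methods and the device names are hypothetical, and an in-memory dictionary stands in for state that would be held on the blockchain.

```python
from typing import Dict


class BubbleRegistry:
    """Hypothetical in-memory stand-in for an on-chain registry of virtual zones."""

    def __init__(self) -> None:
        self._bubble_of: Dict[str, str] = {}

    def register(self, device_id: str, bubble_id: str) -> None:
        """Bind a device to exactly one zone; re-registration is rejected."""
        if device_id in self._bubble_of:
            raise ValueError(f"{device_id} is already registered")
        self._bubble_of[device_id] = bubble_id

    def can_communicate(self, sender: str, receiver: str) -> bool:
        """Allow traffic only between registered devices of the same zone."""
        sender_bubble = self._bubble_of.get(sender)
        receiver_bubble = self._bubble_of.get(receiver)
        return sender_bubble is not None and sender_bubble == receiver_bubble


registry = BubbleRegistry()
registry.register("lamp-7", "home-A")
registry.register("lock-2", "home-A")
registry.register("meter-5", "utility-B")
```

Under this rule `lamp-7` and `lock-2` can interact, while `lamp-7` and `meter-5` cannot, which mirrors the cross-system limitation described above.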
3 Proposed Blockchain-Based Authentication Scheme
The main aim of this work is to design and propose an efficient decentralised blockchain mechanism that achieves secure authentication for IoT devices in an integrated IoT-cloud network topology. The proposed framework delivers a basic prototype solution for specific threat families, namely channel compromise and false injection attacks. Under the proposed method, devices that are registered with the blockchain-enabled cloud and are authorised can connect with each other. Devices that are not registered on the blockchain cannot authenticate themselves and hence are not permitted to connect with authorised devices. This eliminates the chance of malicious devices interacting with genuine IoT devices. We propose a framework that implements four crucial steps to achieve decentralised, Ethereum blockchain-based secure authentication for IoT devices. The framework, shown in Fig. 2, comprises five main participants: the users, IoT devices, smart contracts, the Ethereum blockchain and the cloud hosting IoT data. IoT devices and users have unique Ethereum addresses. By sending a request
Table 1 Summary of advantages and limitations of known schemes

[21]
Advantages: It enables users to buy a token from the network owner to get IoT data, which is given when the token is verified. It employs a distribution reusing token detection method to stop potential access
Limitations: Most of the research addressed the privacy preservation, and no consideration was given to fine-grained access control regulations for end devices

[22]
Advantages: Offers a novel and lightweight authentication system for wearable devices that make use of a cloud server to facilitate mutual authentication while also maintaining anonymity for the devices
Limitations: The architecture is an anonymous authentication system for wearable devices, and it may not be well-suited to other Internet of things use cases

[23]
Advantages: The framework solves security and privacy concerns by implementing Mandatory Access Control regulations in cloud and BYOD settings in a sensitive and safe manner using a separate platform that is completely independent
Limitations: It does not secure the integrity of the messages that are sent

[13]
Advantages: An OAuth 2.0 authentication mechanism to secure access to the IoT network. The protocol compares user information and access tokens in the security manager local database and prohibits access to the IoT network
Limitations: Acquiring minimum user information may need further requests. The jwt token can assist, but not all services support it. If a token is taken, an attacker has temporary access to secure data. To reduce this danger, a signed token can be utilised

[14]
Advantages: With the suggested protocols, no secrets are stored in the IoT devices, reducing the server storage needs
Limitations: The approach cannot protect IoT device privacy

[15]
Advantages: Offers a flexible and privacy-preserving two-factor authentication method for IoT devices that includes physically unclonable functionalities
Limitations: Not susceptible to password guessing attack

[16]
Advantages: Hardware authenticators are used to identify and authenticate devices, which are subsequently stored on the blockchain. Devices that hold a specific number of coins are considered authenticated
Limitations: The framework did not address to solve security issues

[19]
Advantages: It does not utilise PoW consensus validation blocks since the miner controls all IoT devices on the smart home layer. The solution’s overheads are negligible compared to the security advantages
Limitations: The architecture is for smart home applications and may not be appropriate for other IoT devices. This method also does not enable self-enforced access control

[20]
Advantages: This technique, which is based on the Ethereum public blockchain, establishes secure virtual zones where devices may safely interact
Limitations: The primary issue in the proposed technique is that devices from various systems are unable to communicate with one another
Fig. 2 Framework for IoT to cloud blockchain-based authentication
and response, the user and device will perform mutual authentication for each other, measure respective platform integrity, and register themselves in the user and device registration system.
Smart Contracts
Smart contracts were implemented in the proposed framework to provide decentralised authentication of sensor devices. These smart contracts offer a variety of capabilities, such as managing device information, address mapping, policies, and events, that allow IoT devices to be controlled in a decentralised manner. Because a smart contract only allows transactions to be executed if they comply with a pre-defined policy, it is a critical addition. The blockchain serves as a trusted party in many situations; as a result, there is no chance of tampering with or breaching information that is required for the protocol.
Cloud
Computation and storage servers that gather and store IoT data are hosted in the cloud. The data can then be routed to cloud servers for intensive processing and
analytics. We exclusively present our approach for user authentication and device authentication.
Users
The users are the individuals who seek access authorisation from the smart contract to gain access to a specific IoT device. As soon as the user has registered an Ethereum BC account, the user requests to communicate with the IoT device. The user and device verify each other to ensure the security and credibility of the distributed network.
User Registration
The creation and execution of smart contracts on the blockchain are the first steps in the authentication system, as depicted in Fig. 3. The user first generates and deploys the smart contracts over the Ethereum blockchain. After a successful deployment, Ethereum returns “Uaddress”, the blockchain address indicating the contract owner. In this arrangement, a user is assigned the first Ethereum address from among the available accounts.
Device Registration
Each device has its own set of credentials, such as a device id and an address, which are referred to as DA and DAddr, respectively. Figure 4 illustrates the flow of the
Fig. 3 Registration flow between user and Ethereum BC
Fig. 4 Registration flow between the device and Ethereum BC
registration process. The device registration procedure receives DA and DAddr from the user and looks up the device credentials. If the blockchain already holds these credentials, the smart contract reverts the transaction and returns the error “device not added” to signal that the credentials already exist. If the device is not found in the blockchain, the smart contract assigns a specific address to the device in the BC and registers its account.
Preliminaries
This section gives the notation needed to interpret the proposed technique. The notations and descriptions used in the proposed protocol are summarised in Table 2.
Execution flow
The method is initiated once the user creates the smart contract, which is then compiled and executed using Ethereum blockchain addresses. The user initiates a system call with smart contract parameters to connect to the Ethereum blockchain. If the process is successful, it sends back the smart contract address as a secondary validation address; otherwise, it reverts and returns the error message “User
Table 2 Notations
Notation | Description
U | User
D | IoT device
UAddress | User address
DAddress | IoT device address
TTP | Trusted third party
KPpublic | Public key of the device
Pkprivate | Private key of the device
Ppublic | Public key of the TTP
Pprivate | Private key of the TTP
Usign | Platform integrity signed by the third party for the user
Pkpublic | Public key of the user
Pkprivate | Private key of the user
DAsign | Platform integrity signed by the third party for the device
not added”. Once users obtain their Ethereum address through the Ethereum blockchain system call, they can connect to and communicate with the Ethereum blockchain and the other parts of the system. The second step is to register the devices on the blockchain using their unique credentials, such as DA and DAddr. Each device in the system has its own set of credentials. The user receives an error message if the credentials of a new device, such as DA and DAddr, match the credentials of an existing device. Integrating the device into the network enables a group of IoT devices to join for safe and reliable communication, with assistance from a third party that validates the distributed network’s security and trustworthiness. To carry out the authentication scheme, system entities interact in two primary ways: user and device registration, and mutual authentication of user and device using platform integrity. In the final phase (Fig. 5), we use terminals A and B to represent a requester and a responder in an IoT paradigm, respectively. Terminal A represents several users (UA), whereas terminal B represents a small number of IoT devices (DA). Four entities make up the system model: terminal A, terminal B, the blockchain, and a trusted third party (TTP). Users and devices form the separate entities terminal A and terminal B, respectively. In this scenario, user A from terminal A and device A from terminal B test their platform integrity. Then, during the trusted phase, UA performs mutual verification with DA, and vice versa, to ensure the distributed network’s security and credibility. Furthermore, the blockchain is responsible for preserving and updating the transactions it receives during the process.
User to Device request
As a requester, Terminal A, which consists of users (UA), can make a request to establish a connection with Terminal B, which is an entity of devices (DA), as depicted
Fig. 5 Authentication process of a user and IoT device
in (Fig. 6). Terminal B can acquire the current platform integrity of (UA) through message delivery. By obtaining the latest transaction of (UA) included in the blockchain, (DA) can assess the reliability of the other party. Device A from terminal B sends a request to the smart contract to verify whether the user is registered on the Ethereum blockchain. The trusted third party (TTP) shares its public key (Ppublic) with the device and asks it to send the user information. The device uses the public key of the TTP to encrypt the user credentials (UA, Uaddress) and sends them along with the public key (KPpublic) of the device. The TTP will decrypt the user credentials using the public key (KPpublic) of device (DA). The smart contract confirms the user platform integrity from the Ethereum blockchain. Once the information is validated in the blockchain, the third party signs the validation message (Tsign) using its private key (Pprivate). The private key (Pkprivate) of DA is used to decrypt the message, and the public key of the TTP (Ppublic) is used to verify the signature hash. Here, UA is responsible for performing mutual verification with DA in the trusted network.
Device to User response
As a responder, Terminal B (DA) can send a request and establish a connection with Terminal A (UA). Terminal B may then compare the integrity of user A’s platform with its most recent transaction on the blockchain to see whether the other party has been attacked. Similarly, terminal A can obtain the current platform integrity of (DA) by obtaining the latest transaction of (DA) included in the BC. The device A after
Fig. 6 User to device request
confirming the platform integrity of the user (UA) will send a response to communicate. User A from terminal A sends a request to the smart contract to verify whether the device is registered on the Ethereum blockchain. The trusted third party (TTP) shares its public key (Ppublic) with the device and asks it to send the device information. The device uses the public key of the TTP to encrypt the device credentials (DA, Daddress) and sends the public key (PKpublic). The TTP will decrypt the device credentials by making use of the public key (PKpublic) of user (UA). The smart contract confirms the user platform integrity from the Ethereum blockchain. Once the information is validated in the blockchain, the third party signs the validation message (Usign) using its private key (Pprivate). The private key (Pkprivate) of UA is used to decrypt the message, and the public key of the TTP (Ppublic) is used to verify the signature hash. Figure 7 describes the mechanism.
Asymmetric key Cryptography
The scheme uses asymmetric key cryptography for communication. Asymmetric key algorithms use two keys: a private key and a public key. The public key is used for encryption and the private key is used for decryption. The public key is known to everyone, but the private key is never revealed, so there is no need to share keys beforehand. In these techniques, the decryption key is difficult to derive from the encryption key [24]. Public key cryptography also enables irreversible digital signatures. Since its inception, public key cryptography has grown in importance as a way of securing secrecy, particularly through key distribution, in which individuals communicating in secret exchange encryption keys. Digital signatures, which enable users to sign keys to authenticate their identities, are also included.
The digital signatures function as a verified seal or fingerprint to certify the sender’s legitimacy and message integrity. When sending
Fig. 7 User to device response
an encrypted message, users can encrypt the data using their private key to authenticate their identity. The recipient can then decode the message using the sender’s public key to verify the identity and integrity. If a malicious hacker intercepts and alters a message before it reaches its intended destination, the digital signature is broken, and the receiver realises this when verification with the sender’s public key fails. Thus, the digital signature confirms the sender’s identity, because only the legitimate sender is able to sign the message with his or her private key [25]. Public key cryptography is generally regarded as safer than private key cryptography for this purpose, primarily because users very seldom need to communicate or expose their private keys to anybody. As a result, during the transmission of data using public key cryptography, cyber criminals are less likely to uncover an individual’s private key. It also addresses man-in-the-middle attacks, replay attacks, and user impersonation by providing ways of confirming the identities of both the receiver and the sender, and in this way helps to meet the goals of anonymity and information security [26].
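The tamper-detection property of digital signatures described above can be illustrated with a minimal Python sketch. It is not the scheme's implementation: it assumes textbook RSA without padding, a toy key built from two Mersenne primes, and hypothetical function names (digest, sign, verify), so it only shows the principle and is not secure.

```python
import hashlib

# Toy textbook RSA key built from two Mersenne primes (assumption for
# illustration only: no padding, small modulus, NOT secure).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign by raising the message digest to the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recover the digest with the public exponent and compare."""
    return pow(signature, E, N) == digest(message)


original = b"UA registered"
signature = sign(original)
print(verify(original, signature))           # True: message unchanged
print(verify(b"UA registered!", signature))  # False: altered in transit
```

Only the holder of the private exponent can produce a signature that passes verification, which is the property the paragraph relies on when it says an altered message breaks the signature.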
4 Informal Security Analysis
An informal security analysis is performed in this part to show that the proposed framework is secure against various threats. Informal analysis relies on the analyst’s expertise and innovation.
User Impersonation Attack: Our system can determine whether users are authorised and identify unauthorised users. Because each request/response must be verified and its signature validated every time to prove its unique identity, an attacker cannot pretend to be someone else to carry out attacks.
Replay Attack: Our method can identify terminal platform integrity and prevent unauthorised devices from connecting to a trusted network. In addition to confirming the sender’s identity, integrity and non-repudiation are the two critical security concerns for any IoT system to avoid data manipulation. Because every message exchanged within the authentication protocol is cryptographically signed, the system is immune to man-in-the-middle (MITM) and replay attacks when certain security criteria are met.
DDoS Attacks: The system’s architecture incorporates blockchain technology, bringing it closer to a distributed foundation. Using the blockchain as a trusted third party (TTP), together with the widespread adoption of IoT devices and users, reduces the risk of a DDoS attack on the system. A common approach for creating secrecy is to encrypt and decode messages using a secure SSL connection following successful user authentication. Asymmetric public key pairs are included in the TTP and may be used to construct a secure communication session between the authenticating user and the authenticating IoT device [58].
Mutual Authentication: We assume that all devices and users will register on Ethereum and thus be trustworthy. If malicious users/devices that are not registered on Ethereum impersonate others to conduct phishing attacks on terminal users/devices, we require the TTP to sign the authentication results, so that the terminal devices/users can determine whether the authenticated user/device information is accurate. Each device and user that is a member of the system must verify the other’s authenticity, which improves their ability to trust one another. To encrypt the user/device information that is transmitted to the Ethereum blockchain, the TTP shares a key with the user and device. A user must initially register with the system before being authenticated. If the user has previously registered, the system will have a user address linked with it, and after Ethereum has verified the presence and authenticity of the information supplied by the device, it will transmit the user authentication to the device, allowing it to respond in the system, and vice versa.
Spoofing Attack: To perform a spoofed-identity attack, the attacker needs the user/device information and address, as well as the TTP, device, and user private keys. Even if the attacker knows the user’s and device’s information and addresses, the private key is required to mimic the user’s and device’s information from the Ethereum blockchain.
Integrity: The term “integrity” refers to the ability of a message recipient to be sure that data has not been tampered with during transmission. To maintain data integrity, our system uses asymmetric key encryption. If an attacker wants to interfere with the transmission of messages during our mutual authentication phase, the attacker must first discover the private key. The attacker cannot compute the user/device information without knowing the secret key, because the secret value cannot be recovered from intercepted communications. As a result, a hostile attacker will be unable to successfully interfere with the transmission of messages.
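The registration precondition that this analysis relies on (an entity must be registered before it can be authenticated, and duplicate device credentials are rejected with the error "device not added") can be illustrated with a small in-memory model. This is a hedged Python sketch rather than an Ethereum smart contract: the Registry class and its method names are hypothetical, and a dictionary stands in for blockchain state.

```python
class Registry:
    """In-memory stand-in for the on-chain registration contract (hypothetical)."""

    def __init__(self):
        self._devices = {}  # device id -> device address

    def register_device(self, device_id: str, address: str) -> None:
        # Reject the call if either credential is already on record,
        # mirroring the contract reverting with "device not added".
        if device_id in self._devices or address in self._devices.values():
            raise ValueError("device not added")
        self._devices[device_id] = address

    def is_registered(self, device_id: str, address: str) -> bool:
        # Only an exact (id, address) pair counts as registered.
        return self._devices.get(device_id) == address


registry = Registry()
registry.register_device("DA", "0xD1")
print(registry.is_registered("DA", "0xD1"))  # True: registered pair
print(registry.is_registered("DX", "0xD9"))  # False: never registered
try:
    registry.register_device("DA", "0xD2")   # same device id again
except ValueError as error:
    print(error)                             # device not added
```

In the paper's design this check would run inside the smart contract, so that every participant sees the same registration state.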
5 Formal Security Analysis
This section presents the framework’s security verification and simulation using the Scyther simulation tool [], an efficient tool for assessing network security protocols and identifying their potential threats and weaknesses. We demonstrate that the method can survive various attacks when subjected to the Scyther tool. Scyther takes a security protocol specification as input, in which security attributes are expressed as claim events, and reports the result for each claim in a summary report and attack graph. We used the claims Nisynch, Niagree, and Secret. The Nisynch claim checks for possible desynchronisation. The Niagree claim asserts that the parties will not commit to altered values, whereas the Secret claim asserts that the term referred to is confidential. We wrote two SPDL scripts to describe two distinct situations of the user–device authentication phase of the device authentication protocol, based on a Scyther syntax constraint. According to the findings, no attacks on any of the claims in the proposed protocol were discovered. The proposed framework is resistant to a variety of threats and attacks. Therefore, the proposed scheme is significantly more secure than existing comparable protocols and meets the security criteria of blockchain-based authentication for IoT devices in a smart home. The Scyther security analysis indicated that the proposed framework is secure against the attacks described below:
User Impersonation: In our proposed protocol, users and devices have to register themselves on the Ethereum blockchain; unauthenticated users/devices are not able to communicate with the devices or the user. When an authenticated user sends a request to the IoT device, the device checks whether the user is registered on the BC and encrypts the user details. Once Ethereum confirms the user’s authenticity, the TTP encrypts the user details and signs the message that is sent to the IoT device. Only legitimate users can generate a valid signature, and any attempt at impersonation is detected when the corresponding authentication operation fails. The Scyther analysis confirms this, as we sent the registered users to the IoT device to check whether they are registered in the blockchain.
send_1(User,Device,(A, a)); //user sending the details
send_3(Device,TTP,(A,a),(pr),pk) //The device sending the encrypted user details to the TTP
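The message flow described in the paragraph above (the device encrypts the user details for the TTP, the TTP checks registration and signs the result, and the device verifies the signature) can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' SPDL model or implementation: it uses textbook RSA with a toy key, assumes the TTP decrypts with its own private key (which is what encryption under Ppublic implies), omits encryption of the reply to the device, and all names and registry contents are hypothetical.

```python
import hashlib

# Toy textbook RSA key pair for the TTP, built from two Mersenne primes.
# Illustration only: no padding, NOT secure.
P, Q, E = 2**127 - 1, 2**521 - 1, 65537
N = P * Q                          # together with E: TTP public key (Ppublic)
D = pow(E, -1, (P - 1) * (Q - 1))  # TTP private key (Pprivate)

REGISTERED = {("UA", "0xA1")}  # hypothetical (user id, address) pairs on chain


def to_int(data: bytes) -> int:
    return int.from_bytes(data, "big")


def digest(data: bytes) -> int:
    return to_int(hashlib.sha256(data).digest())


def device_request(user_id: str, address: str) -> int:
    """Device side: encrypt the user credentials under the TTP public key."""
    plaintext = f"{user_id}|{address}".encode()
    assert to_int(plaintext) < N, "toy scheme only handles short messages"
    return pow(to_int(plaintext), E, N)


def ttp_respond(ciphertext: int):
    """TTP side: decrypt, look the user up, and sign the verdict."""
    value = pow(ciphertext, D, N)
    plaintext = value.to_bytes((value.bit_length() + 7) // 8, "big")
    user_id, address = plaintext.decode().split("|")
    registered = (user_id, address) in REGISTERED
    verdict = f"{user_id}|{address}|{registered}".encode()
    return verdict, pow(digest(verdict), D, N)


def device_accepts(verdict: bytes, signature: int) -> bool:
    """Device side: check the TTP signature, then read the verdict."""
    signed_by_ttp = pow(signature, E, N) == digest(verdict)
    return signed_by_ttp and verdict.endswith(b"|True")


print(device_accepts(*ttp_respond(device_request("UA", "0xA1"))))        # True
print(device_accepts(*ttp_respond(device_request("Mallory", "0xBAD"))))  # False
print(device_accepts(b"Mallory|0xBAD|True", 12345))                      # False
```

The third call shows why the signature matters: a verdict that claims registration but does not carry a valid TTP signature is rejected.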
Replay Attacks: An attacker can eavesdrop on the user/device details (UA), (DA) and addresses (UAddress), (DAddress) and try to replay them, but the user/device must still be registered on the blockchain as (UA, UAddress), (DA, DAddress). If they are present in the blockchain, the user/device details are signed using the TTP private key (Pprivate). When the device receives the signed user details (Usign), the signature is verified using the TTP public key (Ppublic), and a message whose signature is not valid is rejected. Since our security protocol verifies a user identity that is cryptographically signed, the potential for a replay exploit is removed.
Our system can withstand replay attacks because it integrates user/device (Usigned) credentials inside signed messages to keep them fresh.
recv_6(TTP,Device,Usign); //Device receiving the signed user details.
Spoofing Attacks: A malicious attacker has to spoof user/device details to impersonate them. As the protocol assumes that users are registered on the Ethereum blockchain, the private keys of the TTP and the user/device (Pprivate, Pkprivate) are required to spoof the user and device data from Ethereum.
DDoS Attacks: The use of the blockchain and the TTP, as well as the widespread deployment of IoT devices, reduces the likelihood of a distributed denial of service (DDoS) impact. Furthermore, because keys are not altered between transactions, modification of credentials and systemic desynchronisation are prevented. Because the scheme uses asymmetric encryption, it is unlikely that a secret key (Pprivate, Pkprivate) will be discovered during transmission. In Scyther, we also verified the secrecy claims for the secret key, and no attacks were found.
claim_d1(Device,Secret,sk);
Mutual Authentication: The Ethereum blockchain is highly resistant to DDoS attacks due to its wide distribution and excellent integrity and consistency records. The protocol mutually authenticates the user/device using asymmetric key encryption; the TTP private key (Pprivate) is needed to sign the authentication results in case a malicious attacker impersonates users to execute phishing attacks. By validating the signature (Usign), the terminal devices can determine whether the authorised users are trustworthy and whether the hashes are authentic. Hence, our proposed protocol fulfils the condition of mutual authentication.
send_4(TTP,Etheruem, (A,a));
send_5(Etheruem,TTP,U);
recv_6(TTP,Device,Usign);
Integrity: In Scyther, this claim is successfully established, as the asymmetric key encryption ensures that communication integrity is maintained. To intercept the messages (UA, UAddress), (DA, DAddress) during the mutual authentication phase, an attacker must first discover the private keys (Pprivate, Pkprivate). The blockchain ensures the message’s integrity by preventing tampering with the credentials; once they are signed by the TTP, the integrity is sealed.
send_6(TTP,Device ,(Usign));
Table 3 Security features comparison and evaluation
Security features | Proposed protocol | [27] | [28] | [20] | [29]
User impersonation | ✓ | ✓ | ● | ● | ●
Replay attacks | ✓ | ✓ | ✓ | ✓ | ✓
Spoof attacks | ✓ | … | ● | ● | …
DDOS | ✓ | ● | ✓ | ✓ | …
Mutual authentication | ✓ | ✓ | ✓ | ✓ | ✓
Integrity | ✓ | … | ● | ✓ | ✓
Encryption type | Asymmetric key | Random numbers | Random numbers | Public/private key | Public key distribution
Certificate | ✓ | ● | ● | ✓ | ●
In Table 3, ✓ means the scheme is secure against the attack, ● means it is insecure, and … means the feature was not considered.
6 Computational Cost
During protocol operation, each entity is involved in various cryptographic computations. For the computational cost, we consider only the mutual authentication phase. Our approach has a low computational cost: between the user/device and the TTP, only one encryption and one decryption (TAES) and one signature (Tsign) take place. Similarly, during the communication of a device/user, one encryption and one decryption (TAES) and one hash calculation (Th) occur. Based on the computation values below, our approach therefore requires relatively little computing power. Table 4 summarises the cryptographic operations involved in the system’s operation, which provide improved security and mutual authentication between the user, the device, and the TTP. It also shows the computational load placed on the system’s various components during operation. According to the findings in [30], the AES encryption operation TAES takes about 2.76 ms, and the hash operation Th takes approximately 1.5 ms. We only consider our proposed protocol for the computation delay analysis; because our scheme has very few parameters, its computation cost is low.
Table 4 Computational cost analysis
Notation | Computation in milliseconds (ms)
TAES = asymmetric encryption/decryption | 2.76 ms
Th = hash calculation time | 1.5 ms

Phase | Device (mutual authentication) | IoT (mutual authentication) | Trusted third party (TTP) | Total
AES (asymmetric key) | 1TAES + 1TAES + 1Th = 2TAES + 1Th | 1TAES + 1TAES + 1Th = 2TAES + 1Th | 1TAES + 1TAES = 2TAES | (2TAES + 1Th) + (2TAES + 1Th) + 2TAES = 6TAES + 2Th
Computation in (ms) | | | | 19.56
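The total in Table 4 can be checked with a short calculation. The sketch below is only a worked version of the table's arithmetic: the per-operation timings are the values quoted from [30], the operation counts are read from the table, and the dictionary keys are hypothetical labels for its columns.

```python
T_AES = 2.76  # ms per encryption/decryption, as quoted from [30]
T_H = 1.5     # ms per hash calculation, as quoted from [30]

# (encryption/decryption count, hash count) per participant, from Table 4
operations = {
    "device": (2, 1),
    "iot": (2, 1),
    "ttp": (2, 0),
}

total_ms = sum(
    n_enc * T_AES + n_hash * T_H for n_enc, n_hash in operations.values()
)
print(round(total_ms, 2))  # 19.56, i.e. 6 * 2.76 + 2 * 1.5
```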
7 Conclusion
In this paper, we have proposed a user-to-device authentication scheme for smart homes and implemented a blockchain-based solution for IoT device authentication that uses Ethereum smart contracts in a decentralised way with a trusted third party. Measurements of the platform from IoT sensors can be recorded and incorporated into the blockchain, and users/devices will be able to query the Ethereum blockchain for this information, maintaining the platform’s integrity. Additionally, our system provides an asymmetric protocol based on the Ethereum blockchain between users and devices to enable mutual authentication via a trustworthy third party in a decentralised environment. Furthermore, we discussed the security analysis of our suggested authentication technique, as well as how the strategy achieves the stipulated security requirements. Our scheme is therefore suitable for realising a blockchain-based IoT authentication framework, since it is more dependable, lightweight, and trustworthy than existing schemes.
Acknowledgements The authors would like to acknowledge the Cybersecurity Cooperative Research Centre, CSCRC, for funding this project, M13-00220.
References
1. Kavianpour S, Shanmugam B, Azam S, Zamani M, Samy GN, De Boer F (2019) A systematic literature review of authentication in Internet of Things for heterogeneous devices. J Comput Netw Commun
2. Lau CH, Alan KH, Yan F (2018) Blockchain-based authentication in IoT networks, pp 1–8. https://doi.org/10.1109/DESEC.2018.8625141
3. Chen T-H, Shih W-K (2010) A robust mutual authentication protocol for wireless sensor networks. ETRI J 32:704–712
4. Mendez D, Papapanagiotou I, Yang B (2017) Internet of things: survey on security and privacy. Cryptography and Security
5. Ourad AZ, Belgacem B, Salah K (2018) Using blockchain for IOT access control and authentication management. ICIOT
6. Zhu Y, Huang C, Hu Z, Al-Dhelaan A, Al-Dhelaan M (2021) Blockchain-enabled access management system for edge computing. Electronics 10:1000. https://doi.org/10.3390/electronics10091000
7. Li C, Zhang L-J (2017) A blockchain based new secure multi-layer network model for internet of things. In: 2017 IEEE international congress on internet of things (ICIOT), pp 33–41
8. Zhu J, Chan DS, Prabhu MS, Natarajan P, Hu H, Bonomi F (2013) Improving web sites performance using edge servers in fog computing architecture. In: 2013 IEEE seventh international symposium on service-oriented system engineering, pp 320–323. https://doi.org/10.1109/SOSE.2013.73
9. Hoque MA, Hasan R (2019) Towards a threat model for vehicular fog computing. In: Proceedings of the IEEE 10th annual ubiquitous computing, electronics mobile communication conference (UEMCON), New York, NY, USA, 10–12 Oct 2019
10. Tawalbeh L, Muheidat F, Tawalbeh M, Quwaider M (2020) IoT Privacy and security: challenges and solutions. Appl Sci 10:4102. https://doi.org/10.3390/app10124102
11. Mahalle P, Babar S, Prasad NR, Prasad R (2010) Identity management framework towards Internet of Things (IoT): roadmap and key challenges. In: Meghanathan N, Boumerdassi S, Chaki N, Nagamalai D (eds) Recent trends in network security and applications. CNSA 2010. Communications in computer and information science, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14478-3_43
12. Conti M, Dragoni N, Lesyk V (2016) A survey of man in the middle attacks. IEEE Commun Surv Tutor 18:1–1. https://doi.org/10.1109/COMST.2016.2548426
13. Khan J et al (2018) An authentication technique based on Oauth 2.0 protocol for Internet of Things (IoT) network. In: 2018 15th International computer conference on wavelet active media technology and information processing (ICCWAMTIP), pp 160–165. https://doi.org/10.1109/ICCWAMTIP.2018.8632587
14. Aman MN, Chaudhry SA, Al-Turjman F (2020) RapidAuth: fast authentication for sustainable IoT
15. Sikdar B (2018) Lightweight and privacy-preserving two-factor authentication scheme for IoT devices. IEEE Internet Things J, 1–1. https://doi.org/10.1109/JIOT.2018.2846299
16. Lau CH, Yeung AK, Yan F (2018) Blockchain-based authentication in IoT networks. In: 2018 IEEE conference on dependable and secure computing (DSC), pp 1–8
17. Zahmatkesh H, Al-Turjman F (2020) Fog computing for sustainable smart cities in the IoT era: caching techniques and enabling technologies - an overview. Sustain Cities Soc 59
18. Nartey C, Tchao ET, Gadze JD, Keelson E, Klogo GS, Kommey B, Diawuo K (2021) On blockchain and IoT integration platforms: current implementation challenges and future perspectives. Wirel Commun Mob Comput, vol 2021
19. Dorri A, Kanhere SS, Jurdak R (2016) Blockchain in Internet of Things: challenges and solutions
20. Hammi MT, Hammi B, Bellot P, Serhrouchni A (2018) Bubbles of trust: a decentralized blockchain-based authentication system for IoT. Comput Secur 78:126–146. https://doi.org/10.1016/j.cose.2018.06.004
21. Zhang Y, Wen J (2017) The IoT electric business model: using blockchain technology for the internet of things. Peer-to-Peer Netw Appl 10:983–994. https://doi.org/10.1007/s12083-016-0456-1
22. Wu F, Li X, Xu L, Kumari S, Karuppiah M, Shen J (2017) A lightweight and privacy-preserving mutual authentication scheme for wearable devices assisted by cloud server. Comput Electr Eng 63:168–181
23. Almarhabi K, Jambi K, Eassa F, Batarfi O (2018) An evaluation of the proposed framework for access control in the cloud and BYOD environment. Int J Adv Comput Sci Appl (IJACSA) 9(10)
24. Bali P (2014) Comparative study of private and public key cryptography algorithms: a survey. Int J Res Eng Technol 03:191–195
25. Blumenthal M (2007) 1 Encryption: strengths and weaknesses of public-key cryptography
26. Soomro S, Belgaum MR, Alansari Z, Jain R (2019) Review and open issues of cryptographic algorithms in cyber security. https://doi.org/10.1109/iCCECE46942.2019.8941663
27. Zhang J, Wang Z, Shang L, Lu D, Ma J (2020) BTNC: a blockchain based trusted network connection protocol in IoT. J Parallel Distrib Comput 143
28. Yavari M, Safkhani M, Kumari S, Kumar S, Chen C-M (2020) An improved blockchain-based authentication protocol for IoT network management. Secur Commun Netw 2020:1–16. https://doi.org/10.1155/2020/8836214
29. Almadhoun R, Kadadha M, Alhemeiri M, Alshehhi M, Salah K (2018) A user authentication scheme of IoT devices using blockchain-enabled fog nodes, pp 1–8. https://doi.org/10.1109/AICCSA.2018.8612856
30. Pereira GC, Alves RC, Silva FL, Azevedo RM, Albertini BC, Margi CB (2017) Performance evaluation of cryptographic algorithms over IoT platforms and operating systems. Secur Commun Netw 2017, Article ID 2046735, 16 pages
Prediction of Type II Diabetes Using Machine Learning Approaches
Tasmiah Rahman, Anamika Azad, and Sheikh Abujar
Abstract Diabetes is a major health threat all over the world, and its prevalence is rising rapidly. About 90% of people with diabetes have type 2 diabetes. During the COVID-19 pandemic, it is very risky to go to a hospital and have diabetes checked properly. Several machine learning techniques are now used to build software that predicts diabetes more accurately, so that doctors can give patients proper advice and medicine in time, which can decrease the risk of death. In this work, we tried to find an efficient symptom-based model so that people can easily learn whether they have diabetes and can follow proper food habits that reduce health risks. We implement four machine learning algorithms: Decision Tree, Naïve Bayes, Random Forest, and K-Nearest Neighbor. After comparing their performance on different evaluation parameters, the experimental results showed that the Naïve Bayes algorithm performed better than the other algorithms, with the highest accuracy of 90.27%.
Keywords Machine learning · Risk · Decision tree · Naïve Bayes · Random forest · K-Nearest neighbor · Diabetes
T. Rahman (B) · A. Azad · S. Abujar
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
e-mail: [email protected]
A. Azad e-mail: [email protected]
S. Abujar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_10

1 Introduction
Diabetes mellitus, commonly called diabetes, is an incurable disease caused by a metabolic disorder. It occurs because the pancreas is unable to make adequate insulin or because the cells and tissues of the body cannot use insulin properly [1]. A large number of people are affected by diabetes and die from it. In 2013, the International Diabetes Federation indicated that 382 million people have diabetes mellitus, which corresponds to 6.6% of the world's population. Analyses of worldwide healthcare data project that the number of people with diabetes might grow from 376 million to 490 million by 2030 [2]. Diabetes is not only a disease in itself but also a cause of many other diseases. It harms the heart, eyes, kidneys, nerves, blood vessels, etc. Autoimmune reactions, an unhealthy lifestyle, unhealthy food habits, lack of exercise, obesity, environmental pollution, and genetics are mainly responsible for diabetes, although there are many other causes. Unplanned urbanization in Bangladesh is one of the primary reasons: people do not have enough space for recreation such as playing games and exercising. Moreover, people eat junk food such as pizza, burgers, and soft drinks, which are full of sugar and fat.
Type 1 diabetes happens when the pancreas is not able to produce insulin. It can be inherited if the parents have it, and it is mostly found in children. Symptoms of type 1 diabetes include thirst, tiredness, weight loss, frequent urination, and increased appetite. Type 2 diabetes occurs when cells and tissues in the body cannot respond properly to insulin, and it is also called non-insulin-dependent diabetes mellitus or adult-onset diabetes [1]. Generally, it is found in people with a high BMI and an inactive lifestyle. Middle-aged people are more prone to diabetes. Many hormones are secreted during pregnancy. Those hormones raise blood sugar levels in the body, which is why gestational diabetes occurs. A person who has had gestational diabetes may develop type 2 diabetes and obesity later. If gestational diabetes is left untreated, the risk that the baby dies before birth increases.
There is no clear pattern of inheritance of type 2 diabetes, and there is no permanent cure for diabetes, but awareness and medication can improve people's health. People who have type 2 diabetes can make insulin but cannot use it properly. The pancreas produces enough insulin and tries to get glucose into the cells but cannot do it, and thus the glucose builds up in the blood. People who have type 2 diabetes are said to have insulin resistance. Scientists have found several gene variants that affect insulin production. Extra weight or obesity causes insulin resistance, and problems in the digestive system can also contribute to type 2 diabetes. Diabetes does not depend on age alone. It depends on many other factors such as insulin, BMI, and blood sugar level. Diagnosis becomes difficult when a patient suffers from several diseases of the same category; in such cases, physicians may not be able to identify the disease properly. For this reason, machine learning techniques have recently been used to develop software that helps doctors decide on diabetes at a very early stage. Early prediction of the probability that a person is at risk of diabetes can reduce the death rate. Machine learning algorithms are applied to medical datasets with multiple inputs and identify diseases more accurately at a low cost. Machine learning is an area of computer science that uses several statistical methods to analyze and process data, find useful patterns in the data, and use those patterns to achieve the expected goal. It teaches a computer so that the computer can learn from data, make decisions, and act without human interaction. It can make accurate predictions when data is given to a computer system.
2 Background Study
Diabetes prediction is one of the most widely researched topics in machine learning, and most of the research on predicting diabetes has applied several algorithms. Tigga et al. utilized six machine learning methods on the PIMA Indian dataset and their own dataset. They used RStudio for implementation and the R programming language for coding. They then compared the two datasets against each other and obtained 94.10% accuracy from the Random Forest classifier [1]. Faruque et al. worked with different risk factors for early prediction using C4.5 Decision Tree, KNN, Naïve Bayes, and SVM on an adult population dataset and attained the highest accuracy of 73.5% from the C4.5 Decision Tree [2]. Wu et al. aimed to build a model that adapts to more than one dataset and improves prediction accuracy; for this reason, they used three datasets and the WEKA toolkit and attained 95.42% accuracy, which is 3.04% higher than others [3]. Naz et al. took a deep learning approach to build a model that measures the risk of diabetes at an early stage, using Naïve Bayes, Artificial Neural Network, Decision Tree, and Deep Learning; they used a sampling technique for data preprocessing, and deep learning gave the highest accuracy of 98.07% [4]. Maniruzzaman et al. used an LR-RF combination for feature selection in a machine learning-based system and obtained 94.25% accuracy [5]. Sai et al. presented several algorithms, namely K-means, Random Forest, Decision Tree, KNN, Logistic Regression, SVM, and Naive Bayes, compared their performance, and found the highest accuracy of 93% with SVM [6]. Kowsher et al. presented a model built with scikit-learn, applied seven different classifiers, and found the highest accuracy of 93.80% with Random Forest [7]. Islam et al. utilized three machine learning algorithms and applied the percentage split evaluation technique and tenfold cross validation; the best results, achieved by Random Forest, were 99% for the percentage split method and 97.4% for tenfold cross validation [8]. Shuja et al. utilized five machine learning algorithms in two phases to reduce data imbalance and obtained the desired accuracy of 94.70% from one phase with SMOTE and Decision Tree [9]. Chen et al. used K-means and Decision Tree algorithms to build a hybrid prediction model, which reached the best accuracy of 90.04% compared with other classification models [10]. Singh et al. used medical data for diagnosing diabetes and applied the Naïve Bayes, Multilayer Perceptron, and Random Forest algorithms with k-fold cross validation and the percentage split method; they trained on the dataset with and without a preprocessing technique and achieved a better average accuracy for NB with preprocessing [11]. Sisodia et al. experimented with predicting the possibility of early stage diabetes; for this, they utilized Decision Tree, Naïve Bayes, and Support Vector Machine and obtained a good accuracy of 76.30% with Naïve Bayes, which was verified by the ROC curve [12]. Xu et al. proposed a model that predicts the risk of type 2 diabetes at the initial stage using the RF-WFS algorithm and the XGBoost classifier and obtained the best accuracy of 93.75% [13]. Tiwari et al. did feature selection with Random Forest, estimated diabetes with XGBoost, and compared it with an ANN approach; XGBoost achieved the better accuracy of 78.91% [14]. Khanam et al. used machine learning and neural networks in their research and built a NN model with two hidden layers. Among seven ML algorithms, Logistic Regression and Support Vector Machine performed well. They attained 88.6% accuracy [15]. Pal et al. applied SVM, KNN, and Decision Tree methods in their proposed model for detecting the presence of diabetes. SVM can detect whether the patient has diabetes or not with 80% accuracy [16].
3 Research Methodology
To reach our expected goal, we followed a few stages that are explained briefly in this section. We also describe the data collection process and the machine learning methods used to obtain the expected output. For implementation, we divide our dataset into two parts, a training dataset and a testing dataset. The overall process followed to implement this work is shown in Fig. 1.
3.1 Dataset Information
In any research work, the first and most important thing is data, which is considered the heart of the machine learning process. In this work, the dataset was collected from a hospital in Bangladesh. It was created from people who have symptoms that may indicate diabetes. The dataset consists of 18 columns and 1152 rows. The first 17 columns are features used for predicting the last column, "Class", which defines whether the patient has diabetes or not. In this dataset, "No" means not affected by diabetes and "Yes" means affected by diabetes. The 1152 rows mean that information on 1152 patients is provided: 678 patients who are not affected by diabetes and 474 patients who have diabetes. The dataset description is given in Table 1.
3.2 Data Preprocessing
Data preprocessing is very important: it prepares the raw data and makes it suitable for building different machine learning models. In the real world, it is rarely possible to get organized data. Most of the time, the dataset is inconsistent, lacks certain behaviors or trends, and contains many errors. Data preprocessing is a technique to reduce this problem. Because machine learning algorithms work only with numeric values, we convert the categorical values to numbers using label encoding. For example, we convert "Yes" to 1 and "No" to 0.
Fig. 1 Experimental workflow of the analysis
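The Yes/No encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the two records are made up, only the column names are taken from Table 1, and the helper name `encode_record` is hypothetical.

```python
# Made-up example records; only the column names come from Table 1.
records = [
    {"Age": 45, "Polyuria": "Yes", "Fatigue": "No", "Class": "Yes"},
    {"Age": 31, "Polyuria": "No", "Fatigue": "No", "Class": "No"},
]

# Mapping used for the Yes/No columns, as stated in the text.
YES_NO = {"Yes": 1, "No": 0}

def encode_record(record):
    """Replace "Yes"/"No" by 1/0 and leave every other value unchanged."""
    return {key: YES_NO.get(value, value) for key, value in record.items()}

encoded = [encode_record(r) for r in records]
print(encoded[0])
```

Other nominal columns, such as Gender, would need their own mapping; the text only specifies the Yes/No case.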
Table 1 Dataset description
S. no.  Features name               Type
1       Age                         Numeric
2       Gender                      Nominal
3       Insulin                     Numeric
4       Polyuria                    Nominal
5       Sudden weight loss          Nominal
6       Polydipsia                  Nominal
7       Blood pressure              Nominal
8       Glucose                     Numeric
9       Diabetes in family history  Nominal
10      Blurry vision               Nominal
11      Dry skin                    Nominal
12      Fatigue                     Nominal
13      Irritability                Nominal
14      Slow healing sores          Nominal
15      Frequent infection          Nominal
16      Increased appetite          Nominal
17      Numbness in hand or feet    Nominal
18      Class                       Nominal

3.3 Train/Test Split
The train/test split method is used for evaluating the performance of a machine learning algorithm and can be applied with any supervised learning algorithm. In a train/test split, the dataset is divided into two parts: one part, called the training dataset, is used for training the model, and the other part is used for testing the model. The larger portion of the data is used for training. In this study, a random 75% of the data is used for training the system and 25% for testing the prediction.
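A random 75/25 split of the 1152 rows can be sketched as follows. This is an illustrative sketch using only the Python standard library; the function name `train_test_split_indices` and the fixed seed are assumptions, and the paper does not state which tool or seed was actually used.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle the row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(1152)
print(len(train_idx), len(test_idx))  # 864 288
```

With 1152 rows, this gives 864 training rows and 288 test rows.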
3.4 Algorithm Selection
For prediction, we applied four machine learning algorithms in this work: Random Forest, Naive Bayes, Decision Tree, and K-Nearest Neighbor. We then compared these algorithms on different parameters. At this stage, our model can take the test attributes as input. After comparing the algorithms, we identify the one that predicts the output most accurately.
Decision Tree. Decision Tree is a tree-structured model that can be used for both classification and regression. When an input comes into the model, the decision tree outputs the class of that input. Each decision node represents a test performed on an attribute. To build a decision tree, we need a dataset. We first calculate the entropy of the target value and of each predictor attribute. We then compute the information gain of every attribute, and the attribute with the highest information gain becomes the root node. The data is split in this way repeatedly until the decision tree is complete and can be used to make decisions.
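The root-node selection described above can be illustrated with a small sketch. It assumes the usual Shannon entropy and information gain definitions; the four toy rows are made up and are not taken from the study's dataset, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Class"):
    """Target entropy minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Made-up toy rows: Polyuria separates the classes perfectly, Fatigue does not.
rows = [
    {"Polyuria": "Yes", "Fatigue": "Yes", "Class": "Yes"},
    {"Polyuria": "Yes", "Fatigue": "No", "Class": "Yes"},
    {"Polyuria": "No", "Fatigue": "Yes", "Class": "No"},
    {"Polyuria": "No", "Fatigue": "No", "Class": "No"},
]

# The attribute with the highest information gain becomes the root node.
root = max(["Polyuria", "Fatigue"], key=lambda a: information_gain(rows, a))
print(root)  # Polyuria
```

In this toy data, splitting on Polyuria yields pure subsets (gain of 1 bit), while splitting on Fatigue yields no gain, so Polyuria is chosen as the root.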
Naïve Bayes. The Naïve Bayes machine learning algorithm is a probabilistic classifier based on Bayes' theorem. It can be used even when the dataset is very large. The target of this model is to find the conditional probability of the output:

$$p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)} \qquad (1)$$

In Eq. (1), $p(a \mid b)$ is the conditional (posterior) probability of class $a$ given the observed features $b$, $p(b \mid a)$ is the probability of the features given the class, $p(a)$ is the prior probability of the class, and $p(b)$ is the probability of the observed features.
Random Forest. Random Forest is a powerful and popular ensemble classifier that uses the decision tree algorithm in a randomized way. It combines multiple decision trees, each built with the bagging technique. When we classify a new object, each tree gives a classification as its vote, and the majority vote is accepted. That is why it tends to provide more accurate results than a single decision tree. In the case of regression, it takes the average of the outputs of the different trees.
K-Nearest Neighbor. K-Nearest Neighbor is a supervised machine learning algorithm that performs classification and regression using a number (K) of neighbors, where K is an integer. To execute KNN, we first need a labeled dataset. Then, the value of K is chosen, usually as an odd number. Next, the distance of the new instance from its nearest neighbors is calculated. Finally, the new instance is assigned to the class of the majority of its neighbors. KNN works with either Euclidean or Manhattan distance together with the neighbor vote.
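The K-Nearest Neighbor steps above (choose K, compute distances, take the majority vote) can be sketched as follows. This is a minimal illustration with Euclidean distance; the points, labels, and the function name `knn_predict` are made up for the example and do not come from the study.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up numeric points; label 1 = diabetic, 0 = not diabetic.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
labels = [0, 0, 0, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0)))  # 0
print(knn_predict(points, labels, (3.0, 3.0)))  # 1
```

Choosing an odd K, as the text suggests, avoids tied votes in a two-class problem.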
4 Result and Discussions
Having trained the model with our collected dataset, we now describe the results and the analysis parameters used in this research work. To determine the algorithm that gives the highest accuracy, we used the train/test split method, in which a random 75% of the data is used for training the model and 25% is used for testing. To calculate the performance of each algorithm, we built a confusion matrix. The confusion matrix has four parameters: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN). From these values, we compute the AUC, Accuracy, Precision, F1-Score, and Recall to analyze the performance of every machine learning algorithm.
4.1 Performance Evaluation
In this section, we define all the performance parameters used in this work.
Accuracy. Accuracy means how much we predict correctly. It is the number of correct predictions divided by the total number of predictions made by the system.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

Precision. Precision is the number of truly affected people divided by the number of people classified as affected by the model.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall. Recall is the number of true positives divided by the total number of people who are actually affected.

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-Score. F1-Score is calculated from Recall and Precision. It is the harmonic mean of Recall and Precision:

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

ROC Curve. ROC stands for receiver operating characteristic. It is a graphical plot of the TPR (True-Positive Rate) against the FPR (False-Positive Rate), that is, sensitivity against 1 − specificity. It evaluates different threshold values to determine the best threshold point of the model. From the ROC curve, we understand how good the model is at predicting the positive and negative classes.
AUC. AUC is the area under the ROC curve and is used to determine the performance of the classifier. Its range is 0 to 1. A higher AUC means better performance, and 0.5 means the model has no separation capacity. AUC is useful for measuring performance on a balanced dataset.
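The four confusion-matrix formulas above can be computed directly from the counts. The following sketch uses hypothetical counts chosen only for illustration; they are not results from this study, and the function name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a 100-sample test set (not taken from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty denominators (for example, a model that never predicts the positive class), which a production implementation would need to handle.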
4.2 Compare Performances
To evaluate the model, after dataset preprocessing we trained it on 75% of the data and tested it on 25% of the data. After evaluating the performance of the four algorithms, we recorded the values of Accuracy, Precision, Recall, F1-Score, and AUC in Table 2.

Table 2 Performance of different algorithms
              Random Forest  Naive Bayes  Decision Tree  K-Nearest Neighbor
Accuracy (%)  85.76          90.27        82.29          53.12
Precision     0.857          0.940        0.798          0.477
Recall        0.824          0.839        0.816          0.328
F1-Score      0.840          0.887        0.807          0.398
AUC           0.876          0.891        0.822          0.504

Table 2 lists the four machine learning algorithms applied for early prediction of type 2 diabetes: Naïve Bayes, Decision Tree, K-Nearest Neighbor, and Random Forest. We measure the performance of these algorithms by calculating the Accuracy, F1-Score, Precision, Recall, and AUC score. Table 2 shows that the Naïve Bayes algorithm performed better than the other algorithms in predicting type 2 diabetes at an early stage. We find the highest accuracy, 90.27%, from the Naïve Bayes algorithm. We also find the highest Precision, Recall, F1-Score, and AUC from the Naïve Bayes algorithm, which are 0.940, 0.839, 0.887, and 0.891, respectively (Fig. 2). In Fig. 3, the separate ROC curves are plotted on the same axes and the Area Under Curve (AUC) is analyzed for all algorithms. This is used to measure the performance of the model. The classifier that covers the most area under the curve is considered the best among all the classifiers. In Fig. 3, we see that the most area under the curve is covered by the Naïve Bayes algorithm, so Naïve Bayes is the best classifier among those applied in this research work (Fig. 4). Overall, we have chosen the Naïve Bayes algorithm for predicting type 2 diabetes, as it achieves the highest performance across the different evaluation techniques.
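The comparison in Table 2 can be checked mechanically by picking the best classifier for each metric. The sketch below copies the values from Table 2; assigning 0.477 to precision and 0.328 to recall for K-Nearest Neighbor is an assumption based on the order in which the values appear, and it does not affect the outcome.

```python
# Values copied from Table 2. For K-Nearest Neighbor, the assignment of
# 0.477 to Precision and 0.328 to Recall is an assumption about the layout.
table2 = {
    "Random Forest": {"Accuracy": 85.76, "Precision": 0.857, "Recall": 0.824,
                      "F1-Score": 0.840, "AUC": 0.876},
    "Naive Bayes": {"Accuracy": 90.27, "Precision": 0.940, "Recall": 0.839,
                    "F1-Score": 0.887, "AUC": 0.891},
    "Decision Tree": {"Accuracy": 82.29, "Precision": 0.798, "Recall": 0.816,
                      "F1-Score": 0.807, "AUC": 0.822},
    "K-Nearest Neighbor": {"Accuracy": 53.12, "Precision": 0.477,
                           "Recall": 0.328, "F1-Score": 0.398, "AUC": 0.504},
}

metric_names = ["Accuracy", "Precision", "Recall", "F1-Score", "AUC"]

# For every metric, find the classifier with the highest value.
best = {m: max(table2, key=lambda name: table2[name][m]) for m in metric_names}
for metric, name in best.items():
    print(f"{metric}: {name}")
```

Every metric selects Naive Bayes, which matches the conclusion drawn in the text.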
Fig. 2 Accuracy of different classifiers
Fig. 3 ROC curve and AUC score of different classifiers
Fig. 4 Precision, Recall, F1-Score of different classifiers
5 Conclusion and Future Work
In this work, we endeavored to create a model that can predict diabetes at an early stage. If a person provides some personal characteristics and symptoms, the model can predict whether the person has diabetes or not. Our dataset has 18 attributes and 1152 instances. First, we converted the categorical values to numbers using label encoding. We then used the percentage split technique to divide our dataset into a training dataset and a testing dataset, with a random 75% used for training and 25% used for testing the model. We applied four different machine learning algorithms and measured the performance of each model with different statistical metrics: AUC, Accuracy, Precision, F1-Score, and Recall. We found that the Naïve Bayes algorithm gives the highest accuracy, 90.27%. We also find the highest Precision, F1-Score, Recall, and AUC from the Naïve Bayes algorithm, which are 0.940, 0.887, 0.839, and 0.891, respectively. In the future, this system can be improved to give more accurate results by using more recent real-life data or by applying advanced and combined algorithms.
References
1. Tigga NP, Garg S (2020) Prediction of type 2 diabetes using machine learning classification methods. Procedia Comput Sci 167:706–716. https://doi.org/10.1016/j.procs.2020.03.336
2. Faruque MF, Asaduzzaman, Sarker IH (2019) Performance analysis of machine learning techniques to predict diabetes mellitus. In: 2nd International conference on electrical, computer and communication engineering, ECCE 2019, pp 1–4. https://doi.org/10.1109/ECACE.2019.8679365
3. Wu H, Yang S, Huang Z, He J, Wang X (2018) Type 2 diabetes mellitus prediction model based on data mining. Inf Med Unlocked 10:100–107. https://doi.org/10.1016/j.imu.2017.12.006
4. Naz H, Ahuja S (2020) Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 19:391–403. https://doi.org/10.1007/s40200-020-00520-5
5. Maniruzzaman M, Rahman MJ, Ahammed B, Abedin MM (2020) Classification and prediction of diabetes disease using machine learning paradigm. Heal Inf Sci Syst 8:1–14. https://doi.org/10.1007/s13755-019-0095-z
6. Sai PMS, Anuradha G, Kumar VP (2020) Survey on type 2 diabetes prediction using machine learning. In: Proceedings of 4th international conference on computing methodologies and communication, ICCMC 2020, pp 770–775. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-000143
7. Kowsher M, Tithi FS, Rabeya T, Afrin F, Huda MN (2020) Type 2 diabetics treatment and medication detection with machine learning classifier algorithm. Springer Singapore. https://doi.org/10.1007/978-981-13-7564-4_44
8. Islam F, Ferdousi R, Rahman S, Bushra HY (2019) Computer vision and machine intelligence in medical image analysis
9. Harish Sharma KG (2019) Advances in computing and intelligent systems
10. Chen W, Chen S, Zhang H, Wu T (2018) A hybrid prediction model for type 2 diabetes using K-means and decision tree. In: Proceedings of IEEE international conference on software engineering services science, ICSESS, pp 386–390. https://doi.org/10.1109/ICSESS.2017.8342938
11. Gnana A, Leavline E, Baig B (2017) Diabetes prediction using medical data. J Comput Intell Bioinforma 10:1–8
12. Sisodia D, Sisodia DS (2018) Prediction of diabetes using classification algorithms. Procedia Comput Sci 132:1578–1585. https://doi.org/10.1016/j.procs.2018.05.122
13. Xu Z, Wang Z (2019) A risk prediction model for type 2 diabetes based on weighted feature selection of random forest and xgboost ensemble classifier. In: 11th International conference on advanced computational intelligence, ICACI 2019, pp 278–283. https://doi.org/10.1109/ICACI.2019.8778622
14. Tiwari P, Singh V (2021) Diabetes disease prediction using significant attribute selection and classification approach. J Phys Conf Ser 1714:012013. https://doi.org/10.1088/1742-6596/1714/1/012013
15. Khanam JJ, Foo SY (2021) A comparison of machine learning algorithms for diabetes prediction. ICT Express. https://doi.org/10.1016/j.icte.2021.02.004
16. Journal I, IRJET- Diabetes prediction using machine learning
Authentication and Security Aspect of Information Privacy Using Anti-forensic Audio–Video Embedding Technique
Sunil K. Moon
Abstract Fraud in online data transmission has become a critical and serious issue because of weak security and authentication. This paper contributes an approach to these problems. To date, many schemes have been developed that use data embedding to embed one bit of secret data into video, which provides limited security and authentication. The implemented concept provides authentication and security by embedding more than three bits into video pixels using the forensic pixel adjustment mapping (FPAM) technique. Different types of image processing attacks are also tested and do not create any distortion. The observed and verified software results of the implemented approach have been compared with state-of-the-art video data embedding methods. It is observed that the implemented forensic video data embedding technique performs better than the existing approaches.
Keywords Anti-forensic detection · Authentication · Security · Video data embedding · FPAM
S. K. Moon (B)
Department of Electronics and Telecommunication, SCTRs Pune Institute of Computer Technology, (PICT), Pune, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_11

1 Introduction
Nowadays, steganography and steganalysis tools provide solutions for security and authentication. Surveys of these tools describe the challenges faced by researchers in the field of data security, give recommendations, and suggest new tools [1, 2]. To create a security model using steganography, the data embedding capacity, security, imperceptibility, robustness, and authentication of secret data are the major concerns [3, 4]. Patel et al. [5] proposed a method that applies two different transformations, DCT and DST, to the non-dynamic region of specifically selected video frames. The components of both transformations in the frequency domain are used as a carrier object to conceal secret messages, which is intended to provide high-level security along with excellent visual quality of the stego video shared over an unsecured network channel. However, the security, imperceptibility, and robustness of the secret data remain low. Abood et al. [6] verified transformation (encoding) and steganography techniques that secure the communication of an audio signal by transforming the signal into a red–green–blue (RGB) image. The secret data, as an image, is hidden in a cover audio file by using the least significant bit (LSB) method in the spatial and transform domains. The introduced method enables robust and secure sending of an audio message with a size smaller than the original audio file and in a meaningless form. Both methods address data embedding capacity, security, and authentication [6]. Azam et al. [7] explained how steganography is used to protect vulnerable data from unauthorized interception during communication. However, there is always a trade-off between capacity and imperceptibility. In 2017, Chin-Feng Lee et al. suggested an efficient reversible data hiding scheme with reduplicated EMD using image interpolation and edge detection. It produces the embedding difference of the image's feature information, identifies the pixels located at edges, and inserts different data payloads according to the application demands. The results show a data payload of 3.01 bpp, which is a very high embedding capacity [8]. Liu et al. [9] implemented an enhanced general EMD by dividing a group of n cover pixels into multiple groups. With this method, the embedding capacity can be improved further over general EMD [9]. Younus et al. [10] addressed the drawbacks of the EMD approach, such as robustness and concealed capacity. EMD-3 is better than existing steganography methods, with a PSNR of 55.71 dB, a payload of 52,400 bytes, and improved robustness. Cueing Niu et al. in 2015 suggested EMD-3, which converts the secret digit into a $3^n$-ary notational system. In EMD-3 with n = 3, PSNR = 62 dB and ER = 1.58 bpp; it is applied to a grayscale image as secret data and provides less security [11]. All existing EMD techniques can conceal one bit at a time. Their major limitations are security and authentication. However, they recover the secret data very well, and hence more research should be done on these parameters. The rest of this paper is organized as follows: the next section discusses the proposed work, while Sect. 3 gives the source-to-destination result and its importance. Section 4 shows the system security and its performance, while system authentication is justified in Sect. 5. Section 6 presents the experimental results in detail.
2 Proposed FPAM Methodology of Information Privacy
The cover video contains many frames available for concealing private information. Any type of secret data and the selected frames of the video are first converted into binary form as pixel values. The proposed algorithm embeds three pixel values of secret data, and the frames of the video are mapped using the FPAM concealing approach, which uses a $(4p^2 + p + 4)$-ary notational system, where $p$ is the decimal value of each private data, as shown in Eqs. (1) and (2).

$$Z_m = 4p^2 + p + 4 \qquad (1)$$

$$F_a = \bigl(E - F(b_1, b_2, b_3)\bigr) \bmod \bigl(4p^2 + p + 4\bigr) \qquad (2)$$

where $Z_m$ is the modulus, $E$ is the secret information size, and $b_1$, $b_2$, $b_3$ are the three pixels of secret data. $p$ decides any three pixel points of the original video. As $p$ increases, a greater number of pixels are used to conceal the private information. For $p = 1$ the system is 9-ary, for $p = 2$ it is 22-ary, and for $p = 3$ it is 43-ary. Once any type of secret data is embedded into the frames of the video, the result is called the embedded video, and it is sent from source to destination. The authentication forensic process checks the incoming embedded video along with the private key; if it is verified to be correct, the secret data is sent to the destination end, otherwise it stops in the communication channel, as shown in Fig. 1.
Fig. 1 Proposed block diagram of the security model for video crypto-steganography transmitter
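The notational-system sizes quoted above follow directly from the expression 4p² + p + 4. A short check, with the helper name `fpam_modulus` introduced here only for illustration:

```python
def fpam_modulus(p):
    """Size of the FPAM notational system for a given p: 4p^2 + p + 4."""
    return 4 * p * p + p + 4

print([fpam_modulus(p) for p in (1, 2, 3)])  # [9, 22, 43]
```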
2.1 The FPAM Process Step 1 The three-pixel bits of the original video convert it into a binary value. The three pixels (b1 , b2 , b3 ) are defined as J = b1 *1 + b2 *2 + b3 *3 where J is the selected frame of original video. The obtained frame of the original video forms matrix values in terms of digits of each block index. Step 2 Calculate the values of the three-pixel blocks X as indicated in Eq. (3)
160
S. K. Moon
F(b1 , b2 , b3 ) = (b1 ∗ 1 + b2 ∗ 2 + b3 ∗ 3) ∗ mod ∗ Z m
(3)
where Z = 4 p 2 + p + 4 . Step 3 The received stego video with three pixels can be calculated as shown in Eq. (4) Fa = (E − F(b1 , b2 , b3 ) ∗ mod ∗ Z m
(4)
where E is the secret data digit value. The quotients $N = b_1/F_a$, $Q = b_2/F_a$, and $R = b_3/F_a$ are rounded to the nearest integer: a value between 0 and 0.49 becomes 0, a value from 0.5 to 0.99 becomes 1, a value from 1 to 1.49 becomes 1, a value between 1.5 and 1.99 becomes 2, and so on. The new stego pixel values are then obtained, with minimum error, either as $b_1' = b_1 + N$, $b_2' = b_2 + Q$, $b_3' = b_3 + R$ or as $b_1' = b_1 - N$, $b_2' = b_2 - Q$, $b_3' = b_3 - R$. To obtain the possible stego pixel values $b_1', b_2', b_3'$, the FPAM method generates four conditions. In each case, the computed value is the offset applied to the corresponding cover pixel, as the worked examples in Sects. 2.2 and 2.3 show.
Case 1: $b_1 = N + p$, $b_2 = Q + [(p + 1) \text{ or } (p + 2)]$, $b_3 = R + [(p - 1) \text{ or } (p - 2)]$
Case 2: $b_1 = N - p$, $b_2 = Q - [(p + 1) \text{ or } (p + 2)]$, $b_3 = R - [(p - 1) \text{ or } (p - 2)]$
Case 3: $b_1 = N + [(p + 1) \text{ or } (p + 2)]$, $b_2 = Q - p \text{ or } Q + p$, $b_3 = R + p$
Case 4: $b_1 = N - [(p + 1) \text{ or } (p + 2)]$, $b_2 = Q + p \text{ or } Q - p$, $b_3 = R - p$
Hence, the recovered secret data can be obtained using Eq. (5)
$$F(b_1', b_2', b_3') = (b_1' \cdot 1 + b_2' \cdot 2 + b_3' \cdot 3) \bmod (4p^2 + p + 4) \tag{5}$$
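The modulus and the extraction function of Eqs. (3) and (5) are small enough to sketch directly. The following Python fragment is only an illustration under the definitions given above, not the author's implementation; the function names `modulus` and `extract` are hypothetical.

```python
def modulus(p):
    """Z_m = 4p^2 + p + 4, the base of the notational system for parameter p."""
    return 4 * p * p + p + 4


def extract(pixels, p):
    """Weighted sum of three pixels (weights 1, 2, 3) reduced modulo Z_m."""
    b1, b2, b3 = pixels
    return (b1 * 1 + b2 * 2 + b3 * 3) % modulus(p)


if __name__ == "__main__":
    # Bases of the notational system for p = 1, 2, 3
    print([modulus(p) for p in (1, 2, 3)])  # [9, 22, 43]
    # Extraction from the stego pixels (7, 10, 10) with p = 1
    print(extract((7, 10, 10), p=1))        # 3
```

The same function serves both the sender (to test a candidate stego triple) and the receiver (to read the digit back), which is why only p has to be shared.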
2.2 Process to Embed the Image into Video with an Example
The secret image consists of a large number of bit values. The FPAM algorithm maps every bit of the image to bit values of the video frames. To embed a secret image into video frames for p = 1 ($Z_m$ = 9), let the three-pixel values of the selected video frame be (b1, b2, b3) = (5, 7, 9) and the secret data be E = 3.
Step 1: F(5, 7, 9) = (5*1 + 7*2 + 9*3) mod 9 = 46 mod 9 = 1.
Step 2: following Eq. (4) with the difference taken as F − E, $F_a$ = (1 − 3) mod 9 = 7.
Step 3: N = 5/7 = 0.71, so b1 = 1; Q = 7/7 = 1, so b2 = 1; R = 9/7 = 1.28, so b3 = 1. The new stego pixel values are (b1', b2', b3') = (6, 8, 10).
Recovering the secret data with Eq. (5) gives F(6, 8, 10) = (6*1 + 8*2 + 10*3) mod 9 = 52 mod 9 = 7, which is not the secret data E. To obtain E = 3, the four conditions are applied.
Case 1: b1 = N + p = 1 + 1 = 2, b2 = Q + (p + 1) = 3, b3 = R + (p − 1) = 1. Hence, the new stego pixel values are (b1', b2', b3') = (7, 10, 10), and F(7, 10, 10) = (7*1 + 10*2 + 10*3) mod 9 = 57 mod 9 = 3, which is equal to the secret data E.
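Steps 1 to 3 of this example can be traced with a short Python sketch. It rests on two assumptions that are inferred from the worked numbers rather than stated in the text: the difference in Step 2 is taken as F − E, and a zero result is replaced by the modulus (so that the division in Step 3 is defined). All function names are hypothetical.

```python
import math


def modulus(p):
    return 4 * p * p + p + 4


def extract(pixels, p):
    b1, b2, b3 = pixels
    return (b1 + 2 * b2 + 3 * b3) % modulus(p)


def round_half_up(x):
    # 0-0.49 -> 0, 0.5-1.49 -> 1, 1.5-2.49 -> 2, as described in Sect. 2.1
    return math.floor(x + 0.5)


def first_candidate(cover, secret, p):
    """Return (F, F_a, rounded offsets, first stego candidate)."""
    z = modulus(p)
    f = extract(cover, p)
    # Assumption: difference taken as F - E; a zero result is replaced by Z_m.
    fa = (f - secret) % z or z
    offsets = tuple(round_half_up(b / fa) for b in cover)
    stego = tuple(b + d for b, d in zip(cover, offsets))
    return f, fa, offsets, stego


if __name__ == "__main__":
    f, fa, offsets, stego = first_candidate((5, 7, 9), secret=3, p=1)
    print(f, fa, offsets, stego)   # 1 7 (1, 1, 1) (6, 8, 10)
    print(extract(stego, 1))       # 7, not yet equal to E = 3
```

Because the first candidate extracts to 7 rather than 3, the four correction cases are still needed; the sketch only covers the part of the procedure before them.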
2.3 Step to Embed the Audio into Video with an Example
Every bit of the audio is superimposed on bit values of the video frames by the FPAM method. For p = 1 ($Z_m$ = 9), let the three-pixel values of the selected video frame be (b1, b2, b3) = (15, 18, 20) and the secret data be E = 3.
Step 1: F(15, 18, 20) = (15*1 + 18*2 + 20*3) mod 9 = 111 mod 9 = 3.
Step 2: (3 − 3) mod 9 = 0 ≡ 9 (mod 9), so $F_a$ = 9 is used.
Step 3: N = 15/9 = 1.66, so b1 = 2; Q = 18/9 = 2, so b2 = 2; R = 20/9 = 2.22, so b3 = 2. The new stego pixel values are (b1', b2', b3') = (17, 20, 22).
Recovering the secret data with Eq. (5) gives F(17, 20, 22) = (17*1 + 20*2 + 22*3) mod 9 = 123 mod 9 = 6, which is not the secret data E. To obtain E = 3, the four conditions are applied.
Case 1: b1 = N + p = 3, b2 = Q + (p + 1) = 4, b3 = R + (p − 1) = 2. Hence, the new stego pixel values are (b1', b2', b3') = (18, 22, 22), and F(18, 22, 22) = (18*1 + 22*2 + 22*3) mod 9 = 128 mod 9 = 2, which is not equal to the secret data E.
Case 2: b1 = N − p = 1, b2 = Q − (p + 1) = 0, b3 = R − (p − 1) = 2. Hence, (b1', b2', b3') = (16, 18, 22), and F(16, 18, 22) = (16*1 + 18*2 + 22*3) mod 9 = 118 mod 9 = 1, which is not equal to the secret data E.
Case 3: b1 = 5, b2 = 1, b3 = 3. Hence, (b1', b2', b3') = (20, 19, 23).
F(20, 19, 23) = (20*1 + 19*2 + 23*3) mod 9 = 127 mod 9 = 1, which is not equal to the secret data E.
Case 4: b1 = 0, b2 = 3, b3 = 1. Hence, (b1', b2', b3') = (15, 21, 21), and F(15, 21, 21) = (15*1 + 21*2 + 21*3) mod 9 = 120 mod 9 = 3, which is equal to the secret data E = 3. Hence, the secret data is recovered.
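The search over the four cases can be written as a short loop. The Python sketch below is a hedged illustration: where the text allows alternatives ("(p + 1) or (p + 2)", "Q − p or Q + p"), it picks the ones that reproduce the offsets used in this example, which is an assumption. The rounded quotients (N, Q, R) are passed in directly, and all names are hypothetical.

```python
def modulus(p):
    return 4 * p * p + p + 4


def extract(pixels, p):
    b1, b2, b3 = pixels
    return (b1 + 2 * b2 + 3 * b3) % modulus(p)


def candidate_offsets(n, q, r, p):
    """Offsets for Cases 1-4; the 'or' alternatives are fixed to match Sect. 2.3."""
    return [
        (n + p, q + (p + 1), r + (p - 1)),  # Case 1
        (n - p, q - (p + 1), r - (p - 1)),  # Case 2
        (n + (p + 2), q - p, r + p),        # Case 3
        (n - (p + 1), q + p, r - p),        # Case 4
    ]


def search_cases(cover, rounded, secret, p):
    """Return (case number, stego pixels) for the first case that extracts to E."""
    for case_no, offsets in enumerate(candidate_offsets(*rounded, p), start=1):
        stego = tuple(b + d for b, d in zip(cover, offsets))
        if extract(stego, p) == secret:
            return case_no, stego
    return None  # none of the four cases recovers the secret digit


if __name__ == "__main__":
    # Audio example: cover (15, 18, 20), N = Q = R = 2, E = 3, p = 1
    print(search_cases((15, 18, 20), (2, 2, 2), secret=3, p=1))  # (4, (15, 21, 21))
    # Image example of Sect. 2.2: cover (5, 7, 9), N = Q = R = 1
    print(search_cases((5, 7, 9), (1, 1, 1), secret=3, p=1))     # (1, (7, 10, 10))
```

Returning `None` when no case matches makes the failure explicit instead of silently emitting a triple that decodes to the wrong digit.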
2.4 Step to Embed the Audio into Audio
Both the secret audio and the cover audio contain a large number of samples, which are transformed into bits. For p = 2, the notational system is 22-ary ($Z_m$ = 22); note that the reductions in this example are carried out modulo 21. Consider the three-pixel values of the cover (b1, b2, b3) = (20, 30, 40) and the secret data E = 12 with p = 2.
Step 1: F(20, 30, 40) = (20*1 + 30*2 + 40*3) mod 21 = 200 mod 21 = 11.
Step 2: with the difference taken as F − E, $F_a$ = (11 − 12) mod 22 = 21.
Step 3: N = 20/21 = 0.95, so b1 = 1; Q = 30/21 = 1.42 ≈ 1.5, so b2 = 2; R = 40/21 = 1.90, so b3 = 2. The new stego pixel values become (b1', b2', b3') = (21, 32, 42).
Recovering the private data gives F(21, 32, 42) = (21*1 + 32*2 + 42*3) mod 21 = 211 mod 21 = 1, which is not the secret data E. To obtain E = 12, the four conditions are applied.
Case 1: b1 = N + p = 3, b2 = Q + (p + 1) = 5, b3 = R + (p − 1) = 3. Hence, the new stego pixel values are (b1', b2', b3') = (23, 35, 43). The hidden information is obtained as F(23, 35, 43) = (23*1 + 35*2 + 43*3) mod 21 = 222 mod 21 = 12, which is equal to the secret data E = 12.
3 Simulation Result and Its Justification
3.1 Transmitter and Receiver Section
The input video is split into frames, and any frame can be selected to conceal secret data such as an image, audio, or text. Figure 2a shows the cover video and the thumb image used as secret data for video steganography, and Fig. 2b shows the randomized secret data with the cover video. Figure 3a shows the randomized private information and its encrypted form, while Fig. 3b shows the original and embedded video, which have the same size, width, and resolution. Figure 4a shows the 15th frame of the cover and concealed video at the receiver end, while Fig. 4b gives the randomized and recovered data along with the histogram. Figure 5 shows the spectrograms of the cover and recovered MP3 audio for p = 1, while Fig. 6 gives the spectrograms of the original and recovered voice audio for p = 3, which are identical.
Figs. 2 a, b Carrier video and image for steganography and secret data with randomization
Figs. 3 a, b Randomized secret data and original and stego video
Figs. 4 a, b Fifteenth frame of cover and stego video and analytical parameters calculation
Fig. 5 Spectrogram of original and recovered audio Mp3 for p = 1
Fig. 6 Spectrogram of original and recovered audio voices for p = 3
4 System Security and Its Performance Discussion
As the parameter p increases, CC and CR also increase while the PSNR decreases, reaching 28.53 dB with NCF = 0.56 at p = 35 (Table 1). This is the major limitation of the FPAM algorithm and is called its significance value. The simulation results for secret data of various sizes (512 × 512) and sample counts are shown in Table 1.
5 Authentication and Security Analysis Through Attacks
The authentication and security of the private information are verified through a number of attacks on the stego video, such as frame rotation, histogram variation, salt and pepper noise, and frame addition and deletion. Figure 7 shows the framewise histogram analysis of the stego video for p = 4, and Fig. 8 shows the original and stego frames 1 to 10 for p = 3, which look identical to each other. Figure 9 shows the histogram of the original and stego frames with indivisible RGB for p = 3, and Fig. 10 gives the histogram attack on the stego video for frames 11 to 20.
Table 1 Simulation results for secret data of various sizes

Secret data as image: PSNR (dB) at different ER
CR (bpp)  NCF   p   CC (bits)   Thumb (540*540)  Sign (640*640)  Thumb (540*540)  Sign (540*540)
1         1     0   2,097,152   58.03            58.09           58.07            58.06
1.58      1     1   3,313,500   57.12            57.19           57.18            57.17
2.22      1     2   4,655,677   56.10            56.17           56.17            56.17
2.71      1     3   5,683,281   52.60            52.65           52.66            52.67
3.08      1     4   6,459,228   48.19            48.15           48.16            48.16
3.38      1     5   7,088,373   45.50            45.56           45.54            45.55
3.84      1     7   8,053,063   40.17            40.18           40.13            40.14
5.15      0.56  35  –           28.53            28.48           28.43            28.6

Secret data as audio like Mp3, Tone, and .Wav: PSNR (dB)
CR (bpp)  NCF   p   CC (bits)   Mp3 (456321 samples)  Voice (543216 samples)  .Wav (5892134 samples)  Tone (643512 samples)
1         1     0   2,097,152   58.03                 58.09                   58.07                   58.06
1.58      1     1   3,313,500   57.12                 57.19                   57.18                   57.17
2.22      1     2   4,655,677   56.10                 56.17                   56.17                   56.17
2.71      1     3   5,683,281   52.60                 52.65                   52.66                   52.67
3.08      1     4   6,459,228   48.19                 48.15                   48.16                   48.16
3.38      1     5   7,088,373   45.50                 45.56                   45.54                   45.55
3.84      1     7   8,053,063   40.17                 40.18                   40.13                   40.14
5.15      0.56  35  –           28.53                 28.35                   28.35                   28.43
Fig. 7 Framewise histogram analysis of stego video for p = 4
Fig. 8 Original and stego frames 1 to 10 for p = 3
Fig. 9 Histogram of original and stego frames with indivisible RGB for p = 3
Fig. 10 Histogram attack for stego video for frames 11 to 20
5.1 Concealed Rate (CR) and Concealed Capacity (CC)
The concealed rate indicates the extent to which private information can be concealed in digital media while the cover remains perceptually unchanged, which is the basic requirement of any type of steganography [4]. CR is calculated by Eq. (6)
$$\mathrm{CR} = \frac{\log_2\left(4p^2 + p + 4\right)}{2} \tag{6}$$
where p is the embedding parameter and CR is expressed in bpp. For p = 0, CR = 1 bpp and CC = 2,097,152 bits; for p = 1, CR = 1.58 bpp and CC = 3,313,500 bits; for p = 2, CR = 2.22 bpp and CC = 4,655,677 bits; for p = 3, CR = 2.71 bpp and CC = 5,683,281 bits; and so on. CC is calculated by Eq. (7)
$$\mathrm{CC} = \mathrm{CR} \times 512 \times 512 \times 8 \tag{7}$$
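The listed CR and CC values can be reproduced numerically. The Python sketch below assumes that CR is half the base-2 logarithm of 4p² + p + 4 and that both CR (to two decimals) and CC (to an integer) are truncated rather than rounded; these are the readings that reproduce the values 1, 1.58, 2.22, and 2.71 bpp and the corresponding capacities, not statements made explicitly in the text. Function and parameter names are hypothetical.

```python
import math


def concealed_rate(p):
    """CR in bpp, truncated to two decimals (assumed from the tabulated values)."""
    exact = math.log2(4 * p * p + p + 4) / 2
    return math.floor(exact * 100) / 100


def concealed_capacity(p, width=512, height=512, bits_per_pixel=8):
    """CC in bits for a width x height cover frame, truncated to an integer."""
    return int(concealed_rate(p) * width * height * bits_per_pixel)


if __name__ == "__main__":
    for p in (0, 1, 2, 3):
        print(p, concealed_rate(p), concealed_capacity(p))
    # 0 1.0 2097152
    # 1 1.58 3313500
    # 2 2.22 4655677
    # 3 2.71 5683281
```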
5.2 Normalized Correlation Factor (NCF)
The NCF measures the similarity between the cover and stego videos that are to be correlated [12, 13]. A value of 1 (100%) indicates perfect correlation and therefore better security. The proposed algorithm FEMD has a very high NCF, which is calculated by Eq. (8)
$$\mathrm{NCF} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} I(i, j)\, I'(i, j)}{\sum_{i=1}^{m} \sum_{j=1}^{n} \left[I(i, j)\right]^2} \tag{8}$$
where
$m \times n$: size of secret data
$I(i, j)$: bits of secret data
$I'(i, j)$: bits of stego data
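As an illustration of Eq. (8), the following Python sketch computes the NCF for two equally sized arrays given as nested lists. It is a minimal example, not the evaluation code behind the reported results; the function name `ncf` and the sample data are hypothetical.

```python
def ncf(reference, stego):
    """Normalized correlation factor of two m x n arrays (lists of rows)."""
    numerator = sum(
        a * b
        for ref_row, stego_row in zip(reference, stego)
        for a, b in zip(ref_row, stego_row)
    )
    denominator = sum(a * a for row in reference for a in row)
    # An all-zero reference would make the ratio undefined.
    if denominator == 0:
        raise ValueError("reference array must contain a non-zero value")
    return numerator / denominator


if __name__ == "__main__":
    original = [[1, 2], [3, 4]]
    print(ncf(original, original))            # 1.0 for identical data
    print(ncf(original, [[0, 0], [0, 0]]))    # 0.0 when nothing correlates
```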
5.3 Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR)
The MSE and PSNR are the most important security parameters in a data embedding approach [13]. They are calculated by Eq. (9) and Eq. (10).
$$\mathrm{MSE} = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[I(i, j) - I'(i, j)\right]^2 \tag{9}$$
where
$m \times n$: size of private information
$I(i, j)$: bits of the cover data
$I'(i, j)$: bits of stego data
$$\mathrm{PSNR} = 10 \times \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right) \tag{10}$$
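Equations (9) and (10) translate directly into code. The Python sketch below works on nested lists of 8-bit values and is only an illustration; the function names and the sample data are hypothetical, and returning infinity for identical inputs is a convention chosen here, not something the text specifies.

```python
import math


def mse(cover, stego):
    """Mean square error between two m x n arrays (lists of rows)."""
    m, n = len(cover), len(cover[0])
    total = sum(
        (a - b) ** 2
        for cover_row, stego_row in zip(cover, stego)
        for a, b in zip(cover_row, stego_row)
    )
    return total / (m * n)


def psnr(cover, stego, peak=255):
    """Peak signal-to-noise ratio in dB for data with the given peak value."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf  # identical inputs: no distortion
    return 10 * math.log10(peak ** 2 / error)


if __name__ == "__main__":
    cover = [[5, 7], [9, 11]]
    stego = [[6, 8], [10, 11]]      # three of four values changed by 1
    print(mse(cover, stego))        # 0.75
    print(round(psnr(cover, stego), 2))
```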
6 Experimental Result and Discussion
6.1 Functionality Comparison with Existing EMD Techniques
The implemented security model FPAM increases information security, which is verified through CR, CC, MSE, NCF, PSNR, resizing, and randomization of the stego video. For both types of secret data, the proposed security model achieves CR = 1.58 bpp and CC = 3,313,500 bits for a color image with p = 1. Several standard color images and audio files were used as a database for the simulation results: Baboon (512*512), Cameraman (512*512), Lena (256*256), Photograph (256*256), Plants (512*512), Signature (512*512), Thumb (512*512), and Vegetable (128*128), and audio such as MP3 with 874,569 samples, .Wav with 846,395 samples, and Voice Tone with 764,983 samples. Chin-Feng Lee et al. [8] and Yanxiao Liu et al. [9] applied the EMD algorithm and reported a payload of 38,136 bits for n = 3 with PSNR = 39.28 dB, and PSNR = 47.65 dB with bpp = 2, respectively, with lower security of the secret data and lower embedding capacity. The comparison also covers Niu et al. [11] and Younus et al. [10], who report ER = 1.58 bpp for n = 3 with PSNR = 62 dB, and EC = 52,400 bytes with PSNR = 55.68 dB, respectively. The presented method achieves CR = 1.58 bpp for p = 1 and CC = 3,313,500 bits, is applied to color images as well as audio as secret data, and provides an audio-video-based FPAM steganography approach evaluated in terms of CR, CC, and PSNR. The proposed work offers better embedding capacity, robustness, PSNR, CC, and CR in bpp, as shown in Table 2. The graphs of ER versus PSNR and MSE are shown in Fig. 11a, b: as ER increases, the PSNR also increases and stabilizes at 61 dB with a minimum value of MSE. In the work of Hui-Shih Leng et al. [15], which applied EMD to gray images, the payload increases with w while the PSNR decreases, reaching a minimum of 30.13 dB at w = 27 and payload = 4.75 bpp; this is the failure value of the Hui-Shih Leng et al. [15] EMD technique. The proposed technique produces large values of CC, CR, and NCF up to the critical values p = 35, CR = 5.15 bpp, and PSNR = 29.12 dB, where the FPAM approach fails, as given in Table 3. In the EMD method of Shaswata Saha et al. [14], the PSNR decreases as N and K increase, reaching a minimum of 34.01 dB at N = K = 4, whereas the proposed technique produces larger values of CC, CR, NCF, and PSNR with color image and audio as secret data, as shown in Table 4.
Table 2 Functionality comparisons of existing EMD algorithm and proposed technique

Feature               Chin-Feng Lee et al. [8]  Yanxiao Liu et al. [9]  Niu et al. [11]      Younus et al. [10]  Proposed FPAM approach
Payload               38,136 bits, n = 3        bpp, n = 2              1.58 bpp, for n = 3  52,400 bytes        CR = 1.58 bpp for p = 1, CC = 2,097,152
PSNR (dB)             39.28                     47.65                   62.00                55.69               58.09
Embedding capacity    Low                       Low                     Low                  Low                 High
NCF                   No                        No                      No                   No                  Yes
MSE                   No                        No                      No                   No                  Yes
Attacks               No                        No                      No                   No                  Yes
Video steganography   –                         No                      No                   No                  Yes
Audio steganography   –                         No                      No                   No                  Yes
Fig. 11 Relationship between ER versus PSNR and MSE for p = 2 and p = 4

Table 3 Comparison of Hui-Shih Leng et al. [15] and FPAM approach

Hui-Shih Leng et al. [15]       Proposed approach, color images
w    Payload (bpp)  PSNR        p    CR     CC (bits)    PSNR
3    1.5            49.89       3    2.71   5,683,281    50.45
4    2              46.74       4    3.08   6,459,228    48.89
7    2.81           42.12       7    3.84   8,053,063    45.60
9    3.17           39.17       9    4.19   8,787,066    41.72
27   4.75           30.13       35   5.15   10,800,332   29.12
Table 4 Comparison of Shaswata Saha et al. [14] and FPAM approach

Shaswata Saha et al. [14]   Proposed approach, color images    Proposed approach, audio
N    K    PSNR              CR     CC (bits)    PSNR           CR     EC (bits)    PSNR (dB)
0    0    –                 1      2,097,152    58.09          1      2,097,152    58.09
1    1    55.34             1.58   3,313,500    56.89          1.58   3,313,500    56.89
2    2    48.53             2.22   4,655,677    54.23          2.22   4,655,677    54.23
3    3    39.50             2.71   5,683,281    50.45          2.71   5,683,281    50.45
4    4    34.01             3.08   6,459,228    48.89          3.08   6,459,228    48.89
–    –    –                 3.38   7,088,373    45.89          3.38   7,088,373    45.89
7 Conclusion
In this paper, an FPAM-based video data concealing concept using anti-investigation techniques is verified to improve the authentication, security, and robustness of private information. Compared with the existing methods considered here, the FPAM scheme shows better robustness and security and produces low distortion error, which improves the quality of the recovered stego data. Although the technique works well for images and audio as secret data, it fails to recover the secret data when p = 35 and CR = 5.15 bpp, with PSNR = 29.12 dB, which is called the failure value of the system. In the future, it can be further improved by applying different algorithms or by implementing it on hardware such as a DSP, an FPGA, or an Arduino processor.
References
1. Dalal M, Juneja M (2020) Steganography and steganalysis (in digital forensics): a cybersecurity guide. Springer J Multimedia Tools Appl. https://doi.org/10.1007/s11042-020-09929-9
2. Dalal M, Juneja M (2020) A survey on information hiding using video steganography. Springer Artif Intell Rev 2021
3. Mustafa RJ, Elleithy KM, Abdelfattah E (2017) Video-steganography techniques: taxonomy, challenges, and future directions. In: IEEE Long Island systems, applications, and technology conference (LISAT), pp 1–6. https://doi.org/10.1109/LISAT.2017.8001965
4. Moon SK, Raut RD (2018) Innovative data security model using forensic audio-video steganography for improving hidden data security and robustness. Inderscience Int J Information and Comput Sec 10(4):374–395
5. Patel R, Lad K, Patel M (2021) Novel DCT and DST based video steganography algorithms over non-dynamic region in the compressed domain: a comparative analysis. Springer Int J Inf Technol. https://doi.org/10.1007/s41870-021-00788-7
6. Abood EW, Abduljabbar ZA, Alsibahee M, Hussain MA (2021) Securing audio transmission based on encoding and steganography. Indonesian J Electr Eng Comput Sci 22(3):1777–1786
7. Azam MHN, Ridzuan F, Sayuti MNSM, Alsabhany AA (2019) Balancing the trade-off between capacity and imperceptibility for least significant bit audio steganography method: a new parameter. In: IEEE conference on application, information and network security (AINS), pp 48–54
8. Lee C-F, Weng C-Y, Chen K-C (2017) An efficient reversible data hiding with reduplicated exploiting modification direction using image interpolation and edge detection. Springer J Multimedia Tools Appl 76:9993–10016
9. Liu Y, Yang C, Sun Q (2018) Enhance embedding capacity of generalized exploiting modification directions in data hiding. IEEE Access. https://doi.org/10.1109/ACCESS.2017.2787803
10. Younus ZS, Hussain MK (2019) Image steganography using exploiting modification direction for compressed encrypted data. J King Saud Univ Comput Inform Sci. https://doi.org/10.1016/j.jksuci.2019.04.008
11. Niu X, Ma M, Tang R, Yin Z (2015) Image steganography via fully exploiting modification direction. Int J Secur Appl 9(5):243–254
12. Kasana G, Singh K, Bhatia SS (2015) Data hiding algorithm for images using discrete wavelet transform and Arnold transform. J Inf Process Syst (JIPS) 1–14. https://doi.org/10.3745/JIPS.03.0042
13. Arab F, Abdullah SM (2016) A robust video watermarking technique for the tamper detection of surveillance system. Springer J Multimedia Tools Appl. https://doi.org/10.1007/s11042-015-2800-5
14. Saha S, Chakraborty A, Chatterjee A, Dasgupta S, Ghosal SK, Sarkar R (2020) Extended exploiting modification direction-based steganography using hashed-weightage array. Springer J Multimedia Tools Appl. https://doi.org/10.1007/s11042-020-08951-1
15. Leng H-S, Tseng H-W (2019) Generalize the EMD scheme on an n-dimensional hypercube with maximum payload. Springer J Multimedia Tools Appl 78:18363–18377
Mobile Edge Computing: A Comprehensive Analysis on Computation Offloading Techniques and Modeling Schemes
I. Bildass Santhosam, Immanuel Johnraja Jebadurai, Getzi Jeba Leelipushpam Paulraj, Jebaveerasingh Jebadurai, and Martin Victor
Abstract In this age of computation offloading, emerging technologies that require high computational power are switching to mobile cloud computing (MCC). However, the latency is high and the QoE is poor. Edge computing addresses this by moving computation and data storage close to where they are required, which reduces latency and increases bandwidth. Deciding where and when offloading should take place complicates the system. Hence, artificial intelligence, especially machine learning, is used to take accurate decisions and deliver the expected result. One recent application is vehicular edge computing networks, which address many computational issues. This paper gives an insight into the architecture and then discusses the basic models found in offloading systems. The computation and communication model, the channel model, and the EH model are summarized on the basis of various learning approaches. Finally, the research analysis is discussed together with the shortcomings of offloading in edge computing.
Keywords Mobile edge computing (MEC) · Cloud computing · Vehicular edge computing networks (VECNs) · Computation offloading
1 Introduction
Given the recent advances in and capabilities of smart handheld devices, mobile applications play a vital role in business and personal use today. Generally, smart mobile devices (SMDs) are connected to other devices via various wireless protocols and can operate interactively and autonomously to some extent. The most widely used and best known smart devices include smart handheld devices and various smart applications; the term denotes a device that exhibits some properties of ubiquitous computing.
I. B. Santhosam (B) · I. J. Jebadurai · G. J. L. Paulraj · J. Jebadurai · M. Victor
Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_12
Although it is well known and offers many advantages, integrated cloud infrastructure faces severe challenges in meeting the requirements (security, energy, latency, and cost) of innovative mobile applications such as multimedia, virtual gaming, augmented reality (AR), and self-driving vehicular applications running on pervasive handheld devices. Because of the inherent resource limitations of handheld devices, such applications require low delay and high data rates with minimum bandwidth, and the remote cloud is being supplemented by more suitable new technologies, i.e., MEC, which imposes fewer restrictions on particularly demanding applications. SMDs can be designed to support a range of properties of pervasive computing and to be used in the three main environments of the system: the physical world, human-centered environments, and distributed computing environments. Edge computing evolved because of the unforeseen growth of IoT devices, which connect to the Internet either to obtain services from the cloud or to send requests to it. IoT devices produce an enormous amount of data in the course of their tasks. Multi-access edge computing extends the capabilities of cloud computing by bringing them to the edge of the network, and it has been taken up by many smart users in preference to similar technologies. MEC places servers with adequate capabilities close to the handheld devices, at the edge of the network, to meet essential user-centric requirements. To make edge computing truly serviceable, some intrinsic limitations of MEC must be considered, including resource allocation, energy, and delay constraints.
Whereas conventional cloud computing runs on remote servers located far from the user and device, MEC allows tasks to be processed at base stations and other aggregation points in the network. By shifting the cloud computing load to individual local servers, MEC helps reduce network congestion and delay, improving the quality of experience (QoE) for end users. Figure 1 presents the taxonomy of the task offloading framework for an edge computing system, which consists of three planes: end users, data transmission, and cloud server control. The first plane comprises the users who need computational tasks to be offloaded. Macro base stations (BSs) connected to the MEC server make up the data transmission plane. The controller, i.e., the edge cloud server installed at the macro-cell BS, represents the control plane. Edge servers are located beside a wireless BS, and mobile devices link to MEC servers through the wireless BS. Users connect to macro-BSs via wireless links, while scattered small BSs connect to the dominant macro-BS via a high-speed network. The MEC server, a fundamental component of the MEC architecture, provides computational resources and storage capacity to run the tasks submitted by end users. The aim of computation offloading in MEC is to transfer resource-intensive tasks to neighboring cloud servers in order to increase the potential of cloud applications. Basically, offloading in MEC means processing resource-demanding applications on behalf of local handheld devices so as to reduce their workload and decrease the computation complexity and costs associated with local execution.
Fig. 1 Taxonomy of computation offloading system (Source ArXiv–2017)
Both terminal devices and edge servers must run offloading frameworks to complete computation offloading. Numerous researchers have studied the subject in depth and proposed new offloading designs based on mathematical models, game theory, machine learning, investigative approaches, or a hybrid of these, with growing use of learning-based techniques, which are better suited to the dynamic behavior of mobile devices (MDs) in MEC. Computation offloading is therefore considered a critical challenge in edge computing (EC). To analyze and optimize computation offloading in EC, offloading modeling, which determines the overall EC performance, plays a crucial role. Figure 2 presents a conceptual view of computation offloading, which consists of:
• Conveyance procedure
• Remote implementation procedure
• Solution send back technique.
The offloading schemes then add the following key components:
• Task partition
• Offload decision or task placement
• Resource allocation.
Fig. 2 Conceptual view of computation offloading system (Source ArXiv–2017)
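The offload decision listed above can be illustrated with a very small latency comparison. The Python sketch below uses a simplification that is common in the offloading literature and is not a model taken from this chapter: local execution time is the task's CPU cycles divided by the device clock rate, and offloaded time is the upload time plus the execution time on the edge server. It ignores the time to return the result, queueing, and energy, and every name and number in it is hypothetical.

```python
def local_time(cycles, f_local_hz):
    """Seconds needed to run the task on the device itself."""
    return cycles / f_local_hz


def offload_time(data_bits, rate_bps, cycles, f_edge_hz):
    """Seconds to upload the input and run the task on the edge server."""
    return data_bits / rate_bps + cycles / f_edge_hz


def should_offload(data_bits, cycles, f_local_hz, f_edge_hz, rate_bps):
    """Offload only when it finishes sooner than local execution."""
    remote = offload_time(data_bits, rate_bps, cycles, f_edge_hz)
    return remote < local_time(cycles, f_local_hz)


if __name__ == "__main__":
    task = dict(data_bits=8e6, cycles=1e9, f_local_hz=1e9, f_edge_hz=10e9)
    print(should_offload(rate_bps=20e6, **task))  # True: fast uplink
    print(should_offload(rate_bps=1e6, **task))   # False: upload dominates
```

Even this toy comparison shows why the decision depends on the channel: the same task is worth offloading over a fast link and not over a slow one, which is the dependency the learning-based schemes surveyed below try to capture.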
2 Literature Survey
2.1 Efficient Resource Allocation
First, an I-UDEC framework is presented that formulates the heterogeneous resources of I-UDEC with a hybrid computation offloading strategy. Because application partitioning, resource allocation, and service caching are challenging, a two-timescale deep reinforcement learning (2TS–DRL) approach is proposed, and FL-based distributed model training is used to train the DRL agent. Despite the effective integration of MEC with UDN, the approach faces issues such as making full use of system resources, dynamic and low-cost scheduling, and privacy- and security-preserving services [1].
The work in [2] aims to establish a dynamic subcarrier allocation, power allocation, and computation offloading scheme. IoT is expected to enable smart city and smart grid applications. The main objective is to reduce the expected system cost and the energy consumption of the many handheld devices. A joint computation offloading and radio resource allocation algorithm based on Lyapunov optimization is proposed. MCC technologies provide pervasive computation facilities for terminal devices and industrial IoT applications. Fog computing enables computation-intensive applications and remains a promising technology for IoT, bringing computational and communication resources closer to the terminal devices. Frequent recharging is impractical because of the size and location restrictions of the MDs. Energy harvesting (EH) is an energy-efficient solution because of its self-sustainable nature [2].
Computation offloading, the primary technological enabler of the IoT, helps to address the resource restrictions of individual devices. The major concern is the energy consumption of IoT devices. Edge CloudSim is used to support dynamic decision making, and the approach is compared against share-first, edge-first, cloud-only, random, and application-oriented strategies. IoT devices depend on remote cloud infrastructure for data storage, transmission, and the provision of computing resources. The main intention is to minimize transfer time and to schedule computation tasks within what is practical for the power capacity of edge devices. The edge computing paradigm is a core component of current mobile communication trends (5G).
Offloading computation and data transmission in IoT networks to the edge provides the benefits of the cloud without its communication disadvantages [3].
Vehicular edge computing (VEC) is a new paradigm for increasing the safety of automated vehicles and intelligent transportation systems (ITS), and its key element is the design of computation offloading strategies. The time-varying fading channel causes time-varying spectrum efficiency (SE); the allocated bandwidth therefore becomes uncertain, which makes the offloading strategies unreliable. The proposed methodology first deals with the fixed-SE problem and introduces a rounding algorithm to limit complexity; building on the resource allocation strategies for the static-SE problem, further algorithms are then introduced to address the core issue for both static and dynamic tasks [4].
Edge computing supports IoT networks by bringing computation capabilities much closer to the user. As IoT users demand more powerful computation, adequate spectrum resources are required to transmit tasks to an edge server (ES), with local execution remaining as an alternative. Recent innovation in hardware and software expands IoT networks with many connected devices, which leads to high-speed data processing, storage, and communications. The rise of computation capacity at edge devices (EDs) has opened the research area of edge computing, which draws on wireless communication and machine learning. A deep Q-network is used to learn an optimal task offloading scheme by optimizing the long-term utility performance without prior knowledge of the network. To solve the resource allocation problem, an algorithm based on a simultaneous task learning scheme is introduced [5].
The joint optimization of computation offloading and resource allocation is formulated as a MINP problem for an MEC scenario with many end users and edge servers. The aim is to reduce the total energy consumption of the UTDs by optimizing the offloading decisions of the devices, channel selection, resource allocation, and power. The problem is solved in two stages: a generic scheme first determines the offloading decisions of the devices, and the allocation of computing resources is then refined by myopic optimization on the basis of the current offloading decisions [6].
Roadside units (RSUs) help promote the applications of IoCV. Mobile communication is introduced into the IoCV scenario to provide sufficient communication bandwidth. ESs co-located with BSs and RSUs offer alternative places to execute tasks; however, the complex placement of MABSs and RSUs makes it hard to select the offloading destinations of the computing tasks in IoCV. The proposed computing scheme optimizes the task offloading latency and resource consumption in the IoCV edge scenario. With the deployment of sensors and cognitive technology, IoCV has attracted broad attention from academia and industry; in edge computing for the Internet of connected vehicles, the RSUs and MBSs serve as edge nodes that accommodate computing tasks and application data from the vehicles they cover [7].
Edge computing, which provides data analysis and computation offloading, is a promising technology for IoT and mobile communication. It can reduce delay and enhance the reliability of IoT services by allowing the computation workloads and local data generated by IoT devices to be offloaded to edge nodes. Cloud computing provides efficient computation, memory, software, and other services to end users by allocating resources centrally, but it has fundamental limitations related to the geographical distance between the end users and the data centers that provide the services. The number of IoT devices is reported to reach 34.2 billion in 2025 and is likely to reach 125 billion by 2030. A new distributed scheme is proposed for the long-term average response time optimization problem. Its time slots span minutes or hours, long enough to capture congestion fluctuations [8].
Fog computing is envisioned as a promising approach to increase the computation capability of MDs and reduce their energy consumption. In IoT fog systems, the time-varying channel environment is considered first. The wireless channel is assumed to undergo independent and identically distributed block fading, and the CSI is estimated in each fading block because perfect CSI is hard to obtain in a practical network. Because of the imperfect CSI, the problem is formulated as a partially observable MDP problem. To solve it, an offline algorithm based on a deep recurrent Q-network with long short-term memory is developed [9].
The proposed scheme for V2V offloading networks with health applications is based on CAD of clinical medicine in disaster times when electricity and communication properties are available. A reliable computation offloading mechanism is introduced in which each application is divided into several parts that are offloaded to multiple vehicles. The problem is formulated as an NLP problem with constraints on topology and limited idle resources. It is NP-hard, so optimal solutions can be obtained only at high computational complexity, and the proposed scheme is designed to handle large, real-time network cases with increasing vehicle density [10] (Table 1).
2.2 Latency Minimization
The Internet of things plays a significant role in various areas, but some problems limit the future growth of IoT technologies. One of them is minimizing the response time and packet drops during task offloading under limited energy, so as to enhance the reliability of task processing. The proposed scheme can be used for delay-constrained IoT systems. Energy sources such as solar power, thermal energy, wind energy, and kinetic energy can be harvested to power devices in the IoT scenario. Because the end device needs considerable energy capacity, the proposed algorithm includes energy management and outperforms other schemes with respect to the task arrival probability, average energy harvesting power, etc. [11].
Ultra-reliable and low-latency communication enables energy-efficient dynamic computation offloading. In edge computing, numerous users compete for a common pool of computational resources, which calls for an optimal balance between average energy consumption and service delay. Energy harvesting capability is incorporated at the MDs for computation offloading, stabilizing the battery level around a given operating value. The proposed scheme, with its efficient control policy, has a significant effect on overall energy consumption and meets the QoS requirements. It works with online data, without a priori knowledge of the radio channel statistics, and can adapt to mobile environments [12].
The mobile cloud computing (MCC) architecture gives users strong computing capability at the network edge. Communication between handheld devices is achieved through technologies such as LTE and Wi-Fi Direct, which let mobile devices maintain several network connections at the same time. MCC and partial computation offloading techniques are used to reduce the task execution time in the shared range.
An OFDMA scheme is the basis on which the handheld devices share cellular communication resources. The proposed algorithm adjusts the data partition to solve the associated resource allocation problems. It minimizes the total delay of D2D-enabled partial computation offloading in the OFDMA system and reduces the processing delay [13].
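To make the role of the data partition concrete, the following minimal Python sketch computes the completion delay of a task that is split between local execution and an offloaded part. It assumes that the two parts run in parallel, so the slower part determines the delay. The delay model, the function names, and all numbers are illustrative assumptions and are not taken from [13].

```python
def partial_offload_delay(bits, ratio, cycles_per_bit, f_local, f_remote, rate):
    """Completion delay (s) when a fraction `ratio` of the task is offloaded.

    Assumed model: the local and offloaded parts run in parallel, so the
    task finishes when the slower of the two parts finishes.
    """
    local_time = (1.0 - ratio) * bits * cycles_per_bit / f_local
    transmit_time = ratio * bits / rate
    remote_time = ratio * bits * cycles_per_bit / f_remote
    return max(local_time, transmit_time + remote_time)


def best_partition(bits, cycles_per_bit, f_local, f_remote, rate, steps=1000):
    """Grid-search the offloading ratio in [0, 1] that minimizes the delay."""
    ratios = [i / steps for i in range(steps + 1)]
    return min(
        ratios,
        key=lambda r: partial_offload_delay(
            bits, r, cycles_per_bit, f_local, f_remote, rate
        ),
    )


# Hypothetical task: 1 Mbit of input data, 1000 CPU cycles per bit.
params = dict(bits=1e6, cycles_per_bit=1000, f_local=1e9, f_remote=5e9, rate=1e7)
ratio = best_partition(**params)
delay = partial_offload_delay(ratio=ratio, **params)
```

With these hypothetical numbers, neither pure local execution nor full offloading is best; the minimum lies where the local part and the offloaded part finish at about the same time.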
Table 1 Summarization of resource allocation
Authors name | Algorithm/methodology | Advantages | Disadvantages
Yu et al.–Oct. [1] | I-UDEC framework | • Task execution time is reduced • Cryptocurrencies have been adopted | • Lack of efficient utilization of resources • Lack of privacy and security scheme
Chang et al.–Oct. [2] | Multiparty computation offloading and radio resource sharing proven on Lyapunov opt. scheme | • Minimized system execution cost • Dynamic subcarrier allocation | • Limitations in size and location of MDs
Xie et al.–Sept., [9] | POMDP scheme | • Effectively reduces the energy consumption of the IoT devices | • It cannot guarantee long-term average energy consumption and latency
Lin et al.–Sept., [8] | Distributed method based on outlet-and-assured approach | • Provide efficient on-demand services for terminal users • Balance between the deferral of reckoning capacity and the gist of the simulated lines | • Lack of resource handiness and sites for some potential-profound tasks
Li, et al.–July, [6] | THOA method | • Ultra-low latency services can be provided by F-RAN via cooperative task computing | • Time complexity is too high
Xu et al.–July, [7] | MOEA/D algorithm | • Improved transmission efficiency by the RSUs • Offered sufficient communication bandwidth | • Difficult to achieve vehicle-to-everything (V2X) communications with low latency
Jian Wang et al.–April, (2020) | MCTS scheme based on a multitask RL method | • Minimized the system cost based on dynamic environment | • Overhead in multiple user task offloading
Li et al.–March, [4] | LBBCO/CRICO algorithms | • High efficiency and low latency of the VEC system | • Lack of resource allocation for multi-vehicles and multi-VEC servers
Dai et al.–Jan., (2020) | Sub-optimal solution based on PSO algorithm | • Sub-optimal solution can be obtained with low computational complexity | • Resource limitations in cases of high user density, during high-commuter traffic phases
Jaddoa et al.–Nov., (2019) | MEDICI mechanism | • Allow customization and dynamic offloading decision • Increase response time | • Elastic properties might affect the processing times of a task
Computation offloading from an MD to an edge server reduces the completion delay of intensive computations in MEC. To overcome the problems caused by heavy traffic volume, computation tasks are offloaded to edge servers over Wi-Fi networks. To reduce the offloading time, an offloading node selection strategy based on changes in the available network bandwidth and the location of the mobile devices can be used. Transactions between edge servers and MDs are possible without third parties, which is an advantage along with secure computation offloading [14].
The maritime Internet of things (M-IoT) is created for monitoring and exploiting the ocean. The integrated air-space-sea network places high demands on the ship system. Considering the resource limitations and latency requirements, a two-stage joint optimal offloading algorithm (JOOA) is adopted in mobile edge computing for ship users. To address energy consumption and delay, a fast and effective channel and power allocation method is needed to guide the offloading decisions of ship users in maritime networks. The capacity of center cloud servers is limitless, while that of MEC servers is limited, so optimal resource allocation is possible by making logical use of both the MEC server and the center cloud server. In JOOA, ship users can execute computing tasks locally or offload them to the MEC server. If the number of tasks is high, energy consumption will be high in JOOA. High performance with a large number of tasks and the two-stage joint optimal offloading algorithm are the advantages for ship users in MEC [15].
A binary offloading policy is adopted in wireless powered MEC networks: the computation task of each wireless device (WD) is either executed locally or fully offloaded to an MEC server. To deal with this issue, the deep reinforcement learning-based online offloading (DROO) framework is implemented with a deep neural network.
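As a rough illustration of binary offloading decisions in a learning-based scheme, the Python sketch below turns a relaxed decision vector (for example, the output of a neural network, one value in [0, 1] per device) into a few binary candidates and keeps the cheapest one. The quantization rule is a simplified assumption made for this example, and `toy_cost` is a hypothetical stand-in for the resource allocation subproblem; neither is claimed to reproduce the procedure of [16].

```python
def quantize_candidates(relaxed, k):
    """Return up to k binary offloading decisions derived from `relaxed`.

    The first candidate thresholds every entry at 0.5. Further candidates
    use the entries closest to 0.5 (the least certain ones) as thresholds,
    which flips those entries first.
    """
    candidates = [tuple(1 if x > 0.5 else 0 for x in relaxed)]
    by_uncertainty = sorted(relaxed, key=lambda x: abs(x - 0.5))
    for t in by_uncertainty[: k - 1]:
        if t > 0.5:
            cand = tuple(1 if x > t else 0 for x in relaxed)
        else:
            cand = tuple(1 if x >= t else 0 for x in relaxed)
        if cand not in candidates:
            candidates.append(cand)
    return candidates


def toy_cost(decision, local_cost, offload_cost, congestion):
    """Hypothetical cost: per-device cost plus a penalty for a crowded server."""
    n_offloaded = sum(decision)
    base = sum(o if d else l for d, l, o in zip(decision, local_cost, offload_cost))
    return base + congestion * n_offloaded ** 2


relaxed = [0.9, 0.55, 0.2]  # assumed network output for three devices
candidates = quantize_candidates(relaxed, k=3)
best = min(candidates, key=lambda d: toy_cost(d, [5, 2, 1], [1, 1, 3], 1.0))
```

Evaluating only a handful of candidates instead of all 2^N binary vectors is what keeps such a decision step cheap; the quality of the result then depends on how well the relaxed output is learned.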
With wireless power transfer (WPT) technology, devices can be charged continuously over the air without battery replacement. The computational complexity of the DROO algorithm depends on the complexity of solving the resource allocation subproblem, and the DROO framework is applicable as long as the resource allocation subproblems can be solved efficiently, which is its great advantage [16].
Vehicular communications and networking have advanced with the Internet of vehicles (IoV). Because infrastructure and roadside units (RSUs) are expensive, offloading of delay-sensitive tasks that is free of charge and quick to respond has gained popularity. The proposed scheme applies max–min fairness to the task execution time, and a cooperative task scheduling scheme reduces the task execution time through an iterative process. V2V communication supports the task offloading scheme for latency-sensitive applications in IoV. The proposed scheme brings down the task execution time, reduces the cost of offloading computation, and finds good solutions quickly [17].
Edge cloud computing is a way to share computation resources in the broadcasting environment, given the limited computing power of terminal equipment. A cache management mechanism is created for the edge server that determines the cache contents from records of data usage. During computation offloading, job releases are random and follow a stochastic process that is not mathematically tractable. Each terminal device does not need to know the status of the other devices, because decisions can be taken based on the information it gathers; this reduces the weighted response time dramatically in comparison with other algorithms [18].
Unmanned aerial vehicles (UAVs) acting as mobile edge servers are used to give computation offloading opportunities to IoT devices. IoT services with latency requirements are served by making task offloading decisions cooperatively among UAVs while optimizing the positions and number of UAV cloudlets deployed in 3D space. A metaheuristic algorithm is evaluated and compared with the optimal solution in such environments, given their resources and capabilities. UAV-mounted cloudlets support latency-sensitive services in IoT networks. Reducing the number of deployed UAVs leads to lower cost and better usage of available resources. The device-to-UAV associations and the computation offloading decisions are made in 3D space, exploiting the UAVs' flexibility and line-of-sight connectivity [19].
Computation requests are generated continuously at each device and handled through a queueing system by dynamic computation offloading with multi-access edge computing (MEC). The devices are equipped with batteries and energy harvesting capabilities, and the service latency is constrained while the computation queue is maintained. The problem is solved by stochastic optimization, and the algorithm jointly optimizes radio and computation resources. In latency-constrained dynamic computation offloading with energy harvesting IoT devices, the problem is solved in every slot based on the out-of-service probability of the low-power, energy harvesting devices [20] (Table 2).
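The slot-by-slot handling of a computation queue can be sketched with a Lyapunov drift-plus-penalty rule, a standard stochastic optimization technique that is used here only as an illustrative stand-in and is not claimed to be the algorithm of [20]. In each slot, the device picks the action that minimizes `v * energy - backlog * bits_served`; the action set, the arrival pattern, and the parameter `v` are hypothetical.

```python
def choose_action(backlog, actions, v):
    """Pick the (bits_served, energy) pair minimizing v*energy - backlog*bits."""
    return min(actions, key=lambda a: v * a[1] - backlog * a[0])


def simulate(arrivals, actions, v):
    """Run the per-slot rule and return (total energy, peak queue backlog)."""
    backlog, total_energy, peak = 0.0, 0.0, 0.0
    for arrived in arrivals:
        bits, energy = choose_action(backlog, actions, v)
        # Serve what the action allows, then add the new arrivals.
        # The full energy of the action is charged even if the queue
        # holds fewer bits than the action could serve.
        backlog = max(backlog - bits, 0.0) + arrived
        total_energy += energy
        peak = max(peak, backlog)
    return total_energy, peak


# Hypothetical actions: idle, slow service, fast but energy-hungry service.
ACTIONS = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
ARRIVALS = [1.0] * 50

energy_low_v, peak_low_v = simulate(ARRIVALS, ACTIONS, v=0.1)
energy_high_v, peak_high_v = simulate(ARRIVALS, ACTIONS, v=10.0)
```

A larger `v` weights energy more heavily, so the device waits until the backlog is large before serving; this saves energy at the price of a longer queue, which is the energy/latency trade-off discussed in this section.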
2.3 Efficient Energy Consumption
The MEC network provides nearby users with computing and communication services and has risen as a promising architecture for short-distance, fast communication. The research shows that computation offloading in an unmanned aerial vehicle-enabled MEC network can be planned proactively. In the UAV-based MEC architecture, the present MEC topological structure is upgraded and divided into layers that include the IoT device layer and the UAV layer. Rapidly growing data increase the load of the device layer, and with the rise in device usage, base stations cannot feed back information in time, so the fast-communication requirements of intelligent applications that need wide coverage and low delay cannot be met. The results show that the UAV energy consumption of the proposed scheme in various applications is lower than that of random task offloading under different settings [21].
The concept of the industrial IoT (IIoT) is widely applicable to service provisioning in various domains, including health care, autonomous transportation, and smart grid. Supporting resource-intensive applications, such as AI and big data, 3D sensing, and navigation, remains a challenging task because of the IIoT devices' limited onboard resources. The algorithm that reaches an NE and proves the convergence of
Table 2 Summarization of latency minimization
Authors name | Algorithm/methodology | Advantages | Disadvantages
Lorenzo et al.–October, (2020) | URLLC-EH algorithm | • Works with online data • Fast convergence behavior • Ability to adapt to non-stationary environments | • Causes some end-to-end delay
Hou et al.–October, (2020) | MDP based scheme | • No third-party authority for secure computation offloading of a task | • Highest reputation value
Li et al.–April, (2020) | JPORA scheme | • Minimize the execution time of the task • Iteratively reduce the parallel processing delay | • Task execution time is increased • Task partitioning cost is high
Chen et al.–January, [17] | Max–min fairness: PSO algorithm | • Reduce the cost of the offloading computation • Task execution time is reduced | • Highly expensive infrastructure • Mobility of vehicles affects the offloading performance
Hassan et al.–December, (2019) | DPCOEM algorithm | • Energy sources are reliable • Achieves the lowest average task • Makes full use of computing resources | • Small energy buffer capacity
Mohammad et al.–November, (2019) | Online execution time estimation algorithm | • Minimizes the corresponding weighted response time of all jobs and improves secrete bulk | • It is not suitable for asynchronous task execution
Islambouli et al.–November, [19] | Metaheuristic algorithm | • Optimal performance with low complexity | • In case of high rush demand in traffic, no additional UAVs deployment
Bi et al.–October, (2019) | DROO algorithm | • Computation time is decreased • WDs can be charged continuously over the air | • Coverage range is affected due to mobility of the WDs
Bai et al.–October, (2019) | JOO algorithm | • Offloading time is minimized • Offers efficient guidance for maritime communication | • Energy consumption is high
Merluzzi et al.–September, [20] | DL–CCO algorithm with EH | • Stabilizing battery level of all IoT devices | • Stability of the devices' batteries is only around specified operating levels
the algorithm shows the efficiency, and it scales as the number of IIoT devices increases while steadily performing better than existing algorithms on different parameters [22].
In MEC, computation offloading can be modeled as a multiobjective computation offloading problem (MCOP) that considers the task precedence constraints within each application. The proposed algorithm, with two performance-enhancing procedures, obtains high-quality non-dominated solutions and outperforms a number of state-of-the-art and empirical algorithms against various criteria. Its advantages come from the local computing model, which builds on the local scheduling model in MCC with the ability of the DVFS technique. The DVFS–EC scheme helps to minimize the energy consumption of SMDs and increase the local exploitation ability of the search [23].
A partial computation offloading strategy is proposed to minimize the total energy consumed by all SMDs and edge servers. It jointly optimizes the task offloading, CPU speeds, and transmission power of each SMD and the bandwidth of available channels, subject to many constraints. The problem is highly challenging because of the limited resources of SMDs, the heavy resource consumption in devices, and the cost of communication, while energy-efficient, balanced, and sufficient communication resources must be provided to execute more tasks in the edge servers [24].
UAV-assisted MEC is examined with the objective of making the best use of computation offloading with minimal UAV energy consumption. The main aim is to maximize the UAV energy efficiency by optimizing, through the proposed scheme, the user transmit power and computation load allocation. The demerit is energy consumption; to overcome the issue, the allocated computation load is increased only if offloaded tasks are received [25].
For offloading in vehicular edge computing scenarios, an EEC minimization problem is formulated to trade off latency against energy consumption in completing computational tasks. The two-level iterative optimization scheme has good convergence properties in different circumstances [26].
The QoE in edge-enabled IoT is improved based on service latency, energy consumption, and task success rate, as presented by Lu and Ruan in 2020. Compared with the existing scheme, the improvement in instability and slow convergence gives advantages in system performance: The capacity of the terminal server increases server latency, and task success rate is reduced. The drawback is that power consumption is high when more data is uploaded to edge servers. The proposed scheme applies to heterogeneous networks, which increases the complexity of the network [27].
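A trade-off between latency and energy consumption is often expressed as a weighted sum of the two. The Python sketch below compares local execution with offloading under such a combined cost. The weighted-sum form, the CPU energy model (energy proportional to cycles times frequency squared), and every parameter value are assumptions chosen for illustration; they are not the formulation of [26].

```python
def local_cost(cycles, f_local, kappa, weight):
    """Weighted latency/energy cost of running the task on the device."""
    latency = cycles / f_local
    energy = kappa * cycles * f_local ** 2  # assumed dynamic-power CPU model
    return weight * latency + (1.0 - weight) * energy


def offload_cost(bits, cycles, rate, tx_power, f_edge, weight):
    """Weighted cost of offloading; the device only pays transmission energy."""
    tx_time = bits / rate
    latency = tx_time + cycles / f_edge
    energy = tx_power * tx_time
    return weight * latency + (1.0 - weight) * energy


def decide(bits, cycles, f_local, kappa, rate, tx_power, f_edge, weight):
    """Return ("offload" | "local", cost) for the cheaper option."""
    c_local = local_cost(cycles, f_local, kappa, weight)
    c_offload = offload_cost(bits, cycles, rate, tx_power, f_edge, weight)
    return ("offload", c_offload) if c_offload < c_local else ("local", c_local)


# Hypothetical task and device parameters (seconds and joules weighted equally).
TASK = dict(bits=1e6, cycles=1e9, f_local=1e9, kappa=1e-27,
            tx_power=0.5, f_edge=1e10, weight=0.5)

fast_link = decide(rate=1e7, **TASK)  # 10 Mbit/s uplink
slow_link = decide(rate=1e5, **TASK)  # 100 kbit/s uplink
```

With the fast link, offloading is cheaper on both latency and energy; with the slow link, the transmission time dominates and local execution wins. Changing `weight` shifts the decision toward latency or toward energy.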
MEC can supply UEs with computation and energy resources by transferring workloads from user equipment to MEC servers, and it is a promising computing paradigm for 5G networks. As the computation capacity of MEC servers increases, every user can be allocated more computation resources, and the computation delay decreases [28].
The notion of the IIoT is applied to service provisioning in many realms, such as smart health care and autopilot. Two types of QoS are created to face the challenging data, based on the proposed algorithm, which can reach an NE and whose convergence is proven. IIoT applications have emerged to improve safety, efficiency, and comfort in industry. Cloud computing is a well-known technique for supporting delay-tolerant applications [22].
In the process of task offloading, the smart devices in IoT offload the task to edge computing devices for execution. The transmitted information is vulnerable during this process, so data may become incomplete. To overcome this, a blockchain-enabled computation offloading method is used. Blockchain technology ensures data integrity in edge computing and maintains it during the task offloading process. This reduces the task offloading time and the energy consumed by devices while achieving load balance and data integrity [29] (Table 3).
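The idea of protecting offloading records with a blockchain can be illustrated by a hash chain, the basic data structure underlying a blockchain. The Python sketch below links each record to the hash of the previous one, so that later tampering is detected on verification. The record fields and helper names are hypothetical, and the sketch omits consensus and networking; it is not the method of [29].

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def record_hash(payload, prev_hash):
    """SHA-256 over the record content and the previous record's hash."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append an offloading record that is linked to the current chain tip."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": record_hash(payload, prev_hash)})


def verify_chain(chain):
    """Return True only if every link and every stored hash is consistent."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != record_hash(block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_record(chain, {"task": "t1", "from": "device-a", "to": "edge-1"})
append_record(chain, {"task": "t2", "from": "device-b", "to": "edge-2"})
ok_before = verify_chain(chain)

chain[0]["payload"]["to"] = "edge-9"  # simulate tampering with a stored record
ok_after = verify_chain(chain)
```

Because each hash covers the previous hash, changing any earlier record invalidates the check for that record, which is what makes silent modification of offloading data detectable.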
2.4 Summary of the Survey
In this survey, nearly thirty papers are analyzed. The analysis shows that, with the growth of edge computing and IoT, a current trend is to integrate AI into EC, which gives rise to a new edge paradigm, edge intelligence (EI). An AI task is offloaded to an EI server that was trained on similar tasks. Making multiple EI servers cooperate to provide AI service is still an open issue and research challenge.
• With heterogeneous task and computing units
• SDN and technology with large-scale networks
• Offloading modeling in mobility scenarios
• Security and privacy.
3 Conclusion
This paper has presented a detailed survey of the topic. Various algorithms, prior work, and processing technologies from the literature were discussed, and for each work the proposed schemes, techniques, methodologies, and frameworks were analyzed. The researchers use different algorithms and procedures. Machine learning-based computation offloading schemes are still at an initial stage. To the best of our knowledge, this is the first study describing efficient energy optimization and resource allocation for terminal devices and edge servers as well as latency
Table 3 Summarization of energy optimization
Authors name | Algorithm/methodology | Advantages | Disadvantages
Jiang et al.–15 November, [28] | RL–Q learning algorithm | • Support time-varying system conditions • Support nonlinear programming by MINLP | • No guarantee for long-distance transmission • Lack of resource assignment scheme
Lu et al.–10 November [27] | D3PG algorithm | • Better stability • Fast convergence | • Energy shortage in devices
Wang et al.–March, (2020) | SCA-based scheme | • Provide pervasive communication and computing support | • Limited onboard energy of a UAV
Gua et al.–March, (2020) | Energy-efficient distributed optimization algorithm | • Enhanced computing capability • Prolong the battery level of vehicles | • High delay fluctuation • Efficiency of task offloading is poor
Wu et al.–November, (2019) | UAV position optimization algorithm | • Provide uninterrupted and quality services • MDs use less computing power | • Network congestion
Hong et al.–November, [22] | Multi-hop cooperative-messaging game theory mechanism | • Provide computation resources with low latency • Low communication costs | • Intermittent connectivity between devices • Unable to connect to a cloud access point directly
Song et al.–October, (2019) | MOEA/D scheme | • Enhanced computing capability • Low-energy consumption for SMDs | • Increased response time • High transmission and execution time
Bi et al.–October, (2019) | GSA-based PSO algorithm | • Less energy consumption and communication cost • Low latency service | • Limited resources in dynamic scenario
Ren et al.–October, (2019) | FPSO–MP algorithm | • Achieve reliable, low latency computation task offloading | • Excessive delay in dynamic IoT network
Xu et al.–October, [29] | BeCome algorithm | • Provides data integrity during task offloading • Task offloading time was decreased | • It may not support a real IoT environment
minimization during task transmission, all of which play a vital role in designing efficient hybrid computation offloading algorithms in MEC.
References
1. Yu S, Chen X, Zhou Z, Gong X, Di W (2020) When deep reinforcement learning meets federated learning: intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network. IEEE Internet Things J 8(4):2238–2251
2. Chang Z, Liu L, Guo X, Sheng Q (2020) Dynamic resource allocation and computation offloading for IoT fog computing system. IEEE Trans Industr Inf 17(5):3348–3357
3. Jaddoa A, Sakellari G, Panaousis E, Loukas G, Sarigiannidis PG (2020) Dynamic decision support for resource offloading in heterogeneous internet of things environments. Simul Model Pract Theory 101:102019
4. Li S, Lin S, Cai L, Li W, Zhu G (2020) Joint resource allocation and computation offloading with time-varying fading channel in vehicular edge computing. IEEE Trans Veh Technol 69(3):3384–3398
5. Liu X, Jiadong Y, Wang J, Gao Y (2020) Resource allocation with edge computing in IoT networks via machine learning. IEEE Internet Things J 7(4):3415–3426
6. Li H, Haitao X, Zhou C, Lü X, Han Z (2020) Joint optimization strategy of computation offloading and resource allocation in multi-access edge computing environment. IEEE Trans Veh Technol 69(9):10214–10226
7. Xu X, Zhang X, Liu X, Jiang J, Qi L, Bhuiyan MZA (2020) Adaptive computation offloading with edge for 5G-envisioned internet of connected vehicles. IEEE Trans Intell Transp Syst
8. Lin R, Zhou Z, Luo S, Xiao Y, Wang X, Wang S, Zukerman M (2020) Distributed optimization for computation offloading in edge computing. IEEE Trans Wireless Commun 19(12):8179–8194
9. Xie R, Tang Q, Liang C, Yu FR, Huang T (2020) Dynamic computation offloading in IoT fog systems with imperfect channel-state information: a POMDP approach. IEEE Internet Things J 8(1):345–356
10. Dai S, Wang ML, Gao Z, Huang L, Du X, Guizani M (2019) An adaptive computation offloading mechanism for mobile health applications. IEEE Trans Veh Technol 69(1):998–1007
11. Deng Y, Chen Z, Yao X, Hassan S, Ibrahim MA (2019) Parallel offloading in green and sustainable mobile edge computing for delay-constrained IoT system. IEEE Trans Veh Technol 68(12):12202–12214
12. Merluzzi M, Di Lorenzo P, Barbarossa S, Frascolla V (2020) Dynamic computation offloading in multi-access edge computing via ultra-reliable and low-latency communications. IEEE Trans Signal Inf Proc Over Netw 6:342–356
13. Saleem U, Liu Y, Jangsher S, Tao X, Li Y (2020) Latency minimization for D2D-enabled partial computation offloading in mobile edge computing. IEEE Trans Veh Technol 69(4):4472–4486
14. Yang G, Hou L, He X, He D, Chan S, Guizani M (2020) Offloading time optimization via Markov decision process in mobile-edge computing. IEEE Internet Things J 8(4):2483–2493
15. Yang T, Feng H, Gao S, Jiang Z, Qin M, Cheng N, Bai L (2019) Two-stage offloading optimization for energy–latency tradeoff with mobile edge computing in maritime internet of things. IEEE Internet Things J 7(7):5954–5963
16. Huang L, Bi S, Zhang Y-JA (2019) Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Trans Mobile Comput 19(11):2581–2593
17. Chen C, Chen L, Liu L, He S, Yuan X, Lan D, Chen Z (2020) Delay-optimized V2V-based computation offloading in urban vehicular edge computing and networks. IEEE Access 8:18863–18873
18. Wei H, Luo H, Sun Y, Obaidat MS (2019) Cache-aware computation offloading in IoT systems. IEEE Syst J 14(1):61–72
19. Islambouli R, Sharafeddine S (2019) Optimized 3D deployment of UAV-mounted cloudlets to support latency-sensitive services in IoT networks. IEEE Access 7:172860–172870
20. Merluzzi M, Di Lorenzo P, Barbarossa S (2019) Latency-constrained dynamic computation offloading with energy harvesting IoT devices. In: IEEE INFOCOM 2019-IEEE conference on computer communications workshops (INFOCOM WKSHPS). IEEE, pp 750–755
21. Wu G, Miao Y, Zhang Y, Barnawi A (2020) Energy efficient for UAV-enabled mobile edge computing networks: intelligent task prediction and offloading. Comput Commun 150:556–562
22. Hong Z, Chen W, Huang H, Guo S, Zheng Z (2019) Multi-hop cooperative computation offloading for industrial IoT–edge–cloud computing environments. IEEE Trans Parallel Distrib Syst 30(12):2759–2774
23. Song F, Xing H, Luo S, Zhan D, Dai P, Rong Q (2020) A multiobjective computation offloading algorithm for mobile-edge computing. IEEE Internet Things J 7(9):8780–8799
24. Bi J, Yuan H, Duanmu S, Zhou MC, Abusorrah A (2020) Energy-optimized partial computation offloading in mobile-edge computing with genetic simulated-annealing-based particle swarm optimization. IEEE Internet Things J 8(5):3774–3785
25. Li M, Cheng N, Gao J, Wang Y, Zhao L, Shen X (2020) Energy-efficient UAV-assisted mobile edge computing: resource allocation and trajectory optimization. IEEE Trans Veh Technol 69(3):3424–3438
26. Gu X, Zhang G (2021) Energy-efficient computation offloading for vehicular edge computing networks. Comput Commun 166:244–253
27. Lu H, He X, Miao D, Ruan X, Sun Y, Wang K (2020) Edge QoE: computation offloading with deep reinforcement learning for internet of things. IEEE Internet Things J 7(10):9255–9265
28. Jiang K, Zhou H, Li D, Liu X, Xu S (2020) A Q-learning based method for energy-efficient computation offloading in mobile edge computing. In: 2020 29th international conference on computer communications and networks (ICCCN). IEEE, pp 1–7
29. Xu X, Zhang X, Gao H, Xue Y, Qi L, Dou W (2019) BeCome: blockchain-enabled computation offloading for IoT in mobile edge computing. IEEE Trans Industr Inf 16(6):4187–4195
Enhanced Learning Outcomes by Interactive Video Content—H5P in Moodle LMS S. Rama Devi, T. Subetha, S. L. Aruna Rao, and Mahesh Kumar Morampudi
Abstract In this digital age, many learning technologies and tools are suitable for synchronous and asynchronous learning. In synchronous learning, participants and the instructor interact in real time during training held at fixed times. In asynchronous learning, there is no real-time interaction between the participants, and students can learn at their own time and pace. So, in asynchronous learning, there is a need to assess whether the learner has understood the concepts. This evaluation can be achieved using H5P, an interactive course content creation tool. This study aims to measure the learning outcomes achieved when students understand the concepts through an active learning experience. The learning enhancement is achieved by creating interactive content through H5P. The learners can study through the interactive content and revise the concepts using the engagement, which leads to improved performance in their end exams. The participants were 60 engineering students of IV B. Tech Information Technology at a women-only engineering educational institution. The participants watch prerecorded self-made videos, take part in activities such as a quiz at a particular point in the video, and get feedback immediately. Summaries were also added at the end of the videos. The course instructor gets a report of all students' participation status and the scores of the entire class in the LMS platform Moodle. H5P helps the instructor understand the students' learning difficulties so that they can be addressed, enabling the attainment of improved learning outcomes.
Keywords Synchronous learning · Asynchronous learning · Learning management system · H5P · Interactive elements
S. Rama Devi (B) · S. L. Aruna Rao · M. K. Morampudi
Department of Information Technology, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India
e-mail: [email protected]
S. L. Aruna Rao e-mail: [email protected]
T. Subetha · M. K. Morampudi
Department of Computer Science and Engineering, SRM University AP, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_13
1 Introduction
Online education has grown in popularity because of the pandemic situation. Two types of online education are synchronous [1] and asynchronous learning [2, 3]. Synchronous learning is online education in which students are engaged in learning simultaneously and training happens at fixed time slots. Genuine interaction and exchange of knowledge and experience between the participants happen in synchronous learning. Asynchronous learning is the opposite of synchronous learning and does not require real-time interaction between participants. Students can acquire knowledge at their own time, access the content when it best suits their schedules, and complete assignments to deadlines. The combination of synchronous and asynchronous learning is known as the hybrid learning model. Asynchronous e-learning, an asynchronous mode of learning/teaching, is the most basic form of online teaching, in which content such as audio/video lectures, PPTs, and articles is kept on an online platform like an LMS [4]. This material is available anytime, anywhere via the learning management system (LMS). An LMS consists of many tools with which faculty keep course content, and it provides a platform for easy access by students. Other terms sometimes used instead of LMS are course management system (CMS) [5] and virtual learning environment (VLE) [6]. The case study in this paper uses the LMS Moodle to provide a platform for students to gain an engaging experience. In Moodle, the content for the course is organized in the following manner:
1. Preamble
2. Prerequisites
3. Syllabus
4. Textbooks/reference books (please drop e-books if they are open source)
5. Course objectives, course outcomes, and CO-PO mapping
6. Course Mind map
7. Job opportunities/industry relevance/research scope
8. Subject introduction video (optional)
9. Lesson plan
10. Course time table
11. Web resources—material/research papers/videos, etc.
12. University old question papers
13. Assignments
14. GATE questions
15. Course survey.
Also, the content for each unit is grouped in the following way:
1. Handouts/PPTs
2. Quiz
3. Video on the problematic topic or important topic.
The content is well organized and presented so that students can access it anywhere, anytime, which helps them learn the course without difficulty. To meet the students’ expectations, the H5P activity was identified and installed in the Moodle LMS to create engaging and interactive videos using various features of H5P. There are around 90 reusable libraries for creating H5P interactive content. This paper discusses the various features of H5P and their use in the course content, as well as the improvement in students’ learning attainment when using H5P course content. The paper is arranged as follows. Section 2 surveys the literature on H5P. Section 3 gives the H5P background, and Sect. 4 outlines the H5P installation. Section 5 lists the advantages of H5P. Section 6 illustrates various H5P content types, Sect. 7 discusses the implementation of H5P in the course content, and Sect. 8 concludes the paper.
2 Related Work

Singleton et al. [7] applied various H5P activities to formatively assess students’ understanding of topics taught in undergraduate courses such as pathology, physiology, and anatomy. They showed that H5P helps to achieve the specific learning outcomes of these courses. Using the H5P activities, teachers can track students’ comprehension and engagement and provide automatic feedback. Wilkie et al. [8] described design principles that help to achieve responsive active learning and inspire student engagement. They prepared a presentation on neuropsychology as an online interactive activity. With H5P, the number of slides was reduced from 22 to 6, and these 6 slides provide more detailed diagrams and information through clickable hotspots. H5P makes the learning clearer and simpler by reducing the cognitive load on the content creators. MacFarlane et al. [9] used H5P features to emphasize active learning. They concentrated on pedagogical considerations for creating interactive videos with H5P: always start with a clear purpose, assess the students’ cognitive load, and apply active learning best practices. Rekhari and Sinnayah [10] used H5P features to build foundational knowledge in anatomy and physiology courses, addressing the struggles that first-year students face with the volume and complexity of the content. The staff created interactive, mobile-friendly HTML5 learning content to help students understand the course material. Sinnayah et al. [16] surveyed 250 first-year students enrolled in block mode in 2019 who took the anatomy and physiology course at Victoria University. The learning analytics data from the online learning management system showed that 60% of students in block mode who attempted the H5P activities completed them consistently, which indicates student engagement in self-directed study.
Amali et al. [11] developed e-learning content using interactive multimedia tools such as H5P and iSpring to enhance the efficiency of a learning system. The authors followed the waterfall model as the system development life cycle method. The proposed multimedia e-learning content motivates students and helps them understand the content easily and efficiently. Rather than using H5P for active learning, Singh and Scholz [12] used the tool to support their university’s blended learning, where it helped to improve the library instructional videos. Students can learn the content of a course by revisiting and reviewing it at any time and can complete self-assessment activities. Immediate feedback helps to improve student performance. Because the tool is used to create interactive learning videos at the university, teaching staff save time by avoiding duplicated learning resources. Reyna et al. [13] used H5P to provide flipped learning. The basic idea is to flip content in first-year education using a comprehensive flipped learning model together with multimedia and visual design learning principles. At the end of the semester, the 45 students completed 15-step questions. To measure student attitudes toward their perception of knowledge construction, the authors used demographic questions and Likert-type scale items. The approach could be strengthened by including more students and questions and by conducting interviews. Handayani et al. [14] used H5P to help students learn plant tissue culture effectively. The authors explained that the tool is useful not only for concepts but also for the laboratory techniques used to grow plant cells. This leads to cost-effective practical sessions with minimal repetition of plant tissue culture experiments and to improved learning efficiency.

Wicaksono et al. [15] carried out a case study with 19 students taking a second-semester English course at a university, using interviews, observation, and document analysis. The authors used H5P features and concluded that using H5P to teach the English course over 6 months increased students’ interest and interaction and improved their English skills. Chilukuri [17] addressed the challenges of existing active learning techniques such as problem-based learning, group writing assignments, think-pair-share, and peer instruction by suggesting H5P as an alternative and efficient tool for active learning when the number of participants is large. A survey of 61 students in the course “Routing and Switching Concepts”, conducted before and after using the proposed framework, confirmed the importance and effectiveness of the framework. Richtberg et al. identified the importance of educational videos for enhancing students’ learning skills and observed that H5P is the best tool for teachers to enrich learning videos. They conducted a study of 260 German students to determine how often and why students watch educational videos in biology, chemistry, and physics. Learning must be active, not just the passive watching of videos. Using H5P, they created interactive questions and tasks to achieve active learning.
3 Background

H5P (= HTML5 Package) is a free and open-source content collaboration framework based on JavaScript. It aims to make it easy for everyone to create, share, and reuse interactive content [18]. H5P is a community-driven project that helps instructors create engaging content. Researchers can create their own interactive content and upload it to the community. The main goal of H5P is to benefit both learners and instructors: instructors can use interactive content to present instructional materials, and learners benefit from understanding the content in an engaging way. The user-friendly tool allows instructors to create interactive content that freshens up the course materials. H5P content is fully responsive and mobile-friendly, allowing learners to access interactive resources on any device at any time. No technical skills are required, and all content can be reused, shared, and adapted. There are already almost 90 content types, with more in the queue. The most popular are interactive videos, interactive presentations, quizzes, and games. H5P can also be integrated with platforms such as Moodle, WordPress, and Drupal. A plugin for another website platform can be built by porting the H5P-PHP-editor library into the language of the user’s choice. New code can be written with prerequisite knowledge of HTML and CSS. The open-source H5P can be installed on a website and customized according to the user’s needs. However, the interactive content cannot be made private to a particular community.
4 H5P for Moodle

Moodle is used as the learning management system (LMS) for creating the course content for the students. The course content is made engaging by adding interactive elements to the videos, quizzes, PPTs, etc. Interactive content can be added to the LMS by installing various plugins, and one of the most popular is H5P [19]. H5P content becomes available in Moodle once the H5P plugin is downloaded and installed; the installation steps are given in Fig. 1. Including H5P in the LMS is the best option because the LMS brings together all aspects of asynchronous learning, giving a comfortable and better experience without administrative hurdles.
Fig. 1 Steps in adding H5P plugin in Moodle LMS
5 Advantages

The advantages of H5P for creating interactive content are as follows:

i. Editor creation is exceptionally productive and takes very little time. Once you define your data structure, everything needed, including the authoring tool and sanitization routines, is generated automatically according to the requirements of the data structure.
ii. The user does not need to write complex database rules. Instead, only a description of the data structure for their content is required.
iii. More than 90 reusable libraries are available for creating interactive content.
iv. Effective APIs are available for producing engaging content.
v. The user can import and export content easily, as these functions are built into H5P.
vi. The content looks similar on all platforms, as all H5P packages contain the H5P libraries.
vii. An H5P sharing center is available online: a shared pool connecting all H5P-enabled websites, through which users can distribute their content globally.
viii. H5P removes the hurdles to sharing engaging content.
ix. Updating H5P does not affect existing content; instead, the content is updated automatically.
x. H5P supports content management software such as Drupal, WordPress, Moodle, and Joomla.
xi. Copyright handling is also very easy in H5P.
6 Different H5P Content Types

There are many H5P content types [20]; a few are explained below.

Drag the Words: The instructor provides sentences with missing pieces of text, and the learner drags each missing piece to the correct place. This activity is used to assess whether the learner remembers a text she has read.

Fill in the Blanks: The learner fills in the missing words or expressions in a text. The instructor can define multiple missing words and give a task description. To mark a missing word, the instructor places an asterisk before and after it; alternative accepted words are separated by a slash. Fill in the blanks can be included in question sets, interactive videos, or presentations.

Image Sequencing: This content type is used to arrange a randomized set of images in the correct order according to a task description. It is one of the more versatile tools for creating learning tasks (combining simple tools).

Course Presentation: The course presentation lets users include fill in the blanks, multiple-choice questions (MCQs), embedded video, drag the text, etc., to create an interactive presentation.

Accordion: Accordion reduces the amount of text displayed by showing only the headlines, which users can expand to read the details. This tool is well suited to giving in-depth explanations.

Column: The column content type groups several H5P content items, such as fill in the blanks, drag the words, and mark the words, below each other on one page. A column can hold new content or existing content.

Interactive Video: In this content type, interactions such as true/false, multiple-choice questions, fill in the blanks, images, drag and drop questions, and summaries can be added at any point in a video clip. These interactions pop up while the learner is watching the video. The video can be one created by the instructor or one from YouTube. Both text interactions and image interactions are available. At the end, a summary can be added so that the learner can recap the content he has learned.

Summary: Summary content helps the user recall the vital information in a text, video, or presentation through short statements of the main points. It can be added to help students remember the key points of an entire chapter.
Chart: The chart is used to visualize statistical data as a pie chart or a bar chart; the option to choose between the two is its most significant feature. Labels and values can be added for each data element, and the background color and font color can be set for each data element. This content type is useful for the statistical analysis of data.

Collage: The collage tool combines several images on a single page, which allows the instructor to show the effects of different operations in a single image. Its notable features are pan and zoom, inner and outer frame spacing, and adjustable collage height.

Dialog Cards: The instructor can use dialog cards to check that students have remembered particular words or sentences. The corresponding answer is revealed when the card is flipped. Images, audio, and tips can be added, which makes the cards useful in both theory subjects and laboratories for helping students remember the basics.

Find the Words: The instructor chooses keywords from the topic and places them in a grid. The learner’s task is to find the keywords in the grid.

Branching Scenario: A branching scenario is flexible content that lets authors present diversified content and offer choices to learners. The learners choose among the displayed options. The instructor can use the authoring tool in full screen to build the content as a tree with multiple branches and endings.

Essay: The essay lets students write free-text answers, which are graded automatically against keywords that the instructor defines when creating the essay question. Its advantages are instant feedback and the option to limit the number of words per answer. Automatic grading lets students focus on the essential keywords and recognize irrelevant details. The essay content type does not replace teachers; instead, it assists them.

Drag and Drop: Drag and drop questions let the instructor associate two or more elements and express logical connections visually. These questions can be built using images and text as draggable alternatives.

Guess the Answer: The instructor provides a picture, and the learner guesses the answer corresponding to it. Notable features are the task description and the solution label for the given task.

Interactive Book: An interactive book combines textual content with related tasks. Each page can have its own URL through which the learner can access that specific page; the criterion for linking is that it should be indexed in Google.

Impressive Presentation: Impressive presentations create 3D presentations and are a recent addition to H5P. Although the result looks impressive, creating one is difficult because most people are not familiar with 3D creation tools. The primary features for designing the presentation are zooming and image pan, camera rotation, and image transformation in height and width.
Mark the Words: The instructor creates textual expressions in which certain words are predefined as correct and carry a score. The learner marks the correct words according to the task description and is awarded marks. An advantage is that any number of textual expressions can be included. The activity can be embedded in other interactive content such as presentations, question sets, and interactive videos.

Questionnaire: The questionnaire can be created as a survey for learners, and the answers can be retrieved in various ways depending on the instructor’s requirements. Multiple-choice and open-ended questions can be created, and questions can be marked as mandatory.

KewAr Code: KewAr Code enables instructors to design their own QR codes. These QR codes can contain information such as embedded URLs, events, and geolocations.

Personality Quiz: A personality quiz can be created using H5P, where each answer option is matched against one or more personalities.

True/False: True/False lets the instructor add statements that the learner judges as true or false, to gauge the learner’s understanding of the topic. Images can be added along with True/False questions.

Quiz: Quiz lets instructors create various assessment questions, such as multiple-choice questions, fill in the blanks, and drag the words, to evaluate learners.

Timeline: The timeline content type can present the evolution of a particular area of engineering, a schedule for students, etc. Its main advantage is that images, text, and links to sites such as Facebook, Twitter, and YouTube can be added.
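The Fill in the Blanks markup described in this section (an asterisk on each side of a missing word, with alternatives separated by a slash) can be illustrated with a short sketch. This is not H5P code: the function names `parse_blanks` and `is_correct` and the sample sentence are hypothetical, and the case-insensitive comparison is an assumption made for the example.

```python
import re

# Hypothetical helper, not part of H5P. A blank is written as *answer*,
# and accepted alternatives inside a blank are separated by "/".
BLANK = re.compile(r"\*([^*]+)\*")


def parse_blanks(text):
    """Return the text shown to the learner and the accepted answers per blank."""
    answers = [
        [alt.strip() for alt in match.group(1).split("/")]
        for match in BLANK.finditer(text)
    ]
    shown = BLANK.sub("_____", text)
    return shown, answers


def is_correct(response, accepted):
    """Assumed rule: ignore case and surrounding whitespace when comparing."""
    return response.strip().lower() in {a.lower() for a in accepted}


shown, answers = parse_blanks(
    "Objects are created with the *new* keyword and initialised by a *constructor/ctor*."
)
```

Here `shown` holds the sentence with each marked word replaced by a gap, and `answers` holds one list of accepted words per gap, so a learner response can be checked with `is_correct(response, answers[i])`.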
7 Results and Discussions

7.1 Prerequisites in Android Application Development

The Android application development course is taught in the first semester of the final year. Like other subjects, it requires students to have a solid foundation before they can proceed. Approximately 25% of the foundational knowledge and skills for this course come from the prerequisite subject, Java programming, which students studied in their second year. It is therefore critical to reactivate the prerequisite knowledge first so that students are ready to study the Android course. Traditionally, for each unit with extensive prerequisite knowledge, the instructor would spend lecture time going over key concepts taught in the previous Java course. Given the benefits of the flipped classroom and the need to help students recall prerequisite knowledge, H5P is a good candidate for a modified version of flipped instruction, which we call the “flipped review” approach.
The prerequisites are covered by a series of interactive videos and corresponding mini-assessments [21]. Overall, the review takes only a modest portion of the course, which keeps the review assignments manageable. The interactive content is distributed throughout the course to support unit-specific topics. Each video is bundled with a graded quiz. Students must complete the video/quiz review assignment before the first class of a learning unit. Students are surveyed throughout the course to provide feedback on video features and their helpfulness to learning. The instructor spent more time creating interactive content that helps students understand in-depth application problems or advanced topics.
7.2 Discussions

H5P is useful for students and instructors in many ways. Instructors can prepare a study plan using timeline content so that students can keep track of their progress; the timeline also supports multimedia content. As discussed above, the accordion is best suited to giving in-depth explanations. The prerequisite content is built with the accordion content type, and after each chapter, a summary content type shows students their level of attainment in that chapter and prompts them to revisit the chapter if necessary. Human memory has two parts, chunk memory and secondary (long-term) memory, loosely comparable to a computer’s random access memory (RAM) and read only memory (ROM). The brain can hold only a limited amount of information in chunk memory, and information that is not transferred to long-term memory is lost. It is therefore recommended to take a break every 15 min and give a two-minute activity so that the content taught by the instructor can move into long-term memory. With this in view, small activities such as dialog cards are created to keep students interactive and knowledgeable. From the course design perspective, the review videos/quizzes are the foundation of subsequent homework problems. The interactive course presentation is created using the H5P course presentation content type. Its advantage in the learning management system is that content and quiz material can be placed on the same slide, so students can evaluate themselves while going through the course content. They become aware of their misconceptions and correct them as they learn. Interactive videos can be generated only from an existing video, so the video is created first, and the interactive features are then added to help students review the prerequisite knowledge and skills. All videos were between 9 and 11 minutes long, as shorter videos have a better chance of being viewed. Each video summarizes key concepts in one of five technical areas (classes, objects, constructors, etc.). These review videos can be used as a substitute for teaching an entire course to avoid misconceptions. The review videos are created with the attainment goals in mind, and these are stated explicitly on the opening slide as goals to be achieved after the lecture, as shown in Fig. 2.
Fig. 2 Purpose of the review video
The videos are recorded using Zoom, a video conferencing tool that allows screen sharing and recording via a headset with a mic. Zoom is available free of charge under the university license and produces videos of decent quality. The videos can be exported to the standard MP4 format for additional editing if needed. In addition to Zoom, video editing software is used to merge, edit, and trim the videos. Depending on convenience and requirements, some videos are recorded with the PowerPoint screen recorder or with OBS software.

Interactive self-check questions

After the self-made videos are created, the interactive video is built using the H5P plugin. Several conceptual questions were added to each online review video to keep students engaged in the viewing process [22]. The questions were placed at times chosen by the instructor, as shown by the purple dots on the timeline in Fig. 3. When a question appears on the screen, the video pauses, and the viewer has to submit a correct answer to proceed. The instructor designed the questions to serve two purposes: students can conduct a quick self-check, and the questions act as bookmarks that segment a longer video into smaller chunks [22]. Students can replay a segment if they cannot figure out why they missed the question. Both the self-check questions and the segmenting help students stay focused, as the average amount of time viewers spend watching media is only about 4 min [23]. The interactive questions were simple multiple-choice or True/False questions. They were created using H5P, a JavaScript-based, open-source content collaboration tool [24]. The only drawback of H5P is that only the students can check the correct answers to these questions; the instructor cannot view the results because the answers are not recorded or collected. After the students have watched the prerecorded videos, adding interactivity, gamification, and engaging multimedia can take the course to the next level.
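The use of questions as bookmarks that split a video into shorter chunks can be sketched as follows. This is an illustrative calculation only: the function name `segments` and the timestamps are assumptions, not values from the study, and the 240 s limit simply restates the 4 min viewing figure cited above.

```python
def segments(question_times, video_length):
    """Split a video into (start, end) segments, in seconds, at the question times."""
    # Keep only distinct timestamps that fall strictly inside the video.
    cuts = sorted(t for t in set(question_times) if 0 < t < video_length)
    bounds = [0] + cuts + [video_length]
    return list(zip(bounds[:-1], bounds[1:]))


# Made-up example: a 10-minute review video with three self-check questions.
parts = segments([150, 330, 480], 600)
longest = max(end - start for start, end in parts)

# Check that no segment exceeds the roughly 4 min (240 s) attention span.
fits_attention_span = longest <= 240
```

With these assumed timestamps the video is cut into four segments, the longest of which lasts 180 s, so an instructor could use the same check to decide where an additional question is needed in a longer recording.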
Fig. 3 Interactive multiple-choice questions
Quiz

Since the instructor does not have access to the interactive video’s quiz results, separate quizzes are created to check students’ learning attainment. Two quizzes are created: one taken before the interactive video and another taken after it. Students’ grades are much improved in the second quiz. Each online quiz consists of multiple algorithmic questions, all of which assess students’ ability to apply the concepts reviewed in the video to an engineering context. The algorithmic question feature of the learning management system gives each student their own version of a question, testing the same concepts but with different numbers. For each quiz, students have between 40 and 60 min to complete the questions. Correct answers are revealed after each submission. Students can take a quiz up to two times, and the average score, rather than the higher one, is recorded in the grade book. A sample quiz is shown in Fig. 4. The quiz results obtained before the interactive video are shown in Fig. 5, and the results after it are shown in Fig. 6. The quiz results provide solid evidence that students’ learning attainment improves after using H5P interactive content in the learning management system.
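The grading rule described above (up to two attempts, with the average recorded) and the before/after comparison can be written out as a small sketch. The function names and all scores below are made up for illustration; they are not the data behind Figs. 5 and 6, and the actual grade book calculation in Moodle is configured in the LMS rather than coded by the instructor.

```python
from statistics import mean


def recorded_grade(attempt_scores, max_attempts=2):
    """Average of the attempts actually taken (between 1 and max_attempts)."""
    if not 1 <= len(attempt_scores) <= max_attempts:
        raise ValueError("expected between 1 and max_attempts scores")
    return mean(attempt_scores)


def mean_gain(pre, post):
    """Mean per-student difference between post-video and pre-video grades."""
    return mean(after - before for before, after in zip(pre, post))


# Illustrative numbers only: one student scoring 6 and then 9 out of 10.
grade = recorded_grade([6, 9])

# Illustrative pre/post grades for three students.
gain = mean_gain([4, 5, 6], [7, 8, 6])
```

Recording the average instead of the higher score means the example student above receives 7.5 rather than 9, which rewards preparation before the first attempt.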
Fig. 4 Moodle quiz
Fig. 5 Result summary before the interactive video
Fig. 6 Quiz results
8 Conclusions

Smart learning intends to provide holistic learning to learners via state-of-the-art technologies and to prepare them for a rapidly changing environment where versatility is essential. Innovative education can be achieved by adding interactive content to enhance the learning outcomes of conventional education. Such innovative education plays a significant role in asynchronous learning and in learners’ cognitive and emotional development. H5P complements interactive video to create an engaging learning experience for students. Students gain an engaging experience, and fulfilment and achievement rates will be high. Using H5P to create interactive content strengthens students’ basics, which lets them write their end exams with confidence and leads to improved learning outcomes.
References

1. Bower M, Dalgarno B, Kennedy GE, Lee MJ, Kenney J (2015) Design and implementation factors in blended synchronous learning environments: outcomes from a cross-case analysis. Comput Educ 86:1–17
2. Rehman R, Fatima SS (2021) An innovation in flipped class room: a teaching model to facilitate synchronous and asynchronous learning during a pandemic. Pak J Med Sci 37(1):131
3. https://thebestschools.org/magazine/synchronous-vs-asynchronous-education/
4. Alias NA, Zainuddin AM (2005) Innovation for better teaching and learning: adopting the learning management system. Malays Online J Instr Technol 2(2):27–40
5. Cole J, Foster H (2007) Using moodle: teaching with the popular open source course management system. O’Reilly Media Inc
6. Van Raaij EM, Schepers JJ (2008) The acceptance and use of a virtual learning environment in China. Comput Educ 50(3):838–852
7. Singleton R, Charlton A (2020) Creating H5P content for active learning. Pac J Technol Enhanced Learn 2(1):13–14
8. Wilkie S, Zakaria G, McDonald T, Borland R (2018) Considerations for designing H5P online interactive activities. Open oceans: learning without borders. Proceedings ASCILITE pp 543–549
9. MacFarlane L-A, Ballantyne E (2018) Bringing videos to life with H5P: expanding experiential learning online. Proc Atlantic Univ Teach Showcase 22:28–33
10. Rekhari S, Sinnayah P (2018) H5P and innovation in anatomy and physiology teaching. In: Research and development in higher education: [re] valuing higher education: volume 41: refereed papers from the 41st HERDSA annual international conference. 2–5 July 2018, Convention Centre, Adelaide, vol 41. Higher Education Research and Development Society of Australasia, pp 191–205
11. Amali LN, Kadir NT, Latief M (2019) Development of e-learning content with H5P and iSpring features. J Phys: Conf Ser 1387(1):012019. IOP Publishing
12. Singh S, Scholz K (2017) Using an e-authoring tool (H5P) to support blended learning: librarians’ experience. Me, Us, IT: 158–162
13. Reyna J, Hanham J, Todd B (2020) Flipping the classroom in first-year science students using H5P modules. In: EdMedia+ innovate learning. Association for the Advancement of Computing in Education (AACE), pp 1077–1083
14. Handayani AP, Singaram N, Har LS (2019) Use of H5p interactive video to support action learning of plant tissue culture techniques. University Carnival On E-Learning (IUCEL) 2019, p 35
15. Wicaksono JA, Setiarini RB, Ikeda O, Novawan A (2021) The use of H5P in teaching english. In: The first international conference on social science, humanity, and public health (ICOSHIP 2020). Atlantis Press, pp 227–230
16. Sinnayah P, Salcedo A, Rekhari S (2021) Reimagining physiology education with interactive content developed in H5P. Adv Physiol Educ 45(1):71–76
17. Chilukuri KC (2020) A novel framework for active learning in engineering education mapped to course outcomes. Procedia Comput Sci 172:28–33
18. https://www.slideshare.net/MatleenaLaakso/h5p-content-types-22218
19. H5P. https://h5p.org/
20. https://docs.moodle.org/38/en/Interactive_Content_-_H5P_activity
21. Dunsworth Q, Wu Y (2018) Effective review of prerequisites: using videos to flip the reviewing process in a senior technical course
22. Brame CJ (2016) Effective educational videos: principles and guidelines for maximizing student learning from video content. Cell Biol Educ 15(4):es6-es6
23. Hibbert M (2014) What makes an online instructional video compelling? Columbia University Academic Commons. https://doi.org/10.7916/D8ST7NHP
24. https://h5p.org/documentation/for-authors/tutorials
Anti-cancer Drug Response Prediction System Using Stacked Ensemble Approach

P. Selvi Rajendran and K. R. Kartheeswari
Abstract Sudden cell elongation is one of the major problems in cancer analysis. The effect of the inhibitory concentration (IC50) is an important factor in cancer recovery. In cancer analysis, drug response prediction is therefore based on the IC50, which depends on cell line and drug line similarity analysis. This research aims to improve early drug response prediction and to maintain cell stability, which in turn is reflected in cell line recovery. To achieve this, two additional parameters, mechanical and electrical, are added to the drug line. This increases the inhibitory concentration, avoids cell elongation, and maintains cell stability. A stacked ensemble machine learning algorithm is used for this purpose: random forest, linear regression, and Gaussian Naïve Bayes are stacked and enhanced with the voting average method. The efficiency obtained in this research is 97.5%. The datasets are taken from GDSC and CCLE for the experimentation.

Keywords Anti-cancer · Prediction model · Machine learning algorithm · Stacking ensemble algorithm
1 Introduction

Cancer is the world’s second most feared disease, according to a statistical analysis reported by the United States National Cancer Institute. The report stated that 1.8 million new cancer cases and 600,000 deaths were expected in 2020 [1]. It also projected lung cancer to be the second most common dangerous cancer. For the United States, the estimated number of new lung cancer cases in 2020 was 228,820, and the predicted number of deaths in the same year was 135,720. So, 25% of the population in the United States were expected to be affected by lung cancer in 2020. A sustained five-year survey report stated that the lifetime risk of lung cancer is 1 in 15 for men and 1 in 17 for women, and that smokers have a high risk.

P. S. Rajendran (B) · K. R. Kartheeswari
Department of Computer Science & Engineering, School of Computing, Hindustan Institute of Technology and Science, Chennai, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_14
This five-year report indicated that 20% of lung cancer patients were badly affected [2]. With the growing use of anti-PD-1 therapy, clinical features associated with treatment response in patients with non-small-cell lung cancer (NSCLC) have been identified. Immunotherapy, smoking history, performance status, sex, the presence of metastases, mutations, and pathology [6–14] are routinely examined in clinical practice and collected in electronic medical records (EMRs). No single clinical factor can exactly predict the response to anti-PD-1 (programmed cell death protein-1) therapy, so models that integrate several factors are needed to improve response prediction. Machine learning plays a vital role in predicting disease progression and treatment response [3, 4]. In the early stages, a small number of molecule inhibitors produced promising results in drug therapy and increased the lifetime of patients. However, unexpected secondary mutations limited the effectiveness of the drugs through drug resistance [5]. Many computational studies have used molecular dynamics simulations to understand drug resistance [6, 7]. Mechanical properties play a vital role and are closely related to human health conditions. The physiological function of a cell depends on its mechanical properties, and changes in these properties indicate damage to the physiological functions of the cell, which in turn worsens the disease state [8, 9].
2 Literature Survey Drug-target interaction (DTI) prediction is analyzed by Ding et al. [10] through similarity-based machine learning algorithms. In this work, chemical and genomic spaces are used in the similarity-based analysis, which merges easily with kernel-based learning methods. In [11], Zhang et al. analyzed the importance of cell line and drug line similarity in predicting drug response. In this approach, cell line-oriented and drug line-oriented collaborative filtering models were used, and the average of the predicted values from both models was taken as the final predicted sensitivity. Tenfold cross validation was used to obtain accurate and reproducible outcomes for the missing drug sensitivities. The hybrid interpolation weighted collaborative filtering (HIWCF) method relies heavily on the known drug responses, so its performance depends strongly on the sparseness of the response matrix, which reduced the prediction performance. Liu et al. [12] analyzed a dual-layer integrated cell line-drug network model with tenfold cross validation. Their model provided a higher Pearson correlation coefficient between the predicted and observed drug responses than the dual-layer integrated cell line-drug network model on the CCLE dataset, and it also achieved a better drug response prediction on the GDSC dataset. They found that the model was more biologically interpretable, but feature selection was lacking. In the approach of Zhu et al. [13], power transfer learning was used to predict the response of both new tumor cells and new drugs. The analysis stated that this learning approach improved prediction performance in drug response prediction applications. It was also observed that a single drug response was not available, and a combination
of drug responses reduced the performance. Chen et al. [14] elaborated machine learning methods for DTI prediction, in which chemogenomic databases were used and negative samples were also handled. Chemogenomic methods are categorized into supervised and semi-supervised learning methods, and the supervised methods are further subdivided into similarity-based and feature-based methods. Liang et al. [15–17] observed that artificial intelligence plays an important role in the discovery of new materials and strongly accelerates anticancer drug development. These services help doctors make the right treatment decisions and reduce unnecessary surgery. Machine learning analysis was used to predict drug sensitivity. Sachdev et al. [17, 18] explained feature-based chemogenomic approaches for DTI prediction. In this survey, feature-based methods were categorized as SVM-based and ensemble-based methods, and similarity-based approaches were excluded. Bhardwaj et al. [16, 19, 20] developed a double ensemble machine learning algorithm to predict the pathological response after neoadjuvant chemotherapy. This double ensemble was used for multi-criteria decision-making, and about 99.08% accuracy was obtained using the k-fold cross validation technique. In [21], Sharma et al. proposed a drug sensitivity prediction framework using ensemble and multi-task learning, which reported a mean square error (MSE) of 3.28 on the CGP dataset, 0.49 on the CCLE dataset, and 0.54 on NCI-DREAM. Tan et al. [22] proposed an ensemble method for the prediction of drug response, in which a significant amount of drug-induced gene expression data was added to cytotoxicity databases. The predictions were evaluated by in vitro experiments in addition to tests on datasets. Xia et al. [19, 23–26] used five publicly available cell line-based datasets.
These datasets were NCI60, CTRP, GDSC, CCLE, and GCSI. A variety of machine learning models were used for the cross-study analysis, and the best cross-study performance was achieved with deep neural networks. The study initially stated that CTRP yields better predictions on the test data, and that GCSI was the most predictable of the cell line datasets. In this cross-study, GCSI combined with CTRP provided the most accurate prediction. Senousy et al. [27] applied micro-array technology to cancer prediction; their approach can overcome the problem of missing values or imbalanced bio-medical data. Ensemble learning based on ranking attribute value (ELBRAV) was used for the research. Sharma et al. [28] established an anti-cancer drug response prediction model using a similarity-based regularized matrix factorization. In this similarity analysis, drug and tissue were the main objective, and the GDSC and CCLE datasets were used. The average MSE was 3.24 and 0.504 (Table 1).
3 Proposed Work From the review of related works, it is observed that the majority of drug analyses are based on similarity metrics, and that the response matrix is based on cell line and drug line similarities. For the analysis, CCLE and GDSC datasets have been
Table 1 Survey of prior literature works
| References | Techniques, approaches, and evaluation measures | Merits | Demerits |
| --- | --- | --- | --- |
| Ding et al. [10] | Similarity-based analysis merged with kernel-based learning methods. Chemical and genomic spaces are used in similarity | High performance in prediction | Isolated interaction cannot be predicted well by similarity-based methods |
| Zhang et al. [11] | Cell line and drug line similarity analysis. Tenfold cross validation was used. Hybrid interpolation weighted collaborative filter | Accurate and reproducible outcome | Sparseness has reduced the performance for drug response |
| Liu et al. [12] | Dual-layer integrated cell line network model. Tenfold cross validation | Higher Pearson correlation coefficient | Feature selection possibility was lagging |
| Zhu et al. [13] | Power transfer learning | High percentage of prediction in new tumor cells and drug cells | Single drug response was not available. Combination of drug response reduced the performance |
| Chen et al. [14] | Drug target interaction | Negative samples handled. Most frequently used data and features were taken from various datasets to improve the efficiency | Reduced the scope of research for bio-medical experiments |
| Liang et al. [15] | AI-based analysis | Powerful driving force for cancer analysis. Used for fast recovery | Difficult to formulate for the most appropriate treatment |
| Sachdev et al. [18] | Feature-based chemogenomic | Feature-based method is a straightforward, linear, and efficient method for drug target interaction | Optimal feature extraction is needed to obtain good accuracy |
| Bhardwaj et al. [19] | Double ensemble learning method | Efficient method to predict the complete pathological response | It is not suitable for web application |
| Sharma et al. [21] | Ensemble and multi-task learning | Mean square error is less to increase the efficiency | Mean square error varies with various dataset for same method |
| Tan et al. [22] | Cytotoxicity databases plus in vitro experiments | Drug performance was enhanced with help of random selection of cell drug pairs | Drug screening experiments are costly |
| Xia et al. [23] | Multi deep neural networks | Cross-study for various dataset is used to obtain good accuracy | Usage of too many parameters reduce the efficiency |
| Senousy et al. [27] | Micro-array technology. Ensemble based on ranking attribute value | Missing values or imbalanced bio-medical data problem could be overcome | High fabrication needed |
| Sharma et al. [28] | Kernelized similarity-based analysis | Accuracy was increased with less average mean square error | MSE varies with various dataset |
taken. In the surveyed literature, most cell line features are based on expression, mutation, and copy number variation (CNV), and most drug line features are based on chemical, geometric, and physical properties. Some of the related works stated that the response on the CCLE dataset is good compared to the GDSC dataset. On the whole, the reviews state that anti-cancer predictions are mostly made by machine learning analysis, and that this type of prediction helps doctors make the right decisions in cancer surgery. Based on the reviews, the work is extended by applying a stacked ensemble machine learning algorithm. Figure 1 shows the similarity analysis of early-stage anti-cancer drug response prediction using ensemble learning in three stages. The first stage is a similarity analysis of drug and cell lines. In the second stage, mechanical and electrical parameters are added to the drug line features to improve the prediction; these properties are used to predict cell elongation and cell stability. In the third stage, the ensemble meta-learning voting method is used, which is intended to perform better than any single model in the ensemble. After analysis and pre-processing, the matrix is reshaped, and the selected features from the cell line (genetic expression) and drug line (target protein, mechanical, and electrical properties) are fed into the stacked algorithm. Here, random forest, Gaussian NB, and logistic regression algorithms are used for prediction, and the voting ensemble is compared against the individual models. The complete analysis is given in the experimentation section.
Fig. 1 Similarity analysis of early-stage anti-cancer drug response prediction diagram
3.1 Similarity Calculation The similarity score of cell lines or drug lines was computed using the Jaccard similarity index and the Pearson correlation coefficient. The Jaccard index is a measure of the similarity between two sets of data:

$$J(c, d) = \frac{|c \cap d|}{|c \cup d|} \tag{1}$$

If two features share exactly the same members, their index is 1. If they share no common members, the similarity is 0. Jaccard is used to find the similarity of binary data. The Pearson correlation coefficient (PCC) is

$$\mathrm{pcc}(c, d) = \frac{n \sum cd - \sum c \sum d}{\sqrt{n \sum c^2 - \left(\sum c\right)^2}\,\sqrt{n \sum d^2 - \left(\sum d\right)^2}} \tag{2}$$

PCC is used to calculate the correlation of two variables, where
c, d: feature vectors
n: total number of values.
In the electrical property analysis, stability is represented as binary data, so Jaccard similarity is used to calculate the similarity matrix of electrical properties. The Pearson correlation coefficient is used to calculate the similarity of mechanical properties.
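The two similarity measures in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample vectors are hypothetical, and the handling of degenerate inputs (two all-zero binary vectors, or a constant vector) is our own choice, not something specified in the paper.

```python
import math

def jaccard(c, d):
    """Jaccard index of two equal-length binary vectors, following Eq. (1)."""
    both = sum(1 for a, b in zip(c, d) if a and b)    # |c ∩ d|
    either = sum(1 for a, b in zip(c, d) if a or b)   # |c ∪ d|
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0

def pearson(c, d):
    """Pearson correlation coefficient of two equal-length vectors, following Eq. (2)."""
    n = len(c)
    sum_c, sum_d = sum(c), sum(d)
    sum_cd = sum(a * b for a, b in zip(c, d))
    sum_cc = sum(a * a for a in c)
    sum_dd = sum(b * b for b in d)
    numerator = n * sum_cd - sum_c * sum_d
    denominator = math.sqrt(n * sum_cc - sum_c ** 2) * math.sqrt(n * sum_dd - sum_d ** 2)
    # Assumption: a constant vector has no defined correlation; return 0.0.
    return numerator / denominator if denominator else 0.0

# Hypothetical binary electrical-stability vectors for two drugs.
electrical_a = [1, 0, 1, 1, 0]
electrical_b = [1, 1, 1, 0, 0]
# Hypothetical continuous mechanical-property vectors for two drugs.
mechanical_a = [1.0, 2.0, 3.0, 4.0]
mechanical_b = [2.0, 4.0, 6.0, 8.0]

print(jaccard(electrical_a, electrical_b))
print(pearson(mechanical_a, mechanical_b))
```

Applying either function to every pair of drugs (or cell lines) would fill one similarity matrix of the kind shown in Fig. 2.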
3.2 Similarity Analysis Drug response prediction is analyzed with the help of drug and cell line similarity, on the premise that similar cell lines and similar drugs yield similar responses [29–31]. The similarity between cell lines and between drug lines is used as input to a machine learning algorithm. Three types of cell line similarity (expression, mutation, and copy number variation (CNV)) and three types of drug similarity (physical, chemical, and geometric properties) are computed for the GDSC and CCLE datasets (Fig. 2).
3.3 Enhanced Drug Line Parameter To improve the predictions, two additional parameters are added to the drug line, which are reflected in the early drug response prediction.
Fig. 2 Similarity matrix
3.3.1 Mechanical Property
Cancer cells have a lower Young's modulus than normal cells, meaning that they are softer and deform more easily. Mechanical properties therefore play a vital role in early cancer prediction. Cancer cells show an inverse correlation with their elastic properties, and changes in the mechanical properties can slow down drug screening. Despite several open issues in clinical application, the mechanical properties of single cells may open a new direction in cancer analysis.
3.3.2 Electrical Property
The electrical property is used to analyze the stability of the cell line and drug line. Since electrical stability is analyzed as binary data, Jaccard similarity is used in this analysis.
3.4 Pre-processing Module It is difficult to feed the raw data into the algorithm. Pre-processing is the backbone for giving meaningful inputs to the algorithm (Table 2). The main steps involved in this data pre-processing are:
• Reading the CSV files (including path)
• Loading the data
• Resizing and reshaping the matrix
• Value formatting for binary conversion
• Splitting the data into test and train sets.
By using the statistical approach, the dimension of the matrix can be reduced and processed using the algorithm.
Table 2 Reshaped size of pre-processed data
| Datasets | No. of instances | Initial size (rows, columns) | Reduced size |
| --- | --- | --- | --- |
| CCLE | 363 | (650, 8688) | 600 |
| GDSC | 555 | (650, 9506) | 600 |
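The pre-processing steps listed above (loading CSV data, binary conversion, and the train/test split) could look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the tiny inline CSV stands in for the real GDSC/CCLE files, and the binarization threshold, split fraction, and function names are hypothetical rather than taken from the paper.

```python
import csv
import io
import random

def load_matrix(csv_text):
    """Parse CSV text into a list of float rows, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    return [[float(value) for value in row] for row in reader]

def binarize(rows, threshold):
    """Convert every value to 1 if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for a feature file; real data would be read from disk.
SAMPLE = "f1,f2,f3\n0.1,0.7,0.4\n0.9,0.2,0.5\n0.3,0.3,0.8\n0.6,0.1,0.2\n0.4,0.9,0.6\n"

rows = binarize(load_matrix(SAMPLE), threshold=0.3)  # threshold is an assumed value
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

In practice the reshaping step would also bring both datasets to a common size, as Table 2 suggests, before the split is made.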
3.5 Procedure for PCA Analysis PCA is used to reduce the dimensionality of the data matrix. An N × M matrix is taken initially. The mean of each column is calculated and stored in the variable X, and the matrix is centered by subtracting it. The covariance of the centered matrix is computed, and its eigenvalues and eigenvectors are calculated. Pseudo code process for PCA:
1. B = MATRIX # input (N × M)
2. X = Mean(B) # column means
3. C = B − X # centered matrix
4. E = cov(C); values, vectors = eig(E) # covariance and its eigen-decomposition
5. P = V^T . C^T # samples transformed to a new subspace, where V holds the selected eigenvectors and the output P is the projection of the input
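The PCA steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `pca`, the toy data, and the choice of `numpy.linalg.eigh` (appropriate because a covariance matrix is symmetric) are assumptions.

```python
import numpy as np

def pca(data, n_components):
    """Project `data` (samples x features) onto its leading principal components."""
    mean = data.mean(axis=0)                  # column means
    centered = data - mean                    # centered matrix
    cov = np.cov(centered, rowvar=False)      # covariance between features
    values, vectors = np.linalg.eigh(cov)     # eigen-decomposition (ascending order)
    order = np.argsort(values)[::-1][:n_components]  # largest eigenvalues first
    components = vectors[:, order]            # selected eigenvectors as columns
    return centered @ components, values[order]

# Toy data: four samples with two perfectly correlated features.
data = np.array([[2.0, 0.0], [4.0, 1.0], [6.0, 2.0], [8.0, 3.0]])
projected, variances = pca(data, n_components=1)
print(projected.shape, variances)
```

Because the two toy features are perfectly correlated, a single component carries all of the variance, which is the situation in which reducing the matrix size loses no information.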
3.6 Ensemble Learning Stacking, or stacked generalization, is a form of ensemble machine learning algorithm. Bagging, boosting, and voting are types of ensemble learning. Ensemble learning merges the predictions from many machine learning models on the same dataset. The idea behind ensembles is taken from The Wisdom of Crowds [32]. The voting average ensemble method is applied here; it aggregates the outputs of several models into a single decision. Although many ensemble methods can be used for prediction, the average voting meta-model ensemble method is applied for ease of evaluation. Two types of voting are available: hard voting and soft voting. Hard voting predicts the class with the largest number of votes from the models. Soft voting averages the predicted class probabilities of the models and predicts the class with the highest average probability. In the stochastic learning process, voting is a simple and effective technique. Different hyper-parameters are tuned in this voting method to obtain good output. Soft voting is preferred for this implementation.
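The difference between hard and soft voting can be made concrete with a small sketch. The probabilities below are invented for illustration, and the helper names are hypothetical; the point is only that the two rules can disagree when one model is much more confident than the others.

```python
from collections import Counter

def hard_vote(labels):
    """Return the class label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average per-class probabilities across models; return (class, averages)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [sum(p[k] for p in probabilities) / n_models for k in range(n_classes)]
    best = max(range(n_classes), key=averages.__getitem__)
    return best, averages

# Invented class probabilities [P(class 0), P(class 1)] from three models.
probs = [[0.45, 0.55], [0.40, 0.60], [0.95, 0.05]]
# Each model's own hard prediction is its most probable class.
labels = [max(range(2), key=p.__getitem__) for p in probs]

print(hard_vote(labels))     # two of three models favour class 1
print(soft_vote(probs)[0])   # the confident third model tips the average to class 0
```

Soft voting therefore uses more of the information each model provides, which is one common reason to prefer it when the base classifiers output reasonable probabilities.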
3.7 Classifiers Used In the stacking ensemble voting average method, N algorithms are stacked, and their outputs are combined using average voting. The stacked algorithms used in this analysis are logistic regression, Gaussian Naïve Bayes, and random forest. The average vote is computed from the outputs of these three algorithms.
Pseudo code process:
Inputs:
  Xi = SimC (n × n): cell line similarity matrix
  Wj = SimD (m × m): drug similarity matrix
  k: model hyper-parameter
Output:
  Y: predicted drug responses
N: number of features
Wi: drug similarity matrix
Xi: cell line similarity matrix
Y = ∑ Wi.Xi = (W0.X0) + (W1.X1) + … + (Wn.Xn)
  where Wi = [W0 W1 … Wn] and Xi = [X0 X1 … Xn]
Compare with threshold k = 0.3 # hyper-parameter tuning
  if Y > k, then Y = 1 # drug is responding
  otherwise, Y = 0 # drug is not responding
Y predicted = [Y0 Y1 … Yz]
probability = [a.fit(X, Y).predict_proba(X) for a in (alf1, alf2, alf3, bclf)]
  where alf1: LogisticRegression
        alf2: RandomForestClassifier
        alf3: GaussianNB
        bclf: VotingClassifier
This output Y predicted is taken as an input to the voting method and is tuned by the ensemble stacking algorithm. Here, the average voting technique is implemented. The stack comprises logistic regression, Naive Bayes, and random forest classifiers, combined by a voting classifier. The data are then split into training and testing sets. The accuracy obtained for these similarity inputs is 97.5%. After adding the additional features to both the CCLE and GDSC datasets and applying the voting method, the accuracy is calculated, and the results are tabulated in Table 3.
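The labelling step in the pseudo code (a weighted sum compared against the threshold k = 0.3) can be sketched as below. The function name and the sample similarity values are hypothetical; only the rule Y = 1 if ∑ Wi.Xi > k, else 0, comes from the pseudo code.

```python
def predict_response(drug_weights, cell_features, k=0.3):
    """Return 1 (responding) if the weighted sum exceeds k, else 0 (not responding)."""
    y = sum(w * x for w, x in zip(drug_weights, cell_features))
    return 1 if y > k else 0

# Hypothetical drug-similarity weights and two hypothetical cell line feature rows.
weights = [0.5, 0.2, 0.1]
weak_match = [0.4, 0.3, 0.2]     # weighted sum is about 0.28, below the threshold
strong_match = [0.6, 0.3, 0.2]   # weighted sum is about 0.38, above the threshold

y_predicted = [predict_response(weights, row) for row in (weak_match, strong_match)]
print(y_predicted)
```

The resulting binary labels are what the three base classifiers and the voting classifier would then be fitted on.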
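Table 3 Classifier's accuracy
| Weights | LR accuracy (%) | GB accuracy (%) | RF accuracy (%) | VC (AVG) accuracy (%) |
| --- | --- | --- | --- | --- |
| (1,1,1) | 58 | 97 | 98 | 79.5 |
| (1,1,2) | 58 | 97 | 98 | 90 |
| (1,1,3) | 58 | 98 | 97 | 97 |
| (1,1,4) | 59 | 98 | 97 | 97.3 |
| (1,1,5) | 58 | 90 | 98 | 97.5 |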
4 Results Tuning the hyper-parameters increased the accuracy. A weighted voting scheme is used in this analysis: a positive weight is assigned to each algorithm, so that algorithms with larger weights contribute more to the vote. We changed the weight values and obtained the accuracy through the voting average method. It is observed that changing the weighting factor increases the accuracy of the voting classifier. Both GDSC and CCLE data are used, and the accuracy is reported for both datasets. From this tabulation, it is observed that by adding the additional parameters and using the stacked ensemble, 97.5% accuracy is obtained. From Table 4, it is observed that 96.4% accuracy is obtained on the GDSC dataset using substructure, target, and pathway features from the drug line and expression, mutation, and CNV features from the cell line. The same features provide 97% accuracy on the CCLE dataset. When mechanical and electrical properties are added to the drug line, the accuracy increases to 97% on GDSC and 97.5% on CCLE. A graphical representation of the performance comparison is shown in Fig. 3. These results indicate that mechanical and electrical properties combined with a voting ensemble play a vital role in cancer prediction.
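The effect of the weights in the weighted voting scheme can be illustrated with a short sketch. The probabilities are invented, the function name is hypothetical, and the model order (logistic regression, Gaussian NB, random forest) is assumed to match the weight tuples of Table 3; this is not a reproduction of the reported accuracies.

```python
def weighted_soft_vote(probabilities, weights):
    """Weighted average of per-class probabilities; return (class, averages)."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    averages = [
        sum(w * p[k] for w, p in zip(weights, probabilities)) / total
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=averages.__getitem__)
    return best, averages

# Invented [P(class 0), P(class 1)] for one sample from LR, Gaussian NB, and RF.
probs = [[0.80, 0.20], [0.60, 0.40], [0.30, 0.70]]

equal_class, _ = weighted_soft_vote(probs, (1, 1, 1))
rf_heavy_class, _ = weighted_soft_vote(probs, (1, 1, 5))
print(equal_class, rf_heavy_class)
```

With equal weights the two weaker-looking models outvote the random forest, whereas the (1, 1, 5) weighting lets the random forest's prediction dominate, which mirrors how raising the third weight changes the voting classifier's behaviour.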
5 Conclusion Predicting anti-cancer drug response using machine learning is needed to enhance drug policy. Although similarity matching between drug and cell lines plays a vital role in the machine learning analysis, two more features have been added to the drug line, together with an ensemble technique, to improve the drug response prediction. An average voting ensemble is used to improve the response prediction, and up to 97.5% accuracy is obtained for the CCLE and GDSC datasets. In the future, a cascaded algorithm in stacking will be implemented to increase the accuracy.
Table 4 Performance comparison based on features
| Dataset | Drug line features | Cell line features | Accuracy percentage |
| --- | --- | --- | --- |
| GDSC dataset (before adding additional features) | Substructure, target, pathway | Expression, mutation, CNV | 96.4% |
| CCLE dataset (before adding additional features) | Substructure, target, pathway | Expression, mutation, CNV | 97% |
| GDSC dataset (after adding additional features) | Substructure, target, pathway, mechanical property, electrical property | Expression, mutation, CNV | 97% |
| CCLE dataset (after adding additional features) | Substructure, target, pathway, mechanical property, electrical property | Expression, mutation, CNV | 97.5% |
Fig. 3 Performance estimation graph
Acknowledgements This research is funded by the Indian Council of Medical Research (ICMR). (Sanction no: ISRM/12(125)/2020 ID NO.2020-5128 dated 10/01/21)
References
1. National Cancer Institute (NCI) (2019) Cancer stat facts: cancer of any site. https://seer.cancer.gov/statfacts/html/all.html
2. American Cancer Society (ACS) (2020) Key statistics for lung cancer. https://www.cancer.org/cancer/lung-cancer/about/key-statistics.html
3. Wiesweg M et al (2019) Machine learning-based predictors for immune checkpoint inhibitor therapy of non-small-cell lung cancer. Ann Oncol 30(4):655–657
4. Heo J et al (2019) Machine learning based model for prediction of outcomes in acute stroke. Stroke 50(5):1263–1265
5. Gunther M, Juchum M, Kelter G, Fiebig H, Laufer S (2016) Lung cancer: EGFR inhibitors with low nanomolar activity against a therapy resistant L858R/T790M/C797S mutant. Angewandte Chemie Int Edition 55(36):10890–10894
6. Qureshi R, Nawaz M, Ghosh A, Yan H (2019) Parametric models for understanding atomic trajectories in different domains of lung cancer causing protein. IEEE Access 7:67551–67563
7. Ikemura S, Yasuda H, Matsumoto S, Kamada M, Hamamoto J, Masuzawa K, Kobayashi K, Manabe T, Arai D, Nakachi I (2019) Molecular dynamics simulation-guided drug sensitivity prediction for lung cancer with rare EGFR mutations. Proc National Acad Sci 116(20):10025–10030
8. Lee GYH, Lim CT (2007) Biomechanics approaches to studying human diseases. Trends Biotechnol 25:111–118
9. Lim CT, Zhou EH, Li A, Vedula SRK, Fu HX (2006) Experimental techniques for single cell and single molecule biomechanics. Mater Sci Eng C 26:1278–1288
10. Ding H, Takigawa I, Mamitsuka H et al (2013) Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform 5(5):734–47
11. Zhang L, Chen X, Guan NN, Liu H, Li J-Q (2018) A hybrid interpolation weighted collaborative filtering method for anti-cancer drug response prediction. 9:1017
12. Liu C, Wei D, Xiang J, Ren F, Huang L, Lang J, Tian G, Li Y, Yang J (2020) An improved anticancer drug-response prediction based on an ensemble method integrating matrix completion and ridge regression. 21
13. Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, Yoo H, Doroshow JH, Stevens RL (2020) Ensemble transfer learning for the prediction of anticancer drug response. 10:18040
14. Chen R, Liu X, Jin S (2018) Machine learning for drug-target interaction prediction. Molecules 23(9):2208
15. Liang G, Fan W, Luo H, Zhu X (2020) The emerging roles of artificial intelligence in cancer drug development and precision therapy. 128:110255
16. Chen JIZ, Hengjinda P (2021) Early prediction of coronary artery disease (CAD) by machine learning method-a comparative study. J Artif Intell 3(01):17–33
17. Balasubramaniam V (2021) Artificial intelligence algorithm with SVM classification using dermascopic images for melanoma diagnosis. J Artif Intell Capsule Netw 3(1):34–42
18. Sachdev K, Gupta MK (2019) A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 93:103159
19. Bhardwaj R, Hooda N (2009) Prediction of pathological complete response after neoadjuvant chemotherapy for breast cancer using ensemble machine learning. 2352–9148
20. Manoharan S (2019) Study on Hermitian graph wavelets in feature detection. J Soft Comput Paradigm (JSCP) 1(01):24–32
21. Sharma A, Rani R (2019) Drug sensitivity prediction framework using ensemble and multitask learning
22. Tan M, Özgül OF, Bardak B, Ekşioğlu I, Sabuncuoğlu S (2018) Drug response prediction by ensemble learning and drug-induced gene expression signatures. Grant No. 115E274
23. Xia F et al (2021) A cross-study analysis of drug response prediction in cancer cell lines. 1–14
24. Pappala LK, Rajendran PS (2021) A novel music genre classification using convolution neural networks. IEEE conference on communication and electronics system. 7:8–10
25. Rajendran PS, Geetha A (2021) Optimization of hospital bed occupancy in hospital using double deep Q network. International conference on intelligent communication technologies and virtual mobile network (ICICV-2021), pp 4–6
26. Smys S, Chen JIZ, Shakya S (2020) Survey on neural network architectures with deep learning. J Soft Comput Paradigm (JSCP) 2(03):186–194
27. Senousy MB, El-Deeb HM, Badran K, Al-Khlil IA (Jan 2012) Ensample learning based on ranking attribute value (ELBRAV) for imbalanced biomedical data classification. 36(1). ISSN 1110-2586
28. Sharma A, Rani R (2018) Kernelized similarity based regularized matrix factorization framework for predicting anti-cancer drug responses. KSRMF 1779–1790
29. Emdadi A, Eslahchi C (2020) DSPLMF: a method for cancer drug sensitivity prediction using a novel regularization approach in logistic matrix factorization. Front Genet 11:75. pmid:32174963
30. Suphavilai C, Bertrand D, Nagarajan N (2018) Predicting cancer drug response using a recommender system. Bioinformatics 34(22):3907–3914. pmid:29868820
31. Zhang N, Wang H, Fang Y, Wang J, Zheng X, Liu XS (2015) Predicting anticancer drug responses using a dual-layer integrated cell line-drug network model. PLoS Comput Biol 11(9):e1004498. pmid:26418249
32. Surowiecki J (2014) The wisdom of crowds
33. Garnett MJ et al (2012) Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483:570–575
34. Barretina J et al (2012) The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483:603–607
35. Rajendran PS, Anithaashri TP. CNN based framework for identifying the Indian currency denomination for physically challenged people. IOP conference series: materials science and engineering for the publication
Smart Agent Framework for Color Selection of Wall Paintings
Mallikarjuna Rao Gundavarapu, Abhinav Bachu, Sai Sashank Tadivaka, G. Saketh Koundinya, and Sneha Nimmala
Abstract A smart color selection agent for wall painting is extremely useful for people selecting their desired colors. In the traditional approach to painting a house or building, a physical agent visits the customer site on behalf of the company and provides a bulky wall painting color catalog. However, this approach often results in customer dissatisfaction due to technical or manual mistakes. The 'Smart Agent Framework' developed in this paper addresses this problem by providing all details of the selected color at the customer site itself. This in turn reduces the time and human effort required of both customer and company. As our framework is built on the Python platform, the agent is robust, scalable, and portable. For our experimentation, we have created a dataset containing 860 colors. Since the agent is embedded with Google Text-to-Speech (gTTS), the customer gets an auditory response for his/her color selection. The three best matches are provided to enhance satisfaction as well as to avoid manual or technical errors.
Keywords Color detection · Text-to-Speech · RGB values
1 Introduction Color detection is the process of identifying the name of a color. The human eyes and brain work together to transform light into the perception of color: light receptors in the eye send signals to the brain, which then recognizes the color seen. Humans have mapped certain lights to color names, and a similar strategy is used here to detect color names. Any color is a mixture of the primary colors (red, green, and blue). A dataset containing color names and their values is used in this project.
M. R. Gundavarapu (B) · A. Bachu · S. S. Tadivaka · S. Nimmala Department of CSE, GRIET, Hyderabad, India e-mail: [email protected] G. S. Koundinya SAP Consultant, Bangalore, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_15
First, the user is prompted to give the image path as input, which is handled by an argument parser. The image is then read using the Python OpenCV library, and a window showing the input image opens. When the user clicks on any location in the image, the RGB values at the XY coordinates of that location are read. The shortest distance to the colors in the dataset is then determined, which retrieves the top three matching colors, and a voice announcing the detected color plays at the same time. This is achieved with the help of the Python gTTS and playsound modules. Color detection is helpful in building applications that teach colors to children and in recognizing objects, and it is also used as a tool in various image editing and drawing apps.
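The matching step described above (finding the three dataset colors closest to the clicked pixel) can be sketched as follows. This is only an illustration under assumptions: the five-entry palette stands in for the 860-color dataset, the sum of absolute channel differences is one plausible distance measure (the text does not name the metric), and the image loading, mouse handling, and speech output via OpenCV, gTTS, and playsound are deliberately left out.

```python
def color_distance(rgb_a, rgb_b):
    """Sum of absolute differences over the R, G, and B channels (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(rgb_a, rgb_b))

def top_matches(clicked_rgb, palette, count=3):
    """Return the names of the `count` palette colors closest to the clicked pixel."""
    ranked = sorted(palette, key=lambda entry: color_distance(clicked_rgb, entry[1]))
    return [name for name, _ in ranked[:count]]

# Hypothetical miniature palette of (name, (R, G, B)) entries.
PALETTE = [
    ("Red", (255, 0, 0)),
    ("Crimson", (220, 20, 60)),
    ("Tomato", (255, 99, 71)),
    ("Navy", (0, 0, 128)),
    ("Forest Green", (34, 139, 34)),
]

matches = top_matches((250, 10, 20), PALETTE)
print(matches)
```

Returning three candidates rather than one gives the customer nearby shades to compare, which is the behaviour the framework describes.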
2 Literature Survey Color is defined as the appearance of objects as a result of the various characteristics of the light that they reflect or emit. The method of recognizing the color name is known as color detection. For humans, light enters the eye and travels to the retina at the rear of the eye. The retina is covered with millions of light-sensitive cells called rods and cones, which help to identify colors. When these cells detect light, they send signals to the brain. Machines, on the other hand, do not work in the same way; they recognize colors based on the data they have. Normally, each of the red, green, and blue values is defined within a range of 0 to 255, giving approximately 16.8 million (256^3 = 16,777,216) different ways to represent a color. Humans can identify only a limited number of colors. Machines, on the other hand, can detect a wide range of colors and distinguish between different variants of the same color. Given how difficult it is for humans to identify color variants, the proposed method is particularly useful when choosing colors for wall paintings. It assists customers in selecting the exact color they require. This approach also allows children to improve their color recognition skills by recognizing different hues, and it aids blind persons in identifying colors, as it is enabled with a voice feature [1]. This application is used in image processing, digital signal processing, and object identification. The project's main goal is to act as a guide for choosing colors for wall paintings. Color choices may not be correct when people are involved. For the same color, there are numerous variants or shades, and the staff may misunderstand the color scheme chosen by the consumer. This method removes such inaccuracies and delivers not only the best color, but also two more colors that are similar to the original color. This gives customers a range of options to choose from while still getting the intended result.
The voice option is an extra feature, included because some color names are difficult for many people to pronounce. It aids accurate pronunciation and helps customers convey the color name correctly. The application also has a few other real-world uses. Along with object detection, color detection in real time is critical for a robot to view its environment.
Smart Agent Framework for Color Selection of Wall Paintings
Color detection helps self-driving cars, such as those made by Tesla, detect traffic signals [2]. It reduces human effort and can improve accuracy. Many people nowadays disregard traffic laws by jumping signals; in self-driving cars, this feature enforces compliance by stopping the vehicle at red lights. Color detection can also be used in sectors where machines perform tasks such as picking products and placing them into containers of different colors. It saves time and improves work efficiency. It is also cost effective because it requires only a one-time payment: it saves money on labor, and the risk of error is negligible compared with humans. People may make mistakes when allocating items, whereas with a machine the chance of error is minimal. Applications are common in image processing and computer vision where it is important to recognize reference points with an extreme color, such as a primary color (red, green, or blue) or a complementary color (cyan, magenta, or yellow, CMY) with very high saturation. As a result, there are instances where a group of objects can be differentiated by a distinctive extreme color, which can be used to identify them. The significance of color abstraction has been addressed in previous research [3–5].
3 Proposed System Architecture

An architectural diagram is a representation of a system that summarizes the overall outline of its programs and the associations, conditions, and divisions between elements. It gives a complete view of the concrete arrangement of the software system and its plan of evolution. The architecture depicted in Fig. 1 shows the project's characteristics. It is presented as a well-defined model that hides the source code. It helps us to measure the performance of the application, which is determined by computing the time required to find the possible colors. It also emphasizes the role of extensively used modules such as OpenCV and Pandas.
Fig. 1 System architecture
M. R. Gundavarapu et al.
The application reads the color dataset using the Pandas library. The dataset contains the R, G, and B values and the hexadecimal code of each color. These values are used for comparison and make it easy to detect the user-selected color. When the user provides an image as input, the application displays it using the draw function. When the user double clicks on any point of the displayed image, the application detects the coordinates of that point through a mouse click event and calculates the distance between the pixel's color and each color in the dataset. This distance indicates how close the selected color is to a dataset color, and the three colors with the minimum distance are chosen. The detected colors are then displayed on the screen along with the RGB values, and the user can hear the text (the name of the color along with the RGB values) as speech. This is implemented with the gTTS and playsound modules. The generated audio files are stored in mp3 format.
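The nearest-color lookup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the five-row table stands in for the CSV that the application loads with Pandas, the function name `closest_colors` is hypothetical, and the distance is the sum of absolute channel differences that appears in the chapter's pseudocode.

```python
# Hypothetical miniature of the colors dataset: (name, R, G, B, hex code).
# The real CSV is much larger; these rows are for illustration only.
COLORS = [
    ("Red", 255, 0, 0, "#ff0000"),
    ("Green", 0, 128, 0, "#008000"),
    ("Blue", 0, 0, 255, "#0000ff"),
    ("White", 255, 255, 255, "#ffffff"),
    ("Black", 0, 0, 0, "#000000"),
]


def closest_colors(r, g, b, table=COLORS, k=3):
    """Return the k color names whose RGB values are nearest to (r, g, b)."""
    def distance(row):
        _, cr, cg, cb, _ = row
        # Sum of absolute channel differences, as in the chapter's pseudocode.
        return abs(r - cr) + abs(g - cg) + abs(b - cb)

    ranked = sorted(table, key=distance)  # smallest distance first
    return [row[0] for row in ranked[:k]]


if __name__ == "__main__":
    print(closest_colors(250, 10, 5))
```

Sorting the whole table is simple and fast enough for a few hundred rows; a larger palette might call for a partial sort instead.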
4 Development Framework

4.1 Software Installation

Step-1: Visit the following website: https://www.python.org/downloads/release/python-396/
Step-2: Download the Python installer that matches your system requirements.
Step-3: Run the downloaded installer.
a. Right click on the installer.
b. Select Run as Administrator and click on Yes.
Step-4: The Python installer wizard appears.
c. Check the option Add Python 3.9 to PATH.
d. Click on Install Now.
Step-5: After the setup completes successfully, click on Close.
Step-6: Open the command prompt.
e. On the taskbar, at the bottom left, there is an option to search. Type cmd in the search box and open the command prompt.
f. Execute the following command to check that Python is properly installed.
• python --version
g. If the version is displayed, Python is properly installed.
Step-7: Execute the following command to update pip.
• pip install --upgrade pip
Step-8: Install Pandas by executing the following command.
• pip install pandas
Step-9: Install gTTS by executing the following command.
• pip install gtts
Step-10: Install playsound by executing the following command.
• pip install playsound
Step-11: Install OpenCV by executing the following command.
• pip install opencv-python
Step-12: Install argparse by executing the following command. argparse has been part of the Python standard library since Python 3.2, so with Python 3.9 this step is optional.
• pip install argparse
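After the installation steps, a short script can confirm that the modules are importable. The sketch below is an optional check, not part of the original procedure; the helper name `missing_modules` is hypothetical. Note that import names can differ from pip package names: opencv-python is imported as cv2.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip package names (opencv-python installs cv2).
    required = ["pandas", "gtts", "playsound", "cv2", "argparse"]
    print("Missing:", missing_modules(required) or "none")
```

If the list printed is not empty, the corresponding pip command from the steps above can be repeated.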
4.2 Execution

1. Image Input: The first step is to take an image as input from the user. For better results, a high-quality image with good resolution is preferred. The image path is given at the command prompt; the image must be present in the current working directory, or its full path must be specified. This step is carried out with the argparse module, which extracts the image path from the command line.
2. Loading Image: Using Python's OpenCV module, the image from the previous step is loaded into the application with the cv2.imread() function. The application and the image should be in the same directory, or the full location of the image must be specified.
image = cv2.imread(image_location)
3. Color Recognition: In this phase, the application must be taught to recognize colors. This is accomplished by establishing a dataset that contains the names of the colors together with their RGB values, which can then be matched against the RGB values obtained from the image. We used the RGB format for our data points because many colors are specified by their red, green, and blue components. We have built a dataset that contains 865 different colors.
4. Loading Dataset: The Pandas module is used to load the external dataset into the program through its read_csv() method.
5. Extracting RGB Values: When the user double clicks on any part of the image, the XY coordinates of the clicked location are extracted, and the RGB values at those coordinates are obtained either by indexing the image array or with the getpixel() method. The obtained RGB values are then passed to another function to obtain the possible colors. This process is executed when a mouse click event occurs. Note that an OpenCV image array is indexed row first (Y, then X) and stores its channels in B, G, R order, whereas getpixel() in the Pillow library takes an (X, Y) tuple.
b, g, r = image[Y, X]
(or)
RGB_value = Image.getpixel((X, Y))
6. Obtaining Best Possible Colors: The obtained RGB values are passed to another function, which calculates the minimum distance. The dataset is iterated so that the three RGB values closest to the obtained RGB values are extracted. The resulting RGB values are used to look up their color names in the dataset. The color names are appended to a list in decreasing order of priority.
7. Displaying Color Names: The color names in the list obtained above are displayed on the screen in decreasing order of priority, each on a new line. They are accompanied by the RGB values of the clicked location. This step uses OpenCV's putText function.
8. Converting Text-to-Speech: Finally, the color names, which are in text format, are converted into an audio object using the gTTS module and played using the playsound module. When the user double clicks on any region of the image, the three most probable colors and the RGB values appear and can also be heard as speech. The response time of the complete program was short compared with previous approaches, which took 1–2 min.
9. Exiting the Application: An application should not only work well but also be simple to exit. We therefore designed a feature that closes the application when the user presses the escape key.
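Steps 1 and 5 above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `parse_args` and `pixel_rgb` are hypothetical, and a nested list of (B, G, R) tuples stands in for the array that cv2.imread() would return, so that the sketch runs without OpenCV installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; -i/--image carries the image path."""
    parser = argparse.ArgumentParser(description="Pick colors from an image")
    parser.add_argument("-i", "--image", required=True, help="path to the image")
    return parser.parse_args(argv)


def pixel_rgb(image, x, y):
    """Return (R, G, B) at column x, row y of a row-major BGR image."""
    b, g, r = image[y][x]  # rows first: the click's Y selects the row
    return int(r), int(g), int(b)


if __name__ == "__main__":
    args = parse_args(["-i", "wall.jpg"])  # hypothetical file name
    tiny = [[(1, 2, 3), (4, 5, 6)],
            [(7, 8, 9), (10, 11, 12)]]    # 2 x 2 image in B, G, R order
    print(args.image, pixel_rgb(tiny, x=1, y=0))
```

The same `image[y][x]` indexing also works on the NumPy array returned by cv2.imread(), which is why the row-first order matters when passing click coordinates.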
5 Existing Approaches

There are quite a few existing approaches. One scheme improves the accuracy of face detection using the RGB color space, improving the equal error rate from 3.3% to 1.8% [6]. Another approach is based on hue-saturation-value: the RGB values at the XY coordinates of the chosen location are converted to the HSV (hue-saturation-value) color system [7] (Fig. 2). A further approach uses a threshold technique for edge detection in a color image [8]. A method has been proposed for the recursive detection of edges in color images based on a Green's function approach in the color vector space [9]. Another technique performs color detection using hierarchical neural networks in RGB space [10]. A color recognition system has been built using a finger interaction method for object color detection [11]. One approach detects color in RGB-modeled images using MATLAB [12], and another uses computer vision for color detection [13]. Color components have also been explored successfully for fire detection using color pixel classification and for relating color to the antioxidant capacity of fruits and vegetables [14, 15]. Color coding and image coding are treated in detail in [16–18].
Fig. 2 Colour compositions
6 Experimental Results

First, we use the argparse module to create an argument parser, which lets us give the location of the image directly from the command prompt. We then read the image at that location using OpenCV's 'imread' method and store it in a variable. With the Pandas module, the command (pd.read_csv) reads the dataset, which is a CSV file, and loads it into a Pandas DataFrame. We have assigned a name to each column for easy access. Next, we create a window that displays the input image. To handle user input, 'draw_function' is called whenever a mouse event occurs; it also obtains the RGB values of the selected point of the image. Its parameters are the event and the coordinates of the pixel. When the event is a double click, we obtain the RGB values of the pixel from the coordinates. We then define another function, which returns the color name from the CSV file for the obtained RGB values. To get the name of the color, we calculate a distance that tells us how close the selected color is to each dataset color and choose the one with the least distance. Finally, when a double click event occurs, we draw a rectangle using the 'cv2.rectangle' method and draw the name of the color as text on the rectangle using the 'cv2.putText' method.
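The double-click handling described above can be sketched as a small callback factory. This is an illustration under assumptions, not the authors' 'draw_function': the names `make_click_handler` and `on_pick` are hypothetical, and the event code is passed in as a parameter so the sketch runs without OpenCV. The inner function follows the mouse-callback signature that OpenCV expects (event, x, y, flags, param).

```python
def make_click_handler(dblclick_event, image, on_pick):
    """Build a callback with OpenCV's mouse-callback signature.

    dblclick_event: event code to react to (cv2.EVENT_LBUTTONDBLCLK in a
    real application); image: row-major pixels in B, G, R order;
    on_pick: hypothetical function called with (x, y, r, g, b).
    """
    def handler(event, x, y, flags, param):
        if event != dblclick_event:
            return  # ignore mouse moves, single clicks, and other events
        b, g, r = image[y][x]
        on_pick(x, y, int(r), int(g), int(b))

    return handler


if __name__ == "__main__":
    picked = []
    DOUBLE_CLICK = 7  # stand-in event code for this demonstration only
    handler = make_click_handler(DOUBLE_CLICK, [[(10, 20, 30)]],
                                 lambda *args: picked.append(args))
    handler(0, 0, 0, 0, None)             # some other event: ignored
    handler(DOUBLE_CLICK, 0, 0, 0, None)  # double click: pixel is reported
    print(picked)
```

In an application with OpenCV available, such a handler would be registered with `cv2.setMouseCallback(window_name, handler)` after creating the window, passing `cv2.EVENT_LBUTTONDBLCLK` as the event code.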
Fig. 3 Original image of wall painting
Finally, we convert the detected color name into voice using the gTTS module and play it using the 'playsound()' method. Thus, when we double click on any part of the image, we can see the name of the color along with the RGB values and also hear it. When the user presses the escape key, the application ends. The image path must be given with the '-i' argument, and the exact location of the image must be provided. Figure 3 depicts the input image given to the application, which is displayed to the user with the help of the draw function. Figure 4 depicts the output image with the three possible colors at the user-selected location. Along with the displayed text, the output is also given as voice. Here Pear, Amaranth, and Almond are the possible colors for the user-selected point. Figure 5 depicts a second input image, displayed in the same way. Figure 6 depicts the corresponding output image with the three possible colors at the user-selected location, again accompanied by voice output. Here Pine Green, Alizarin Crimson, and Alice Blue are the possible colors for the user-selected point.
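The text-to-speech step can be sketched as two small helpers. This is a sketch under assumptions: the function names, the sentence format, and the output file name are hypothetical. The `speak` helper uses the gTTS constructor and save method and the playsound function named in the chapter, and it needs network access because gTTS calls an online service.

```python
def speech_text(names, rgb):
    """Build the sentence to be spoken for the matched color names."""
    r, g, b = rgb
    return f"{', '.join(names)}. R {r}, G {g}, B {b}."


def speak(text, path="color.mp3", lang="en"):
    """Convert text to an mp3 file with gTTS and play it."""
    # Imported here so that speech_text works without these packages.
    from gtts import gTTS
    from playsound import playsound

    gTTS(text=text, lang=lang, slow=False).save(path)
    playsound(path)


if __name__ == "__main__":
    # Example values only; they are not taken from the figures.
    print(speech_text(["Pear", "Almond"], (200, 220, 50)))
```

Keeping the sentence construction separate from the audio call makes the wording easy to check without producing sound.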
Fig. 4 Output image with three possible colors of the specified location with speech
Fig. 5 Original image of wall painting
Fig. 6 Output image with three possible colors of the specified location with speech
7 Pseudocode

import cv2
import pandas as pd
from gtts import gTTS
from playsound import playsound

# Load the image in which colors are to be identified
img = cv2.imread(image_path)
# Load the dataset using Pandas
cds = pd.read_csv('colors.csv')
# Extract the R, G, B components of the selected pixel (OpenCV stores B, G, R)
b, g, r = img[y, x]
# Prepare ("train") the RGB dataset; here it is only used as a lookup table
# For each row i, compute the distance to the selected color and keep the
# three best matches (as written, this is a sum of absolute differences)
d = abs(r - int(cds.loc[i, "R"])) + abs(g - int(cds.loc[i, "G"])) + abs(b - int(cds.loc[i, "B"]))
# Generate the auditory response using gTTS and play it
myobj = gTTS(text=mytext, lang=language, slow=False)
myobj.save(name)
playsound(name)
# Display the original image with the matched color values
8 Conclusion and Future Scope

In this project, using Python modules such as Pandas, gTTS, playsound, OpenCV, and argparse, we are able to:
• Detect around 850–860 colors correctly.
• Detect the possible colors at the point selected in the input image given by the user.
• Display the detected color as text and simultaneously play the generated voice output, which reads out the name of the color.

This project can be enhanced further by converting it into an app, which would make the developed features more convenient to use, and by turning it into a machine learning model, which could make color detection more efficient and decrease the response time. Data science is mainly useful for imitating human tasks. Vision and speech are two essential components of human interaction, which data science has already begun to mimic. The main objective of this project is to detect a color and to learn about the different shades of a color. There are approximately 16.8 million (256 × 256 × 256 = 16,777,216) ways to represent a color; although we did not use all of them, this project gives an idea of the basic colors and their variations. Color detection is essential for recognizing objects, and it is also used as a tool in various drawing and image editing apps. There are color names whose correct pronunciation is not widely known, and the voice function makes it possible to hear the pronunciation in an easily understandable way. This project is helpful for applications such as segmentation, image matching, object recognition, and visual tracking in the fields of image processing and computer vision.
References

1. Navada BR, Santhosh KV, Prajwal S, Shetty HB (2014) An image processing technique for color detection and distinguish patterns with similar color: an aid for color blind people. International conference on circuits, communication, control and computing, pp 333–336. https://doi.org/10.1109/CIMCA.2014.7057818
2. Fleyeh H (2004) Color detection and segmentation for road and traffic signs. Cybernetics and intelligent systems 2004 IEEE conference on, vol 2, pp 809–814
3. Kumar ST (2021) Study of retail applications with virtual and augmented reality technologies. J Innovative Image Process (JIIP) 3(02):144–156
4. Kulshreshtha K, Niculescu AI, Wadhwa B (2017) On the design and evaluation of Nippon paint color visualizer application: a case study. IFIP conference on human-computer interaction, pp 372–376
5. Podpora M, Korbas GP, Kawala-Janik A (2014) YUV vs RGB-choosing a color space for human-machine interaction. FedCSIS position papers, pp 29–34
6. Bours P, Helkala K (Aug 2008) Face recognition using separate layers of the RGB image. Information hiding and multimedia signal processing 2008. IIHMSP 08 international conference on, pp 1035–1042
7. Sebastian P, Voon YV, Comley R (June 2008) The effect of color space on tracking robustness. Industrial electronics and applications 2008. ICIEA 2008. 3rd IEEE conference on, pp 2512–2516
8. Dutta S, Chaudhuri BB (2009) A color edge detection algorithm in RGB color space. International conference on advances in recent technologies in communication and computing, pp 337–340
9. Zareizadeh Z, Hasanzadeh RPR, Baghersalimi G (2013) A recursive color image edge detection method using Green's function approach. Optik - Int J Light Electron Optics 124(21):4847–4854
10. Altun H, Sinekli R, Tekbas U (2011) An efficient color detection in RGB space using hierarchical neural network structure. International symposium on innovations in intelligent systems and applications (INISTA), pp 154–158
11. Manaf AS, Sari RF (2011) Color recognition system with augmented concept and finger interaction. Ninth international conference on ICT and knowledge engineering, pp 118–123
12. Duth PS, Deepa MM (May 2018) Color detection in RGB-modeled images using MATLAB. Int J Eng Technol 7(2.31):29–33. ISSN 2227-524X
13. Senthamaraikannan D, Shriram S, William J (2014) Real time color recognition. Int J Innovative Res Electr, Electron, Instrum Control Eng 2(3)
14. Manan A, Bakri NS, Adnan R, Samad, Ruslan FA. A methodology for fire detection using colour pixel classification. 2018 IEEE 14th international colloquium on signal processing and its applications (CSPA)
15. Comert ED, Mogol BA, Gokmen V (June 2020) Relationship between color and antioxidant capacity of fruits and vegetables. Curr Res Food Sci, Elsevier 2:1–10
16. Pasumpon (2021) Review on image recoloring methods for efficient naturalness by coloring data modeling methods for low visual deficiency. J Artif Intell 3(03):169–183
17. Bianco S, Gasparini F, Schettini R (2015) Color coding for data visualization. Encyclopedia of information science and technology, pp 1682–1691
18. Sathesh A (2020) Light field image coding with image prediction in redundancy. J Soft Comput Paradigm 2(3):160–167
Brain MR Image Segmentation for Tumor Identification Using Hybrid of FCM Clustering and ResNet P. S. Lathashree, Prajna Puthran, B. N. Yashaswini, and Nagamma Patil
Abstract The current study focuses on brain tumor segmentation, which is a relevant task in medical image processing. Diagnosing a brain tumor at an early stage widens the treatment possibilities and helps improve survival rates. Manually segmenting brain tumors from the large number of magnetic resonance imaging (MRI) images produced in laboratories for cancer diagnosis is hard and time-consuming. Hence, there is a need for automated segmentation of brain images for tumor identification. In this paper, we attempt to address the existing problems by segmenting images with the fuzzy C-means (FCM) clustering method and then classifying the brain MR images with the ResNet-50 model. A dataset of brain MR images provided by Kaggle is used for training and testing the model. The image segmentation results are compared with those of the K-means clustering method using various performance metrics. The proposed method reaches a maximum accuracy of 91.18% at 350 epochs.

Keywords Brain image segmentation · Fuzzy C-means clustering method · ResNet-50 · Tumor identification · MRI images
P. S. Lathashree (B) · P. Puthran · B. N. Yashaswini · N. Patil
Department of Information Technology, National Institute of Technology, Surathkal, Karnataka 575025, India
e-mail: [email protected]
N. Patil e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_16

1 Introduction

With the expanding use of computed tomography (CT) and magnetic resonance imaging (MRI), it is practically mandatory to use computers to help clinical experts with diagnosis, treatment planning, and clinical studies [1]. Segmentation is the process of separating an image into regions with similar properties such as gray level, brightness, texture, color, and contrast. The
methods currently available for the segmentation of medical images are specific to the application, the imaging modality, and the body part under study. Brain tumors are generally grouped into two types: benign tumors, which are slow-growing and non-cancerous, and malignant tumors, which are cancerous tissues that grow at a fast pace. The majority of the tumors that occur are dangerous. Therefore, quick diagnosis is needed, for which tumor detection with the highest precision is recommended. Diagnostic methods are of various types, of which imaging is widely used; it includes X-rays, ultrasound, and MRI. Earlier, detection was performed with many conventional methods, which have now been replaced by computational and machine learning methods [7, 10]. This is because manual methods proved to be time-consuming and inconsistent, with room for error, and they depend on skilled and reliable individuals. Imaging tumors more precisely plays a crucial role in their diagnosis. MRI, one of the most important means of examining the internal structures of the body, is more sensitive in detecting brain abnormalities during the early stages of disease and gives the best results in recognizing dead cerebral tissue, tumors, and infections. Brain images acquired with MRI are prone to noise and artifacts, for example, labels and intensity variations introduced during acquisition. In addition, a brain image contains numerous structures apart from the tumor, for example, cerebrospinal fluid, gray matter, white matter, and skull tissue. Despite certain advantages, the most common problem with conventional systems is their inability to predict tumors with high accuracy.
A wide variety of approaches have been used for MRI segmentation [13, 14], and they can be extended further to obtain more accurate results. One important objective of segmentation is the detection of abnormalities in the image, and the effectiveness of abnormality detection depends on the segmentation algorithm used. Neural networks have proved to be successful in image segmentation [18]. However, they rely on simple feed-forward methods, which have not met the real needs of the rapidly growing medical field. Many methods based on neural networks tend to give better results on images than basic methods [16]. Another family of methods is clustering, in which a set of items is grouped into contiguous regions of varying densities. The data points are grouped in such a way that points with similar features belong to the same group and points in different groups are highly dissimilar. Unlike hard clustering algorithms, where each data point belongs to exactly one cluster, in soft clustering algorithms a data point may belong to more than one cluster, which makes the latter more natural and essential [3]. Among clustering methods, FCM-based algorithms have been prominent in medical image segmentation because they introduce fuzziness in the assignment of each pixel of an image to a cluster. This enables them to retain more information from the original image than hard clustering methods. Considering the above points, FCM is chosen as the segmentation algorithm, and ResNet-50 is chosen as the classification model in our proposed method for brain
MR image segmentation. The purpose of this study is to address the limitations of existing strategies mentioned above and to improve the accuracy of tumor identification using the ResNet-50 model. Clustering with the FCM algorithm improves the segmentation performance. Segmentation is performed on the images of the chosen dataset, and the segmented images are then classified using ResNet-50. The remaining sections of the paper are organized as follows. Related work and motivation are discussed in Sects. 1.1 and 1.2, respectively, followed by the proposed methodology in Sect. 2. Section 3 contains results and discussion. Finally, Sect. 4 presents the conclusion and future work.
1.1 Related Works

Several existing techniques are used for medical image segmentation. They follow two basic approaches to segmentation, namely region-based and edge-based approaches, and each technique can be applied to different images to perform the required segmentation. Aslam et al. [2] proposed a modified brain tumor segmentation method using edge detection. They combined the Sobel edge detection method with an image-dependent thresholding method, which uses a closed contour algorithm to find different regions. Using the intensity information in the closed contours, the tumors are finally extracted from the image. This was one of the basic methods of image segmentation, and it provided decent results. Moeskops et al. [12] suggested a brain image segmentation strategy that uses a convolutional neural network (CNN) to classify the image into different tissue groups. The CNN uses various convolution kernel sizes and different patch sizes to obtain multi-scale details about every voxel, which ensures accurate segmentation details along with spatial accuracy. The system does not rely on hand-specified features; instead, it learns from training data to recognize the information required for classification. However, this strategy did not work for brain images of all types but only for a single anatomical brain image. Selvaraj et al. [17] reviewed various approaches to brain MR image segmentation, such as thresholding, classification techniques, clustering techniques, region-based segmentation methods, edge-based segmentation methods, and various feed-forward and feedback artificial networks. These are analyzed and reviewed, with detailed inferences about how they work and their benefits and drawbacks.
They state that it is quite difficult to find a common method that can be used for all types of MRI brain images, because a segmentation method that gives good results for one MRI brain image cannot guarantee good results for other images of the same type. Pham et al. [15] described image segmentation in different imaging modalities along with the difficulties encountered in each modality. They also reviewed image segmentation approaches with an emphasis on revealing the advantages and disadvantages of these methods for medical imaging applications. Chen et al. [5] suggested Dense-Res-Inception Net (DRINet), a novel CNN architecture. The proposed method addresses the problem of learning distinctive features when the categories differ in intensity, shape, size, and location. The network is made up of three blocks: a densely connected convolution block, a deconvolution block with residual inception modules, and an unpooling block. It surpasses U-Net in brain tumor segmentation. Samuel [11] described various extreme learning machine (ELM) algorithms that are useful for different classification problems and compared their accuracy and execution times. The chapter describes methods to improve a machine learning algorithm in order to achieve better accuracy. Karuppusamy [9] suggested a novel optimization model for brain tumor detection. A foraging process is incorporated into the model, and the best features are obtained to analyze the image. The author also stated that the proposed optimized hybrid model is superior to other conventional methods. It is inferred that many methods fail to work on all types of brain images and are restricted to a few anatomical brain images. Methods that use basic approaches give decent results, while hybrid optimized models achieve better accuracy than conventional methods.
1.2 Motivation and Contribution

1. After going through various works and methods, we infer that deep learning with CNNs is successful in the automatic segmentation of medical images.
2. The major drawbacks observed in the segmentation process are the poor quality of images and the variation of images among patients; these are sensitive complications that need to be taken care of.
3. Most of these algorithms need comprehensive monitoring and extensive training. The efficiency of an algorithm depends on the training method and the training data.
4. The testing and training time must also be managed by using a reasonable number of parameters.
5. Under certain circumstances, difficulties arise in correctly selecting and labeling data, and also in segmenting complex structures of different shapes, properties, and sizes. Hence, an unsupervised method such as the FCM clustering method is suitable in such situations.
2 Methodology

The proposed approach is shown as a flow diagram in Fig. 1, followed by a detailed description of each step in the process. Figure 2 shows image segmentation using the FCM algorithm, where one of the images containing
Fig. 1 Proposed method for brain tumor identification
the tumor is processed by the FCM algorithm to produce the output image shown in the figure.
1. Conversion of the input images to a pixel matrix: Initially, the brain tumor images are given as input to the FCM clustering method and converted into a pixel matrix according to the pixel values of the images. The dimensions of the original image determine the size of the matrix. This conversion allows the subsequent mathematical operations for segmentation to be carried out on the image. It is a crucial step because it decides the output image; any error made here will reduce the accuracy.
2. Segmentation of the image using FCM clustering: The brain image is segmented using the FCM clustering method.
(a) Initialize the clusters: Ten clusters are formed randomly using the cluster centers, and all the pixel values of the image are grouped into the corresponding
Fig. 2 Image segmentation using FCM
clusters using the membership function. All the pixel points of the image are considered, and in the FCM algorithm the contribution of each point to a centroid is weighted by its membership degree. The steps involved in the FCM clustering algorithm are as follows. Let $X_P = \{x_1, x_2, x_3, \ldots, x_n\}$ and $V_P = \{v_1, v_2, v_3, \ldots, v_c\}$ be the sets of data points and cluster centers, respectively.
i. Select $c$ cluster centers randomly. The objective function to be minimized is
$$H(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (u_{ij})^{m} \, \lVert x_i - v_j \rVert^{2} \tag{1}$$
where $m$ is the fuzziness index, $m \in [1, \infty]$, and $u_{ij}$ is the membership of the $i$th data point to the $j$th cluster center.
ii. Calculate the fuzzy membership matrix $U$ by computing $u_{ij}$ using
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{2}$$
where $d_{ij}$ is the Euclidean distance between the $i$th data point and the $j$th cluster center, and $d_{ik}$ is the Euclidean distance between the $i$th data point and the $k$th cluster center.
iii. Compute the fuzzy centers $v_j$ using
$$v_j = \frac{\sum_{i=1}^{n} (u_{ij})^{m} x_i}{\sum_{i=1}^{n} (u_{ij})^{m}}, \quad \forall j = 1, 2, \ldots, c \tag{3}$$
where $v_j$ represents the $j$th cluster center.
Brain MR Image Segmentation for Tumor Identification …
iv. Steps ii and iii (Eqs. 2 and 3) are repeated until the value of H is minimized or $\lVert U^{(k+1)} - U^{(k)} \rVert < \delta$, where δ is the termination factor, which lies in [0, 1].

(b) Selection of finest clusters: After the cluster formation, the finest clusters, namely those containing the highest number of pixels, are considered. Taking into consideration the lower Davies–Bouldin (DB) index [6] value, as shown in Table 2, and a better segmentation result, the finest clusters are chosen and colors are assigned to the points of each selected cluster in the output segmented image.

(c) Validation of clusters' compactness: The cluster validity and compactness are analysed using the DB index, whose value decreases as the clusters become more compact. A low DB index value therefore indicates valid and compact clusters.

(d) Segmented image given as output: Each of the pixels in the selected clusters is written to an output image in which the clusters' pixel values are changed to white, black, and blue in decreasing order of pixel count, which gives the segmented image in three colors.

3. Image classification using ResNet-50: The output images of the segmentation process are fed to the ResNet-50 model, which classifies them into two classes, namely 'yes' and 'no'. ResNet models have proved to perform better than some of the other classification methods. ResNet-50 was chosen because deep neural networks are difficult to train and slow to converge, which degrades accuracy [8]. Residual learning, as used by the ResNet model, resolves these problems: the network converges more rapidly and can easily learn rich feature representations for a wide range of images. ResNet follows the concept of the skip connection: it stacks convolution layers and also adds the original input to the output of the convolution block. The skip connection is applied before the ReLU activation. Figure 3 shows how skip connections are made in ResNet [8]. ResNet addresses the vanishing-gradient problem through the identity mapping, along which back-propagation is carried out by the identity function. This helps in preserving the input information and avoiding any loss of information. The ResNet-50 model comprises five stages, each with a convolution block and identity blocks. Figure 4 shows the ResNet-50 model [8]. Skip connections overcome the vanishing-gradient issue by providing an alternate path for the gradient to flow through. They also enable the model to learn an identity function, which guarantees that a higher layer performs at least as well as the lower layer, and not worse. Table 1 shows the parameter details of the ResNet-50 model used in the proposed approach. The train–test split was chosen to avoid overfitting the model by training on the entire dataset.
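The FCM iteration described in steps i to iv (Eqs. 1–3) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_c_means`, the defaults m = 2 and δ = 1e-5, the iteration cap, and the choice of randomly picked data points as initial centers are all assumptions made for the example.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, delta=1e-5, max_iter=300, seed=0):
    """Cluster the rows of x (shape (n, d)) into c fuzzy clusters.

    Returns (v, u): centers of shape (c, d) and memberships of shape (n, c).
    All default values are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Step i: pick c distinct data points as the initial centers (assumption).
    v = x[rng.choice(n, size=c, replace=False)]
    u = np.zeros((n, c))
    for _ in range(max_iter):
        # d[i, j] is the Euclidean distance between point i and center j.
        d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        # Step ii (Eq. 2): u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        # Step iii (Eq. 3): centers are membership-weighted means of the data.
        um = u_new ** m
        v = (um.T @ x) / um.sum(axis=0)[:, None]
        # Step iv: stop once the membership matrix changes by less than delta.
        converged = np.linalg.norm(u_new - u) < delta
        u = u_new
        if converged:
            break
    return v, u
```

For a grayscale image, `x` would be the pixel matrix flattened to shape (n, 1), for example `image.reshape(-1, 1)`, and a hard label per pixel can be taken as `u.argmax(axis=1)`.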
Fig. 3 Skip connection in ResNet

Fig. 4 ResNet-50 model

Table 1 Parameter specification of ResNet-50 model

| Parameters | Values |
|---|---|
| No. of epochs | 350 |
| Batch size | 10 |
| Train–Test | 4:1 |
| Loss | Categorical crossentropy |
| Optimizer | Adam |
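The skip connection of Fig. 3 can be illustrated with a small NumPy sketch. It is only a toy stand-in under stated assumptions: the residual branch F(x) uses two dense layers instead of convolutions, and the names `residual_block`, `w1`, and `w2` are hypothetical. It shows that the input is added to the branch output before the final ReLU, so a block whose weights are zero reduces to the identity for non-negative inputs.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Compute y = ReLU(F(x) + x), with F a two-layer branch (dense, for illustration)."""
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(f + x)     # skip connection: add the input, then apply ReLU

x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))
# With zero weights F(x) = 0, so the block passes the input through unchanged.
y_identity = residual_block(x, zeros, zeros)
```

Because the identity path is always available, adding such a block cannot make the representation worse than that of the layer below it, which is the property the text attributes to ResNet.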
3 Results

3.1 Dataset Description

The dataset for brain tumor detection provided by Kaggle, comprising MRI images of the brain [4], was used in the proposed method. The dataset contains 252 brain MRI images where 98 images are without brain tumor and 155 images are with brain tumor. Among 252 images, 184 images were used for training and the remaining were used for testing the ResNet-50 model. Figure 5a, b shows example images from the dataset with brain tumor and without brain tumor, respectively, with the brain tissues.
Fig. 5 Dataset MRI images [4]: (a) image with brain tumor; (b) image without brain tumor
3.2 Performance Metrics

Performance metrics used in the proposed model are as follows:

1. DB index, a measurement used in many clustering algorithms, in which objects are grouped based on intra-cluster similarity and inter-cluster difference. Equation 4 is used for computing the DB index value.

\[ \text{DB index}(U) = \frac{1}{k} \sum_{i=1}^{k} \max_{i \neq j} \left\{ \frac{\Delta(X_i) + \Delta(X_j)}{\delta(X_i, X_j)} \right\} \tag{4} \]

where k is the number of clusters, $\delta(X_i, X_j)$ is the inter-cluster distance between clusters $X_i$ and $X_j$, and $\Delta(X_k)$ is the intra-cluster distance of cluster $X_k$, that is, the distance within the cluster.

2. Performance metrics used for the classification model are as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{6} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{7} \]
Table 2 DB index for different number of clusters

| Number of clusters | DB index |
|---|---|
| 5 | 0.300 |
| 10 | 0.208 |
| 12 | 0.186 |
| 15 | 0.161 |
| 20 | 0.133 |
| 30 | 0.103 |
\[ \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8} \]
In the above equations, TP, TN, FP, and FN are True Positive, True Negative, False Positive, and False Negative, respectively.
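The classification metrics of Eqs. 5–8 can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name is hypothetical, and the counts in the usage line are an assumption: the paper does not report a confusion matrix, and TP = 60, TN = 2, FP = 3, FN = 3 were chosen only because they sum to the 68 test images and reproduce the 350-epoch row of Table 3.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. 5
    precision = tp / (tp + fp)                           # Eq. 6
    recall = tp / (tp + fn)                              # Eq. 7
    f1 = 2 * precision * recall / (precision + recall)   # Eq. 8
    return accuracy, precision, recall, f1

# Inferred (not reported) counts consistent with the 350-epoch FCM row of Table 3.
acc, p, r, f1 = classification_metrics(tp=60, tn=2, fp=3, fn=3)
print(f"accuracy={acc:.2%} precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
```

With these counts, accuracy evaluates to 62/68 and precision and recall to 60/63, which match the tabulated 91.18% and 95.24% after rounding.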
3.3 Results and Analysis

Figure 6a, b shows the output segmented image results. Figure 6a clearly shows the brain tumor region in the left center of the brain. Figure 6b clearly shows that no brain tumor exists in the brain. Homogeneous colors in the segmented image correspond to separate clusters and hence similar regions in the image.
Fig. 6 Segmentation results: (a) result of Fig. 5a; (b) result of Fig. 5b
Table 2 shows the DB index for different numbers of clusters; the DB index value keeps decreasing as the number of clusters increases. The lowest DB index value is 0.103, with 30 clusters, and the highest is 0.300, with 5 clusters. However, the best clusters are those whose intra-cluster similarity is high and whose inter-cluster similarity is low. Homogeneity inside the clusters and heterogeneity among the clusters are important factors which affect the DB index value. Clustering is better if the DB index value is lower, but a lower value recorded by this approach may not always mean the best segmentation. Taking into consideration both the DB index value and the segmentation results, the number of clusters for FCM clustering is therefore chosen as ten. The results of FCM clustering for image segmentation comprise the pixel matrix of the image, the cluster centers of all ten clusters, and the DB index value for the finest clusters selected at the end, which are then rendered as the output image. Table 3 shows the performance of the ResNet-50 model with a varying number of epochs. The numbers of images used for training and testing are 184 and 68, respectively. In the proposed method, accuracy is used as the main metric for validating the model's performance and for comparison with other methods, as it is one of the best performance metrics for any classification model [18]. A maximum training accuracy of 81.08% was observed at 350 epochs. Figure 7a, b shows bar graphs of the precision, recall, and F1-score values at different epochs for the FCM and K-means models, respectively. As these metrics measure the quality of the predictions, higher values indicate a more accurate and precise model.
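The DB index of Eq. 4 can be sketched for a hard assignment of points to clusters as follows. The paper does not state how the intra-cluster distance is measured, so this example assumes the common choice of the mean Euclidean distance of a cluster's points to its centroid, with the inter-cluster distance taken between centroids; the function name is hypothetical.

```python
import numpy as np

def davies_bouldin(x, labels):
    """DB index for data x (shape (n, d)) and integer cluster labels (shape (n,))."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centers = np.array([x[labels == i].mean(axis=0) for i in ids])
    # Intra-cluster distance (assumption): mean distance of members to their centroid.
    scatter = np.array([
        np.linalg.norm(x[labels == i] - c, axis=1).mean()
        for i, c in zip(ids, centers)
    ])
    k = len(ids)
    total = 0.0
    for i in range(k):
        # Worst-case ratio of summed scatter to centroid separation for cluster i.
        total += max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
            for j in range(k) if j != i
        )
    return total / k

points = np.array([[0.0], [2.0], [10.0], [12.0]])
compact = davies_bouldin(points, [0, 0, 1, 1])  # two tight, well-separated clusters
mixed = davies_bouldin(points, [0, 1, 0, 1])    # clusters that interleave
```

Here `compact` is smaller than `mixed`, illustrating that a lower DB index corresponds to more compact and better separated clusters.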
3.4 Comparison of the Results

In our study related to medical image segmentation, we compared the FCM clustering method with the K-means clustering method [19]. The comparison shows that, in the long run, FCM performs better than K-means for image segmentation. A maximum training accuracy of 81.08% with a testing accuracy of 91.18% was observed using the FCM clustering method at 350 epochs for the input dataset, while the K-means clustering method resulted in 68.42% training accuracy with a testing accuracy of 76.12% at 350
Table 3 Performance of ResNet-50 model for FCM (in %)

| Epochs | TrA | TeA | P | R | F |
|---|---|---|---|---|---|
| 20 | 78.37 | 58.82 | 100 | 55.55 | 71.43 |
| 45 | 75.67 | 66.18 | 97.62 | 65.08 | 78.09 |
| 50 | 75.68 | 77.94 | 96.15 | 79.36 | 86.96 |
| 100 | 73.16 | 76.47 | 96.08 | 77.77 | 85.96 |
| 350 | 81.08 | 91.18 | 95.24 | 95.24 | 95.24 |

TrA = Training Accuracy, TeA = Testing Accuracy, P = Precision, R = Recall, F = F1 Score
Fig. 7 Precision, recall, and F1-score graph: (a) FCM; (b) K-means

Table 4 Performance of ResNet-50 model for K-means (in %)

| Epochs | TrA | TeA | P | R | F |
|---|---|---|---|---|---|
| 20 | 73.68 | 59.70 | 100 | 56.45 | 72.16 |
| 45 | 62.16 | 71.64 | 100 | 69.35 | 81.90 |
| 50 | 68.42 | 86.57 | 98.18 | 87.09 | 92.31 |
| 100 | 75.33 | 88.06 | 98.21 | 88.71 | 93.22 |
| 350 | 68.42 | 76.12 | 100 | 74.19 | 85.18 |

TrA = Training Accuracy, TeA = Testing Accuracy, P = Precision, R = Recall, F = F1 Score
epochs. Both techniques are compared over different numbers of epochs, and the results are plotted in Figure 9a, b. The results show that FCM gives better accuracy than K-means at 350 epochs when used for brain MRI image segmentation. Table 4 shows the performance of the ResNet-50 model for the K-means clustering method.
Fig. 8 ROC curve
Figure 8 shows the receiver operating characteristic (ROC) curve, which is a performance measurement for classification problems at various threshold values. The area under the curve (AUC) indicates the usefulness of the tests carried out for a model, where a greater area means a more useful test. Initially, K-means results in better performance, but in the long run FCM performs better.
4 Conclusion and Future Works

The paper focuses on brain image segmentation for tumor identification using the FCM clustering technique and further binary classification of tumors using the ResNet-50 model. The model is formulated by feeding the images segmented by the FCM algorithm to the ResNet-50 model. Brain MRI images were considered for the experiment. The clusters' compactness was validated using the DB index, which yielded low values. The images obtained are then fed to the ResNet-50 model. A comparative study has been made between the FCM and K-means models, and various performance metrics were used to evaluate them. The study shows that FCM, being a soft clustering method, performs better than K-means, which is a hard clustering algorithm. Outcomes show that the proposed technique works well in classifying the test images, with an accuracy of 91.18%. Future work is to improve the image segmentation model so that better-segmented images can be passed to the classification model to obtain better accuracy than the currently proposed method.
Fig. 9 Comparison of accuracy results: (a) training accuracy; (b) testing accuracy
References

1. Alqazzaz S, Sun X, Yang X, Nokes LDM (2019) Automated brain tumor segmentation on multi-modal MR image using SegNet. Comput Visual Media 5:209–219
2. Aslam A, Khan E, Beg MS (2015) Improved edge detection algorithm for brain tumor segmentation. Procedia Comput Sci 58:430–437. In: Second international symposium on computer vision and the internet (VisionNet'15)
3. Bora D, Gupta D (2014) A comparative study between fuzzy clustering algorithm and hard clustering algorithm. Int J Comput Trends Technol 10:108–113
4. Chakrabarty N (2019) Brain MRI images for brain tumor detection. Accessed 20 Oct 2019
5. Chen L, Bentley P, Mori K, Misawa K, Fujiwara M, Rueckert D (2018) Drinet for medical image segmentation. IEEE Trans Med Imaging 37(11):2453–2462
6. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell PAMI-1(2):224–227
7. Despotović I, Goossens B, Philips W (2015) MRI segmentation of the human brain: challenges, methods, and applications. Comput Math Methods Med 2015:1–23
8. Jay P (2018) Understanding and implementing architectures of ResNet and ResNeXt for state-of-the-art image classification: from microsoft to facebook [Part1]. Accessed 12 Nov 2019
9. Karrupusamy D (2020) Hybrid manta ray foraging optimization for novel brain tumor detection. J Trends Comput Sci Smart Technol 2:175–185
10. Kurmi Y, Chaurasia V (2018) Multifeature-based medical image segmentation. IET Image Process 12(8):1491–1498
11. Manoharan J, Samuel D (2021) Study of variants of extreme learning machine (elm) brands and its performance measure on classification algorithm. J Soft Comput Paradigm (JSCP) 03:83–95
12. Moeskops P, Viergever MA, Mendrik AM, de Vries LS, Benders MJNL, Išgum I (2016) Automatic segmentation of MR brain images with a convolutional neural network. IEEE Trans Med Imaging 35(5):1252–1261
13. N, S, Rajesh R (2011) Brain image segmentation. https://www.researchgate.net/publication/252069542_Brain_Image_Segmentation
14. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
15. Pham DL, Xu C, Prince JL (2000) Current methods in medical image segmentation. Annu Rev Biomed Eng 2(1):315–337. PMID: 11701515
16. Queen M (2014) A survey on brain tumor classification using artificial neural network
17. Selvaraj D, Dhanasekaran R, Ammal S (2013) MRI brain image segmentation techniques: a review
18. Sharma N, Jain V, Mishra A (2018) An analysis of convolutional neural networks for image classification. Procedia Comput Sci 132:377–384. In: International conference on computational intelligence and data science
19. Wilkin GA, Huang X (2007) K-means clustering algorithms: Implementation and comparison. In: Second international multi-symposiums on computer and computational sciences (IMSCCS 2007), pp 133–136
A Review on Automated Algorithms Used for Osteoporosis Diagnosis

Gautam Amiya, Kottaimalai Ramaraj, Pallikonda Rajasekaran Murugan, Vishnuvarthanan Govindaraj, Muneeswaran Vasudevan, and Arunprasath Thiyagarajan
Abstract Osteoporosis is a major public health issue requiring significant resources to address the immediate and long-term consequences of fractures. Only a few research studies have been conducted on the efficacy and importance of various diagnostic factors in the clinical evaluation of patients. This research compares the performance of several state-of-the-art techniques for predicting osteoporosis risk from multimodal images. There is a large body of literature on segmentation employing region growing, conventional machine learning, and deep learning techniques. Likewise, several tasks in the domain of bone disease detection have been completed with outstanding performance outcomes. This study aims to present a comprehensive assessment of recently proposed strategies, considering state-of-the-art methodologies and their performance. The existing methods addressed in this analysis also cover technological aspects such as the pros and cons of methods, pre- and post-processing methodologies, datasets, feature extraction, and standard evaluation metrics.
G. Amiya
Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
e-mail: [email protected]

K. Ramaraj (B) · P. R. Murugan · M. Vasudevan
Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
e-mail: [email protected]
P. R. Murugan e-mail: [email protected]
M. Vasudevan e-mail: [email protected]

V. Govindaraj · A. Thiyagarajan
Department of Biomedical Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
e-mail: [email protected]
A. Thiyagarajan e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_17
G. Amiya et al.
Keywords Osteoporosis · Bone mineral density · Soft computing techniques
1 Introduction

Osteoporosis (OP) is a common bone disease that occurs due to low bone mineral density [1]. Asians, for example, have a higher risk of developing OP, especially once they reach the age of 35. Older women are more vulnerable to OP as they live longer and can have other illnesses too. Women over the age of 65 and men over 70 should have their bone densitometry checked regularly [2]. Osteopenia is the precursor to OP, characterised by a bone mineral density (BMD) that is lower than normal but not as low as in OP [1]. Low BMD is strongly linked to OP. The BMD test determines the amount of calcium and other minerals present in the bone [3]. The higher the mineral concentration, the healthier the bone: it is denser, harder, and less prone to fracture. If the mineral concentration is low, the bone may break. Quantitative ultrasound, single-energy X-ray absorptiometry (SEXA), magnetic resonance imaging (MRI), quantitative computed tomography (QCT), and dual-energy X-ray absorptiometry (DEXA/DXA) are the scanning methods already available for evaluating BMD. Figure 1 portrays sample multimodal images of normal and OP bone. Currently, BMD is measured using DEXA scans, a standard recommended by the World Health Organisation (WHO). The output falls into three groups: normal, osteopenic, and osteoporotic bones. WHO divides the disease into these three categories based on T-score values. A T-score above −1.0 indicates normal bone, −1.0 to −2.5 indicates osteopenia, and below −2.5 indicates OP [2]. Calculating BMD to detect OP requires precise segmentation of the multimodality images obtained from the above-mentioned scanning techniques. For OP image segmentation, several methods have been applied. Researchers in biomedical engineering have used image processing and machine learning algorithms to separate bone tissue from soft tissue to assess fracture risk [4–7]. Image segmentation can be accomplished using a variety of approaches.
Segmentation can be done using methods like (a) thresholding [8], (b) region-based [9], (c) clustering [10], and (d) hybrid [11, 12]. The traditional procedures are tried and tested, straightforward to use, and produce precise outcomes for segmentation problems. This study addresses the various soft computing techniques that researchers have applied to osteoporotic images for better segmentation and discusses their respective results. Figure 2 illustrates the various state-of-the-art methods used for OP detection.
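The T-score categories described above can be written as a small lookup, sketched below. The function name is hypothetical, and the boundary handling follows the thresholds exactly as stated in this section (a score of exactly −1.0 or −2.5 falls in the osteopenia range); clinical guidelines may treat the boundaries differently.

```python
def who_category(t_score):
    """Map a DEXA T-score to the three categories described in the text."""
    if t_score > -1.0:
        return "normal"
    if t_score >= -2.5:
        return "osteopenia"
    return "osteoporosis"

# Illustrative T-scores only, not patient data.
categories = [who_category(t) for t in (-0.5, -1.8, -3.1)]
```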
Fig. 1 Sample multimodal images of normal and OP (columns: normal bone and osteoporotic bone; rows: X-ray, CT, MRI, and DEXA)
2 Segmentation of Osteoporosis Using Soft Computing Methods

2.1 Structuring Soft Computing Approaches for Segmentation of X-ray Images

X-ray aids in the treatment of a variety of diseases. A clinician evaluates the condition of the bones by visual interpretation, which may cause normal bone to be misclassified as abnormal, and vice versa. As a result, a variety of automated algorithms have been applied to rectify the problem of identifying OP in X-rays. Many studies and methodologies have been established to detect BMD from bone X-ray images. Sikkandar et al. [13] employed the local centre of mass method on X-ray images to find the stage of arthritis in the knee, which helps in diagnosis. Nguyen et al. [14] developed a Sobel algorithm-based CNN for hip radiographs. The outputs of the CNN models for three Singh index sites and other variables such as height, age, and weight were combined in an ensemble artificial neural network (ANN).
Fig. 2 Soft computing methods for OP detection
Fathima et al. [15] presented a modified U-net model for efficient segmentation of bone disease from DXA and X-ray images. Zheng et al. [16] proposed an algorithm to train a BMD regression model using a CNN on X-ray images, in which labelled images carry DEXA-measured BMDs and unlabelled images carry pseudo-BMDs. They suggested a new adaptive triplet loss technique to achieve better regression accuracy. Singh et al. [17] suggested a statistical feature extraction method combined with four classification approaches, namely an ANN, Naive Bayes, k-nearest
Table 1 Studies on various methods applied to detect OP in X-rays

| Author name | Methodology | Dataset | Result | Limitations |
|---|---|---|---|---|
| Sikkandar et al. [13] | CLAHE, local centre mass | 80 clinical X-ray images | DC = 0.59 ± 0.02; Memory = 250 ± 30 (KB) | Less DC value |
| Nguyen et al. [14] | Sobel gradient-based map, CNN | 510 hip images | High correlation coefficient of 0.807; Computation time = 0.12 s | Need to increase accuracy |
| Fathima et al. [15] | U-Net, FCNN | 78 X-ray | Accuracy = 88%; DC = 0.92 | – |
| Zheng et al. [16] | ROI, Deep adaptive graph (DAG) network | 1090 images | PC = 0.8805 | Should specify more metrics |
| Singh et al. [17] | Discriminatory statistical features, SVM, KNN, Naive Bayes, ANN | 174 women aged 40–92 years | SVM Accuracy = 97.87%; Sensitivity = 100; Specificity = 95.74 | – |
| Geng et al. [18] | DCNN | 48 males and 42 females with rheumatoid arthritis | Accuracy = 91%; Sensitivity = 98%; False-negative rate = 2% | – |
neighbour, and support vector machine (SVM), to classify the healthy and osteoporotic X-ray images obtained from Orleans University. The defective calcaneus bone is identified and classified with a 98% classification rate by the SVM classifier, which performed better than the other three classifiers. Geng et al. [18] analysed X-ray images and identified the diseased bone with the help of a DCNN for quick diagnosis. Table 1 summarises the studies on various methods applied to detect OP in X-rays.
2.2 Soft Computing Approaches for Segmentation of CT, HR-pQCT and Micro-CT Images

Various CT methods are used to estimate BMD at the lumbar spine. CT employs a low radiation dose and delivers low-contrast images in less time. As a remedy, some researchers have applied soft computing methods to various types of CT images; their performance is discussed below.
Requist et al. [19] developed a semi-automatic, threshold-based segmentation technique to dissect the cuneiform bones from micro-CT images of the midfoot and estimated the BMD. Valentinitsch et al. [20] proposed an algorithm based on fourfold texture analysis to classify osteoporotic patients with and without vertebral bone fracture. The bone features are extracted using wavelets (WL), local binary patterns (LBP), histogram of gradients (HoG), and grey-level co-occurrence matrix Haralick features (HAR). Subsequently, a random forest (RF) classifier using 3-D texture features combined with trabecular BMD features showed high potential for distinguishing patients with low bone quality who are vulnerable to vertebral fractures. Vivekanandan et al. [21] developed a segmentation method to dissect the trabecular and cortical bone from CT images using an active contour model (ACM) and the minimum distance boundary method. Kim et al. [22] introduced an optimisation method to visualise the microstructure of the bone in CT images using the volume of interest and Wolff's law. Xu et al. [23] used SVM and KNN classifiers to classify normal and osteoporotic micro-CT images after extracting the features. Uemura et al. [24] suggested an automated CNN-based technique for segmenting the phantom sections found in CT scans for bone disease detection, which produced improved DC findings. Pan et al. [25] suggested a deep learning (DL)-based algorithm to identify and visualise the defective bone in low-dose CT images and predict the time of future fracture. To find and extract the features of the damaged bone region in CT, Tang et al. [26] suggested a CNN-based dissection method. They used mark segmentation and the ROI approach to get better results. Schmidt et al. [27] introduced a fully automated DL technique to perform lumbar vertebral segmentation in patients who had undergone CT.
In particular, they applied a CNN, adopted the Wilcoxon signed-rank test, and found a noticeable difference between the bone and soft tissues. Fang et al. [28] suggested a method using a deep CNN-based U-Net to segment the vertebral body and DenseNet-121 to find BMD. Yasaka et al. [29] employed a CNN model to predict the BMD in CT images, validated against the BMD values of DXA images. Periyasamy et al. [30] presented KNN-based OP prediction and also divided the patients into groups based on genetic characteristics that aid in early detection of the condition. Folle et al. [31] proposed a fully automatic volumetric BMD measurement pipeline that uses DL techniques on metacarpal bones. Table 2 summarises the studies on various methods applied to detect OP in CT images.
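Several of the studies above report a DC value for segmentation quality. Assuming DC denotes the Dice coefficient (the text does not expand the abbreviation), it measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch with a hypothetical function name and toy masks:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 2x3 masks, not taken from any of the cited datasets.
pred = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [1, 0, 0]]
dc = dice_coefficient(pred, truth)
```

A value of 1 means the masks coincide exactly and 0 means they do not overlap at all.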
2.3 Soft Computing Approaches for Segmentation of MR Images

MRI is another imaging method used to analyse the bone structure to determine the bone quality and risk for fracture. This approach is non-invasive and radiation-free,
Table 2 Studies on various methods applied to detect OP in CT/micro-CT images

| Author name | Methodology | Dataset | Result | Limitations |
|---|---|---|---|---|
| Requist et al. [19] | Threshold, hole filling, Boolean subtraction | Medial and intermediate cuneiforms from 24 cadaveric specimens | Medial: BMD value = 429.5 mgHA/cm3; Intermediate: BMD value = 483.3 mgHA/cm3 | – |
| Valentinitsch et al. [20] | 3-D texture features, RF classifier | 154 patients with vertebral OP fractures | AUC = 0.88; Sensitivity = 0.77; Specificity = 0.78 | Need increase in metric values |
| Vivekanandhan et al. [21] | ACM, morphological parameters | 50 Indian women aged 30–80 years | Correlation coefficient (r > 0.7 and p < 0.001) | – |
| Kim et al. [22] | FEA and topology optimization, Wolff's law | 62-year-old female | Femur head = 0.29% | Less accuracy |
| Xu et al. [23] | Image binarization, feature extraction, SVM, KNN | 200 | Recall, precision, F-measure = 100% for all | – |
| Uemura et al. [24] | CNN | 1040 | Median DC = 0.977 | – |
| Pan et al. [25] | 3-D CNN model with U-net architecture | 200 | DC = 86.6%; Accuracy = 97.5% | Need increased DC |
| Tang et al. [26] | CNN model with U-net | 150 patients | Accuracy = 76.65%; AUC = 0.9167 | Less accuracy |
| Schmidt et al. [27] | CNN | 1008 patients | – | – |
| Fang et al. [28] | Deep CNN, U-Net | 1449 patients | DC = 0.8 | Less DC |
| Yasaka et al. [29] | CNN | 1665 CT images | AUC = 0.96 | – |
| Periyasamy et al. [30] | KNN | 90 subjects of women of age (35 ± 20) years | Accuracy = 94.44% | – |
| Folle et al. [31] | U-Net | 541 patients with arthritis | DC = 0.97; Computational time = 33 s | Processing time is comparatively high |
Table 3 Studies on various methods applied to detect OP in MR images

| Author name | Methodology | Dataset | Result | Limitations |
|---|---|---|---|---|
| More and Singla [32] | DWT, multiResUNet | 400 subjects labelled knee MRI from OAI dataset | Accuracy = 96.77%; Sensitivity = 98% | – |
| Al-Kafri et al. [33] | SegNet | 48,345 MRI slices | Distance error tolerance d_T = 1: IVD = 0.21, PE = 0.19, TS = 0.35, AAP = 0.26 | – |
and it can assess both trabecular and cortical bone. Only a few researchers have used MRI to identify OP. More and Singla [32] proposed a DWT and MultiResUNet to identify the severity of arthritis on MR knee images. Al-Kafri et al. [33] suggested a method using SegNet based on 13 layers to evaluate disease in lumbar spine MRI, which delivers effective segmentation results based on the tolerance value. The spine region is split into four regions as Area amongst Anterior and Posterior (AAP), Thecal Sac (TS), Posterior Element (PE) and Intervertebral Disc (IVD) for segmenting and easy identification of the bone disorder. Table 3 signifies the studies on various methods applied to detect OP in MR images.
2.4 Soft Computing Approaches for Segmentation of DEXA Images

DEXA uses two low-energy X-ray beams of different energies to evaluate bone loss for estimating BMD and disease risk. Radiologists mostly consider these images the gold standard. DEXA visualises images from numerous locations, including the entire body, femur, toe, and spine. Many researchers currently employ DXA images to segment OP using soft computing algorithms and compare the results to WHO-normalised criteria. Table 4 summarises the studies on various methods applied to DEXA images. Sapthagirivasan and Anburajan [34] combined a kernel-based SVM with RBF for assessing the BMD value in hip radiographs and achieved a prediction accuracy of 90%. Hussain et al. [35] used a pixel label-based decision tree for delineating the affected region in the femur bone, which depends heavily on extracting the low- and high-energy features of DXA, and achieved higher accuracy than an ANN. Mohamed et al. [36] analysed DXA images of full-body scans from both males and females to assess the BMD using a multilayered ANN backpropagation network based on binary and histogram algorithms. Mantzaris et al. [37] suggested a method to estimate the level of OP using feed-forward networks such as probabilistic NN and learning vector quantization ANN. This method helps predict the disease and its severity based on the BMD and
Table 4 Studies on various methods applied to detect OP in DEXA images

| Author name | Methodology | Dataset | Result | Limitations |
|---|---|---|---|---|
| Sapthagirivasan and Anburajan [34] | Kernel SVM with RBF | 50 femoral neck fracture | Accuracy = 90%; Sensitivity = 90%; PPV = 89% | Need to increase all metrics |
| Hussain et al. [35] | Pixel label decision tree, recursive wrapper feature elimination | 600 femur images | Accuracy = 91.4%; TP = 94.19% | Need more accuracy |
| Mohamed et al. [36] | Binary algorithm, ANN | 3000 male and female; age range 22–49 years | Accuracy = 100% | – |
| Mantzaris et al. [37] | Radial basis function networks and approximate Bayesian statistical techniques, probabilistic NN and learning vector quantization (LVQ) ANN | 3346 women cases; 80 men cases | PNN = 96.58; ANN = 96.03 | – |
| Kirilov et al. [38] | CNN | 4,894 images of the lumbar spine | MAE of 8.19% and STD of AE 6.75% | Less STD and MAE |
| Yamamoto et al. [39] | CNN models: (1) ResNet18, (2) ResNet34, (3) GoogleNet, (4) EfficientNet b3, and (5) EfficientNet b4 | 1131 images; 60 years of age or older | The EfficientNet b3 network: accuracy 88.5%; recall 88.7%; F1 score 0.8943; AUC score 0.9374 | Less accuracy and recall |
| Fathima et al. [15] | U-Net, FCNN | 126 images | Accuracy = 88%; DC = 0.98 | Need to increase all metrics |
tabulates their corresponding age. Kirilov et al. [38] proposed a CNN to calculate BMD values from DEXA images of the lumbar spine. The PC coefficient showed a high positive correlation between the predicted values and the actual ones. Yamamoto et al. [39] introduced a novel DL technique to diagnose bone disease in radiograph images acquired from the hip. The clinical covariates of the patients are noted and the patients are grouped according to normal and abnormal results. The authors analysed the defective portion using five different types of CNN, of which EfficientNet b3 provided the best classification accuracy.
2.5 Soft Computing Approaches for Segmentation Based on Other Factors

In recent studies, OP is segmented based on hereditary characteristics such as age, sex, height, weight, and others, as well as some digital radiography imaging. Kavitha et al. [40] developed a hybrid algorithm based on a genetic swarm fuzzy classifier, which employed fivefold cross-validation, to classify defective cortical and trabecular bone in women. Jennane et al. [41] developed a method to classify osteoporotic and osteoarthritic patients using artificial intelligence algorithms. Here, the hybrid skeleton graph method helps analyse bone morphology and structure, and finite element analysis (FEA) is used for topological detection. Finally, AI techniques such as ANFIS, SVM, and GA are used for classification; of these, GA delivers the best success rate. Bhan et al. [42] proposed a method that departs from conventional techniques by selecting frequency-domain features instead of spatial-domain features, using an ANN with Gabor and wavelet feature selection on normal and abnormal bone radiographs. Jolly et al. [43] proposed a method to classify normal and osteoporotic bone images. To enhance the image for a clear view, noise is removed by a Wiener filter and contrast is improved by histogram equalisation. Vishnu et al. [44] proposed a regression type of ANN to estimate the BMD in calcaneus images acquired from various trabecular regions such as the elbow, hip, and toe. Chang et al. [45] introduced an algorithm to classify OP by selecting features with a wrapper-based model and applying different classification methods such as naive Bayes, a multilayer feed-forward network, and logistic regression. Table 5 summarises the studies on various methods applied to detect OP based on genetic factors. Bhattacharya et al. [46] recommended a method to detect normal and abnormal BMD images using a combination of principal component analysis, SVM, and kernel-based NN.
Kavitha et al. [47] introduced a method to find the symptoms related to OP by estimating the BMD values at the femur and spine regions using RBF and SVM, analysed with the help of dental radiographs (DR). Arifin et al. [48] classified defective femur and spine bone in postmenopausal women using an algorithm that combines a fuzzy inference system and an NN based on the shape and cortical width in the DR images. Hseih et al. [49] used lumbar spine radiographs and Hologic DXA to assess BMD and reported a better precision-recall curve, accuracy, positive predictive value, and negative predictive value. This review examines in detail the outcomes of various state-of-the-art methodologies applied to OP image segmentation. The review effort focused primarily on soft computing methodologies, covering subjects such as emerging problems in image segmentation, innovative methods, and unique segmentation applications in OP detection. The study also introduces various databases and evaluation factors that can help researchers in this area. Even so, the previous approaches have limitations such as low accuracy, low classification rates, and poor segmentation across the various modalities used to assess the OP stage.
A Review on Automated Algorithms Used for Osteoporosis Diagnosis
Table 5 Studies on various methods applied to detect OP based on various factors
| Author name | Methodology | Factors | Dataset | Result |
| Kavitha et al. [40] | Feature selection, hybrid genetic swarm fuzzy (GSF) classifier | Dental panoramic radiograph (DPR) | 141 female subjects within the age range 45–92 years | Spine OP: Accuracy = 95.3%, Sensitivity = 94.7, Specificity = 96.01. Femur OP: Accuracy = 99.1%, Sensitivity = 98.4, Specificity = 98.9 |
| Jennane et al. [41] | Hybrid skeleton graph algorithm, morphological, topological, SVM, GA, ANFIS | Genetic factors | 18 | True positive = 100 for GA |
| Bhan et al. [42] | Wavelet features, Gabor features, wrapper method, ANN | Genetic factors | 58 | Accuracy = 95% |
| Jolly et al. [43] | GLCM, Gabor filter, SVM | Genetic factors | 58 | For SVM accuracy = 97.87% |
| Vishnu et al. [44] | GLCM, regression ANN | Digital calcaneus radiographic (DCR) images | 106 images | Accuracy of 90% |
| Chang et al. [45] | Wrapper-based approach, MFNN, naive Bayes, and logistic regression | Genetic factors | 295 Taiwanese women | For MFNN: AUC = 0.489, Sensitivity = 0.4, Specificity = 0.629 |
| Bhattacharya et al. [46] | CLAHE, GLCM, PCA, SVM, KNN | Genetic factors | 58 | Accuracy = 90%, Sensitivity = 50, Specificity = 50 |
G. Amiya et al.
Table 5 (continued)
| Author name | Methodology | Factors | Dataset | Result |
| Kavitha et al. [47] | Histogram-based automatic clustering, SVM | DPR | 100 postmenopausal women patients (aged > 50 years) | Spine OP: Accuracy = 93%, Sensitivity = 95.8, Specificity = 86.6. Femur OP: Accuracy = 89%, Sensitivity = 96, Specificity = 84 |
| Arifin et al. [48] | FNN | DPR | 100 postmenopausal women aged 50 years and older | Spine OP: Sensitivity = 94.5, Specificity = 63.2. Femur OP: Sensitivity = 90.9, Specificity = 64.7 |
| Hsieh et al. [49] | DL | Plain radiograph and physical factors | 5164 patients with pelvis/lumbar spine radiographs; 18,175 patients' physical factors from Hologic DXA | Classification accuracy = 84.8% |
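The results in Table 5 are reported as accuracy, sensitivity, and specificity. As a reminder of how these three figures relate to a binary confusion matrix, the following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn count osteoporotic cases that were detected/missed,
    tn/fp count normal cases that were correctly/wrongly labelled.
    """
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the arithmetic.
acc, sens, spec = classification_metrics(tp=45, fp=5, tn=40, fn=10)
print(f"accuracy={acc:.1f}% sensitivity={sens:.1f}% specificity={spec:.1f}%")
```

A study can report high accuracy together with low sensitivity or specificity when the two classes are imbalanced, which is one reason the three values are listed separately.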
Soft computing offers an alternative to conventional knowledge-driven reasoning methods and purely data-driven systems, overcoming their limitations by combining a variety of complementary reasoning and search methods across a broad range of problem domains. To handle a variety of real-world problems, hybrid systems employ multiple computational techniques and can produce better outcomes by combining several systems into one. These outcomes are both robust and adaptable to new contexts and can be applied in many fields. Various machine learning and DL approaches are currently used in a variety of scientific fields, and hardware and software advances are now visible in everyday life, particularly in medicine and healthcare. They can be used for automatic interpretation, disease identification and diagnosis, personalised treatment, and drug development. Image processing and DL techniques help patients save money and time by reducing the time and effort radiologists need to reach a result. These procedures also enable an early disease prognosis, allowing the patient to receive timely preventive measures. "Prevention is always better than cure," as the adage goes.
3 Conclusion
As evidenced by the reviewed papers, many studies have been conducted using CT scan images rather than X-ray and DEXA images. In OP diagnosis and evaluation, the lack of X-ray and DEXA datasets is also a major issue. Previous research has not yielded a clearly superior strategy for segmentation and classification in the identification of OP. The contribution of this article is to present researchers with a state-of-the-art survey of practically all dimensions of image segmentation using soft computing approaches, in the hope of encouraging them to devise new segmentation methods. Because of the difficulties in diagnosing OP from X-ray and DEXA images, a metaheuristic-based optimisation or hybrid fuzzy-based approach for precise bone region segmentation is envisaged. In addition, a statistical model is to be established for calculating BMD and T-score from segmented X-ray and DEXA bone images, allowing for the prediction of fracture risk.
Acknowledgements The authors thank Dr. M. Thirumurugan, M.S. (Ortho), Srivilliputtur, Tamilnadu, and Dr. R. Sayee Venkatesh, M.D. (General Medicine), D.M. (Cardio), Chennai, Tamilnadu, India, for supporting the research. The authors also thank the International Research Centre of Kalasalingam Academy of Research and Education, Tamil Nadu, India, for permission to use the computational facilities available in the Biomedical Research and Diagnostic Techniques Development Centre. This research was supported by the Department of Science and Technology, New Delhi, under the Biomedical Device and Technology Development Programme (BDTD) of the Technology Development Programme (TDP) [Ref. No. DST/TDP/BDTD/28/2021(G)].
References
1. Langdahl BL (2020) Overview of treatment approaches to osteoporosis. The British Pharmacological Society, pp 1–16
2. Anam AK, Insogna K (2021) Update on osteoporosis screening and management 105(6):P1117–P1134
3. van der Burgh AC, de Keyser CE, Carola Zillikens M, Stricker BH (2021) The effects of osteoporotic and non osteoporotic medications on fracture risk and bone mineral density. Effect Osteopor Non-osteop Med Fracture Risk Bone Min Dens Drugs 81:1831–1858
4. Palani U, Vasanthi D, Rabiya Begam S (2020) Enhancement of medical image fusion using image processing. J Innov Image Process (JIIP) 2(04):165–174
5. Shakya S, Nepal L (2020) Computational enhancements of wearable healthcare devices on pervasive computing system. J Ubiquit Comput Commun Technol (UCCT) 2(02):98–108
6. Balasubramaniam V (2021) Artificial intelligence algorithm with SVM classification using dermascopic images for melanoma diagnosis. J Artif Intell Capsule Netw 3(1):34–42
7. Dhaya R (2020) Deep net model for detection of covid-19 using radiographs based on roc analysis. J Innov Image Process (JIIP) 2(03):135–140
8. Arunpandian M, Arunprasath T, Vishnuvarthanan G, Rajasekaran MP (2018) Thresholding based soil feature extraction from digital image samples: a vision towards smarter agrology. In: Satapathy S, Joshi A (eds) Information and communication technology for intelligent systems (ICTIS 2017), volume 1. ICTIS 2017. Smart innovation, systems and technologies, vol 83. Springer, Cham
9. Mazouzi S, Guessoum Z (2021) A fast and fully distributed method for region-based image segmentation. J Real-Time Image Proc 18:793–806
10. Kottaimalai R, Vishnuvarthanan G, Pallikonda Rajasekaran M, Yudong Z, Shuihua W (2020) Safe engineering application for anomaly identification and outlier detection in human brain MRI. J Green Eng (10):9087–9099
11. Vigneshwaran S, Govindaraj V, Murugan PR, Zhang Y, Prasath TA (2010) Unsupervised learning-based clustering approach for smart identification of pathologies and segmentation of tissues in brain magnetic resonance imaging. Int J Imag Syst Technol:1–18
12. Sengan S, Arokia Jesu Prabhu L, Ramachandran V, Priya V, Ravi L, Subramaniyaswamy V (2020) Images super-resolution by optimal deep AlexNet architecture for medical application: a novel DOCALN. J Intell Fuzzy Syst 39(6):8259–8272
13. Sikkandar MY, Sabarunisha Begum S, Alkathiry AA, Alotaibi MSN, Manzar MD, Aboamer MA (2021) Segmentation of human knee osteoarthritis images using unsupervised local center of mass computation technique. J Amb Intell Human Comput
14. Nguyen TP, Chae D-S, Park S-J, Yoon J (2021) A novel approach for evaluating bone mineral density of hips based on Sobel gradient-based map of radiographs utilizing convolutional neural network. Comput Biol Med 132
15. Nazia Fathima SM, Tamilselvi R, Parisa Beham M, Sabarinathan D (2020) Diagnosis of osteoporosis using modified U-net architecture with attention unit in DEXA and X-ray images. J X-Ray Sci Technol
16. Zheng K, Wang Y, Zhou X-Y, Wang F, Lu L, Lin C, Huang L, Xie G, Xiao J, Kuo C-F, Miao S (2021) Semi-supervised learning for bone mineral density estimation in hip X-ray images
17. Singh A, Dutta MK, Jennane R, Lespessailles E (2017) Classification of the trabecular bone structure of osteoporotic patients using machine vision. Comput Biol Med
18. Geng Y, Liu T, Ding Y, Liu W, Ye J, Hu L, Ruan L (2021) Deep learning-based self-efficacy X-ray images in the evaluation of rheumatoid arthritis combined with osteoporosis nursing. Sci Progr
19. Requist MR, Sripanich Y, Peterson AC, Rolvien T, Barg A, Lenz AL (2021) Semi-automatic micro-CT segmentation of the midfoot using calibrated thresholds. Int J Comput Assist Radiol Surg 16:387–396
20. Valentinitsch A, Trebeschi S, Kaesmacher J, Lorenz C, Loffler MT, Zimmer C, Baum T, Kirschke JS (2019) Opportunistic osteoporosis screening in multi-detector CT images via local classification of textures. Osteopor Int
21. Vivekanandhan S, Subramaniam J, Mariamichael A (2016) A computer-aided system for automatic extraction of femur neck trabecular bone architecture using isotropic volume construction from clinical hip computed tomography images. J Eng Med
22. Kim J, Chun BJ, Jang IG (2021) Topology optimization-based bone microstructure reconstruction from CT scan data. Adv Struct Eng Mech
23. Xu Y, Li D, Chen Q, Fan Y (2013) Full supervised learning for osteoporosis diagnosis using micro-CT images. Microsc Res Techn 76:333–341
24. Uemura K, Otake Y, Takao M, Soufi M, Kawasaki A, Sugano N, Sato Y (2021) Automated segmentation of an intensity calibration phantom in clinical CT images using a convolutional neural network. Int J Comput Assis Radiol Surg:1–10
25. Pan Y, Shi D, Wang H, Chen T, Cui D, Cheng X, Lu Y (2020) Opportunistic osteoporosis screening in multi-detector CT images using deep convolutional neural networks. Euro Radiol
26. Tang C, Zhang W, Li H, Li L, Li Z, Cai A, Wang L, Shi D, Yan B (2021) CNN-based qualitative detection of bone mineral density via diagnostic CT slices for osteoporosis screening. Osteoporos Int 32(5):971–979
27. Schmidt D, Ulén J, Enqvist O, Persson E, Trägårdh E, Leander P, Edenbrandt L (2021) Deep learning takes the pain out of back breaking work: automatic vertebral segmentation and attenuation measurement for osteoporosis. Clin Imag
28. Fang Y, Li W, Chen X, Chen K, Kang H, Yu P, Zhang R, Liao J, Hong G, Li S (2020) Opportunistic osteoporosis screening in multi-detector CT images using deep convolutional neural networks. Euro Radiol
29. Yasaka K, Akai H, Kunimatsu A, Kiryu S, Abe O (2020) Prediction of bone mineral density from computed tomography application of deep learning with a convolutional neural network. Euro Radiol
30. Periasamy K, Periasamy S, Velayutham S, Zhang Z, Ahmed ST, Jayapalan A (2021) A proactive model to predict osteoporosis: an artificial immune system approach. Expert Syst
31. Folle L, Meinderink T, Simon D, Liphardt A-M, Krönke G, Schett G, Kleyer A, Maier A (2021) Deep learning methods allow fully automated segmentation of metacarpal bones to quantify volumetric bone mineral density. Sci Rep 11
32. More S, Singla J (2021) Discrete-multiResUNet segmentation and feature extraction model for knee MR images. J Intell Fuzzy Syst
33. Al-Kafri AS, Sudirman S, Hussain A, Al-Jumeily D, Natalia F, Meidia H, Afriliana N, AlRashdan W, Bashtawi M, Al-Jumaily M (2019) Boundary delineation of MRI images for lumbar spinal stenosis detection through semantic segmentation using deep neural networks
34. Sapthagirivasan V, Anburajan M (2013) Diagnosis of osteoporosis by extraction of trabecular features from hip radiographs using support vector machine: an investigation panorama with DXA. Comput Biol Med 43:1910–1919
35. Hussain D, Al-antari MA, Al-masni MA, Han S-M, Kim T-S (2018) Femur segmentation in DXA imaging using a machine learning decision tree. J Xray Sci Technol 26(5):727–746
36. Mohamed EI, Meshref RA, Abdel-Mageed SM, Moustafa MH, Badawi MI, Darwish SH (2018) A novel morphological analysis of DXA-DICOM images by artificial neural networks for estimating bone mineral density in health and disease. J Clin Densitometry
37. Mantzaris D, Anastassopoulos G, Iliadis L, Kazakos K, Papadopoulos H (2010) A soft computing approach for osteoporosis risk factor estimation. Int Feder Inf Process
38. Kirilov N, Kirilova E, Krastev E (2021) Using machine learning to predict bone mineral density from dual-energy X-ray absorptiometry images of the lumbar spine
39. Yamamoto N, Sukegawa S, Kitamura A, Goto R, Noda T, Nakano K, Takabatake K, Kawai H, Nagatsuka H, Kawasaki K, Furuki Y, Ozaki T (2020) Deep learning for osteoporosis classification using hip radiographs and patient clinical covariates. Biomolecules 10
40. Kavitha MS, Kumar PG, Park S-Y, Huh K-H, Heo M-S, Kurita T, Asano A, An S-Y, Chien S (2016) Automatic detection of osteoporosis based on hybrid genetic swarm fuzzy classifier approaches. Dentomaxil Radiol 45
41. Jennane R, Almhdie-Imjabber A, Hambli R, Ucan ON, Benhamou CL (2010) Genetic algorithm and image processing for osteoporosis diagnosis. IEEE
42. Bhan A, Kulshreshtha C, Kumar P, Goyal A (2020) Machine vision based analysis of the trabecular bone structure in frequency domain for osteoporosis. In: 7th international conference on signal processing and integrated networks
43. Jolly S, Chaudhary H, Bhan A, Rana H, Goyal A (2021) Texture based bone radiograph image analysis for the assessment of osteoporosis using hybrid domain. In: 3rd international conference on signal processing and communication
44. Vishnu T, Arunkumar R, Saranya K, Gayathri Devi M (2015) Efficient and early detection of osteoporosis using trabecular region. In: International conference on green engineering and technologies
45. Chang H-W, Chiu Y-H, Kao H-Y, Yang C-H, Ho W-H (2013) Comparison of classification algorithms with wrapper-based feature selection for predicting osteoporosis outcome based on genetic factors in a Taiwanese women population. Int J Endocrinol
46. Bhattacharya S, Nair D, Bhan A, Goyal A (2019) Computer based automatic detection and classification of osteoporosis in bone radiographs. In: 6th international conference on signal processing and integrated networks
47. Kavitha MS, Asano A, Taguchi A, Heo M-S (2013) The combination of a histogram-based clustering algorithm and support vector machine for the diagnosis of osteoporosis. Imag Sci Dentis 43:153–161
48. Arifin AZ, Asano A, Taguchi A, Nakamoto T, Ohtsuka M, Tsuda M, Kudo Y, Tanimoto K (2007) Developing computer-aided osteoporosis diagnosis system using fuzzy neural network. JACIII 11:1049–1058
49. Hsieh C-I, Zheng K, Lin C, Mei L, Lu L, Li W, Chen F-P, Wang Y, Zhou X, Wang F, Xie G, Xiao J, Miao S, Kuo C-F (2021) Automated bone mineral density prediction and fracture risk assessment using plain radiographs via deep learning. Nat Commun
Supervised Deep Learning Approach for Generating Dynamic Summary of the Video
Mohammed Inayathulla and C. Karthikeyan
Abstract In this growing digital world, videos are one of the important forms of data. Millions of users watch videos on the Internet every day, so video summarization is useful for generating short, concise videos that save users' time and let them judge the usefulness of a video. In this paper, a supervised deep learning approach using convolutional neural networks (CNNs) is used to generate the dynamic summary of a video. The proposed methodology is implemented on the Tour20 dataset, and the generated summaries are evaluated against the Tour20 human summaries with satisfactory results.
Keywords Convolutional neural networks · Supervised learning · Classification · Video summarization
M. Inayathulla (B) · C. Karthikeyan
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, Guntur, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_18

1 Introduction
Due to rapid advances in digital technology, enormous amounts of data are produced by various sources in different forms. Millions of users have access to social media platforms, where they upload huge amounts of data, and these platforms have become one of the primary data resources. Videos are one of the most frequent forms of data found on the Internet today. As video content on the Web grows, so does the need to extract useful information from it; video summarization has therefore gained attention because it significantly reduces the user's time and effort. Automatic summarization creates a subset of the original data that still represents the original. One of the main objectives of video summarization is to create short videos without distorting the original content. The task is challenging because it is subjective: every user has their own preference for the summary of a video, and hence generating a summary of a generic nature is a tedious task. A video thumbnail can be considered as the first summary at the highest level of abstraction. Summarization can be of two types: abstractive and extractive. In extractive summarization, a subset of the original content is selected as is. In abstractive summarization, the subset of information is extracted and then paraphrased in such a way that the revised content remains close to the original content. Both supervised and unsupervised techniques can be used for video summarization [1]. In supervised learning, the machine is trained with large amounts of labeled data, and later, unseen data is given to the machine to generate a summary of the video. The general supervised approach to video summarization is described in Fig. 1. In unsupervised learning, unlabeled data is given to the machine, and the model generates the summary by identifying the critical frames of the video. Manually designed summaries can be used to evaluate the automatically generated summary. In this paper, supervised machine learning is used to generate a dynamic summary. Initially, the machine is trained with labeled video frames using convolutional neural networks, and then, a new video is given as input to generate the summary.
Fig. 1 Overview of video summarization using deep learning
Extract frames from training video
Label the frames as summary and non summary classes
Perform Data Augmentation if dataset is imbalanced
Train CNN with labelled frames
Extract frames from target video and perform classification using trained CNN
Generate video by concatenating all frames classified as summary by CNN
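The last two steps of Fig. 1 (sampling frames from a target video and keeping only those classified as summary) can be sketched without any video library. This is an illustrative sketch only: the function names are hypothetical, the labels are made-up classifier outputs, and the default of five sampled frames per second is an assumed reading of the frame rate mentioned in the methodology.

```python
def sample_frame_indices(total_frames, native_fps, target_fps=5):
    """Indices of the frames kept when sampling at roughly target_fps."""
    step = max(1, round(native_fps / target_fps))
    return list(range(0, total_frames, step))


def select_summary_frames(frame_indices, predicted_labels, summary_label=1):
    """Keep, in temporal order, the frames the classifier marked as summary."""
    return [idx for idx, label in zip(frame_indices, predicted_labels)
            if label == summary_label]


# A hypothetical 2-second clip recorded at 30 frames per second.
indices = sample_frame_indices(total_frames=60, native_fps=30, target_fps=5)

# Made-up CNN outputs for the sampled frames: 1 = summary, 0 = non-summary.
labels = [0, 1, 1, 0, 0, 1, 0, 0, 1, 1]
summary = select_summary_frames(indices, labels)
print(indices)
print(summary)
```

Because the selected indices stay in temporal order, writing the corresponding frames back to a video file in sequence yields the concatenated summary described in the final step.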
2 Related Work
A divide-and-conquer policy [2] can be used to select the frames with the highest significance scores to generate the summary. A model that learns the relationship between side information and the video frames was used to generate the summary [3]. Yao et al. focused on the user's major or special interest in generating the summary [4]. Abstractive text summarization of audiovisual content using neural networks was implemented in [5]. A capsule network was trained [6] to generate spatiotemporal information, and a curve describing the relationship between frames was generated. The audio video maximal marginal relevance (AV-MMR) algorithm [7] makes use of both audio and video information; it chooses segments that represent the not-yet-selected information and do not repeat previously selected information. A survey of trajectory-based surveillance applications [8] covered concepts such as clustering, summarization, anomaly detection, and synopsis generation. A maximal biclique finding (MBF) algorithm [9] was devised to address the sparsity of co-occurring patterns that arises when irrelevant shots in videos are considered. Web images are used as prior information to ease summarization [10]: images captured by people serve as the prior for video summarization. An unsupervised video summarization framework, TVSum, was proposed by Song et al. [11]. It uses title-based image search results to identify visually important frames. Sparse coding for context-aware video summarization, which learns a dictionary of features and identifies the correlation between them, was proposed in [12]. Zhao et al. proposed a hierarchical RNN for identifying the confidence of each frame [13]. 
Supervised approach [14] which was trained to study the universal characteristics as well as optimizing the multiple objectives of the target summary was proposed. Video summarization was treated as a subset selection problem. Each selected subset was evaluated against the objectives, and the sets which maximize the overall score were selected as part of the summary. Keyframes that represented the summary of the video were selected using the perceptual hash algorithm [15]. Reinforcement-based action parsing video summarization was proposed [16]. The model was developed in two stages. In the first stage, the model was constructed with weakly annotated data, whereas in the second stage, deep recurrent neural network was implemented. In the second stage, the model would highlight the keyframes of the video. A hierarchical keyframe extraction technique using QR decomposition was used [17]. Initially, a detailed review of QR decomposition was conducted which followed intra-shot redundancy detection which was based on QR decomposition. The next step was to measure the dynamicity of the frame by using QR decomposition. To generate the summary of the video, initially, video shot boundaries were detected using QR decomposition; then, duplicate frames were eliminated, and finally, shot-level important frames were grouped to form common scenes. A novel approach [18] for both audiovisual summaries was proposed by the authors using minimum spanning trees. In [19], Taufiq Hasan et al. used the probability density function to classify the video segments as exciting and rare. Later, all
the exciting segments were grouped to generate the summary. Audio features can also be used in generating a summary of the video [20]. A fractional deep belief network to detect the emotion of the speaker was implemented in [21], which can be further explored for generating video summaries. A transfer learning framework using a neuro-fuzzy approach [22] was implemented for the classification of traffic videos. Video summarization involves dealing with images, and hence, extracting features from images is a key aspect. Various techniques proposed in [23] can be used for the classification of images. An AdaBoost multiclass classifier [24] was implemented for identifying the actions of Indian classical dance, and digital image steganography was proposed in [25]. A technique for segmenting train rolling stock parts in high-speed video data was proposed in [26]. Real-time video streaming was analyzed to identify area-specific events using Wi-Fi [27]. Deep learning techniques for emotion recognition in online meetings are addressed in [28]. A dimensionality reduction process for identifying abnormal incidents in surveillance videos was implemented using SVM and CNN [29], and object detection algorithms were proposed in [30].
3 Proposed Methodology
In this section, we describe the procedure for generating a dynamic summary of a video. Summarization can be treated as an image classification problem in which a classifier labels each frame as a summary frame or a non-summary frame. The final summarized video is generated by concatenating all summary frames. Figure 1 gives an overview of the proposed methodology. We used a LeNet convolutional neural network (CNN), shown in Fig. 2, and organized the implementation into four stages: data collection and augmentation, building the CNN, training the CNN model, and classification of unseen data. The first step in a CNN is convolution, which applies filters to the input image to produce feature maps. The next step is pooling, which reduces the size of the feature maps and thereby the computational overhead in the network. The third step is flattening, which converts the feature maps into a one-dimensional array. Finally, a fully connected neural network is constructed.
The process starts by extracting the frames from the video with a frame rate of 5. For preprocessing, the frames are resized to 224 * 224 * 3. The resulting image dataset is annotated with class labels and inspected for imbalance. Only two class labels are used, since every frame is intended to be either a summary frame or a non-summary frame (binary classification). Data augmentation is performed to balance the dataset so that the model performs well. The dataset is then divided into a training set for training the model and a validation set for validating it. The convolutional layers are built with 32 filters and a kernel size of 3. Max pooling with a 2 * 2 pool matrix and a stride of 2 is then used to downsample the feature maps. After the CNN is trained, unseen frames are extracted from new videos and passed to the model, which classifies each as a summary frame or not. All summary frames are concatenated to generate the final summary. The Keras and OpenCV libraries are used for the implementation and for generating the final video.
Fig. 2 Overview of LeNet: Input (224 * 224 * 3) → Conv1 (32 filters) → Pool-1 (pool size 2) → Conv2 (32 filters) → Pool-2 (pool size 2) → Flatten → Dense (2 layers) → Output
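To make the layer sizes concrete, the feature-map dimensions implied by the stated hyperparameters (224 * 224 * 3 input, two convolution layers with 32 filters and kernel size 3, each followed by 2 * 2 max pooling with stride 2) can be worked out by hand. The sketch below assumes unpadded convolutions with stride 1, which the text does not specify; with other padding choices the numbers would differ. The helper names are hypothetical.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (assumption: no padding, stride 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, pool, stride):
    """Spatial size after max pooling."""
    return (size - pool) // stride + 1


def lenet_feature_shapes(input_size=224, channels=3, filters=32,
                         kernel=3, pool=2, pool_stride=2):
    """Trace tensor shapes through Conv1, Pool-1, Conv2, Pool-2, Flatten."""
    shapes = [("input", (input_size, input_size, channels))]
    size = input_size
    for block in (1, 2):
        size = conv_output(size, kernel)
        shapes.append((f"conv{block}", (size, size, filters)))
        size = pool_output(size, pool, pool_stride)
        shapes.append((f"pool{block}", (size, size, filters)))
    shapes.append(("flatten", (size * size * filters,)))
    return shapes


shapes = lenet_feature_shapes()
for name, shape in shapes:
    print(f"{name:8s} {shape}")
```

Under these assumptions the flattened vector has 54 * 54 * 32 = 93,312 elements, so most of the trainable parameters would sit in the first dense layer rather than in the convolutional layers.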
3.1 Dataset
The Tour20 dataset [31] contains 140 videos with a total duration of 6 h 46 min, crawled from YouTube with a Creative Commons license (CC-BY 3.0) and a search filter of 15 min maximum duration. The videos are grouped into the 20 most visited tourist attractions, chosen from the Tripadvisor Travelers' Choice Landmarks 2015 list. Of the 20 categories, we used only five for our work. Each category has a different number of videos, so in each category approximately half of the videos were used for training and the remaining videos were used to generate the summary.
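The per-category counts reported in Table 2 are consistent with rounding the training half up when a category has an odd number of videos. The sketch below reproduces those counts under that assumption; the rule itself and the choice of taking the first videos for training are illustrative guesses, not something stated in the text.

```python
import math


def split_videos(video_ids):
    """Split a category into training and evaluation videos.

    Assumption: the training share is ceil(n / 2) and the first videos
    in the list are used for training.
    """
    n_train = math.ceil(len(video_ids) / 2)
    return video_ids[:n_train], video_ids[n_train:]


# Total number of videos per category, as listed in Table 2.
totals = {"TM": 7, "ET": 8, "BK": 9, "PC": 6, "GB": 6}

counts = {}
for name, n in totals.items():
    train, evaluation = split_videos(list(range(1, n + 1)))
    counts[name] = (len(train), len(evaluation))

print(counts)
```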
4 Results and Discussion
The proposed methodology using LeNet was implemented on five categories of the Tour20 dataset: Taj Mahal (TM), Eiffel Tower (ET), Burj Khalifa (BK), Panama Canal (PC), and Golden Gate Bridge (GB). Table 1 shows the average training and validation accuracy for each category, which is greater than 90% in all five cases. Table 2 gives the total number of videos in each category and the number of videos selected for training and evaluation. For example, the dataset contains seven TM videos, of which four were used for training and three for evaluation.

Table 1 Average training and validation accuracy of the model for five videos
| S. No. | Video | Average training accuracy (%) | Average validation accuracy (%) |
| 1 | TM | 96 | 94 |
| 2 | ET | 97 | 95 |
| 3 | BK | 96 | 95 |
| 4 | PC | 96 | 96 |
| 5 | GB | 97 | 95 |

Table 2 Number of videos selected for training and evaluation in each category
| S. No. | Video | Total no. of videos | No. of videos used for training | No. of videos used for evaluation |
| 1 | TM | 7 | 4 | 3 |
| 2 | ET | 8 | 4 | 4 |
| 3 | BK | 9 | 5 | 4 |
| 4 | PC | 6 | 3 | 3 |
| 5 | GB | 6 | 3 | 3 |

The generated summaries were compared against the three human summaries of the Tour20 dataset. The average accuracy of the generated summary is at least 90% in all five cases (Table 3), and the average accuracy of the entire summarization is approximately 94%. Table 3 also lists the duration of the videos before and after summarization, along with the percentage of video frames correctly classified by LeNet. A screenshot of summary frames from the first Taj Mahal video is shown in Fig. 3.
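The overall figure of approximately 94% follows from averaging the five per-category percentages listed in Table 3. The short sketch below reproduces that average and also shows one plausible way a per-video percentage could be computed, namely as the share of frames whose predicted label agrees with a reference labelling; that definition and the function name are assumptions, since the exact evaluation procedure is not spelled out.

```python
from statistics import mean


def frame_agreement(predicted, reference):
    """Percentage of frames whose predicted label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Per-category percentages as they appear in Table 3.
per_category = [94, 92, 90, 98, 97]
overall = mean(per_category)
print(f"overall average: {overall:.1f}%")

# Made-up labels for four frames, only to show the calculation.
example = frame_agreement([1, 0, 1, 1], [1, 0, 0, 1])
print(f"example agreement: {example:.1f}%")
```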
5 Conclusion
Dynamic video summarization was successfully implemented with convolutional neural networks (LeNet) using the sequential model of Keras, and the results are promising. The CNN architecture used showed good performance in video summarization. Labeling the images manually is a tedious task, and this manual labeling has a significant effect on the performance of the model; it can be automated in the future. Future techniques could also extract the semantic relationship between frames, which would be useful in generating the summary. In addition, only the useful information from the shots, rather than the complete information, could be extracted for building the classifiers.
Video
TM
ET
BK
PC
GB
S. No.
1
2
3
4
5
170
46
273
125
427
59
263
39
62
47
18
342
579
139
159
–
–
94
111
–
102
31
147
78
248
23 28
41
19
158
2
5 83
84
189
359
3
–
–
39
68
–
4
1
4
Duration of video after summarization (s)
3
1
2
Actual duration of video (s)
Table 3 Results of five categories after summarization
94
92
90
98
97
Average % of frames correctly classified (evaluated against Tour20 human summaries) (%)
Fig. 3 Snapshot of some summary frames of one Taj Mahal video
References 1. Ji Z, Xiong K, Pang Y, Li X (2019) Video summarization with attention-based encoder-decoder networks. IEEE Trans Circ Syst Video Technol 2. Siyu Huang X, Zhang Z, Wu F, Han J (2019) User-ranking video summarization with multistage spatio–temporal representation. IEEE Trans Image Process 28(6):2654–2664 3. Yuan Y, Mei T, Cui P, Zhu W (2019) Video summarization by learning deep side semantic embedding. IEEE Trans Circuits Syst Video Technol 29(1):226–237 4. Yao T, Mei T, Rui Y (2016) Highlight detection with pairwise deep ranking for first-person video summarization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 982–990 5. Dilawari A, Khan MUG (2019) ASoVS: abstractive summarization of video sequences. IEEE Access 7:29253–29263 6. Huang C, Wang H (2019) Novel key-frames selection framework for comprehensive video summarization. IEEE Trans Circ Syst Video Technol 7. Li Y, Merialdo B (2010) Multi-video summarization based on videommr. In: 11th international workshop on image analysis for multimedia interactive services WIAMIS, Apr 2010 8. Arif Ahmed S, Dogra DP, Kar S, Roy PP (2018) Trajectory-based surveillance analysis: a survey. IEEE Trans Circ Syst Video Technol 29(7):1985–1987 9. Chu W-S, Song Y, Jaimes A (2015) Video co-summarization: video summarization by visual co-occurrence. In: Proceedings of IEEE conference CVPR, pp 3584–3592 10. Khosla RH, Lin C-J, Sundaresan N (2013) Large-scale video summarization using web-image priors. In: Proceedings of IEEE conference CVPR, pp 2698–2705 11. Song Y, Vallmitjana J, Stent, Jaimes A (2015) Tvsum: summarizing web videos using titles. In: Proceedings of IEEE conference CVPR, pp 5179–5187 12. Zhang S, Zhu Y, Roy-Chowdhury AK (2016) Context-aware surveillance video summarization. IEEE Trans Image Process 25(11):5469–5478 13. Zhao B, Li X, Lu X (2017) Hierarchical recurrent neural network for video summarization. In: Proceedings of ACM multimedia, pp 863–871 14. 
Gygli M, Grabner H, Van Gool L (2015) Video summarization by learning submodular mixtures of objectives. In: Proceedings of IEEE conference on CVPR, pp 3090–3098 15. Li X, Zhao B, Lu X (2018) Key frame extraction in the summary space. IEEE Trans Cybern 48(6):1923–1934 16. Lei J, Luan Q, Song X, Liu X, Tao D, Song M (2019) Action parsing-driven video summarization based on reinforcement learning. IEEE Trans Circ Syst Video Technol 29(7):2126–2137 17. Amiri A, Fathy M (2010) Hierarchical Keyframe-based video summarization using QRdecomposition and modified k-means clustering. ACM EURASIP J Adv Signal Process 102 18. Gong Y (2003) Summarizing audiovisual contents of a video program. EURASIP J Appl Signal Process 2:160–169 19. Hasan T, Boˇril H, Sangwan A, Hansen JHL (2013) Multi-modal highlight generation for sports videos using an information-theoretic excitability measure. EURASIP J Adv Signal Process 173 20. Meghdadi AH, Irani P (2013) Interactive exploration of surveillance video through action shot summarization and trajectory visualization. IEEE Trans Visual Comput Graph 19(12) 21. Mannepalli K, Sastry PN, Suman M (2017) A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Eng J 56(4):485–497 22. Ashok Kumar PM, Vaidehi V (2017) A transfer learning framework for traffic video using neuro-fuzzy approach. Sadhana-Acad Proc Eng Sci 42(9) 23. Bhargavi RV, Rajesh V (2018) Computer aided bright lesion classification in fundus image based on feature extraction. Int J Pattern Recogn Artif Intell 32(11) 24. Kumar KVV, Kishore PVV, Anil Kumar D (2017) Indian classical dance classification with adaboost multiclass classifier on multifeature fusion. Comput Intell Image Process
25. Swain G (2018) Digital image steganography using eight-directional PVD against RS analysis and PDH analysis. Adv Multimedia 26. Prasad R, Kishore PVV (2017) Performance of active contour models in train rolling stock part segmentation on high-speed video data. J Cogent Eng 4 27. Raj JS, Vijesh Joe C (2021) Wi-Fi network profiling and QoS assessment for real time video streaming. IRO J Sustain Wirel Syst 3(1):21–30 28. Vivekanandam B (2020) Evaluation of activity monitoring algorithm based on smart approaches. J Electron 2(03):175–181 29. Sharma R, Sungheetha A (2021) An efficient dimension reduction based fusion of CNN and SVM model for detection of abnormal incident in video surveillance. J Soft Comput Paradigm (JSCP) 3(02):55–69 30. Chen JI-Z, Chang J-T (2020) Applying a 6-axis mechanical arm combine with computer vision to the research of object recognition in plane inspection. J Artif Intell 2(02):77–99 31. Panda R, Mithun NC, Roy-Chowdhury A (2017) Diversity-aware multi-video summarization. IEEE Trans Image Process
Design and Implementation of Voice Command-Based Robotic System Md. Rawshan Habib, Kumar Sunny, Abhishek Vadher, Arifur Rahaman, Asif Noor Tushar, Md Mossihur Rahman, Md. Rashedul Arefin, and Md Apu Ahmed
Abstract This project presents a robotic vehicle that can be operated remotely by voice commands as well as by manual control. To detect the signals sent from a mobile application, an ATmega328 microcontroller (Arduino UNO) is used together with a Bluetooth module that is linked to the control unit. The Bluetooth receiver connected to the Arduino receives the serial data delivered by the mobile application. We discuss how to control the robotic car through a Bluetooth module using an Android application on an Android phone. The application sends control commands over Bluetooth that manage the motor speed, and the robot exchanges data with the phone regarding its direction and its distance from the nearest obstacle. Keywords Voice command · Motor driver · Android interface · Arduino · Ultrasonic sensor · Bluetooth
Md. Rawshan Habib · A. Vadher Murdoch University, Murdoch, Australia K. Sunny · A. Rahaman · A. N. Tushar University of Science and Technology Chittagong, Chittagong, Bangladesh Md Mossihur Rahman Islamic University of Technology, Gazipur, Bangladesh Md. Rashedul Arefin (B) Ahsanullah University of Science and Technology, Dhaka, Bangladesh e-mail: [email protected] Md A. Ahmed Chemnitz University of Technology, Chemnitz, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_19
1 Introduction Robotics is quickly becoming one of the most exciting areas of engineering. Although it may seem that robotics sprang from recent advances in technology, this is not the case: humanity has always been fascinated by the concept of artificial humans. Computers are only a means of implementing the many intricate systems necessary to construct a robot. Thanks to a wave of films, TV programs, and literature featuring robots that were popular in the mid-twentieth century, people at the time anticipated that robots would become a common part of everyday life in roles such as domestic helpers, retail employees, and bankers. Robots provide practical answers to a variety of global issues. A voice-controlled robot is a smart device that the user operates through voice instructions. The voice recognition app can recognize five vocal instructions given by a specific individual: ‘Go forward’, ‘Go backward’, ‘Move right’, ‘Take a left’, and ‘Stop’. The voice recognition technology is speaker dependent. Research into various techniques of operating robots has produced a variety of successes, including new, creative, and distinctive methods of robot motion control. Owing to technological advancements, robots are being used in many different industries. In this study, we discuss how to drive a robotic car using a Bluetooth device and a smartphone app. The benefit of such a car is that it can serve a variety of purposes, such as reducing manual labor. The use of robots has spread from industry to everyday life. Reports of user-configurable robotic devices, and even of automata mimicking animals and people that were meant largely for amusement, date back to early civilization. With the advancement of mechanical processes during the Industrial Revolution, new practical uses arose, including automated systems, remote control, and wireless remote control. Ideally, a robot would interact with its individual operators through voice recognition as an independent platform. This project lays the foundation for accomplishing that goal.
As a result, a new set of options can be incorporated into the robot to give it new functions. The project's primary goal is to use voice instructions to operate the robotic vehicle and steer it to the intended location, making human–robot interaction possible. A voice-controlled robot listens to the user's orders and acts on them. The proposed system is made up of two blocks, a transmitter and a receiver, each of which is built around a microcontroller and powered by a battery. With this application, anyone can use a personal smartphone to operate the robotic car. The voice-controlled robot relies on a basic speech app that detects spoken instructions. The software could be improved by employing background noise reduction so that only the command is recorded, resulting in a more efficient approach. Extensive studies on smartphone sensors are underway to remove noise and isolate just the essential voice signals. With advanced signal processing methods, the same technology could also be built and applied effectively in a voice-controlled wheelchair.
2 Literature Review with In-Depth Investigation Robotics is a discipline spanning mechanical, electrical, and computer engineering that deals with the design, construction, operation, and use of robots, together with the computer systems that control them, provide sensory feedback, and process data. These technologies are concerned with automated devices that can replace people in hazardous situations or production processes, or that look and act like humans. Many of today's robots are bio-inspired, which helps to advance the field of bio-inspired robotics. A speech-controlled robotic platform is presented in [1]. This small system demonstrates that voice recognition algorithms can be employed in a control application. The robot can comprehend natural-sounding control commands and carry out the appropriate action. A mobile robot with this capability could be useful in situations where voice communication is critical, and the approach has been shown to be sufficiently efficient for practical use. Research on unmanned vehicles is ongoing, owing to technical advancements. An Arduino Mega microcontroller and a Bluetooth module are used in [2] to create a car that can be operated by voice commands through a smartphone. The Google Voice and VoiceBot applications are used for the voice instructions. A SCOUT robot is controlled in [3] by a voice command detector built with a hidden Markov model (HMM). A non-native voice dataset covering the required instructions is developed, in which every command is spoken by a group of twenty Persian speakers. The HMM-based detector identifies the speech command, and the detected instruction is then transmitted to the robot over Bluetooth. The evaluation of the device shows that the human–robot interaction is satisfactory.
The study in [4] presents detailed research into the conception, installation, verification, and assessment of an intelligent system for voice-based human–machine interaction, which enables a robot to acquire language from voice instructions. A robot voice recognition control scheme that can reliably distinguish the voices of elderly people and children in noisy surroundings is described in [5]. The system is tested using a communication robot in a loud setting, with a cordless microphone recording the voice. In [6], a dual-arm robot is operated by speech on ROS, with voice recognition performed by a node that incorporates the voice instruction recognition package provided by iFLYTEK. A speech test is used to verify the viability of the voice control method. Another ROS-based voice-controlled robot is proposed in [7]; it uses a differential drive system and responds to the user's commands by moving a wheeled robot. The Viterbi algorithm is used in the HMM method to improve voice recognition. A user-controlled robot is designed in [8] for monitoring and rescue missions; the prototype under development has a simple but efficient structure. To take advantage of the vast bandwidth available in light fidelity (Li-Fi) media, voice-based motion control is proposed in [9], where an orientation control method for a robotic vehicle is also developed. This medium is chosen because of its fast response time and its capacity to react to voice recognition.
A radio-controlled vehicle is operated in [10] through Thai voice instructions. A person's instructions are transmitted to a computer, which converts them to digital signals. The digital signals are then translated into radio signal instructions, which are finally used to control a car. The goal of the study in [11] is a voice control system that gives directions to a soccer robot via a base station. Speech is entered into a database in the form of analog signals. Voice recognition is carried out with a deep speech module, which produces text as its result. The device runs on the robot operating system (ROS). An intelligent, multifunctional speech recognition guidance technology is presented in [12] that uses a real-time operating system to direct a blind person to a remotely configured location. The robot has a self-charging capability: during the daytime it roams around the workplace to find sunlight and powers itself via solar panels. In [13], voice recognition and natural language understanding are combined for home automation with customizable devices. The development and construction of a software interface that uses cloud-based voice processing to detect instructions and convert them to computer code are shown in [14]. To control prosthetic robot arms, suitable mathematical modeling is undertaken and a voice control model is implemented in [15]. A hybrid learning approach that combines the phonetic unit into a single statistical method is presented in [16]. A classification method for accurately discriminating between real users and chatbots is provided in [17]. An image segmentation methodology is used in [18] to simulate an electric wheelchair that relies on eye-tracking control.
3 System Description In this section, the existing and proposed systems are described, and a detailed description of the proposed work is given.
3.1 Existing System Speech recognition systems are already in use. Speech recognition is a method of turning spoken sounds, captured with a microphone or a telephone, into an electronically stored sequence of words. The effectiveness of a speech recognition system is evaluated by its accuracy (the error rate in translating spoken sounds to digital information) and its performance. Speech recognition has a plethora of uses: automated translation, pronunciation, hands-free computing, document processing, automated customer service, automation, and other applications typically rely on it. Anyone who has paid a bill over the telephone through an automated service has likely used speech recognition software. Over the past decade, speech recognition technology has advanced dramatically. It nevertheless still has flaws and issues, and conversational speech recognition remains some way off with present technology. Despite these problems, speech recognition is becoming increasingly popular. Analysts predict that it will be standard in phone networks in the next several generations. Its expansion will be assisted by the fact that, in areas where dial-up phones are scarce, voice is the only choice for operating automated processes.
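The accuracy criterion mentioned above is commonly reported as a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The following is a minimal sketch and is not taken from the project; the function name `word_error_rate` is an illustrative assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, recognizing ‘Take a left’ as "take left" costs one deletion over three reference words, so the function returns one third.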
3.2 Voice-Controlled System While speech recognition is concerned with transforming speech to digital information, voice recognition is devoted to identifying the speaker. Voice recognition works by assessing the characteristics of the voice that vary from person to person. Because of individual physique and behavioral patterns, everyone has a distinct way of talking. Voice recognition techniques therefore differ dramatically from speech recognition applications. Voice recognition is most typically used to confirm a subject's claimed identity or to determine the identity of an unknown speaker. It can be divided into two categories: speaker verification and speaker identification. Speaker verification is the technique of using a person's speech to confirm that they are who they claim to be. A user's voice is treated in much the same way as a fingerprint: after a sample of the voice is captured, the person's voice patterns are compared with a database to see whether the voice matches the stated identity. Speaker verification is most typically used where secure access is required, and such systems depend on the individual's consent and participation to function. Speaker identification is the technique of discovering the identity of an unknown person. Unlike speaker verification, speaker identification is frequently done without the individual's consent.
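The difference between the two categories can be illustrated with a small sketch. It assumes that each utterance has already been reduced to a fixed-length feature vector (the text does not say how voice patterns are extracted), and it uses cosine similarity with a threshold of 0.8 as a stand-in for the database comparison. The function names, the threshold, and the sample vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)

def verify_speaker(claimed_id, sample, enrolled, threshold=0.8):
    """Verification: does the sample match the voiceprint of the claimed identity?"""
    if claimed_id not in enrolled:
        return False
    return cosine_similarity(sample, enrolled[claimed_id]) >= threshold

def identify_speaker(sample, enrolled, threshold=0.8):
    """Identification: return the best-matching enrolled speaker, or None."""
    best_id, best_score = None, threshold
    for speaker_id, voiceprint in enrolled.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id

# Hypothetical enrolled voiceprints (illustrative numbers only).
enrolled = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
```

Verification checks one claimed identity against one stored voiceprint, whereas identification searches the whole database, which is why it can run without the speaker's cooperation.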
3.3 Detailed Description of Proposed Work
• Transmitter Section: Voice commands are delivered to an application on the transmitter side. The voice recognition software takes the commands, converts them to digital signals using a built-in analog-to-digital converter, compares them with preset commands (for example, forward or backward), and outputs the result in binary format. The microcontroller receives this binary message and passes it to a switch-case block, which compares the value with its cases and, based on the result, sends a string containing the command through the Bluetooth module.
• Receiver Section: The Bluetooth receiver receives the digital data and delivers the binary values to the microcontroller (Arduino UNO). The microcontroller enters its switch case and matches the received data against the case values. The servo motors are then driven in a continuous loop based on the string value. Three or more servos work at the same time, since moving the robot requires the cooperation of at least three servos.
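The transmitter and receiver logic described above amounts to two lookup steps, sketched here in Python as a stand-in for the microcontroller firmware. The five phrases are the ones listed in the Introduction; the one-letter codes, the action names, and the function names are assumptions made for illustration, since the text does not give the actual strings sent over Bluetooth.

```python
# Transmitter side: recognized phrase -> command code sent over Bluetooth.
# The one-letter codes are hypothetical.
COMMAND_CODES = {
    "go forward": "F",
    "go backward": "B",
    "move right": "R",
    "take a left": "L",
    "stop": "S",
}

def encode_command(phrase):
    """Return the code for a recognized phrase, or None if it is not a preset command."""
    return COMMAND_CODES.get(phrase.strip().lower())

# Receiver side: the equivalent of the switch case on the microcontroller.
ACTIONS = {
    "F": "forward",
    "B": "backward",
    "R": "turn_right",
    "L": "turn_left",
    "S": "stop",
}

def dispatch(code, current_action="stop"):
    """Map a received code to an action; unknown codes keep the current action."""
    return ACTIONS.get(code, current_action)

def run_receiver(received_codes):
    """Process a sequence of received codes and return the action after each one."""
    action = "stop"
    history = []
    for code in received_codes:
        action = dispatch(code, action)
        history.append(action)
    return history
```

Ignoring unknown codes mirrors a switch statement with an empty default branch; a more cautious design might stop the robot instead.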
4 Design and Implementation The proposed voice command-based robotic system is built from readily available components. The primary components are an Arduino (UNO-R3), DC motors, sensors, a motor driver, a power supply, a battery, a Bluetooth module, wheels, a chassis, wires, etc. Table 1 lists the features and functions of the main components employed in the proposed robot. During construction of the robot, a power supply unit is built first, which uses a 7805 regulator to convert 9 V to 5 V and an LE33 regulator to convert 5 V to 3.3 V. Capacitors are used to smooth small voltage variations and to bypass very short-duration spikes to ground without disturbing the other components. Reverse current is prevented by using diodes. The motor driver is a quadruple, high-current, half-bridge driver capable of providing bidirectional drive currents of up to 600 mA at voltages ranging from 4.5 to 36 V. Inputs 1 and 2 of the driver, which control the left motor, are connected to Arduino digital pins 2 and 3, while inputs 3 and 4, which control the right motor, are connected to digital pins 4 and 5. For speed control, the enable pin of the L293D is connected to digital pin 6 of the Arduino. The RX pin of the Bluetooth module is connected to the TX pin of the Arduino (digital pin 1), and the TX pin of the Bluetooth module is connected to the RX pin of the Arduino (digital pin 0).
Table 1 Features of the fabricated robot's major parts
Name | Features
Arduino (UNO-R3) | ATmega328 microcontroller operating at 5 V. Its input voltage ranges from 7 to 12 V. It has six analog inputs and 14 digital I/O pins
DC motors | The RM0402 model is used in this project. It weighs 100 g, and its speed is 60–300 RPM
Motor driver | The L293D motor controller is used here. Supply voltage range is 4.5–36 V. Pulsed current is 1.2 A per driver
Bluetooth module | A Class-2 Bluetooth module operating in the 2.4 GHz ISM band is used here. Its dimensions are 26.9 mm × 13 mm × 2.2 mm
Ultrasonic sensor | The model used here is the HC-SR04. Working voltage, current, and frequency are 5 V, 15 mA, and 40 kHz
Power supply unit | The external power supply is a battery. A 9 V lithium-ion battery is used, which is well suited to a voice-controlled robot
Wheel | Black rubber wheel with 65 mm outer diameter and 26 mm tire width
Ball caster | It helps the robot move smoothly. It has a 20 mm high round body
Robot chassis | A low-cost chassis that has all the features required to build a robot
Fig. 1 Robot block diagram
The HC-SR04 is used to detect and avoid obstacles by measuring distance in the range of 2–400 cm with a precision of 3 mm. The ultrasonic sensor, which is made up of an ultrasonic transmitter, a receiver, and a control circuit, works by emitting pulses and listening for the echo to estimate the obstacle's position. The sensor's emitter generates 40 kHz acoustic waves, and its detector picks up the same frequency and returns an electrical signal to the microcontroller. The block diagram of the proposed robot is shown in Fig. 1, and its flowchart is depicted in Fig. 2. The complete prototype of the proposed robotic system is shown in Fig. 3. After the hardware and software are successfully installed, the developed robot needs to be tested. Proteus 8.0 is used for the simulation. The complete circuit diagram is given in Fig. 4. The findings of the simulation are as follows:
• The code uploaded to the Arduino is tested and found correct.
• The Bluetooth module functions correctly for data transmission.
• Cross-platform voice control between Microsoft Windows and Android is tested, and it works properly.
• Motor movements are found correct.
The developed robotic system is analyzed in terms of its strengths, weaknesses, opportunities, and threats. The SWOT analysis of the proposed system is described below:
• Strength: Such robots are utilized in the construction industry. In industry, they are utilized to control wheelie and lift. The wheelchair system may be created and applied effectively.
• Weakness: If the ultrasonic sensor is tilted upward by an angle of 15°, it cannot measure the wall or track.
Fig. 2 Flowchart of the developed system
Fig. 3 Prototype of the developed voice command-based robot
Fig. 4 Connected circuit diagram
• Opportunities: The project can be used in the manufacturing industry, domestic chores, the military sector, and other fields.
• Threat: The project faces the risk of a short circuit during operation, for example as a result of overvoltage, which could destroy the processor.
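The ranging principle of the HC-SR04 described in this section (emit a pulse, time the echo) reduces to distance = speed of sound × echo time / 2, because the pulse travels to the obstacle and back. The sketch below shows that conversion in Python rather than Arduino code. The speed of sound (about 343 m/s in air at 20 °C) is an assumed value, the 2–400 cm limits come from the text, and the 20 cm stopping threshold and the function names are hypothetical.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343        # about 343 m/s in air at 20 °C (assumed)
MIN_RANGE_CM, MAX_RANGE_CM = 2.0, 400.0  # sensor range stated in the text

def echo_to_distance_cm(echo_us):
    """Convert a round-trip echo time in microseconds to a one-way distance in cm.

    Returns None when the result falls outside the sensor's stated range.
    """
    distance = echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0
    if MIN_RANGE_CM <= distance <= MAX_RANGE_CM:
        return distance
    return None

def should_stop(echo_us, stop_threshold_cm=20.0):
    """Decide whether the robot should stop for an obstacle (hypothetical threshold)."""
    distance = echo_to_distance_cm(echo_us)
    return distance is not None and distance <= stop_threshold_cm
```

An echo time of 1000 µs corresponds to roughly 17 cm, which falls under the illustrative threshold, so `should_stop(1000)` is true.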
5 Conclusion The final version met all of the original objectives, and the robot operates better than expected. The design approach is both adaptable and durable, and the robot can easily be upgraded with new functions. The proposed system demonstrates how an Android phone can be used, with the help of Bluetooth, as a remote controller for robots and other embedded systems; the application communicates with the robot through a Bluetooth link. The proposed solution also demonstrates how a robot may be utilized for travel. Because the phone runs Android, more capable remote control programs can be built on it. The connection between the phone and the robot is established over a wireless link, making robot control easy and efficient. The discipline of voice command robots is quickly expanding, as is interest in robotics and global robot competitions. Although the developed robot has satisfied the primary objectives, it can still be improved in the future.
References
1. Lv X, Zhang M, Li H (2008) Robot control based on voice command. In: 2008 IEEE international conference on automation and logistics, Qingdao, pp 2490–2494
2. İskender A, Üçgün H, Yüzgeç U, Kesler M (2017) Voice command controlled mobile vehicle application. In: 2017 international conference on computer science and engineering, Antalya, pp 929–933
3. Azargoshasb S, Korayem AH, Tabibian S (2018) A voice command detection system for controlling movement of SCOUT robot. In: 2018 6th RSI international conference on robotics and mechatronics, Tehran, pp 326–330
4. Branzila M, Sarmasanu C, Fanaru G (2017) ROBOTVOICE - voice command of a robot. In: 2014 international conference and exposition on electrical and power engineering, Iasi, pp 760–763
5. Shim BK et al (2010) An intelligent control of mobile robot based on voice command, ICCAS 2010, Gyeonggi-do, pp 2107–2110
6. Zhang Y et al (2018) Voice control dual arm robot based on ROS system. In: 2018 IEEE international conference on intelligence and safety for robotics, Shenyang, pp 232–237
7. Megalingam RK, Reddy RS, Jahnavi Y, Motheram M (2019) ROS based control of robot using voice recognition. In: 2019 third international conference on inventive systems and control, Coimbatore, pp 501–507
8. Habib R, Ahmed K, Reza AS, Mouly FJ, Mahbub HR (2019) An Arduino based robot for pipe surveillance and rescue operation, Dhaka, pp 1–5
9. Saradi VP, Kailasapathi P (2019) Voice-based motion control of a robotic vehicle through visible light communication. Comput Electr Eng 76:154–167
10. Leechor P, Pornpanomchai C, Sukklay P (2010) Operation of a radio-controlled car by voice commands. In: 2010 2nd international conference on mechanical and electronics eng, Kyoto, pp V1-14–V1-17
11. Prasetyo MR et al (2020) Implementation voice command system for soccer robot ERSOW. In: 2020 international electronics symposium, Surabaya, pp 247–252
12. Kalpana S, Rajagopalan S, Ranjith R, Gomathi R (2020) Voice recognition based multi robot for blind people using lidar sensor. In: 2020 international conference on system, computation, automation and networking, Pondicherry, pp 1–6
13. Mehrabani M, Bangalore S, Stern B (2015) Personalized speech recognition for internet of things. In: 2015 IEEE 2nd world forum on internet of things, Milan, pp 369–374
14. Deuerlein C et al (2021) Human-robot-interaction using cloud-based speech recognition systems. Proc CIRP 97:130–135
15. Gundogdu K, Bayrakdar S, Yucedag I (2018) Developing and modeling of voice control system for prosthetic robot arm in medical systems. J King Saud Univ Comput Inf Sci 30:198–205
16. Manoharan S, Ponraj N (2020) Analysis of complex non-linear environment exploration in speech recognition by hybrid learning technique. J Innov Image Process 02:202–209
17. Smys S, Haoxiang W (2021) Naïve Bayes and entropy based analysis and classification of humans and chat bots. J ISMAC 3(01):40–49
18. Tesfamikael HH et al (2021) Simulation of eye tracking control based electric wheelchair construction by image segmentation algorithm. J Innov Image Process 03:21–35
Study on Advanced Image Processing Techniques for Remote Sensor Data Analysis Md. Rawshan Habib, Abhishek Vadher, Fahim Reza Anik, Md Shahnewaz Tanvir, Md Mossihur Rahman, Md Mahmudul Hasan, Md. Rashedul Arefin, Md Apu Ahmed, and A. M. Rubayet Hossain
Abstract Image processing is the act of altering images with arithmetic computations for a variety of purposes, including cleaning up lines, separating groups of objects, removing noise, detecting edges, and so on. Remote sensing is a method for observing the planet's surface or environment from satellites or from the air (via airplanes). Electromagnetic radiation reflected or emitted by the earth's surface is recorded using this approach. Numerous techniques and technologies can be used in remote sensing to monitor electromagnetic waves of different wavelengths, including visible, infrared, thermal infrared, and so forth. In this study, digital cameras are used to acquire photos in the visible range, and advanced image processing and filtering algorithms are then applied to the raw data. Homomorphic filtering, median filtering, Gaussian filtering, histogram-based thresholding, contrast stretching, and other techniques are used. The results of this research will be valuable for enhanced image analysis. Keywords Image processing · Remote sensor · Homomorphic filtering · Median filtering · Gaussian filtering · Contrast stretching
Md. Rawshan Habib · A. Vadher Murdoch University, Murdoch, Australia F. R. Anik · Md. Rashedul Arefin (B) · A. M. Rubayet Hossain Ahsanullah University of Science and Technology, Dhaka, Bangladesh e-mail: [email protected] M. S. Tanvir · M. Mahmudul Hasan South Dakota School of Mines and Technology, Rapid City, USA M. M. Rahman Islamic University of Technology, Gazipur, Bangladesh M. A. Ahmed Chemnitz University of Technology, Chemnitz, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_20
1 Introduction Image processing is the process of converting an image to digital data and then performing operations on it to enhance its appearance and extract information. An image is described as a set of square pixels arranged in rows and columns in an array, or matrix. Mathematically, an image is treated as a two-dimensional function of two parameters whose amplitude gives the luminance of the image at a coordinate point (a, b). The result of image processing can be an image or a set of attributes or characteristics associated with the image. Most image processing algorithms treat an image as a two-dimensional signal to which typical signal processing methods are applied. The purposes of such work can be classified into three groups. One is image processing, where the input is an image, and the output is also an image; the next is image analysis, where the input is an image, and the output is measurements or parameters. Owing to shortcomings in data visualization, the scope of image processing and recognition has grown. As a result, modern imaging methods have emerged, and it is critical to study this development to make the best use of them. A reference on modern image processing techniques and applications is a must-have for anyone interested in the most up-to-date studies of the field. Several approaches for removing obstructions from photographs and recovering valid data have been established. These techniques were extensively tested, and the findings were provided together with MATLAB code. A newly developed and verified advanced image processing approach was also included, which combines logical image operations, linked processes, stack filters, and adaptive filters (for image processing employing adaptive blind learning algorithms). The effectiveness of these strategies was demonstrated on a range of photos with obstructions such as fog and clouds [1].
Manual analysis is subject to error and is a time-consuming operation; thus, humans continue to create geographic information systems to save time that is a platform for capturing, storing, manipulating, analyzing, managing, and presenting all forms of spatial or geographic data. They do these things instantly utilizing image processing, machine learning, object recognition, and other techniques to provide us with thorough topographic maps, data about water level, tidal waves, and wave path in coastlines, and a variety of other data that help us effectively manage, evaluate, and strategize our ecosystems. It can also aid in the prediction of natural or man-made risks, allowing us to better plan for their mitigation. They provide us with data on global warming, area use patterns, and other topics. The most essential aspect of remote sensing is that satellite images are everlasting archives that provide useful data in a variety of wavelengths. Global studies on a number of topics and the detection of big characteristics are made possible by the vast coverage. Remote sensing, which is the study of gathering data about the planet through remote equipment including satellites, is naturally valuable for emergency preparedness. Satellites provide precise, regular, and near-instantaneous data over huge areas throughout the globe. When a crisis occurs, remote sensing is frequently the only method to see what is going on the surface. In our study, advanced image processing techniques such as
Study on Advanced Image Processing …
homomorphic filtering, median filtering, Gaussian filtering, histogram-based thresholding, and contrast stretching are applied to raw images captured with a digital camera of known specification.
2 Remote Sensing and Satellite-Based Scanning System
The phrase “remote sensing” refers to the process of observing the planet’s surface from orbit using electromagnetic radiation (EMR) emitted, reflected, or refracted by the observed objects in order to improve natural resource management, land use, and ecological sustainability. The stages in remote sensing (Fig. 1) can be described as follows:
• EMR emitted by the sun.
• Absorption and scattering, reflection and emission, and other interactions with the planet’s surface.
• Interaction of EMR with the ground: reflection and absorption.
• Transfer of energy from the ground to a remote sensor.
• Output of the sensor.
• Analysis and processing of the output data.
Remote sensing, in its broadest sense, is the science of gathering and evaluating data about objects or events from a distance. We are all familiar with remote sensing as individuals, because humans depend on visual perception to obtain most of their information about the environment. Human eyes, however, are severely constrained as sensors: they are sensitive only to the visible part of the electromagnetic spectrum, their viewing angles are dictated by the position of the body, and they cannot create a permanent record of what is seen. Because of these constraints, humans have strived to build tools that improve the capacity to observe and record the physical properties of our surroundings. Remote sensing has long been regarded as a vital tool for observing, evaluating, describing, and drawing conclusions about the environment, dating back to the early use of aerial imagery. Remote sensing
Fig. 1 Remote sensing process [2]
Md. Rawshan Habib et al.
Fig. 2 Satellite operation and data distribution process [4]
technology has been improving on three levels over the last few decades [3]. Most satellite sensors detect EMR electronically as a continuous stream of digital data. The data are transmitted to ground receiving stations, where they are compiled into specific data products and sold to consumers on a range of electronic data media. Once purchased, the digital image data can be readily analyzed using computer-assisted digital image processing techniques. The satellite operation and data distribution process is shown in Fig. 2. Numerous studies on image processing and remote sensing are ongoing. Multisensor and multispectral (MS) data fusion approaches are discussed in [5], with a focus on high-resolution remote sensing studies of the ocean. This is a difficult task that entails MS data analysis, theoretical analysis, computation, and prediction. Since the introduction of the first operational fully polarimetric sensor, AIRSAR, in 1985, polarimetric synthetic aperture radar (PolSAR) has had a growing positive impact on the remote sensing field. Because of the availability and importance of such data, it is critical to provide users with state-of-the-art PolSAR image processing and analysis tools. TerraLib is used in [6] to describe the present development stage of this technology. Experience with combined spatial/spectral algorithms for classifying hyperspectral image data into pure and mixed pixels is discussed in [7]. Most of the strategies given there are based on classical mathematical morphology theory, which offers an important basis for implementing the required integration. By comparing the proposed techniques to other well-known methods, the effectiveness of the proposed methods
is demonstrated. Wavelet analysis is used in [8] to propose algorithms and strategies for automated detection and tracking of regional-scale features in satellite data. Virtual sensors are used in [9] to estimate unmeasured spectral bands in spectrally poor datasets using algorithms trained on spectrally richer datasets. For India’s Jharia coalfields, thermal and microwave satellite remote sensing data are used in [10] to assess the impact of coal fires on land subsidence. A general classification framework for multisensor remote sensing image processing is proposed in [11]. People today are confronted with significant environmental issues; long-term monitoring and collection of large-scale geographical data using satellite remote sensing techniques are presented in [12]. A new image collection is generated in [13] for object identification, a deep feature extractor based on the DWT is demonstrated, and a widely used land-use dataset is employed to generate comparison results. A comprehensive review presented in [14] focuses on some of the most commonly used sensors and predictors, as well as recent advances in remote sensing techniques. It also contains suggestions on how to use these methods to assess vegetation recovery, and it identifies technical gaps that may drive future study. Multisensor remote sensing images are classified in [15] using self-organizing feature maps and radial basis function networks.
3 Image Processing Fundamentals
With systems ranging from simple digital circuits to complex parallel computers, today’s digital technology makes it possible to manipulate multi-dimensional signals. The purpose of this manipulation may be broken down into three categories:
• Image processing
• Image analysis
• Image understanding.
Instead of denoising pictures, estimating visual features, and executing a segmentation step one by one, all of these elements can be merged and estimated at the same time. This method is based on a modification of the well-known Mumford–Shah functional that was first presented for combined denoising and segmentation of still pictures. One of the long-standing key difficulties of computer vision is object recognition in images and videos. Because recognition entails both locating instances of an object class in new, cluttered scenes (detection) and identifying such instances as belonging to one of several possible classes (classification), the problem is twofold. The basic difficulty of recognition is to develop adequate image models that describe the shape and dimensions of each object class and so help differentiate objects from one another. A filter is a circuit or technique in digital signal processing that removes undesired components or features from a signal. Filtering is a type of signal processing that involves the complete or partial suppression of some aspect of the signal. It usually entails removing some frequencies while leaving others alone, for example to suppress interfering signals and reduce background noise. Low pass, high pass,
directional, Laplacian, median, mode, and minimum/maximum filters are some common types of filters. Edge detection refers to a set of mathematical methods for identifying points in a digital image where the image intensity changes abruptly or, more formally, where there are discontinuities. The points of sharp intensity change are usually grouped into a set of curved line segments called edges. Finding discontinuities in one-dimensional signals is known as step detection, and finding signal discontinuities over time is known as change detection. In image processing, machine vision, and computer vision, edge detection is a critical technique, especially in the fields of feature detection and feature extraction [16]. The four primary processes of digital image processing are image restoration, image enhancement, image classification, and image transformation. Image restoration is the process of correcting and calibrating images in order to produce the most accurate representation of the earth’s surface possible, which is a critical aspect for all uses. The term “image enhancement” refers to the process of modifying images to make them look better to the human eye. Even with digital image processing, visual inspection remains a critical component, and the results of such methods can be striking. The machine analysis of images is referred to as image classification, and it is a crucial activity in GIS. Finally, image transformation means the creation of new imagery from raw image bands that have been mathematically processed. Deep learning algorithms are expected to support humans in a variety of applications in the near term [17]. Spectral restrictions in satellite images are an unsolved barrier to implementing change detection in the image processing arena. Unsupervised learning could be used to repair spectral abnormalities in the relevant scene [18].
A study of a 3D-printed mechanical arm that integrates computer vision with object tracking is presented in [19].
4 Result Analysis We have collected some raw images before image processing. For obtaining the images, we have used a digital single-lens reflex camera (Nikon D5200), whose specifications are given in Table 1.
4.1 Homomorphic Filtering
Homomorphic filtering is a signal and image processing approach in which a nonlinear mapping transfers the signal to a different domain, linear filtering is performed there, and an inverse nonlinear mapping returns the result to the original domain. A homomorphic filter is often used for image enhancement: it increases contrast while also normalizing the brightness across an image. Homomorphic filtering is also applied to reduce multiplicative noise. Illumination and reflectance are not separable in the frequency domain,
Table 1 Specification of the camera
Name of properties    Specifications
Model                 Nikon D5200
Type                  Single-lens reflex digital camera
Total pixel           24.71 million
Image sensor          23.5 × 15.6 mm CMOS sensor
Image type            JPEG
Image size            6000 × 4000
Shutter speed         1/200
ISO                   100
Aperture              f/3.5
yet their approximate locations can be determined. Because illumination and reflectance combine multiplicatively, the components are made additive by taking the logarithm of the image intensity, so that these multiplicative components can be separated linearly in the frequency domain. Illumination variations can then be regarded as additive noise and reduced by filtering in the log domain. Since the high-frequency components are assumed to represent mainly the reflectance in the scene, while the low-frequency components are assumed to represent mainly the illumination, the high-frequency components are amplified and the low-frequency components are attenuated to make the illumination of the image more even. The effects of homomorphic filtering are shown in Fig. 3a–c. Homomorphic filtering is a frequency domain filtering technique that compresses the brightness range while increasing contrast. Images in which the illumination is distributed unevenly, so that the objects in the image appear dark, are good candidates for homomorphic filtering. Because such photographs are dark on average, their details are obscured.
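The log, filter, exponentiate chain described above can be sketched on a single image row. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function names, the Gaussian-shaped gain curve, and the parameters `gamma_low`, `gamma_high`, and `cutoff` are assumptions made for this example, not values taken from the paper.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (fine for short illustrative rows)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, keeping the real part (the filtered spectrum stays symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def homomorphic_1d(row, gamma_low=0.5, gamma_high=1.5, cutoff=2.0):
    """Homomorphic filtering of one row of non-negative intensities.

    gamma_low < 1 attenuates low frequencies (assumed to carry illumination);
    gamma_high > 1 boosts high frequencies (assumed to carry reflectance).
    """
    n = len(row)
    log_row = [math.log1p(v) for v in row]            # multiplicative -> additive
    spectrum = dft(log_row)
    filtered = []
    for k, coeff in enumerate(spectrum):
        d = min(k, n - k)                              # distance from the DC term
        gain = gamma_low + (gamma_high - gamma_low) * (
            1.0 - math.exp(-(d * d) / (2.0 * cutoff * cutoff)))
        filtered.append(coeff * gain)
    return [math.expm1(v) for v in idft_real(filtered)]  # back to intensities


if __name__ == "__main__":
    # Dark left half, bright right half, with fine detail on top.
    row = [10.0, 14.0, 10.0, 14.0, 200.0, 240.0, 200.0, 240.0]
    print([round(v, 1) for v in homomorphic_1d(row)])
```

A two-dimensional version would apply the same gain as a function of radial frequency; the one-dimensional form is used here only to keep the sketch short and dependency-free.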
4.2 Median Filtering
In signal processing, it is often desirable to remove as much noise as possible from an image or signal. The median filter is a nonlinear digital filter that is commonly used to remove noise. Noise removal is a common preprocessing step used to improve the results of subsequent processing. Median filtering is widely used in digital image processing because, under certain conditions, it preserves edges while reducing noise. The basic principle of the median filter is to go through the signal entry by entry, replacing each entry with the median of the neighbouring entries. Median filtering and linear Gaussian filtering are two examples of smoothing techniques. All smoothing methods are effective at removing noise in smooth regions of a signal, but they adversely affect the edges.
Fig. 3 a–c Homomorphic-filtered output of raw images
Often, however, it is important to preserve the edges of the signal while reducing the noise. The median filter behaves as a low-pass filter. When the objective is to reduce noise while preserving edges, a median filter is more effective than linear convolution. The median filter examines each pixel in the image in turn and compares it with its neighbours to decide whether it is representative of its surroundings. The purpose of this filter is to remove noise. However, because the raw images were captured with a digital camera, the original images had no significant noise. We therefore use MATLAB code to add artificial salt and pepper noise to the original images, as shown in Fig. 4a–c. The images are then processed with a median filter to remove the salt and pepper noise.
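The procedure described above (add salt and pepper noise, then apply a median filter) can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB code; the helper names, the noise density parameter, and the choice of edge replication at the image border are assumptions of this sketch.

```python
import random
import statistics


def add_salt_and_pepper(image, density, seed=0):
    """Set roughly `density` of the pixels to 0 (pepper) or 255 (salt)."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for r in range(len(noisy)):
        for c in range(len(noisy[r])):
            u = rng.random()
            if u < density / 2:
                noisy[r][c] = 0
            elif u < density:
                noisy[r][c] = 255
    return noisy


def median_filter(image, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighbourhood.

    Border pixels are handled by replicating the nearest edge pixel.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    window.append(image[rr][cc])
            out[r][c] = statistics.median(window)
    return out


if __name__ == "__main__":
    clean = [[100] * 6 for _ in range(6)]
    noisy = add_salt_and_pepper(clean, density=0.1, seed=1)
    print(median_filter(noisy))
```

Because the median ignores isolated extreme values, a single impulse is removed completely while a sharp step between two flat regions is left in place, which is the edge-preserving behaviour the text refers to.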
Fig. 4 a–c Median-filtered output of raw images
4.3 Gaussian Filtering
Since its behavior can be changed by modifying only one parameter, the variance σ², Gaussian filters are an excellent place to start studying filtering. The Gaussian filter function is

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$
The degree of smoothing increases with sigma (the standard deviation): larger sigma values suppress more high-frequency content, and smaller values suppress less. Figure 5a–c shows the results of Gaussian filtering. Depending on the value of sigma, Gaussian filtering smooths the image, controls the amount of blurring, and reduces noise.
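To make the role of sigma concrete, the sketch below samples G(x, y) on a square grid and normalizes the weights so that they sum to one. It is an illustration in Python, not the code used in the study; the function name and the default radius of three standard deviations are assumptions of this example.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2) on a grid.

    The kernel is renormalized so that its weights sum to 1, which keeps the
    mean brightness of a filtered image unchanged.
    """
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))   # assumed cut-off at about 3 sigma
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               / (2.0 * math.pi * sigma * sigma)
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        k = gaussian_kernel(sigma)
        centre = k[len(k) // 2][len(k) // 2]
        print(f"sigma={sigma}: size={len(k)}x{len(k)}, centre weight={centre:.4f}")
```

A larger sigma spreads the weight over more neighbours, so the centre weight drops and the blur becomes stronger.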
4.4 Contrast Stretching
Contrast stretching improves an image by extending the range of intensity values it contains so that the full range of available values is used. Unlike histogram equalization, contrast stretching is limited to a linear mapping of input to output values. The result is less dramatic, but it avoids the artificial appearance that equalized images sometimes have.
Fig. 5 a–c Gaussian-filtered output of raw images
From Figs. 6, 7, and 8, we can see the results of contrast stretching. By contrast stretching, the intensity of an image on different coordinate points can be checked, and the output intensity curve of the image can also be shown.
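The linear input-to-output mapping behind contrast stretching can be written in a few lines. The following Python sketch assumes a simple min-max stretch to the range 0 to 255; the function name and the output range are illustrative choices, and the study itself may have used different limits.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly map the image's [min, max] intensity range onto [out_min, out_max]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                      # flat image: nothing to stretch
        return [row[:] for row in image]
    span_out = out_max - out_min
    return [[round(out_min + (v - lo) * span_out / (hi - lo)) for v in row]
            for row in image]


if __name__ == "__main__":
    low_contrast = [[50, 100], [150, 200]]
    print(contrast_stretch(low_contrast))
```

Plotting output intensity against input intensity for this mapping gives a straight line, which corresponds to the intensity curve mentioned in the text.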
Fig. 6 Contrast stretching output of raw image
Fig. 7 Contrast stretching output of raw image
Fig. 8 Contrast stretching output of raw image
Fig. 9 Histogram-based thresholding output of raw image
4.5 Histogram-Based Thresholding
This method assumes that the image is divided into two classes: background and foreground. The technique attempts to find the best threshold level for dividing the histogram into these two classes. It weighs the histogram, determines which of the two sides is heavier, and removes weight from the heavier side until it becomes the lighter one. It repeats this operation until the two ends of the weighing scale meet. The results of histogram-based thresholding can be seen in Figs. 9, 10, and 11.
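The weighing procedure described above resembles what is usually called balanced histogram thresholding. The Python sketch below is a simplified variant written for clarity, under the assumption that this is the method meant; the function names are hypothetical, and the side weights are recomputed at every step instead of being updated incrementally.

```python
def histogram(image, levels=256):
    """Count how many pixels take each intensity level."""
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    return counts


def balanced_threshold(hist):
    """Shrink the histogram from its heavier side until both ends meet."""
    start, end = 0, len(hist) - 1
    while start < end:
        mid = (start + end) // 2
        left = sum(hist[start:mid + 1])       # weight on the left pan
        right = sum(hist[mid + 1:end + 1])    # weight on the right pan
        if right > left:
            end -= 1                          # remove the outermost right bin
        else:
            start += 1                        # remove the outermost left bin
    return start


def binarize(image, threshold):
    """Foreground (1) where the pixel exceeds the threshold, else background (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


if __name__ == "__main__":
    img = [[1, 1, 6], [6, 1, 6]]
    t = balanced_threshold(histogram(img, levels=8))
    print(t, binarize(img, t))
```

For a clearly bimodal histogram the returned level falls between the two peaks, so the binarized image separates the dark and bright populations.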
5 Conclusion
In this study, we have examined a few characteristics of basic image processing software. Many more sophisticated operations can be applied to the photographs. A number of filters can be applied to a photograph; the filters alter the image using mathematical techniques. Some filters are simple to use, whereas others require a high level of technical expertise. The filters used include homomorphic, median, and Gaussian filters, among others. Removing objects touching the image border, closing gaps, contrast stretching, and histogram-based thresholding were also used.
Fig. 10 Histogram-based thresholding output of raw image
Fig. 11 Histogram-based thresholding output of raw image
Initially, we gathered photographs at two separate times and two different locations. We then applied the filters to the raw photos and compared the results between the two locations and times.
• Applying the filters to the raw images, we found the output of the Gaussian filter to be more effective than the others. By varying the standard deviation (σ), a smooth, noise-reduced image can be obtained.
• We used contrast stretching to improve the contrast of an image by stretching the range of intensity values in the output. Contrast stretching was carried out by mapping the input intensity range to the output intensity range.
• In the raw images, there are color differences between pixels. We used the histogram to balance these color differences.
In principle, if the frequency can be determined from an image, it can be related to the wavelength, and from the wavelength the temperature of the imaged location could be estimated. We tried to determine the temperature from an image, but it was not possible to find an overall frequency for an image; the frequency can only be obtained at a particular pixel. A temperature value can therefore be obtained only at a particular pixel, which does not give the actual temperature of the location.
References
1. Luo J, Cross J (2007) Advanced image processing techniques for maximum information recovery. In: 2007 thirty-ninth southeastern symposium on system theory, Macon, pp 58–62
2. Aggarwal S. Principles of remote sensing, satellite remote sensing and GIS applications in agricultural meteorology, pp 23–38
3. Satellite remote sensing and its role in global change research. http://www.ciesin.org/TG/RS/satremot.html
4. What is remote sensing? https://www.restec.or.jp/en/knowledge/index.html
5. Raizer V (2013) Multisensor data fusion for advanced ocean remote sensing studies. In: 2013 IEEE international geoscience and remote sensing symposium (IGARSS), Melbourne, pp 1622–1625
6. Sant’Anna S et al (2015) Integration of information-theoretic tools for PolSAR image processing and analysis. In: 2015 IEEE international geoscience and remote sensing symposium (IGARSS), Milan, pp 239–242
7. Plaza A et al (2003) Spatial/spectral analysis of hyperspectral image data. In: IEEE workshop on advances in techniques for analysis of remotely sensed data, Greenbelt, pp 298–307
8. Liu AK et al (1997) Wavelet analysis of satellite images for coastal monitoring. In: IGARSS’97, 1997 IEEE international geoscience and remote sensing symposium proceedings. Remote sensing: a scientific vision for sustainable development, Singapore, pp 1441–1443
9. Srivastava AN, Oza NC, Stroeve J (2005) Virtual sensors: using data mining techniques to efficiently estimate remote sensing spectra. IEEE Trans Geosci Remote Sens 43:590–600
10. Karanam V et al (2021) Multi-sensor remote sensing analysis of coal fire induced land subsidence in Jharia Coalfields, Jharkhand, India. Int J Appl Earth Observ Geoinf 102:102439
11. Krasti B et al (2020) Remote sensing image classification using subspace sensor fusion. Inf Fusion 64:121–130
12. Zhang J, Zhao T, Zhai X (2021) Green city air measurement and health exercise big data monitoring based on remote sensing images and sensors. Environ Technol Innov 23:101679
13. Karadal CH et al (2021) Automated classification of remote sensing images using multileveled MobileNetV2 and DWT techniques. Expert Syst Appl 185:115659
14. Pérez-Cabello F et al (2021) Remote sensing techniques to assess post-fire vegetation recovery. Curr Opin Environ Sci Health 21:100251
15. Chen CH, Shrestha B (2000) Classification of multi-sensor remote sensing images using self-organizing feature maps and radial basis function networks. In: IGARSS 2000, IEEE 2000 international geoscience and remote sensing symposium. Taking the pulse of the planet: the role of remote sensing in managing the environment. Proceedings (Cat. No.00CH37120), Honolulu, pp 711–713
16. Concept of edge detection. https://www.tutorialspoint.com/dip/concept_of_edge_detection.htm
17. Ranganathan G (2021) A study to find facts behind preprocessing on deep learning algorithms. J Innov Image Process 03:66–74
18. Dhaya R (2021) Hybrid machine learning approach to detect the changes in SAR images for salvation of spectral constriction problem. J Innov Image Process 03:118–130
19. Chen JI, Chang J (2020) Applying a 6-axis mechanical arm combine with computer vision to the research of object recognition in plane inspection. J Artif Intell Capsule Netw 02:77–99
Implementation and Performance Evaluation of Binary to Gray Code Converter Using Quantum Dot Cellular Automata Uttkarsh Sharma, K. Pradeep, N. Samanvita, and Sowmya Raman
Abstract Quantum dot cellular automata (QCA) is one of the most well-known nanometre-scale technologies, offering significant size and power reductions as well as a fast switching frequency to surpass the scaling restrictions of complementary metal-oxide semiconductors. A new technological revolution is necessary when an existing technology approaches a deadlock. In response to new challenges in current technology, a complex technology based on Quantum dot cellular automata (QCA) is developed. QCA is an intriguing field of nano-computing technology that offers unmatched compactness, high-speed operation, and very low-power consumption. In comparison to other designs, a 4-bit Binary to Gray Converter with various designs has been illustrated in this article with minimal cell utilization using QCAD simulation tool. Keywords CMOS · QCA · Binary to gray · Latency · Reversibility
U. Sharma · K. Pradeep · N. Samanvita (B) · S. Raman
Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
e-mail: [email protected]
U. Sharma e-mail: [email protected]
S. Raman e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_21

1 Introduction
In the present world, human life is incomplete without gadgets. Present-day gadgets carry out digital operations with the help of transistors. Over the past decade, scientists have made continuous efforts to reduce the size of transistors and thereby increase the number of transistors on a microchip. With an improved ‘architecture of digital circuits,’ CMOS ushered in a new era of IC technology in the VLSI arena [1–4]. According to Moore’s Law, ‘the number of transistors on a microchip doubles every 2 years’. The challenge is that chips have already reached the nanoscale at 7 nm, whereas the diameter of a single silicon atom is 0.2 nm. Further downscaling of the chip size faces many obstacles, such as short channel effects and excessive power dissipation; therefore, transistor-less technologies are being considered [5–9]. QCA is a superior alternative to existing CMOS technology in terms of power dissipation, area occupancy, and speed of operation. This technology uses Coulombic forces between charged particles [10–12]. The essential component of QCA devices is a quantum cell with four quantum dots that can house two mobile electrons [13–16]. QCA creates a device based on quantum effects that conveys information from input to output by adjusting the charge configuration [17–20]. As a result, electron positions, rather than current flow as in CMOS circuitry, represent the flow of data. QCA is a way of encoding binary data on cells that carry no current, and device operation is achieved by coupling those cells [21, 22]. When two systems use different binary codes for the same information, a code converter, which is a combinational circuit, must be introduced between them for compatibility. The primary purpose of code converters is to improve the efficacy of signal processing systems [1, 22–24]. QCA-based code converters are an essential component of circuits for converting data across formats [2, 3]. Gray code converters are used to improve switching activity and in the error detection process during information transfer over communication networks. This paper aims at implementing binary to Gray code conversion using CNOT and classical gates.
2 QCA Cell
CMOS technology encodes binary data based on current switching, whilst QCA technology encodes binary data based on the position of individual electrons. A QCA cell is a square cell in which quantum dots (Q-dots), also known as sites, are placed at the corners, where the charge is concentrated. The cell contains two electrons that can tunnel between the dots (Fig. 1). Since there is a potential barrier between cells, the electrons cannot tunnel out of the cell. The two electrons therefore settle at diagonally opposite corners due to Coulombic repulsion. This yields two possible arrangements, which represent the binary states ‘0’ and ‘1.’ Quantum cells are the building blocks of QCA topology. Information is transferred between cells through their interaction, which replaces the direct wiring of devices. The data (logic 0 or logic 1) flow from the input end to the output end of the QCA circuit through Coulombic attraction and repulsion.
Fig. 1 Basic quantum cell
2.1 Polarization
In CMOS technology, the logic states or values are determined by voltage levels. In QCA technology, the logic state is determined by the position of electrons (ref). QCA technology operates through the interaction between QCA cells, each consisting of four quantum dots and two mobile electrons. These two electrons decide the logic state of the QCA cell. Coulombic repulsion forces the electrons to occupy diagonally opposite corners of the cell. Figure 2 shows the different polarized states of the QCA cells [25, 26]. The cell polarization is given by

$$P = \frac{(\rho_1 + \rho_3) - (\rho_2 + \rho_4)}{\rho_1 + \rho_2 + \rho_3 + \rho_4} \qquad (1)$$

where ρ_i is the electronic charge at dot i.
Fig. 2 Cell polarization
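As a quick numerical check of the polarization expression, the sketch below evaluates it for the two diagonal charge arrangements. It uses the standard form of the expression and assumes the conventional numbering in which dots 1 and 3 share one diagonal and dots 2 and 4 the other; the function name is hypothetical.

```python
def polarization(rho1, rho2, rho3, rho4):
    """Cell polarization P from the charges on the four dots.

    Assumes dots 1 and 3 lie on one diagonal and dots 2 and 4 on the other.
    """
    total = rho1 + rho2 + rho3 + rho4
    return ((rho1 + rho3) - (rho2 + rho4)) / total


if __name__ == "__main__":
    print(polarization(1, 0, 1, 0))   # electrons on dots 1 and 3
    print(polarization(0, 1, 0, 1))   # electrons on dots 2 and 4
```

With both electrons on one diagonal the expression evaluates to +1, with both on the other diagonal to −1, and an evenly spread charge gives 0, which matches the two logic states plus the unpolarized case.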
3 Basic Elements for QCA Circuit Design Similar to cell polarization, the data exchange between the QCA cells is also based on coulombic interaction between them. This takes place so as to retain the electrons in their minimal energy state. Basic elements such as majority gate, binary wire, inverter gate and other gates are designed based on the same principle [27]. Each QCA Cell has the same size and measures as follows: 18 nm from end to end, the interspacing distance (the gap between two adjacent cells) is 2 nm, each quantum dot has a diameter of 2.5 nm and the design’s two layers are separated by 11.5 nm which is shown in Fig. 3.
3.1 QCA Wire
The simplest element in QCA is the binary wire, which is used to transmit data from one end of the circuit to the other. There are two types of QCA wire, 90° and 45°, as shown in Fig. 4. A 90° QCA wire is created by linking cells together. In a QCA wire with 45° orientation, the logic state alternates between +1 and −1 polarization from each cell to the next in the cascade [23]. The simplest and most widely used QCA wire is the 90° wire, which can be constructed from a horizontal or vertical array of QCA cells [26, 28, 29]. A unique characteristic of QCA wires is that two wires can cross in the same plane without altering each other’s values [30]. This property, however, only holds if the two wires are oriented differently. Each QCA cell in a cascade follows the charge configuration of the preceding neighbouring cell (Fig. 5). The data flow from cell A to cell B is not interrupted by another wire of 90° orientation in its respective clock zone. In any given QCA circuit all the cells of all
Fig. 3 Standard dimension of quantum cell
Fig. 4 QCA wire 90°, 45°
Fig. 5 90°, 45° orientation
the clock zones will undergo all the 4 phases of their respective clock zones [12, 29–31], because of its different orientation.
Fig. 6 Majority gate
4 Majority Gate
The majority gate, shown in Fig. 6, is a unique feature of QCA. It incorporates 4 terminal cells, of which 3 are input cells and 1 is the output cell. Apart from these terminal cells, there is one device cell that determines the output. The output function of a majority gate for the inputs A, B, and C is given by

$$Y(A, B, C) = AB + BC + CA \qquad (2)$$
This majority gate can also be used to design 2 input logic ‘AND’ and logic ‘OR’ gates [26, 29, 32, 33]. Figure 6 shows the majority gate structure with 3 inputs (A, B, C) and output (Q) and one device cell.
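The reduction of the majority function to AND and OR can be checked exhaustively. In the Python sketch below, the function names are illustrative; fixing the third input to 0 gives AND and fixing it to 1 gives OR, which is the usual way these gates are derived from a three-input majority gate.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function Y = AB + BC + CA on bits 0/1."""
    return (a & b) | (b & c) | (c & a)


def and_gate(a, b):
    """AND obtained by tying one majority input to logic 0."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR obtained by tying one majority input to logic 1."""
    return majority(a, b, 1)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", majority(a, b, c))
```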
5 Clocking
Whereas CMOS technology requires external power to drive the circuit, in QCA clocking plays the important role of driving the circuit. Clocking has four phases, namely switch, hold, release, and relax, as in Fig. 7. There are four clock zones available in QCA, namely Clock 0, Clock 1, Clock 2, and Clock 3, as in Fig. 8. All the clock zones have the same frequency, with a phase shift of 90° between adjacent clocks. If clock 0 is taken as the reference, i.e. phase = 0, then clock 1 has phase = π/2, clock 2 has phase = π, and clock 3 has phase = 3π/2.
Fig. 7 Clock states
For QCA, adiabatic (gradual) switching is preferred over abrupt switching. Adiabatic switching is performed by regulating the inter-dot tunnelling barrier of the QCA cells. When an input signal is applied, the barrier is lowered so that the cells start polarizing. The cells are then made to retain their polarization, or ‘crystallized’ in their new states, by raising the barrier. According to the adiabatic theorem, if the inter-dot barrier is changed gradually, the system remains in the ground state and does not get trapped in an excited state. When the clock moves from the null state into the switch state, the cell is influenced by the state of its neighbouring cells. After switching, in the hold state, the cell is independent of its neighbouring cells (Fig. 8). Later, this polarized state is released to a neighbouring cell, and the cell reaches a state of low potential energy. A single QCA circuit may use one or more clock zones, depending on the circuit complexity.
Fig. 8 Clock zones
5.1 Latency
Latency is defined as the time difference between the application of an input and the appearance of the corresponding output of a circuit. In QCA there is a phase difference of 90° between two adjacent clock zones. Latency is usually measured in clock cycles. Every clock zone used adds a delay of 0.25 clock cycle. Therefore, if a signal passes through all 4 clock zones, the resulting latency is one, which means that the desired output is delayed by one clock cycle. Latency varies between circuits according to the number of clock zones used. Circuits with minimum latency are preferred, since the delay between the input and output signals is minimal. In some circuits the latency exceeds 1, because after all 4 clock zones have been used the sequence starts again from the first clock zone, in the order clock 0, clock 1, clock 2, clock 3, clock 0, clock 1, ... [33].
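The bookkeeping described above (a quarter of a clock cycle per zone, with zones reused cyclically) can be expressed directly. The following Python sketch uses hypothetical helper names and simply restates the rule from the text; it does not model the physical behaviour of a QCA circuit.

```python
import math


def clock_phase(zone):
    """Phase offset of a clock zone: 0, pi/2, pi, 3*pi/2 for zones 0..3."""
    return (zone % 4) * math.pi / 2


def zone_sequence(num_zones_traversed):
    """Clock zones visited by a signal, wrapping around after clock 3."""
    return [i % 4 for i in range(num_zones_traversed)]


def latency_in_cycles(num_zones_traversed):
    """Each clock zone on the signal path adds 0.25 clock cycle of delay."""
    return 0.25 * num_zones_traversed


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, zone_sequence(n), latency_in_cycles(n))
```

A path through six zones, for instance, visits zones 0, 1, 2, 3, 0, 1 and has a latency of 1.5 clock cycles.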
6 Binary to Gray
The Gray code, given by Frank Gray, is also known as reflected binary code (Table 1). It is a binary number arrangement in which two subsequent values differ by only one bit. For example, if the decimal number ‘1’ was to be represented in binary code, it would be ‘001’, and similarly the number ‘2’ would be ‘010’. But in Gray code, the values would be ‘001’ and ‘011’, respectively. In this way, incrementing to the next number changes only one bit instead of 2 bits [34].

Table 1 Binary to gray code conversion
Input               Output
A0  A1  A2  A3      G0  G1  G2  G3
0   0   0   0       0   0   0   0
0   0   0   1       0   0   0   1
0   0   1   0       0   0   1   1
0   0   1   1       0   0   1   0
0   1   0   0       0   1   1   0
0   1   0   1       0   1   1   1
0   1   1   0       0   1   0   1
0   1   1   1       0   1   0   0
1   0   0   0       1   1   0   0
1   0   0   1       1   1   0   1
1   0   1   0       1   1   1   1
1   0   1   1       1   1   1   0
1   1   0   0       1   0   1   0
1   1   0   1       1   0   1   1
1   1   1   0       1   0   0   1
1   1   1   1       1   0   0   0
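The conversion in Table 1 can be reproduced in software with the standard shift-and-XOR identity. The sketch below is a minimal illustration, assuming that A0 and G0 are the most significant bits (as the row order of Table 1 suggests); the helper names are hypothetical.

```python
def binary_to_gray(value: int) -> int:
    """Reflected binary (Gray) code of a non-negative integer."""
    return value ^ (value >> 1)


def as_bits(value: int, width: int = 4) -> str:
    """Fixed-width bit string, most significant bit first."""
    return format(value, f"0{width}b")


# Print the 16 rows of the 4-bit conversion table.
for n in range(16):
    print(as_bits(n), as_bits(binary_to_gray(n)))
```

Successive output codes differ in exactly one bit position, which is the defining property described above.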
6.1 Proposed Designs and Simulation
(A) Using Quantum Gate (CNOT)
A controlled NOT gate is a quantum gate which performs the NOT operation on the second qubit only when the first qubit is |1>; if the first qubit is |0>, the second qubit remains unchanged (Figs. 9, 10, 11 and Table 2). Here, |0> and |1> are quantum bits or qubits, just as ‘0’ and ‘1’ are classical bits. Using this concept, the representation of binary to gray code conversion is shown in Fig. 12. The 4 inputs with different waveforms are given to the circuit, where A1 and A2 share the input with two CNOT gates. We can observe from the output waveform (Fig. 13) that the results obtained are the same as the truth table (Table 3).

Fig. 9 CNOT gate representation
Fig. 10 Matrix representation of CNOT gate
Fig. 11 CNOT representation in QCA

Table 2 Truth table of CNOT gate
Input           Output
X      Y        X      X ⊕ Y
|0>    |0>      |0>    |0>
|0>    |1>      |0>    |1>
|1>    |0>      |1>    |1>
|1>    |1>      |1>    |0>

(B) Using Classical Gates (XOR)
XOR is a classical digital gate whose output is ‘1’ or ‘true’ only if exactly one of the inputs is ‘true’ or ‘1’ (Figs. 14 and 15). The CNOT and XOR gates are similar; the only difference is the number of outputs. A quantum gate uses the concept of reversibility, that is, reversible gates. Reversible gates are gates with as many outputs as inputs. The XOR gate can be modified into a reversible gate by adding extra cells for another output (Fig. 16). As in the previous case, the 4 different input waveforms are given to the circuit. The truth table is verified (Table 4), since the output waveforms obtained (Fig. 17) are as required.
(C) Using Irreversible Gates (AND, OR) Design 1
The basic gates such as ‘AND’ and ‘OR’ have similar representations in QCA. Many other gates have been formed using these. For example, the XOR gate can be rewritten as (Figs. 18, 19, 20, 21 and Table 5):

$$\mathrm{XOR} = A \oplus B = A\overline{B} + \overline{A}B = (A \;\mathrm{AND}\; \overline{B}) \;\mathrm{OR}\; (\overline{A} \;\mathrm{AND}\; B) \tag{2}$$

Fig. 12 Binary to gray CNOT representation

‘AND’ and ‘OR’ gates in QCA are drawn using majority gates. A majority gate is a logic gate which returns ‘1’ or ‘true’ only when the majority of its inputs are ‘1’. Here, a 3-input majority gate has been used with different polarization (Fig. 22). In this case, the input waveforms are given to the AND gates, and the results (Fig. 23) obtained from the OR gate match the truth table (Table 6). Thus, the truth table is verified.
(D) Using Irreversible Gates (AND, OR) Design 2
In this competitive world, everything needs to change after a certain period of time according to need. As time passes, the requirements of a particular application change. Similarly, in QCA, preference is given to designs that are more robust, more efficient and consume less space (Fig. 24).
Fig. 13 Simulation result for CNOT gate
Fig. 14 XOR gate representation
Fig. 15 XOR representation in QCA

Table 3 Truth table of XOR gate
A   B   Y
0   0   0
0   1   1
1   0   1
1   1   0
As in the previous case, the input waveforms are given to the AND gates, but the inputs do not follow a one-directional flow; rather, they follow a dual/multi-directional flow. This duality results in proper output waveforms (Fig. 25) with fewer disturbances. Since the output waveform is the same as in all previous cases, we can say that the truth table (Tables 5 and 6) is verified.
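The logic behind the designs in this section can be checked at gate level with a short Python sketch. It is an illustration only, not a QCA layout or simulation: it assumes the usual majority-gate construction in which one input is fixed to 0 for AND and to 1 for OR, takes A0 as the most significant bit as in Table 1, and uses hypothetical function names throughout.

```python
def majority(a, b, c):
    """3-input majority gate: returns 1 when at least two inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0


def and_gate(a, b):
    # Majority gate with the third input fixed to 0 behaves as AND.
    return majority(a, b, 0)


def or_gate(a, b):
    # Majority gate with the third input fixed to 1 behaves as OR.
    return majority(a, b, 1)


def not_gate(a):
    return 1 - a


def xor_from_and_or(a, b):
    # Eq. (2): (A AND B') OR (A' AND B)
    return or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))


def cnot(x, y):
    # Table 2: the control X is passed through, the target becomes X XOR Y.
    return x, x ^ y


def xor_from_cnot(a, b):
    return cnot(a, b)[1]


def binary_to_gray_bits(a0, a1, a2, a3, xor):
    # G0 = A0, and each further output is the XOR of two adjacent inputs.
    return (a0, xor(a0, a1), xor(a1, a2), xor(a2, a3))


def all_inputs():
    """All 16 input combinations, most significant bit first."""
    for n in range(16):
        yield tuple((n >> shift) & 1 for shift in (3, 2, 1, 0))


designs_agree = all(
    binary_to_gray_bits(*bits, xor_from_cnot)
    == binary_to_gray_bits(*bits, xor_from_and_or)
    for bits in all_inputs()
)
print(designs_agree)                                      # True
print(binary_to_gray_bits(0, 1, 0, 0, xor_from_and_or))   # (0, 1, 1, 0)
```

Because A1 and A2 each feed two XOR stages, the sketch also reflects why those two inputs are shared between gates in the CNOT-based circuit.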
7 Results
The simulation results show that the designs produce the same outputs, which are verified using the truth table. There are many ways to design a circuit, but what is important is its compatibility and whether it is user friendly. This project compares different circuits used to convert a 4-bit code from binary to gray. Here, we have compared 4 circuits for 4-bit binary conversion. As mentioned earlier, a circuit is said to be efficient when it has low power consumption, occupies less area and uses fewer cells. There are certain exceptions. Consider the XOR and CNOT circuits for binary to gray conversion. Though the number of cells and the area occupied are lower for the XOR gate, it is not more efficient than the CNOT gate. This is because the CNOT gate is an extension of the XOR gate: the basic structure of a CNOT gate has more cells than an XOR gate in QCA. In future, more emphasis will be placed on 8-bit codes and on higher packaging density, lower area and much lower power dissipation.

Fig. 16 Binary to gray XOR representation

Table 4 Truth table of AND gate
A   B   Y
0   0   0
0   1   0
1   0   0
1   1   1
Fig. 17 Simulation result for XOR gate
Fig. 18 AND gate representation
Fig. 19 AND gate representation in QCA
Fig. 20 OR gate representation
Fig. 21 OR gate representation in QCA

Table 5 Truth table of OR gate
A   B   Y
0   0   0
0   1   1
1   0   1
1   1   1

Fig. 22 Binary to gray representation for Design 1
8 Conclusion
This paper presents four potential circuits for Binary to Gray code conversion in the QCA architecture. It shows the development and simulation of a QCA Binary to Gray code converter circuit. QCA simulation designer is used to investigate the functioning of these converters. The ‘QCA-based design technique’ broadens the possibilities for digital circuit designs with minuscule dimensions. The proposed new Binary to Gray converter, compared to previous designs, has fewer cells, less area and lower latency. It may be employed in digital communication systems as well as in circuits that convert higher bit codes. The paper compares various designs in terms of number of cells, total area, latency and complexity; the preferred designs require fewer clock phases, fewer cells and shorter wire lengths. Further power analysis can be carried out using the QCA Pro tool, and the work can be extended to an 8-bit converter.

Fig. 23 Simulation result for Design 1
Table 6 Comparison analysis
Parameter                        Binary to gray      Binary to gray      Binary to gray              Binary to gray
                                 (quantum gate)      (classical gate 1)  (classical gate 2)          (classical gate 3)
Name of the gate                 CNOT                XOR                 AND, OR                     AND, OR
No. of cells                     44                  34                  356                         90
No. of I/p cells                 4                   4                   4                           4
No. of device cells              3                   3                   9                           7
No. of output cells              4                   4                   4                           4
No. of rotated cells             0                   0                   132                         0
Area (µm²)                       0.08                0.05                0.57                        0.08
Latency                          0.5                 0.5                 2                           0.75
Time                             1 s                 1 s                 1 s                         1 s
Clock zone                       Clk 0, Clk 1        Clk 0, Clk 1        Clk 0, Clk 1, Clk 2, Clk 3  Clk 0, Clk 1, Clk 2
Quantum cost (reversible gate)   3                   -                   -                           -
Fig. 24 Binary to gray representation for Design 2
Fig. 25 Simulation result for Design 2
References
1. Misra NK, Wairya S, Singh VK (2015, 2016) Optimized approach for reversible code converters using quantum dot cellular automata. In: Proceedings of the 4th international conference on frontiers in intelligent computing: theory and applications (FICTA). Springer, pp 367–378
2. Chakrabarty R, Mukherjee P, Acharjee R, Kumar R, Saha A, Kar N (2016) Reliability analysis of a noiseless code converter using quantum dot cellular automata. In: 2016 IEEE 7th annual information technology, electronics and mobile communication conference (IEMCON). IEEE, pp 1–8
3. Porod W (1997) Quantum-dot devices and quantum-dot cellular automata. J Franklin Inst 334(5–6):1147–1175
4. Seyedi S, Navimipour NJ (2018) Design and evaluation of a new structure for fault-tolerance full-adder based on quantum-dot cellular automata. Nano Commun Netw 16:1–9
5. Patidar M, Gupta N (2019) Efficient design and simulation of novel exclusive-OR Gate based on nanoelectronics using quantum-dot cellular automata. In: Proceeding of the second international conference on microelectronics, computing & communication systems (MCCS 2017). Springer, Berlin, pp 599–614
6. Kummamuru RK, Orlov AO, Ramasubramaniam R, Lent CS, Bernstein GH, Snider GL (2003) Operation of a quantum-dot cellular automata (QCA) shift register and analysis of errors. IEEE Trans Electron Dev 50(9):1906–1913
7. Chaves JF, Ribeiro MA, Silva LM, de Assis LM, Torres MS, Neto OPV (2018) Energy efficient QCA circuits design: simulating and analyzing partially reversible pipelines. J Comput Electron 17(1):479–489
8. Mohaghegh SM, Sabbaghi-Nadooshan R, Mohammadi M (2018) Designing ternary quantum-dot cellular automata logic circuits based upon an alternative model. Comput Electr Eng 71:43–59
9. Sherizadeh R, Navimipour NJ (2018) Designing a 2–4 decoder on nanoscale based on quantum-dot cellular automata for energy dissipation improving. Optik-Int J Light Electron Opt 158:477–489
10. Shu XB, Li LN, Ren MM, Mohammed BO (2021) A new binary to gray code converter based on quantum-dot cellular automata nanotechnology. Photon Netw Commun 41(1):102–108
11. Mehta U, Dhare V (2017) Quantum-dot cellular automata (QCA): a survey. arXiv preprint arXiv:1711.08153
12. Banerjee A, Mahato DK, Choudhuri S, Dey M, Chakraborty R (2018) Performance evaluation of controlled inverter using quantum dot cellular automata (QCA). Perform Eval 5(02)
13. Fam SR, Navimipour NJ (2019) Design of a loop-based random access memory based on the nanoscale quantum dot cellular automata. Photon Netw Commun 37(1):120–130
14. Gadim MR, Navimipour NJ (2018) A new three-level fault tolerance arithmetic and logic unit based on quantum dot cellular automata. Microsyst Technol 24:1–11
15. Mukherjee C, Panda S, Mukhopadhyay AK, Maji B (2018) QCA gray code converter circuits using LTEx methodology. Int J Theor Phys 57(7):2068–2092
16. Heikalabad SR, Kamrani H (2019) Design and implementation of circuit-switched network based on nanoscale quantum-dot cellular automata. Photon Netw Commun 38(3):356–377
17. Chakraborty R, Banerjee A, Mahato DK, Choudhuri S, Mandal N (2018) Design of binary to gray code converter for error correction in communication systems using layered quantum dot cellular automata. In: 2018 2nd international conference on electronics, materials engineering & nano-technology (IEMENTech). IEEE, pp 1–7
18. Afrooz S, Navimipour NJ (2017) Memory designing using quantum-dot cellular automata: systematic literature review, classification and current trends. J Circ Syst Comput 26:1730004
19. Moharrami E, Navimipour NJ (2018) Designing nanoscale counter using reversible gate based on quantum-dot cellular automata. Int J Theor Phys 57(4):1060–1081
20. Abedi D, Jaberipur G (2018) Decimal full adders specially designed for quantum-dot cellular automata. IEEE Trans Circ Syst II Express Br 65(1):106–110
21. Lent CS, Douglas Tougaw P (1993) Lines of interacting quantum-dot cells: a binary wire. J Appl Phys 74(10):6227–6233
22. Lent CS, Douglas Tougaw P, Porod W, Bernstein GH (1993) Quantum cellular automata. Nanotechnology 4(1):49–57
23. Taray AS, Singh SK, Hazra P (2021) Design and optimization of reversible binary to gray and gray to binary code converter with power dissipation analysis using QCA. Int J Eng Res Comput Sci Eng 8(6)
24. Ravindran RE, Santhosh C, Surya Teja S, Arun Sujash S, Vijay M, Umesh K (2020) Design of reversible and non-reversible binary to gray and gray to binary converter using quantum dot cellular automata. Int J 9(3)
25. Karkaj ET, Heikalabad SR (2017) Binary to gray and gray to binary converter in quantum-dot cellular automata. Optik 130:981–989
26. Majeed AH (2017) A novel design binary to Gray converter with QCA nanotechnology. Int J Adv Eng Res Dev 4(9)
27. Chakrabarty R, Roy S, Pathak T, Ghosh D, Mandal NK (2021) Design of 2: 4 and 3: 8 decoder circuit using QCA technology. Nanosystems: Physics, Chemistry, Mathematics 12(4):442–452
28. Tripathi D, Wairya S (2021) A cost efficient QCA code converters for nano communication applications. Int J Comput Digit Syst
29. Sridharan K, Pudi V (2015) QCA terminology. In: Design of arithmetic circuits in quantum dot cellular automata nanotechnology. Springer, Cham, pp 11–17
30. Liu W, Lu L, O’Neill M, Swartzlander EE (2014) A first step toward cost functions for quantum-dot cellular automata designs. IEEE Trans Nanotechnol 13(3):476–487
31. Guleria N (2017) Binary to gray code converter implementation using QCA. In: 2017 3rd international conference on advances in computing, communication & automation (ICACCA) (Fall). IEEE, pp 1–6
32. Chakrabarty R, Saha A, Adhikary S, Das S, Tarafder J, Das T, Dey S (2016) Comparative analysis of code converter using quantum dot cellular automata (QCA). In: 2016 IEEE 7th annual information technology, electronics and mobile communication conference (IEMCON). IEEE, pp 1–6
33. Beiki Z, Shahidinejad A (2014) An introduction to quantum cellular automata technology and its defects. Rev Theor Sci 2(4):334–342
34. Bhamra KS, Joshi G, Kumar N (2021) An efficient design of binary to gray code binary converter using QCA. In: IOP conference series: materials science and engineering, vol 1033, No. 1. IOP Publishing, p 012014
35. Purkayastha T, Chattopadhyay T, De D, Mahata A (2015) Realization of data flow in QCA tile structure circuit by potential energy calculation. Proc Mater Sci 10:353–360
36. Ahmed S, Naz SF, Sharma S, Ko SB (2021) Design of quantum-dot cellular automata-based communication system using modular N-bit binary to gray and gray to binary converters. Int J Commun Syst 34(4):e4702
37. Laajimi R, Niu M (2018) Nanoarchitecture of quantum-dot cellular automata (QCA) using small area for digital circuits. In: Advanced electronics circuits–principles, architectures and applications on emerging technologies, pp 67–84
QoS-Based Classical Trust Management System for the Evaluation of the Trustworthiness of a Cloud Resource
P. Kumar, S. Vinodh Kumar, and L. Priya

Abstract Cloud computing technology has seen remarkable development in various domains during the last two decades. Among many factors, privacy, security, and trust are the issues of most significant concern. Trust plays a crucial part in cloud computing in offering reliable service to cloud users. It is the main reason for the popularity of cloud services among users. Establishing trust between the cloud service provider and the cloud consumer is an essential prerequisite. Trust management is commonly used in Internet-based services, e-business, and social networks. This paper mainly focuses on the different approaches to trust evaluation models for service selection in cloud computing. The Classical Trust Management System is designed for selecting the most reliable resources based on the QoS (Quality of Service) parameters, namely accessibility, achievement rate, execution time, turnaround productivity, and universal response. The universal response is the aggregate response from the clients, representatives, friends, and social network. The proposed work outperforms other existing models, such as the FIFO and QoS models.

Keywords Cloud computing · Trust management · QoS parameters · Accessibility · Achievement rate · Universal response
P. Kumar (B) · S. Vinodh Kumar · L. Priya
Rajalakshmi Engineering College, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_22

1 Introduction
Cloud computing is a popular computing paradigm across several computing environments. The essential characteristics of cloud computing, which distinguish it from other forms of conventional computing, are as follows: On-Demand Self-Service, Broad Network Access, Resource Pooling, Rapid Elasticity, Metered Service, Availability, and Economical. Costs are incurred only as resources are used. Everything is provided as a service in cloud computing, and service models are classified into three types: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). There are four types of deployment models, namely public cloud, private cloud, community cloud, and hybrid cloud. There are five actors involved in a cloud computing environment: the cloud service provider, cloud consumer, auditor, carrier, and cloud broker. The cloud consumer is the principal stakeholder in a cloud environment. A consumer is an individual or organization that maintains a business relationship with the cloud service provider. Consumers browse the service catalog of a cloud provider, request the appropriate service, and set up service contracts with the service provider. The cloud consumer is billed for the service provided and pays accordingly. Cloud consumers need SLAs (Service-Level Agreements) to specify the technical requirements to be fulfilled by a service provider. A cloud consumer can freely choose a cloud provider with better pricing and more favorable terms. Cloud consumers can be categorized into SaaS consumers, PaaS consumers, and IaaS consumers. A “cloud service provider” is an individual or an organization responsible for making a service available to interested parties. It acquires and manages the cloud infrastructure required for service delivery. Cloud service providers are categorized into SaaS, PaaS, and IaaS service providers. The responsibilities of the cloud service providers are service deployment, service orchestration, security, privacy, cloud auditor, cloud broker, and cloud carrier.
1.1 Trust Management in Cloud Environment
When a consumer needs a service, they first send a request to the service provider. The request may be handled by several service providers. In this setting, trust is the most important element between the provider and the consumer. The certification level of the cloud service provider alone does not satisfy consumers, since they also look at other QoS parameters. Cloud computing has made tremendous progress in various fields. Although it has become as essential to our lives as water and electricity, privacy, security, and trust remain major concerns. Trust is essential in a cloud environment in order to provide reliable service to cloud consumers. Cloud consumers must place their trust in cloud providers and in the parties responsible for monitoring, assessing, or validating cloud attributes; to address these concerns, trust should be adopted in a cloud environment. Trust management is used to overcome the following issues: centralized control of trust, inflexibility in supporting complex trust relationships in large networks, and the variety of policy languages used for setting authorization, governing, and enforcing security policies. Trust management is frequently used in web-based services, e-commerce, and social networks. Trust is broadly classified into two kinds, namely trust based on the trustor’s expectation and trust based on experience. Trust based on the trustor’s expectation is further divided into trust in performance and trust in belief.
Trust in performance is the trust in what the trustee performs. Trust in belief is the trust in what the trustee believes. Trust in performance can be denoted by trust perform (t, e, p, c), which states that the trustor t trusts the trustee e regarding e’s performance p in the context c: if p is made by e in context c, t trusts p in that context. Trust in belief can be denoted by trust b (t, e, b, c), which states that the trustor t trusts the trustee e regarding e’s belief b in the context c: if e believes b in context c, t also believes b in that context [1]. Various attributes, namely security, availability, and compliance, have been considered for determining the quality of cloud providers [2]. Trust is a collection of various properties, namely reliability, honesty, trustworthiness, security, dependability, availability, QoS, competence, practicality, and return on investment (ROI). The trust model is based on the following properties: asymmetry, reflexivity, context dependence, adaptability, partial transitivity, subjectivity, uncertainty, space-based, and time-based [3]. There are different approaches to establishing trust between the consumer and the cloud service provider. They are classified as service-level agreement (SLA), audits, measuring and ratings, and self-assessment questionnaires. The lack of a standard approach to help consumers select reliable service providers is noticeable. To overcome these issues, trust and reputation models have been used [4]. There are eight parameters of trust, namely transparency, SLA, policy compliance, security and privacy, versatility, performance, authentication, access control, and user support [5].
The trust management system should provide a mechanism to aggregate multiple attributes, regardless of the different assessment methods used to evaluate subjective trust parameters, such as recommendations from other consumers, or objective trust attributes, such as expert ratings or real-time measurements of opposition and response time [2]. The process of evaluating system trust is called trust modeling. There are several trust mechanisms, namely SLA verification-based trust, reputation-based trust, Trust as a Service (TaaS), and cloud transparency trust. In cloud transparency trust, the provider gives a self-assessment in either a Consensus Assessments Initiative Questionnaire (CAIQ) or a cloud control framework. The limitation of this model is that a dishonest provider can alter the information. The TaaS model introduces third-party professionals. A cloud trust authority provides a single point for monitoring the security of cloud services from various providers. A limitation exists in establishing the trust relationship between the users and trust brokers [1]. Trust management has also been described from two perspectives, those of the service provider and the service requester. From the service provider’s perspective, the service provider assesses the trustworthiness of the service requester. In SRP, the service requester assesses the trustworthiness of the provider. An evaluation metric for trust models depends on security, data control parameters, and QoS attributes. The following criteria have been used: data integrity, QoS attributes, data control and ownership, process execution control, detection of untrusted entities, dynamic trust update and logging, and
model complexity [6]. A detailed survey has been conducted on the different categories of trust models proposed by many authors. The trust models can be classified into three types: SLA-based, recommendation-based, and reputation-based trust models. SLA-based trust models are built on contracts and agreements between the cloud service provider and the cloud service consumer. The most frequently used contracts are SLAs and service policy documents. They contain several security documents and QoS parameters to establish trust between the two parties. The SLA verification-based trust model is built on policies. In this trust model, cloud consumers verify and re-evaluate the trust value after establishing an initial trust [1]. The SLA-based trust model is used for finding a reliable service provider for complex and confidential business applications. The SLA framework and the trust model have been integrated to give a new method for selecting a trustworthy service provider [7]. SLA-based and trust techniques have been used to offer a reliable scheme for picking the best cloud service provider among several providers to meet functional and nonfunctional requirements. The Trust Mining Model (TMM) has been used for identifying trustworthy cloud services. This model supports both the cloud provider and the service consumer, and the consumer can decide to continue or discontinue the services with the service provider. It uses rough sets and Bayesian inference to compute the overall trust value. The TMM model contains three modules: trust manager, SLA manager, and cloud performance monitor. The SLA manager is responsible for negotiating the agreement between the service provider and the cloud consumer. It communicates with the trust manager and updates the trust rate in the agreement before the whole agreement procedure is completed.
Existing trust models are used in distributed and grid computing to collect the overall feedback on the service given by the consumers. Some malicious users may deliberately give negative feedback on a service, and this can lead to a wrong assessment of the service among consumers. Existing mechanisms only use feedback measures to compute the reliability of a service, rather than checking the feedback given by the user for reliability, unbiasedness, and consistency. A trust management framework based on reputation is used to deliver trust as a service. In this framework, a new protocol is used to establish the credibility of trust feedback and preserve the user’s privacy. Credibility and adaptive models are used to compute the credibility of feedback, which protects the services from malicious users and identifies the trustworthiness of cloud services. Finally, a model has been created to achieve the availability of the trust management service [8]. A framework for building trust has been created in a service-oriented environment. The following reputation assessment metrics are considered: rater credibility, majority rating, past rating history, personal experience for credibility evaluation, personal experience for reputation assessment, and temporal sensitivity [9]. Certain limitations are found in the trust models. In the SLA-based model, security and privacy are not considered, and consumers cannot perform the evaluation on their own. They need the help of a third party, either a broker or a cloud trust authority. In
a recommendation-based trust model, a standardized process is lacking; in a reputation-based trust model, reputation is more useful for selecting services in the early stage but not at a later stage. The complexity is high because a large number of consumers are required to rate services. Reputation-based and recommendation-based trust models can be combined to overcome these issues and improve efficiency. Consumers are worried about their data and look for a high confidence level, even when a service or provider has a high trust level. The lack of an efficient and reliable trust evaluation system is still the most pressing issue.
2 Related Work
Most researchers have identified various trust models for determining the reliability of a service. Still, there are some key questions to consider when assessing the trustworthiness of services. Wang et al. [10] examined the factors influencing trust management in the web service environment. The authors proposed a trust model to examine the subjective uncertainty of trust factors. A time-related cloud generation method is used to achieve the dynamism of trust. The trust level of the service provider is computed using the trust model and an algorithm. Manuel et al. [11] proposed a trust model for the evaluation of cloud resources based on capability, identity, and behavior-based trust values. The authors used QoS parameters, such as network bandwidth, latency, processor speed, and RAM speed, for assessing the resources. Abawajy [12] set up a distributed framework that provides trust between cloud consumers and cloud service providers. The model detects corrupted feedback ratings and gives a specific and unbiased computation of service reputations. Habib et al. [2] designed a multi-faceted trust model to help cloud consumers choose a reliable service provider by considering parameters such as security, performance, and compliance. Goyal et al. [13] introduced a trust model that evaluates trustworthiness based on the QoS parameters given by the datacenter. The authors considered initialization time, processing speed, price, fault rate, and bandwidth to compute the trust value, and scheduling was done by the datacenter based on the trust value. Trusted resources were allocated to the cloud consumer with the highest trust value, while untrusted resources were allocated to an untrustworthy cloud consumer. Chard et al. [14] and Chard et al.
[15] pointed out the trust establishment between friends in a social network that enables them to share resources within the social network. The authors combined trust relationships with appropriate trust mechanisms that offer considerable resource sharing. Garg et al. [16] developed a framework for measuring the quality of service and ranking services based on that quality. This framework fosters competition among cloud service providers to meet SLAs and improve the quality of their services. Xin et al. [17] proposed an algorithm for deriving dynamic trust by using trust chains and communication among the users. The social network’s trust value is dynamically updated by the interaction between the user and his friends. Marudhadevi et al. [18] designed a trust mining model to find trustworthy cloud services. This model is beneficial for both the cloud service provider and the consumer because they obtain a reliable service. Bayesian inference and rough set concepts are used to compute the resulting trust value for services. Naseer et al. [19] developed a trust model by considering attributes such as downtime, fault tolerance capability, uptime, latency, and customer support. Based on the cloud consumers’ requirements, they were given a choice of the most reliable service providers with the highest trust value. Sidhu and Singh [20] formulated a trust model that determines the trustworthiness of a cloud provider based on compliance with the QoS parameters guaranteed in the SLA. Rizvi et al. [21] implemented a centralized objective trust model that ranks cloud service providers based on trust value. From the third party’s assessment and the cloud consumer’s feedback, the final trust value of all participating cloud service providers was computed. Manuel [22] introduced a novel trust model based on the past credentials and present capabilities of a cloud service provider. The author estimated the trust value from the parameters reliability, turnaround efficiency, availability, and data integrity, and also offered a trust management framework to implement the trust model. Gholami and Arani [23] implemented a trust model that relies on Quality of Service (QoS) metrics, namely reliability, availability, turnaround efficiency, and run speed value. Wang et al.
[24] have developed a crowdsourcing model in which a Social Cloud bridges the users and the sensing objects and also acts as a cloud service provider offering compute and storage services. It obtains the sensing objects' geographic positions from a cloud provider and returns the response. A reputation-based algorithm is combined with crowdsourcing to find the best option and provide cost assurance by calculating the trustworthiness of crowdsourcing participants. Varalakshmi and Judgi [25] have proposed a framework for identifying trustworthy service providers based on feedback from multiple sources. The authors have shown that the proposed model is effective in a collaborative cloud environment. Tang et al. [26] have designed a framework for trustworthy service selection in a cloud environment. They proposed objective and subjective trust measures based on QoS monitoring and cloud consumer feedback, and combined the two measures. They showed that the performance of the model improves when subjective and objective measures are integrated. Using time-series data, Bhalaji [27] developed a deep learning strategy to forecast workload and resource allocation. A logarithmic operation is used to reduce the standard deviation, and filters are then used to remove extreme points and noise interference. This technique accurately forecasts workload and resource sequences as time series. The collected data is then standardized using a Min–Max scaler, and the network's quality is retained by using a network model.
QoS-Based Classical Trust Management System for the Evaluation …
QoS parameters have been considered for computing the trustworthiness of a resource. Existing models have used either reputation or QoS parameters to evaluate the trustworthiness of a cloud resource. A QoS-based Classical Trust Management System is proposed for evaluating the trustworthiness of a cloud resource. The trust value of a resource is computed from the QoS parameters, namely availability, achievement rate, turnaround efficiency, rate of production, and the universal response [28]. A chi-square test is carried out to identify biased or unbiased consumer responses. The proposed work is compared with other trust models based on the QoS parameters.
3 Proposed Work
Trust plays a vital part in managing and ensuring confidence between the cloud user and the provider. Trust is what gives a reputation to the services used in a cloud environment. Cloud consumers need to store their data with, and access cloud services from, a trustworthy service provider. Likewise, a cloud provider should provide services to a reliable customer base [11]. A Service-Level Agreement (SLA) also plays an essential part in trust management. It is a contract between consumers and cloud service providers. An SLA should define the roles and responsibilities of both the cloud consumer and the cloud provider, the charging for the services, and the quality and performance requirements. It should also provide the means for resolving any service issues within a specified time frame. The cloud consumer needs to store data securely, yet many security challenges remain, such as the lack of trust in cloud providers and storing the data in an encrypted format. For cloud computing to succeed, cloud consumers should trust the provider, and the service provider should offer services based on the SLA and guarantee data security. When an organization needs to store its data in the cloud, it should assess the trustworthiness of the service provider. Various approaches and frameworks have proposed trust models for computing the trust value of resources. Trust models are used to assess the trustworthiness of SaaS, PaaS, IaaS, and XaaS resources offered by providers in cloud and grid environments [29]. In trust management, there are three trust models, namely the reputation-based model, the Service-Level Agreement (SLA)-based model, and the recommendation-based model. A trust model comprises three stages, namely trust foundation, trust update, and trust summon.
Trust is classified into direct trust and recommendation trust. Direct trust is characterized by the user's own experience, and recommendation trust is established between two parties through a third party's recommendation. Trust mechanisms based on Quality of Service (QoS) are adopted by popular service providers such as Google, Amazon, and Azure for their business. The trust characteristics of cloud service providers are analyzed on the basis of agreements, recommendations, reputations, and predictions. Finally, an analytical framework is proposed with
a few criteria for assessing trust management systems. Trust estimation for a resource depends on the past credentials of a cloud resource and the present capability of the resource [30]. The QoS factors influencing the trust value of a resource are: past credentials, which cover the past status and service history of a resource, namely availability, turnaround time, and achievement rate; present capabilities, which cover computing power, namely average rate of production, RAM size, and processor speed; and networking strength, which is characterized by bandwidth and latency.
3.1 The Classical Trust Management System
The Classical Trust Management System is shown in Fig. 1. It comprises several modules, namely System Admin, Trust Admin, SLA Supervisor, and Trust File, which are interconnected to offer reliable service to consumers. The functionality of each module is described below.
[Figure 1: block diagram. Labels: Cloud Consumers, System Admin, Trust Admin, Trust Assessment, Trust File, SLA Supervisor, QoS Parameters, Universal Response, Cloud Source Discovery, Cloud Services (SaaS, PaaS, IaaS), and Cloud Service Providers 1 and 2, each with Resource Provisioning, Schedule Admin, Resource Metering, VM1 to VMn, and Physical Resources.]
Fig. 1 Classical trust management system
3.1.1 Cloud Consumer
The cloud consumer requests trustworthy services from the cloud service provider. Through the interface, cloud consumers can communicate with the System Admin and gain access to cloud service information.
3.1.2 Cloud Service Provider
The CSP exposes the interfaces through which cloud consumers access the different kinds of services, namely SaaS, PaaS, IaaS, and XaaS. In the resource abstraction engine, the CSP uses software components, namely hypervisors and virtual machines, to manage and access the physical resources through software interactions. Physical resources include all hardware resources, namely memory, the central processing unit, and hard disk. The resource abstraction engine presents virtual cloud resources on top of the physical resources and supports the service engine layer, in which the cloud service interfaces are exposed to consumers, since they cannot access the physical resources directly.
3.1.3 Cloud Service Discovery
Cloud service discovery provides a list of the cloud resources offered by the different cloud service providers. It is a centralized repository that stores a list of services and the providers that offer them. It resembles the web service component Universal Description, Discovery, and Integration (UDDI).
3.1.4 System Admin
The System Admin plays a crucial part in the trust management system, as it is connected with the other parts of the system. The System Admin communicates with resource provisioning, which manages the entire VM creation, configuration, and deployment in an infrastructure.
3.1.5 SLA Supervisor
The SLA Supervisor is responsible for negotiating with the consumer regarding the QoS requirements. It acts as a mediator between the Trust Admin and the System Admin. It obtains the trust value of the cloud resource from the Trust Admin and supports the System Admin in choosing the cloud resource based on the trust value.
3.1.6 Trust Assessment
Trust Assessment communicates with the Trust File to manage and maintain the trustworthiness of a cloud resource computed by the trust assessment.
3.1.7 Trust File
The Trust File is a repository for storing the trust values of cloud resources. Trust assessment computes the trust value of a resource based on the QoS parameters and the universal response. The workflow of the proposed work is given below.
1. Cloud resources are identified from the service registry based on QoS parameters.
2. The source sends the QoS parameters and resources to the System Admin.
3. Resource availability is confirmed in the service registry.
4. The System Admin sends the QoS parameters and resources to the SLA Supervisor.
5. The Trust Admin reports the trust values to the SLA Supervisor.
6. The resources are sorted according to their trust value.
7. The SLA Supervisor negotiates with the cloud consumer through the System Admin.
8. An SLA is prepared according to the consumer's preference and sent to the System Admin.
9. The Schedule Admin schedules the resource.
10. The provisioning service provides the requested resource and delivers the information to the cloud consumer.
11. The System Admin sends the trust parameters As, Ar, Tp, and Rp to the SLA Supervisor.
12. The SLA Supervisor sends the trust attributes to the Trust Admin.
13. The Trust Admin updates the trust value in the Trust File.
14. Cloud consumers make use of the data processing service.
15. The consumer tests and reviews the service and gives a response about it.
16. The System Admin updates the response value with the Trust Admin, which sends it to the trust assessment.
17. The response is identified as biased or unbiased.
18. The trust value is updated and sent back to the Trust Admin so that it is updated in the Trust File.
3.2 Trust Assessment
Cloud resources are chosen based on the cloud service providers' trust value. In the proposed classical trust management system, the trustworthiness of a resource is determined by the QoS parameters. The trust value is computed depending
on five parameters, namely accessibility, achievement rate, turnaround productivity, rate of production, and universal response. The universal response is gathered from consumers and cloud brokers. The trust value (TV) of a resource is given in Eq. 3.1.

$$TV = l_1 A_s + l_2 A_r + l_3 T_p + l_4 R_p + l_5 U_r \tag{3.1}$$

$$\sum_{i=1}^{5} l_i = 2 \tag{3.2}$$
Here, As, Ar, Tp, Rp, and Ur represent accessibility, achievement rate, turnaround productivity, rate of production, and universal response, respectively. The weights for As, Ar, Tp, Rp, and Ur are l1, l2, l3, l4, and l5. The weights should sum to 2, as shown in Eq. 3.2. The values of l1, l2, l3, l4, and l5 are 0.4, 0.2, 0.2, 0.2, and 1.0. Since high priority is given to the response, it takes the highest weight. The QoS parameters accessibility, achievement rate, turnaround productivity, and rate of production are computed using mathematical formulas. The chi-square test is carried out to check whether the response received from the cloud consumer is biased or unbiased. The trustworthiness of the CSP is assessed using the universal response and the QoS parameters. The methodology is as follows.
1. Compute the QoS parameters, namely achievement rate, accessibility, turnaround productivity, and rate of production.
2. Collect the consumer's response after each service is completed.
3. Identify the genuine consumers' responses using the chi-square statistical method.
4. Compute the average unbiased consumer response.
5. Compute the third party's response from the responses of friends and social networks.
6. Compute the average cloud broker's response.
7. Finally, compute the universal response.
8. Using the universal response and the QoS parameters, assess the trust value of the CSP.
The following section discusses the universal response and QoS parameters used in the trust assessment of the CSP.
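As an illustration of the weighted sum in Eq. 3.1, the following minimal sketch uses the weights stated in the text (0.4, 0.2, 0.2, 0.2, and 1.0). The function name `trust_value`, the dictionary layout, and the example scores are assumptions made for illustration only; they are not taken from the paper.

```python
# Weights l1..l5 for As, Ar, Tp, Rp, Ur as stated in the text (they sum to 2, Eq. 3.2).
WEIGHTS = {"As": 0.4, "Ar": 0.2, "Tp": 0.2, "Rp": 0.2, "Ur": 1.0}


def trust_value(params, weights=WEIGHTS):
    """Weighted sum of Eq. 3.1; `params` maps As, Ar, Tp, Rp, Ur to scores."""
    if abs(sum(weights.values()) - 2.0) > 1e-9:
        raise ValueError("weights must sum to 2 (Eq. 3.2)")
    missing = set(weights) - set(params)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return sum(weights[name] * params[name] for name in weights)


# Hypothetical scores, chosen only to exercise the formula.
example = {"As": 0.9, "Ar": 0.8, "Tp": 0.7, "Rp": 0.6, "Ur": 0.5}
tv = trust_value(example)
```

Because the universal response carries half of the total weight, a change in Ur moves the trust value more than an equal change in any single QoS parameter.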
3.2.1 Universal Response
The universal response is assessed by combining the responses of the consumers, third parties, and cloud brokers. The responses from friends and from the social network are used to assess the third party's response. The universal response is computed using Eq. 3.3.
$$U_r = \sum_{i=1}^{3} l_i \, ur_i \tag{3.3}$$
Here, ur1, ur2, and ur3 are the cumulative unbiased consumers' response, the third party's response, and the cloud broker's response, respectively. The sum of the weights (Σ li) should be 2. The weights are assigned based on the majority of responses.
Cumulative Unbiased Consumers Response
The cloud consumers' response on a resource is defined as the average unbiased response from all consumers. Once the unbiased part of the consumers' response has been identified, the average of the unbiased responses for a resource is computed as given in Eq. 3.4 and stored in the Trust File. Here, Ri is the unbiased response of consumer i, and n is the total number of consumers.

$$ur_1 = \frac{\sum_{i=1}^{n} R_i}{n} \tag{3.4}$$
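Equations 3.3 and 3.4 can be sketched together as follows. This is a minimal sketch, not the authors' implementation: the function names, the example ratings, and in particular the weights (1.0, 0.5, 0.5) are hypothetical, since the text only requires that the weights sum to 2.

```python
def cumulative_unbiased_response(unbiased_ratings):
    """Eq. 3.4: arithmetic mean of the unbiased consumer responses R_i."""
    if not unbiased_ratings:
        raise ValueError("at least one unbiased response is required")
    return sum(unbiased_ratings) / len(unbiased_ratings)


def universal_response(ur, weights):
    """Eq. 3.3: weighted sum of ur1 (consumers), ur2 (third party), ur3 (broker)."""
    if len(ur) != 3 or len(weights) != 3:
        raise ValueError("expected exactly three responses and three weights")
    if abs(sum(weights) - 2.0) > 1e-9:
        raise ValueError("weights must sum to 2")
    return sum(l * u for l, u in zip(weights, ur))


# Hypothetical ratings and weights, for illustration only.
ur1 = cumulative_unbiased_response([4, 5, 3])        # consumers
ur2, ur3 = 3.0, 5.0                                  # third party, cloud broker
Ur = universal_response([ur1, ur2, ur3], weights=[1.0, 0.5, 0.5])
```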
Genuine Consumers Response
In this paper, the chi-square test is used to identify biased or unbiased responses. The consumer's response is taken as the observed response. Expected response ratings are taken from the Epinion dataset, which contains ratings given by users to items. The steps for identifying a biased or unbiased response are given in Algorithm 1.
Algorithm 1: Identifying Biased or Unbiased Response
Input: Consumer Response
Output: Biased or Unbiased Response
i. Cloud consumer response ratings: UR = {U ur0, U ur1, U ur2, …, U urn}
ii. Cloud expected response ratings: ER = {E ur0, E ur1, E ur2, …, E urn}
iii. Calculate the chi-square statistic from the UR and ER collections using Eq. 3.5:

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(U_{ur_i} - E_{ur_i}\right)^2}{E_{ur_i}} \tag{3.5}$$

iv. If the chi-square value
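[Table fragment: the header row and most of the first row are not present; only the last value of the first row, 0.9, survives. The rows are labelled "Truth".]
Normal: 217, 2764, 24, 0.92, 0.88, 0.9
Common Pneumonia: 399, 157, 1459, 0.72, 0.88, 0.8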
just 746, 1065, 757, and 746 images, which poses a generalization problem, in contrast to our approach of using the exhaustive dataset. In approach [7], the large network ResNext+ is used along with an LSTM, which is computationally expensive. The paper is structured as follows. Section 2 describes the dataset and the overall approach. Section 3 discusses the findings in depth. Section 4 contains the conclusion as well as some suggestions for further work.
2 Proposed Approach
2.1 Dataset
The dataset [10] contains 194,922 chest CT scans from 3745 people, of which 60,083 belong to the Normal class, 40,291 are Common Pneumonia cases, and 94,548
A. Razim and M. A. U. Kamil
Fig. 1 First line, second line, and third line show Normal, Common Pneumonia, and COVID-19 chest CT-scan images, respectively
are COVID cases (Fig. 1). The dataset is split into three sections: train, validation, and test. To avoid overfitting, 15% of the dataset is held out as a validation dataset. During training, the validation dataset is used to fine-tune hyper-parameters. The training data represents 80%, and the rest constitutes the test data. The training dataset holds the most data and is randomly distributed, which enables the network to generalize better. After training the model on the train and validation datasets, the confusion matrix and other criteria are used to evaluate the model on the test dataset, as presented in Table 4. The confusion matrix is a technique for assessing the performance of different deep learning models. Table 2 shows the overall distribution of the dataset. Further, data augmentation is employed to improve the model's accuracy. During data augmentation, the following transformations are used:
• Horizontal flip
• Vertical flip
• Random rotation with a range of 360°
• Shifting: the images are shifted left, right, upward, and downward
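The flip and shift transformations listed above can be sketched on a tiny image stored as a list of pixel rows. This is a dependency-free illustration, not the authors' augmentation pipeline; the function names and the zero fill value are assumptions, and random rotation is omitted because it needs interpolation that an image library would normally provide.

```python
def horizontal_flip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in img[::-1]]


def shift(img, dx=0, dy=0, fill=0):
    """Shift right by dx and down by dy (negative values shift left/up).

    Pixels moved outside the frame are dropped; vacated pixels get `fill`.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


# Toy 2 x 3 "image" used only to show the effect of each transformation.
img = [[1, 2, 3],
       [4, 5, 6]]
flipped_h = horizontal_flip(img)
flipped_v = vertical_flip(img)
shifted_right = shift(img, dx=1)
shifted_up = shift(img, dy=-1)
```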
Deep Learning-Based Efficient Detection of COVID-19
2.2 Deep Learning Model Architecture
The deep learning architecture used in this study is EfficientNet B4, which belongs to the EfficientNet family of architectures. The family is known for reaching a state-of-the-art 84.4% top-1 and 97.1% top-5 accuracy on ImageNet while being 8.4× smaller and 6.1× faster than the best previous convolutional network [11]. When developing CNNs, as with most neural networks, the most difficult part is model scaling, that is, determining how to enhance accuracy by increasing the model size. This is a time-consuming process that requires a manual hit-and-miss approach until a sufficiently accurate model that meets the resource constraints is found. The procedure is resource and time intensive, and it frequently produces models with suboptimal accuracy and efficiency. The EfficientNet family uses compound scaling, which balances the width, depth, and resolution of the network using a constant user-defined coefficient φ, as shown in (1). The network's fundamental building block is MBConv [12], to which squeeze-and-excitation optimization has been added. MBConv is related to MobileNet v2's inverted residual blocks. These provide a direct link between the beginning and end of a convolutional block. To increase the depth of the feature maps, the input activation maps are first expanded using 1 × 1 convolutions. This is followed by 3 × 3 depth-wise and point-wise convolutions, which reduce the number of channels in the final feature map. The narrow layers are connected by shortcut connections, whereas the wider layers lie between the skip connections. This structure helps reduce the overall number of operations as well as the size of the model. To adapt the EfficientNet B4 architecture to the problem at hand, it is extended by 6 additional layers (Table 3). Among these, 2 dropout layers are used to prevent the neural network from overfitting [13].
First, the model is initialized with noisy-student weights; noisy-student training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% higher than the previous state-of-the-art model [14]. After that, all layers are frozen except the last 6, which account for only 237,955 trainable parameters. Training is done using the Adam optimizer with a batch size of 16 and an image size of 380 × 380 (Fig. 2).

$$\begin{aligned} \text{depth: } d &= \alpha^{\phi} \\ \text{width: } w &= \beta^{\phi} \\ \text{resolution: } r &= \gamma^{\phi} \\ \text{such that } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \end{aligned} \tag{1}$$
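To make the compound scaling rule in (1) concrete, the sketch below evaluates the three multipliers for a given φ. The base coefficients α = 1.2, β = 1.1, and γ = 1.15 are the grid-searched values reported in the EfficientNet paper [11]; they are not stated in this chapter, and the function names are chosen here for illustration.

```python
# Base coefficients reported in the EfficientNet paper [11] (not given in this chapter).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers of Eq. (1) for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_factor(phi):
    """Approximate growth in FLOPs: (alpha * beta^2 * gamma^2) ** phi, roughly 2 ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


d, w, r = compound_scale(1)
base_constraint = ALPHA * BETA ** 2 * GAMMA ** 2   # should be close to 2
```

With these coefficients, each unit increase in φ roughly doubles the computational cost, which is the purpose of the constraint α · β² · γ² ≈ 2.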
2.3 Learning Rate
One of the most crucial hyper-parameters is the learning rate, since it defines how much the model changes in response to the estimated error each time the model weights are updated. The learning rate can be constant or varied in a variety of patterns
[Figure 2: schematic. Labels: Chest-CTs, Pre-processing Stage, Augmented Data, Train, Validation, Transfer-learning, Noisy student weights, EfficientNet-B4 (Conv 3*3; MBConv 3*3; MBConv 5*5; Top Layers), Pred: 1000 classes, Table III (Global Average Pool, Dropout), Pred: 3 classes, Predictions (COVID-19, Common Pneumonia, Normal), Grad-CAM, Visualization.]
Fig. 2 Schematic of proposed architecture
during training by using callbacks such as cyclic, constant, or triangular2 schedules. Reduce learning rate on plateau lowers the learning rate when the monitored quantity stops improving. In this study, reduce learning rate on plateau is used with validation accuracy as the monitored quantity and a patience of 2 (Fig. 3).
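The plateau rule described above can be sketched without any framework. The class below is a simplified stand-in, not the Keras callback itself: it monitors validation accuracy with a patience of 2 as in the text, while the initial learning rate, the reduction factor of 0.5, and the minimum learning rate are assumptions, since the chapter does not state them.

```python
class ReduceOnPlateau:
    """Lower the learning rate when validation accuracy has not improved
    for `patience` consecutive epochs (simplified; no cooldown or min_delta)."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor        # assumed value, not from the chapter
        self.patience = patience    # patience of 2 as stated in the text
        self.min_lr = min_lr        # assumed value, not from the chapter
        self.best = float("-inf")
        self.wait = 0

    def step(self, val_accuracy):
        """Call once per epoch; returns the learning rate for the next epoch."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Hypothetical validation accuracies over five epochs.
scheduler = ReduceOnPlateau(lr=1e-3)
history = [0.80, 0.85, 0.85, 0.84, 0.86]
learning_rates = [scheduler.step(acc) for acc in history]
```

In this toy run, epochs three and four bring no improvement over 0.85, so the rate is halved once after the fourth epoch and then kept.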
2.4 Loss Function
The loss function plays a crucial role in that it must correctly reduce the error between the output of the model and the given target value to a single number
Fig. 3 Reduce learning rate on plateau during training
in such a manner that a decrease in that number indicates a better model. Since this is a multiclass classification problem, categorical cross-entropy is used.
$$\text{Loss} = -\sum_{n=0}^{\text{Output Size}} y_n \cdot \log \hat{y}_n \tag{2}$$
where in (2) the n-th scalar value in the model output is ŷn, the corresponding target value is yn, and the output size denotes the number of classes, which is 3 in our study.
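As a small worked example of Eq. (2), the sketch below computes the categorical cross-entropy for one sample with 3 classes. The function name, the clipping constant `eps`, and the example probabilities are assumptions for illustration; the clipping only guards against taking the logarithm of zero.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (2): negative sum of y_n * log(y_hat_n) over the classes of one sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("target and prediction must have the same length")
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical one-hot target (second class) and predicted probabilities.
loss_good = categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
loss_bad = categorical_cross_entropy([0, 1, 0], [0.6, 0.2, 0.2])
```

With a one-hot target, only the predicted probability of the true class contributes, so a more confident correct prediction gives a smaller loss.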
2.5 Evaluation Metrics
For this study, accuracy, recall, precision, and F1-score were used as evaluation metrics (Table 4). Precision is the percentage of positive identifications that were correct. Recall is the percentage of actual positives that were correctly detected. The F1-score is the harmonic mean of precision and recall. Accuracy is the percentage of correct predictions made by the model.
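$$\begin{aligned} \text{Accuracy} &= \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Predictions}} \\ \text{Recall} &= \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \\ \text{Precision} &= \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \\ \text{F1-Score} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned} \tag{3}$$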
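The definitions in (3) can be applied per class to a multiclass confusion matrix, as sketched below. The function names and the 3 × 3 matrix are hypothetical and are not the values of Table 4; rows are assumed to be true classes and columns predicted classes, and overall accuracy is taken as the fraction of samples on the diagonal.

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k; cm[i][j] = true class i predicted as j."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical confusion matrix for three classes (not the paper's results).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
p0, r0, f0 = per_class_metrics(cm, 0)
acc = accuracy(cm)
```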
3 Results and Discussion
The model was trained on a Tesla T4 GPU provided by Google Colab. The Keras/TensorFlow framework was used to implement the computational experiments. The average accuracy achieved by the model is 90%. The model's F1-scores for the COVID-19, Normal, and Common Pneumonia classes were 0.9, 0.9, and 0.8, respectively, as given in Table 4. As shown in Fig. 4, the training accuracy is lower than the validation accuracy, which results from the 2 dropout layers (Table 3), as these layers set
Fig. 4 The model’s validation and training accuracy
Fig. 5 The model’s training and validation loss
30% of the features to zero during training, whereas dropout is not applied during validation. The model is therefore more robust during validation, which leads to higher validation accuracy; likewise, the validation loss in Fig. 5 is lower than the training loss. From Table 4, the precision and recall values are fairly close to 1; recall is more important than precision, as it indicates how many positive CT scans are predicted correctly, and the values show that the model is able to differentiate between the classes. Further explainability-driven validation is shown in Fig. 6 with Grad-CAM visualization [15]. All of the assessment indicators used to validate model performance in this study are standard metrics; however, they do not elucidate the decision-making process that underlies the model. The Grad-CAM visualization depicts the regions of the image that the model pays most attention to while making predictions. As shown in Fig. 6, the model focuses on the affected regions in Common Pneumonia cases and likewise in COVID images. In the Normal cases, the model focuses more on the lower regions, and the texture in the image periphery also attracts its attention. Such a visual heuristic, which differs from human visual perception, warrants additional investigation in order to better understand how the model detects COVID-19 and which traits it considers most important. Profiling these traits might help clinicians discover novel visual markers for COVID-19 infection that can be used in CT scans as a screening tool, as well as explain the model's power in COVID-19 testing.
Fig. 6 Grad-CAM visualizations, first line, second line, and third line show Normal, Common Pneumonia, and COVID images, respectively
4 Conclusion
This paper addresses the multiclass classification problem by means of a deep learning approach using EfficientNet B4. Previous studies distinguished between non-COVID and COVID CT-scan images with a best F1-score of 0.88 and a best accuracy of 98%. This paper discriminates between 3 classes with F1-scores of 0.9, 0.9, and 0.8 for COVID, Normal, and Common Pneumonia samples, respectively, and an average accuracy of 90%, while remaining efficient in its use of computational resources. The experiments show that our models perform well in COVID-19 testing. In the future, we will focus on determining the severity of COVID-19 and attempting to extract more useful information from CT images in the hope of helping to contain the pandemic. We will carry out additional explanatory analyses of the models to shed light on the detection mechanism of COVID-19, uncover essential characteristics in CT scans, and facilitate screening by clinicians. Although the system performs well on publicly available datasets, the work is still at the theoretical research stage, and the models have not been tested in clinical settings.
References
1. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L (2020) Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 296:E32–E40. https://doi.org/10.1148/radiol.2020200642
2. He X, Yang X, Zhang S, Zhao J, Zhang Y, Xing E, Xie P (2020) Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. IEEE Trans Med Imag. https://doi.org/10.1101/2020.04.13.20063941
3. Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X, Xu B (2020) A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). medRxiv. https://doi.org/10.1101/2020.02.14.20023028
4. He X, Wang S, Shi S, Chu X, Tang J, Liu X, Yan C, Zhang J, Ding G (2020) Benchmarking deep learning models and automated model design for COVID-19 detection with chest CT scans. medRxiv. 2020.06.08.20125963. https://doi.org/10.1101/2020.06.08.20125963
5. Polsinelli M, Cinque L, Placidi G (2020) A light CNN for detecting COVID-19 from CT scans of the chest. Pattern Recogn Lett 140:95–100. https://doi.org/10.1016/j.patrec.2020.10.001
6. Mobiny A, Cicalese PA, Zare S, Yuan P, Abavisani M, Wu CC, Ahuja J, de Groot PM, van Nguyen H (2020) Radiologist-level COVID-19 detection using CT scans with detail-oriented capsule networks
7. Mohammed A, Wang C, Zhao M, Ullah M, Naseem R, Wang H, Pedersen M, Cheikh FA (2020) Weakly-supervised network for detection of COVID-19 in chest CT scans. IEEE Access 8:155987–156000. https://doi.org/10.1109/ACCESS.2020.3018498
8. Mishra AK, Das SK, Roy P, Bandyopadhyay S (2020) Identifying COVID19 from chest CT images: a deep convolutional neural networks based approach. J Healthcare Eng 2020. https://doi.org/10.1155/2020/8843664
9. Dhanya R (2020) Deep Net model for detection of covid-19 using radiographs based on ROC analysis. J Innov Image Process 2:135–140. https://doi.org/10.36548/jiip.2020.3.003
10. COVIDx CT | Kaggle, https://www.kaggle.com/hgunraj/covidxct. Last accessed 18 Oct 2021
11. Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: 36th international conference on machine learning, ICML 2019. 2019-June, pp 10691–10700
12. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) MobileNetV2: inverted residuals and linear bottlenecks
13. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
14. Xie Q, Luong M-T, Hovy E, Le QV (2019) Self-training with Noisy Student improves ImageNet classification. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 10684–10695
15. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2016) Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vision 128:336–359. https://doi.org/10.1007/s11263-019-01228-7
Handwritten Character Recognition for Tamil Language Using Convolutional Recurrent Neural Network
S. Vijayalakshmi, K. R. Kavitha, B. Saravanan, R. Ajaybaskar, and M. Makesh
Abstract Handwritten Tamil character recognition is the process of converting handwritten Tamil characters into printed (digital) Tamil characters. Handwritten characters are challenging to analyze because of the wide range of writing styles, as well as the varying sizes and orientation angles of the characters. This paper presents a system that recognizes offline handwritten characters, currently supporting the Tamil language, with high accuracy and minimum validation loss. Optical character recognition systems developed for the Tamil language have a poor recognition rate because of the numerous writing styles and the enormous number of characters. The goal of this system is to produce a digital format for handwritten Tamil documents. The proposed method aims to achieve an accuracy of 96% with the help of a CRNN. Although numerous studies have addressed handwritten Tamil text, it remains a most difficult task. Recent research in this domain has used deep learning algorithms to improve the accuracy for the input document. This survey presents a fair review of Tamil handwritten character recognition, its challenges, and diverse techniques.
Keywords Handwritten character recognition · Written character recognition · Tamil language · Deep learning techniques · Convolution recurrent neural network · Hybrid neural network · Artificial neural network or deep neural network
S. Vijayalakshmi (B) · K. R. Kavitha · B. Saravanan · R. Ajaybaskar · M. Makesh
Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_25
S. Vijayalakshmi et al.
1 Introduction
Handwriting recognition refers to a machine's capacity to receive and understand handwritten input from a variety of sources, such as paper documents, pictures, and touch screen devices. Handwriting and machine character recognition is an emerging field of study that has a wide range of applications in banks, offices, and businesses. The main goal of this paper is to create an expert system for HCR using a neural network, which can recognize all characters in various formats using a CRNN technique. Figure 1 illustrates the problem: there are many different types of handwriting, both good and bad. This makes it difficult for programmers to provide sufficient samples of how each character could appear. Furthermore, characters might appear almost identical at times, making it difficult for a computer to recognize them effectively. Handwritten character recognition using neural networks is an area in which numerous researchers are developing recognition systems. Because of the variety of writing styles and character shapes, recognition of overlapped characters is the most complex of all recognition tasks.
2 Literature Survey
Ayush Purohit and Shardul Singh Chauhan proposed that in the coming days, handwritten character recognition might serve as a key factor in recovering dynamic data in the field of AI through machine learning, owing to digital demand. Optical character recognition is a motivating research field because of the communication between the computer and humans. Optical character recognition is a method of converting
Fig. 1 Represents the Tamil letters with different styles
non-processable data into a machine-processable format; its limitation is that it cannot handle semi-structured data [1]. Cinu George et al. explained that optical character recognition is mostly divided into two categories, typed and handwritten. Handwritten optical character recognition is further divided into two categories, online and offline. If the input document is scanned and then recognized by the system, it is termed offline recognition. Similarly, if the document is recognized while it is being written, using an analog-to-digital converter, it is termed online recognition. Their work gives a brief review of the available techniques for Tamil handwritten script recognition. Tamil handwritten script recognition approaches are presented with their strengths and flaws. Different types of preprocessing and segmentation techniques are employed in handwritten character recognition. Various kinds of features are extracted, and different kinds of classifiers are used to classify the Tamil input scripts. The study is devoted to the search for possible techniques to develop an offline handwritten script recognition system for the Tamil language, and to its applications and challenges [2]. Mouhcine Rabi et al. described a hidden Markov model, a statistical Markov model in which the system being modeled is considered to be a Markov process with unobservable ("hidden") states. Their contextual handwriting recognition system for cursive Arabic and Latin script is 92.77% efficient. It is a resilient learner due to the large number of hidden layers and can deal with many different situations. However, the large number of hidden layers slows down learning, and it cannot be used in production systems; its limitation is the complex and time-consuming search for matching values [3]. Monica Patel and Shital P. Thakkar presented a review of handwritten character recognition for the English language.
HCR is the procedure of converting handwritten message to machine readable text. For handwritten information the challenge we face is with one author to another, by distinction fit as a fiddle, size and position of character. For perceiving written by hand information with no error sorts of method has been used as a part of the most recent research of this zone, limitation-not useful for representing distinct sets of data [4]. JanrutiChandarana and MayankKapadia proposed that in their optical character recognition (OCR) is a system which loads an optical character as input, preprocess comparing the input image and model library [5]. Noman Islam et al. explained that their multilayer perceptions model that the rigid architecture makes good prediction with less number of classes and has a high accuracy in predicting, but whereas the model performs good only for three characters. Due to this the model is not feasible for deployment and segment the data, and the feature extracted from the input data, classify the data based on the features extracted, and the acknowledged features are kept in the image model library, and classify the input images by the proposed method, limitation-lacks structural dependence [6]. Dimple Bhasin et al. proposed that in this journal SVM is the algorithm which is used. SVM means support vector machine. Due to efficient preprocessing of data, the features make prediction easy through simple methods. Although the project proposes ideas for finding features to identify author not the character. So the
372
S. Vijayalakshmi et al.
proposed model cannot be used for character recognition, limitation-lacks proper pointer representation and indexing mechanism [7]. Hai Dai Nguyenet al. explained this research used deep neural network algorithm. Because it is based on the simple model. As a result, it is simpler to implement and train. Because there are fewer variables, it trains faster. Although this model is efficient, this model cannot be used for recursive identification of character, limitation-no indexing mechanism [8]. S. Vijayalakshmi et al. proposed that another cutting-edge innovation is the Labview-based coordination of the inserted framework using a back propagation neural network technique. Various sensors are connected to the motor during this process, and the attributes are extracted using a microcontroller. Through recreation in MATLAB, the proposed BPNN-based induction motor control architecture is approved. To approve the replica, a hardware arrangement is also developed. Using the proposed BPNN calculation, over 95% proficiency is achieved under full stress conditions [9]. R. Anand et al. explained that optical character recognition (OCR) is a key research topic in real-time applications that is used to recognize all of the characters in a picture. The scanned passport dataset was used in this study to generate all of the characters and digits using tesseract. There are 60,795 training sets and 7767 testing sets in the dataset. A total of 68,562 samples were used, with 62 labels separating them. There has been no research into predicting all 52 characters and ten numbers before today [10]. Tripathi and Milan proposed that, as a result of increased urbanization and people shifting from rural to urban areas, time has become a very important commodity. A greater requirement for speed and efficiency has evolved as a result of this transformation in people’s lifestyles. 
In the supermarket industry, item identification and billing are often done manually, which takes a lot of time and effort. Because fruit products lack a barcode, processing them takes longer [11]. M. J. Samuel explained that capsule networks give the best results in visual inference with structured data. In that study a basic capsule network technique is used to classify hierarchical multi-label text. It is compared with support vector machine (SVM), long short-term memory (LSTM), artificial neural network (ANN), convolutional neural network (CNN), and other neural and non-neural designs to illustrate its superior performance. The Blurb Genre Collection (BGC) and Web of Science (WOS) datasets were used. In hierarchical multi-label text applications, the encoded latent data is combined with the algorithm to handle structurally distinct categories and rare events [12]. Sungheetha et al. studied aspect-level sentiment classification, the process of identifying the text in a document and classifying it according to its sentiment polarity with respect to the target. The high cost of annotation may prove to be a considerable obstacle to this goal. However, a neural network is very successful at interpreting document-level labeled data, such as online reviews, from the perspective of a consumer. The sentiment-encoded text in online reviews can be examined using the proposed methodology [13].
Smys et al. observed that machine learning is a crucial and unavoidable area in today's research, where it delivers improved solutions in a variety of fields. Deep learning, in particular, is a cost-effective and effective supervised learning model that can be applied to a variety of complex problems. Because deep learning has a variety of representational qualities and does not rely on any limiting learning methods, it aids the development of superior solutions. Owing to its strong performance and continuing advances, deep learning is widely employed in applications such as image classification, face recognition, visual recognition, language processing, audio recognition, object detection, and various scientific and business analyses [14]. A. Pasumpon Pandian noted that sentiment analysis is one of the most common uses of deep learning algorithms. Compared with earlier approaches, the study provides a more effective and efficient automated feature extraction technique. Traditional approaches, such as the surface approach, rely on a time-consuming manual feature extraction process, which is a key component of feature-driven development. These strategies provide a solid foundation for determining the predictability of features, and deep learning techniques provide the ideal platform for implementing them [15].
3 Methodology
The proposed approach to handwritten character recognition is robust to different text appearances, including font size, font style, image color, and image background. It combines the strengths of several complementary techniques while overcoming their shortcomings, using efficient character detection and recognition with classifier training based on neural networks. Neural networks achieve the best accuracy in the shortest possible time. The block diagram of the proposed method is shown in Fig. 2.
Fig. 2 Representation of proposed method in block diagram
Fig. 3 Sample dataset of Tamil character
3.1 Input
In the block diagram, input is the first step: the handwritten character dataset is loaded into the system, preprocessed, and segmented; the network is then trained by extracting features from the images and classifying them. Once the system is ready to recognize characters, the test dataset is loaded and passed to the trained model for prediction. Figure 3 shows a sample of the dataset, which consists of the 12 vowels of the Tamil language used to train the Tamil handwritten character recognition model.
3.2 Preprocessing
The next step after obtaining the input is preprocessing, which is the first stage in processing the input image (online or offline). The scanned image is checked for noise, skew, slant, and similar defects: it may be skewed to the left or right, or contain noise such as Gaussian noise. The image is first converted to grayscale and then to binary, which yields an image suitable for further processing. When preprocessing is complete, the image is free of noise caused by electricity, heat, and sensor illumination, and it is passed to the next stage, segmentation, where the image is decomposed into individual characters [16]. The image is now in binary format and is checked for interline spaces. If it contains interline spaces, the image is divided into paragraphs across the interline gaps. The lines within the paragraphs are scanned for horizontal space intersections with respect to the background, and a histogram of the image is used to determine the width of the horizontal lines. The lines are then scanned vertically for space intersections, and histograms are used to determine the width of the words. After that, the words are split into characters and the character width is computed.
Feature extraction follows the segmentation part of HCR: each individual character glyph is considered and its features are extracted. A character glyph is first described by attributes such as the height and width of the processed character. Classification is then performed using the features extracted in the previous step for each character glyph [17]. These features are analyzed using a set of rules and labeled as belonging to different classes. This rule-based classification is generalized so that it works for a single font type. The height and width of each individual character are compared and, where they disagree, different distance measures are used to pick the candidate class. Other characters have their own classification rules. Because it works on the visual form of the characters and does not require training, this technique can be considered general. When a new glyph is passed to the character classifier block, the block extracts its features and compares them with the stored features according to the rules, recognizing and labeling the character.
The steps involved in preprocessing are:
• Read the input image.
• Resize the image.
• Grayscale conversion.
• Noise removal.
• Binary image conversion.
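The preprocessing steps listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the 3 × 3 median filter for noise removal and Otsu's method for binarization are assumptions (the text names the steps but not the algorithms), and the function names are hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale using the common luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights

def median_filter3(gray):
    """3 x 3 median filter (assumed noise-removal step); borders are replicated."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # Stack the nine shifted copies of the image and take the per-pixel median.
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)

def otsu_threshold(gray):
    """Return the gray level (0..255) that maximizes the between-class variance."""
    levels = np.rint(np.clip(gray, 0, 255)).astype(np.uint8)
    hist = np.bincount(levels.ravel(), minlength=256)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the dark class
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean of the dark class
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu[-1] * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def preprocess(rgb):
    """Grayscale -> denoise -> binarize; 1 marks ink (dark), 0 marks background."""
    gray = median_filter3(to_grayscale(rgb))
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

A median filter is a common choice for isolated speckle noise; for the Gaussian noise mentioned in the text, a Gaussian or mean filter could be substituted without changing the rest of the sketch.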
However, despite in-depth research, it is hard to design general-purpose systems. There are many sources of variation when text is extracted from an image: shading from rough, irregular, and uneven backgrounds; low-contrast or complex images with textured patterns; and images with variations in font size, style, color, orientation, and alignment. These variations make the problem very difficult to solve automatically [18]. Generally, handwritten text detection methods can be divided into three classes. The first consists of connected-component-based methods, which identify character regions that have similar colors and satisfy certain constraints on font size, shape, and spatial alignment. However, these methods are not effective when the characters have colors similar to the background. Text regions have a distinctive contrast with the background. Even so, these methods are relatively inaccurate at sensing background colors and cannot separate text regions from backgrounds whose color is similar to the text.
The third type comprises edge-based methods. Text regions are detected under the assumption that the edges of the background and object regions are sparser than those of the text regions. However, this type of method is not very accurate at detecting handwritten text with a large font size. Earlier work compared the support vector machine (SVM) technique with MLP-based character verification over four independent features: the DCT coefficient feature, the constant gradient variance feature, the grayscale spatial derivative feature, and the distance map feature. The results show that the support vector machine produces better detection results than the multilayer perceptron. Multi-resolution handwritten text detection methods are usually adopted to detect text at different scales.
3.3 Segmentation
• Image segmentation is the process of partitioning a digital binary image into multiple segments of adjacent pixels.
• Image segmentation is an essential part of handwritten text image analysis. It separates the objects we want to examine further from the other objects or the background.
• Connected component analysis: once region boundaries are detected, it is often useful to extract regions that are not separated by a boundary. Any set of pixels that is not separated by a boundary is called connected. Each maximal region of connected pixels is called a connected component. The set of connected components partitions an image into segments.
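Connected component analysis as described in the last bullet can be illustrated with a breadth-first flood fill. This is a minimal sketch under stated assumptions: 8-connectivity and the helper names `label_components` and `bounding_boxes` are choices made here for illustration, not details given in the text.

```python
from collections import deque
import numpy as np

def label_components(binary):
    """Label 8-connected foreground regions; returns (label image, component count)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r, c] and labels[r, c] == 0:
                # Unvisited foreground pixel: start a new component and flood it.
                count += 1
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = count
                                queue.append((ny, nx))
    return labels, count

def bounding_boxes(labels, count):
    """Return (top, left, bottom, right) for each component, inclusive coordinates."""
    boxes = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        boxes.append((int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())))
    return boxes
```

The bounding boxes can then be used to crop candidate character regions. A character written with several disconnected strokes would yield several components, so a grouping step may still be needed.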
3.4 CRNN
To recognize words on scanned forms, the system uses a word segmentation algorithm followed by a neural network architecture that performs character recognition on the words. The scanned image of the form filled in by the user is the input provided to the system. Input fields are located in this scanned image and cropped into individual fields. The cropped images are then passed to the word segmentation algorithm, which separates each word in each input field. The CRNN model is then used to perform HCR on these individual words. The convolutional recurrent neural network (CRNN) model is made up of three parts: convolutional layers, recurrent layers, and a transcription layer [19]. The convolutional layers extract a feature sequence from the input images. The recurrent layers use this output to make a prediction for each frame of the feature sequence. Finally, the transcription layer, or CTC, converts the per-frame predictions of the recurrent layers into a label sequence.

$$G[m, n] = (f * h)[m, n] = \sum_{j} \sum_{k} h[j, k]\, f[m - j, n - k] \tag{1}$$
Equation 1 represents kernel convolution, which is a major component of several other computer vision methods in addition to CNNs. In this technique a small matrix of numbers, known as a kernel or filter, is slid over the image, and the image is transformed using the values of the filter. The input image is denoted by f and the kernel by h; the formula generates the values of the resulting feature map, whose row and column indexes are m and n, respectively. Figure 4 shows the CRNN architecture, a hybrid neural network that combines convolutional neural networks (CNN) with recurrent neural networks (RNN). The RNN part is introduced into the network in the form of three bidirectional long short-term memory (BLSTM) layers. A fixed-size 224 × 224 RGB image is fed into the conv1 layer. The image is processed through a series of convolution (conv.) layers with a very small receptive field: 3 × 3 (the smallest size that can capture the concepts of left/right, up/down, and center). One of the configurations also includes 1 × 1 convolution filters, which may be regarded as a linear transformation of the input channels (followed by a nonlinearity). The convolution stride is set to 1 pixel, and the spatial padding of the convolution layer input is set to 1 pixel for 3 × 3 convolution layers, which preserves the spatial resolution after convolution. Spatial pooling is done by five max-pooling layers that follow some of the conv. layers (not all conv. layers are followed by max-pooling). Max-pooling is performed over a 2 × 2 pixel window with stride 2. Three fully connected (FC) layers follow the stack of convolution layers (whose depth varies between architectures): the first two have 4096 channels each, while the third performs 1000-way ILSVRC classification and hence has 1000 channels (one for each class). The final layer is the softmax layer.
The fully connected layers are configured in the same way in all networks. All hidden layers use the rectification (ReLU) nonlinearity. With the exception of one, none of the networks use local response normalization (LRN), which does not improve performance on the ILSVRC dataset but increases memory usage and computation time.
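Equation 1 can be evaluated directly with two loops over the kernel entries. The sketch below is illustrative only; the function names are hypothetical, and the "same" cropping assumes an odd-sized kernel. Note that most deep learning frameworks implement the convolution layer as cross-correlation (no kernel flip), which differs from Eq. 1 only in the orientation of h.

```python
import numpy as np

def convolve2d_full(f, h):
    """Direct evaluation of G[m, n] = sum_j sum_k h[j, k] * f[m - j, n - k]."""
    fh, fw = f.shape
    kh, kw = h.shape
    g = np.zeros((fh + kh - 1, fw + kw - 1))
    for j in range(kh):
        for k in range(kw):
            # h[j, k] contributes a copy of f shifted down by j and right by k.
            g[j:j + fh, k:k + fw] += h[j, k] * f
    return g

def convolve2d_same(f, h):
    """Crop the full result so the output has the same size as f (odd kernels)."""
    kh, kw = h.shape
    full = convolve2d_full(f, h)
    return full[kh // 2: kh // 2 + f.shape[0], kw // 2: kw // 2 + f.shape[1]]
```

Convolving a single bright pixel with a kernel reproduces the kernel around that pixel, which is a quick way to check the index convention.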
3.4.1 CNN
The segmented words are then fed into the CRNN model as input. The convolutional network comes first, followed by a recurrent network that captures sequential features from the images, and finally the transcription layer (CTC), which converts the output of the recurrent network into a label sequence.
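The role of the transcription layer can be illustrated with best-path (greedy) CTC decoding: take the most likely class in every frame, merge consecutive repeats, then drop the blank symbol. This is a sketch under stated assumptions; the text does not say which decoding strategy is used, and the blank index 0 and the function name are chosen here for illustration.

```python
import numpy as np

def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding of a (frames x classes) probability matrix.

    alphabet[i] is the symbol for class i; alphabet[blank] is a placeholder.
    """
    best_path = np.argmax(frame_probs, axis=1)   # most likely class per frame
    decoded = []
    previous = None
    for idx in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank class.
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)
```

For example, the frame-wise path blank, அ, அ, blank, அ, ஆ, ஆ, blank collapses to three symbols: the blank between the two runs of அ is what allows a repeated character to be kept.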
Fig. 4 Block diagram of CRNN (VGG-16) architecture
$$Z^{[j]} = W^{[j]} \cdot A^{[j-1]} + b^{[j]}, \qquad A^{[j]} = g^{[j]}\left(Z^{[j]}\right) \tag{2}$$
Equation 2 represents the two steps of forward propagation through a convolution layer. The first step is to calculate the intermediate value Z by convolving the input data from the previous layer with the tensor W, which contains the filters, and then adding the bias b. The second step is to apply a nonlinear activation function g to the intermediate value.
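The two steps of Eq. 2 can be written out for a single convolution layer as follows. This is a minimal sketch, not the authors' code: it assumes stride 1, no padding, ReLU as the activation g, and the cross-correlation convention used by most frameworks; the array layouts and the name `conv_layer_forward` are hypothetical.

```python
import numpy as np

def relu(z):
    """Rectified linear activation, used here as the assumed g."""
    return np.maximum(z, 0.0)

def conv_layer_forward(a_prev, W, b, g=relu):
    """Forward pass of one convolution layer.

    a_prev: (H, W_in, C_in) activations of the previous layer
    W:      (kh, kw, C_in, C_out) filter tensor
    b:      (C_out,) biases
    Returns (Z, A) with Z = W . A_prev + b and A = g(Z).
    """
    kh, kw, _, c_out = W.shape
    out_h = a_prev.shape[0] - kh + 1
    out_w = a_prev.shape[1] - kw + 1
    z = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = a_prev[i:i + kh, j:j + kw, :]
            # Multiply the patch with every filter and sum, then add the bias.
            z[i, j, :] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2])) + b
    return z, g(z)
```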
Fig. 5 Architecture of CNN
The convolution network is built using standard convolutional layers together with the max-pooling layers of the CNN model [20]. Images are the input to a CNN model, which extracts features from them using weights and biases. If a conventional deep neural network (DNN) model is used to extract features from an image, the pixels must be flattened, which destroys the spatial arrangement of the original features. When the pixel dependency in the input images is very high, this can result in little to no accuracy. In contrast, a CNN uses filters, or kernels, to perform convolution operations on images in order to capture spatial and temporal dependencies. Max-pooling layers are used in the CRNN in conjunction with the convolution layers. They reduce the size of the feature map, which lowers the computational cost and focuses on the most important features; they also help to reduce noise. Figure 5 shows the architecture of a CNN, in which the initial layers detect simple features such as horizontal or vertical lines. The complexity of the detected features rises as we go deeper into the model, until the features no longer make sense to the human eye. These features, however, make the CNN model more powerful and allow it to classify images accurately.
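The size reduction performed by max-pooling can be shown with a short NumPy sketch. It assumes a 2 × 2 window with stride 2 on a single-channel map; the function name is hypothetical, and odd trailing rows or columns are simply dropped.

```python
import numpy as np

def max_pool2x2(x):
    """2 x 2 max-pooling with stride 2 on an (H, W) feature map."""
    h = x.shape[0] // 2 * 2
    w = x.shape[1] // 2 * 2
    # Group pixels into non-overlapping 2 x 2 blocks and keep each block's maximum.
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```

Each pooling step halves the height and width, so five such steps reduce a 224 × 224 map to 7 × 7.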
3.4.2 RNN
The CNN output is fed into a bidirectional recurrent neural network (RNN). The recurrent layer predicts a label for each frame of the feature sequence. The recurrent layer has three advantages. First, it captures contextual information within a sequence. When predicting the label for a letter, it is easier to distinguish the letter by considering it together with the surrounding letters than by examining each one individually.
Second, the errors can be back-propagated to the convolution layers, which makes it possible to train the entire CRNN model using a single loss function. Finally, inputs of arbitrary length can be provided to the RNN layer. Figure 6 shows the architecture of a recurrent neural network, drawn on the left in compact RNN notation and on the right unrolled (or unfolded) into a full network. Unrolling means writing out the network for the full sequence: if the sequence of interest is a three-word sentence, the network is unrolled into a three-layer neural network, one layer for each word.

$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)} \tag{3}$$

$$h^{(t)} = \tanh\left(a^{(t)}\right) \tag{4}$$

$$o^{(t)} = c + V h^{(t)} \tag{5}$$

$$\hat{y}^{(t)} = \mathrm{softmax}\left(o^{(t)}\right) \tag{6}$$
Equations 3–6 describe an RNN that maps an input sequence to an output sequence of the same length. The overall loss for a given sequence of x values paired with a sequence of y values is then the sum of the losses over all time steps. We assume that the outputs o(t) are used as the argument of the softmax function to obtain the vector of probabilities over the output classes. We also assume that the loss L is the negative log-likelihood of the true target y(t) given the input. Between the input and output stages of the RNN layer, there are many RNN units. These units are used to capture previous context in order to predict the label.
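The recurrence in Eqs. 3–6 and the summed loss can be sketched as a short forward pass. This is an illustrative sketch only: the output weight matrix V in Eq. 5 is an assumption that follows the standard form o(t) = c + V h(t), targets are assumed to be integer class indices, and the function names are hypothetical.

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(o - np.max(o))
    return e / e.sum()

def rnn_forward(xs, ys, U, W, V, b, c):
    """Run the RNN over a sequence and return (per-step probabilities, total loss).

    xs: (T, n_in) inputs; ys: length-T integer targets.
    """
    h = np.zeros(W.shape[0])             # initial hidden state h(0)
    total_loss = 0.0
    outputs = []
    for x_t, y_t in zip(xs, ys):
        a_t = b + W @ h + U @ x_t        # Eq. 3
        h = np.tanh(a_t)                 # Eq. 4
        o_t = c + V @ h                  # Eq. 5 (V is assumed, see lead-in)
        y_hat = softmax(o_t)             # Eq. 6
        total_loss += -np.log(y_hat[y_t])  # negative log-likelihood of the target
        outputs.append(y_hat)
    return np.array(outputs), total_loss
```

With all parameters set to zero every step predicts a uniform distribution, so the total loss equals the sequence length times the logarithm of the number of classes, which gives a simple sanity check.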
Fig. 6 Architecture of RNN
Although traditional RNN units are good at capturing prior context, they suffer from vanishing gradients when transferring context to distant units. Long short-term memory (LSTM) is a type of RNN unit designed to deal with the vanishing gradient problem. An LSTM has three gates: input, output, and forget. It also includes a memory cell that carries the context forward until it is changed. The input of every cell is passed to the gates, whose outputs determine whether the information in the memory cell should be preserved or deleted. The data flow in an LSTM cell is shown in Fig. 1. The LSTM gates are sigmoid functions combined with pointwise multiplication, and they produce values in the range zero to one. This value indicates how much data should be passed on from the memory cell.
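The gating described above can be made concrete with one step of a standard LSTM cell. This sketch uses the common textbook formulation, which is an assumption here because the text does not give the cell equations; the parameter names (`Wf`, `Wi`, `Wo`, `Wg` and the matching biases) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; squashes gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and memory cell."""
    z = np.concatenate([h_prev, x_t])                 # gates see h(t-1) and x(t)
    f = sigmoid(params["Wf"] @ z + params["bf"])      # forget gate
    i = sigmoid(params["Wi"] @ z + params["bi"])      # input gate
    o = sigmoid(params["Wo"] @ z + params["bo"])      # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])      # candidate cell content
    c_t = f * c_prev + i * g      # keep part of the old memory, add new content
    h_t = o * np.tanh(c_t)        # expose part of the memory as the output
    return h_t, c_t
```

When the forget gate is close to one and the input gate is close to zero, the memory cell is copied almost unchanged to the next step, which is how the context survives over long distances.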
4 Results and Discussion
The goal of "HCR using neural network" is to recognize and identify handwritten characters, and a neural network (NN) is used to implement the handwritten character recognition system. In this system the original image is converted to grayscale, then to black and white, and then segmented. The system displays the final output after preprocessing and segmentation. Training on the dataset is crucial: we used 80% of the dataset to train the model and 20% to test it, with only a small number of character classes. We tried different techniques to improve the training procedure and reduce computation time while maintaining high overall performance. We tested a variety of parameter settings and found the proposed system to be more accurate than the other methods compared in Table 1. Figure 7 shows the output, where the text in the test image uploaded to the trained model is detected and extracted. Many regional languages around the world have distinct writing styles that can be recognized by HCR systems using appropriate algorithms and methods. We present a system that recognizes Tamil characters. We observed that unusual characters, and characters with similar shapes, make handwritten characters difficult to recognize. The characters are segmented into distinct characters after the scanned image is preprocessed to provide a clean image. Normalization and filtering are performed as part of the preprocessing operation, which results in a noise-free and clear output. Running the algorithm with proper training, evaluation, and the other step-by-step processes results in a successful and efficient system. The use of some statistical and geometric features in conjunction with a neural network will improve the recognition of Tamil characters. Researchers will benefit from this work when they address additional scripts. Table 1 compares the proposed method with existing HCR methods; the proposed method achieves the highest accuracy (95.99%). Figure 8 shows that the trained CRNN model reaches an accuracy of about 0.96, i.e., a model accuracy of 96%, and a model loss of less than 0.05, i.e., less than 5%.

Fig. 7 Test image with output

Table 1 Different existing methods and proposed method comparison
S. No   Different implemented method   Model accuracy (%)
1       SVM                            81.08
2       HMM using VGG-14               93.82
3       DNN using MobileNet            94.91
4       ANN and kernel filters         95.50
5       CRNN (proposed method)         95.99
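The 80/20 split and the accuracy figure reported in this section can be computed as in the following sketch. It is illustrative only: the random shuffling, the fixed seed, and the function names are assumptions, since the text does not say how the split was drawn.

```python
import numpy as np

def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    cut = int(round(train_fraction * n_samples))
    return order[:cut], order[cut:]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

With several character classes, a stratified split (the same 80/20 ratio inside each class) is often preferable so that no class is missing from the test set.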
5 Conclusion
This research examines the key approaches used for handwritten Tamil alphabet recognition during the previous decade. Various preprocessing techniques, segmentation methods, feature extraction processes, and classification algorithms are presented in detail. Despite the development of numerous solutions in recent decades to deal with the complexity of handwritten Tamil alphabets, much more research is needed before a suitable software solution can be made available. The precision of existing HCR systems is poor, and a competent solution is needed to improve overall performance. In comparison with existing approaches, the suggested CRNN method provides high precision. In this paper, we recognized handwritten Tamil characters by taking an input image and converting it to a digital text format, which can be stored in a document file. The CRNN model obtains a training and test accuracy of 96%, which is better than clustering, GVC, SVM, the component labeling technique, and ANN.
Fig. 8 CRNN-trained model loss and accuracy of proposed method
References
1. Purohit A, Chauhan SS (2016) A survey on handwritten character recognition. Int J Comput Sci Inf Technol 7(1):1–5
2. George C, Podhumani S, Monish RK, Pavithra T (2016) Survey on handwritten character recognition using artificial neural network. Int J Sci Technol Eng 2(10)
3. Rabi M, Amrouch M, Mahani Z (2017) A survey of contextual handwritten recognition systems based HMMs for cursive Arabic and Latin script. Int J Comput Appl 160(2)
4. Patel M, Thakkar SP (2015) Handwritten character recognition in English: a survey. Int J Adv Res Comput Commun Eng 4(2)
5. Chandarana J, Kapadia M (2014) Optical character recognition. Int J Technol Adv Eng 4(5)
6. Islam N, Islam Z, Noor N (2016) A survey on optical character recognition system. J Inf Commun Eng 10(2)
7. Bhasin D, Goyal G, Dutta M (2014) Design of an effective preprocessing approach for offline handwritten images. Int J Comput Appl (0975–8887) 98(1)
8. Nguyen HD, Le A, Nakagawa M (2016) Recognition of online handwritten math symbols using deep neural network. IEICE Trans Inf Syst E99(12)
9. Vijayalakshmi S, Kavitha KR, Senthilvadivu M, Amutha M, Evangelin BC (2020) Back propagation neural network algorithm based enlistment of induction motor parameters monitoring using labview. J Adv Res Dynam Control Syst 12:439–452
10. Anand R, Shanthi T, Sabeenian RS, Veni S (2020) Real time noisy dataset implementation of optical character identification using CNN. Int J Intell Enterp 7(1–3):67–80
11. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Process (JIIP) 3(02):100–117
12. Samuel MJ (2021) Capsule network algorithm for performance optimization of text classification. J Soft Comput Paradigm (JSCP) 3(01):1–9
13. Sungheetha A, Sharma R (2020) Transcapsule model for sentiment classification. J Artif Intell 2(03):163–169
14. Smys S, Chen JIZ, Shakya S (2020) Survey on neural network architectures with deep learning. J Soft Comput Paradigm (JSCP) 2(03):186–194
15. Pasumpon PA (2021) Performance evaluation and comparison using deep learning techniques in sentiment analysis. J Soft Comput Paradigm 3(2):123–134
16. Alotaibi F, Abdullah MT, Abdullah RBH, Rahmat RWOK, Hashem IAT, Sangaiah AK (2018) Optical character recognition for Quranic image similarity matching, vol 6
17. Shyni SM, Antony Robert Raj M, Abirami S (2015) Offline Tamil handwritten character recognition using sub line direction and bounding box techniques. Indian J Sci Technol 8(S7):110–116
18. Doush IA, Al-Trad AM (2016) Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction. Int J Reason Based Intell Syst 8(3/4)
19. Shi B, Bai X, Yao C (2016) A survey on handwritten character recognition (HCR) techniques for Chinese alphabets. Adv Vis Comput Int J (AVC) 3(1)
20. Pengchao LI, Peng L, Wen J (2016) Rejecting character recognition errors using CNN based confidence estimation. Chin J Electron 25(3)
Liver Tumor Detection Using CNN
S. Vijayalakshmi, K. R. Kavitha, M. Tamilarasi, and R. Soundharya
Abstract Liver cancer, also known as hepatic cancer, is caused by the abnormal growth of cells in the liver; hepatocellular carcinoma (HCC) is the most common type and makes up 75% of cases. The tumor is difficult to detect and is mostly found at an advanced stage, when it is life-threatening. Detecting the tumor at an early stage is therefore critical, and the main goal of this study is to use image processing to identify liver cancer earlier. Computed tomography (CT) scans are used to diagnose malignant liver tumors. Anisotropic diffusion filters are used to enhance the image, and morphological operations, which combine dilation and erosion, are used to segment it, making it simpler to work with. The original CNN method works well for segmenting noise-free images, but it fails on images contaminated by noise, outliers, or other imaging artifacts. Image quality and accuracy are the most important aspects of this work; image quality evaluation and improvement depend on the enhancement stage, which uses low-level preprocessing approaches based on CNN and feature extraction. Following the segmentation principles, an enhanced region of the object of interest is obtained, which is then used as the basis for feature extraction.
Keywords Computed tomography (CT) · Convolutional neural network (CNN) · Anisotropic diffusion filter · Dilation · Erosion
S. Vijayalakshmi (B) · K. R. Kavitha · M. Tamilarasi · R. Soundharya
Department of ECE, Sona College of Technology, Salem, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_26
1 Introduction
The liver is the largest internal organ in the body, and it is found under the right ribs and below the base of the lung. It aids in the digestion of food. It filters the blood, processes and stores nutrients, converts part of these nutrients into energy, and breaks down harmful chemicals. The left and right lobes are the two primary hepatic lobes. The quadrate and caudate lobes, which are visible from the underside of the liver, are two more lobes. Hepatocellular carcinoma (HCC) is a type of cancer that develops when liver cells grow out of control and spread to other parts of the body. Primary hepatic cancers occur when the cells behave abnormally. According to statistics, liver cancer is the second most common cancer in males and the sixth most common cancer in women. In 2008, 750,000 persons were diagnosed with liver cancer, and 696,000 of them died as a result. Worldwide, males are affected twice as often as females. Viral hepatitis, which is far more dangerous, can lead to liver cancer. According to the World Health Organization (WHO), this virus causes over 1.45 million fatalities per year. Clinical, laboratory, and imaging tests, such as ultrasound scans, magnetic resonance imaging (MRI) scans, and computed tomography (CT) scans, are used to detect primary hepatic cancer. A CT scan uses radiation to produce detailed images from various angles across the body, such as sagittal, coronal, and axial images. It depicts organs, bones, and soft tissues, with the data being processed by a computer to produce images, which are commonly in DICOM format. Contrast material is frequently injected intravenously during the procedure. Malignant lesions can be distinguished from acute infection, chronic inflammation, fibrosis, and cirrhosis using the scans. This method for generating medical images can be used for synthetic data augmentation and to improve the performance of CNNs in medical image classification.
The proposed approach is demonstrated on a small dataset of 182 CT images of the liver. The accuracy of the liver tumor classification task was improved by 70% when synthetic data augmentation was used.
2 Related Work
A survey of the diverse algorithms and techniques used for image filtering and smoothing notes that image smoothing is one of the most critical and widely used methods in image processing [1]. The work in [2] aims to develop an automated hepatocellular carcinoma detection system for computed tomography images with high sensitivity and low specificity [2]. In clinical imaging, segmentation is critical for feature extraction, image measurement, and image display. In some applications it is useful to classify image pixels into anatomical regions, such as bones, muscles, and blood vessels, while in others they are classified into pathological regions, such as cancer, tissue deformities, and multiple sclerosis lesions [3]. The proposed algorithm is designed for automatic computer-aided diagnosis of liver
Liver Tumor Detection Using CNN
cancer from low-contrast CT images. The idea expressed in that article is to classify the malignancy of the liver tumor ahead of liver segmentation and to determine the HCC burden on the liver [4]. An automatic pathological diagnosis procedure called neural ensemble-based detection (NED) is developed and tested in an early-stage lung cancer diagnosis system (LCDS), building on the recognized strength of artificial neural network ensembles. NED uses an artificial neural network ensemble to identify cancer cells in images of needle biopsies taken from the bodies of the subjects to be diagnosed [5]. The morphological operations described in [6] are easy to apply and are based on set theory. The objective of these operations is to remove imperfections in the structure of images [6]. An algorithm is proposed in [7] to perform morphological operations directly on images represented by quadtrees and to produce dilated or eroded images that are also represented by quadtrees [7]. An improved anisotropic diffusion filter with adaptive threshold selection was applied to real MRI images, with very good results [8]. An anisotropic diffusion filter has also been used to remove noise from body surface potential measurements, with the aim of improving the corresponding solutions of the inverse problem of electrocardiology [9]. Another recent development is LabVIEW-based monitoring of an embedded system using a back-propagation neural network (BPNN) technique. Various sensors are connected to the motor, and the parameters are extracted using a microcontroller. The proposed BPNN-based induction motor control architecture is validated through simulation in MATLAB, and a hardware setup is also developed to validate the simulation. Using the proposed BPNN algorithm, over 95% efficiency is achieved under full-load conditions [10].
Optical character recognition (OCR) is a key research topic in real-time applications and is used to recognize all the characters in an image. In [11], a scanned passport dataset was used to generate all the characters and digits using Tesseract. The dataset contains 60,795 training samples and 7,767 testing samples, 68,562 samples in total, divided among 62 labels. No earlier work had attempted to predict all 52 letters and ten digits [11]. Wireless communication is a field that is continually changing and developing. In the radio-frequency (RF) signal transmission link, the operation of the RF input module is crucial. The design of an RF high-frequency transistor amplifier for unlicensed 60 GHz applications is discussed in [12]. The transistor analyzed is a FET amplifier that operates at 60 GHz and draws 10 mA at 6.0 V. The open-source Scilab 6.0.1 console software was used to simulate the amplifier. S11 = 0.930°, S12 = 0.21–60°, S21 = 2.51–80°, and S22 = 0.21–15° are the S-parameters of the biased MESFET. The transistor is found to be unconditionally stable, which allows the unilateral approximation to be used [12]. In the routing method of [13], the basic behavior of bee agents aids in making synchronous and decentralized routing decisions. MATLAB simulations demonstrate the algorithm's advantages: the nature-inspired routing method outperforms existing state-of-the-art models, and a simple agent model is enough to improve the network's performance. A breadth-first search variant is used to increase the overall routing protocol output by discovering and deterministically evaluating various paths in the network [13].
The deep learning system proposed in [14] continuously detects anomalies in video surveillance and is designed to reduce time complexity. The video frames are preprocessed to reduce their dimensionality. A pre-trained model is used to reduce the dimension of the image features extracted from a sequence of video frames, so that the valuable and anomalous events in each frame are preserved. The dimension is further reduced by selecting specific elements from each frame and applying a background subtraction procedure. The method combines CNN and SVM architectures with an image classification approach to detect anomalous conditions in video surveillance [14]. In fruit billing, manual identification not only doubles the work but also takes a long time. To remove this obstacle, multiple convolutional neural network-based classifiers are presented that recognize fruits seen through a camera, enabling a speedy billing system. The best of these models classifies images with state-of-the-art accuracy, better than earlier research [15]. To obtain faster learning and less computation time with less human intervention, the extreme learning machine (ELM) should be adopted. The study in [16] presents the essentials of ELM, gives a brief explanation of the classification algorithm, describes the numerous ELM variants for different classification tasks, and discusses the future development of ELM for applications based on function approximation [16]. In [17], experimental results on standard datasets and performance measures illustrate the method's robustness, with better PSNR and SSIM values for stain separation than previous efforts in the field. The unmixing strategy achieves PSNR values of roughly 30 dB, more than 10 dB higher than the benchmark approaches. The SSIM values are in the 0.42–0.73 range, indicating strong structural components. The method could be used to investigate the interactions of different activators in the wound-healing process as cancer progresses [17].
3 Proposed Work
The proposed work is a basic approach to cancer detection using image processing, implemented in MATLAB. A computed tomography image of the liver is used to detect the tumor region. Detection of liver cancer involves three main steps: preprocessing of the image, processing of the image, and highlighting of the tumor region. During preprocessing, image enhancement with an anisotropic diffusion filter removes noise and defects in the image. Thresholding may itself introduce some noise. The preprocessing step plays an important role in cancer detection, because a small deviation caused by imperfections or noise can have a major effect on the detection process. Convolutional neural networks (CNNs) are extensively used for image classification and pattern recognition; the CNN gets its name from the convolution operation used to extract features from images. In the processing stage, the enhanced image is segmented to find the tumor region. Morphological operations are used to segment images. They comprise
the basic processes of dilation and erosion, which are required to complete the entire detection procedure. The last step is to highlight the tumor region in the given image for easy and clear observation. To make the whole process clear, all processed images are shown as subplots: the original image, the filtered image, the tumor region, the bordered tumor image, and the original image with the detected tumor type highlighted.

Block diagram
Figure 1 shows the block diagram of the proposed system, which consists of image preprocessing, image segmentation, and feature extraction using morphological operations.

Preprocessing
Preprocessing is the first step in image processing. Its main aim is to improve image quality by suppressing unwanted distortions due to noise and enhancing image features for further processing. To detect the liver and liver tumors among the surrounding organs, data preprocessing, data augmentation, and CNN models are used. The investigation began by distinguishing the liver in each slice from its surrounding organs. The Hounsfield unit (HU) value of each pixel is mapped to a gray scale intensity; higher values correspond to brighter pixels. Hounsfield windowing
Fig. 1 Block diagram of the proposed system
Fig. 2 a CT images before preprocessing, b CT image after preprocessing
was applied with limits of 100 and 400 after the slices were read in DICOM format. Histogram equalization was then used to boost the contrast between the liver and its surrounding organs for additional brightness and clarity. Hounsfield windowing also made it easier to segment the liver tumor. The CT images before and after preprocessing are shown in Fig. 2.

The dataset has imbalanced pixel classes, with tumor pixels under-represented, so it was augmented to contain more tumor pixels. To improve data quality, machine learning tools were used to avoid overfitting and underfitting and to correct the class imbalance in the training set. Each CT slice has its own liver mask and tumor mask. To improve training and data augmentation, the different masks were unified into one. In addition, reflection of the liver and tumor masks and rotation of every slice about the y-axis were used, which boosted the dataset's training quality by 90%.

Segmentation
Segmentation divides an image into regions whose pixels share comparable properties. To be meaningful and useful for image analysis and interpretation, the regions should relate strongly to the depicted objects or features of interest. Meaningful segmentation is the first step from low-level image processing, which converts a gray scale or color image into one or more other images, to high-level image description in terms of features, objects, and scenes. The success of image analysis depends on the accuracy of segmentation; however, accurate image partitioning is a difficult problem. Segmentation techniques are either contextual or non-contextual.
The latter ignore spatial correlations between image characteristics and group pixels together based on a global attribute, such as gray level or color. Contextual approaches take advantage of these connections as well, for example, grouping pixels with comparable gray levels
Fig. 3 Implementation of CNNs on liver tumor images
and close spatial locations. Image segmentation is a method for determining the shape and size of a region's border. It uses different features extracted from the image to isolate the object from its background. After noise is removed from the lesion area, the lesion must be separated from the surrounding tissue, so that the diagnostic analysis is conducted purely on the relevant area, as shown in Fig. 3. Numerous segmentation approaches could be used in this investigation.

Convolutional neural networks are similar to traditional neural networks. A CNN consists of one or more convolutional, pooling, fully connected, and rectified linear unit (ReLU) layers. The accuracy of the outputs generally improves as the network gets deeper and has more parameters, but the network also becomes more computationally complex. CNN models have recently been widely employed in image classification for a variety of applications, as well as to extract features from the convolutional layers before or after the down-sampling layers. Such classification architectures, however, are not well suited to image segmentation or pixel-level classification.

VGG-16 is one such CNN model. The network has 41 layers, of which 16 have learnable weights: 13 convolutional layers and three fully connected layers. Most pixel-wise classification networks use an encoder-decoder design, with the VGG-16 model serving as the encoder. The encoder gradually reduces the spatial dimension of the images with pooling layers, while the decoder restores the object details and spatial dimensions, enabling fast and precise image segmentation. U-Net is a convolutional encoder-decoder network that is commonly used for semantic image segmentation. It is notable for applying a fully convolutional network architecture to medical images. The complexity of the CNN layers determines the accuracy of the output.
In medical science, researchers have been employing CNNs for image classification, focusing on feature extraction with convolutional layers prior to down-sampling. The encoding and decoding steps are time consuming and memory intensive. CNNs can be seen as automatic feature extractors learned from a set of data. Whereas an algorithm that operates on a flat pixel vector loses much of the spatial interaction between pixels, adjacent-pixel information is used effectively by a
CNN to down-sample the images, first by convolution and then by a prediction layer at the end. Even tiny tumors can be detected using this CNN-based diagnostic technology.

Segmentation methods fall into three groups:
i. Color-based segmentation: algorithms that segment on the basis of color discrimination. They include the principal component transform and the spherical coordinate transform.
ii. Discontinuity-based segmentation: active contours, the radial search technique, and detection of lesion edges by zero crossings of the Laplacian of Gaussian (LoG).
iii. Region-based segmentation: methods that split an image into smaller components and merge adjacent sub-images that are similar in some respect. They include split and merge, statistical region merging, multi-scale region growing, and morphological flooding.
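As an illustration of the region-based family in item iii, the following is a minimal sketch of seeded region growing. It is not the authors' MATLAB implementation; the function name `region_grow`, the 4-connectivity, the tolerance rule, and the toy image are all assumptions made for this example.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose gray
    level differs from the seed's gray level by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the 4-connected neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Hypothetical toy slice: a bright 2x2 blob on a dark background, plus one
# bright pixel at (3, 4) that is similar in value but not connected.
toy = [
    [50, 52, 51, 49, 50],
    [51, 200, 205, 50, 52],
    [50, 198, 202, 51, 49],
    [49, 50, 51, 50, 210],
]
blob = region_grow(toy, seed=(1, 1), tolerance=10)
print(sorted(blob))
```

Because the method is contextual, the isolated bright pixel is excluded even though its gray level is within tolerance; a purely global threshold would have included it.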
Anisotropic diffusion filter
Diffusion filters are of two types: isotropic and anisotropic. Isotropic diffusion filters are linear, whereas anisotropic diffusion filters are nonlinear. Linear filters have a constant conductivity and are homogeneous in nature, so they smooth edges along with noise. To overcome this over-smoothing, the anisotropic filter, a nonlinear approach, has been suggested. Anisotropic diffusion is also known as Perona-Malik diffusion. It is a powerful image enhancer. The main aim of this filter is to reduce noise without removing significant parts of the image, such as sharp edges and significant lines. The MATLAB code for anisotropic diffusion takes several important parameters. In this project, the input is a computed tomography image of the liver, as shown in Fig. 4; as illustrated in Fig. 5, this image is preprocessed and filtered in preparation for further processing.

Gray scale
• Gray scale refers to a set of monochromatic shades that span from black to white. It contains no color, only shades of gray.
• Gray scale values are stored in binary form, as 0s and 1s.
• Gray scale images are composed of pixels represented by multiple bits of information, typically ranging from 2 to 8 bits or more.
• Gray scale defines each pixel (picture element) as a byte and quantifies the intensity of light reflected from an area (dot) of a flat surface.

Gray scale conversion
An RGB image is based on the RGB color model, which combines red, green, and blue light in a variety of ways to reproduce a wide range of colors. The model is named after these three additive primary colors. This color model's fundamental aim is to represent
Fig. 4 Input image
Fig. 5 Filtered image
and display images in electronic systems. It is a device-dependent color model: different devices detect or reproduce a given RGB value differently, because their color elements and their responses to the individual R, G, and B levels vary.

Morphological operations
A morphological operation is a nonlinear operation that deals with the shapes, or morphological properties, of an image. It depends on the relative ordering of pixel values rather than on their numerical values. A structuring element is applied to the input image to produce an output image of the same size. Dilation and erosion are the two basic morphological operations.
A. Dilation
Dilation adds pixels to the boundaries of the objects in the image. The number of pixels added depends on the size and shape of the structuring element. The value of each output pixel is the maximum value of the pixels in its neighborhood; in a binary image, a pixel is set to 1 if any neighboring pixel is 1. Dilation expands objects and fills gaps.

B. Erosion
Erosion removes pixels from the boundaries of the objects in the image. The number of pixels removed depends on the size and shape of the structuring element. The value of each output pixel is the minimum value of the pixels in its neighborhood; in a binary image, a pixel is set to 0 if any neighboring pixel is 0. Erosion shrinks objects and removes small objects from the image, as shown in Figs. 6 and 7.
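The maximum and minimum rules for dilation and erosion can be sketched in a few lines. This is an illustrative example rather than the paper's MATLAB code: it assumes a fixed 3x3 square structuring element and simply ignores neighbours that fall outside the image, and the names `dilate`, `erode`, and `block` are chosen for the example.

```python
def _apply(image, reducer):
    """Apply `reducer` (max or min) over each pixel's 3x3 neighbourhood.
    Neighbours that fall outside the image are ignored."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = reducer(window)
    return out

def dilate(image):
    # Output pixel = maximum of its neighbourhood (objects grow).
    return _apply(image, max)

def erode(image):
    # Output pixel = minimum of its neighbourhood (objects shrink).
    return _apply(image, min)

# A 3x3 block of ones centred in a 5x5 binary image.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in erode(block):
    print(row)
```

Eroding the block leaves only its centre pixel, while dilating it fills the whole 5x5 image; the same two functions work unchanged on gray scale values.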
Fig. 6 Input image
Fig. 7 Eroded image

Thresholding
Thresholding is an image processing and segmentation method that converts a gray scale image into a binary image by applying a threshold value. It is most commonly applied to gray scale images but can also be applied to color images. Pixels with values greater than the threshold are converted to white (binary value 1), while those with values less than the threshold are converted to black (binary value 0). The binary image should retain the necessary information, such as the position and shape of objects. The steps for thresholding are as follows. Set the initial threshold value, taking into account the 8-bit range of the original image. Separate the image into two sections: pixels below the threshold form the background, and pixels above the threshold form the foreground. Calculate the mean of each of the two sections. Take the average of the two mean values as the new threshold.

Convolutional neural networks algorithm
Convolutional neural networks are similar to traditional neural networks. A convolutional neural network (CNN) includes one or more convolutional, pooling, fully connected, and rectified linear unit (ReLU) layers. Generally, as the network becomes deeper and has more parameters, the accuracy of the results increases, but the network also becomes more computationally complex. Recently, CNN models have been used widely in image classification for different applications, or to extract features from the convolutional layers before or after the down-sampling layers. However, such architectures are not suitable for image segmentation or pixel-wise classification. The VGG-16 network architecture is a type of CNN model. The network includes 41 layers, of which 16 have learnable weights: 13 convolutional layers and three fully connected layers. Most pixel-wise classification networks have an encoder-decoder architecture in which the encoder is the VGG-16 model. The encoder gradually decreases the spatial dimension of the images with pooling layers, while the decoder recovers the object details and spatial dimensions for fast and precise segmentation. U-Net is a convolutional encoder-decoder network used widely for semantic image segmentation. It is interesting because it applies a fully convolutional network architecture to medical images. However, it is very time- and memory-consuming. Here, we have used a CNN
to compare the stored data in the database and the segmented image of the input image.
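The thresholding steps described earlier in this section (initial threshold, split into background and foreground, mean of each part, average of the two means) can be sketched as follows. This is a hedged illustration, not the authors' code: the starting value of 128, the repetition until the threshold changes by less than `eps`, and the names `iterative_threshold` and `binarize` are assumptions added for the example.

```python
def iterative_threshold(pixels, initial=128.0, eps=0.5):
    """Refine a global threshold for 8-bit gray levels.

    Assumption: the update is repeated until the threshold moves by less
    than `eps`; the source lists the steps but not a stopping rule.
    """
    t = initial
    while True:
        background = [p for p in pixels if p <= t]
        foreground = [p for p in pixels if p > t]
        # If one class is empty, no better threshold can be computed.
        if not background or not foreground:
            return t
        mean_bg = sum(background) / len(background)
        mean_fg = sum(foreground) / len(foreground)
        new_t = (mean_bg + mean_fg) / 2.0
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

def binarize(image, t):
    """Map pixels above the threshold to 1 (white), the rest to 0 (black)."""
    return [[1 if p > t else 0 for p in row] for row in image]

# Hypothetical toy image: a dark row and a bright row.
toy = [[10, 20, 30], [200, 210, 220]]
t = iterative_threshold([p for row in toy for p in row])
print(t, binarize(toy, t))
```

For this toy image the two class means are 20 and 210, so the threshold settles at their average.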
4 Results
Image processing is used to detect liver cancer and to locate the tumor region. Medical images are used for the analysis, and all observations are shown in the figures. The method was tested on a variety of samples to ensure a thorough examination. When an image containing a single mass of liver cancer is given as input, the tumor region is highlighted in red after processing. When an image of a healthy liver is given, no region is highlighted. Some deviation and inaccuracy occur when very small tumors are present around the main tumor mass, because the proposed algorithm identifies only the main mass of the tumor in the liver. Benign tumors, malignant tumors, and non-tumor cases are detected from computed tomography (CT) images. Anisotropic diffusion filters are used to enhance the image, which is then segmented and analyzed using morphological operations. Based on the input image, the proposed methodology classifies the result into three types:
1. Benign tumor
2. Malignant tumor
3. Non-tumor.
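The anisotropic diffusion enhancement mentioned above can be illustrated with a minimal Perona-Malik sketch. It is not the MATLAB code used in the paper: the exponential conductance function, the 4-neighbour scheme, and the parameter values (`iterations`, `kappa`, `step`) are assumptions chosen for the example.

```python
import math

def anisotropic_diffusion(image, iterations=10, kappa=30.0, step=0.2):
    """Perona-Malik diffusion with the exponential conductance function.

    `kappa` sets the gradient magnitude regarded as an edge; `step` should
    stay at or below 0.25 for stability with four neighbours.
    """
    rows, cols = len(image), len(image[0])
    current = [[float(v) for v in row] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                # Differences to the four nearest neighbours; the image
                # border is treated as insulated (no flow across it).
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols:
                        diff = current[nr][nc] - current[r][c]
                        # Conductance is near 1 for small differences
                        # (noise) and near 0 for large ones (edges).
                        flux += math.exp(-(diff / kappa) ** 2) * diff
                updated[r][c] = current[r][c] + step * flux
        current = updated
    return current

# Hypothetical noisy step edge: dark left half, bright right half.
noisy = [
    [10, 14, 10, 200, 204, 200],
    [12, 10, 13, 202, 200, 203],
    [10, 13, 11, 200, 203, 201],
    [14, 10, 12, 204, 200, 202],
]
smooth = anisotropic_diffusion(noisy)
left = [v for row in smooth for v in row[:3]]
right = [v for row in smooth for v in row[3:]]
print(round(max(left) - min(left), 3), round(min(right) - max(left), 3))
```

In this example the small fluctuations inside each half are smoothed, while the large jump between the halves receives almost no diffusion and is preserved, which is the behaviour that distinguishes the anisotropic filter from a linear blur.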
Benign tumor
Benign tumors remain in their original location and do not spread to local structures or to other parts of the body. They have distinct edges and grow slowly. Benign tumors are typically not a problem. They can, however, grow large and compress surrounding structures, producing pain or other medical complications. A large benign lung tumor, for example, might compress the trachea (windpipe) and make breathing difficult, which would warrant urgent surgical removal. Benign tumors are unlikely to recur once removed. Fibroids in the uterus and lipomas in the skin are two examples of benign tumors. Certain benign tumors can develop into malignant cancers; these are monitored regularly and may need to be surgically removed.

Figure 8 shows the filtered image of the benign tumor input, obtained with the anisotropic diffusion filter, which removes unwanted noise before further processing. Figure 9 shows the segmentation process, in which the image is segmented from the filtered image. Figure 10 shows the tumor level in the liver, indicated in the top right corner of the image; here the liver is only mildly affected, as the tumor is at the benign stage.
Fig. 8 Filtering process of benign tumor
Fig. 9 Segmentation process of benign tumor
Figure 11 shows the output after segmenting the input image, reporting a tumor diameter of 0.2. This helps doctors to identify the tumor stage quickly. Here, the tumor is at the first level, the benign stage, at which it can be treated early.
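The source does not state how the reported diameter is computed or in what units. One common choice, shown here purely as an assumption, is the equivalent diameter of the segmented region: the diameter of a circle with the same area as the binary tumor mask. The function name `equivalent_diameter` and the `pixel_spacing` parameter are hypothetical.

```python
import math

def equivalent_diameter(mask, pixel_spacing=1.0):
    """Diameter of the circle whose area equals the area of a binary mask.

    `pixel_spacing` is the side length of one pixel in the desired unit
    (an assumption; the source gives no units for its diameters).
    """
    area_pixels = sum(sum(row) for row in mask)
    area = area_pixels * pixel_spacing ** 2
    return math.sqrt(4.0 * area / math.pi)

# Hypothetical 2x2 tumor mask inside a 4x4 image.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(round(equivalent_diameter(mask), 4))
```

A measure of this kind depends only on the segmented area, so its accuracy is bounded by the quality of the segmentation that precedes it.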
Fig. 10 Liver tumor level for different condition in benign stage
Fig. 11 Output of benign tumor
Malignant tumor
Malignant tumors have uncontrolled cell growth that spreads locally and/or to distant locations. Malignant tumors are cancerous (i.e., they invade other sites). They spread to distant sites via the circulatory or lymphatic systems. Metastasis is the medical
term for this type of spread. Metastasis can occur anywhere in the body, but it is most common in the liver, lungs, brain, and bone. Malignant tumors can grow quickly, so treatment is needed to stop them from spreading further. If the cancer is discovered early, surgery, possibly with chemotherapy or radiotherapy, is the most likely treatment. If the cancer has spread, systemic treatment, such as chemotherapy or immunotherapy, is likely.

Figure 12 shows the filtered image of the malignant tumor input, obtained with the anisotropic diffusion filter, which removes unwanted noise before further processing. Figure 13 shows the segmentation process, in which the image is segmented from the filtered image. Figure 14 shows the tumor level in the liver, indicated in the top right corner of the image; here the affected region is indicated in gray. Figure 15 shows the output in the console window after segmenting the input image, reporting a tumor diameter of 0.34. This helps doctors to identify the tumor stage quickly. Here, the tumor is at the second level, a malignant tumor.

Non-tumor
Non-tumor denotes the normal structure of a liver that is not affected by a benign or malignant tumor. Figure 16 shows the filtered image of the non-tumor input, obtained with the anisotropic diffusion filter, which removes unwanted noise before further processing. Figure 17 shows the segmentation process, in which the image is segmented from the filtered image.

Fig. 12 Filtering process of malignant tumor
Fig. 13 Segmentation process of malignant tumor
Fig. 14 Liver tumor level for different condition in tumor affected stage
Figure 18 shows the tumor level in the liver, indicated in the top right corner of the image; here the liver is not affected by a tumor. Figure 19 shows the output after segmenting the input image, reporting a diameter of 0.11. This helps doctors to identify the tumor stage quickly. Here, the stage is level 0, meaning that the liver is not affected by a tumor.
Fig. 15 Output of malignant tumor
Fig. 16 Filtering process of non-tumor
5 Conclusion
This paper presents liver cancer detection for abdominal computed tomography scans using image processing with morphological operations such as dilation and erosion. The results obtained indicate that this liver cancer detection approach can be used effectively to help medical personnel in diagnosing hepatocellular carcinoma. It has been
Fig. 17 Segmentation process of non-tumor
Fig. 18 Liver tumor level for different condition in not affected case
shown that morphological operations require less computational power and fewer mathematical equations and calculations than other image segmentation algorithms. As further development, the segmentation method can be improved alongside the liver nodule extraction methods, wherever image processing is used, which would ultimately increase the accuracy of the results. This project should also be continued on liver nodule detection from blob area values, which needs to be incorporated with more testing. Finally, this project shows the tumor region along with its diameter and whether the image belongs to the non-tumor, benign tumor, or malignant tumor class. A CNN algorithm is used for fast and accurate output prediction. A limitation of this work is that it was designed only for a single tumor mass. Our research also shows that using an extra classifier for
Fig. 19 Output of non-tumor
tumor diagnosis and only performing lesion segmentation when abnormal tissue is discovered improves the outcomes significantly. Our findings can be used to determine whether the observed lesion is cancerous or benign. Other critical information regarding the tumor, such as its specific size, could also be collected. Finally, we believe that our proposed method might be used by radiologists as a preliminary analysis prior to writing the final report.
References
1. Gupta G, Chandel R (2013) Image filtering algorithms and techniques. Int J Adv Res Comput Sci Softw Eng 3(10)
2. Ali L, Hussain A, Li J, Howard N, Shah A, Sudhakar U, Shah M, Hussain Z, A novel fully automated liver and HCC tumor segmentation system using morphological operations. Comput Sci Math J Articles
3. Memon NA, Mizer AM, Gilani SAM, Segmentation of lungs from CT scan images for early diagnosis of lung cancer
4. Khan AA, Narejo GB (2019) Analysis of abdominal computed tomography images for automatic liver cancer diagnosis using image processing algorithm. Curr Med Imag Rev 15(10):972–982. https://doi.org/10.2174/1573405615666190716122040 PMID: 32008524
5. Zhou ZH, Jiang Y, Yang YB, Chen SF (2002) Lung cancer cell identification based on artificial neural network ensembles. Artif Intell Med 24(1):25–36. https://doi.org/10.1016/s0933-3657(01)00094-x PMID: 11779683
6. Srisha R, Khan A, Morphological operations for image processing: understanding and its applications. In: National conference on VLSI, signal processing & communication, Dec 2013
7. Lin R, Wong EK (1996) Morphological operations on images represented by quadtrees. In: 1996 IEEE international conference on acoustics, speech, and signal processing conference proceedings, vol 4, pp 2203–2206. https://doi.org/10.1109/ICASSP.1996.545858
8. Tang J, Sun Q, Liu J, Cao Y (2007) An adaptive anisotropic diffusion filter for noise reduction in MR images. https://ieeexplore.ieee.org/abstract/document/4303737
9. Gavgani AM, Dogrusoz YS (2012) Noise reduction using anisotropic diffusion filter in inverse electrocardiology. In: Annual international conference of the IEEE engineering in medicine and biology society, vol 2012, pp 5919–22. https://doi.org/10.1109/EMBC.2012.6347341
10. Vijayalakshmi S, Kavitha KR, Senthilvadivu M, Amutha M, Evangelin BC (2020) Back propagation neural network algorithm based enlistment of induction motor parameters monitoring using LabVIEW. J Adv Res Dyn Control Syst 12:439–452
11. Anand R, Shanthi T, Sabeenian RS, Veni S (2020) Real time noisy dataset implementation of optical character identification using CNN. Int J Intell Enterp 7(1–3):67–80
12. Christina G, Shanthini P (2021) Computation of constant gain and NF circles for 60 GHz ultra-low noise amplifiers. J Sustain Wirel Syst 03(03):146–156
13. Jeena IJ, Darney PE (2021) Artificial bee colony optimization algorithm for enhancing routing in wireless networks. J Artif Intell 3(01):62–71
14. Sharma R, Sungheetha A (2021) An efficient dimension reduction based fusion of CNN and SVM model for detection of abnormal incident in video surveillance. J Soft Comput Paradigm (JSCP) 3(02):55–69
15. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Process (JIIP) 3(02):100–117
16. Manoharan JS (2021) Study of variants of extreme learning machine (ELM) brands and its performance measure on classification algorithm. J Soft Comput Paradigm (JSCP) 3(02):83–95
17. Sikkandar MY, Jayasankar T, Kavitha KR, Prakash NB, Sudharsan NM, Hemalakshmi GR (2020) Three factor nonnegative matrix factorization based HE stain unmixing in histopathological images. J Ambient Intell Hum Comput
Vehicle Spotting in Nighttime Using Gamma Correction Shaik Hafeez Shaheed, Rajanala Sudheer, Kavuri Rohit, Deepika Tinnavalli, and Shahana Bano
Abstract Vehicle detection at night is an important and challenging part of a safe transportation system, as most accidents occur at night because of poor lighting conditions. Many algorithms detect vehicles at night from their headlights, but this does not work in the daytime or when the headlights are off. These algorithms also struggle when vehicles are stationary or parked at night. In this paper, two approaches, the image transformation (IMT) approach and the vehicle detection (VD) approach, are used to detect vehicles at night. The IMT approach is built on OpenCV and Gamma correction and changes the illumination of images that are very dark or not clearly visible; Gamma correction increases the brightness of an image. The VD approach then uses the Haar cascade classifier, which identifies a vehicle/object from the patterns it encodes. By increasing the brightness of the images, our approach identifies vehicles at night that are parked or have their headlights off from the same patterns used in the daytime, which avoids the confusion caused by headlights. Keywords Image transformation · Vehicle detection · OpenCV · Gamma correction · Gamma · Classifiers · Illumination · Brightness
S. H. Shaheed (B) · R. Sudheer · K. Rohit · S. Bano Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India e-mail: [email protected] S. Bano e-mail: [email protected] D. Tinnavalli Department of Computer Science, George Mason University, Fairfax, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_27
1 Introduction Vehicle detection is the main problem in traffic and driving assistance systems. Manual work has always proven slower and less efficient because of human error and the many other factors that affect living beings, so intelligent traffic control systems using various techniques have been developed [1]. Many techniques and algorithms exist to detect vehicles in the daytime, which helps driving assistance systems avoid accidents, but only a few detect vehicles at night, and less research has been done on nighttime than on daytime detection [2]. Many vehicle detection algorithms are designed for daytime light conditions and take different approaches from those used at night. However, more accidents occur at night than in the daytime [3], so finding vehicles at night is crucial to avoiding accidents. There is a 55% chance of road accidents at night because drivers cannot see an oncoming or parked vehicle, owing to the high intensity of headlights or the low intensity of the background [2]. The main cue for the presence of a vehicle is its headlights: a driver can identify vehicles from them, but a high beam can cause confusion that leads to accidents. To avoid this confusion, we introduce an algorithm that brightens the background, that is, illuminates the night image, so that both the vehicle and the background are visible to the driver. Generally, all the algorithms work on headlights to identify vehicles. But it is also important to handle the background and the wide variety of vehicles and to decrease the headlight intensity. We therefore need a technique to change the illumination of the image, and we use the Gamma correction method for this. This approach can increase or decrease pixel values without changing the middle range of the grayscale [4]. Once the darker image is converted into a brighter image, a pre-defined Haar cascade classifier is used to detect the vehicles. The Haar cascade classifier is accurate on the processed images because of the illumination step.
For images with a normal background, the Haar cascade classifier gives high performance [5]. The speed of object detection has improved with the use of Haar features and with the object detection framework of OpenCV [6]. OpenCV is free and open source, and it has many pre-defined functions [7]. We not only classify the image but also detect and estimate the locations of the vehicles present in each image. However, because of the large variations in lighting conditions and viewpoint distances, it is difficult to detect the vehicles perfectly. This paper presents the implementation and evaluation of nighttime vehicle detection.
2 Literature Survey To arrive at our approach, we reviewed past research models and their applications. One of the main and unique implementations is CycleGAN [8], which can convert a night image to a day image and gives efficient results, but it takes a large amount of time to produce the output, and sometimes the required output is not produced. This is a weakness of converting night images to day images in this way.
Generally, many vehicle detection algorithms are based on headlights, but those algorithms are useful only for detecting moving vehicles. When a vehicle is parked, its headlights are switched off, so the headlight concept cannot detect vehicles that are not running. This is a drawback for detecting vehicles at night. Considering the drawbacks mentioned above, we use Gamma correction [9] to convert the night image to a day-like image and the Haar cascade classifier [5, 8] to detect vehicles. In Gamma correction [9], the Gamma value is used to increase the brightness level of the night image, so the output image is somewhat similar to a day image, and the conversion takes less time. The classifiers used for vehicle detection work on the brighter images, so vehicles that are not running can also be detected. This implementation is discussed in the procedure.
3 Procedure 3.1 Image Transformation (IMT) Illumination of the image plays a crucial role in this project: if we cannot change the illumination, we cannot detect the vehicle. There are many algorithms to change the illumination of an image, but we use the power law transform (Gamma correction) to change the intensity, that is, to brighten the image. Gamma correction is the core of the image transformation (IMT) approach. It can increase or decrease pixel values without changing the middle range of the grayscale [4]. First, the pixel intensities of the image are scaled from the range [0, 255] to [0, 1.0]. Then the formula $O = I^{1/G}$ is applied, where I is the scaled input image, G is the Gamma value, and O is the output image, which is scaled back to [0, 255]. If G < 1, the image becomes substantially darker than the input image, and the information in the original image can no longer be observed. If G > 1, the image appears lighter; as the Gamma value increases, the image gradually brightens and the objects in it become clearly visible. If G = 1, there is no effect on the input image. In this way, we convert the dark night image into a brighter night image in which we can observe useful information that is not visible under the original illumination conditions [4].
Fig. 1 a Dark night image, b Illuminated image
Figure 1a is a normal night image in which the vehicle is barely visible and the headlights are off. After applying Gamma correction to the image, the vehicle is clearly visible in Fig. 1b.
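The power law transform described in this section can be sketched in a few lines. The following is a minimal illustration in plain Python, not the authors' code: it assumes an 8-bit grayscale image stored as rows of integers, and the names `build_gamma_table` and `adjust_gamma` are hypothetical. It precomputes the mapping for all 256 intensities once, which is also how the transform is commonly applied with a lookup table in OpenCV (`cv2.LUT`).

```python
def build_gamma_table(gamma):
    """Return a 256-entry table for O = 255 * (I / 255) ** (1 / gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv = 1.0 / gamma
    # Scale each intensity to [0, 1], apply the power law, scale back to [0, 255].
    return [int(round(255.0 * (i / 255.0) ** inv)) for i in range(256)]


def adjust_gamma(image, gamma):
    """Apply gamma correction to a grayscale image given as rows of 0..255 ints."""
    table = build_gamma_table(gamma)
    return [[table[p] for p in row] for row in image]


dark = [[0, 16, 64], [128, 200, 255]]
brighter = adjust_gamma(dark, 2.0)   # G > 1 lifts dark and mid tones
unchanged = adjust_gamma(dark, 1.0)  # G = 1 leaves the image as it is
darker = adjust_gamma(dark, 0.5)     # G < 1 pushes values toward black
print(brighter)
print(darker)
```

Pure black (0) and pure white (255) map to themselves for every Gamma value, so only the tones in between are redistributed.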
3.2 Vehicle Detection (VD) Recognizing the vehicles or objects in the image is crucial. Many approaches, such as cascade classifiers and decision trees, are trained on the distinct features of images [7]. After increasing the brightness, our next goal is to recognize the vehicle in the image, which requires a classifier. There are many classifiers for object detection, but we use a Haar cascade classifier stored as an XML file. This file holds pre-defined patterns, which are searched for in the image to recognize vehicles or objects [6]. For faster processing, the image can be converted to grayscale. With the help of OpenCV cascade classifiers, we can track the vehicles. With OpenCV, an alteration can also be applied to the output image obtained from the input training sample [7]. To identify the position of the vehicle in the illuminated image, we apply the trackers and then draw a rectangle around the detected vehicle at the identified location. To detect the vehicles, we use different types of patterns or features.
Pattern types: edge patterns, line patterns, and center-surround patterns.
We apply these classifiers to the image using OpenCV to detect the vehicle, as shown in Fig. 2a, b. Figure 2a is the dark night image. After applying the IMT approach, there is a drastic change in the background of the image, and with the help of the VD (vehicle detection) approach, the vehicle is detected in Fig. 2b.
Fig. 2 a Dark image, b Vehicle detected image
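To make the idea of an edge pattern concrete, the sketch below evaluates a single two-rectangle edge feature using an integral image, the standard device that makes Haar-like features cheap to compute. It is an illustration only, under the assumption of a grayscale image stored as rows of integers; the function names are hypothetical, and a trained cascade (for example one loaded with `cv2.CascadeClassifier` and run with `detectMultiScale`) combines many such features with learned thresholds rather than a single one.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle edge pattern: left half minus right half of the window."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A bright region next to a dark region gives a strong response.
img = [[200, 200, 10, 10],
       [200, 200, 10, 10]]
ii = integral_image(img)
print(edge_feature(ii, 0, 0, 4, 2))
```

A uniform window gives a response of zero, which is why very dark night images, where everything is close to black, carry little usable pattern information before the brightness is increased.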
3.3 Pseudo Code
Step 1: Start.
Step 2: Provide the input image.
Step 3: Apply Gamma correction on the input image.
Step 4: Increase the brightness level (Gamma value) of the image for an efficient output.
Step 5: Display the illuminated image (image with more brightness).
Step 6: For the vehicle detection, apply Haar cascade classifier on the new image (illuminated image).
Step 7: Find the exact position of the vehicle from the illuminated image.
Step 8: If the position is not found, then repeat the abovementioned steps (4, 5, 6, and 7).
Step 9: Display the detected vehicle output if the position is found.
Step 10: Stop.
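The steps above can be sketched as a loop that raises the Gamma value until the detector returns a location. This is a hedged illustration rather than the authors' implementation: `toy_detector` is a hypothetical stand-in for the Haar cascade (it simply reports a box once the image is bright enough), and the starting value, step size, and `max_gamma` cap are assumptions added so that the loop in Step 8 always terminates.

```python
def adjust_gamma(image, gamma):
    """Apply O = 255 * (I / 255) ** (1 / gamma) to rows of 0..255 ints."""
    inv = 1.0 / gamma
    table = [int(round(255.0 * (i / 255.0) ** inv)) for i in range(256)]
    return [[table[p] for p in row] for row in image]


def toy_detector(image, min_mean=100):
    """Hypothetical stand-in for a Haar cascade classifier.

    Returns one (x, y, width, height) box when the mean brightness is high
    enough, otherwise an empty list.
    """
    pixels = [p for row in image for p in row]
    if sum(pixels) / len(pixels) >= min_mean:
        return [(0, 0, len(image[0]), len(image))]
    return []


def detect_with_gamma_search(image, detect, start_gamma=1.5, step=0.5, max_gamma=5.0):
    """Brighten, detect, and raise gamma until a location is found."""
    gamma = start_gamma
    while gamma <= max_gamma:          # cap is an assumption, not in the pseudo code
        bright = adjust_gamma(image, gamma)   # Steps 3 to 5
        boxes = detect(bright)                # Steps 6 and 7
        if boxes:                             # Step 9: position found
            return gamma, boxes
        gamma += step                         # Step 8: increase brightness, retry
    return None, []


night = [[10, 20], [30, 40]]
gamma_used, boxes = detect_with_gamma_search(night, toy_detector)
print(gamma_used, boxes)
```

Passing the detector in as an argument keeps the search loop independent of the classifier, so a real cascade could be substituted without changing the control flow.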
4 Flowchart To visualize our approach, we provide a flowchart (Fig. 3). Figure 4a shows our approach: after applying Gamma correction and the Haar cascade classifier, we check whether the location of the vehicle has been detected. Here the location is an array returned by our approach, and its presence indicates that a vehicle is in the image. If no location is found, we go back to the IMT approach, increase the Gamma value to increase the brightness, and follow the steps again. Based on the location, we draw a rectangular box around the vehicle, as shown in Fig. 4b.
5 Results Taking a night image as input, we change the illumination and output the image with the detected vehicle. From Fig. 5, we conclude that vehicles are detected in the dark image by increasing the illumination and applying pattern-based Haar cascade classifiers, without relying on the headlights.
Fig. 3 Overview of the process
Fig. 4 a Classifiers applied on image, b Vehicle detected
Fig. 5 Image showing the results from an input image by proposed approach (panels: Input, Output)
6 Conclusion This paper presents a nighttime vehicle detection approach that combines the image transformation (IMT) approach and the vehicle detection (VD) approach. The main goal is to detect vehicles at night in darker or brighter lighting environments without finding the headlights. The proposed IMT module uses the power law transform to convert nighttime images into daytime-like images by increasing the brightness, and the VD approach detects the vehicle in the image. Gamma correction is faster than CycleGAN at increasing the illumination of an image. However, the detection rate still needs to improve, as some vehicles are not detected because they are far away and remain part of the background of the image. Future work will study how to improve both the detection rate and the time taken to recognize vehicles at night.
References
1. Addala S (2020) Vehicle detection and recognition. Lovely Professional University, Published: May 2020, https://www.researchgate.net/deref/mailto%3Aaddala.11712155%40lpu.in
2. Fleych H, Mohammed IA (2021) Night time vehicle detection. Dalarna University, June 2021
3. Cai Y, Sun X, Chen L, Jiang H (2016) Night-time vehicle detection algorithm based on visual saliency and deep learning. Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China, Published: 20 Nov 2016, view at: https://doi.org/10.1155/2016/8046529
4. Xu G, Su J, Pan H, Zhang Z (2009) An image enhancement method based on gamma correction. In: Conference: 2009 Second international symposium on computational intelligence and design, ISCID 2009, Changsha, Hunan, China, 12–14 Dec 2009
5. Nikhitha P, Sarvani Pm, Gayathri KL, Parasa D, Bano S, Yedukondalu G (2020) Detection of tomatoes using artificial intelligence implementing haar cascade technique. Published: 05 Mar 2020. https://doi.org/10.1007/978-981-15-2612-1_15
6. Padilla R, Costa M, Filho CFFC (2012) Evaluation of haar cascade classifiers for face detection. Federal University of Amazonas, Published: IEEE Apr 2012
7. Reinius S (2013) Object recognition using the OpenCV Haar cascade-classifier on the iOS platform. Uppsala Universitet, Jan 2013
8. Shao X, Wai C, Shen Y, Wang Z (2020) Feature enhancement based on cycle gan for night time vehicle detection. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China, Publication: IEEE Dec 22, 2020
9. Rahman S, Rahman MM, Shoyaib M (2016) An adaptive gamma correction for image enhancement. Department of Software Engineering King Saud University, Riyadh, Saudi Arabia. Published: 18 Oct 2016
10. Kubinger W, Vincze M, Ayromlou M (2015) The Role of gamma correction in colour image processing. Technische Universitat Wien, Wien, Wien, AT. Published in: 9th European signal processing conference (EUSIPCO 1998), 23 Apr 2015
11. Hassanpour H, Asadi Amiri S (2011) Image quality enhancement using pixel-wise gamma correction via SVM classifier. Department of Computer Engineering, University of Shahrood Technology, Shahrood, Iran [email protected] , [email protected], October 20, 2011
12. Alcantarilla PF, Bergasa LM, Jimenez P, Sotelo MA, Parra I, Fernandez D, Mayoral SS (2008) Night time vehicle detection for driving assistance light beam controller. Department of Electronics, University of Alcala, Madrid, Spain, Published in: 2008 IEEE intelligent vehicles symposium, 5 Sept 2008
Application of UPFC for Enhancement of Voltage Stability of Wind-Based DG System Namra Joshi and Pragya Nema
Abstract In the current scenario, the number of wind power projects is increasing, and with it the amount of wind power integrated into the grid. This paper highlights the application of flexible alternating current transmission system (FACTS) devices for voltage stability enhancement of wind-based DG systems in India. A unified power flow controller (UPFC) is used as the FACTS controller for the improvement of the considered wind power plant. The simulation model of a wind-based distributed generation system is prepared in MATLAB Simulink. The simulation results illustrate that the UPFC stabilizes the system voltage after the system is subjected to a severe disturbance. Keywords Wind Energy · Voltage Stability · FACTS
N. Joshi (B) SVKM’s Institute of Technology, Dhule, Maharashtra, India e-mail: [email protected] P. Nema Oriental University, Indore, Madhya Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_28
1 Introduction In this era of modern urban development, energy demand has increased enormously, and it has become essential to look toward sources of energy that are clean and green. Renewable energy sources such as geothermal, biomass, wind, solar, and tidal energy have good potential to meet the surplus power demand. In India, the Ministry of New and Renewable Energy (MNRE) looks after the development of renewable energy sources. India is the fourth largest wind power producing country in the world [1], with good potential for wind energy and a capacity of more than 35,129 MW. The benefits of a wind power plant are as follows:
• The emissions of harmful gases like CO2 are very low.
• Prominent economically viable resource potential.
• No impact on the cost of generation due to fluctuations in fuel supply price.
• Security of supply is increased.
• Installation and commissioning time is comparatively less.
• Cost-effective energy production.
• It can be used as a distributed generation source.
• Improves sustainability.
Fig. 1 Wind turbine system connected to grid
As illustrated in Fig. 1, the wind turbine generator is connected to the grid through the transformer and busbars 1 and 2, and a capacitor C is connected to provide compensation. Voltage stability [2] is the ability of a power system to maintain steady, acceptable voltages at all buses in the system under normal operating conditions and after being subjected to a disturbance. A power system is said to be voltage stable if the voltages after a disturbance are close to the normal operating voltage [3]. Voltage stability is also termed load stability [4]. The major factors governing voltage stability are the reactive power limits of the generators, the characteristics of the load, the behavior of compensation devices, and the operation of voltage control devices. Voltage stability is one of the main issues associated with wind power grid integration [5]. Voltage instability generally occurs because the wind turbines (WTs) absorb reactive power during the normal or abnormal state [6]. During an abnormal state in the grid, the various WTs [7] in use today do not all behave the same. During system contingencies, the behavior of the generators may change because of their reactive power consumption [8]; they may behave as motors, which changes the voltage stability limit [9].
Fig. 2 Distributed generation system
2 Wind-Based DG System and FACTS Distributed generation [10] is defined as the installation and operation of small power generating technologies that can be combined with energy management and storage systems. These systems operate in either islanding mode or grid-connected mode. A DG system in which wind energy is the source of power is called a wind-based DG system. Voltage stability is a crucial issue in a wind-based DG system, and FACTS devices can improve it [11]. The block diagram is illustrated in Fig. 2. FACTS (flexible alternating current transmission system) [12] devices are power electronics-based converters used to enhance the power transfer capability and increase the controllability of a transmission line [5].
3 Unified Power Flow Controller A unified power flow controller comprises two major controllers, a STATCOM and an SSSC [13], coupled through a DC link as shown in Fig. 3. This link permits power to flow in both directions. The controllers are operated to provide real-time control of both active and reactive power compensation of the line without an external energy source. The UPFC is a FACTS device that can work as phase-shifting equipment as well as a shunt-compensating device. It comprises two transformers, one series and one shunt, coupled through two VSCs with a common capacitor. The DC capacitor helps control the phase shift of the series voltage and permits the exchange of active power P between the connected transformers. These characteristics make the UPFC suitable for controlling both active and reactive power [14].
Fig. 3 UPFC
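As background for why a device that adjusts voltage magnitude and phase angle can steer both active and reactive power, the textbook relations for a lossless line are given below. This is a simplified two-bus approximation stated as an assumption for illustration; it is not the model simulated in this paper. The symbols $V_s$, $V_r$, $X$, and $\delta$ (sending-end voltage, receiving-end voltage, line reactance, and angle between the two voltages) are introduced here and do not appear in the original text.

```latex
\[
P = \frac{V_s V_r}{X}\,\sin\delta ,
\qquad
Q_s = \frac{V_s^{2} - V_s V_r \cos\delta}{X}
\]
```

In this picture, the series converter injects a voltage that changes the effective magnitude and angle seen across the line, while the shunt converter supports the bus voltage, so the quantities on the right-hand side become controllable.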
4 Simulation and Results In this work, a subsection of the wind power plant located in Dewas District near Indore, Madhya Pradesh, India [15], is simulated [9, 16]. The capacity of this considered wind power plant is 9 MW, the output of this generator is given to a step-up transformer (33/132 kV) through a transmission line of 25 km, and one load is connected [17]. This model is simulated in MATLAB R2018b [18]. Figure 4 shows the simulated model of the 9 MW wind power plant. Figures 5 and 6 indicate the active and reactive power output under normal operating conditions. Figure 7 indicates the voltage at bus 33 under normal operating conditions [19].
Fig. 4 Simulation model of section of considered site
Fig. 5 Active power at bus 33: normal condition
Fig. 6 Reactive power at bus 33: normal condition
4.1 System Under Fault Condition The designed system is now subjected to a fault at t = 15 s. Because of the fault, the bus voltage drops, as indicated in Figs. 8, 9, 10, and 11.
Fig. 7 Voltage at bus 33: normal condition
Fig. 8 Simulation model of section of considered site: fault condition [14]
Fig. 9 Voltage at bus 33: fault condition
Fig. 10 Active power at bus 33: fault condition
Fig. 11 Reactive power at bus 33: fault condition
4.2 System with UPFC The voltage of bus 33 improves after applying the UPFC, as illustrated in Fig. 13. The active and reactive power of the wind turbine with the UPFC are indicated in Figs. 12, 14 and 15 (Table 1).
Fig. 12 Simulation model of section of considered site: with UPFC
Fig. 13 Voltage at bus 33: with UPFC
5 Comparative Analysis The generation of electrical power from renewable energy sources in India has increased tremendously. This gives rise to voltage instability problems when such generating stations are operated in conjunction with the grid, particularly when the system is connected to a weak grid. For the simulation, the following circumstances are taken into consideration.
Fig. 14 Active power at bus 33: with UPFC
Fig. 15 Reactive power at bus 33: with UPFC
Table 1 Magnitude of bus voltage in various operating modes
S. No | State of subsection of considered site | Value of bus voltage (in p.u.)
1 | Normal condition | 1.995
2 | Under fault condition | 1.0011
3 | Considered system with UPFC | 1.501
6 Conclusion Wind power is a clean source of energy, and voltage stability is an important concern in a grid-integrated system. In this paper, the voltage stability of the wind-based DG system is investigated with and without UPFC. We conclude that a FACTS device such as UPFC is very effective for improving the voltage stability of the wind-based DG system. As wind power generation is increasing in India, UPFC is becoming a necessary part of a grid-connected wind power plant, as it provides reactive power compensation and maintains the voltage within the grid code limits.
References
1. Ackerman T (2012) Wind power in power systems, 2nd edn
2. Kundur P (1994) Power system stability and control. McGraw-Hill, New York
3. Khandelwal A, Nema P (2021) Application of PI controller based active filter for harmonic mitigation of grid-connected PV-system. Bull Electr Eng Inf 10(5):2377–2382
4. Joshi N, Nema P, Application of statcom in voltage stability analysis of wind based distributed generation system. Int J Sci Technol Res 8(11):2126–2129
5. Praveen Kumar R, Venkatesh Kumar C (2014) Wind turbine voltage stability using FACTS device. Int J Emerg Technol Comput Sci Electron 8(1):104–109
6. Panda R, Sahoo PK, Satpathy PK, Paul S (2014) Impact analysis of wind power integration in existing power systems for study of voltage stability conditions. In: 2014 eighteenth national power systems conference (NPSC), Guwahati, pp 1–6
7. Joshi N, Nema P (2018) Analysis of stability of wind based distributed energy generation system. Int J Energy Smart Grid 3(1):1–16
8. Joshi N, Nema P (2019) Voltage stability enhancement of wind based distributed generation system By SVC. In: 2019 international conference on smart systems and inventive technology (ICSSIT), Tirunelveli, India, 2019, pp 1234–1236
9. Patil SD (2012) Improvement of power quality considering voltage stability in grid-connected system by FACTS devices. Int J Electr Electron Eng 1(3):41–19
10. Ackerman T, Anderson G, Sodder L (2001) Distributed generation: a definition. Electric Power Syst Res 57:195–204
11. Joshi N, Kotwani S (2021) Role of renewable energy development in economic growth: Indian perspective. Turkish J Online Qual Inq 12(6):7651–7656
12. Hingorani NG, Gyugyi L (2000) Understanding FACTS: concepts and technology of flexible AC transmission systems. IEEE, New York, 2000, ISBN 0-7803, pp 3455-3458
13. Joshi N, Nema P (2020) Comparative study on fact devices to enhance the performance of distributed energy system based on wind. In: 2020 international conference on inventive computation technologies (ICICT), Coimbatore, India, 2020, pp 1065–1068
14. SLK, VKM (2018) Dynamic voltage stability enhancement of a wind farm connected to grid using facts—a comparison. In: 2018 international conference on circuits and systems in digital enterprise technology (ICCSDET), Kottayam, India, 2018, pp 1–4
15. Sreedevi J, Meera KS, Noor Cheshma P, Ravichandran S, Santhanakumar R, Sumathi T (2016) Grid stability with large wind power integration—a case study. In: 2016 IEEE region 10 conference (TENCON), Singapore, 2016, pp 571–575
16. Singh J, Chand N (2015) Voltage stability of wind power by using the facts devices. Chandigarh Univ J Undergraduate Res Innov 1(1):07–10
17. Rawat MS, Vadhera S (2016) Analysis of wind power penetration on power system voltage stability. In: 2016 IEEE 6th international conference on power systems (ICPS), New Delhi, 2016, pp 1–6
18. Toma R, Gavrilas M (2019) Wind farm optimal grid integration based on voltage stability assessment. In: 2019 11th international symposium on advanced topics in electrical engineering (ATEE), Bucharest, Romania, 2019, pp 1–6
19. Khandelwal A, Neema P (2019) State of art for power quality issues in PV grid connected system. In: 2019 international conference on nascent technologies in engineering (ICNTE), Navi Mumbai, India, 2019, pp 1–4
Video Keyframe Extraction Based on Human Motion Detection C. Victoria Priscilla and D. Rajeshwari
Abstract Due to the substantial growth of CCTV surveillance data, it is very hard to gather crime scene information as frames from a long video collection. Keyframe extraction is used to eliminate non-essential frames in order to reduce the processing time of an entire video. Still, keyframe extraction falls short in accuracy when determining the crime scene with human detection; thus, spatiotemporal feature extraction approaches for the human motion detection phase, using the HOG descriptor along with the SVM classifier, were reviewed from the existing methods. In this study, two methods are implemented that combine the frame difference method with HOG and SVM on various edge detection methods to optimize the prediction of human motion detected keyframes. The extracted human detected keyframes retain the local features needed to depict the crime scene as a clear summarized report. Finally, the experimental results show that spatiotemporal feature extracted keyframes through Canny edge detection achieve a recognition accuracy of 98.73%. Keywords CCTV surveillance · Keyframe extraction · Human motion detection · Spatiotemporal feature extraction · Edge detection · Histogram of oriented gradients (HOG) descriptor · Support vector machine (SVM)
C. Victoria Priscilla · D. Rajeshwari (B) Department of Computer Science, Shrimathi Devkunvar Nanalal Bhatt Vaishnav College for Women, University of Madras, Chennai, TamilNadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_29
1 Introduction Today, video surveillance is widely used in all areas of our lives, and a large amount of video data is archived every day. CCTV surveillance has two main problems: it records frames that contain no activity (such as human-less, empty-space, or night-time recordings), which take up unnecessary storage space, and it takes a long time to investigate a particular crime scene. Crime scene investigations require only the motion detected frames rather than the human-less frames. Surveillance footage that captures the human motion and objects at a crime scene is processed for investigation purposes to extract the evidence after the event has transpired [1]. These stored video data are far more complex than text data, and the content is especially rich and changeable, which brings great difficulties to subsequent processing [2]. The variability of video content and formats in the footage makes it an urgent problem to find the content of interest quickly and effectively in a large amount of video data so that the investigation can get the evidence [3]. CCTV surveillance footage differs from traditional video: because some CCTV cameras are fixed in one place and capture motion continuously, surveillance videos are decomposed into scenes and not into "shots" [4]. To address these problems, keyframe extraction is combined with spatiotemporal features to sustain the best video retrieval [5]. In a spatiotemporal video sequence from a stationary or smoothly moving CCTV camera, the vital keyframes are obtained from the dissimilarity between the current frame and the source frame [6]. The accumulated frames are converted into images, which are enhanced by various edge detection algorithms to yield effective keyframes from the video surveillance without any content loss [7]. This paper is organized as follows: Sect. 2 reviews the literature on existing keyframe extraction methods and spatiotemporal feature extraction methods; a comparative study of these two methods is discussed in Sect. 3; the performance evaluation and the comparison of both techniques are reported in Sect. 4; the results are analyzed in Sect. 5; and Sect. 6 concludes the paper.
2 Literature Review CCTV video footage is very useful in criminal investigations because it records the crime scene and the humans involved. Such video content holds rich data that must be processed at both low and high levels [8]. Content-based video retrieval (CBVR) is the leading method for extracting the content concealed in these videos through keyframe and shot boundary detection [9]. The first step of a CBVR system is to split the video scenes into successive frames, which are preprocessed using color, motion, and edge detection methods [10] to extract the keyframes. Depending on the subject matter, these keyframes have been reviewed, analyzed, and identified [11] by different authors to depict the output. The resulting keyframes serve content retrieval and indexing and form a video summarization. To enhance video indexing, features are determined by face detection, such as cascading multiple features [12]; by motion detection, such as temporal filtering [13]; and by object detection, such as a spatio dynamic threshold value [14]. The final keyframes are extracted by discarding redundant and very similar frames while avoiding the loss of detailed visual content [15], in association with spatiotemporal features, to reach the greatest accuracy. This section reviews in detail the extraction of human motion keyframes by various methods of keyframe detection and of spatiotemporal video feature extracted keyframe detection.
2.1 Keyframe Extraction Methods
The frame difference method identifies keyframes from the differences between sequential frames. Keyframes are also identified by reducing repeated frames through redundancy removal, either by applying PCA and HCA [16] or by local image descriptors known as interest points [17]; these predict the exact keyframe but are restricted to certain image frames. Keyframes have also been obtained through clustering [18] and through a SURF detector that classifies points of interest [19], at the cost of some loss of evidence. Further work extracts multiple keyframes in an attempt to adapt to the dynamic semantic content of video [20]: researchers select multiple keyframes that reflect the many changes between frames. They also calculate the energy and variance characteristics of each sub-band after the contourlet transform to form a feature vector that represents the video frame, thereby extracting discriminative features [21]. Another approach extracts keyframes through a multi-scale Gaussian pyramid whose features are merged into static saliency, while the statistical characteristics of the motion vector field are used to calculate motion saliency. Finally, the static saliency and the motion saliency are merged through a center-surround mechanism defined by an approximate Gaussian function, which results in lower accuracy [22]. Most bottom-up saliency detection algorithms cannot obtain results close to the true value on images with complex content and texture [23]. Building on the idea of richer feature extraction and sample learning, Table 1 summarizes the merits and demerits that various authors faced while predicting the required keyframe.
The study explains keyframe extraction with duplicate removal from frames, at both high and low complexity; during frame extraction, however, the keyframe has to carry some meaning. For crime investigation, keyframes are most useful when they identify the suspect involved in the crime, in the form of a summarized report. The following section therefore explains spatiotemporal feature extraction in detail for this purpose.
Table 1 Keyframe extraction merits and demerits
Method name | Author | Year | Advantage | Disadvantage
Clustering method | Chaohui et al. [18] | 2018 | Keyframes are very effective and efficient | Real-time video application has not been enhanced
Robust-principle component method | Kale et al. [24] | 2014 | Expeditious content from the consumer videos | Only a limited content have resulted
3D Augmentation method | Chap et al. [25] | 2010 | Conversion of multidimensional record | High time complex
Motion-based | Luo et al. [26] | 2009 | Required frames are obtained but reduce the spatiotemporal | Video expected is very poor
Optimal keyframe selection | Sze et al. [27] | 2005 | Probability analyses with accurate values are very fast | Increased time complexity
Context-based | Chang et al. [28] | 1999 | Multilevel keyframe extraction | Loss of information as by less keyframe obtained
2.2 Human Motion Spatiotemporal Feature Extraction Method
Object detection methods are classified into background subtraction, optical flow, and spatiotemporal features, and researchers apply these methods to various categories of object detection. Background subtraction detects the required object with the background removed, which does not support crime investigation. Optical flow captures the movement of an object or human over a certain distance, but at high computational cost. The configuration best suited to crime investigation is the spatiotemporal feature, which detects the human or object while retaining the background. A spatiotemporal feature locates a human in a cluttered sequence through a combination of shape and motion. The proposed work therefore supports identifying crime suspects from CCTV video sequences. Human motions and actions in CCTV surveillance are recognized through spatiotemporal analysis, described by the complete 3D spatiotemporal data volume traversed by the moving human in the video sequence. These methods capture the entire motion characteristic through spatiotemporal features [29], with high recognition in the regions of motion. With the study of convolutional neural networks (CNNs), the identification of humans and objects in images has grown rapidly. Although CNN-based action detection yields substantial action frames, it is expensive, and its detection performance remains far [30] from that of humans, which limits real-world deployment. 3D CNNs have shown promising results for action classification through spatiotemporal feature extraction [31]: 3D kernels allow these CNNs to learn static and motion information directly from the video frames. None of these approaches accounts for random background movements, such as those caused by atmospheric turbulence, which disrupt the detection process. One method addresses action detection in long-distance videos that are significantly degraded by the atmospheric path [32]. The moving objects in such videos are quite small, have low signal-to-noise ratios (SNRs), and resemble the turbulence-induced motion of static image content. A shot is a collection of image frames that are continuous in time and highly correlated in content; the smallest set of relevant image frames in this collection is selected as the keyframes of the shot so as to maximize the information conveyed [33]. Several spatiotemporal studies on human action recognition, with their advantages and complications, are summarized in Table 2. The table reviews various spatiotemporal feature extraction methods and the accuracy each reports in predicting the appropriate keyframe. According to this review, spatiotemporal features are well suited to capturing human motion and action without affecting the background. This property greatly supports crime investigation, since the suspect can be identified without loss of evidence.
3 Keyframe Extraction with Human Motion Detection
Video surveillance is the technology of electronically capturing, recording, and storing video sequences that represent scenes in motion. Keyframe extraction is an essential component of scene recognition and large-scale mapping. The review shows that local motion is not accurately represented in the extracted frames. To meet this need, a spatiotemporal feature extraction method is applied to obtain a representation of local motion features. This section therefore compares, step by step, two ways of obtaining keyframes: the proposed frame difference keyframe extraction applied after spatiotemporal feature extraction (with human detection), and frame difference keyframe extraction alone (without human detection).
3.1 Keyframe Extraction by Frame Difference Method
A surveillance video sequence containing human motion is used to capture the relevant frames. Humans are traced in the obtained scenes by the frame differencing method. The humans and their motion must be represented efficiently in the frames [41] so that the evidence can be seen clearly. In this section, keyframe extraction starts by collecting CCTV surveillance footage as the dataset. The footage is converted to frames; the frame difference method is then combined with edge detection preprocessing, the absolute difference of the pixel values of each frame is compared against a threshold, and the required keyframes are extracted, as depicted in Fig. 1.
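The thresholded frame differencing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes grayscale frames stored as nested lists of pixel intensities, and the function names `mean_abs_diff` and `select_keyframes`, the mean-absolute-difference score, the choice of the last keyframe as reference, and the threshold value are all assumptions made for the example.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows of pixel intensities (0-255)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def select_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ enough from the last keyframe.

    Assumption: the first frame is always kept as the initial reference.
    """
    if not frames:
        return []
    keyframes = [0]
    reference = frames[0]
    for index in range(1, len(frames)):
        if mean_abs_diff(frames[index], reference) > threshold:
            keyframes.append(index)
            reference = frames[index]  # the new keyframe becomes the reference
    return keyframes


# Toy 2x2 "video": frames 0 and 1 are nearly identical, frame 2 changes sharply,
# and frame 3 is nearly identical to frame 2.
video = [
    [[10, 10], [10, 10]],
    [[11, 10], [10, 12]],
    [[200, 200], [10, 10]],
    [[201, 199], [10, 11]],
]
print(select_keyframes(video, threshold=20.0))  # prints [0, 2]
```

A lower threshold keeps more frames and a higher one keeps fewer, so in practice the value would have to be tuned to the footage.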
Table 2 Spatiotemporal frames merits and demerits
Author | Method name | Year | Advantage | Features | Accuracy (%) | Disadvantage
Rajesh et al. [34] | Fusion of CNN and SVM | 2021 | Performs the dimensionality reduction using CNN and SVM along with background subtraction through ROI | Observing the normal and abnormal condition of a person in surveillance | 90 | Only single motion feature is exhibited
Sehgal et al. [35] | Hand-crafted motion vectors | 2018 | Used background subtraction, HOG descriptor | 200 features are selected for the detection of human | 99.7 | The accuracy power is usually low
Satyamoorthy et al. [36] | Depth information-based method | 2018 | Multi-directional projected depth motion map to detect motion | Action recognition with different projected directions (planes) | 81.9 | Very rare to obtain the depth textures
Ullah et al. [37] | Deep learning-based methods-CNN | 2018 | Reduce the redundancy and complexity | Action recognition from a long term sequences such as basketball, football, etc. | 92.66 | Only with fewer datasets are exhibited
Wang et al. [38] | Deep learning-LSTM | 2018 | Attempts only video of arbitrary size | Human actions in videos of the arbitrary size and length using multiple features | 88.89 | Time consuming
Samrat et al. [39] | Human action recognition-region of interest (ROI), SVM, and K-nearest neighbor (KNN) | 2014 | Propose both the static and dynamic motion at a low computational form | Action recognition using shape based spatial features | 94.8 | Lags by certain actions only
Oreifej et al. [40] | Depth information-based method | 2013 | Use of histogram from oriented 4D normal surface to distinguish the action | 16 actions of a particular person of daily activity by the representation of its shape and motion | 88.9 | More occlusions prevailed by global features
Fig. 1 Keyframe network training data chart (flow: CCTV video datasets → Frame conversion → Pre-processing by edge detection method → Keyframe extraction by frame differencing method)
With the frame differencing method alone, it is very difficult to show the human clearly in the keyframes obtained from the training dataset of surveillance video, as depicted in Fig. 2.
Fig. 2 Keyframe extracted from the training dataset
Fig. 3 Spatiotemporal network training data chart (flow: CCTV video datasets → Frame conversion → Human detection with HOG and SVM classifier in frames → Pre-processing by edge detection methods → Keyframe extraction by frame differencing method)
3.2 Spatiotemporal Resulting Keyframe with Human Detection
Tracking and perceiving what is happening in video surveillance is a most challenging task [42]. In the keyframe extraction method of Sect. 3.1, the keyframe is determined only by preprocessing and the thresholded difference between frames. In the spatiotemporal approach, the map of spatiotemporal motion is first accumulated by frame-level human action detection and image recognition in real time using a histogram of oriented gradients (HOG) [43]. The HOG feature descriptor, together with an SVM classifier, recognizes humans and their motion and marks them with rectangular blocks, identifying the frames that carry the required information in streaming video [44]. These video sequences are then preprocessed by the various edge detection methods. Finally, keyframes are obtained by the absolute threshold frame difference method; the workflow is depicted in Fig. 3. Human motion detection from spatiotemporal features, with the video sequence passed through the various edge detection conditions, delivers stronger performance than frame difference keyframe extraction alone, as depicted in Fig. 4.
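To make the HOG and SVM step above more concrete, the following sketch computes an orientation histogram for a single image cell and feeds it to a linear decision function. It is a heavily simplified illustration, not the detector used in the paper: a full HOG descriptor also uses overlapping blocks, block normalization, and a sliding detection window. The names `cell_hog` and `linear_svm_predict` and the toy weights are hypothetical; real SVM weights would be learned from labeled training data.

```python
import math
from typing import List


def cell_hog(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Unsigned-orientation gradient histogram (0-180 degrees) for one cell.

    Gradients use central differences on interior pixels; each pixel votes
    for its orientation bin with a weight equal to its gradient magnitude.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def linear_svm_predict(features: List[float], weights: List[float], bias: float) -> bool:
    """Sign of the linear decision function w.x + b."""
    return sum(w * f for w, f in zip(weights, features)) + bias > 0


# A vertical intensity step (horizontal gradient) and a horizontal step.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

# Toy, untrained weights: reward bin 0 (0-20 degrees), penalize bin 4 (80-100 degrees).
toy_weights = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]

print(cell_hog(vertical_edge))
print(linear_svm_predict(cell_hog(vertical_edge), toy_weights, bias=0.0))
print(linear_svm_predict(cell_hog(horizontal_edge), toy_weights, bias=0.0))
```

The vertical step puts all of its gradient energy in the first bin and the horizontal step in the middle bin, which is the kind of shape cue a trained classifier uses to separate upright human silhouettes from background.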
4 Performance Evaluation
Motion detection in video surveillance leads to two important cases for the number of frames obtained.
1. If the video sequence contains no human motion, the video sequence produced by spatiotemporal feature extraction contains fewer frames (Table 3).
2. If the video sequence contains more human motion, spatiotemporal feature extraction produces an increased number of frames (Table 4).
The comparative performance of keyframe extraction with spatiotemporal human motion detection is computed from Table 4 using the metrics compression ratio, precision, and recall.
Fig. 4 Spatiotemporal keyframe extracted from the trained dataset
Table 3 Frame extraction without human motion detection video sequence
Videos | Video frames | Spatiotemporal video frame
Video 1 | 610 | 225
Video 2 | 505 | 349
Video 3 | 495 | 536
Video 4 | 505 | 298
Video 5 | 510 | 326
Table 4 Frame extraction with human motion detection video sequence
Videos | Video frames | Spatiotemporal video frame
Video 1 | 583 | 619
Video 2 | 520 | 1006
Video 3 | 579 | 1369
Video 4 | 535 | 602
Video 5 | 584 | 1380
Compression ratio: It measures the compactness of the extracted keyframes.
$$\text{Compression Ratio} = \left(1 - \frac{N_{sk}}{N_f}\right) \times 100 \tag{1}$$
Precision: It measures the number of keyframes extracted by the frame difference method following the preprocessing techniques.
$$\text{Precision} = \frac{N_k}{N_{sk}} \times 100 \tag{2}$$
Recall: It measures the relevant keyframe information retrieved out of the total relevant information.
$$\text{Recall} = \frac{N_{af}}{N_{af} + (N_{sk} - N_k)} \times 100 \tag{3}$$
where $N_k$ is the number of keyframes extracted from a video frame without human detection, $N_f$ is the total number of spatio-frames with humans detected, $N_{sk}$ is the number of spatio-keyframes extracted, and $N_{af}$ is the number of actual frames extracted from the video sequence.
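As a worked example of Eqs. (1) to (3), the short sketch below evaluates the three metrics for one video. The frame counts N_f = 619 and N_af = 583 are the Video 1 values from Table 4. The keyframe counts N_k = 8 and N_sk = 12 are not stated in the paper; they are hypothetical values, chosen here because they happen to reproduce the Video 1 figures reported for Canny edge detection in Table 5 (66.67, 99.32, 98.06). The function names are likewise illustrative.

```python
def compression_ratio(n_sk: int, n_f: int) -> float:
    """Eq. (1): share of spatio-frames discarded, in percent."""
    return (1 - n_sk / n_f) * 100


def precision(n_k: int, n_sk: int) -> float:
    """Eq. (2): keyframes without human detection over spatio-keyframes, in percent."""
    return n_k / n_sk * 100


def recall(n_k: int, n_sk: int, n_af: int) -> float:
    """Eq. (3): actual frames over actual frames plus unmatched spatio-keyframes, in percent."""
    return n_af / (n_af + (n_sk - n_k)) * 100


# Video 1 frame counts from Table 4.
N_F = 619   # spatio-frames with humans detected
N_AF = 583  # actual frames extracted from the video sequence
# Hypothetical keyframe counts (assumption, not given in the paper).
N_K = 8
N_SK = 12

print(round(precision(N_K, N_SK), 2))         # 66.67
print(round(recall(N_K, N_SK, N_AF), 2))      # 99.32
print(round(compression_ratio(N_SK, N_F), 2)) # 98.06
```

Because N_sk is tiny compared with N_f and N_af, both recall and compression ratio stay close to 100% for every method, while precision is the metric that actually separates the edge detectors.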
5 Result Analysis
The frames extracted from the surveillance video with human motion detected are passed through the preprocessing methods to extract keyframes by both of the methods compared above. The frames are processed by the following edge detection techniques to reveal the keyframes:
1. Canny edge detection: The most widely used method; it detects edges at the local maxima of the gradient of the image in each frame.
2. Laplacian edge detection: The Laplacian method finds edges at the zero crossings of the second derivative, which occur where the first derivative is at a maximum, to predict the motion detection.
3. Sobel edge detection: The horizontal (X) and vertical (Y) direction Sobel operators find edges at the points where the gradient of the human motion intensity is maximum. Edge locations are declared where the gradient value exceeds the threshold.
4. Prewitt edge detection: The Prewitt approximation returns edges at the points where the gradient of the human motion intensity is maximum. It also takes into account the pixels closer to the center of the selected region.
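The gradient-based detectors in the list above can be illustrated with a small sketch that applies the standard 3x3 Sobel and Prewitt kernels and declares an edge wherever the gradient magnitude exceeds a threshold. It covers only the Sobel and Prewitt items (Canny and Laplacian need further steps such as smoothing, non-maximum suppression, or zero-crossing detection). The helper names, the zero-valued border handling, and the threshold of 100 are assumptions made for the example.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[int]]

SOBEL_X: Kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y: Kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
PREWITT_X: Kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y: Kernel = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def correlate3x3(image: Image, kernel: Kernel) -> Image:
    """Apply a 3x3 kernel to interior pixels; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = sum(
                kernel[j][i] * image[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


def edge_map(image: Image, kernel_x: Kernel, kernel_y: Kernel,
             threshold: float) -> List[List[bool]]:
    """Mark pixels whose gradient magnitude exceeds the threshold."""
    gx = correlate3x3(image, kernel_x)
    gy = correlate3x3(image, kernel_y)
    return [
        [(gx[y][x] ** 2 + gy[y][x] ** 2) ** 0.5 > threshold
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]


# 5x5 test image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 255, 255, 255] for _ in range(5)]
sobel_edges = edge_map(step, SOBEL_X, SOBEL_Y, threshold=100)
prewitt_edges = edge_map(step, PREWITT_X, PREWITT_Y, threshold=100)
print(sobel_edges[2])    # [False, True, True, False, False]
print(prewitt_edges[2])  # [False, True, True, False, False]
```

On this clean step both operators mark the same two columns; they differ only in response strength (1020 for Sobel versus 765 for Prewitt at the edge), which matters once the threshold is close to the gradient values found in noisy footage.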
These edge detection methods yield the keyframes through the absolute threshold frame difference method. The evaluation shows that keyframes extracted from spatiotemporal features with Canny edge detection attain reasonable results, with less computation time, robustness, and high optimization, because the true edges are detected, as shown in Figs. 5 and 6. Table 4 gives a clear view of the increase in spatio-frames due to human detection. The number of keyframes retrieved from the spatiotemporal frames after the preprocessing techniques is therefore reflected in the precision evaluation. Together with precision, the recall and compression ratio indicate the high accuracy of keyframe extraction through edge detection, as illustrated in Table 5.
Fig. 5 Edge detection of the existing keyframe extracted (vertical axis: keyframes obtained, 0 to 9; horizontal axis: video frames obtained for Video 1 (583 frames), Video 2 (520 frames), Video 3 (579 frames), Video 4 (535 frames), and Video 5 (584 frames); series: Canny edge detection, Laplacian, X-direction Sobel edge detection, Y-direction Sobel edge detection, Prewitt-X, Prewitt-Y)
Fig. 6 Edge detection of the spatiotemporal feature video extracted keyframe (vertical axis: 0 to 50; horizontal axis: video frames obtained for Video 1 (619 frames), Video 2 (1006 frames), Video 3 (1369 frames), Video 4 (602 frames), and Video 5 (1380 frames); series: Canny edge detection, Laplacian, X-direction Sobel edge detection, Y-direction Sobel edge detection, Prewitt-X, Prewitt-Y)
Table 5 presents the results of keyframe extraction through the edge detection methods, calculated from the absolute threshold difference of the frame difference method applied to the pixel values of each frame. Aggregated over the five videos, the results are 98.73% for Canny edge detection, 98.32% for Laplacian, 97.99% for X-direction Sobel edge detection, 98.25% for Y-direction Sobel edge detection, 98.25% for Prewitt-X, and 98.12% for Prewitt-Y. Keyframe extraction through Canny edge detection therefore achieves the best accuracy among the methods compared. This research supports crime investigation by identifying the person involved in the crime, highlighted by a rectangular box, and producing keyframes as a descriptive summarized report.
6 Conclusion
Video surveillance accumulates the evidence required for investigation through a visual attention model that integrates the underlying features of the frames. Keyframes are extracted to reduce a large amount of video sequence data to content-rich keyframes, which is one step toward content-based video retrieval (CBVR). The existing frame differencing method produces keyframes but lacks accuracy for keyframes that depict certain motions. To overcome this, spatiotemporal HOG features together with an SVM classifier interpret human motion in the video sequence and mark it with rectangular blocks on each frame. These video sequence frames, combined with the frame difference method and Canny edge detection, yield relatively stable keyframe extraction compared with the other edge detection methods. Overall, the comparison indicates that keyframe extraction with human detection by spatiotemporal feature extraction and Canny edge detection attains a higher accuracy rate for crime scene investigation.
Table 5 Spatiotemporal keyframe edge detection of motion detected videos (P: precision, R: recall, CR: compression ratio)
Edge detection method | Video 1 P | Video 1 R | Video 1 CR | Video 2 P | Video 2 R | Video 2 CR | Video 3 P | Video 3 R | Video 3 CR | Video 4 P | Video 4 R | Video 4 CR | Video 5 P | Video 5 R | Video 5 CR | Results
Canny edge detection | 66.67 | 99.32 | 98.06 | 36.94 | 98.75 | 98.61 | 23.81 | 97.32 | 98.47 | 60.90 | 98.69 | 98.5 | 24.81 | 98.32 | 99.47 | 98.73
Laplacian | 58.33 | 99.15 | 98.06 | 30.00 | 98.3 | 98.21 | 28.57 | 96.67 | 97.96 | 50.95 | 98.42 | 98.2 | 29.57 | 97.67 | 98.96 | 98.32
X-direction Sobel edge detection | 40.00 | 98.48 | 97.58 | 37.50 | 98.12 | 98.41 | 21.74 | 96.99 | 98.32 | 58.06 | 98.65 | 98.3 | 21.74 | 97.99 | 97.32 | 97.99
Y-direction Sobel edge detection | 41.67 | 98.82 | 98.06 | 36.67 | 98.49 | 98.51 | 17.39 | 96.83 | 98.32 | 62.42 | 98.65 | 98.5 | 17.39 | 95.83 | 99.32 | 98.25
Prewitt-X | 41.67 | 98.82 | 98.06 | 33.33 | 97.75 | 98.21 | 22.73 | 97.15 | 98.39 | 50.95 | 98.34 | 98.2 | 22.73 | 97.15 | 98.39 | 98.25
Prewitt-Y | 46.15 | 98.82 | 97.90 | 41.98 | 98.12 | 98.31 | 22.73 | 97.15 | 98.39 | 69.36 | 98.73 | 97.5 | 22.73 | 98.15 | 99.39 | 98.12
References
1. Castanon G, Elgharib M, Saligrama V, Jodoin P (2014) Retrieval in long surveillance videos using user-described motion and object attributes. IEEE Trans Circ Syst Video Technol
2. Lu J, Liong VE, Zhou J (2015) Cost sensitive local binary feature learning for facial age estimation. IEEE Trans Image Process 24(12):5356–5368
3. Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35:221–231
4. Zhu X, Wu X, Elmagarmid A, Feng Z, Wu L (2005) Video data mining: semantic indexing and event detection from the association perspective. IEEE Trans Know Data En 17:665–677
5. Chen X, Jia K, Deng Z (2009) A video retrieval algorithm based on spatiotemporal feature curves and keyframe. In: Fifth international conference on intelligent information hiding and multimedia signal processing. https://doi.org/10.1109/iih-MSP.2009.82
6. Upsana A, Manisha B, Mohini G, Pradnya K (2015) Real-time security system using human motion detection. Int J Comput Sci Mobile Comput (IJCSMC) 4(11):245–250
7. Munagekar MS (2018) Smart surveillance system for theft detection using image processing. Int Res J Eng Technol (IRJET) 5(08)
8. Patel BV, Meshram BB. Content based video retrieval system. Int J UbiComp (IJU) 3:13–3
9. Hiriyannaiah S, Singh K, Ashwin H, Siddesh GM, Srinivasa KG (2020) Deep learning and its application from content-based video retrieval. In: Hybrid computational intelligence for pattern analysis and understanding, pp 49–68
10. Shanmugam TN, Priya R (2009) Effective content-based video retrieval system based on query clip. In: International conference on advanced computer theory and engineering (ICACTE)
11. Kitchenham BA, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. Evid Based Softw Tech Rep
12. Dong Z, Wei J, Chen X, Zheng P (2020) Face detection in security monitoring based on artificial intelligence video retrieval technology. IEEE Access 8:63421–63433
13. Putheti S, SriHarsha MN, Vishnuvaradhan A (2019) Motion detection in video retrieval using content-based video retrieval. Innov Comput Sci Eng 235–242
14. Visser R, Sebe N, Bakker EM (2014) Object recognition of video retrieval. In: International conference on image and video retrieval
15. Gawande U, Hajari K, Golhar Y. Deep learning approach to key frame detection in human action videos. Recent Trends Comput Intell. https://doi.org/10.5772/intechopn.91188
16. Gharbi H, Bahroun S, Zagrouba E (2016) A novel keyframe extraction approach for video summarization. In: International conference on computer vision theory and applications
17. Gharbi H, Bahroun S, Zagrouba E (2017) Keyframe extraction using graph modularity clustering for effective video summarization. In: IEEE international conferences on acoustics, speech, and signal processing (ICASSP)
18. Chaohui L, Huang Y (2018) Effective keyframe extraction from personal video by using nearest neighbor clustering. In: International congress on image and signal processing, BioMed Eng Inf
19. Sim MA, Almaadeed N, Beghdadi A (2018) A keyframe based video summarization using color features. In: Color and visual computing symposium
20. Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
21. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc IEEE Conf Comput Vis Pattern Recognit 580–587
22. Lai JL, Yi Y (2012) Keyframe extraction based on visual attention model. J Vis Commun Image Represent 23(1):114–125
23. Fei M, Jiang W, Mao W (2018) A novel compact yet rich keyframe creation method for compressed video summarization. Multimed Tools Appl 77(10):11957–11977
24. Kale JK, Patil VH (2016) A study on vision based human motion recognition and analysis. Int J Ambient Comput Intell 7(2)
25. Chap G, Tsai Y, Jeng S (2010) Augmented 3D keyframe extraction for surveillance videos. IEEE Trans Circ Syst Video Technol (TCSVT) 20(11):1395–1408
26. Luo J, Papin C, Costello K (2009) Towards extracting semantically meaningful keyframe from personal video clips: from humans to computer. IEEE Trans Circ Syst Video Technol (TCSVT) 19(2):289–301
27. Sze KW, Lam KM, Qiu G (2005) A new keyframe representation for video segment retrieval. IEEE Trans Circ Syst Video Technol (TCSVT) 15(9):1148–1155
28. Chang HS, Sull S, Lee SU (1999) Efficient video indexing scheme for content-based video retrieval. IEEE Trans Circ Syst Video Technol (TCSVT) 9(8):1269–1279
29. Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: 2004 IEEE computer society conference on computer vision and pattern recognition. IEEE, Piscataway, pp 819–826
30. Case JT, Ghasr MT, Zoughi R (2011) Optimum two-dimensional uniform spatial sampling for microwave SAR-based NDE imaging systems. IEEE Trans Instrum Meas 60(12):3806–3815
31. Wang X, Wang R, Deng Y, Wang P, Li N, Yu W, Wang W (2017) Precise calibration of channel imbalance for very high-resolution SAR with stepped frequency. IEEE Trans Geosci Remote Sens 55(8):4252–4261
32. Lu J, Liong VE, Zhou J (2018) Simultaneous local binary feature learning and encoding for homogeneous and heterogeneous face recognition. IEEE Trans Pattern Anal Mach Intell 40(8):1979–1993
33. Singh G, Saha S, Sapienza M, Torr P, Cuzzolin F (2017) Online real-time multiple spatiotemporal action localization and prediction. Proc IEEE Int Conf Comput Vis (ICCV), pp 3657–3666
34. Sharma R, Sungheetha A (2021) An efficient dimension reduction based fusion of CNN and SVM model for detection of abnormal incident in video surveillance. J Soft Comput Paradigm (JSCP) 3(02):55–69
35. Sehgal S (2018) Human activity recognition using BPNN classifier on HOG features. In: Proceedings of the 2018 international conference on intelligent circuits and systems (ICICS), pp 286–289
36. Satyamurthi S, Tian J, Chua MCH (2018) Action recognition using multi-directional projected depth motion maps. J Ambient Intell Humaniz Comput 1–7
37. Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2018) Action recognition in video sequence using deep bi-directional LSTM with CNN features. IEEE Access 6:1155–1166
38. Wang X, Gao L, Wang P, Sun L, Liu X (2018) Two-stream 3D convNet fusion for action recognition in videos with arbitrary size and length. IEEE Trans Multimed 20:634–644
39. Nath S, Basak S, Audin SI (2014) Spatio-temporal feature extraction scheme for human action recognition. https://doi.org/10.13140/RG.2.2.2634657283
40. Oreifej O, Liu Z (2013) HON4D: histogram of oriented 4D normals for activity recognition from depth sequences. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 716–723
41. Peng X, Schmid C (2016) Multi-region two stream R-CNN for action. In: Computer vision (ECCV). Springer, Amsterdam, The Netherlands, pp 744–759
42. Shi J, Malik J (2000) Normalized cuts and image segmentation. PAMI
43. Moreno IR, Otzeta JSM, Sierra B, Rodriguez I, Jauergi E (2019) Video activity recognition: state of the art. Sensors (Basel) 19(14)
44. Scovanner P, Ali S, Shah M (2007) A 3 dimensional SIFT descriptor and its application to action recognition. In: ACM conference on multimedia
Designing a Secure Smart Healthcare System with Blockchain
Neelam Chauhan and Rajendra Kumar Dwivedi
Abstract In healthcare applications, IoT facilitates communication between doctors and patients, as the latter can be diagnosed remotely in emergency situations through body sensor networks and wearable sensors. However, using IoT in healthcare systems can lead to violations of patient privacy, so security must be considered. Blockchain is one of the secure techniques and has been applied to most IoT scenarios these days. The main reasons for using blockchain in healthcare systems are its prominent features: decentralization, immutability, security, privacy, and transparency. The design, based on the Ethereum blockchain, allows the patient as well as the clinical and paramedical staff to have secure access to health information. The Ethereum node is implemented on an embedded platform, which should provide an efficient, flexible, and secure system. Our main aim is to provide appropriate, secure, and authorized access to this sensitive data using Ethereum blockchain technology. In this paper, we propose a blockchain-based healthcare IoT system for the security of electronic health records (EHRs). The design is based on Ethereum, which secures the critical clinical data of patients and provides an adaptable and secure framework for a smart healthcare system. The implementation results show that the proposed technique provides security with very low computational overhead.
Keywords Blockchain · Health care · Smart contract · Ethereum · Wearable sensors · IoT · Electronic health records (EHRs)
N. Chauhan (B) · R. K. Dwivedi
Department of Information Technology and Computer Application, MMMUT, Gorakhpur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_30
1 Introduction
There has been growing interest in using blockchain technology to improve clinical and e-healthcare services. With its decentralized and trustworthy nature, blockchain has shown great potential in various e-health areas, such as secure sharing of EHRs and data access management among multiple medical entities. The adoption of blockchain can therefore provide promising solutions to facilitate healthcare delivery and thereby transform the healthcare industry. Ongoing advances in the Internet of things (IoT) have greatly expanded the scope of connectivity between remote devices connected to the internet for data transfer and access. As a result, IoT has changed and disrupted almost every industry in the world, from the education sector to supply chain management. In the healthcare sector, IoT has also performed impressively by easing diagnostic procedures and efficiently monitoring the activities of patients. The main factor we focus on is that IoT supports patient monitoring even during the patient's inactive hours, which is sometimes very hard to achieve in the traditional system. Likewise, remote access to the data and its continuous analysis open broad prospects for faster diagnosis and efficient treatment. Some of the blockchain features are as follows:
• Decentralized. The primary feature of blockchain is that it does not depend on a centralized node.
• Transparent. All stored data is secured and transparent to every node.
• Open-source. Anyone can use blockchain technology to build an application.
• Autonomous. In blockchain, each block can update or transfer its own data securely.
• Immutable. Records in a blockchain are stored permanently and cannot be changed.
• Anonymous. Because blockchain is anonymous, it effectively maintains trust from node to node.
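The immutability feature listed above follows from the way each block stores the hash of its predecessor. The sketch below shows the idea with a toy hash-linked ledger. It is only an illustration under stated assumptions: it uses SHA-256 from the Python standard library for convenience, it has no consensus, signatures, or networking, and it does not reproduce Ethereum's actual block format. The function names and the patient records are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder parent hash for the first block


def block_hash(index: int, data: Dict[str, Any], previous_hash: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: Dict[str, Any]) -> None:
    """Append a new block that is linked to the current tip of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_PREVIOUS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    })


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_PREVIOUS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["previous_hash"]):
            return False
    return True


# Hypothetical sensor readings for an illustrative patient identifier.
ledger: List[Dict[str, Any]] = []
append_block(ledger, {"patient": "P-001", "heart_rate": 72})
append_block(ledger, {"patient": "P-001", "heart_rate": 75})
valid_before = is_valid(ledger)

ledger[0]["data"]["heart_rate"] = 120  # tamper with a stored record
valid_after = is_valid(ledger)

print(valid_before, valid_after)  # True False
```

Changing any stored record changes its recomputed hash, so the tampering is detected unless every later block is rewritten as well, which is what a decentralized network of nodes is meant to prevent.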
EHRs may include a range of data, including demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics such as age and weight, and billing information.
1.1 Motivation
In this context, our paper offers a hybrid e-healthcare approach that allows the implementation of a distributed system. This system helps to improve the security and privacy of clinical, paramedical, and personal data with less classification. Targeting a low-consumption system, we use the Ethereum blockchain to realize an e-health platform.
1.2 Contribution
This paper makes the following contributions:
(i) Implementation of a secure, hybrid, clinical and paramedical application based on the Ethereum blockchain.
(ii) Optimization of the design of the blockchain nodes to reduce power consumption.
(iii) Implementation of a web and mobile platform to access and add data to the blockchain.
1.3 Organization
The remainder of this paper is organized as follows. Section 2 gives the background of blockchain in healthcare IoT. Section 3 reviews related work on blockchain technology. Section 4 describes the proposed architecture. Section 5 presents the security analysis and discussion. Conclusions and future directions are given in Sect. 6.
2 Background
Today, many systems need to connect with other systems to exchange data. All of these data transactions should ensure transparency, security, and decentralization. Blockchain technology was first implemented in cryptocurrencies such as Bitcoin, and it has since spread to many other industries. To secure their data, many sectors have started to integrate this emerging technology into their existing systems, including marketing, banking, finance, and healthcare. Blockchain is a distributed and decentralized database that stores and transfers data as blocks. In simpler terms, a blockchain is a chain of records stored as blocks, with no single authority controlling the transactions. Blockchain is decentralized and open-source, which means that participants connected to the network can access and add data. Once data is entered into the blockchain, nobody can delete it. Data stored on a blockchain is always transparent, and nobody can change or alter it [1–3]. A primary concern in the healthcare system is information asymmetry: only specific people, such as doctors and the treating hospital, have the right to access medical information. If a patient or a family member needs to access the medical information, they must follow a lengthy process, and sometimes the information is not shared at all. The clinical data is managed by
N. Chauhan and R. K. Dwivedi
hospitals and clinical institutions. Data breaches in the health industry are now common. Their number has grown faster than in other industries because health insurance requires legal health documents and, in most countries, an insurance policy is mandatory for every citizen. Data is shared among hospitals through electronic health records, where each hospital uses different terminology and different procedures, and their storage capabilities may also differ. Nevertheless, the clinical data shared among them should be interoperable. To overcome these issues in electronic health records, blockchain has taken on a promising role in the medical industry. To maintain patients' health data securely, we adopted blockchain for the EHR. It is crucial that these health records remain confidential and have restricted and controlled access in a system that guarantees the non-repudiation of the records. The design of our architecture aims to satisfy all of these requirements by using blockchain technology to hold electronic health records.
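To illustrate why records held in a blockchain are hard to alter unnoticed, the following minimal Python sketch links blocks by hash. It is not the implementation used in this work, which relies on Ethereum; the names `block_hash`, `append_block`, and `is_valid` and the sample readings are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "data": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "data": payload,
                  "hash": block_hash(index, prev_hash, payload)})


def is_valid(chain):
    """Recompute every hash and check every link back to the first block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["data"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical sample readings for one patient.
chain = []
append_block(chain, {"patient": "P1", "heart_rate": 72})
append_block(chain, {"patient": "P1", "heart_rate": 75})
```

Changing any stored reading after the fact changes the recomputed hash of its block, so `is_valid` no longer accepts the chain. A real network additionally relies on consensus among nodes, which this sketch omits.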
2.1 Wearable Device
Every patient should be equipped with one or more wearable devices capturing a set of health parameters that define the patient's status (e.g., heart rate, calories, depth, footsteps, and temperature). The device synchronizes all data with the mobile application through Bluetooth [4].
2.2 Mobile Application
This application is installed on the patient's mobile phone. It enables the patient to create a wallet (holding a public and a private key) and to deploy a personal smart contract to the Ethereum network. The application reads the health data from the wearable device and stores it in the patient's smart contract.
2.3 Smart Contract
In our architecture, every patient is monitored through a wearable device. This device is responsible for collecting the data to be stored in the patient's smart contract. Thus, for every patient, only one smart contract is deployed. The use of the smart contract is explained in the following section [5].
2.4 Web Application
This component visualizes data continuously and allows healthcare professionals to monitor the patient's status according to their access level.
2.5 Healthcare Professional
The healthcare professional can be a doctor, a nutritionist, an analyst, and so on. Each professional represents a node in our blockchain system and can visualize the stored data through the web application. This data is stored in a database in the cloud. At every registration, the transaction must be confirmed and then saved in the smart contract.
2.6 Patient
The patient is not a node in the blockchain system. The patient only interacts with it through a mobile phone, which receives data from the wearable devices and uploads it to the blockchain to be stored in the smart contract. Figure 1 describes the requirements of blockchain technology for holding electronic health records.
Fig. 1 Electronic health records (EHRs)
3 Related Work
The healthcare industry is facing the third evolutionary wave of IT digital technologies, with implications so profound that analysts consider it a new era of global computing [6, 7]. Indeed, for all organizations, digital transformation will be driven by foundational technologies such as mobility [8], 3D, the Internet of Things (IoT) [9], big data [10], deep learning [11], machine learning using segmentation and classification processes [12–15], and healthcare concepts derived from industry [16]. Many research studies and projects have dealt with the use of blockchain technology and IoT in e-health systems. More and more healthcare organizations are applying blockchain technology in their systems, and this technology plays a crucial role in the healthcare market today. It can provide automated data collection and verification processes, and accurate, aggregated records from various sources that are immutable, tamper-resistant, and secured, with a lower risk of cyberattacks. Currently, the healthcare sector faces various challenges concerning security incidents, data integrity, data ownership, and so on [17, 18].
While most of these works propose approaches to secure critical medical data that can only be accessed by doctors, patients, or, in some cases, healthcare providers or vendors, the application presented in this paper is a hybrid e-healthcare application. It includes data that must remain confidential and accessible only to doctors, whereas other data may be shared with other actors. Electronic health record systems make healthcare services more efficient. They can reduce the heavy workload of clinicians and provide diagnostic support that prevents medical errors. They can record diagnostic investigations and medical treatments, provide clinical decision support, and facilitate communication among healthcare providers [19, 20]. Tanwar et al. [21] presented a system based on healthcare 4.0 applications. It uses blockchain technology in a system architecture that ensures secure access to data through an access control policy implemented for participants, achieving privacy and data ownership for patients in the EHR system. Dagher et al. [22] proposed a privacy-preserving framework for access control and interoperability of electronic health records using blockchain technology. The proposed framework, implemented on an existing system, is based on Ethereum. It uses encryption and authentication throughout the blockchain, which demonstrates the provision of security and access control.
Smys et al. [23] described a response and detection mechanism for enhancing the security of smart vehicles. Vehicular forensics, secure vehicular networking, and trust management are some of the primary applications examined. Although the results give positive evidence of security enhancement and protection against a variety of attacks, there is still a need for improvement against a wide range of attacks, such as those carried out by benign internal entities. Joe et al. [24] combined deniably authenticated encryption with blockchain to develop an encryption algorithm suitable for medical image sharing, combining the advantages of both. This model uses a cloud server with blockchain, so that the shared data remains traceable, unique, and unaltered. The proposed architecture eliminates the central authority and does not present a single point of failure in the system, owing to the distributed nature of the blockchain. System security is strengthened by the immutability of the distributed ledger, since no node can modify the ledger. Using a low-power consensus mechanism that can validate transactions and blocks quickly and securely shows the potential and importance of blockchain in several areas and confirms that it may well be the next revolutionary technology for new healthcare system architectures [25].
4 Proposed Approach
In this approach, we focus on a patient-centric application for storing electronic medical records. For this purpose, we assume that the patient uses some wearable devices capable of continuously measuring a predefined set of parameters of the patient's health status (such as calories, oxygen saturation, heart rate, and blood pressure). The data collected by the various wearable devices is permanently uploaded to the distributed ledger. This data is also stored so that the evolution of the patient's status can be retrieved, giving healthcare personnel better visibility of the patient's progress. It is therefore crucial that these medical records remain confidential and have restricted and controlled access in a system that guarantees the non-repudiation of the records. The design of our architecture aims to satisfy all of these requirements by using blockchain technology to hold electronic health records. To implement our design, we choose the Ethereum network since it meets our requirements. Indeed, Ethereum supports the execution of smart contracts and offers the possibility to choose between different consensus protocols, such as proof of work (PoW) and proof of authority (PoA).
PoW provides a secure e-healthcare system that can secure even a public network. One of the major issues of PoW is the need for significant hardware resources to meet its computational requirements. PoA makes it possible to build a private and permissioned blockchain with low power consumption. Algorithms 1 and 2 below upload and read health records to implement our design.
Algorithm 1
Algorithm 1 describes the process of storing patient data in the blockchain network. When the user requests to synchronize data with the wearable device and upload it to the blockchain, the mobile application first obtains the health data from the wearable device and displays it to the patient. After that, it executes the smart contract function that uploads the data, which creates a transaction and sends it to the blockchain network to be validated by the peers. The peers verify the transaction according to the consensus protocol in use. They then return a response indicating whether they validated and signed the transaction or judged it to be invalid. With this response, the smart contract and then the application are notified, and the new data entry is added to the smart contract.
Algorithm 1: Uploading EHR
Input: Request to upload data from the smart contract
Output: Electronic health records added to the smart contract
Step 1: Begin
Step 2: if Ppb == Ownerpb then
Step 3: Create Health Info object
Step 4: Push the new object to the EHR map
Step 5: return "EHR Uploaded successfully"
Step 6: else
Step 7: return Unauthorized Access
Step 8: End if
Step 9: End
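The steps of Algorithm 1 can be sketched in plain Python as follows. This is only an off-chain model of the access check and of the EHR map, not the authors' smart contract code. The class names `HealthInfo` and `PatientContract`, the field names, and the sample keys are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HealthInfo:
    # Field names are assumed; the text only lists example parameters.
    heart_rate: int
    steps: int
    calories: float
    temperature: float


@dataclass
class PatientContract:
    owner_pub: str                           # Ownerpb: public key of the owner
    ehr: dict = field(default_factory=dict)  # EHR map: timestamp -> HealthInfo

    def upload(self, sender_pub, timestamp, info):
        # Step 2: only the contract owner (the patient) may add records.
        if sender_pub != self.owner_pub:
            return "Unauthorized Access"     # Step 7
        # Steps 3-4: store the Health Info object in the EHR map.
        self.ehr[timestamp] = info
        return "EHR Uploaded successfully"   # Step 5


# Hypothetical keys and readings.
contract = PatientContract(owner_pub="0xPATIENT")
ok = contract.upload("0xPATIENT", 1700000000, HealthInfo(72, 4200, 180.5, 36.7))
denied = contract.upload("0xINTRUDER", 1700000060, HealthInfo(0, 0, 0.0, 0.0))
```

In an actual deployment the comparison in Step 2 would be made against the transaction sender inside the contract, and validation by the peers would happen before the new entry becomes part of the ledger.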
Algorithm 2
Algorithm 2 describes in detail how our application handles the visualization and monitoring of patients' health data by health personnel. The health professional must log in to the web application to monitor a patient's data. The professional is then redirected to the patient list, which shows all patients who authorized access to their data. The web application requests the patient data from the patient's smart contract using the health professional's Ethereum credentials.
Algorithm 2: Reading EHR
Input: Request to read data from the smart contract
Output: Access to electronic health records
Step 1: Begin
Step 2: if HPpb ∈ Dlist then
Step 3: Convert EHR map into a string
Step 4: Include all EHR attributes
Step 5: return EHR string
Step 6: else if HPpb ∈ Nlist then
Step 7: Convert EHR map into a string
Step 8: Only include EHR attributes accessible by a nutritionist
Step 9: return EHR string
Step 10: else if HPpb ∈ Clist then
Step 11: Convert EHR map into a string
Step 12: Only include EHR attributes accessible by a coach
Step 13: return EHR string
Step 14: else
Step 15: return Unauthorized Access
Step 16: end if
Step 17: End
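The role-based filtering of Algorithm 2 can be sketched in Python as follows. The text does not state which attributes a nutritionist or a coach may see, so the tuples `NUTRITIONIST_FIELDS` and `COACH_FIELDS`, the function name `read_ehr`, and the sample keys are assumptions made only for this illustration.

```python
import json

# Assumed attribute sets; the source does not specify them.
NUTRITIONIST_FIELDS = ("calories", "steps")
COACH_FIELDS = ("steps", "heart_rate")


def filter_fields(ehr, allowed):
    """Keep only the allowed attributes of every record in the EHR map."""
    return {ts: {k: rec[k] for k in allowed} for ts, rec in ehr.items()}


def read_ehr(requester_pub, ehr, dlist, nlist, clist):
    """Return the EHR map as a string, filtered by the requester's role."""
    if requester_pub in dlist:        # doctor: all attributes
        visible = ehr
    elif requester_pub in nlist:      # nutritionist: restricted view
        visible = filter_fields(ehr, NUTRITIONIST_FIELDS)
    elif requester_pub in clist:      # coach: restricted view
        visible = filter_fields(ehr, COACH_FIELDS)
    else:
        return "Unauthorized Access"
    return json.dumps(visible, sort_keys=True)


# Hypothetical record and access lists.
ehr = {1700000000: {"heart_rate": 72, "steps": 4200,
                    "calories": 180.5, "temperature": 36.7}}
doctors, nutritionists, coaches = {"0xDOC"}, {"0xNUT"}, {"0xCOACH"}
doctor_view = read_ehr("0xDOC", ehr, doctors, nutritionists, coaches)
nutritionist_view = read_ehr("0xNUT", ehr, doctors, nutritionists, coaches)
stranger_view = read_ehr("0xNOBODY", ehr, doctors, nutritionists, coaches)
```

Keeping the three lists separate mirrors Dlist, Nlist, and Clist in the algorithm; a single mapping from public key to role would be an equally reasonable design.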
In our model, we chose to work on a private Ethereum blockchain to ensure the security of the data stored in smart contracts. The main purpose of the proposed platform is to run blockchain nodes on a platform with limited resources that plays the role of both blockchain backend and web server. Figure 2 shows the EHR storage and monitoring architecture with wearable devices and the web application.
Fig. 2 Block diagram of EHRs architecture
5 Security Analysis and Discussion
In this section, we analyze the security of the proposed system through two threat scenarios. In addition, technical aspects of the proposed data sharing scheme are discussed to highlight the usability and feasibility of our design.
5.1 Security Analysis
Security is always a great concern in medical data sharing systems, where sensitive patient data must be well protected against potential threats to ensure patient privacy and network security. In our design, all healthcare records are encrypted with the public key of the EHRs manager before being uploaded to the decentralized cloud storage. To retrieve cloud data, a requestor needs to know the private key of the EHRs manager to decrypt the data package. Note that this private key is unique and known only by the EHRs manager. Figure 3 shows the CIA triad model (confidentiality, integrity, and availability), one of the best-known security models associated with the blockchain structure. The CIA triad is a model that helps organizations structure their security posture.
• Confidentiality. Confidentiality is a way of keeping data hidden from unauthorized people.
• Integrity. Integrity is a way of protecting data against unauthorized modification.
Fig. 3 Healthcare security with blockchain
• Availability. Availability refers to timely and reliable access to data. The path from data to information and from information to value means that the value is lost if the information is not available at the right time.
5.2 Results and Discussion
Compared with existing works, the proposed platform improves the security of shared data. The main purpose of the proposed platform is to run blockchain nodes on a platform with limited resources that plays the role of both blockchain backend and web server. The proposed EHRs sharing system is investigated and evaluated under various performance metrics to show the feasibility of our model for real-world usage scenarios.
(A) Flexibility
Since our design is deployed on a mobile platform, any user with a mobile phone can easily work with our system, which gives users a high degree of flexibility. Our system works well on various mobile platforms, including Android and iOS versions, increasing the usability of our design in different healthcare systems.
(B) Availability
Our system allows authorized mobile users to access e-health records anytime and anywhere with a mobile application. The mobile application lets users interact with our system in a real-time and dynamic manner, with highly available medical data in the cloud.
(C) Avoid Single Point of Failure
Our design uses a decentralized storage system that effectively addresses the single point of failure issue. In addition, access control enabled by blockchain runs in a peer-to-peer manner among decentralized entities, which also helps to overcome this challenge.
(D) Integrity
Integrity guarantees that patient data is shared among authorized users without modification. Medical records collected from mobile gateways are always encrypted to avoid modification. Meanwhile, for EHRs sharing, mobile users cannot modify the signed transactions sent to smart contracts, and no entity can alter the content of recorded transactions. Importantly, mobile users have no rights to change or modify the agreement in the smart contract or the access policies in our scenario.
Fig. 4 Processing time on cloud versus user requests
(E) Data Privacy
By exploiting the security capability of blockchain and smart contracts, our access control scheme guarantees data privacy and data ownership. Malicious access is blocked by user identity checks and validation by the smart contract, preventing potential threats from reaching our cloud storage. Furthermore, illegal transactions are invalidated and removed from our blockchain network by the consensus process. Another interesting feature of our design is that all entities in the blockchain share equal data management rights and can monitor all transactions and messages. Therefore, any change to cloud medical records can be detected by mobile users and reported to the cloud manager to protect patient data security. In Fig. 4, we measure the average time consumed on the cloud for processing users' access requests. The implementation results show that our proposed technique provides security with very low computational overhead.
6 Conclusions and Future Work
In this paper, we focused on storing electronic health records, where the data collected by the deployed devices is critical. Our objective
was to propose distributed, secure, and authorized access to this sensitive data using emerging blockchain technology. In this study, we designed an IoT blockchain-embedded architecture for healthcare applications to store and consult EHRs. We explored the different blockchain tools and platforms available, and Ethereum was the most suitable for implementing our architecture. To validate our approach, real implementations were carried out to show the performance and capabilities of our architecture. As future work, we hope to implement a more modern system. This system should support a wider range of devices that can be integrated into a wearable device. It would offer health staff additional parameters to assess the patient's condition.
References
1. Dwivedi RK, Kumar R, Buyya R (2021) A novel machine learning-based approach for outlier detection in smart healthcare sensor clouds. Int J Healthcare Inf Syst Inf (IJHISI) 16(4) (Article 26, IGI Global)
2. Srivastava R, Dwivedi RK (2021) A survey on diabetes mellitus prediction using machine learning algorithms. In: 6th international conference on ICT for sustainable development (ICT4SD 2021), Goa, India, Springer, Aug 05–06
3. Srivastava R, Dwivedi RK (2021) Diabetes mellitus prediction using ensemble learning approach with hyper parameterization. In: 6th international conference on ICT for sustainable development (ICT4SD 2021), Goa, India, Springer, Aug 05–06
4. Rosic A (2016) Smart contracts: the blockchain technology that will replace lawyers
5. Reyna A, Martín C, Chen J, Soler E, Díaz M (2018) On blockchain and its integration with IoT. Challenges and opportunities. Future Gener Comput Syst 88:173–190
6. Dziak D, Jachimczyk B, Kulesza W (2017) IoT based information system for healthcare application: design methodology approach. Appl Sci 7(6):596
7. Frikha T, Ben Amor N, Diguet JP, Abid M, A novel Xilinx-based architecture for 3D graphics. Multimed Tools Appl 78(11):1494
8. Pantelopoulos A, Bourbakis NG (2010) A survey on wearable sensor-based systems for health monitoring and prognosis. IEEE Trans Syst Man Cybern Part C (Appl Rev) 40(1):1–12
9. Ravi D, Wong C, Deligianni F (2017) Deep learning for health informatics. IEEE J Biomed Health Inf 21(1):4–21
10. Dhouioui M, Frikha T (2021) Design and implementation of a radar and camera-based obstacle classification system using machine-learning techniques. J Real-Time Image Process
11. Havaei M, Guizard N, Larochelle H, Jodoin P (2016) Deep learning trends for focal brain pathology segmentation in MRI
12. Greenspan H, van Ginneken B, Summers RM (2016) Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 35(5):1153–1159
13. Nie D, Zhang H, Adeli E, Liu L, Shen D (2016) 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. Med Image Comput Comput Assist Interv MICCAI 9901:212–220
14. Yu J-S, Chen J, Xiang Z, Zou Y-X (2015) A hybrid convolutional neural networks with extreme learning machine for WCE image classification. In: Proceedings of the 2015 IEEE international conference on robotics and biomimetics (ROBIO), pp 1822–1827, Zhuhai, China, Feb 2015
15. Wan J, Tang S, Li D et al (2019) Reconfigurable smart factory for drug packing in healthcare industry 4.0. IEEE Trans Industr Inf 15(1):507–516
16. Yang G, Pang Z, Jamal Deen M et al (2020) Homecare robotic systems for healthcare 4.0: visions and enabling technologies. IEEE J Biomed Health Inf 24(9):2535–2549
17. Dwivedi RK, Kumari N, Kumar R (2019) Integration of wireless sensor networks with cloud towards efficient management in IoT: a review. In: 2nd Springer international conference on data & information sciences (ICDIS 2019), Agra, India, pp 97–107. Springer, Mar 29–30
18. Graber ML, Byrne C, Johnston D (2017) The impact of electronic health records on diagnosis. Diagnosis 4(4):211–223
19. Schopf TR, Nedrebø B, Hufthammer KO, Daphu IK, Lærum H (2019) How well is the electronic health record supporting the clinical tasks of hospital physicians? A survey of physicians at three Norwegian hospitals. BMC Health Serv Res 19(1):934
20. Frikha T, Chaari A, Chaabane F, Cheikhrouhou O, Zaguia A (2021) Healthcare and fitness data management using the IoT-based blockchain platform. J Healthcare Eng 2021(9978863):1–12
21. Tanwar S, Parekh K, Evans R (2020) Blockchain based electronic healthcare record system for healthcare 4.0 applications. J Inf Secur Appl 50(102407)
22. Dagher GG, Mohler J, Milojkovic M, Marella PB (2018) Ancile: privacy-preserving framework for access control and interoperability of electronic health records using blockchain technology. Sustain Cities Soc 39:283–297
23. Smys S, Wang H (2021) Security enhancement in smart vehicle using blockchain-based architectural framework. J Artif Intell 3(02):90–100
24. Joe CV, Raj JS (2021) Deniable authentication encryption for privacy protection using blockchain. J Artif Intell Capsule Netw 3(3):259–271
25. Dwivedi RK, Kumar R, Buyya R (2021) Secure healthcare monitoring sensor cloud with attribute-based elliptical curve cryptography. Int J Cloud Appl Comput (IJCAC) 11(3) (Article 1, IGI Global)
Vehicle Classification and Counting from Surveillance Camera Using Computer Vision A. Ayub khan, R. S. Sabeenian, A. S. Janani, and P. Akash
Abstract Over the last decade, the number of automobiles has skyrocketed. There are more than one billion active automobiles in the world, including 70–80 million in India. Handling such traffic conditions and providing enough parking spaces is a difficult task. Vehicle counting and classification on congested routes will assist authorities in obtaining traffic flow data as well as understanding and studying traffic patterns, allowing for the most efficient traffic management. The aim of this work is therefore a cost-effective vision-based vehicle counting and classification system, implemented mainly in OpenCV using Python and some image processing methods.
Keywords Traffic management · Vehicle counting · Vehicle classification · Computer vision · Cost-effective
1 Introduction Because of the expansion of road networks, the quantity, and, most crucially, the size of vehicles, the necessity for efficient traffic control and monitoring has grown in recent decades. Intelligent traffic surveillance systems are an essential aspect of modern traffic management, but traditional traffic management techniques such as wireless sensor networks, inductive loops, and EM microwave detectors are costly, bulky, and hard to deploy without disrupting traffic. Video surveillance systems might be a useful alternative to these methods. Because of advancements in storage capacity, computer power, and video encryption methods, video surveillance systems have grown less expensive and more effective.
A. Ayub khan (B) · R. S. Sabeenian · A. S. Janani · P. Akash Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India e-mail: [email protected] R. S. Sabeenian e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_31
A. Ayub khan et al.
Humans often analyse the videos saved by these surveillance devices, which is a time-consuming task. The demand for more robust, automated video-based surveillance systems to overcome this restriction has sparked research in the field of computer vision. A traffic surveillance system's primary goal is to identify, track, and categorise vehicles, but it may also be used for more complicated tasks such as driver behaviour identification and lane recognition. Traffic surveillance systems may be used for a variety of purposes, including public safety, abnormal behaviour detection, accident detection, car theft detection, parking space monitoring, and person identification. The applicability of vision-based methods to real-world situations is the primary driver of interest in traffic management. Vehicle segmentation under a variety of conditions, such as night-time, snow, or haze, has been a major obstacle in our work. We therefore created a pre-processing module based on histogram equalization, which enhances the quality of the video, followed by a morphological process that adds or removes pixels at object borders, depending on the shape and size of the structuring element, before moving on to the next phase. Cars travelling in the same direction, or in a shaded zone, may appear to have the same shade as the background, making vehicle recognition extremely difficult and leading to counting errors. As a consequence, if the vehicle ID reaches a specific threshold, we register it using a background subtraction method.
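The histogram equalization step mentioned above can be sketched in pure Python on a small grayscale frame. This is an illustrative sketch of the standard technique, not the module used in this work. The function name `equalize_histogram` and the sample frame are hypothetical. In an OpenCV pipeline the same operation is available as `cv2.equalizeHist` for 8-bit single-channel images.

```python
def equalize_histogram(image, levels=256):
    """Spread the grey levels of an image (list of rows of ints) over the full range."""
    pixels = [p for row in image for p in row]

    # Histogram of grey levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the histogram.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Flat image: every pixel has the same level, nothing to stretch.
        return [row[:] for row in image]

    # Lookup table mapping each old level to its equalized level.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A dark, low-contrast 2 x 2 frame (hypothetical values).
frame = [[50, 50], [60, 70]]
enhanced = equalize_histogram(frame)
```

After equalization the darkest level maps to 0 and the brightest to 255, which increases contrast before the morphological and background subtraction stages.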
2 Literature Survey
To estimate the number of automobiles in actual traffic video streams, J. Quesada and P. Rodrigue proposed a novel incremental PCP-based approach. They tested their technique on a variety of datasets and obtained results competitive with state-of-the-art methods in terms of accuracy and speed: 99% accuracy when counting automobiles crossing a virtual door, 92% accuracy when estimating the number of vehicles present in the scene, and processing speeds close to 26 frames per second [1]. Mohamed A. Abdelwahab presented an effective method for counting vehicles based on a DNN and a KLT tracker. To reduce time complexity, vehicles are detected by the DNN only every N frames, and corner points are tracked across the N frames to derive trajectories. An efficient technique is then proposed to assign distinct vehicle labels to the related trajectories. The results, obtained on a variety of vehicle recordings, indicate that cars are correctly tracked and counted regardless of whether the DNN detects them one or more times [2]. The use of visual surveillance sensors for data collection and automatic processing in intelligent transportation systems (ITS) has become a popular application field for computer vision technology. In most cases, the initial phase of a visual traffic surveillance system is to accurately detect traffic objects in videos and categorise them into several classes. To improve the robustness of background subtraction in image processing, an enhanced spatiotemporal sample consistency method (STSC) is developed in that study [3]. The Sparse-Filtered CNN with
Layer Skipping (SF-CNNLS) technique for vehicle type classification was developed by Awang et al. Three channels of the SF-CNNLS method were used to extract discriminant and rich vehicle features. Additionally, based on colour, brightness, and shape, the global and local attributes of the vehicles were retrieved from the three channels of an image. The performance of the SF-CNNLS technique was validated on a benchmark dataset. Finally, vehicle types such as truck, minivan, bus, passenger, taxi, automobile, and SUV were classified using a softmax regression classifier. Higher-level layers are included in the softmax regression classifier; nevertheless, when lower-resolution vehicle images are embedded, vehicle type information may be lost [4]. For lightweight moving vehicle classification, Nasaruddin et al. proposed an attention-based strategy with a deep CNN. The model's performance was validated on a real-time dataset using specificity, precision, and F-score. However, in situations such as the baseline and camera jitter classes and bad weather, the model's performance was limited [5]. On the BIT Vehicle Dataset and MIO-TCD, Hedeya created a new densely connected single-split super learner and applied variants of it to vehicle type classification. The model was simple and did not require any logic reasoning or hand-crafted features to obtain superior vehicle type classification performance. A prominent concern with this model is the vanishing gradient problem on big datasets [6]. The recognition and classification of vehicles in traffic footage is at the heart of such a system. For this goal, two models are developed: one based on a MoG + SVM system, and the other based on Faster RCNN, a recently popular deep learning architecture for object detection in images [7].
The AVCV model combines several components: an active contour, which determines whether or not an object is a vehicle; a Gaussian distribution, which is used for background removal; and a bilateral filter, which removes shadows and smooths the image. In addition, a Kalman filter is used to reduce image noise, and the Hough transform and Histogram of Oriented Gradients (HOG) techniques increase counting accuracy by allowing the model to discriminate between two overlapping vehicles [8]. Another work combines a road-marker detection method with a vehicle detection and counting algorithm. The algorithms are designed to process images captured by a fixed camera, and the vehicle detection and counting algorithm was deployed and evaluated on an embedded smart-camera platform [9]. Quesada and Rodriguez use an incremental principal component pursuit (PCP) algorithm to estimate, in real time, the number of vehicles present in top-view traffic video sequences. Tested on a variety of challenging datasets, the method gives results comparable to state-of-the-art methods in performance and speed: the average accuracy is 98% when counting vehicles passing through a virtual door and 91% when estimating the total number of vehicles in the scene [1]. Kulkarni and Baligar use a Raspberry Pi 3 with its camera module to detect vehicles and to monitor and predict traffic flow using low-cost hardware. They also aim to build a Raspberry Pi based remote-access system that detects, tracks, and counts vehicles only when certain alterations in
A. Ayub khan et al.
the monitored region occur [10]. Muslu and Bolat employ two flexible fixed Regions of Interest (ROIs), a headlight ROI and a taillight ROI, both of which adapt to different image and video resolutions. The two ROIs can operate concurrently in a frame to recognise oncoming and preceding vehicles at night, which is the main contribution of that work. To detect headlights and taillights, segmentation techniques and a double-thresholding approach were used to extract the red and white components from the scene [11]. In another approach, RGB road images are converted to HSV, and the value channel is compared with a predetermined threshold to decide whether the image was taken by day or by night; separate techniques are then used for daytime and night-time vehicle detection [12]. The Spatial Video Scaling (SVS) approach suits situations where the accuracy of motion estimation is critical. SVS video encoding has four stages. The first is pre-processing, in which unneeded background regions are removed. Local and global motion estimation vectors are then calculated for improved element detection, after which features are extracted using video enhancement and an SVS-based detection approach [13]. A further approach classifies the child items of a gold tree by comparing them with images stored in a database. Image collection and manual binning are the first steps, and the images are trained and tested after they have been binned. The database contains over 300 images of various classes. In the final stage, an item is sorted when the trained image matches an image in the database.
For the deep learning stage, that framework employs the AlexNet CNN, which is reported to reach 97% accuracy, higher than the existing framework based on an SVM classifier [14]. The Supervised Multi-Model Image Classification Algorithm (SMICA) is applied to assess the data in forest images [15]. In this system the affected region is identified independently, and the forest fire information is reliable because the output image intensity is better at stabilising the average value of the image. Furthermore, Finite Picture Clustering Segmentation (FICS) divides the image into small sub-areas, retaining image edges while reducing excessive noise and regions in the output image. The classifier's performance is measured in terms of accuracy [16]. In [17], the Gaussian mixture method is used for background subtraction, and the Gaussian distribution is calculated to mark the foreground pixels in the images. Two algorithms, CNN and SVM, are used for training and classification, and the combined CNN and SVM architecture gives good accuracy with higher efficiency and lower loss than other classifiers [17]. Most researchers have applied computer vision techniques to medical applications. Computer vision is also used for navigation and for detecting various types of fruits and vegetables. An SVM classifier fed with deep learning features outperforms its transfer learning competitors, and CNN architectures are fast, reliable, and robust [18].
Fig. 1 Design of the proposed system (pipeline stages: Input Frame, Background Subtraction, Vehicle Identification, Vehicle Tracking, Vehicle Classification, Vehicle Counting)
3 Proposed Methodology
This section describes a system that detects vehicles in video frames and counts them. The flow chart of the system introduces its components and the techniques they use.
A. Overview of the proposed system
A typical real-time CCTV camera records 20 to 30 frames per second; the video frames are extracted and used as input to the system. Background subtraction is applied to each input frame, using a virtual detector, to find the foreground objects and hence the vehicles [17]. The find-contours method is then used to draw shapes such as rectangles around the objects, in proportion to their size. Because the system works outdoors, where wind, rain, and lighting changes degrade the image, morphological operations are applied to keep accuracy high in harsh weather conditions. The Kalman filter is then used to track the detected objects. Tracked vehicles are counted and classified according to area thresholds for the different vehicle sizes (Fig. 1). The proposed method requires the following system configuration.
• Operating system: Windows 10
• Processor: AMD or Pentium
• RAM: 4 GB or above
• Hard disc: 500 or above
• Cache: 512 MB or above
4 Results and Discussions
The system can identify, recognise, and track vehicles in video frames, and then categorise the detected vehicles into three classes based on their size. Figure 2 shows the detailed methodology of the proposed system. Three modules make up the proposed system:
• Detecting Vehicles
• Tracking Vehicles
• Classification and Counting of Vehicles
Fig. 2 Proposed system block diagram (blocks: Surveillance Camera, Input Source Video, Background Subtraction, ROI Segmentation, Kalman Filter, Contour Extraction of Different Vehicles, Morphological Operations, Thresholding, Calculation of Centroid, Vehicle Counting, Vehicle Classification, Display Output)
5 Vehicle Identification
5.1 Background Subtraction
In video surveillance systems, background subtraction is a common technique for detecting moving objects. It segments moving objects using the difference between the background and the input image, so the key to good background subtraction is a reliable background model. In computer vision, object detection recognises objects of different classes, such as vehicles or people, in a video, and is applied to practical tasks such as detecting a person or vehicle, image search, and the analysis of surveillance footage [18]. The proposed system uses the BackgroundSubtractorMOG2 algorithm. Unlike algorithms in which the number of distributions in the background model is fixed, BackgroundSubtractorMOG2 automatically selects a suitable number of Gaussian mixture components for each pixel. It is also better at handling variations in scene lighting [17]. Background subtraction amounts to subtracting the background (target) image from the source image. The source image is a colour or greyscale image in 8-bit or 32-bit floating-point format, while the target image is a 32-bit or 64-bit floating-point image [10]. Alpha is the weight of the input image and determines how quickly the background is updated; a lower value makes the background adapt more slowly to the current frames. The method also lets the user choose whether the shadows of objects should be detected; shadow detection is enabled in the implementation's default settings. In Python, the background subtractor is created with cv2.createBackgroundSubtractorMOG2() [19].
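The role of alpha can be seen in a much simpler background model than MOG2. The sketch below is only an illustration under stated assumptions: it uses a plain running average instead of a Gaussian mixture, and the function names, the 3x3 frames, and the threshold of 30 are hypothetical.

```python
# Simplified running-average background model (illustrative assumption;
# the paper itself relies on OpenCV's MOG2 subtractor, which is more elaborate).

def update_background(background, frame, alpha):
    """Blend the new frame into the background: bg = (1 - alpha) * bg + alpha * frame."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]

def foreground_mask(background, frame, threshold):
    """Mark a pixel as foreground (1) when it differs from the background by more than threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]

# Hypothetical 3x3 greyscale frames: an empty road, then a bright object in the centre.
empty = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
with_car = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]

background = [[float(v) for v in row] for row in empty]
mask = foreground_mask(background, with_car, threshold=30)
# A small alpha means the object is absorbed into the background only slowly.
background = update_background(background, with_car, alpha=0.05)
```

With a small alpha such as 0.05, a vehicle that passes through the scene barely changes the background, whereas a large alpha would quickly absorb a stopped vehicle into it.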
5.2 Binarisation of Images
In this step, the image frame acquired from a camera fixed at one point is used as input to the algorithm. Because the frame is a colour image, it must first be converted from RGB to greyscale, which is the first and most important step in the image processing. The greyscale frame is then binarised. Binarising the image makes processing in the following stages more efficient.
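The two conversions can be sketched in a few lines. This is a minimal illustration, not the authors' code: the luminance weights are the commonly used 0.299/0.587/0.114 coefficients, and the threshold of 127 and the function names are assumptions.

```python
def to_grey(pixel):
    """Weighted sum of the R, G, B channels (standard luminance weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def binarise(rgb_image, threshold=127):
    """Return a binary image: 255 where the grey level exceeds the threshold, else 0."""
    return [[255 if to_grey(p) > threshold else 0 for p in row] for row in rgb_image]

# Hypothetical 2x2 RGB image: white, black, dark red, bright green.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 30, 30), (20, 220, 20)],
]
binary = binarise(image)
```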
5.3 Detecting the Edges
This step is crucial because the objects present in an image must be highlighted before they can be detected. Edge detection is therefore used to mark the boundaries of each object, and it is one of the most significant operations in image processing. Each image (video frame) contains three key kinds of feature that can serve detection: edges, curves, and points. Of these, edge pixels are a good choice. By processing the image pixels, we can detect edge pixels, which are the main features of passing vehicles in a highway video frame. The Canny operator, one of the most widely used edge detection methods, is employed in this study.
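The idea of marking edge pixels can be illustrated with the gradient stage alone. The sketch below is not the Canny operator: it computes only a Sobel gradient magnitude and applies a single threshold, leaving out the smoothing, non-maximum suppression, and hysteresis that Canny adds. The function name and the threshold of 100 are assumptions.

```python
import math

def sobel_magnitude(img):
    """Gradient magnitude at interior pixels using 3x3 Sobel kernels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.sqrt(gx * gx + gy * gy)
    return out

# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
img = [[0, 0, 255, 255] for _ in range(4)]
magnitude = sobel_magnitude(img)
edges = [[1 if m > 100 else 0 for m in row] for row in magnitude]
```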
5.4 Morphological Operation
In this phase, several morphological operations are applied to the image to prepare it for vehicle detection in the next step. Dilation and erosion are the most fundamental morphological operations.
• Dilation adds pixels to the boundaries of objects in an image. The dilation of A by B is written A ⊕ B. It is the opposite of erosion: regions expand beyond their borders.
• Erosion removes pixels from the boundaries of objects, so regions shrink at their edges.
Python code for morphological operations:
# imBin and imBin2 are the binarised masks; kernelOp and kernelCl are the structuring elements
# Opening (erode -> dilate) to remove noise
mask = cv2.morphologyEx(imBin, cv2.MORPH_OPEN, kernelOp)
mask2 = cv2.morphologyEx(imBin2, cv2.MORPH_OPEN, kernelOp)
# Closing (dilate -> erode) to join white regions
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernelCl)
mask2 = cv2.morphologyEx(mask2, cv2.MORPH_CLOSE, kernelCl)
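What opening does to a noisy mask can be shown without OpenCV. The following sketch assumes a 3x3 square structuring element and ignores neighbours that fall outside the image; the function names and the 5x5 mask are hypothetical.

```python
def _neighbourhood(img, y, x):
    """Values in the 3x3 window around (y, x), clipped to the image bounds."""
    h, w = len(img), len(img[0])
    return [img[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[1 if any(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def erode(img):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[1 if all(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def opening(img):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(img))

# Hypothetical mask: a 3x3 vehicle blob plus one noise pixel in the top-right corner.
noisy = [
    [0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
opened = opening(noisy)
```

The noise pixel disappears during erosion and is not restored by dilation, while the blob regains its original extent.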
5.5 Extraction of Contours
Contours are the boundaries of a shape and are used for shape detection and recognition. The accuracy of contour identification depends on the Canny edge detection performed on the binary image. The cv2.findContours() function in OpenCV is used to locate contours. Python code for extracting contours:
contours0, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
for cnt in contours0:
    cv2.drawContours(frame, [cnt], -1, (0, 255, 0), 3, 8)  # outline the contour on the frame
    area = cv2.contourArea(cnt)  # contour area, compared against the size threshold
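The area used for thresholding can be computed from the contour points with the shoelace formula. The sketch below is a stand-alone illustration; the function name, the two example contours, and the MIN_AREA value are assumptions, not values from the paper.

```python
def contour_area(points):
    """Area of a closed polygon given as (x, y) vertices, by the shoelace formula."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical contours: a vehicle-sized rectangle and a small noise blob.
contours = [
    [(0, 0), (40, 0), (40, 20), (0, 20)],
    [(0, 0), (3, 0), (3, 3), (0, 3)],
]
MIN_AREA = 100  # assumed threshold; in practice it is tuned to the video
kept = [c for c in contours if contour_area(c) >= MIN_AREA]
```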
Objects that satisfy the specified threshold are shown in Fig. 3. Vehicles are not the only things that move along or across the road: there are also pedestrians, animals, and people pushing carts, among others. In addition, the size and shape of objects change as they travel across the screen. The masked image of the identified objects is shown in Fig. 4. Masking is used to highlight specific objects in a video frame. Objects whose region is smaller than the threshold value are not considered. Selecting a Region of Interest (ROI) helps both the detection of vehicles and their subsequent tracking.
Fig. 3 Video frame image threshold
Fig. 4 Masked image for video frame
6 Tracking Vehicles
Tracking determines the path taken by a target object in near real time, for traffic monitoring and safety management, without human intervention. Its main objective is to recognise the target objects across a sequence of frames. In such settings objects change shape and size over time, so a motion model is needed to reconstruct their trajectories. Object tracking is a mechanism used in computer vision to locate a recognised object. The Kalman filter, an efficient recursive technique for tracking a moving object in video frames, was used for object tracking in this study. It takes the object identified in the previous frame and gives an updated estimate of the object's position. Passing vehicles can then be counted and classified using their bounding boxes. Edge detection in road video gives an inexact location for moving vehicles, so the information about a vehicle's present position has to be improved. Because perfect measurements cannot be guaranteed owing to object movement, the measurements are filtered to produce the most accurate estimate of the true track. By limiting noise disturbances, the Kalman filter can optimally estimate each vehicle's present position and also predict the position of vehicles in future video frames. It is also used to prevent vehicles moving in the opposite direction from being tracked in footage shot on the road. Although edge detection can identify moving objects, the Kalman filter uses a series of localisation measurements to compute an optimal position estimate. The linear Kalman filter is the simpler variant and is the one employed in the proposed method.
Let A be the area of the vehicle's bounding box, obtained during the frame differencing process, and let p(x, y) be the vehicle's centre point, where x and y are its horizontal and vertical coordinates. Combining these parameters gives the state vector and the measurement vector of Eqs. (1) and (2):

$$x_k = [x,\; y,\; A,\; v_x,\; v_y,\; v_A]^{T} \tag{1}$$

$$y_k = [x,\; y,\; A]^{T} \tag{2}$$

where v_A is the rate at which the area of the vehicle's bounding box changes, and v_x and v_y are the velocities of the vehicle's centre point. The Kalman filter can then be applied, so that the position of each vehicle is better estimated and tracked. Finally, a unique identifier is assigned to each passing vehicle, which is used for counting and classification [15].
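The predict and update cycle of a linear Kalman filter can be sketched for a single coordinate and its velocity. This is a simplified illustration, not the six-dimensional filter of Eqs. (1) and (2): it assumes a constant-velocity model, treats one coordinate independently of the others, and uses hypothetical noise settings q and r and a made-up measurement sequence.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-3, r=4.0):
    """One predict/update cycle for state (position, velocity) with a position measurement z."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov

    # Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
    pos_pred = pos + dt * vel
    vel_pred = vel
    a = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    b = p01 + dt * p11
    c = p10 + dt * p11
    d = p11 + q

    # Update: measurement matrix H = [1, 0], measurement noise variance r
    s = a + r                      # innovation variance
    k0, k1 = a / s, c / s          # Kalman gain
    residual = z - pos_pred
    new_state = (pos_pred + k0 * residual, vel_pred + k1 * residual)
    new_cov = (((1 - k0) * a, (1 - k0) * b),
               (c - k1 * a, d - k1 * b))
    return new_state, new_cov

# Hypothetical track: a centroid moving 5 pixels per frame, measured with +/-1 pixel error.
state = (0.0, 0.0)
cov = ((100.0, 0.0), (0.0, 100.0))   # large initial uncertainty
for t in range(1, 21):
    noise = 1.0 if t % 2 else -1.0
    state, cov = kalman_step(state, cov, 5.0 * t + noise)
```

After twenty frames the velocity estimate settles close to the true 5 pixels per frame, even though velocity is never measured directly.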
6.1 Classification and Counting of Vehicles
Vehicles are counted when they leave the frame or cross a counting line at one of the frame's exit points. Two counting lines are used to count vehicles travelling in opposite directions: the red line for the down count and the blue line for the up count. In the counting stage, a counter keeps the total for each direction. A vehicle that stops or turns away inside the detection zone is therefore not counted. Counting is based on the number of moving vehicles identified in the detection zone, as shown in Fig. 5.
Fig. 5 System for vehicle counting and detection (labels: Surveillance Camera, Surveillance Area, BLUE LINE for UP, RED LINE for DOWN)
The following Python code sets the lines used to count vehicles moving in the upward and downward directions:
# h is the frame height in pixels
line_up = int(2*(h/5))
line_down = int(3*(h/5))
up_limit = int(1*(h/5))
down_limit = int(4*(h/5))
print("Red line y:", str(line_down))
print("Blue line y:", str(line_up))
# colour tuples as given in the source; note that OpenCV reads them in BGR order
line_down_color = (255,0,0)
line_up_color = (0,0,255)
Vehicles are classified by size. If the perimeter of the bounding box is less than 300, the vehicle is considered a motorcycle; if it is less than 500, a car; and otherwise a bus or truck. When a vehicle is counted, the count is displayed as 1 and the vehicle's area, height, and width are computed and shown on the output screen, as illustrated in Fig. 6. The count increases as more vehicles are identified. If no vehicle is identified, the value is set to 0 and detection continues with the following frames. Figure 7 shows the vehicles counted and classified with the given thresholds. It also shows how the red and blue lines of the region of interest and the centroid are used to determine the area of each vehicle; once a vehicle has passed the red line, frame differencing is used to count it. The vehicle would be tallied after crossing the
Fig. 6 Information of vehicles crossed
Fig. 7 Count of vehicles on upward and downward direction
Table 1 Experimental results of the proposed method, which counts vehicles with a high level of precision

a      b    c    d     e     f     g (%)   h (%)
70     1    4    5     5     5     100     100
200    3    8    10    10    10    100     100
460    12   7    18    19    19    95      100
1503   17   21   38    40    42    95      95.32
1821   34   48   82    85    81    96.48   95.29
2502   67   45   112   115   113   97.39   98.26

a: total number of input frames
b: count of vehicles moving upwards
c: count of vehicles moving downwards
d: actual total number of vehicles
e: number of vehicles detected
f: number of vehicles tracked
g: accuracy of vehicle detection
h: accuracy of vehicle tracking
blue line. The method works in both bi-directional and single-direction scenarios. If no vehicles are identified, the binary image is stored as 0 (Table 1). The rectangle size thresholds are set according to the video:
# Upward direction: classify by bounding-box perimeter
if perimeter < 300:
    UpMTR += 1
elif perimeter < 500:
    UpLV += 1
else:
    UpHV += 1
# Downward direction: classify by bounding-box area (large)
if large < 400000:
    DownMTR += 1
elif large < 600000:
    DownLV += 1
else:
    DownHV += 1
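The line-crossing test and the size classification can be combined in a short stand-alone sketch. It rests on several assumptions: image y coordinates grow downward, a vehicle is counted once when its centroid crosses the relevant line, the perimeter thresholds are applied in both directions for simplicity, and the track data and function names are hypothetical. The labels MTR, LV, and HV follow the counter names used above.

```python
def classify_by_perimeter(perimeter):
    """Size class from the bounding-box perimeter, using the 300 and 500 thresholds."""
    if perimeter < 300:
        return "MTR"
    if perimeter < 500:
        return "LV"
    return "HV"

def crossing_direction(prev_y, curr_y, line_up, line_down):
    """Return 'up' or 'down' when the centroid crosses the matching line, else None."""
    if prev_y > line_up >= curr_y:      # moving towards the top of the frame
        return "up"
    if prev_y < line_down <= curr_y:    # moving towards the bottom of the frame
        return "down"
    return None

h = 500                                  # assumed frame height in pixels
line_up = int(2 * (h / 5))
line_down = int(3 * (h / 5))

# Hypothetical tracks: centroid y per frame and bounding-box perimeter.
tracks = [
    {"ys": [260, 230, 190], "perimeter": 420},   # crosses the up line
    {"ys": [250, 280, 310], "perimeter": 900},   # crosses the down line
    {"ys": [240, 240, 240], "perimeter": 250},   # stationary, never counted
]

counts = {d: {"MTR": 0, "LV": 0, "HV": 0} for d in ("up", "down")}
for track in tracks:
    for prev_y, curr_y in zip(track["ys"], track["ys"][1:]):
        direction = crossing_direction(prev_y, curr_y, line_up, line_down)
        if direction:
            counts[direction][classify_by_perimeter(track["perimeter"])] += 1
            break                        # count each vehicle only once
```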
7 Conclusion
We presented a centralised approach for vehicle detection, classification, and counting. A background subtraction algorithm, a common means of improving vehicle detection, is used to remove extraneous information so that vehicles are distinguished more accurately. According to the experimental findings using OpenCV, the accuracy is 99% for object detection and 98.6% for object tracking. The results in Table 1 indicate that the proposed approach detects, tracks, classifies, and counts moving vehicles precisely and effectively, even in conditions such as darkness, snow, or sand.
References
1. Quesada J, Rodriguez P (2016) Automatic vehicle counting method based on principal component pursuit background modeling. In: IEEE international conference on image processing (ICIP), pp 3822–3826
2. Abdelwahab MA (2019) Accurate vehicle counting approach based on deep neural networks. In: 2019 International conference on innovative trends in computer engineering (ITCE). IEEE, New York, pp 1–5
3. Wang Y, Ban X, Wang H, Wu D, Wang H, Yang S, Liu S, Lai J (2019) Detection and classification of moving vehicle from video using multiple spatio-temporal features, recent advances in video coding and security. IEEE Access 7:80287–80299
4. Awang S, Azmi NMAN, Rahman MA (2020) Vehicle type classification using an enhanced sparse-filtered convolutional neural network with layer-skipping strategy. IEEE Access 8:14265–14277
5. Nasaruddin N, Muchtar K, Afdhal A (2019) A lightweight moving vehicle classification system through attention-based method and deep learning. IEEE Access 7:157564–157573
6. Hedeya MA, Eid AH, Abdel-Kader RF (2020) A super-learner ensemble of deep networks for vehicle-type classification. IEEE Access 8:98266–98280
7. Arinaldi A, Pradana JA, Gurusinga AA (2018) Detection and classification of vehicles for traffic video analytics. In: INNS conference on big data and deep learning
8. Chhadikar N, Bhamare P, Patil K, Kumari S (2019) Image processing based tracking and counting vehicles. In: 2019 3rd International conference on electronics, communication and aerospace technology (ICECA)
9. Li D, Liang B, Zhang W (2014) Real-time moving vehicle detection, tracking, and counting system implemented with OpenCV. In: 2014 4th IEEE international conference on information science and technology
10. Kulkarni AP, Baligar VP (2020) Real time vehicle detection, tracking and counting using Raspberry-Pi. In: Proceedings of the second international conference on innovative mechanisms for industry applications (ICIMIA 2020). IEEE Xplore Part Number: CFP20K58-ART; ISBN: 978-1-7281-4167
11. Muslu G, Bolat B (2019) Nighttime vehicle tail light detection with rule based image processing. In: 2019 Scientific meeting on electrical electronics & biomedical engineering and computer science (EBBT), Istanbul, Turkey, pp 1–4. https://doi.org/10.1109/EBBT.2019.8741541
12. Chowdhury PN, Chandra Ray T, Uddin J (2018) A vehicle detection technique for traffic management using image processing. In: 2018 International conference on computer, communication, chemical, material and electronic engineering (IC4ME2), Rajshahi, 2018, pp 1–4. https://doi.org/10.1109/IC4ME2.2018.8465599
13. Ayub Khan A, Srinivasan P, Sree Southry S, Vinayagapriya S (2020) Adequate improvement for spatial video scaling for video surveillance applications. J Crit Rev 7(6):441–445. ISSN: 2394-5125
14. Sabeenian RS, Paramasivam ME, Anand R (2020) Efficient gold tree child items classification system using deep learning. J Adv Res Dyn Control Syst 12:1845–1859
15. Sree Southry S, Vinayagapriya S, Ayub Khan A, Srinivasan P (2020) A highly accurate and fast identification of forest fire based on supervised multi model image classification algorithm (SMICA). J Crit Rev 7(6). ISSN: 2394-5125
16. Kumar RA, Sai Tharun Kumar D, Kalyan K, Rohan Ram Reddy B (2020) Vehicle counting and detection. Int J Innov Technol Explor Eng (IJITEE) 9(8). ISSN: 2278-3075
17. Sharma R, Sungheetha A (2021) An efficient dimension reduction based surveillance. J Soft Comput Paradigm (JSCP) 3(02):55–69
18. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Processing (JIIP) 3(02):100–117
19. Justin R, Kumar R (2018) Vehicle detection and counting method based on digital image processing in Python. Int J Electr Electron Comput Sci Eng Special Issue ICSCAAIT-2018
Classification of Indian Monument Architecture Styles Using Bi-Level Hybrid Learning Techniques
Srinivasan Kavitha, S. Mohanavalli, B. Bharathi, C. H. Rahul, S. Shailesh, and K. Preethi
Abstract India is known for its rich architectural and cultural inheritance. In this research work, classification of the Indian historical monuments has been attempted in two levels, primarily based on the time period of its construction and secondly based on its architectural style, using machine learning (ML) and deep learning (DL) techniques. A rich corpus of monument images of varying historical periods and architectural styles has been collected from the web, blogs, and tourism websites. Feature extraction methods such as speeded-up robust feature (SURF), scale-invariant feature transform (SIFT), features from accelerated segment test (FAST), oriented FAST and rotated brief (ORB), Hu moments, image moments, and Zernike moments are used to extract the features from the input images. ML techniques such as decision tree algorithm (ID3) were used for time period classification in the first level and support vector machine (SVM) for classification of architectural style in the second level. Convolutional neural network (CNN) is designed and validated for training and testing images for both the levels of classification to observe the performance of DL technique in monument classification. Results of this research show that features extracted using Hu moments and Zernike moments resulted in improved accuracy compared to other feature sets. CNN outperforms ML approaches in overall accuracy in both levels of monument classification and also for few specific architectural styles.
Keywords Indian monuments · Architectural styles · Decision tree algorithm · Support vector machine algorithm · Convolutional neural network technique
S. Kavitha (✉) · B. Bharathi · K. Preethi: Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
S. Mohanavalli: Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
C. H. Rahul: Freshworks, Chennai, India
S. Shailesh: Ajira Tech of Symbion Technologies, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_32
1 Introduction
India is a country that has seen the rise and fall of several dynasties through the ages and has a rich architectural and cultural inheritance. The history and heritage of every kingdom are preserved in the monuments and buildings that were built during the reign of its rulers. Thousands of such monuments, built centuries ago, still stand in the country today as a record of our heritage. The philosophies and historic details of these prestigious monuments have to be preserved and conserved for future generations. Today's generation is not well informed about the historical significance or the architectural patterns of our monuments. There is a social need to provide adequate information about the significance of monuments and to maintain it as a system that preserves this history and teaches it to coming generations. It is therefore essential to develop a forum or system that identifies monuments and provides their historic details to the interested community. Nowadays, people commonly travel to tourist places every year and share photos of themselves at these places on social media. An automated system that describes the details and history associated with an input image of a place would be very helpful in promoting tourism. This research attempts to study the characteristics and features of monument images and to classify the monuments based on the time period of construction and the architecture. It also aims to provide a brief description of the architectural style, which tourists and the general public could use to learn about the monuments in India.
2 Related Work
Monuments speak about the heritage of a place and the history and culture of the era in which they were built. They have to be preserved for future generations both physically and digitally, with all associated information. Such information is also useful for restoring a monument damaged by natural calamities, fire accidents, etc. A corpus of monument images tagged with related information is therefore essential for transferring the knowledge of our heritage to future generations [1, 2]. Feature extraction and selection play a major role in classifying monument images. Extracting relevant features for image identification task is a challenging research
issue even today. Researchers have attempted to address this issue in classifying monument images using various methods. Edward and Tom [3] used a machine learning approach to identify corners in an image using the FAST feature detector and a decision tree classifier. Giuseppe et al. [4] used visual features to identify prominent landmarks such as monuments and buildings, and tested both global and local features. Grigorios et al. [5], in a task of classifying monuments in Greece, used a graph-based visual saliency classifier to retain only valid SURF and SIFT keypoints, which reduced the matching time in image classification. Bhatt and Patiala [6] attempted to classify ten popular Indian monuments. Their system extracts features from monument images using local binary pattern (LBP), histogram, co-occurrence matrix, and Canny edge detection methods and classifies them with an SVM classifier. A dataset of 500 images (50 images for each of the 10 most popular monuments in India) was used in the classification. Content-based image retrieval (CBIR) is a vision-based technique commonly used for automatic classification of archaeological monuments. In the work of Desai et al. [7], shape-based features extracted using morphological operations and moment invariants, together with texture features from GLCM, were used to study the art form and retrieve similar images; a corpus of 500 images in 5 categories (mosque, church, and temples of the Hampi, Kerala, and South styles) was used as the reference collection. Sahay et al. [8] classified the architectural styles of monuments into the categories 'Indo-Islamic', 'Maratha', 'British', 'Sikh', and 'Ancient'. The monument dataset was populated from Web sites and various blogs.
The oriented FAST and rotated brief (ORB) feature extraction technique was used to obtain the local features of each image, which were then quantized using clustering techniques. Both image-wise and descriptor-wise classifications were tried, and overlap between the architectural styles was observed as an issue that has to be addressed with more discriminating descriptors. Shape- and texture-based features were used by Ghildiyal et al. [9] to implement a bag-of-words technique that performs better in classifying a given archaeological image. A corpus of 459 monument images, manually cropped to remove the background, was classified into 9 categories. Kavitha et al. [10] combined second-order statistical and textural features of images for bi-level image classification, using a fuzzy classifier designed with decision tree and genetic algorithms. In the traditional methods, certain predetermined features such as GIST, LBP, histogram of oriented gradient (HOG), and SIFT are extracted from the input images of the different classes and used to train a classifier. The performance of such classification depends on how well these predetermined features discriminate between classes. To overcome the problem of identifying the optimal set of discriminating features for a given set of input images, deep learning techniques based on the convolutional neural network (CNN) architecture have evolved.
Recently, researchers have widely used pretrained neural networks in many applications. Gada et al. [11] applied transfer learning to the Inception v3 deep neural network architecture to perform monument classification. Saini et al. [12] compared the performance of conventional machine learning algorithms and a deep learning approach in monument classification. Their dataset consists of 5000 images (50 different images for a monument), which were cropped manually and used in a CNN that gave an improved accuracy of 97%. Yoshimura et al. [13] proposed a deep CNN that learns the visual similarities in architectural design in order to identify the architect. An unsupervised deep learning method called DeepBit was used by Lin et al. [14] to learn the discriminating binary descriptors of an image. Llamas et al. [15] used a CNN to classify elements of architectural heritage images. Palma [16] developed a mobile application that uses deep learning techniques to identify a monument in an image and obtain its details from a prestored database. The study of related work shows that only single-level classification has been performed, with a small number of architectural styles, in both machine learning and deep learning methods. Selecting discriminating features also plays a vital role in monument classification. In addition, in CBIR-based work, image retrieval is performed without a brief description of the retrieved image. To address these limitations, this research focuses on a bi-level classification model that classifies the time period of a monument and its architectural style, and gives a brief historical description of the image.
3 Proposed System The classification of monuments based on time period and architectural style is done with two different approaches, namely machine learning and deep learning. The dataset consists of images of famous monuments in India, taken from various sources on the Internet. These images belong to different time periods, and there are 2–3 architectural styles under each time period, as shown in Fig. 1.

Fig. 1 Architecture style of monuments during various time periods
Fig. 2 Proposed bi-level classifier design using machine learning approach for training set and test set

The proposed bi-level classifier shown in Fig. 2 gives the framework of the training and testing phases using the machine learning approach. Pre-processing is done using the Canny edge detection method to extract the dominant edges. The features are extracted from the images using various feature extraction methods such as SIFT, SURF, ORB, Hu moments, Zernike moments, and image moments. This process is repeated for images in both the training and testing datasets. The features extracted from the training images are used to build a decision tree that classifies the time period using the ID3 algorithm in the first level. These feature sets are then grouped by time period for second-level classification. Individual SVM models are built for each time period to classify the architectural styles of that period. In the testing phase, the extracted features of the test images are given to the ID3 model to get the time period of the monument. The SVM model corresponding to the time period identified in the first level is used to classify the architectural style of the monument. In addition to the time period and architectural style of the monument, its historical description is also provided.
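The two-stage routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `BiLevelClassifier`, the toy rule functions, and the feature values are hypothetical stand-ins. In the proposed system the first level is an ID3 decision tree and each second-level model is an SVM trained on the extracted features.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


class BiLevelClassifier:
    """Route a feature vector through a period model, then a per-period style model."""

    def __init__(self, period_model: Classifier,
                 style_models: Dict[str, Classifier]) -> None:
        self.period_model = period_model    # level 1 (ID3 in the paper)
        self.style_models = style_models    # level 2 (one SVM per period in the paper)

    def predict(self, features: Features) -> Tuple[str, str]:
        period = self.period_model(features)
        # Only the model trained for the predicted period is consulted.
        style = self.style_models[period](features)
        return period, style


# Hypothetical toy rules standing in for trained models.
def toy_period_model(f: Features) -> str:
    return "Post-Mahajanapadas" if f[0] < 0.5 else "European era"


toy_style_models: Dict[str, Classifier] = {
    "Post-Mahajanapadas": lambda f: "Cave" if f[1] < 0.5 else "Stupa",
    "European era": lambda f: "British" if f[1] < 0.5 else "French",
}

clf = BiLevelClassifier(toy_period_model, toy_style_models)
print(clf.predict([0.2, 0.9]))
```

One consequence of this design is that a first-level error cannot be recovered at the second level, because the style model of the wrong period is selected.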
3.1 Feature Extraction This section discusses the feature extraction methods used: speeded-up robust features (SURF), scale-invariant feature transform (SIFT), and oriented FAST and rotated BRIEF (ORB) as keypoint descriptors, and Hu, image, and Zernike moments as shape-based descriptors. The algorithms of the keypoint descriptors are given below.

Algorithm 1 Scale-Invariant Feature Transform (SIFT)
Input: Dataset image matrix I, derived from the input image
Output: Keypoint descriptor vector K
function SIFT(I)
  Read the image matrix I
  for all points in I do
    Compute the Gaussian scale space using the difference-of-Gaussian (DoG) function.
    Convolve the image iteratively with a Gaussian kernel (σ = 1.6).
    Downsample the convolved image by 2 in all directions.
    Repeat the convolution step on the downsampled image.
  end for
  Find the candidate keypoints C using the local extrema (maxima or minima) of the DoG function.
  for all candidate points Ci in C do
    Localise the keypoints K by fitting a 3D quadratic function to the scale-space local sample point (interpolated extrema).
    Filter unstable keypoints from K (noise and edge points).
  end for
  Assign one or more orientations to each keypoint Ki based on the gradient directions (multiple keypoints are created at the same location but with different orientations).
  Construct the keypoint descriptor K as a 128-dimensional vector using image gradients.
  return K
end function
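The scale-space construction in Algorithm 1 (blur with a base σ of 1.6, then downsample by 2) can be illustrated by listing the blur levels and image sizes per octave. This is a sketch under assumptions that the text does not state: 3 intervals per octave, 4 octaves, and a 300 × 300 input are illustrative values, and `scale_space_plan` is a hypothetical helper. The blur values are expressed relative to the original image resolution.

```python
from typing import List, Tuple


def scale_space_plan(width: int, height: int, sigma0: float = 1.6,
                     intervals: int = 3, octaves: int = 4
                     ) -> List[Tuple[Tuple[int, int], List[float]]]:
    """Return, per octave, the image size and the Gaussian blur levels.

    Each octave holds intervals + 3 Gaussian images, which give
    intervals + 2 difference-of-Gaussian (DoG) images.
    """
    k = 2.0 ** (1.0 / intervals)          # blur ratio between adjacent levels
    plan = []
    for o in range(octaves):
        size = (width >> o, height >> o)  # downsample by 2 per octave
        sigmas = [sigma0 * (2 ** o) * (k ** i) for i in range(intervals + 3)]
        plan.append((size, sigmas))
    return plan


plan = scale_space_plan(300, 300)
for size, sigmas in plan:
    print(size, [round(s, 2) for s in sigmas], "DoG images:", len(sigmas) - 1)
```

Candidate keypoints are then sought as local extrema across neighbouring DoG images within each octave.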
Algorithm 2 Speeded-Up Robust Features (SURF)
Input: Dataset image matrix I, derived from the input image
Output: Keypoint descriptor vector K
function SURF(I)
  Read the image matrix I
  for all points in I do
    Detect the interest points using the local maxima of the Hessian determinant operator applied to the scale space.
    Validate the detected interest point against the chosen threshold value.
  end for
  Construct the actual descriptor (64 dimensions) from the spatial localization grid using a local histogram of the Haar wavelet responses.
  Compare the descriptors of both images using Euclidean distance and select all potential matching pairs to construct the keypoint descriptor K.
  return K
end function
Algorithm 3 Oriented FAST and Rotated BRIEF (ORB)
Input: Dataset image matrix I, derived from the input image
Output: Keypoint descriptor vector K
function ORB(I)
  Read the image matrix I
  FAST(I)
  BRIEF(I)
  Find the keypoints from Features from Accelerated Segment Test (FAST) and Binary Robust Independent Elementary Features (BRIEF) and construct K.
  return K
end function
function FAST(I)
  Read the image matrix I
  Select the top keypoints (multiscale features) of I using the Harris corner measure.
  Validate the corner points using a Bresenham circle of radius 3.
  return Kf
end function
function BRIEF(I)
  Read the image matrix I
  Select a set of ND (x, y) location pairs and compare the pixel intensities of each pair.
  ND can be 128, 256, or 512, which gives a descriptor of 16, 32, or 64 bytes.
  return Kb
end function
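The BRIEF step of Algorithm 3 can be sketched in plain Python to show why ND = 128, 256, or 512 intensity tests yield 16, 32, or 64 bytes. The uniformly random pair sampling, the fixed seed, the 31 × 31 synthetic patch, and the helper names below are assumptions for illustration; ORB itself selects its test pairs differently and steers them by the keypoint orientation.

```python
import random
from typing import List, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def sample_pairs(n_d: int, patch: int, seed: int = 0) -> List[Pair]:
    """Draw n_d random (x, y) location pairs inside a patch x patch window."""
    rng = random.Random(seed)

    def pt() -> Tuple[int, int]:
        return rng.randrange(patch), rng.randrange(patch)

    return [(pt(), pt()) for _ in range(n_d)]


def brief_descriptor(patch_px: List[List[int]], pairs: List[Pair]) -> bytes:
    """One bit per pair: 1 if the first point is darker than the second."""
    bits = 0
    for (x1, y1), (x2, y2) in pairs:
        bits = (bits << 1) | (1 if patch_px[y1][x1] < patch_px[y2][x2] else 0)
    return bits.to_bytes(len(pairs) // 8, "big")


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits, the usual distance for binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical 31 x 31 synthetic patch with a horizontal intensity ramp.
PATCH = 31
patch_px = [[x * 8 for x in range(PATCH)] for _ in range(PATCH)]

for n_d in (128, 256, 512):
    desc = brief_descriptor(patch_px, sample_pairs(n_d, PATCH))
    print(n_d, "tests ->", len(desc), "bytes")
```

Because the descriptor is a bit string, two descriptors are compared with the Hamming distance rather than the Euclidean distance used for SIFT and SURF.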
Moments are shape-based descriptors used to extract information such as centroid and intensity from the images. Flusser and Suk [17] have described the effective calculation of image moments. Various moments, namely Hu, image, and Zernike moments, are used to extract features from the training and testing images [18], and a combination of these moments is used to increase the accuracy of the system.
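As a small worked example of the moment features mentioned above, the sketch below computes raw moments, the centroid, and the first Hu invariant for a synthetic binary image, and shows that the invariant does not change when the shape is translated. The helper names and the 16 × 16 test images are hypothetical. Only the first of the seven Hu invariants is shown, and Zernike moments are not covered.

```python
from typing import List, Tuple

Image = List[List[float]]


def raw_moment(img: Image, p: int, q: int) -> float:
    """M_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def centroid(img: Image) -> Tuple[float, float]:
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img: Image, p: int, q: int) -> float:
    """mu_pq: like M_pq, but measured from the centroid."""
    cx, cy = centroid(img)
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu1(img: Image) -> float:
    """First Hu invariant: eta_20 + eta_02, with eta_pq = mu_pq / mu_00^(1 + (p+q)/2)."""
    mu00 = central_moment(img, 0, 0)
    return (central_moment(img, 2, 0) + central_moment(img, 0, 2)) / mu00 ** 2


def blank(n: int) -> Image:
    return [[0.0] * n for _ in range(n)]


def draw_rect(img: Image, x0: int, y0: int, w: int, h: int) -> Image:
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            img[y][x] = 1.0
    return img


a = draw_rect(blank(16), 2, 3, 4, 6)    # rectangle near the top-left
b = draw_rect(blank(16), 9, 8, 4, 6)    # same rectangle, shifted

print(centroid(a), centroid(b))
print(hu1(a), hu1(b))
```

The two centroids differ while the two invariant values agree, which is the property that makes such moments usable as shape descriptors regardless of where the monument sits in the frame.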
3.2 Classification A bi-level classification is carried out using ID3 algorithm in the first level for time period and SVM in the second level for architecture style using the algorithms given in Kavitha et al. [19]. In addition, CNN is designed and implemented for both the levels to validate the importance of deep learning in monument classification.
3.3 Convolutional Neural Network A convolutional neural network (CNN) is trained with a reduced number of parameters and makes the model invariant to translation, which makes it well suited to classifying monument images. For first-level classification (time period), the CNN is a five-layer network comprising three convolutional layers, each followed by a pooling layer, with 32, 32, and 64 filters, respectively. Rectified Linear Unit (ReLU) is the activation function, and the optimizer is the gradient-based RMSprop technique. The images are convolved with the filters at each convolutional layer, and the feature maps obtained at the end of each convolutional layer serve as the input to the next layer. The max-pooling layer halves each dimension of a feature map, so the new feature map retains the prominent features of the previous one. After the first convolutional layer, pooling reduces the 298 × 298 feature map to 149 × 149. This process is repeated for all three convolutional layers along with their pooling layers. The resulting 3D feature maps are converted to a 1D feature vector at the flatten layer. The first dense (fully connected) layer uses ReLU as the activation function. The dropout is set to 0.5, which increases the accuracy with a gradual decrease in the loss. The second dense layer uses softmax activation over the five time-period classes. These parameters are set for training the CNN, with 6197 training samples and 1556 test samples. The model has 5,046,564 trainable parameters, which capture the prominent features used to classify the monuments by time period. The model is trained with a batch size of 16 for 5 epochs. For second-level classification (architecture style), the CNN has five convolutional layers with 64, 32, 32, 16, and 16 filters, respectively, with ReLU activation, and each convolutional layer is followed by a max-pooling layer. The softmax layer is defined with 13 architecture style labels. The hyperparameters used for training are as follows: dropout regularization is 0.1 and batch size is 8 with 20 iterations, where the number of training samples is 460 and the number of test samples is 103.
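The feature-map sizes and the parameter count of the first-level network can be checked with a few lines of arithmetic. The sketch below assumes a 300 × 300 RGB input, 3 × 3 kernels without padding, 2 × 2 pooling, and 64 units in the first dense layer, and it reads the figures 32, 32, and 64 as filter counts. None of these values is stated explicitly in the text; they are chosen because they reproduce the reported 298 × 298 to 149 × 149 reduction.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // window


def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + biases


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out                       # weights + biases


# Assumed values (not stated in the text): 300 x 300 RGB input,
# 3 x 3 kernels, 64 units in the first dense layer.
size, channels = 300, 3
filters = [32, 32, 64]          # from the text
total = 0
sizes = []
for f in filters:
    total += conv_params(channels, f)
    size = pool_out(conv_out(size))   # convolution, then 2 x 2 pooling
    sizes.append(size)
    channels = f

flat = size * size * channels         # length of the flattened vector
total += dense_params(flat, 64) + dense_params(64, 5)   # 5 time periods
print(sizes, flat, total)
```

Under these assumptions the pooled feature maps have sides 149, 73, and 35, and the total is 5,046,629 parameters. This is close to, but not identical to, the reported 5,046,564, so at least one of the assumed values differs from the authors' configuration. Almost all of the parameters sit in the first dense layer.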
4 Experimental Results and Performance Analysis The experimental analysis of the proposed system has been done to identify the time period of the monument and its corresponding architectural style. The performance of the proposed bi-level classifier was evaluated by performing various experiments for the first-level, second-level, and bi-level classification of monuments using machine learning and deep learning techniques, for various feature extraction methods. The metrics used for evaluating the classifier were accuracy, sensitivity, and specificity.
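For a multi-class problem, the three metrics named above are typically computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows that computation; the function name and the 3-class confusion matrix are hypothetical and are not taken from the experiments.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy (in %) for class k.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class k predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / total,
    }


# Hypothetical 3-class confusion matrix, for illustration only.
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
for k in range(len(cm)):
    print(k, one_vs_rest_metrics(cm, k))
```

With many classes, true negatives dominate the one-vs-rest counts, so per-class accuracy and specificity can remain high even when sensitivity is low.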
4.1 Dataset Description Images of Indian monuments pertaining to various time periods, namely Post-Mahajanapadas, Early Common Era, Late Middle Ages, Early Modern Period, and European Era, were collected from various sources to build the corpus. A total of 563 images were used, and the description of training and testing images is given in Table 1 for five time periods and in Table 2 for 13 architectural styles under each time period.

Table 1 Dataset description: time period

| Time period | No. of training images | No. of testing images | Total images |
|---|---|---|---|
| Post-Mahajanapadas | 79 | 19 | 98 |
| Early common era | 91 | 21 | 112 |
| Late middle ages | 49 | 11 | 60 |
| Early modern period | 132 | 32 | 164 |
| European era | 109 | 20 | 129 |

Table 2 Dataset description: architectural style under each time period

| Time period | Architecture style | No. of training images | No. of testing images | Total images |
|---|---|---|---|---|
| Post-Mahajanapadas | Cave | 21 | 5 | 26 |
| Post-Mahajanapadas | Stupa | 58 | 14 | 72 |
| Early common era | Dravidian | 31 | 7 | 38 |
| Early common era | Kalinga | 30 | 7 | 37 |
| Early common era | North India | 30 | 7 | 37 |
| Late middle ages | Hoysala | 20 | 4 | 24 |
| Late middle ages | Vijayanagara | 29 | 7 | 36 |
| Early modern period | Indo-Islamic | 58 | 14 | 72 |
| Early modern period | Maratha | 42 | 10 | 52 |
| Early modern period | Sikh | 32 | 8 | 40 |
| European era | British | 61 | 8 | 69 |
| European era | French | 29 | 7 | 36 |
| European era | Portuguese | 19 | 5 | 24 |
4.2 Result of Feature Extraction Initially, image features are extracted as keypoints of the input monument images using various techniques, namely scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and oriented FAST and rotated BRIEF (ORB). The keypoints are plotted on an image and shown in Fig. 3. The comparison of the keypoints and descriptor sizes obtained is given in Table 3. However, these features were not successful for classifying monument images. Apart from keypoint features, additional image information was required to discriminate between monument architectural styles that are structurally very similar. The additional information was extracted as Hu moments, Zernike moments, and image moments.

Fig. 3 Keypoints plotted

Table 3 Comparison of keypoints and descriptor size

| Method | SIFT | SURF | ORB |
|---|---|---|---|
| No. of keypoints | 500 | 500 | 500 |
| Descriptor size | 128 | 128 | 50 |
4.3 Time-Period-Based Classification Using ID3 The features extracted using the methods mentioned in Sect. 3.1 along with the ground truth labels for the five time periods as given in Table 1 were used in building the training model. The result of ID3 training is generated as a model file, which can be used for testing.
4.4 Architecture-Based Classification Using Support Vector Machine Similarly, in the second level of classification, the class labels of the architectural styles for the five time periods are used in training. An SVM model specific to the architectural styles of each period is generated. After the time period is identified using ID3, the corresponding SVM model file is applied to identify the architectural style of the input test image.
4.5 Result of Classification The results of classification are analyzed with respect to feature extraction methods, levels of classification, and type of classification technique (ML or DL). Among the feature extraction methods, the keypoint extraction methods performed poorly in predicting the time period and the architectural style of the test images. These features are not suitable for monument classification, as the shapes of different architectural styles are very similar. The accuracy obtained for bi-level classification is 5%, 12%, and 9% for SIFT, SURF, and ORB, respectively. To improve monument identification, features were then extracted using Hu, image, and Zernike moments for the training and test set images. The size of each feature vector is 1 × 56. Using ID3 and the support vector machine, the time period and architectural style of the given input images are identified correctly. The sample input image and the corresponding output are given in Fig. 4, which shows that the system has classified the input image into the Post-Mahajanapadas period and the cave architectural style. A brief description of the identified architectural style is also displayed.
4.6 Performance Analysis—Machine Learning Techniques The machine learning techniques ID3 and SVM were used for the time period classification and architectural style classification, respectively. The performance measures obtained for the metrics specificity, sensitivity, and accuracy for the first-level (ID3 algorithm), second-level (SVM algorithm), and the combined two-level classifications were compared with the result of CNN. The results are given in Tables 4, 5, and 6, which describe each level of results with its technique for better clarification. The average sensitivity, specificity, and accuracy of first-level classification using ML technique are 51.06, 88.75, and 82.52, respectively, whereas for second-level classification, it is 4.40, 92.04, and 85.21, respectively. The sensitivity of second level is very low, because many architecture styles are similar in shape. In combined level, it has improved to 37.09, and the specificity and accuracy are 94.75 and 90.59, respectively. The proposed bi-level classification system exhibited the highest accuracy of 99% for class 1 cave architecture and the least accuracy of 72% for class 8 Indo-Islamic (Table 6).
4.7 Performance Analysis—Deep Learning Technique The performance for monument classification using deep neural networks was evaluated using the images of training and test set for first level (5 classes) and second level (13 classes). These images were validated by changing different parameters such as with/without augmentation, number of epochs, structure, optimizer, and so on. The results given in Tables 7 and 8 are obtained without augmentation, as the accuracy improved by 3–5% for this category. The average sensitivity, specificity, and accuracy of first-level classification using DL technique are 72.2, 93.4, and 90.28, respectively, whereas for second-level classification, it is 13.73, 92.61, and 92.61, respectively.
Fig. 4 (a) Input image and (b) time period and architectural style of the input image with description

Table 4 Specificity, sensitivity, and accuracy of decision tree: first-level classification

| Performance measure (%) | Post-Mahajanapadas | Early common era | Late middle ages | Early modern period | European era |
|---|---|---|---|---|---|
| Specificity | 91.6 | 90.2 | 92.3 | 80.2 | 89.1 |
| Sensitivity | 36.8 | 76.1 | 27.2 | 75 | 40 |
| Accuracy | 81.5 | 87.3 | 85.4 | 78.6 | 79.6 |
Table 5 Specificity, sensitivity, and accuracy of SVM: second-level classification

| Architecture style (class) | Specificity (%) | Sensitivity (%) | Accuracy (%) |
|---|---|---|---|
| 1 | 100 | 0 | 95.1 |
| 2 | 95.5 | 0 | 82.5 |
| 3 | 95.8 | 14.2 | 90.2 |
| 4 | 98.9 | 0 | 92.2 |
| 5 | 9.3 | 42.8 | 11.6 |
| 6 | 96.9 | 0 | 93.2 |
| 7 | 100 | 0 | 93.2 |
| 8 | 100 | 0 | 86.4 |
| 9 | 100 | 0 | 90.2 |
| 10 | 100 | 0 | 92.2 |
| 11 | 100 | 0 | 92.2 |
| 12 | 100 | 0 | 93.2 |
| 13 | 100 | 0 | 95.1 |
Table 6 Specificity, sensitivity, and accuracy of bi-level classification

| Architecture style (class) | Specificity (%) | Sensitivity (%) | Accuracy (%) |
|---|---|---|---|
| 1 | 100 | 80.0 | 99.0 |
| 2 | 92.1 | 35.7 | 84.4 |
| 3 | 100 | 14.2 | 94.1 |
| 4 | 83.3 | 57.1 | 81.5 |
| 5 | 100 | 57.1 | 97.0 |
| 6 | 100 | 25.0 | 97.0 |
| 7 | 93.7 | 14.2 | 88.3 |
| 8 | 74.1 | 64.2 | 72.8 |
| 9 | 100 | 50.0 | 95.1 |
| 10 | 100 | 12.5 | 93.2 |
| 11 | 88.4 | 37.5 | 84.4 |
| 12 | 100 | 14.2 | 94.1 |
| 13 | 100 | 20 | 96.1 |
Table 7 CNN, first-level classification, without augmentation

| Performance measure (%) | Post-Mahajanapadas | Early Common Era | Late Middle Ages | Early Modern Period | European Era |
|---|---|---|---|---|---|
| Specificity | 91.6 | 98.7 | 98.9 | 78.8 | 98.7 |
| Sensitivity | 78.9 | 61.9 | 54.5 | 90.6 | 75.0 |
| Accuracy | 89.3 | 91.2 | 94.1 | 82.5 | 94.1 |
5 Conclusion In this research, classification of Indian monuments has been attempted using machine learning and deep learning algorithms. SIFT, SURF, ORB, Hu moments, image moments, and Zernike moments are used to extract the features from the input images collected from various web pages, tourism Web sites, and blogs. Classification models have been generated using the extracted features, from the training images, for ID3 decision tree classifier (time period identification) in the first level and support vector machine (architecture style identification) in the second level, respectively. The accuracy of local feature extraction methods such as SIFT, SURF, and ORB was observed to be very poor (5%, 12%, and 9%), while the shape-based feature extraction methods using moments exhibited a high accuracy in the combined level (90.59%). From the experimental results, it is evident that shape-based feature extraction methods give better accuracy than local feature-based extraction methods. The accuracy of moments in the first level of classification is 82.52% and in the second level of classification is 85.21%, while the overall system accuracy obtained using ML techniques is 90.59%. Monument classification using DL technique showed further improvement in accuracy of classification by 8% and 7% in the first and second levels, respectively. CNN learns the discriminating features naturally from the input set of images and uses it for classifying an image, thus gives improved results for monument image classification compared to ID3 and SVM used in the bi-level classification of Indian monument images. The future aspects in extending this work are to use recent developments in image augmentation techniques like generative adversarial networks (GANs) to generate many images. 
This increased number of images may facilitate better learning of the discriminating features in the monument images, thereby improving the overall classification performance and clear understanding of the monument images with overlapping architectural styles.
Table 8 CNN, second-level classification, without augmentation

| Architecture style (class) | Specificity (%) | Sensitivity (%) | Accuracy (%) |
|---|---|---|---|
| 1 | 100 | 100 | 100 |
| 2 | 92.1 | 0 | 80.5 |
| 3 | 89.5 | 28.5 | 85.4 |
| 4 | 96.8 | 0 | 90.2 |
| 5 | 96.8 | 0 | 90.2 |
| 6 | 77.7 | 0 | 74.5 |
| 7 | 90.6 | 0 | 84.4 |
| 8 | 95.5 | 0 | 82.5 |
| 9 | 94.6 | 0 | 85.4 |
| 10 | 95.7 | 0 | 88.3 |
| 11 | 96.8 | 0 | 89.3 |
| 12 | 85.4 | 42.8 | 82.5 |
| 13 | 91.8 | 0 | 87.3 |
Acknowledgements The authors would like to acknowledge the High Performance Computing Lab, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, for providing necessary facilities for carrying out this work.
References
1. Paul AJ, Ghose S, Aggarwal K, Nethaji N, Pal S, Purkayastha AD (2021) Machine learning advances aiding recognition and classification of Indian monuments and landmarks, pp 1–8. https://arxiv.org/abs/2107.14070
2. Amato G, Falchi F, Gennaro C (2015) Fast image classification for monument recognition. J Comput Cultural Heritage 8(4):1–18
3. Rosten E, Drummond T (2006) Machine learning for high speed corner detection. In: ECCV Proceedings of the 9th European conference on computer vision, pp 430–443
4. Amato G, Falchi F, Bolettieri P (2010) Recognizing landmarks using automated classification techniques: an evaluation of various visual features. In: Second international conference on advances in multimedia, pp 78–83
5. Kalliatakis GE, Triantafyllidis GA (2013) Image based monument recognition using graph based visual saliency. ELCVIA Electron Lett Comput Vis Image Anal 12(2):88–97
6. Bhatt MS, Patalia TP (2017) Indian monuments classification using support vector machine. Int J Electr Comput Eng 7(4):1952–1963
7. Desai P, Pujari J, Ayachit NH, Prasad VK (2013) Classification of archaeological monuments for different art forms with an application to CBIR. In: International conference on advances in computing, communications and informatics, pp 1108–1112
8. Sahay T, Mehta A, Jadon S (2016) Architecture classification for Indian monuments. University of Massachusetts. https://doi.org/10.13140/RG.2.2.32105.13920
9. Ghildiyal B, Singh A, Bhadauria HS (2017) Image-based monument classification using bag-of-word architecture. In: IEEE 3rd International conference on advances in computing, communication & automation, pp 1–5
10. Kavitha S, Bharathi B, Mohana PK, Mohana PS, Sushmy VC (2016) Optimal feature set selection for brain tumor classification using genetic algorithm with support vector machine and decision tree. Asian J Res Soc Sci Hum 6(9):660–670
11. Gada S, Mehta V, Kanchan K, Jain C, Raut P (2017) Monument recognition using deep neural networks. In: IEEE International conference on computational intelligence and computing research, pp 1–6
12. Saini A, Gupta T, Kumar R, Gupta AK, Panwar M, Mittal A (2017) Image based Indian monument recognition using convoluted neural networks. In: IEEE International conference on big data, IoT and data science, pp 138–142
13. Yoshimura Y, Cai B, Wang Z, Ratti C (2019) Deep learning architect: classification for architectural design through the eye of artificial intelligence. In: International conference on computers in urban planning and urban management. Springer, Cham, pp 249–265
14. Lin K, Lu J, Chen C-S, Zhou J, Sun M-T (2018) Unsupervised deep learning of compact binary descriptors. IEEE Trans Pattern Anal Mach Intell 41(6):1501–1514
15. Llamas J, Lerones PM, Medina R, Zalama E, Gómez-García-Bermejo J (2017) Classification of architectural heritage images using deep learning techniques. Appl Sci 7(10):992
16. Palma V (2019) Towards deep learning for architecture: a monument recognition mobile app. Int Arch Photogramm Remote Sens Spatial Inf Sci
17. Flusser J, Suk T (1994) On the calculation of image moments. Pattern Recogn Lett pp 1–12
18. Mercimek M, Gulez K, Mumcu TV (2005) Real object recognition using moment invariants. Sadhana 30:765–775
19. Srinivasan K, Subramaniam M, Bhagavathsingh B (2019) Optimized bilevel classifier for brain tumor type and grade discrimination using evolutionary fuzzy computing. Turk J Electr Eng Comput Sci 27:1704–1718. https://doi.org/10.3906/elk-1804-13
20. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Processing (JIIP) 3(2):100–117
Sign Language Recognition Using CNN and CGAN Marrivada Gopala Krishna Sai Charan, S. S. Poorna, K. Anuraj, Choragudi Sai Praneeth, P. G. Sai Sumanth, Chekka Venkata Sai Phaneendra Gupta, and Kota Srikar
Abstract There is a long-standing communication barrier between hearing people and the deaf-mute community. Sign language is a major tool of communication for hearing-impaired people. The goal of this work is to develop a Convolutional Neural Network (CNN) based Indian sign language classifier. CNN models with combinations of different hidden layers are analysed, and the model giving the highest accuracy is selected. Further, synthetic data is generated using a Conditional Generative Adversarial Network (CGAN) in order to improve classification accuracy.
Keywords CNN · Indian sign language recognition · CGAN
1 Introduction The Indian census cites that there are roughly 1.3 million people with hearing impairment. They face a lot of difficulty in expressing their feelings in day-to-day life, as written communication is in many cases impersonal and impractical. This work aims at automating the recognition of signs in Indian Sign Language (ISL) using a CNN, so that the intervention of a human annotator can be minimized. CNNs are widely used for image data, since they are very good feature extractors and learn useful feature maps of the images. Modern Deep Neural Networks (DNNs) suffer from excessive linearity, due to which they struggle to capture highly non-linear characteristics of the data. While linear models are easy to train, they limit the ability of DNNs to learn highly complex non-linear functions. One way to mitigate this problem is adversarial training. In this work, the ISL dataset¹ is used for training a baseline CNN model. To improve the performance of this baseline CNN classifier, we use a CGAN to generate new synthetic samples from the ISL dataset.
¹ The dataset used in this work can be found here.
M. G. K. S. Charan · S. S. Poorna (B) · K. Anuraj · C. S. Praneeth · P. G. S. Sumanth · C. V. S. P. Gupta · K. Srikar Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_33
2 Literature Review In [1], the authors used 3D convolutions to extract features hierarchically from a video stream. Unlike ordinary 2D convolutions, 3D convolutions capture temporal as well as spatial variations. The inputs were supplied through 5 channels, which include RGB, depth, and body joint channels; multichannel inputs carry more information than gray channel inputs. The CNN architecture used three 3D convolutional layers, two subsampling layers, and two fully connected layers. Body joints were tracked with Microsoft Kinect software. Average accuracies of 88.5% for gray channel inputs and 94.2% for multichannel inputs were recorded. However, 3D convolutions are very expensive to implement on an embedded device, and the architecture works only for 5-channel inputs of fixed dimensions. The dataset used by the authors also has considerable algorithmic bias, which leads to underwhelming results in some cases.
The authors in [2] developed a hybrid learning algorithm for recognizing sign language in video sequences. The implementation had 3 phases. In phase I, dimensionality reduction was done using Principal Component Analysis (PCA) and stacked Variational Auto-encoders (VAEs). In phase II, data was sampled using a Generative Adversarial Network (GAN) with a Long Short-Term Memory (LSTM) network for video sequences and a discriminator with LSTM and 3D CNN.
In [3], the results showed that transfer learning with little fine-tuning works well for the sign language recognition task. The authors also trained a custom 3-layer CNN to compare its performance with that of the pre-trained networks. They used various state-of-the-art architectures such as ResNet152V2, InceptionResNetV2, and ResNeXt101 for the transfer learning task. Some layers of these pre-trained models were trained, while others were frozen. An accuracy of 99% for numerals and 97.6% for alphabets was obtained using the custom 3-layer CNN trained on a public ISL dataset. The same dataset on ResNet152V2 yielded an accuracy of 96.2% for numerals and 90.8% for alphabets. However, this network works only for static images with fixed dimensions.
The authors in [4] proposed different partial solutions for training GANs, viz. feature matching, one-sided label smoothing, virtual batch normalization, and mini-batch discrimination. One-sided label smoothing reduces the confidence of the discriminator by smoothing the labels in the cross-entropy loss. In virtual batch normalization, the normalization statistics of the remaining batches are calculated from the normalization statistics of the first mini-batch. One major downside is that this batch has to be reasonably large in order to capture the statistics of the input samples, which makes the forward and backward passes expensive to compute. The Inception score proposed in that paper is a poor metric when the number of classes is comparable to the number of images.
In [5], a taxonomy of sign language production is presented. The authors reviewed recently proposed models and discussed various approaches to sign language production, such as avatars, motion graphs, neural machine translation, conditional image or video generation using conditional GANs, and auto-regressive models. Neural machine translation for sign language translation is used in [6]. The RWTH-PHOENIX-Weather 2014T dataset was used, and a Seq2Seq neural network was adopted to map spatio-temporal features to another sequence such as speech or text. For encoding spatial features, a CNN and a tokenization layer were used to pass the features to an RNN. An attention layer along with an RNN was used for decoding into word embeddings. Xavier initialization and the Adam optimizer with a learning rate of 1e-5 were used in this analysis.
In [7], the authors used a bidirectional LSTM network along with a 3D residual network for sign language recognition from videos. A Fast R-CNN approach, as opposed to the YOLO and SSD algorithms, was used to detect the hand locations. Residual connections were good for learning identity functions, whereas skip connections provided an alternate pathway for the inputs. The DEVISIGN-D dataset was used, and a recognition accuracy of 89.8% was obtained. In [8], singular value decomposition was used for weight initialization of GANs. The authors reported better training stability using the Spectral Normalization GAN (SNGAN) and tested it on the CIFAR-10, STL-10, and ILSVRC2012 datasets.
In [9], the authors compared conventional machine learning techniques, which use hand-tailored features, with a convolutional neural network, which extracts features hierarchically, and showed that convolutional neural networks perform better. In [10], the authors compared the performance of various popular pre-trained CNN architectures such as ResNet-152, ResNet101, VGG16, VGG19, and AlexNet on a classification problem. In [11], the authors discussed various aspects of convolutional neural networks, including convolutional layers, types of pooling, types of activation functions, and the frameworks available for implementing them. The authors of [12] used 3D geometric processing of images with Microsoft Kinect for recognition of Indian Sign Language.
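The one-sided label smoothing mentioned in the discussion of [4] can be illustrated with a short numerical sketch. It assumes a smoothed target of 0.9 for real samples, which is a commonly used value rather than one stated in the text; the function names and the discriminator outputs are hypothetical.

```python
import math
from typing import List


def bce(pred: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction in (0, 1)."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def discriminator_loss(d_real: List[float], d_fake: List[float],
                       real_label: float = 0.9) -> float:
    """Mean loss with one-sided smoothing: real targets become real_label, fake targets stay 0."""
    losses = [bce(p, real_label) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)


# Hypothetical discriminator outputs for a small batch.
d_real = [0.99, 0.95, 0.90]
d_fake = [0.05, 0.10, 0.02]

print(discriminator_loss(d_real, d_fake, real_label=1.0))  # no smoothing
print(discriminator_loss(d_real, d_fake, real_label=0.9))  # one-sided smoothing
```

With the smoothed target, the loss on a real sample is lowest when the discriminator outputs 0.9 rather than a value close to 1, which is how the smoothing discourages overconfident predictions. Only the real labels are smoothed; the fake labels remain at 0.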
2.1 American Sign Language Versus Indian Sign Language The major difference between American Sign Language (ASL) and Indian Sign Language (ISL) is the number of hands used in making signs. A person can make signs with one hand in ASL, whereas both hands are needed in ISL, which makes ISL relatively difficult to learn. The number of datasets available for ASL is significantly larger than that available for ISL. Keeping aside the global trends, ISL is emerging as an alternative for the deaf and mute community in India.
3 Indian Sign Language Recognition Using CNN For image recognition tasks CNNs are widely used. They are basically Multi-layer perceptrons with weight sharing, viz. network can use multiple filters to learn and hence capture various aspects of images. For Indian Sign Language recognition task,
492
M. G. K. S. Charan et al.
training is carried out on 9 CNN models and compared, in order to pick the best one based on accuracy. The experimented are performed by increasing depth of the network, by varying the layers, in order to see their impact on performance of the network. All of these models are trained for 10 epochs. The dataset consisted of colour images with dimension 128 × 128. The summary of model1 is shown in Table 1. For model2 we changed the layer-3, which is Max-pooling layer of stride 2 to Average-Pooling layer of stride 2. The Summary of model2 is shown in Table 2. we removed layer-3,which is Average Pooling of stride 2 and replaced it with convolution of stride 2. The summary of the model3 is shown in Table 3. For model4, for reducing the number of parameters, we replaced the fully connected layer with Convolutional layer with input-channels = 32 and output-channels
Table 1 Summary of model1
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 128, 128]     896
BatchNorm2d-2    [−1, 32, 128, 128]     64
MaxPool2d-3      [−1, 32, 64, 64]       0
ReLU-4           [−1, 32, 64, 64]       0
Flatten-5        [−1, 131072]           0
Linear-6         [−1, 35]               4,587,555

Table 2 Summary of model2
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 128, 128]     896
BatchNorm2d-2    [−1, 32, 128, 128]     64
AvgPool2d-3      [−1, 32, 64, 64]       0
ReLU-4           [−1, 32, 64, 64]       0
Flatten-5        [−1, 131072]           0
Linear-6         [−1, 35]               4,587,555

Table 3 Summary of model3
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 64, 64]       896
BatchNorm2d-2    [−1, 32, 128, 128]     64
ReLU-3           [−1, 32, 64, 64]       0
Flatten-4        [−1, 131072]           0
Linear-5         [−1, 35]               4,587,555
Sign Language Recognition Using CNN and CGAN

Table 4 Summary of model4
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 64, 64]       896
BatchNorm2d-2    [−1, 32, 64, 64]       64
ReLU-3           [−1, 32, 64, 64]       0
Conv2d-4         [−1, 35, 64, 64]       10,115
AvgPool2d-5      [−1, 35, 1, 1]         0
Flatten-6        [−1, 35]               0

Table 5 Summary of model5
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 64, 64]       896
Dropout2d-2      [−1, 32, 64, 64]       64
ReLU-3           [−1, 32, 64, 64]       0
Conv2d-4         [−1, 35, 64, 64]       10,115
AvgPool2d-5      [−1, 35, 1, 1]         0
Flatten-6        [−1, 35]               0

Table 6 Summary of model6
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 64, 64]       896
Dropout2d-2      [−1, 32, 64, 64]       0
ReLU-3           [−1, 32, 64, 64]       0
Conv2d-4         [−1, 64, 64, 64]       18,496
BatchNorm2d-5    [−1, 64, 64, 64]       128
ReLU-6           [−1, 64, 64, 64]       0
Conv2d-7         [−1, 35, 64, 64]       20,195
AvgPool2d-8      [−1, 35, 1, 1]         0
Flatten-9        [−1, 35]               0
= 35, followed by a global average pooling layer and a flattening layer. The summary of model4 is shown in Table 4. For model5, we replaced the batch normalization layer with a dropout layer of probability 0.4. The summary of model5 is shown in Table 5. For model6, we removed layer-3 and inserted a convolutional layer with input-channels = 32 and output-channels = 64, followed by batch normalization, a ReLU activation layer, a convolutional layer with input-channels = 64 and output-channels = 35, global average pooling and a flattening layer. The summary of model6 is shown in Table 6. For model7, we replaced layer-5, the batch normalization layer, with a dropout layer of probability 0.4. The summary of model7 is shown in Table 7. We made
Table 7 Summary of model7
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 64, 64]       896
Dropout2d-2      [−1, 32, 64, 64]       0
ReLU-3           [−1, 32, 64, 64]       0
Conv2d-4         [−1, 64, 64, 64]       18,496
Dropout2d-5      [−1, 64, 64, 64]       128
ReLU-6           [−1, 64, 64, 64]       0
Conv2d-7         [−1, 35, 64, 64]       20,195
AvgPool2d-8      [−1, 35, 1, 1]         0
Flatten-9        [−1, 35]               0

Table 8 Summary of model8
Layer (type)     Output shape           Params
Conv2d-1         [−1, 32, 128, 128]     896
BatchNorm2d-2    [−1, 32, 128, 128]     64
ReLU-3           [−1, 32, 128, 128]     0
Conv2d-4         [−1, 64, 128, 128]     18,496
BatchNorm2d-5    [−1, 64, 128, 128]     128
ReLU-6           [−1, 64, 128, 128]     0
MaxPool2d-7      [−1, 64, 64, 64]       0
Conv2d-8         [−1, 128, 64, 64]      73,856
BatchNorm2d-9    [−1, 128, 64, 64]      256
ReLU-10          [−1, 128, 64, 64]      0
Conv2d-11        [−1, 128, 64, 64]      147,584
BatchNorm2d-12   [−1, 128, 64, 64]      256
ReLU-13          [−1, 128, 64, 64]      0
MaxPool2d-14     [−1, 128, 32, 32]      0
Dropout2d-15     [−1, 128, 32, 32]      0
Flatten-16       [−1, 131072]           0
Linear-17        [−1, 512]              67,109,376
ReLU-18          [−1, 512]              0
Linear-19        [−1, 256]              131,328
ReLU-20          [−1, 256]              0
Linear-21        [−1, 35]               8,995
Fig. 1 Final Convolutional Neural Network architecture with residual connections
model8 relatively deeper so that it can learn more complex functions. The layers of model8 are shown in Table 8. Deep neural networks are harder to optimize because of the exploding/vanishing gradient problem. It has been shown empirically that residual connections make deep neural networks easier to optimize. In addition, the network can learn more complex functions, as it now has to learn the residue f(x) − x instead of f(x). We therefore added residual connections in model9. The architecture of model9 is shown in Fig. 1, and its summary is shown in Table 9.
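The residual formulation mentioned above can be written compactly as follows. Here F is our own notation (an assumption, not used in the paper) for the mapping learned by the stacked layers inside a residual block, f is the desired overall mapping, and y is the block output:

```latex
y = F(x) + x, \qquad F(x) = f(x) - x
```

Under this reading, the skip path carries x unchanged, so the stacked layers only need to model the difference between the target mapping and the identity.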
4 Indian Sign Language Generation Using Conditional GAN

Unconditional GAN models generate images with no information about classes, i.e. the generator produces G(z), where z is noise, whereas the generator of a conditional GAN produces G(z|x), where z is noise and x is the class label. There are many ways to embed the class information vector x in the generator and the discriminator. One such way is discussed in [15], where the authors used an inner-product-based embedding for class labels. The same architecture is adopted in this paper. In the generator network, convolution and upsampling are used instead of deconvolution, since this results in softer images [16].
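As a rough sketch of the inner-product-based embedding, our understanding of the projection discriminator in [15] is that its pre-activation output has the form below. The symbols are assumptions introduced for illustration: u is the input image, x is the one-hot class label (matching the notation above), φ is a feature extractor, V is a learned label-embedding matrix, and ψ is a scalar-valued function of the features.

```latex
f(u, x) = x^{\top} V \, \phi(u) + \psi\big(\phi(u)\big)
```

The first term is the inner product between the embedded label and the image features; the second term does not depend on the class.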
5 Results of CGAN on ISL Dataset

To train the CGAN to generate images, we selected a batch size of 64 and n_dis = 2, i.e. the generator is trained once for every 2 iterations. We trained the GAN with the Adam optimizer with weight decay λ = 0, β1 = 0, β2 = 0.9 and learning rate α = 2 × 10⁻⁴ for 50k iterations on the ISL dataset. The training parameters of the CGAN are summarised in Table 10. We used the same loss function as in [17].
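The update pattern implied by n_dis = 2 can be sketched as below. This is a minimal illustration under our own assumption that an "iteration" is one discriminator update and that the generator is updated after every n_dis-th iteration; the function name `update_schedule` and the dictionary `ADAM_CONFIG` are hypothetical and not taken from the paper.

```python
def update_schedule(n_iter, n_dis=2):
    """Return the order of updates: 'D' on every iteration, 'G' on every n_dis-th.

    Assumption: one iteration corresponds to one discriminator update.
    """
    schedule = []
    for step in range(1, n_iter + 1):
        schedule.append("D")      # discriminator update on every iteration
        if step % n_dis == 0:
            schedule.append("G")  # generator update once per n_dis iterations
    return schedule


# Adam settings as listed in the text and in Table 10 (key names are illustrative).
ADAM_CONFIG = {"lr": 2e-4, "betas": (0.0, 0.9), "weight_decay": 0.0}

print(update_schedule(4))
```

With n_dis = 2 the generator receives half as many updates as the discriminator, which is the usual motivation for such a ratio.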
Table 9 Summary of model9
Layer (type)     Output shape           Params
Conv2d-1         [−1, 64, 64, 64]       1,792
BatchNorm2d-2    [−1, 64, 64, 64]       128
ReLU-3           [−1, 64, 64, 64]       0
MaxPool2d-4      [−1, 64, 32, 32]       0
Conv2d-5         [−1, 128, 32, 32]      73,856
BatchNorm2d-6    [−1, 128, 32, 32]      256
ReLU-7           [−1, 128, 32, 32]      0
Conv2d-8         [−1, 128, 32, 32]      147,584
BatchNorm2d-9    [−1, 128, 32, 32]      256
ReLU-10          [−1, 128, 32, 32]      0
Conv2d-11        [−1, 128, 32, 32]      147,584
BatchNorm2d-12   [−1, 128, 32, 32]      256
ReLU-13          [−1, 128, 32, 32]      0
Conv2d-14        [−1, 256, 32, 32]      295,168
BatchNorm2d-15   [−1, 256, 32, 32]      512
ReLU-16          [−1, 256, 32, 32]      0
MaxPool2d-17     [−1, 256, 16, 16]      0
Conv2d-18        [−1, 512, 16, 16]      1,180,160
BatchNorm2d-19   [−1, 512, 16, 16]      1,024
ReLU-20          [−1, 512, 16, 16]      0
MaxPool2d-21     [−1, 512, 8, 8]        0
Conv2d-22        [−1, 512, 8, 8]        2,359,808
BatchNorm2d-23   [−1, 512, 8, 8]        1,024
ReLU-24          [−1, 512, 8, 8]        0
Conv2d-25        [−1, 512, 8, 8]        2,359,808
BatchNorm2d-26   [−1, 512, 8, 8]        1,024
ReLU-27          [−1, 512, 8, 8]        0
MaxPool2d-28     [−1, 512, 2, 2]        0
Flatten-29       [−1, 2048]             0
Dropout-30       [−1, 2048]             0
Linear-31        [−1, 35]               71,715
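The parameter counts listed in the model summaries can be reproduced with simple arithmetic. The sketch below assumes 3 × 3 convolution kernels with bias and 3-channel colour input, which the paper does not state explicitly but which matches the tabulated values; the helper names are ours.

```python
def conv2d_params(in_ch, out_ch, kernel=3, bias=True):
    """Weights of a square-kernel 2D convolution, plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights of a fully connected layer, plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)


def batchnorm2d_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


print(conv2d_params(3, 32))        # 896        (Conv2d-1 in Table 1)
print(batchnorm2d_params(32))      # 64         (BatchNorm2d-2 in Table 1)
print(linear_params(131072, 35))   # 4587555    (Linear-6 in Table 1)
print(conv2d_params(32, 35))       # 10115      (Conv2d-4 in Table 4)
print(linear_params(2048, 35))     # 71715      (Linear-31 in Table 9)
```

The comparison between Linear-6 in Table 1 and Conv2d-4 in Table 4 shows how replacing the fully connected layer with a convolution followed by global average pooling reduces the parameter count.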
Table 10 Training parameters of CGAN
Batch size   n_dis   λ   β1   β2    α           n_iter
64           2       0   0    0.9   2 × 10⁻⁴    50,000
Fig. 2 (Left) Images from ISL dataset, (right) generated images

Table 11 FID6k is evaluated 3 times on different 6k real and 6k fake images
Run      1        2        3
FID6k    31.132   30.993   31.194
Mean: 31.06, Standard deviation: 0.0106
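The summary statistics for the three runs in Table 11 can be recomputed with the standard library, as sketched below. Recomputing from the three listed values gives a mean of about 31.106 and a sample standard deviation of about 0.103, while the sample variance is about 0.0106; this suggests, as our own reading that the source does not confirm, that the reported spread may be a variance rather than a standard deviation.

```python
import statistics

# FID6k values for the three evaluation runs, copied from Table 11.
fid_runs = [31.132, 30.993, 31.194]

mean_fid = statistics.mean(fid_runs)        # arithmetic mean of the runs
sample_std = statistics.stdev(fid_runs)     # sample standard deviation (n - 1)
sample_var = statistics.variance(fid_runs)  # sample variance (n - 1)

print(f"mean={mean_fid:.3f} std={sample_std:.3f} var={sample_var:.4f}")
```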
We trained the CGAN with a constant learning rate, i.e. learning rate decay = 0. The images generated at the end of 50k iterations are shown in Fig. 2. We used the Fréchet Inception Distance (FID) score [18] as the metric, computed on 6k real images and 6k fake images. To calculate the FID score, we pass the images through the InceptionV3 network and compare the mean and covariance of the real and fake images for a random feature map. At the end of training we obtained an FID score of 31.06 ± 0.0106. The FID evaluation is carried out 3 times on different sets of 6k real and 6k fake images. The FID scores are shown in Table 11.
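For reference, the comparison of means and covariances described above corresponds to the standard definition of FID from [18]. In the expression below, μ_r and Σ_r denote the mean and covariance of the Inception features of the real images, and μ_g and Σ_g those of the generated images; the subscripts are our own labels.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the feature statistics of the generated images are closer to those of the real images.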
6 Results of Our CNN Models on ISL Dataset

6.1 Without Data-Augmentation Using CGAN

The results of the CNN models on the ISL dataset are elaborated in this section. For optimization we used the Adam optimizer, which combines RMSProp and momentum. Adam has proved to be fairly robust to architectural choices, and convergence with it is generally faster. We chose β1 = 0.9 and β2 = 0.999 for the Adam optimizer. To prevent exploding gradients, we clipped the gradients to 0.1. For regularization and to prevent the network
Fig. 3 Accuracies of CNN models trained
from overfitting, we chose the weight decay parameter λ = 1 × 10⁻³. The One Cycle learning rate scheduling algorithm is used with a maximum learning rate of 0.01. All the models are trained for 10 epochs. The accuracies of all the models w.r.t. the number of epochs are shown in Fig. 3. At the end of training for 10 epochs, the validation loss for model9 is around 0.2563 and the training loss is around 0.1586, as shown in Fig. 4. The variation of the learning rate (lr) w.r.t. the batch number is shown in Fig. 5. The maximum accuracies of the 9 models are compared in Fig. 6.
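The one-cycle schedule and the gradient clipping described above can be sketched in plain Python as follows. Only the maximum learning rate of 0.01 and the clipping threshold of 0.1 come from the text; the warm-up fraction, the two division factors, the cosine shape, and the choice of element-wise (value) clipping are assumptions made for illustration, and the paper's actual settings may differ.

```python
import math


def _cosine_interp(start, end, pct):
    """Cosine interpolation that returns start at pct=0 and end at pct=1."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps, max_lr=0.01, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given batch index (0 <= step < total_steps).

    pct_start, div_factor and final_div_factor are assumed values.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak = max(1, int(pct_start * total_steps))  # batch index of the peak
    if step <= peak:
        # Warm-up phase: rise from initial_lr to max_lr.
        return _cosine_interp(initial_lr, max_lr, step / peak)
    # Annealing phase: fall from max_lr to min_lr.
    return _cosine_interp(max_lr, min_lr, (step - peak) / (total_steps - 1 - peak))


def clip_value(gradient, limit=0.1):
    """Clamp one gradient component to [-limit, limit] (value clipping assumed)."""
    return max(-limit, min(limit, gradient))


schedule = [one_cycle_lr(s, 100) for s in range(100)]
print(max(schedule), clip_value(0.7), clip_value(-0.02))
```

Plotting `schedule` against the batch index gives a curve that rises to the maximum learning rate and then decays.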
6.2 With Data-Augmentation Using CGAN

The training parameters used in the previous section are also used here for training the CNN. A batch size of 128 is selected for both the dataset images and the generated images, and images of random class labels are selected and passed in each iteration. Model9, which yielded the highest accuracy on the original dataset, is chosen for this analysis and trained for 10 epochs. The model gave an accuracy of 95.5%. The progress of model9 is shown in Fig. 7.
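One way to read the batching described above is that each training iteration combines 128 real images with 128 generated images whose class labels are drawn at random. The sketch below illustrates that reading only; `mixed_batch` and `generate_fn` are hypothetical names, and `generate_fn` stands in for a trained conditional generator.

```python
import random


def mixed_batch(real_dataset, generate_fn, n_classes=35, batch_size=128, rng=None):
    """Build one batch of real and generated (image, label) pairs.

    real_dataset: list of (image, label) pairs from the original dataset.
    generate_fn:  callable mapping a class label to a generated image (assumed).
    """
    rng = rng or random.Random()
    real = rng.sample(real_dataset, batch_size)                      # real samples
    labels = [rng.randrange(n_classes) for _ in range(batch_size)]   # random classes
    fake = [(generate_fn(label), label) for label in labels]         # generated samples
    batch = real + fake
    rng.shuffle(batch)                                               # mix real and generated
    return batch


# Toy usage with strings standing in for images.
toy_dataset = [(f"img{i}", i % 35) for i in range(500)]
batch = mixed_batch(toy_dataset, lambda label: f"gen{label}", rng=random.Random(0))
print(len(batch))
```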
Fig. 4 Training loss and validation loss versus no. of epochs for Model9
Fig. 5 Learning rate versus batch number in One cycle learning rate scheduling
Fig. 6 Maximum accuracies of CNN models trained
Fig. 7 Accuracy versus epochs for Model9 with data augmentation using CGAN
7 Conclusion

The paper proposes an Indian Sign Language classifier based on CNNs. Different CNN models were trained on the ISL dataset, and the model using residual connections was found to be the best, yielding 91.7% accuracy. To improve the performance, a CGAN-based image augmentation technique was applied. The synthetic database, together with the original one, improved the accuracy of the CNN model to 95.5%. An increase in accuracy will help the deaf and mute community identify signs more efficiently without the intervention of a human annotator.
References

1. Huang J, Zhou W, Li H, Li W (2015) Sign language recognition using 3D convolutional neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME), pp 1–6
2. Elakkiya R, Vijayakumar P, Kumar N (2021) An optimized generative adversarial network based continuous sign language classification. Expert Syst Appl, p 11527
3. Sharma P, Anand RS (2021) A comprehensive evaluation of deep models and optimizers for Indian sign language recognition. In: Graphics and visual computing
4. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. Adv Neural Inf Process Syst 29:2234–2242
5. Rastgoo R, Kiani K, Escalera S, Sabokrou M (2021) Sign language production: a review. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3451–3461
6. Camgoz NC, Hadfield S, Koller O, Ney H, Bowden R (2018) Neural sign language translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7784–7793
7. Liao Y, Xiong P, Min W, Min W, Lu J (2019) Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks. IEEE Access 7:38044–38054
8. Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957
9. Poorna SS, Ravi Kiran Reddy M, Akhil N, Kamath S, Mohan L, Anuraj K, Pradeep HS (2020) Computer vision aided study for melanoma detection: a deep learning versus conventional supervised learning approach. In: Advanced computing and intelligent engineering. Springer, Singapore, pp 75–83
10. Bharath Chandra BV, Naveen C, Sampath Kumar MM, Sai Bhargav MS, Poorna SS, Anuraj K (2021) A comparative study of drowsiness detection from EEG signals using pretrained CNN models. In: 2021 12th International conference on computing communication and networking technologies (ICCCNT), pp 1–3. https://doi.org/10.1109/ICCCNT51525.2021.9579555
11. Aloysius N, Geetha M (2017) A review on deep convolutional neural networks. In: International conference on communication and signal processing (ICCSP), pp 588–592. https://doi.org/10.1109/ICCSP.2017.8286426
12. Geetha M, Manjusha C, Unnikrishnan P, Harikrishnan R (2013) A vision based dynamic gesture recognition of Indian Sign Language on Kinect based depth images. In: 2013 International conference on emerging trends in communication, control, signal processing and computing applications (C2SPCA), pp 1–7. https://doi.org/10.1109/C2SPCA.2013.6749448
13. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448–456
14. Lee KS, Town C (2020) Mimicry: towards the reproducibility of GAN research. arXiv preprint arXiv:2005.02494
15. Miyato T, Koyama M (2018) cGANs with projection discriminator. arXiv preprint arXiv:1802.05637
16. Odena A, Dumoulin V, Olah C (2016) Deconvolution and checkerboard artifacts. Distill 1(10):e3
17. Nguyen A, Clune J, Bengio Y, Dosovitskiy A, Yosinski J (2017) Plug & play generative networks: conditional iterative generation of images in latent space. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4467–4477
18. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in neural information processing systems, vol 30
19. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
A Systematic Review on Load Balancing Tools and Techniques in Cloud Computing

Mohammad Haris and Rafiqul Zaman Khan
Abstract Nowadays, cloud computing has attracted wide attention as it can deliver IT services and resources on demand over the Internet. Load balancing is a key challenge in cloud computing. Owing to the complex structure of cloud computing, it is difficult and costly to evaluate the behaviour of load balancing techniques on different cloud resources, based on QoS parameters, in a real cloud environment. To overcome this, cloud computing tools are used to simulate and test the behaviour of load balancing techniques in the cloud system repeatedly under different conditions by changing various parameters. Many tools are available today, such as CloudSim, WorkflowSim, CloudSim4DWf, GreenCloud, and CloudAnalyst, and every tool varies in characteristics, architecture, parameters, and result assessment. It is therefore very important to choose an efficient load balancing tool that fulfils the required QoS requirements. This work focusses on important cloud load balancing tools and provides a comparative study of important existing as well as newly proposed load balancing tools. We also discuss load balancing techniques, divided on the basis of three approaches.

Keywords Load balancing · Cloud computing tools · Simulation · Heuristic · WorkflowSim · CloudSim4DWf · MATLAB · WeaveSim · CloudSME · ElasticSim
1 Introduction

Cloud technology is one of the most promising advancements of recent times. Cloud computing has become one of the most attractive fields in information and communication technology (ICT) as it provides IT hardware, software, applications, and services in the form of infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). In addition, cloud computing deals with many
M. Haris (B) · R. Z. Khan Aligarh Muslim University, Aligarh, UP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_34
things such as document storage and retrieval, multimedia sharing, and offering the required resources on a pay-per-use model [1, 2]. Load balancing in cloud computing is a process for distributing the workload dynamically amongst the available nodes inside the cloud's datacentres to control the performance of evaluation parameters. If one component of the cloud fails, load balancing distributes its load to other nodes without interrupting service. Therefore, a proper load balancing technique is required to increase the performance of cloud computing. To study the behaviour of load balancing techniques without actual implementation in the real world, cloud computing tools are used to create a cloud environment for testing the response of load balancing techniques [3, 4]. Cloud computing tools are simulators that create a virtual environment to test the behaviour of cloud systems. They analyze the behaviour of different components of cloud computing in different scenarios [5]. Various prominent cloud computing tools are available to analyze load balancing performance in cloud systems. It is therefore necessary to study cloud simulation tools carefully and decide which simulator is suitable for evaluating the performance of load balancing techniques, because every simulator has its advantages and disadvantages. This work gives a detailed study of popular cloud computing tools for load balancing as well as newly proposed simulators [6].
2 Load Balancing Technique in Cloud Computing

An efficient load balancing technique is very important for distributing the load among available nodes to achieve a better quality of service (QoS). Load balancing techniques are divided into three modern approaches for obtaining an optimal solution [7].
2.1 Heuristic Approach

Heuristic means finding a solution by trial and error. The heuristic approach is highly problem dependent and finds the best possible solution by applying a group of constraints in a complete way. Heuristics are designed to reach a solution in a limited time. They work on the basis of learning and exploration, in which a comprehensive and scientific search is applied to find the optimal solution. Heuristic approaches are of static and dynamic types. Their advantage is that they are easier to implement and find a solution efficiently in a finite amount of time and cost. Heuristic approaches run fast, and they are suitable for online scheduling, which requires the system to respond in less time [8–10] (Fig. 1).
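As an illustration of a fast, problem-specific heuristic of the kind described above, the sketch below assigns tasks to virtual machines greedily: the longest remaining task always goes to the currently least-loaded machine. This is a generic textbook rule chosen by us for illustration, not a method taken from the cited works, and the function name is hypothetical.

```python
import heapq


def greedy_assign(task_lengths, n_vms):
    """Assign each task (longest first) to the least-loaded VM.

    Returns (assignment, loads): a dict task index -> VM index, and per-VM load.
    """
    heap = [(0.0, vm) for vm in range(n_vms)]   # (current load, VM index)
    heapq.heapify(heap)
    assignment = {}
    longest_first = sorted(enumerate(task_lengths), key=lambda item: -item[1])
    for task_id, length in longest_first:
        load, vm = heapq.heappop(heap)          # least-loaded VM so far
        assignment[task_id] = vm
        heapq.heappush(heap, (load + length, vm))
    loads = [0.0] * n_vms
    for task_id, vm in assignment.items():
        loads[vm] += task_lengths[task_id]
    return assignment, loads


assignment, loads = greedy_assign([8, 7, 6, 5, 4], n_vms=2)
print(loads)  # [17.0, 13.0]
```

The rule runs in O(n log n) time, which reflects the point above that heuristics respond quickly, but the result is not guaranteed to be optimal.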
Fig. 1 Working of heuristic approach
2.2 Meta-Heuristic Approach

Meta means the next or higher level. Sometimes the solution provided by heuristic approaches does not work well because of local optima. A meta-heuristic modifies the heuristic approach to give a more accurate solution than one generated by local optimality, in a reasonable time. It provides a unified framework for a specific heuristic approach to give a solution for a particular problem. This approach consists of initial, transition, evaluation, and determination stages to offer the optimal solution. Meta-heuristic approaches offer a better trade-off between the consistency of the solution and computation for complex problems. Meta-heuristic approaches are not problem dependent. They can be single-solution or population-based approaches [11, 12] (Fig. 2).

Fig. 2 Working of meta-heuristic approach
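The four stages named above can be mapped, as one possible interpretation, onto a single-solution meta-heuristic such as simulated annealing. The sketch below is our own illustration for a task-to-VM assignment problem; the cost function (makespan), the temperature settings, and all names are assumptions rather than details from the cited works.

```python
import math
import random


def makespan(assignment, task_lengths, n_vms):
    """Load of the busiest VM for a given task -> VM assignment list."""
    loads = [0.0] * n_vms
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task]
    return max(loads)


def anneal(task_lengths, n_vms, iterations=2000, temp=10.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    # Initial stage: start from a random assignment.
    current = [rng.randrange(n_vms) for _ in task_lengths]
    current_cost = makespan(current, task_lengths, n_vms)
    best, best_cost = list(current), current_cost
    for _ in range(iterations):
        # Transition stage: move one randomly chosen task to a random VM.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.randrange(n_vms)
        # Evaluation stage: score the candidate.
        cost = makespan(candidate, task_lengths, n_vms)
        # Determination stage: accept improvements, and sometimes worse moves,
        # which is what lets the search leave a local optimum.
        if cost <= current_cost or rng.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temp = max(temp * cooling, 1e-9)
    return best, best_cost


tasks = [8, 7, 6, 5, 4, 3, 2]
best, best_cost = anneal(tasks, n_vms=3)
print(best_cost)
```

Accepting occasional worse moves is the design choice that distinguishes this from a plain greedy heuristic.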
2.3 Hyper-Heuristic Approach

Hyper-heuristic is a problem-independent approach that is applied to selected low-level heuristics in an iterative manner to get more accurate solutions. It is a high-level method that is applied to low-level heuristics to determine whether the generated solution can be accepted for a specific problem. It uses a set of existing heuristic approaches and selects one of them to solve the particular problem. A hyper-heuristic can generate a solution from scratch or can be used to optimize an existing solution. The fitness function is used to obtain the low-level heuristic solution. The hyper-heuristic layer is problem independent because the communication between the low-level and high-level heuristics is restricted. This feature makes hyper-heuristic the most acceptable approach [10, 13, 14] (Fig. 3).
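A minimal selection-type hyper-heuristic along these lines can be sketched as follows. The high-level layer sees only the fitness value returned for each low-level heuristic, which mirrors the restricted communication described above. The three low-level heuristics and the makespan fitness are our own illustrative choices.

```python
def round_robin(task_lengths, n_vms):
    """Low-level heuristic: cycle through the VMs in order."""
    return [i % n_vms for i in range(len(task_lengths))]


def least_loaded(task_lengths, n_vms):
    """Low-level heuristic: each task, in arrival order, goes to the least-loaded VM."""
    loads, assignment = [0.0] * n_vms, []
    for length in task_lengths:
        vm = loads.index(min(loads))
        assignment.append(vm)
        loads[vm] += length
    return assignment


def longest_first(task_lengths, n_vms):
    """Low-level heuristic: as least_loaded, but longest tasks are placed first."""
    loads, assignment = [0.0] * n_vms, [0] * len(task_lengths)
    for task in sorted(range(len(task_lengths)), key=lambda t: -task_lengths[t]):
        vm = loads.index(min(loads))
        assignment[task] = vm
        loads[vm] += task_lengths[task]
    return assignment


def fitness(assignment, task_lengths, n_vms):
    """Makespan: load of the busiest VM (lower is better)."""
    loads = [0.0] * n_vms
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task]
    return max(loads)


def hyper_heuristic(task_lengths, n_vms,
                    low_level=(round_robin, least_loaded, longest_first)):
    """High-level layer: run each low-level heuristic and keep the fittest result."""
    scored = []
    for heuristic in low_level:
        assignment = heuristic(task_lengths, n_vms)
        scored.append((fitness(assignment, task_lengths, n_vms),
                       heuristic.__name__, assignment))
    cost, name, assignment = min(scored)
    return name, assignment, cost


name, assignment, cost = hyper_heuristic([2, 9, 3, 8, 4, 7], n_vms=2)
print(name, cost)  # longest_first 17.0
```

Because the selector never inspects the tasks directly, the same high-level code could be reused with a different set of low-level heuristics or a different fitness function.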
Fig. 3 Working of hyper-heuristic approach

3 Simulation Tools for Cloud Load Balancing Environment

Simulation is a technique for making a model of a process or a real system, designed to test strategies and figure out the system's behaviour under certain circumstances [15]. Cloud computing is a complex environment as it contains datacentres, cloud users, cloud service providers, resource allocation (CPU, memory and bandwidth), service broker policies for datacentre selection, and scheduling policies to schedule user requests. The interactions and communication amongst all these entities make it a large and complex environment. Load balancing plays a crucial role in cloud computing and manages resources efficiently. Cloud load balancing is a mechanism for allocating the workload equally across the available virtual machines so that
Fig. 4 Simulation flow of load balancing in cloud computing
every virtual machine does an equal amount of work. This eliminates the overloading and underloading of virtual machines, and thereby maximizes resource utilization and reduces energy consumption and carbon emission. However, sometimes many resources do not participate in distributing the workload equally, which ultimately reduces the performance of the cloud system. Several load balancing techniques are therefore used to evaluate the performance of cloud systems [1, 4, 16]. Testing the behaviour of load balancing techniques in a real environment is difficult, as it involves huge cost, the setup of a cloud, and security problems. To address this issue, cloud simulation tools are employed to test the cloud system's behaviour before load balancing techniques are actually implemented in the cloud environment. The simulation tools allow the researcher to test cloud behaviour by performing experiments repeatedly while changing different parameters and conditions. The load balancing parameters considered by cloud simulators include resource utilization, reaction time, transfer time, cost, performance, and throughput. These cloud simulation tools can be open source or proprietary [15, 17] (Fig. 4).
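To make the notions of overloading, underloading and resource utilization concrete, the sketch below computes a few simple balance indicators from per-VM loads. The indicator definitions (utilization as load divided by capacity, imbalance as the spread between the most and least utilized VM) are our own simplifications and not the metrics of any particular simulator.

```python
import statistics


def balance_metrics(vm_loads, vm_capacity):
    """Summarize how evenly work is spread over identical VMs.

    vm_loads:    work assigned to each VM (same unit as vm_capacity).
    vm_capacity: work one VM can absorb in the observation window (assumed equal).
    """
    utilization = [load / vm_capacity for load in vm_loads]
    return {
        "mean_utilization": statistics.mean(utilization),
        "imbalance": max(utilization) - min(utilization),
        "overloaded": [i for i, u in enumerate(utilization) if u > 1.0],
    }


metrics = balance_metrics([50, 100, 150], vm_capacity=100)
print(metrics)
```

A perfectly balanced system would show an imbalance of zero and an empty list of overloaded machines.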
3.1 The Experiment of Load Balancing Techniques in a Real Cloud Environment [1, 6, 18]

• The expenditure on the required resources is very high.
• Resources are required for a long time.
• Repetition of experiments is not possible.
• Cloud applications show variation in workload and characteristics.
• Users' quality of service requirements change dynamically.
• Owing to limited resources, not all applications get full support.
3.2 The Advantages of Using a Cloud Simulation Tool for Load Balancing [19, 20]

• It creates a scalable and reliable real-time environment.
• It facilitates dynamic, flexible configuration and development environments.
• Visual interfaces can be modified easily.
• Available components can be used multiple times, which makes a simulation tool valuable.
• It provides a platform to analyze the quality and performance of proposed algorithms and methods very quickly.
• Simulation tools should be easy to use.
• Output in the graphical form of tables and charts is highly beneficial in understanding the results.
4 Corresponding Works

Cloud computing is gaining popularity day by day because it delivers unlimited facilities to users while taking care of quality of service (QoS) requirements. It is therefore very important to select scalable and reliable cloud simulation tools to test the cloud environment's behaviour and characteristics before actual implementation in a real cloud environment. As the demand for cloud computing increases, work on cloud computing tools is moving very fast. Many researchers have proposed cloud simulators, and some have provided comparative studies of existing ones. Various researchers [21–25] developed and proposed CloudSim, ElasticSim, CloudSim4DWf, CloudAnalyst, and WorkflowSim, respectively, to create a virtual environment for testing the behaviour of cloud systems. Bambrik [26] gives a comprehensive survey of cloud simulation tools along with their features, architecture, etc. Ahmad et al. provide details of the CloudSim simulator with its scheduler policy and compare performance based on different parameters [27]. Fakhfakh et al. analyzed and compared popular cloud simulators based on various attributes and addressed emerging challenges [28]. Jena et al. took important open-source cloud simulators for a comparative study and an extensive study of parameters such as platform, graphic support, and simulation time [29]. Muhammad Hassaan evaluates three cloud simulators, CloudSim, CloudAnalyst, and GreenCloud, on the basis of measuring energy consumption and concludes that GreenCloud is better than the other two simulators as it calculates energy consumption more accurately [30]. In [15], the authors selected three popular cloud simulators for a comparative study, namely CloudSim, CloudAnalyst, and CloudReports, and analyzed them on different parameters to find the capabilities of these simulators. Mansouri et al. presented a comprehensive review of 33 cloud tools. Every tool is discussed in detail and compared on various high-level features [6]. A lot of research has been done, but none of it considers newly introduced tools. Therefore, this
work compares existing tools with newly introduced tools.
5 Simulation Tools for Load Balancing in Cloud Computing

Today, many cloud simulators are available over the Internet, such as CloudSim, WorkflowSim, CloudAnalyst, ElasticSim, and GreenCloud. Simulators share many characteristics, such as architecture, simulation process, and modelling elements, but each has its own features, such as focussing on different service layers and using different performance metrics. These simulation tools minimize infrastructure complexities, examine security threats, and evaluate the quality of service (QoS) and overall performance of cloud systems [22, 31].
5.1 Some Popular Cloud Computing Tools for Load Balancing Techniques

1. CloudSim: CloudSim is a well-known cloud simulator, capable of representing various types of clouds. This simulator was developed at the University of Melbourne, Australia, by a research group in the CLOUDS Laboratory. Initially, CloudSim used GridSim as its underlying platform, but it was later upgraded to its own kernel [23]. CloudSim is essentially a Java library: the user writes a Java programme to frame the required structure and obtain the desired results for analyzing cloud applications. A single physical node running the simulator can easily manage large-scale infrastructure, including a large number of datacentres. CloudSim is a simple-to-use and easily extensible simulation toolkit and application that provides easy modelling, simulation, and experimentation on existing or developing cloud systems, infrastructures and application environments for single and internetworked clouds. Cloud provisioning policies, application workloads, resources, user specifications, etc., prevented existing distributed system simulators from working smoothly, and CloudSim came into existence to overcome these drawbacks. CloudSim provides efficient load balancing, preventing any single server from becoming overloaded [15]. CloudCoordinator is an abstract class of CloudSim that monitors the internal state of the datacentre and provides efficient load balancing to prevent virtual machines from becoming overloaded [17]. CloudSim provides the flexibility to switch virtualized services between time-shared and space-shared allocation of processing cores. These unique features of CloudSim contribute to the quick evolution of cloud computing [21] (Fig. 5).
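The difference between space-shared and time-shared allocation of a processing core can be illustrated with an idealized single-core model. The sketch below is written in Python purely for illustration and does not use or reproduce the CloudSim Java API; task lengths are in million instructions (MI), core speed is in MIPS, and all tasks are assumed to arrive at time zero.

```python
def space_shared_finish_times(lengths_mi, mips):
    """Tasks run one after another; each gets the whole core while it runs."""
    clock, finish = 0.0, []
    for length in lengths_mi:
        clock += length / mips
        finish.append(clock)
    return finish


def time_shared_finish_times(lengths_mi, mips):
    """All unfinished tasks share the core equally (idealized processor sharing)."""
    order = sorted(range(len(lengths_mi)), key=lambda i: lengths_mi[i])
    finish = [0.0] * len(lengths_mi)
    clock, work_done, active = 0.0, 0.0, len(lengths_mi)
    for i in order:
        # Every active task still needs this much work before task i completes,
        # and each of them receives mips / active of the core.
        remaining = lengths_mi[i] - work_done
        clock += remaining * active / mips
        work_done = lengths_mi[i]
        finish[i] = clock
        active -= 1
    return finish


print(space_shared_finish_times([100, 200], mips=100))  # [1.0, 3.0]
print(time_shared_finish_times([100, 200], mips=100))   # [2.0, 3.0]
```

In this toy case both policies finish all work at the same time, but the short task completes earlier under space sharing because it does not have to share the core.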
Fig. 5 Communication flow between CloudSim entities
2. CloudAnalyst: CloudAnalyst is a cloud simulation tool that extends the functionality of CloudSim by adding a graphical user interface (GUI) to configure and execute simulation experiments quickly and repeatably. It gives output in the form of tables and charts, which helps in comparing results quickly and smoothly. The aim of CloudAnalyst is to evaluate the performance of geographically distributed computing servers and workloads scattered over multiple datacentres in large-scale distributed applications on the cloud [28]. An Internet model is introduced to calculate traffic routing with suitable transmission latency and data transfer delays [24]. The VM load balancer is used by the datacentre controller to implement load balancing techniques during the runtime of virtual machines [26] (Fig. 6). Various load balancing techniques can be analyzed using CloudAnalyst. Parameters such as request response time and processing time can be measured using this simulation tool. CloudAnalyst needs to provide more mechanisms and algorithms for resource management in large-scale data centres [24].
3. GreenCloud: GreenCloud is an energy-aware simulation tool and an extension of the NS2 network simulator. This open-source tool focusses on energy conservation in existing data centres [32]. GreenCloud traces packet-level communication patterns and components of data centres, such as computing servers, gateways, and network switches, in a realistic manner. This minimizes energy consumption by improving power management, dynamically
Fig. 6 CloudAnalyst architecture
managing and configuring the power-awareness of system devices, and developing environment-friendly datacentres. Further, it also captures the details of load distribution in the system. GreenCloud uses the TCP/IP protocol reference model, which allows seamless integration of a wide variety of communication protocols, including IP, TCP, and UDP, with the simulation [16]. The main limitation of the GreenCloud simulator is that it is limited to small data centres. GreenCloud is developed in C++. The tool is supported on Windows as well as Linux [17] (Fig. 7).
4. WorkflowSim: WorkflowSim is a powerful simulator that customizes the capabilities of the CloudSim toolkit by introducing workflow-level simulation support. The WorkflowSim model focusses on workflow scheduling along with execution [33]. A workflow is expressed as a directed acyclic graph (DAG) in which each node represents a computational task and each directed edge represents a control dependency between tasks. It offers an elaborate model for node failures and a model for delays occurring at different levels of the workflow [25]. Four main components perform WorkflowSim's functions: the workflow mapper maps groups of tasks to the execution site, the workflow engine handles dependencies based on parent–child relationships, the workflow scheduler schedules jobs to real resources according to the
Fig. 7 Basic architecture of GreenCloud [32]
Fig. 8 Structure of WorkflowSim
conditions selected by the user, and the workflow partitioner divides the workflow into smaller, manageable workflows [34] (Fig. 8).
5. MATLAB: Matrix laboratory (MATLAB) is a powerful tool developed by MathWorks that provides a programming environment for developing algorithms, numerical computation, visualization, and data analysis. MATLAB is commonly used in science and engineering for matrix-based computation. MATLAB generates results using dynamic memory allocation with strong data visualization and helps in comparing the behaviour of different algorithms. Because of these properties, MATLAB is well suited to dealing with large and fluctuating factors, such as cloud resource elasticity. A multi-cloud simulator has several functions and sections. The ability to identify various cloud instances, services, and properties gives the simulation its dynamicity and heterogeneity. Other characteristics of MATLAB include comparison of the behaviour of various algorithms, visual representation during calculation, and live observation. MATLAB can also be used in other areas, such as analyzing the performance of load balancing techniques, test and measurement applications, communications, and signal and image processing [35–37] (Fig. 9; Table 1).
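Returning to the DAG representation described for WorkflowSim in item 4, the sketch below shows how parent–child dependencies determine which tasks are ready to be scheduled at each step. It uses Python's standard `graphlib` module (Python 3.9 or later) rather than WorkflowSim itself, and the four task names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of parent tasks it depends on.
workflow = {
    "split": set(),
    "map_a": {"split"},
    "map_b": {"split"},
    "merge": {"map_a", "map_b"},
}


def execution_levels(dag):
    """Group tasks into levels; tasks in one level have all their parents finished."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose parents are all done
        levels.append(ready)
        sorter.done(*ready)                 # mark them finished to release children
    return levels


print(execution_levels(workflow))  # [['split'], ['map_a', 'map_b'], ['merge']]
```

Tasks within one level have no dependencies on each other, so a scheduler is free to distribute them across different resources.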
5.2 Newly Developed Simulators for Load Balancing in Cloud Computing
1. ElasticSim: ElasticSim enhances the CloudSim toolkit by integrating workflow simulation support such as workflow monitoring, workflow scheduling, and workflow modelling. ElasticSim evaluates the efficiency of workflow resource provisioning and scheduling algorithms, supporting auto-scaling of resources at runtime and the modelling of stochastic task execution times. ElasticSim is designed for high usability: its graphical user interface displays the current state in real time.
A Systematic Review on Load Balancing Tools …
Fig. 9 Architecture of MATLAB production server in the cloud
New workflow resource provisioning and scheduling algorithms can be introduced by directly extending the basic scheduling algorithms. An experimental study examines the effect of task execution times, the length of the pricing interval, and deadline tightness on a static algorithm with stochastic task execution times. The results indicate that ElasticSim is a promising tool [6, 20, 22] (Fig. 10).
2. CloudSim4DWf: CloudSim for workflow (CloudSim4DWf) is a CloudSim-based simulator that provides a new resource provisioning policy for dynamic workflow applications. CloudSim4DWf extends CloudSim with a graphical user interface (GUI) component that allows users to import the workflow description file, the adaptation rules, and the inputs required during a simulation. It includes an event injection module responsible for initiating and monitoring simulation events at runtime to ensure efficient resource provisioning for dynamic workflows. The ability to change events at runtime lets CloudSim4DWf adapt its behaviour to the user's requirements more reliably. Overhead and financial cost are the parameters used to evaluate the performance of CloudSim4DWf [23, 38, 39] (Fig. 11).
3. CloudSME: CloudSME is a simulation platform that supports high-performance simulation experiments. It lets users deploy their simulation solutions on heterogeneous clouds chosen according to performance and cost, giving them the freedom to select the cloud and resources their experiments require instead of being locked in to a single service provider. The CloudSME simulation platform (CSSP) combines the CloudSME AppCentre, the WS-PGRADE/gUSE gateway framework, and the CloudBroker platform. The CloudSME AppCentre is the top layer, which provides software products and services to the user through its interface. The middle layer is the cloud platform layer, which connects to the cloud-based services of the CloudBroker platform and the science
Table 1 Comparison of important existing cloud simulators [1, 6, 15, 26, 29, 30]
| Simulator | CloudSim | CloudAnalyst | GreenCloud | WorkflowSim | MATLAB |
| Extension | SimJava/GridSim | CloudSim | NS-2 | CloudSim | IA32, x86-64 |
| Simulator type | Event-based | Event-based | Packet-level | Event-based | Numerical computing |
| Programming language | Java | Java | C++ | Java | C/C++ |
| Availability | Open | Open | Open | Open | Closed |
| GUI support | No | Yes | Limited | No | Yes |
| Support for task clustering | No | N/A | N/A | Yes | Yes |
| Communication model | Limited | Limited | Full | Limited | N/A |
| Application model | Yes | Yes | Yes | Yes | N/A |
| Energy model | No | Yes | Yes | Yes | N/A |
| Cost model | Yes | Yes | No | Yes | N/A |
| SLA support | No | No | Yes | No | Yes |
| Publication year | 2009 | 2010 | 2010 | 2012 | 1984 |
| Networking | N/A | Limited | Full | Yes | Yes |
| Support of TCP/IP | No | No | Full | N/A | Yes |
| Simulation speed | Medium | Medium | Fast | Fast | Fast |
| IaaS/PaaS/SaaS | IaaS | IaaS | IaaS | IaaS | N/A |
| Migration policy | Yes | No | No | No | Yes |
| Workflow | No | No | No | Yes | N/A |
| Platform portability | Yes | Yes | No | Yes | Yes |
| Distributed architecture | No | No | No | No | N/A |
(N/A: not available)
gateway framework WS-PGRADE/gUSE. The bottom layer, known as the cloud resources layer, consists of a range of clouds and HPC resources accessible via the CloudBroker platform. To study performance, parameter sweeps for simulation were run on multiple clouds [40] (Fig. 12).
4. ThermoSim: ThermoSim is a lightweight framework for modelling and simulating thermal-aware resource management in cloud computing environments. It introduces a deep learning-based temperature predictor for cloud data centres, built on a recurrent neural network, which utilizes resource consumption metrics
Fig. 10 Architecture of ElasticSim
parameters to accurately predict the temperature of cloud hosts. ThermoSim extends the CloudSim toolkit with thermal parameters to analyze quality of service (QoS) parameters such as the number of virtual machine migrations, the temperature of cloud resources during workload execution, energy consumption, and the service level agreement violation rate. Energy-aware and thermal-aware resource management techniques are used to compare ThermoSim with an existing framework, Thas, and ThermoSim outperforms it in terms of cost, time, energy consumption, prediction accuracy, and memory usage. In the future, ThermoSim can be extended in directions such as artificial intelligence and fog computing [41] (Fig. 13).
5. WeaveSim: WeaveSim is a simulation framework that uses aspect-oriented programming (AOP) to reduce modelling complexity and to simulate the custom and dynamic behaviour of cloud-based applications by adding a context-aware aspect layer (CAAL). WeaveSim identifies a set of join points in the CloudSim architecture using cross-cutting concerns and injects the code of each aspect module, which changes the behaviour of CloudSim at execution time by altering its core code. It addresses limitations of existing simulators, such as measuring and controlling dynamically changing requirements across many modules of cloud systems. The experimental results show that implementing cross-cutting concerns increases the scalability, reusability, and maintainability of cloud-based systems [15, 22] (Fig. 14; Table 2).
Fig. 11 GUI of CloudSim4DWf [23]
6 Comparison and Discussions
No single tool offers a complete model, owing to the complexity of the tools and the different parameters each requires. However, each tool has a distinct benefit in a particular situation. CloudSim is widely used because of its flexibility and its network classes. GreenCloud is best for modelling the energy use and efficiency of a datacentre. However, these existing tools do not offer workflow support. Therefore, WorkflowSim, CloudSim4DWf, and CloudSME are gaining popularity because of their workflow management support, which reduces the complexity of problems. CloudSME provides the capability to use resources from different heterogeneous clouds. It is up to researchers to explore the similarities and differences between the various tools in order to determine the most appropriate one for a particular experiment.
7 Conclusion and Future Scope
Cloud computing is a computing paradigm that offers virtually unlimited computing resources and services over the Internet to users on demand,
Fig. 12 Architecture of CloudSME [40]
Fig. 13 Structure of ThermoSim
Fig. 14 Layered architecture of WeaveSim
anytime, and anywhere, in a pay-per-use manner. Load balancing has become a major challenge in cloud computing because of the rapid growth of on-demand requests for cloud resources. Different load balancing techniques are used to execute user requests and assign them to virtual machines. However, a major concern is how to evaluate the performance of load balancing techniques against QoS constraints before deploying them in a real environment. Therefore, cloud simulation tools are used to test the behaviour and performance of load balancing techniques. In this paper, a comparative study has been carried out on five popular cloud simulators as well as five newly proposed cloud simulators. This work will contribute to the development of a simulator that enhances the existing simulators and overcomes their limitations. The study will also help researchers choose the simulation tool best suited to their requirements. In the future, we will present an in-depth review of load balancing algorithms on the basis of different optimization techniques.
Table 2 Comparison of newly proposed cloud simulators [22, 23, 38, 40, 41]
| Simulator | ElasticSim | CloudSim4DWf | CloudSME | ThermoSim | WeaveSim |
| Extension | CloudSim | CloudSim | WS-PGRADE/gUSE | CloudSim | CloudSim |
| Simulator type | Event-based | Event-based | Web-based | Event-based | Event-based |
| Programming language | Java | Java | Java | Java | Java |
| Availability | Open | NA | Open | NA | NA |
| GUI support | Yes | Yes | Yes | No | No |
| Performance | High | High | High | High | High |
| Communication model | Limited | Yes | Yes | Yes | Yes |
| Application model | Yes | Yes | No | Yes | Yes |
| Energy model | Yes | No | No | Yes | Yes |
| Cost model | Yes | Yes | Yes | Yes | Yes |
| SLA support | No | Yes | Yes | Yes | Yes |
| Publication year | 2016 | 2017 | 2018 | 2020 | 2020 |
| Networking | Yes | Yes | Yes | Yes | Yes |
| Simulation speed | High | High | High | High | High |
| IaaS/PaaS/SaaS | IaaS | SaaS | SaaS, PaaS, IaaS | IaaS | IaaS |
| Workflow | No | Yes | Yes | No | No |
| Platform portability | Yes | Yes | Yes | NA | Yes |
Acknowledgements This work is funded by MTTF: MathTech Thinking Foundation.
References
1. Thapar V, Gupta OP, A comparative study of cloud simulation tools
2. Haris M, Khan RZ (2018) A systematic review on cloud computing. Int J Comput Sci Eng 6:632–639
3. Rawat PS, Dimri P, Saroha GP (2016) Tasks scheduling in cloud computing environment using Workflowsim. Res J Inf Technol 8:98–104
4. Haris M, Khan RZ (2019, July) A systematic review on load balancing issues in cloud computing. In: International conference on sustainable communication networks and application. Springer, Cham, pp 297–303
5. Lakshminarayanan R, Ramalingam R (2016) Usage of cloud computing simulators and future systems for computational research
6. Mansouri N, Ghafari R, Zade BMH (2020) Cloud computing simulators: a comprehensive review. Simul Model Pract Theory 104:102144
7. Hota A, Mohapatra S, Mohanty S (2019) Survey of different load balancing approach-based algorithms in cloud computing: a comprehensive review. In: Computational intelligence in data mining. Springer, Singapore, pp 99–110
8. Garg D, Kumar P (2018) A survey on metaheuristic approaches and its evaluation for load balancing in cloud computing. In: International conference on advanced informatics for computing research. Springer, Singapore, pp 585–599
9. Mohanty S, Patra PK, Mohapatra S, Ray M (2017) MPSO: a novel meta-heuristics for load balancing in cloud computing. Int J Appl Evol Comput (IJAEC) 8(1):1–25
10. Tsai CW, Huang WC, Chiang MH, Chiang MC, Yang CS (2014) A hyper-heuristic scheduling algorithm for cloud. IEEE Trans Cloud Comput 2(2):236–250
11. Zamli KZ (2018) Enhancing generality of meta-heuristic algorithms through adaptive selection and hybridization. In: 2018 International conference on information and communications technology (ICOIACT). IEEE, New York, pp 67–71
12. Singh P, Dutta M, Aggarwal N (2017) A review of task scheduling based on meta-heuristics approach in cloud computing. Knowl Inf Syst 52(1):1–51
13. Sudhakar C, Agroya M, Ramesh T (2018) Enhanced hyper-heuristic scheduling algorithm for cloud. In: 2018 International conference on computing, power and communication technologies (GUCON). IEEE, New York, pp 611–616
14. Panneerselvam A, Subbaraman B (2018) Hyper heuristic MapReduce workflow scheduling in cloud. In: 2018 2nd International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). IEEE, New York, pp 691–693
15. Dogra S, Singh AJ, Comparison of cloud simulators for effective modeling of cloud applications
16. Byrne J, Svorobej S, Giannoutakis KM, Tzovaras D, Byrne PJ, Östberg P-O, Gourinovitch A, Lynn T (2017) A review of cloud computing simulation platforms and related environments. In: International conference on cloud computing and services science, vol 2. SCITEPRESS, pp 679–691
17. Sareen P, Singh TD (2016) Simulation of cloud computing environment using CloudSim. Simulation 4(12)
18. Devi RK, Sujan S (2014) A survey on application of cloudsim toolkit in cloud computing. Int J Innov Res Sci Eng Technol 3(6):13146–13153
19. Kumar M, Sharma SC, Goel A, Singh SP (2019) A comprehensive survey for scheduling techniques in cloud computing. J Netw Comput Appl 143:1–33
20. Nandhini JM, Gnanasekaran T (2019) An assessment survey of cloud simulators for fault identification. In: 2019 3rd International conference on computing and communications technologies (ICCCT). IEEE, New York, pp 311–315
21. Calheiros RN, Ranjan R, De Rose CA, Buyya R (2009) Cloudsim: a novel framework for modeling and simulation of cloud computing infrastructures and services
22. Cai Z, Li Q, Li X (2017) Elasticsim: a toolkit for simulating workflows with cloud resource runtime auto-scaling and stochastic task execution times. J Grid Comput 15(2):257–272
23. Fakhfakh F, Kacem HH, Kacem AH (2017, June) CloudSim4DWf: a CloudSim-extension for simulating dynamic workflows in a cloud environment. In: 2017 IEEE 15th International conference on software engineering research, management and applications (SERA). IEEE, New York, pp 195–202
24. Buyya R (2009) CloudAnalyst: a CloudSim-based tool for modelling and analysis of large scale cloud computing environments. Distrib Comput Proj Csse Dept, Univ Melb pp 433–659
25. Chen W, Deelman E (2012, October) Workflowsim: a toolkit for simulating scientific workflows in distributed environments. In: 2012 IEEE 8th international conference on E-science. IEEE, New York, pp 1–8
26. Bambrik I (2020) A survey on cloud computing simulation and modeling. SN Comput Sci 1(5):1–34
27. Ahmad MO, Khan RZ, Cloud computing modeling and simulation using Cloudsim environment
28. Fakhfakh F, Kacem HH, Kacem AH (2017, May) Simulation tools for cloud computing: a survey and comparative study. In: 2017 IEEE/ACIS 16th International conference on computer and information science (ICIS). IEEE, New York, pp 221–226
29. Jena SR, Shanmugam R, Saini K, Kumar S (2020) Cloud computing tools: inside views and analysis. Proc Comput Sci 173:382–391
30. Hassaan M (2020) A comparative study between cloud energy consumption measuring simulators. Int J Educ Manage Eng 10(2):20
31. Zaidi T (2020) Analysis of energy consumption on iaas cloud using simulation tool. In: International conference on innovative advancement in engineering and technology (IAET-2020)
32. Kliazovich D, Bouvry P, Khan SU (2012) GreenCloud: a packet-level simulator of energy-aware cloud computing data centers. J Supercomput 62(3):1263–1283
33. Chen C, Liu J, Wen Y, Chen J (2014) Research on workflow scheduling algorithms in the cloud. In: International workshop on process-aware systems. Springer, Berlin, Heidelberg, pp 35–48
34. Kousalya G, Balakrishnan P, Raj CP (2017) Automated workflow scheduling in self-adaptive clouds. Springer, Berlin, pp 65–83
35. Yazdi NT, Yong CH (2015) Simulation of multi-agent approach in multi-cloud environment using matlab. In: 2015 Seventh international conference on computational intelligence, modelling and simulation (CIMSim). IEEE, New York, pp 77–79
36. https://fullforms.com/MATLAB
37. https://www.mathworks.com/videos/how-to-run-matlab-production-server-in-the-cloud-withamazon-web-services-1556013376778.html
38. AlSobeh AMR, AlShattnawi S, Jarrah A, Hammad MM (2020) Weavesim: a scalable and reusable cloud simulation framework leveraging aspect-oriented programming. Jordanian J Comput Inform Technol (JJCIT) 6(02)
39. Jammal M, Hawilo H, Kanso A, Shami A (2019) Generic input template for cloud simulators: a case study of CloudSim. Softw: Pract Exp 49(5):720–747
40. Taylor SJ, Kiss T, Anagnostou A, Terstyanszky G, Kacsuk P, Costes J, Fantini N (2018) The CloudSME simulation platform and its applications: a generic multi-cloud platform for developing and executing commercial cloud-based simulations. Futur Gener Comput Syst 88:524–539
41. Gill SS, Tuli S, Toosi AN, Cuadrado F, Garraghan P, Bahsoon R, Lutfiyya H, Sakellariou R, Rana O, Dustdar S, Buyya R (2020) ThermoSim: deep learning based framework for modeling and simulation of thermal-aware resource management for cloud computing environments. J Syst Softw
42. Khan RZ, Ahmad MO (2017) A survey on load balancing algorithms in cloud computing. Int J Autonom Comput 2(4):366–383
TDMA-Based Adaptive Multicasting in Wireless NoC
Smriti Srivastava, Adithi Viswanath, Krithika Venkatesh, and Minal Moharir
Abstract The network-on-chip (NoC) is a developing technology that enables the simultaneous execution of multiple embedded cores on a single die. The current approach of implementing NoCs with planar metal interconnects is inefficient, because exchanging data over multi-hop links increases latency and the resulting power consumption. To address these issues, high-bandwidth, single-hop, long-range wireless links replace the multi-hop wired interconnects previously used in the NoC. This opens new avenues of investigation into the design of wireless NoCs (WiNoCs). Compared with purely wired designs, the hybrid wired-wireless architecture (WiNoC) is gaining popularity because of its rapid delivery of unicast messages. In recent years, applications based on modern technologies such as deep neural networks and other forms of artificial intelligence have tended to generate multicast and broadcast-type messages. A principal motivation for WiNoC is that, as the number of cores on the chip grows, the rate of multicast message passing rises exponentially. This paper explores multicasting in WiNoC for communication through wireless nodes. As the number of wireless nodes increases, the packet latency is reduced compared with that of wired nodes. Up to 17% reduction in average packet latency is achieved.
Keywords Network-on-chips (NoCs) · System-on-chips (SoCs) · Multicast · Hubs · Nodes · Latency
S. Srivastava (B) · A. Viswanath · K. Venkatesh · M. Moharir
Department of CSE, RVCE, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_35
S. Srivastava et al.
1 Introduction
Networks-on-chip (NoCs) have lately been viewed as the communication backbone [1] that facilitates integration in multicore systems-on-chip (SoCs). Despite their advantages, the main drawbacks of traditional NoCs arise when data is transferred between distant nodes, which leads to high power consumption and high latency. To overcome these problems, many efforts have introduced ultra-low-latency, low-power express channels between distant nodes, and approaches such as 3D NoCs, photonic NoCs, and NoC architectures with multi-band RF interconnects have also been researched. Even with these approaches, latency and power dissipation are reduced only to a certain degree. The introduction of wireless NoCs (WiNoCs) with on-chip antennas reduces latency and power dissipation further than any of the above-mentioned methods. Data transfer in a NoC takes place through packets. Packets are further split into units called flits. A flit is the elemental unit of flow control between a pair of adjacent routers. One-to-one message communication is called unicasting, and one-to-many message communication is called multicasting. The main reasons for preferring [2] multicasting over unicasting are as follows:
(1) Quality of service is poorer in unicasting than in multicasting.
(2) In unicasting, local congestion occurs at the source node.
(3) Unicasting also faces the problem of global congestion: repeated unicast packets end up competing for the same network resources, which is unlikely in the case of multicasting.
(4) Lastly, the repeated injections required by unicasting incur a power consumption overhead.
Because of these disadvantages, the overall network performance of a unicast WiNoC decreases markedly even when the proportion of collective communication increases only marginally. Several SoC applications exhibit notable multicast traffic patterns, such as real-time object recognition and computing with neural networks. Several SoC control functions also depend on the efficient exchange of data. These functions are responsible for passing global states, for supervising, organizing, and configuring the on-chip networks, and for implementing the following: (1) operand exchange networks, (2) aberrant cache architectures, and (3) cache coherence protocols. Regarding the choice of access architecture, time division multiple access (TDMA) is more apt than frequency division multiple access (FDMA) because it uses power and bandwidth more effectively and hence supports multiple access efficiently. At present, multicasting in NoCs has mostly been achieved through wired frameworks with no wireless links. Even with the smartest multicasting algorithms, this causes large multicast latency as the network grows or as the amount of traffic in the network increases.
Existing works that incorporate wireless frameworks for multicasting use an FDMA wireless system, in which any two wireless modules communicate over a separate band allocated just for that pair. This is highly power hungry, because each wireless module then requires a receiver that works at a different frequency and must be constantly on in order to receive messages from the other wireless modules. The motivation for this work, which implements multicasting on a wireless architecture using a TDMA structure, is to save power and to show that multicasting with such a structure is better in terms of both power consumption and performance.
2 Literature Review
Brian et al. [3] created an intra-chip wireless interconnect for clock distribution, implemented with integrated antennas, receivers, and transmitters. It uses an on-chip clock transmitter with an integrated antenna. It suffers from wasted energy and the risk of going out of sync; nonetheless, it aims at self-correcting clock structures. Lee et al. [4] studied how message transactions in a chip multicore processor (CMP) can help reduce long-range hops. Their paper presented a primitive form of wireless transmission using short-range RF waves, so packets still had to be routed across subnets. The author of [5] presents the basic idea of using wireless NoCs as the backbone of multicore chips, describing their features, their promise, and their challenges. The wireless links proposed there suffer from noise, since they follow an FDMA strategy. A TDMA strategy is proposed to prevent noise and to avoid increased bandwidth. Maurizio Palesi et al. developed a TDMA strategy for implementing WiNoC in the form of a radio access control mechanism [6]. Their paper discusses the holding strategy used in the token-passing mechanism. The entire project was designed and tested on Noxim, a SystemC simulator. Catania et al. [7] introduce Noxim, a cycle-accurate simulator that supported wireless NoCs from its release. Noxim is an alternative to BookSim, but it is written in SystemC, which makes it hard to modify and to extend with custom modules. Sergi Abadal et al. wrote a performance analysis comparing the scalability of a wireless NoC with that of a wireline model [8]. They showed that wireless NoCs cannot sustain high injection rates. They proposed a hybrid of mesh and WiNoC; however, their focus was on the broadcasting aspect of NoCs and not on how it affects multicasting in the case of cache invalidation. Ahmed Ben Achballah et al. conduct a survey of emerging trends in NoCs. Wireless NoCs are a major part of this survey. All different types of wireless modules are
extensively classified, and their shortcomings and major features are brought to light [9]. The present work tries to incorporate some of the ideas presented in that paper, especially those regarding wireless mesh NoCs. Boppana et al. [10] give an overview of issues related to wormhole multicast routing, focusing mainly on problems related to deadlocks. The paper explains Hamiltonian path-based multicast routing and column-path multicast routing. The latter algorithm is related to dimension-order routing and to multicast routing that conforms to the base routing. The paper then presents instances where deadlock may arise when previously proposed multicasting algorithms are used, along with possible methods to avoid it. D. Panda et al. presented an overview of wormhole k-ary n-cube networks together with basic concepts of multi-destination message passing [11]. The BRCP model is provided for both classes of wormhole routing schemes, with and without cyclic dependencies. Two main results are obtained: multicast latency can be kept near constant even as the number of processors in the multicast grows drastically, and multicast latency can be reduced by taking advantage of routing adaptivity. N. E. Jerger et al. compare unicast (one-to-one), broadcast (one-to-all), and multicast (one-to-many) traffic and conclude that present on-chip networks provide high throughput and low latency mainly for unicast traffic [12]. As a result, multicast is realized as multiple unicasts, which leads to bursts of packets from a single source and motivates dedicated multicast support. Hence, virtual circuit tree multicasting (VCTM) was proposed, which increased network performance and decreased power consumption. E. Carara and F. G. Moraes [13] discussed the transmission of simultaneous multicast messages without the need for a dedicated multicast service by implementing a dual-path multicast algorithm for NoCs. This is done by varying the switching mode. The paper also explains the use and advantage of replicating consumption channels to avoid deadlock.
3 WiNoC Architecture
3.1 Topology
An 8 × 8 2D mesh topology is used for this network-on-chip, in which each processor is connected to its own router through bidirectional flit channels and credit channels. The NoC is further divided into evenly sized, non-overlapping zones. Each zone houses a wireless hub. The hub is decoupled from the processors, so it can be linked by wired channels to several routers in its region or zone. The hubs communicate with one another using the TDMA protocol. The wireless hubs can be flexibly placed in this simulation framework: the hubs can be placed at any position within their region and
can also link to several routers. Unless stated otherwise, each hub is linked to the central router of its region to shorten the average distance from any node to the wireless hub of that region.
3.2 Multiple Access Scheme for Wireless Hubs
This section explains the access schemes that were examined. With FDMA, a channel is divided into numerous subchannels, each operating at a reduced bandwidth. Usually, each wireless hub has its own distinct transmission frequency, so the receiver of every wireless hub must be able to receive from all the transmitters simultaneously. The receiver therefore tends toward a bulky, complex, and power-hungry design. The receiver size grows with the size of the network, so this design does not scale. After investigating the possibility of connecting multiple wireless hubs to a single subchannel to alleviate the scalability issues, a wireless system with a single TDMA-based channel and a high baud rate is adopted. This ensures that every receiver hub receives a packet in a single clock cycle, and it lets a wireless hub transmit five flits, or an entire packet, in one clock cycle. Token-passing overhead is avoided by assigning a prefixed timeslot to every wireless hub.
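The prefixed-timeslot idea can be sketched as follows. This is only an illustrative Python sketch, not the authors' simulator code: the hub count of four, the one-cycle slot length, and the round-robin slot order are assumptions, and the names `slot_owner` and `cycles_until_slot` are hypothetical.

```python
# Sketch of fixed-slot TDMA arbitration among wireless hubs.
# Assumptions (not from the paper): 4 hubs, one clock cycle per slot,
# and slots assigned in round-robin order of hub index.

NUM_HUBS = 4


def slot_owner(cycle: int, num_hubs: int = NUM_HUBS) -> int:
    """Return the index of the hub allowed to transmit in this cycle."""
    return cycle % num_hubs


def cycles_until_slot(hub: int, cycle: int, num_hubs: int = NUM_HUBS) -> int:
    """Return how many cycles the hub must wait for its next slot."""
    return (hub - cycle) % num_hubs


if __name__ == "__main__":
    for cycle in range(6):
        print(f"cycle {cycle}: hub {slot_owner(cycle)} may transmit")
```

Because each hub can derive its slot from the cycle count alone, no token has to be exchanged between hubs, which is the overhead the prefixed timeslots are meant to avoid.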
3.3 Routing
This model uses a hybrid routing technique. A packet takes either a wired path or a combination of wired and wireless paths, depending on the positions of the source and destination, so that the total hop distance is minimal. Every router knows the locations of all the wireless hubs as well as the network topology, and it compares the candidate paths to identify the one each arriving packet should take. The path taken by XY routing is compared, in number of hops, with the path through the closest wireless hub. The packet is then dispatched to the neighbor that moves it closer to its destination. On their way from source to destination, packets transmitted through wireless hubs pass through three distinct phases. In phase 1, the packet moves from its source to its closest wireless hub; in phase 2, it is transmitted from that wireless hub to the wireless hub nearest the destination; and in phase 3, it travels to the required destination. The router structure can be modified to support these path comparisons and routing decisions. For example, consider router 14 in the diagram below,
Fig. 1 Architecture of the wireless NoC
• If router 14 wants to connect to router 56, the packet takes the XY route to reach router 22 → Phase 1.
• From router 22, it passes through the wireless channel to router 50 → Phase 2.
• From router 50, it takes the XY route to reach router 56 → Phase 3.
However, if router 14 wants to connect to router 11, the packet takes the XY route even though the two routers are in different regions, because the hop distance of the wired path is less than that of the wired-wireless hybrid path (Fig. 1).
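The path comparison in this example can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' router logic: routers are assumed to be numbered row-major on the 8 × 8 mesh, the zones are assumed to be the four 4 × 4 quadrants, and a wireless hop is counted as one hop. Hubs at routers 22 and 50 come from the example above, while hubs at routers 18 and 54 are assumed by symmetry. All function names are hypothetical.

```python
# Sketch of the wired-versus-hybrid path comparison.
# Assumptions (not from the paper): row-major router numbering, four
# 4 x 4 quadrant zones, one hop per wireless transfer, and hubs attached
# to routers 18, 22, 50, 54 (only 22 and 50 appear in the text).

MESH = 8
ZONE = 4
HUB_ROUTER = {(0, 0): 18, (0, 1): 22, (1, 0): 50, (1, 1): 54}


def coords(router):
    """Return (row, column) of a router on the mesh."""
    return divmod(router, MESH)


def xy_hops(src, dst):
    """Hop count of the wired XY (dimension-order) path."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


def zone_of(router):
    """Return the (zone row, zone column) that contains the router."""
    row, col = coords(router)
    return (row // ZONE, col // ZONE)


def choose_path(src, dst):
    """Return ('wired' or 'hybrid', hop count) for the shorter path."""
    wired = xy_hops(src, dst)
    src_hub = HUB_ROUTER[zone_of(src)]
    dst_hub = HUB_ROUTER[zone_of(dst)]
    if src_hub == dst_hub:  # same zone: the wireless link cannot help
        return ("wired", wired)
    # Phase 1 (to source hub) + Phase 2 (wireless) + Phase 3 (to destination)
    hybrid = xy_hops(src, src_hub) + 1 + xy_hops(dst_hub, dst)
    return ("hybrid", hybrid) if hybrid < wired else ("wired", wired)
```

Under these assumptions, `choose_path(14, 56)` selects the hybrid path through routers 22 and 50, while `choose_path(14, 11)` keeps the wired XY path, which agrees with the example.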
3.4 Multicasting
Multicast packets are distributed region by region, using the algorithm shown in the flowchart of Fig. 2. The process is as follows. A tile produces a multicast packet and sends it to the tile's router along with a 64-bit multicast vector. Because an 8 × 8 mesh topology is used, each bit of the multicast vector corresponds to one tile in the topology. A bit is 1 if the tile is a multicast destination and 0 if it is not. For example, for a multicast packet with 23 destinations, the vector consists of 23 ones and 41 zeroes. The multicast vector can also be viewed as four 16-bit vectors, each representing the multicast destinations within one region. The source router creates a packet with the multicast information as its payload and inserts the multicast vector of its own region into the packet's header. This design handles multicast packets and unicast packets differently. Both the wireless hubs and the normal routers perform certain operations to ensure that multicast packets are being properly
TDMA-Based Adaptive Multicasting in Wireless NoC
Fig. 2 Multicasting algorithm for wireless hub
distributed across the regions according to their multicast destinations. The source router extracts the multicast vector of each region. Intermediate routers along the packet's path use this vector to forward the multicast packet within their zone. If there are destinations in other regions, a new packet is created with the multicast information as its payload and the multicast vectors of all the other zones in its header. This packet is redirected to the wireless hub of the source router's region. Using the TDMA protocol, that hub forwards the packet to the other wireless hubs. On receiving the multicast packet, each hub examines the multicast vector of its own zone for multicast destinations. Depending on the result, the packet is either discarded or accepted. An accepted packet is passed from the hub to its router, which forwards it to the specified destinations in that region using the XY tree routing algorithm.
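The multicast vector and the hub's accept-or-discard check can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the paper's implementation: tiles are numbered row-major from 0, the four regions are taken to be the 4 × 4 quadrants of the mesh, and the function names are hypothetical.

```python
MESH = 8      # 8 x 8 mesh, as in the text
REGION = 4    # assumed region shape: four 4 x 4 quadrants


def build_vector(destinations):
    """64-bit multicast vector: bit i is 1 if tile i is a destination."""
    vector = 0
    for tile in destinations:
        vector |= 1 << tile
    return vector


def split_by_region(vector):
    """Split the 64-bit vector into four 16-bit vectors, one per quadrant."""
    regions = [0, 0, 0, 0]
    for tile in range(MESH * MESH):
        if (vector >> tile) & 1:
            row, col = divmod(tile, MESH)
            region = (row // REGION) * (MESH // REGION) + (col // REGION)
            local_bit = (row % REGION) * REGION + (col % REGION)
            regions[region] |= 1 << local_bit
    return regions


def hub_accepts(region_vector):
    """A hub keeps the packet only if its region has at least one destination."""
    return region_vector != 0


vector = build_vector(range(0, 46, 2))      # 23 destinations
ones = bin(vector).count("1")
print(ones, MESH * MESH - ones)             # number of ones and zeroes
print([hub_accepts(v) for v in split_by_region(build_vector([0, 9]))])
```

With destinations only in the first quadrant (tiles 0 and 9), only that region's hub would accept the packet; the other three would discard it.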
The process is usually achieved by duplicating the multicast packet toward the east and west ports (the X-direction of the routing algorithm) according to the locations of the multicast destinations. The multicast packet then moves through intermediate routers. Each intermediate router checks whether there are other multicast destinations in its own column and, if so, duplicates the packet and forwards the copies through the north and south ports (the Y-direction of the routing algorithm). Whenever a packet is duplicated, the multicast vector is updated so that the router retains a record of which destinations have already been serviced and which are yet to be serviced, until all packets are delivered to their destinations in that region.

The source router creates two packets that carry identical multicast information as the payload. One packet has the multicast vector of its own region in its header, while the other has the multicast vectors of all the other regions. When a node receives the packet, it checks the bit that corresponds to itself in the multicast vector (as discussed previously). If the bit is 1, the node is a destination: it ejects a copy of the packet and changes the bit from 1 to 0 to record that this destination has been serviced. Duplicate packets are then created and sent toward the destinations in the north and south directions, with the bits of those destinations set to 1 in their multicast vectors. If there are no destinations in the north or south direction and the multicast vector still contains 1s, the packet is forwarded in the appropriate X-direction. A receiving hub drops the packet if its vector contains only 0s; otherwise it accepts the packet and uses the algorithm of Fig. 2 to distribute it in its region.

An intermediate node may create at most three duplicate packets: north, south, and local eject. If the duplication happens after routing, three additional buffers are required to store these duplicates. When a VC is allocated for a particular direction, a copy of the packet is sent after its multicast vector has been updated. When all the bits of the multicast vector are 0, the packet is discarded, the VC is declared empty, and the corresponding credit is sent back to the previous node.
3.5 Multicasting Algorithm
Step 1: Start the multicasting for wired nodes.
Step 2: Receive the incoming multicast packets.
Step 3: Check whether the destination node is equal to the current router.
Step 4: If No, check whether any multicast destination has the same column value as the current router.
Step 5(i): If Yes, create duplicate packet(s) for the multicast destination(s) in the same column.
Step 6(i): Then forward the newly created packet and the current packet to their immediate neighbors.
Step 7(i): If No, forward the packet to the immediate neighbor router.
Step 8(i): Stop the multicasting.
Step 9: If Yes, forward the multicast packet to the core.
Step 10(ii): Then check for any remaining destinations.
Step 11(ii): If there are any remaining destinations, then go back to step 4.
Step 12(ii): If there are no remaining destinations, then go back to step 6(i).
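The per-router decisions in the steps above can be traced with a small simulation. The following Python sketch is an interpretation under stated assumptions, not the authors' code: routers are (row, column) pairs with rows increasing toward the south, a Python set stands in for the multicast bit vector, and the function names are hypothetical.

```python
from collections import deque

# Direction -> (row step, column step); rows grow southward (assumption).
STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def route_at_node(node, heading, dests):
    """Return (eject?, {direction: destinations carried by that copy})."""
    row, col = node
    eject = node in dests              # current router is a destination
    rest = dests - {node}              # mark this destination as serviced
    if heading in ("N", "S"):          # column traffic keeps going straight
        return eject, ({heading: rest} if rest else {})
    copies = {
        "N": {d for d in rest if d[1] == col and d[0] < row},
        "S": {d for d in rest if d[1] == col and d[0] > row},
        "E": {d for d in rest if d[1] > col},
        "W": {d for d in rest if d[1] < col},
    }
    return eject, {k: v for k, v in copies.items() if v}


def xy_tree_multicast(src, dests):
    """Simulate delivery; return (routers that ejected a copy, link traversals)."""
    delivered, hops = set(), 0
    queue = deque([(src, "LOCAL", frozenset(dests))])
    while queue:
        node, heading, carried = queue.popleft()
        eject, copies = route_at_node(node, heading, set(carried))
        if eject:
            delivered.add(node)
        for direction, subset in copies.items():
            d_row, d_col = STEP[direction]
            queue.append(((node[0] + d_row, node[1] + d_col), direction, frozenset(subset)))
            hops += 1
    return delivered, hops


targets = {(0, 1), (3, 1), (1, 3), (2, 3), (1, 0)}
print(xy_tree_multicast((1, 1), targets))
```

In this sketch, a copy travelling east or west duplicates itself north and south only for destinations in the current column, which mirrors the column check in Step 4, and a copy is dropped once it carries no remaining destinations.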
4 Evaluation
The WiNoC with wired and wireless nodes is implemented on the BookSim simulator, one of several simulators available for this purpose. BookSim is an open, configurable, and cycle-accurate NoC simulator developed in C++, which allows analyzing the performance and power figures of both conventional wired NoC and emerging WiNoC architectures. A wireless patch was added to BookSim and can be selected as one of the topologies BookSim offers. It allows the user to configure and use wireless hubs as a means of communication in a 2D mesh, to choose a hub placement algorithm, and to modify other aspects of the wireless hubs such as the receive-queue size. The effect of this wireless structure on multicasting was tested in order to arrive at a cost-efficient multicasting strategy; this was the main purpose of the project for which the wireless patch was built. The implementation uses the MDND [14] (Message Duplication in Non-Destination) strategy for multicasting, which keeps multicast traffic in the network to a minimum. It also provides configurable parameters: the multicast injection rate, a switch that turns the multicast capability on and off, and the number of multicast destinations for each multicast packet.

In the proposed model, on an 8 × 8 mesh with wireless capabilities and multicast packets, at an injection rate of 0.001, the average multicast latency drops by roughly 10 cycles, from 53 to 42 cycles. On a 16 × 16 mesh, the average multicast latency drops by close to 30 cycles, from 100 to 72 cycles, when wireless modules are used together with multicasting. In both cases, four wireless hubs were used, which shows how wireless modules help curb scalability-related latency spikes.

The performance of the proposed technique is evaluated on the basis of:
• Average packet latency
The multicast ratio and the number of multicast destinations are specified to generate the multicast packets. A multicast destination value of 8 means that every generated multicast packet has 8 destinations. The ratio is given as a percentage; for example, a 10% multicast ratio means that 10 out of every 100 generated packets are multicast packets.
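To make the two traffic parameters concrete, the following Python sketch generates a packet mix from a multicast ratio and a destination count. It is a hypothetical illustration and does not use BookSim's actual configuration interface; the function name, the packet dictionary layout, and the uniform random choice of sources and destinations are all assumptions.

```python
import random


def generate_packets(n_packets, multicast_ratio, n_dests, n_tiles=64, seed=0):
    """Generate packets; a fraction `multicast_ratio` of them are multicast."""
    rng = random.Random(seed)
    packets = []
    for _ in range(n_packets):
        src = rng.randrange(n_tiles)
        others = [tile for tile in range(n_tiles) if tile != src]
        if rng.random() < multicast_ratio:
            # Multicast packet: exactly n_dests distinct destinations.
            kind, dests = "multicast", sorted(rng.sample(others, n_dests))
        else:
            # Unicast packet: a single destination.
            kind, dests = "unicast", [rng.choice(others)]
        packets.append({"kind": kind, "src": src, "dests": dests})
    return packets


packets = generate_packets(1000, multicast_ratio=0.10, n_dests=8)
multicast = [p for p in packets if p["kind"] == "multicast"]
print(len(multicast) / len(packets))   # close to the requested 10% ratio
```

Because each packet is drawn independently, the observed share of multicast packets only approximates the requested ratio for a finite number of packets.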
Fig. 3 Performance at 10% multicast injection; 8 × 8 mesh; packet size 1. m = number of wireless nodes
4.1 Evaluation Based on Average Packet Latency
Average packet latency is defined as the average number of cycles a packet takes to travel from source to destination. Figure 3 plots the average packet latency for wired and wireless nodes. For the purely wired case (m = 0), the average packet latency increases as the injection rate increases. With wireless nodes (m = 2, 4, and 8), the average packet latency at a given injection rate is lower than in the wired case and decreases as the number of wireless nodes increases (Figs. 3, 4, and 5).
5 Conclusion
This paper focuses on multicasting in wireless networks-on-chip, its architecture, and a performance comparison of wired and wireless nodes. The comparison is made by implementing both and plotting the average packet latency against the injection rate. The results show that for wired nodes the average packet latency increases with the injection rate, whereas increasing the number of wireless nodes reduces the average packet latency compared to that of wired nodes. The following work has been done in the paper:
Fig. 4 Performance at 5% multicast injection; 8 × 8 mesh; packet size 4. m = number of wireless nodes
Fig. 5 Performance at 5% multicast injection; 16 × 16 mesh; packet size 3. m = number of wireless nodes
• Addition of a modular wireless patch with its own set of user-configurable parameters.
• Implementation of multicasting on BookSim using the MDND [14] algorithm.
• Comparison of network performance with wireless hubs against a purely wired setup, with a positive result.
6 Limitations
The proposed work has the following limitation. Heavy congestion occurs at the wireless hubs because almost all packets try to travel via them. This congestion occurs only at high injection rates, which are a very rare scenario in real-time traffic.
7 Enhancements and Future Work
Enhancements are key to continuous improvement and increased efficiency. Congestion control techniques can be introduced in the routers. This will help reduce the load at the hubs at higher injection rates.
References
1. Shetty SS, Moharir M, Sunil K, Ahmad SF (2018) Low latency & high throughput wireless-NoC architecture for Manycore processor. In: International conference on networking, embedded and wireless systems (ICNEWS), 27–28 Dec 2018
2. Padma Priya A, Ashok Kumar M (2007) Multicast-aware high-performance with wireless network-on-chip architectures. In: 6th National conference on frontiers in communication and signal processing systems (NCFCSPS '18), ISO 3297: 2007 Certified organization, vol 7, Special Issue 1, March 2018, 13th–14th March 2018
3. Floyd BA, Hung C-M, Kenneth KO (2002) Intra-chip wireless interconnect for clock distribution implemented with integrated antennas, receivers, and transmitters. IEEE J Solid-state Circuits 37(5):1–3
4. Lee S-B, Tam S-W, Pefkianakis I, Lu S, Frank Chang M, Guo C, Reinman G, Peng C, Naik M, Zhang L, Cong J (2009) A scalable micro wireless interconnect structure for CMPs. In: MobiCom '09 Proceedings of the 15th annual international conference on mobile computing and networking, 2009, pp 52–59
5. Deb S, Ganguly A, Pande PP, Belzer B, Heo D (2012) Wireless NoC as interconnection backbone for multicore chips: promises and challenges. IEEE J Emerg Sel Top Circuits Syst 2(2):1–6
6. Palesi M, Collotta M, Mineo A, Catania V (2014) An efficient radio access control mechanism for wireless network-on-chip architectures. J Low Power Electron Appl 5(2):38–56
7. Catania V, Mineo A, Monteleone S, Palesi M, Patti D (2016) Cycle-accurate network on chip simulation with Noxim. ACM Trans Model Comput Simul (TOMACS) 27(1):Article No 4
8. Abadal S, Mestres A, Nemirovsky M, Lee H, Gonzalez A, Alarcon E, Cabellos-Aparicio A (2016) Scalability of broadcast performance in wireless network-on-chip. IEEE Trans Parallel Distrib Syst 27(12):52–54
9. Liu D (2011) Design and implementation of high-quality course scoring system based on struts and spring and hibernate architecture. In: International conference of information technology, computer engineering and management sciences, 2011, pp 1–4
10. Boppana RV, Chalasani S, Raghavendra CS (1998) Resource deadlock and performance of wormhole multicast routing algorithms. IEEE Trans Parallel Distrib Syst 9(6):535–549
11. Panda DK, Singal S, Kesavan R (1999) Multi destination message passing in wormhole K-Ary N-cube networks with base routing conformed paths. IEEE Trans Parallel Distrib Syst, pp 76–96
12. Jerger NE et al (2008) Virtual circuit tree multicasting: a case for on-chip hardware multicast support. In: Proceedings 35th International symposium. Computer architecture (ISCA), pp 229–240
13. Carara EA, Moraes FG (2008) Deadlock-free multicast routing algorithm for wormhole-switched mesh networks-on-chip. In: Proceedings IEEE computer society annual symposium on VLSI, pp 341–346
14. Aruna MR, Jishaa PA, Joseb J (2016) A novel energy efficient multicasting approach for mesh NoCs. In: 6th International conference on advances in computing & communications, ICACC 2016, 6–8 Sept 2016, Cochin, India
Barriers to the Widespread Adoption of Processing-in-Memory Architectures B. Mohammed Siyad and R. Mohan
Abstract In modern big data applications, a huge amount of data needs to be shared between the processor and memory. Since the data is too big for on-chip cache hierarchies, the off-chip data movement between the processor and memory leads to long execution latency and high energy consumption, which may affect the quality-of-service goals of the applications. Recently, many memory innovations based on material modification at the nanoscale have emerged for storage and computing, offering high performance. Processing-in-Memory (PIM) has emerged as one such solution to the architectural bottleneck of data-intensive applications: it places the computation unit where it makes sense, that is, where the data resides. In this paper, we survey the extensive body of literature on PIM technology across architectural and application dimensions and present the issues and challenges that place a heavy burden on researchers and developers. Furthermore, based on our analysis of existing PIM systems, we identify some key topics for future research that we consider crucial for its widespread adoption.
Keywords Processing in memory · Near memory computing · Data centric computing · In memory computing · Big data
1 Introduction
We are living in an era of big data. In modern big data applications such as deep learning and IoT, a huge amount of data needs to be processed and shared among the processing and storage units, which leads to long execution latency and performance degradation. To reduce the performance loss due to this data transfer, the Processing-in-Memory (PIM) concept was introduced. Although the idea of PIM originated in the 1990s, it was not widely accepted due to some practical concerns. With the recent advancements in 3D stacking technologies and non-volatile memories, the idea has been revived.

B. Mohammed Siyad (B) · R. Mohan
Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India
e-mail: [email protected]
R. Mohan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_36
1.1 Scope of the Survey
The world is experiencing the evolution of big data processing. Large-scale applications of IoT, AI, and other emerging technologies such as AIoT (Artificial Intelligence of Things) [1] have become a major factor in the emergence of big data. Conventional general-purpose computing systems are based on the von Neumann architecture, in which the memory is separated from the processing unit. In these processor-centric architectures, the central processing unit (CPU) is considered the master of computation, and all other storage and communication devices are treated as incapable of computation. So, in today's big data applications, a huge amount of data has to travel through the low-bandwidth channels between the processing unit and the storage and communication units. This massive data movement affects performance, energy efficiency, and reliability, the three basic attributes of a computing system. This von Neumann bottleneck [2] increases the energy and performance overhead. With the steady advancements in transistor technology, processor and memory performance have also scaled. Figure 1 shows this performance scaling: the performance of the processor doubles every two years, while the performance of memory doubles every 10 years [2]. The performance gap due to this memory wall [3] between the processor and memory forces the processor to wait a long time to access data from memory. In modern computing systems, the power consumption of load/store operations is much higher than that of data processing operations. To overcome the performance overheads due to the architectural bottleneck and power consumption, the transfer of huge amounts of data between the processing and storage units should be controlled or reduced. With this in view, technologies such as in-memory computing and near-memory computing have been explored; they can be applied at any level of the memory hierarchy and with various technology options.

With the advancements in 3D stacking and other memory technologies, PIM is becoming popular in the modern computing industry after a long period of dormancy. At the same time, it introduces challenges and design issues for researchers and programmers to overcome. In this paper, we describe these challenges in detail by highlighting some existing works in this direction. These issues should be systematically and carefully examined and solved for this technology to become widely accepted in the future.
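As a rough worked example of the memory wall described in this section, the doubling periods quoted from [2] (two years for the processor, ten years for memory) imply a gap that itself grows exponentially. The short Python sketch below only restates that arithmetic; the function name and the idealized assumption of smooth exponential scaling from a common starting point are illustrative.

```python
def performance_gap(years, cpu_doubling=2.0, mem_doubling=10.0):
    """Ratio of CPU to memory performance after `years`, both starting at 1."""
    cpu = 2 ** (years / cpu_doubling)    # doubles every 2 years
    mem = 2 ** (years / mem_doubling)    # doubles every 10 years
    return cpu / mem


for years in (2, 10, 20):
    print(years, performance_gap(years))
```

Under this idealized model, the gap after t years is 2^(t/2 − t/10) = 2^(0.4t), so it doubles roughly every two and a half years.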
1.2 Processing-In-Memory (PIM)
In modern big data applications, a huge amount of data needs to be shared between the processor and memory. Since it is too big for on-chip cache hierarchies, the off-chip data movement between the processor and memory leads to long execution latency because of the memory wall and to high energy consumption, which affects the quality-of-service goals of the applications. For future computing systems to assure high performance, energy efficiency, and sustainability, unnecessary data movement between the processor and memory should be minimized. Most studies reveal that the overall system energy can be significantly reduced by reducing the cost of the associated data movement.

Fig. 1 Performance scaling of CPU and memory with respect to time (vertical axis: performance; horizontal axis: time). CPU performance doubles every 2 years and memory performance every 10 years; the difference is the performance gap. Adapted from [2]

To reduce the data movement and alleviate the bottleneck, new technologies and architectures have been explored. One of the most widely accepted approaches is to perform the data-transfer-related functions close to or inside the memory, which led to PIM technology. Processing-in-Memory (PIM) has emerged as a solution to this architectural bottleneck of big data applications: it places the computation unit where it makes sense, that is, where the data resides [4]. The idea of PIM is to place the computation in or near the memory so that the movement of data between the storage and computation units is reduced or completely eliminated. The computations are placed in the memory, which is itself organized as a computational memory [5]. Figure 2 illustrates the distinction between the concept of PIM and a conventional computing system. In PIM, the memory is the central point of computation, based on its physical attributes and additional control logic. The PIM concept in data-centric applications can be enabled using two approaches. The first is Processing Using Memory, in which the computations are performed within the memory using special circuitry and with minimal changes. The second is Processing Near Memory, which enables PIM in a general-purpose manner: computations are performed in a processing element, e.g., a CPU or GPU, placed near the memory unit.
Fig. 2 Concept of PIM vs conventional systems (panels: Conventional Computing Systems; Processing in Memory). In conventional computing systems, to perform a computation, say f(A), the data A must be brought to the CPU, which causes the memory wall [3]. In PIM architectures, the function is performed within the memory itself, organized as computational memory, with no data movement (based on [5])
1.3 Processing Using Memory (PUM)
This approach exploits the capabilities of DRAM and the emerging non-volatile memories (NVMs) to implement computations within the memory through minimal modifications, which ensures that the actual memory function is preserved.
Using DRAM: The high internal bandwidth of DRAM makes it a good choice for implementing the PIM concept through minimal modifications to the DRAM chip architecture. Keeping the modifications minimal ensures that the original memory function is preserved and the area constraints are met. Techniques such as charge sharing [3] are used to perform bulk bit-wise computations efficiently within the memory. Even though DRAM PIM outperforms execution on the host CPU, its implementation faces challenges due to the small transistor size, the increase in operational frequency, and periodic refresh.
Using NVMs: Unlike DRAM, which stores data as charge, non-volatile memories represent data as cell resistance. Since the data is stored as resistance, it is protected from accidental loss even during a power failure, which makes these memories non-volatile. Phase change memory (PCM), resistive RAM (ReRAM), and spintronic RAM are some promising NVM technologies that support the computational memory concept. They promise efficient data access and competitive latency compared to DRAM. Recent research has demonstrated that NVM technologies are well suited to performing Boolean logical operations and multiplications [6].
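One commonly described way in which charge sharing yields bulk bit-wise operations is triple-row activation, as in designs such as Ambit (listed in Table 1): activating three rows at once makes each bitline settle to the majority of the three stored bits, and fixing the third row to all 0s or all 1s turns that majority into AND or OR. The Python sketch below models only this logical behavior on integers that stand in for DRAM rows; the row width and function names are assumptions, and no circuit-level effects are modeled.

```python
ROW_BITS = 16                       # illustrative row width (assumption)
ALL_ONES = (1 << ROW_BITS) - 1      # control row filled with 1s


def majority(a, b, c):
    """Bit-wise majority of three rows, the result of charge sharing."""
    return (a & b) | (b & c) | (c & a)


def bulk_and(a, b):
    """AND of two rows: majority with a control row of all 0s."""
    return majority(a, b, 0)


def bulk_or(a, b):
    """OR of two rows: majority with a control row of all 1s."""
    return majority(a, b, ALL_ONES)


row_a, row_b = 0b1100110011110000, 0b1010101000111100
print(format(bulk_and(row_a, row_b), "016b"))
print(format(bulk_or(row_a, row_b), "016b"))
```

The whole row is processed in one step, which is why such operations are described as bulk bit-wise computations.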
1.4 Processing Near Memory (PNM)
In these systems, the computations are done by processing elements (PEs) placed close to the memory module. Compared to PUM systems, the internal memory bandwidth is lower, as the computations are not performed directly in memory. This mechanism is supported in particular by 3D stacked memory technology, in which several DRAM layers and a logic die are stacked into a single chip, as shown in Fig. 3. As the expanded view of the circled portion of the figure shows, the logic die is placed beneath the DRAM layers and connected through a TSV (Through Silicon Via) interface [7]. A wide range of functionalities can be integrated in the logic layer to connect the processor and the DRAM cells. The TSV interface promises low latency, low power consumption, and high bandwidth. A recent study by Singh et al. [8] demonstrates that FPGA-based near-memory computing can potentially alleviate the data movement bottleneck for modern big data applications.

Fig. 3 Concept of Processing Near Memory (PNM). Labeled components: processing elements PE1 … PEn, system bus, memory controller, DDR memory, and a 3D packaged DRAM (PIM device) with a DRAM stack, a logic die, and PIM cores

Thus, Processing-in-Memory accelerates computation by keeping the computation unit close to or inside the main memory, where the data resides. PIM architectures are a promising solution to the memory bottleneck in modern memory architectures because they reduce the data transfer between the processor and memory to a minimum. A brief comparison of PUM and PNM, with some example architectures and the enabling technologies, is given in Table 1.

The rest of the paper is organized as follows. Section 2 gives a landscape of the research challenges in the Processing-in-Memory paradigm, identified by analysing some architectures in the literature. Section 3 describes some future research topics and opportunities for the PIM concept. Section 4 gives the concluding remarks of the work.
Table 1 PUM and PNM: examples and enabling technologies
Approach | Methodology | Example architectures | Enabling technologies
PUM | Analog operational principles of existing DRAM and the emerging NVMs | Row-Clone [9], Ambit [10], PiDRAM [11] | SRAM; DRAM; Phase Change Memory (PCM); RRAM or memristors; MRAM
PNM | Integrating a PIM logic (accelerators, re-configurable logic, etc.) close to or inside the memory | Tesseract [12], GenASM [13], NATSA [14], GRIM Filter [15] | Logic layers in 3D-stacked memory; Silicon interposers; Logic in memory controllers
2 Challenges of Processing in Memory
Placing the computation units close to or inside the memory gives rise to many system-level and programming-level challenges for computer architects and developers. In this section, we describe the research challenges related to the programming model, data mapping and scheduling, virtual memory support, memory coherence, and the availability of data structures. Future computer architects and researchers should address these barriers efficiently and systematically and come up with appropriate solutions so that the emerging PIM architectures gain wide acceptance.
2.1 Availability of Programming Models and Generation of Code
One of the basic research issues in PIM is supporting the heterogeneous programming environment of the host processor and the PIM device. Efficient integration of PIM architectures with existing compiler-based techniques is still in its infancy because of the unconventional programming models. Ahn et al. [16] introduced PIM-enabled instructions to realize in-memory computation. By extending the ISA of the host processor with the PIM instructions, they remain interoperable with existing computational models, with no change to the programming interface. During the execution of a PIM instruction, the input and output operands are moved between the host processor and memory over off-chip links. Because of this exchange of information between the CPU and the PIM processing logic, some overhead arises when handling large tasks. Techniques should therefore be developed to integrate the PIM instructions with compiler-based methods to reduce the burden on the programmer.
2.2 Run-Time Issues
We have identified some run-time issues in the PIM approach. The major ones, described in this section, are identifying the functions or primitives to be offloaded to PIM, scheduling them effectively among the cores, and mapping them onto the memory modules.

Identifying PIM offloading candidates: PIM involves embedding some logic into the memory device and offloading some major computations to this embedded logic. Boroumand et al. [17] observed that a few simple functions or primitives, which they call PIM offloading candidates, are responsible for the major portion of the data movement. Identifying these candidates manually is a complex task for the programmer, since it requires a significant understanding of the hardware trade-offs between the processor and the PIM cores. They therefore proposed systematic criteria to identify the offloading candidates. According to their observation, a function can be termed a PIM offloading candidate if (1) the function accounts for the major part of the overall workload energy, (2) data movement consumes a major portion of the workload energy, and (3) the function is memory intensive, i.e., its last-level cache misses per kilo instruction (MPKI) exceed 10. They implemented the PIM targets in 3D stacked memory: once the offloading candidates have been identified, the PIM logic can perform the required operations inside the logic layer.

Scheduling and mapping of PIM offloading candidates: Deciding whether the candidate functions should actually be offloaded to PIM is a challenging issue. The lack of an efficient scheduling and mapping mechanism can suppress the benefits of PIM technology. Ahn et al. [16] proposed a locality-aware scheduling policy for PIM-enabled instructions (PEIs) that determines whether to execute an instruction locally in the host processor or to offload it to the memory. In this scheduling mechanism, they monitor the locality of the data used by the PEIs and execute instructions with high locality in the processor itself, rather than offloading them to memory, in order to exploit the large on-chip cache. They introduced a locality monitor, a tag array with valid bits, which is updated whenever a PIM instruction is sent to main memory.

Hsieh et al. [18] introduced a dynamic decision-making mechanism that determines, based on run-time system conditions, whether the offloading code should actually be offloaded. In this dynamic offloading aggressiveness control mechanism, the GPU monitors the offloading requests and prevents them from exceeding the capacity of the corresponding 3D-stack streaming multiprocessor, which blocks over-offloading. The GPU also keeps track of the bandwidth utilization of the transmit and receive channels and prevents offloading if the traffic in either channel exceeds the threshold utilization rate.

In a work by Pattnaik et al. [19], a code scheduling mechanism is implemented in PIM-assisted GPU architectures to exploit the performance and energy
efficiency capability of PIM in GPU-based systems. In PIM-assisted GPUs, at least one 3D stacked memory chip that houses simple GPU cores (GPU-PIM) in its logic layer is placed close to the primary GPU chip to assist in computations. They proposed two mechanisms that operate at the kernel level for effective scheduling and management. The first is a kernel offloading mechanism that uses regression analysis to identify at run-time which kernels would benefit from PIM and offloads them to the GPU cores in the 3D stacked memory. They developed a regression model that predicts, before a kernel starts execution, its affinity to either GPU-PIC (Processing-in-Core) or GPU-PIM (Processing-in-Memory) using some predictive variables. Based on this prediction, the kernel is offloaded from the CPU to either GPU-PIC or GPU-PIM. The second is a concurrent kernel management mechanism that determines at run-time which kernels to schedule concurrently on the powerful primary GPU cores and on the GPU cores in memory. This approach reduces the under-utilization of either set of GPU cores by balancing the execution time on both. They use kernel-level dependency information, kernel affinity, and execution time to schedule the kernels concurrently. To balance the execution time, a kernel is offloaded to the GPU cores on which it has the shorter execution time.

Hsieh et al. [18] also proposed a combined hardware-software mechanism that places data and the offloaded code in the same memory stack to reduce off-chip bandwidth consumption. This approach is based on the observation that a large portion of the offloaded code shows repeatable memory access patterns. The primary GPU evaluates the memory mapping mechanisms and determines the best mapping in a learning phase. It then predicts the memory pages that the offloaded code will access, based on the predictability of the access patterns, and maps the data closest to the code. The offloaded code runs on the memory-stack streaming multiprocessors using the mapping predicted during the learning phase.

Research is still ongoing on widening the acceptance of PIM by lowering the above-mentioned barriers: effective run-time scheduling of the PIM engines among multiple processing cores and efficient management of memory between CPU cores and PIM engines.
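Two of the run-time decisions discussed in this section can be summarized in a short Python sketch: the offloading-candidate criteria attributed to Boroumand et al. [17] and a simple gate in the spirit of the dynamic offloading control of Hsieh et al. [18]. The sketch is an assumption-laden illustration, not either group's implementation. Only the MPKI limit of 10 comes from the text; the 0.5 cut-off for a "major" share, the 0.8 utilization threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

MPKI_THRESHOLD = 10    # last-level cache MPKI limit quoted in the text
MAJOR_SHARE = 0.5      # hypothetical cut-off for a "major" share of energy


@dataclass
class FunctionProfile:
    name: str
    energy_share: float          # function energy / total workload energy
    data_movement_share: float   # data movement energy / total workload energy
    llc_mpki: float              # last-level cache misses per kilo instruction


def is_offload_candidate(profile):
    """Apply the three criteria: energy share, data movement share, MPKI."""
    return (profile.energy_share > MAJOR_SHARE
            and profile.data_movement_share > MAJOR_SHARE
            and profile.llc_mpki > MPKI_THRESHOLD)


def may_offload(pending, capacity, tx_util, rx_util, util_threshold=0.8):
    """Block offloading when the memory-side cores or either channel are saturated."""
    return (pending < capacity
            and tx_util <= util_threshold
            and rx_util <= util_threshold)


copy_loop = FunctionProfile("copy_loop", 0.60, 0.55, 25.0)   # hypothetical profile
print(is_offload_candidate(copy_loop))
print(may_offload(pending=3, capacity=4, tx_util=0.5, rx_util=0.4))
```

Separating the static candidate test from the dynamic gate reflects the distinction drawn above between identifying candidates and deciding at run-time whether to offload them.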
2.3 Memory Coherence

In a multi-threaded, multi-core environment, writes to memory must be coordinated so that the data remain coherent. Coherence is a major challenge in PIM architecture research because the data are accessed by both the PIM logic and the processor cores. Conventional coherence protocols exchange many messages between the processor and memory, so adopting them unchanged may erode the specific advantages of PIM, such as high bandwidth and low latency. A number of approaches, such as coarse-grained coherence, coarse-grained locks, and non-cacheable PIM data, were developed to reduce the cost of PIM coherence. Most of them still cause coherence overheads and performance degradation.
Ahn et al. [12] used a mechanism in which the accelerator is mapped to a non-cacheable memory region of the host processor, which eliminates cache coherence problems between the host processor cache and the 3D-stacked memory. The system dispatches computations to the target core using blocking and non-blocking remote function calls. This mechanism restricts the programming model and also undoes some of the major advantages of PIM. Boroumand et al. [20] introduced a coherence protocol named LazyPIM, which reduces the off-chip traffic caused by unnecessary message transfers. Instead of sending messages directly to the processor coherence directory, the PIM core keeps all coherence updates in its cache, speculatively assuming that it holds the required coherence permissions. Once the kernel execution finishes, it sends the accumulated coherence information to the processor directory in the form of compressed signatures. The processor evaluates the signatures and checks for conflicts. If a conflict is found, the PIM core rolls back its changes and re-executes the kernel from the beginning. If no conflict exists, the speculative data are committed and the coherence directory is updated. This protocol enables efficient concurrent execution by avoiding unnecessary flushing of data and by generating less coherence traffic. Its main drawback is the overhead of rolling back and re-executing the PIM kernel during conflict resolution. To reduce the rollback overhead, Boroumand et al. enhanced the LazyPIM protocol with a partial commit mechanism [21], in which the kernel is divided into small pieces of execution and a commit is performed after each partial kernel execution. This improvement reduces both the likelihood of conflicts and the cost of a rollback. Two years later, Boroumand et al. [22] developed another coherence protocol for near-data accelerators (NDAs), called CoNDA, in which, as in LazyPIM, the NDA starts execution optimistically, assuming that it has acquired coherence permission without accessing the processor coherence directory. During execution, it records all memory accesses made by the NDA and by the rest of the system. Once execution is over, CoNDA analyses the recorded information to identify the coherence operations that are actually necessary, which avoids unnecessary data movement. If no coherence operation is found necessary for the NDA kernel portion, it commits all the updates; otherwise, it invalidates all uncommitted updates and re-executes the NDA portion. This protocol reduces unnecessary data movement by gaining insight into which memory accesses are necessary, at the cost of a small hardware overhead. A brief comparison of the three coherence protocols is given in Table 2. More efficient memory coherence mechanisms that can handle all types of workloads in heterogeneous PIM environments remain to be developed.
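The speculate, check, then commit or roll back cycle described above can be sketched as a toy model. This is only an illustration under stated assumptions: the signature is modelled as a small Bloom-filter-style bit set, memory is a Python dictionary, and the names `run_speculatively`, `make_signature`, and `SIGNATURE_BITS` are hypothetical, not part of LazyPIM or CoNDA.

```python
import hashlib

SIGNATURE_BITS = 256  # assumed signature width, chosen for illustration


def bit_positions(address, hashes=2):
    """Map an address to a few bit positions of the signature."""
    digest = hashlib.sha256(str(address).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % SIGNATURE_BITS
            for i in range(hashes)]


def make_signature(addresses):
    """Compress a set of addresses into one integer used as a bit vector."""
    signature = 0
    for address in addresses:
        for pos in bit_positions(address):
            signature |= 1 << pos
    return signature


def may_contain(signature, address):
    """True if the address may be in the set (false positives are possible)."""
    return all((signature >> pos) & 1 for pos in bit_positions(address))


def run_speculatively(kernel, memory, concurrent_cpu_writes):
    """Run the kernel optimistically; commit on success, else roll back and retry."""
    attempts = 0
    while True:
        attempts += 1
        read_set, write_buffer = set(), {}

        def load(address):
            read_set.add(address)
            return write_buffer.get(address, memory[address])

        def store(address, value):
            write_buffer[address] = value  # buffered, not yet visible

        kernel(load, store)
        # In this toy model the processor writes only during the first attempt.
        cpu_writes = concurrent_cpu_writes if attempts == 1 else {}
        memory.update(cpu_writes)
        read_signature = make_signature(read_set)
        if any(may_contain(read_signature, a) for a in cpu_writes):
            continue  # conflict: drop the write buffer and re-execute
        memory.update(write_buffer)  # no conflict: commit speculative data
        return attempts


def add_kernel(load, store):
    store(2, load(0) + load(1))


memory = {0: 1, 1: 2, 2: 0}
attempts = run_speculatively(add_kernel, memory, {1: 10})
print(attempts, memory)
```

Because the processor overwrites address 1 while the kernel is running, the first attempt is discarded and the second attempt commits the sum computed from the updated value. A false positive in the signature can only cause an extra re-execution, never a missed conflict.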
B. Mohammed Siyad and R. Mohan
Table 2 PIM coherence protocols

| Protocol | Methodology | Benefits | Drawbacks |
| --- | --- | --- | --- |
| LazyPIM | Speculative execution (commits after kernel execution) | Less traffic, concurrent execution | Rollback overhead |
| Improved LazyPIM | Speculative execution (partial commit) | Less chance of conflict, reduced rollback cost | Rollback overhead (lower than LazyPIM) |
| CoNDA | Getting insight into the memory access necessity | Necessary coherence data transfer only | A little hardware overhead |
2.4 Virtual Memory Management

Implementing virtual memory is a vital challenge in PIM operation. Applications store every reference to memory in the form of a virtual address. During code offloading, when the offloaded code needs to access a data location, its virtual address must be translated into the actual physical address before the PIM logic accesses it. To reduce the translation overhead, a translation lookaside buffer (TLB) caches the frequently accessed mappings. On a TLB miss, the CPU uses a page table walker to look up the page table itself. Performing these address translations on the CPU side largely cancels the benefits of PIM, because they rely on long-latency memory accesses. Replicating the TLB within the PIM logic is a naive solution to these issues, but it is difficult to adopt for the following reasons: (1) coherence must be kept between the TLBs on the PIM side and the CPU side; (2) duplication incurs extra hardware cost; (3) compatibility must be ensured between the in-memory TLB and other architectures; and (4) for data-intensive applications, TLB misses lead to significant performance degradation because of the high-latency page walk through the deeply layered conventional page table. Hsieh et al. [23] proposed an efficient method for virtual-to-physical address translation in memory, which avoids costly accesses to the memory management unit in the CPU, as part of their in-memory pointer chasing accelerator (IMPICA). In this approach, they used a page table that is fully decoupled from the CPU page table to eliminate the compatibility problem. They developed a region-based page table that maps only a particular region rather than the entire address space. The main advantage of this region-based mapping over a conventional page table is that it limits the hardware and storage cost. It also avoids unnecessary memory accesses. Because every function that operates on the particular region is executed in IMPICA, data sharing between the CPU and IMPICA is eliminated, and coherence is thus maintained. This method is not efficient for applications that use a large amount of virtual memory. A virtual block interface (VBI) framework has been introduced by Hajinazar et al. [24] in which all the virtual memory management tasks are controlled by
the memory controller hardware itself to reduce the latency overhead and software complexity. Virtual blocks (VBs) are variable-sized contiguous regions in the VBI address space, which is visible to all processes. Since the ID of each VB is unique and the VBI address space is globally visible, a VBI address identifies a unique piece of data. This allows the system to locate data without address translation, except when an access misses in all cache levels. In that case, the address translation is performed by the memory translation layer (MTL) associated with the memory controller, which avoids unnecessary page walks in virtual machines. Because VBs are globally visible, processes are protected from unauthorized access through an OS-controlled Client-VB Table (CVT). The CVT stores information about the set of VBs attached to a process and is checked by the processor on each memory access. Hence, unlike conventional approaches, accesses to the translation structures for protection purposes are avoided and the performance overhead is reduced. Recently, design frameworks such as PiDRAM [11] and SIMDRAM [25] have been implemented to address the aforementioned system integration challenges and to analyze the trade-offs of PIM mechanisms.
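The cost of TLB misses for data-intensive access patterns, discussed in this section, can be made concrete with a small counting model. The page size, the four-level page table, the eight-entry LRU TLB, and the names `ToyTLB` and `translation_cost` are all assumptions chosen for illustration; they do not describe any of the cited designs.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # assumed 4 KiB pages
WALK_LEVELS = 4    # assumed four-level page table: one memory access per level


class ToyTLB:
    """A tiny LRU-replacement TLB that only counts hits and misses."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return the extra memory accesses spent translating this address."""
        vpn = vaddr // PAGE_SIZE
        if vpn in self.cache:
            self.cache.move_to_end(vpn)     # mark as most recently used
            self.hits += 1
            return 0
        self.misses += 1
        if len(self.cache) >= self.entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
        self.cache[vpn] = True
        return WALK_LEVELS                  # full page walk on a miss


def translation_cost(addresses, tlb_entries=8):
    """Return (extra memory accesses, TLB misses) for an address trace."""
    tlb = ToyTLB(tlb_entries)
    extra = sum(tlb.translate(a) for a in addresses)
    return extra, tlb.misses


# 1024 accesses streaming through 64 KiB in 64-byte steps (16 pages in total).
sequential = [i * 64 for i in range(1024)]
# 1024 accesses that each touch a different page, as in pointer chasing.
scattered = [(i * 37 % 1024) * PAGE_SIZE for i in range(1024)]

print(translation_cost(sequential))
print(translation_cost(scattered))
```

Under these assumptions, the streaming trace pays for a page walk only once per page, while the scattered trace pays for a full walk on every access, which is the behaviour that makes CPU-side translation costly for pointer-chasing workloads.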
2.5 Concurrent Data Structures

Today's multi-core systems use concurrent data structures in high-performance applications. For the widespread acceptance of PIM in these systems, support for concurrent data structures is therefore of vital importance. The naive implementation of pointer-chasing data structures (e.g., linked lists and skip-lists) and contended data structures (e.g., FIFO queues) in a PIM environment is limited by factors such as memory access patterns and high contention. Since a contended data structure such as a FIFO queue exhibits high locality and a high degree of contention, it is not a good choice for PIM environments under concurrent accesses. Liu et al. [26] proposed novel designs for PIM data structures, using techniques such as combining, partitioning, and pipelining, to outperform traditional concurrent data structures. For PIM-managed linked lists, they used combining, where a combiner thread executes the requests from all other threads. For skip-lists, they used partitioning: the list is partitioned by disjoint key ranges and stored in different vaults so that the CPU can send each request to the appropriate vault. For contended data structures such as FIFO queues, they executed the requests from the CPU in a pipelined fashion. This approach demands extra storage for buffering the CPU requests. A brief comparison is given in Table 3. More efficient PIM data structures, such as search trees and hash tables, remain to be developed.

Table 3 PIM-managed concurrent data structures

| PIM-managed data structure | Category | Applied methodology | State-of-the-art techniques |
| --- | --- | --- | --- |
| Linked lists | Pointer-chasing data structures | Combining of requests | Fine-grained locking |
| Skip-lists | Pointer-chasing data structures | Partitioning | Lock-free skip-list |
| FIFO queue | Contended data structures | Pipelined execution of requests | Flat-combining FIFO queue, F&A (Fetch-and-Add)-based FIFO queue |

Other challenges also remain, on which research is still ongoing. There is a lack of efficient real-time PIM hardware or prototypes for better analysis and evaluation of research problems. Security considerations also arise when PIM is integrated with real-time computations. Realising secure computations in PIM would pave the way for more secure computing systems and applications. Since PIM targets high-performance applications, efficient power-saving techniques that do not compromise performance or bandwidth are desirable [27]. All these challenges should be addressed and evaluated systematically and consistently, without imposing a significant burden on researchers and programmers, so that the Processing-in-Memory concept can become popular in computing systems and applications.
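The partitioning technique described for skip-lists in Sect. 2.5 can be sketched as follows. This is a single-threaded illustration of the routing idea only: a sorted Python list stands in for the per-vault skip-list, and the class names `Vault` and `PartitionedSet` and the boundary values are hypothetical.

```python
import bisect


class Vault:
    """Stand-in for one memory vault; a sorted list replaces the skip-list."""

    def __init__(self):
        self.keys = []

    def add(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return False  # key already present
        self.keys.insert(i, key)
        return True

    def contains(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


class PartitionedSet:
    """Routes each request to the vault that owns the key's range."""

    def __init__(self, boundaries):
        # boundaries[i] is the smallest key owned by vault i + 1.
        self.boundaries = list(boundaries)
        self.vaults = [Vault() for _ in range(len(self.boundaries) + 1)]

    def vault_for(self, key):
        return bisect.bisect_right(self.boundaries, key)

    def add(self, key):
        return self.vaults[self.vault_for(key)].add(key)

    def contains(self, key):
        return self.vaults[self.vault_for(key)].contains(key)


pset = PartitionedSet(boundaries=[100, 200, 300])
for key in (5, 150, 100, 299, 300, 1000):
    pset.add(key)
print([vault.keys for vault in pset.vaults])
```

Because the key ranges are disjoint, a request touches exactly one vault, so requests for different ranges never contend with each other.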
3 Future Research Challenges and Opportunities

Based on the analysis and evaluation of the existing PIM mechanisms in the literature, we have identified further research topics and opportunities, which are listed below:

• Most tools, techniques, and management policies are architecture-specific or application-specific, which could affect the reproducibility of results. This could also lead to technology fragmentation and inconsistent performance. More tools and policies that generalize across architectures and applications should therefore be developed.
• Machine learning techniques such as reinforcement learning can be adopted for identifying and scheduling PIM offloading candidates.
• PIM can be used in IoT devices to help them apply machine learning algorithms effectively on large data sets.
• Frequent offloading of tasks can overheat the memory chip and result in a complete shutdown [28]. More efficient and comprehensive resource management mechanisms are needed in this regard.
• A hybrid approach that integrates DRAM and NVM can be developed to explore the capabilities of PIM.
4 Conclusion

The degradation in performance, energy efficiency, and reliability of data-intensive applications on conventional computing architectures has drawn attention to the concept of Processing-in-Memory (PIM), in which computations are performed where the data reside. Advances in 3D stacking technologies have made this decades-old concept more significant in the current big-data scenario. Performing computations close to or inside the memory gives rise to several challenges for researchers and developers. In this paper, we have reviewed the landscape of PIM technology along various dimensions and examined those challenges. They should be carefully and systematically examined and solved in future, since the widespread adoption of PIM technology depends heavily on efficient solutions to them.
References

1. Ye L, Liu Y, Li H, Tan Z (2021) The challenges and emerging technologies for low-power artificial intelligence IoT systems. IEEE Trans Circ Syst 68(12):4821–4834
2. Talati N, Ben-Hur R, Wald N, Haj-Ali A, Reuben J, Kvatinsky S (2020) mMPU-a real processing-in-memory architecture to combat the von Neumann bottleneck. In: Suri M (ed) Applications of emerging memory technology. Springer series in advanced microelectronics, vol 63. Springer, Singapore. https://doi.org/10.1007/978-981-13-8379-3_8
3. Khan K, Pasricha S, Gary Kim R (2020) A survey of resource management for processing-in-memory and near-memory processing architectures. J Low Power Electron Appl. arXiv:2009.09603
4. Mutlu O, Ghose S, Gómez-Luna J, Ausavarungnirun R (2020) A modern primer on processing in memory. arXiv:2012.03112
5. Mehonic A, Sebastian A, Rajendran B, Simeone O, Vasilaki E, Kenyon A (2020) Memristors-from in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio-inspired computing. Adv Intell Syst 2(11):2000085
6. Angizi S, He Z, Fan D (2018) PIMA-logic: a novel processing-in-memory architecture for highly flexible and energy-efficient logic computation. In: 2018 55th ACM/ESDA/IEEE design automation conference (DAC). IEEE, San Francisco, CA, USA, pp 1–6. https://doi.org/10.1109/DAC.2018.8465706
7. Loh G (2008) 3D-stacked memory architectures for multi-core processors. In: Proceedings of the 35th annual international symposium on computer architecture (ISCA '08). IEEE Computer Society, USA, pp 453–464
8. Singh G, Mohammed A, Damla Senol C, Dionysios D, Mutlu O (2021) FPGA-based near-memory acceleration of modern data-intensive applications. IEEE Micro 41(4):39–48
9. Seshadri V, Kim Y, Fallin C, Lee D (2013) RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization. In: Proceedings of the 46th annual IEEE/ACM international symposium on microarchitecture (MICRO-46). Association for Computing Machinery, New York, USA, pp 185–197
10. Seshadri V, Mullins T, Boroumand A, Lee D (2017) Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology. In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture (MICRO-50 '17). Association for Computing Machinery, New York, USA, pp 273–287
11. Olgun A, Luna J, Kanellopoulos K, Salami B (2021) PiDRAM: a holistic end-to-end FPGA-based framework for processing-in-DRAM. arXiv:2111.00082
12. Ahn J, Hong S, Yoo S, Mutlu O (2015) A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of the 42nd annual international symposium on computer architecture (ISCA '15). Association for Computing Machinery, New York, USA, pp 105–117
13. Cali S, Kalsi S, Subramanian L, Boroumand A (2020) GenASM: a high-performance, low-power approximate string matching acceleration framework for genome sequence analysis. In: Proceedings of the 53rd annual IEEE/ACM international symposium on microarchitecture (MICRO). Athens, pp 951–966
14. Fernandez I, Quislant R, Giannoula C, Mutlu O (2020) NATSA: a near-data processing accelerator for time series analysis. In: Proceedings of the IEEE 38th international conference on computer design (ICCD). Hartford, pp 120–129
15. Kim J, Senol S, Xin D, Lee D (2018) GRIM-filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies. BMC Gen 19(89)
16. Ahn J, Yoo S, Mutlu O, Choi K (2015) PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. In: Proceedings of the 42nd annual international symposium on computer architecture (ISCA '15). Association for Computing Machinery, New York, NY, USA, pp 336–348
17. Boroumand A, Ghose S, Kim Y, Mutlu O (2018) Google workloads for consumer devices: mitigating data movement bottlenecks. In: Proceedings of the twenty-third international conference on architectural support for programming languages and operating systems (ASPLOS '18). ACM, New York, USA, pp 316–331
18. Hsieh K, Ebrahim E, Kim G, Chatterjee N (2016) Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems. In: Proceedings of the 2016 ACM/IEEE 43rd annual international symposium on computer architecture (ISCA). IEEE, Seoul, Korea, pp 204–216
19. Pattnaik A, Tang X, Jog A, Mutlu O (2016) Scheduling techniques for GPU architectures with processing-in-memory capabilities. In: Proceedings of the 2016 international conference on parallel architectures and compilation (PACT '16). Association for Computing Machinery, New York, USA, pp 31–44
20. Boroumand A, Ghose S, Patel M, Zheng H (2017) LazyPIM: an efficient cache coherence mechanism for processing-in-memory. IEEE Comput Arch Lett 16(1):46–50
21. Boroumand A, Ghose S, Patel M, Hassan H (2017) LazyPIM: efficient support for cache coherence in processing-in-memory architectures. arXiv:1412.6980
22. Boroumand A, Ghose S, Patel M, Hassan H (2019) CoNDA: efficient cache coherence support for near-data accelerators. In: Proceedings of the 2019 ACM/IEEE 46th annual international symposium on computer architecture (ISCA). IEEE, Phoenix, USA, pp 629–642
23. Hsieh K, Khan S, Vijaykumar N, Chang K (2016) Accelerating pointer chasing in 3D-stacked memory: challenges, mechanisms, evaluation. In: Proceedings of the 2016 IEEE 34th international conference on computer design (ICCD). IEEE, Scottsdale, USA, pp 25–32
24. Hajinazar N, Patel P, Kanellopoulos K, Ausavarungnirun R, Mutlu O (2020) The virtual block interface: a flexible alternative to the conventional virtual memory framework. In: Proceedings of the ACM/IEEE 47th annual international symposium on computer architecture (ISCA '20). IEEE Press, New York, USA, pp 1050–1063
25. Hajinazar N, Oliveira G, Gregorio S, Ghose S (2021) SIMDRAM: a framework for bit-serial SIMD processing using DRAM. In: Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems (ASPLOS 2021). ACM, New York, pp 329–345
26. Liu Z, Calciu I, Herlihy M, Mutlu O (2017) Concurrent data structures for near-memory computing. In: Proceedings of the 29th ACM symposium on parallelism in algorithms and architectures (SPAA '17). ACM, New York, USA, pp 235–245
27. Larimi S, Salami B, Osman U, Mutlu O (2021) Understanding power consumption and reliability of high-bandwidth memory with voltage underscaling. In: Proceedings of the design, automation, and test in Europe conference (DATE)
28. Nai L, Hadidi R, Xiao H, Kim H (2018) CoolPIM: thermal-aware source throttling for efficient PIM instruction offloading. In: Proceedings of the 2018 IEEE international parallel and distributed processing symposium (IPDPS). IEEE, Canada, pp 680–689
A Deep Multi-scale Feature Fusion Approach for Early Recognition of Jute Diseases and Pests

Rashidul Hasan Hridoy, Tanjina Yeasmin, and Md. Mahfuzullah
Abstract Diseases and pests of jute hinder the production of quality fiber, which poses a serious threat to the jute industry and causes severe financial losses to cultivators. Early recognition of diseases and pests of the jute plant is vital for preventing their spread, which in turn supports quality improvement in the jute industry. This paper presents a robust hybrid model, namely JuteNet, a multi-scale feature fusion approach for early recognition of jute diseases and pests. First, a dataset of 56,108 images of jute leaves and stems is generated. Afterward, the features extracted from the images by the deep neural networks Xception, InceptionResNetV2, and InceptionV3 are fused to develop JuteNet, which obtained 99.47% accuracy in recognizing 2803 images of six classes in the testing set. Moreover, Xception, InceptionResNetV2, and InceptionV3 separately achieved 91.83, 96.11, and 98.86% test accuracy, respectively, which validates the recognition efficiency of JuteNet.

Keywords Computer vision · Deep neural networks · Transfer learning · Feature fusion · Jute · Leaf disease recognition
R. H. Hridoy (B) · T. Yeasmin · Md. Mahfuzullah
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_37

1 Introduction

Jute (scientific name: Corchorus capsularis), also called the golden fiber, is widely cultivated in Bangladesh, India, China, Uzbekistan, and Nepal and provides affordable and eco-friendly fiber that is in massive demand all over the world. In terms of production and global consumption, jute fiber is second only to cotton, and it is considered a commercially, industrially, and economically crucial sector of Bangladesh [1]. Jute is affected by several kinds of diseases and pests during cultivation, which significantly decrease the quality of jute fiber. Leaf mosaic virus, leaf curl, and stem rot are the most destructive diseases of jute, while semilooper and hairy caterpillar are the most commonly found pests of jute leaves [2]. Visually recognizing diseases and pests of jute by analyzing symptoms is troublesome and time-consuming, and its accuracy often falls short of requirements because of a lack of experience with diseases and pests. To ensure sustainable development of the jute industry, an automated approach is crucial for efficient and rapid recognition of jute diseases and pests at an initial stage. Several technologies have been used in agriculture for many years to obtain better yields and control diseases and pests. In recent years, convolutional neural networks (CNNs) have brought crucial changes to computer vision and significantly improved the efficiency of image recognition approaches [3]. Several machine learning (ML) algorithms have also been used in recognition approaches, where image preprocessing and feature extraction are very time-consuming tasks. Deep neural networks can learn features directly from input images, which saves effort and time, and they also achieve higher accuracy than ML algorithms [4, 5]. Several studies have used CNNs and ML algorithms in the agricultural sector to identify plant species, diseases, and pests. However, accurate classification of plant diseases and pests is more challenging than generic image recognition because of the complex backgrounds of the images. Fusing extracted features provides more accurate recognition by merging the features acquired by several CNNs into a hybrid recognition model [6].
A dataset of 56,108 images of six classes of jute leaves and stems, named the JL dataset, is generated in this study by applying twelve image augmentation approaches to 4316 images collected from several cultivation fields. The JL dataset includes 44,889 training, 8416 validation, and 2803 testing images. This research aims to provide an efficient approach for early recognition of jute diseases and pests using the fusion of multi-scale features extracted by CNNs. In the JuteNet framework, the pretrained CNNs Xception, InceptionResNetV2, and InceptionV3 are used with transfer learning to extract features from jute leaf and stem images. All features extracted by the CNNs are fed into a concatenation layer that concatenates its inputs into a single tensor. Lastly, the fused features are fine-tuned using three batch normalization (BM) layers, three dropout layers, and three fully connected (FC) layers, where the last FC layer has six neurons with the Softmax activation function for classifying the six classes of the JL dataset. Moreover, to compare the recognition efficiency of the proposed hybrid model with that of single models, Xception, InceptionResNetV2, and InceptionV3 were also trained separately. Several experiments were conducted to evaluate the recognition efficiency of JuteNet and the single models using the 2803 images of the testing set. According to the experimental results, the single state-of-the-art CNNs Xception, InceptionResNetV2, and InceptionV3 wrongly recognized 229, 109, and 32 images of the testing set of the JL dataset, respectively, whereas the proposed JuteNet model misclassified 15 images, which demonstrates that the hybrid model can classify diseases and pests of jute
more efficiently than single CNNs. JuteNet achieved 99.47% accuracy, while the pretrained CNNs Xception, InceptionResNetV2, and InceptionV3 achieved 91.83%, 96.11%, and 98.86% accuracy, respectively, on the testing set of the JL dataset. The major contributions of this research work are as follows:

• No suitable dataset of jute leaves and stems has been available until now, so an enhanced dataset of 56,108 images is developed for this study.
• An effective and robust hybrid model is introduced for early recognition of diseases and pests of jute using the fusion of multi-scale features.
• The recognition performance of the single pretrained CNNs and the proposed hybrid model is evaluated using 2803 images of the testing set of the JL dataset.

The rest of this paper is organized as follows: Sect. 2 presents the literature review. Section 3 describes the JL dataset and the proposed framework in detail. Experimental studies are described in Sect. 4. The results obtained and their interpretation are presented in Sect. 5. Finally, the conclusion and future work are given in Sect. 6.
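The fusion architecture outlined in this introduction (three pretrained backbones, a concatenation layer, then batch normalization, dropout, and fully connected layers ending in a six-way Softmax) can be sketched with the Keras applications API. This is a rough sketch under assumptions, not the authors' implementation: the hidden layer widths, the dropout rate, the ordering of the layers in the head, and the name `build_fusion_model` are all hypothetical, and `weights=None` is the default here only to avoid a download (passing `weights="imagenet"` would correspond to transfer learning).

```python
# Pooled feature sizes of the three backbones with include_top=False, pooling="avg".
FEATURE_DIMS = {"Xception": 2048, "InceptionResNetV2": 1536, "InceptionV3": 2048}


def build_fusion_model(num_classes=6, input_shape=(299, 299, 3), weights=None,
                       hidden_units=(1024, 512), dropout_rate=0.5):
    """Build a three-backbone feature-fusion classifier (illustrative sketch)."""
    # TensorFlow is imported lazily so this file can be read without it installed.
    import tensorflow as tf
    from tensorflow.keras import applications, layers

    inputs = layers.Input(shape=input_shape)
    features = []
    for make_backbone in (applications.Xception,
                          applications.InceptionResNetV2,
                          applications.InceptionV3):
        backbone = make_backbone(include_top=False, weights=weights,
                                 input_shape=input_shape, pooling="avg")
        backbone.trainable = False  # keep the feature extractor frozen
        features.append(backbone(inputs))

    x = layers.Concatenate()(features)  # fuse all features into a single tensor
    for units in hidden_units:          # two hidden stages; sizes are assumed
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(dropout_rate)(x)
        x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # third FC layer
    return tf.keras.Model(inputs, outputs, name="fusion_sketch")


# Width of the concatenated feature vector that feeds the classification head.
fused_width = sum(FEATURE_DIMS.values())
print(fused_width)
```

All three backbones accept 299 × 299 inputs and use the same input scaling, so a single shared input tensor is sufficient in this sketch. Calling `build_fusion_model()` requires TensorFlow to be installed.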
2 Related Work

A remarkable number of approaches for identifying diseases and pests of various plants using CNNs and ML algorithms have been reported in the literature. Hridoy et al. presented a computer vision method for recognizing yellow mosaic disease of black gram using a CNN that obtained 97.11% accuracy; its performance was compared with that of three pretrained CNN models, AlexNet, VGG16, and InceptionV3, which achieved 93.78%, 95.49%, and 96.67% accuracy, respectively [3]. Hasan et al. presented a CNN-based recognition method for classifying jute diseases and achieved 96.00% accuracy [4]. The classification performance of their CNN architecture was compared with that of support vector machine (SVM), k-nearest neighbors (kNN), and random forest (RF) classifiers, which obtained 89.00%, 86.00%, and 80.00% accuracy, respectively. Reza et al. presented a detection method for classifying stem-based diseases of jute using a multiclass SVM (M-SVM) classifier, which obtained 86.00% accuracy with hue-based segmentation and 60.00% accuracy without it [5]. A bilateral filter was used to remove noise from the images, and thirteen features were extracted using the color co-occurrence technique for texture analysis. After analyzing the recognition performance of nine classifiers, Habib et al. presented a method for classifying jackfruit diseases using the RF classifier, which obtained 89.59% accuracy [7]. K-means clustering was used to segment the images, and ten features extracted from the images were fed to the classifiers. Sholihati et al. presented a VGG16-based approach for classifying potato leaf diseases after analyzing the classification performance of the VGG16 and VGG19 models, which achieved 91.31% and 90.96% accuracy, respectively; VGG16 also required less training time than VGG19 [8]. For classifying diseases
and pests of paddy leaf, Senan et al. presented an automated CNN-based classification method that achieved 90.30% accuracy with a 50:50 training-to-testing image ratio and 96.60% accuracy with a 70:30 ratio [9]. In comparison, multilayer perceptron (MLP), SVM, and artificial neural network (ANN) classifiers achieved 81.12, 81.45, and 82.60% accuracy, respectively, with a 70:30 ratio. Bhowmik et al. introduced a detection approach using a three-layer CNN for classifying diseases of tea leaves and achieved 95.94% accuracy after 22 epochs of training [10]. Both the classification performance and the training time of the proposed CNN architecture increased with the number of training epochs. Hridoy et al. presented a recognition approach using a CNN built with depthwise separable convolutions to classify diseases of betel leaf at an initial stage and achieved 96.02% accuracy [11]. The performance of three activation functions, rectified linear unit (ReLU), scaled exponential linear unit (SELU), and Swish, was analyzed, and Swish outperformed the others. Hussain et al. introduced a recognition approach for classifying cucumber leaf diseases that fuses deep features and selects the best features with the whale optimization algorithm; it obtained 96.50% accuracy with a computational time of 45.28 s [12]. VGG16 and InceptionV3 were used to extract features from the images, and the best features were classified using supervised learning algorithms. Trang et al. presented a CNN-based identification approach for mango diseases that achieved 88.46% accuracy, whereas three pretrained models, InceptionV3, AlexNet, and MobileNetV2, obtained 78.48, 76.92, and 84.62% accuracy, respectively [13]. Rescaling and center alignment were used to enhance image quality, and the golden section search technique was used to enhance the contrast. Fenu et al. introduced a multioutput learning approach for diagnosing plant diseases and stress severity, using pretrained CNN models such as VGG16, VGG19, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB0 [14]. InceptionV3 performed best in diagnosing biotic stress, with 90.68% accuracy, whereas EfficientNetB0 performed best in diagnosing severity, with 78.31% accuracy. The training time of the CNNs was also analyzed, and EfficientNetB0 required the least training time. Zaki et al. presented an automated approach for classifying tomato leaf diseases using MobileNetV2 and achieved 95.94% accuracy with a batch size of 16 [15]. The performance of five optimization methods and three learning rates was analyzed; the Adagrad optimizer and a learning rate of 0.0001 performed best. Besides disease and pest recognition of plants, CNNs and ML algorithms have also been used in several other research works [16–18]. According to the above-mentioned works, CNNs obtain remarkable results in recognizing plant diseases and pests, and ML algorithms obtain lower accuracy than CNNs. However, hybrid models are rarely used for identifying plant diseases and pests. Hence, an efficient framework based on the fusion of multi-scale features is presented in this research to recognize the initial infection of jute diseases and pests.
3 Materials and Methods

3.1 JL Dataset

A significant number of days was devoted to acquiring 4316 images of jute leaves and stems, and these images were acquired with complex backgrounds after five days of initial infection. Using position augmentation techniques (cropping, rotation, and flipping) and color augmentation techniques (brightness, contrast, saturation, and hue), the JL dataset of 56,108 images was generated from the 4316 collected images. The JL dataset summary is given in Table 1, and Fig. 1 shows samples of the six classes.

Table 1 The JL dataset summary

| Class | Training images | Validation images | Testing images | Total images |
|---|---|---|---|---|
| Healthy leaf | 14,508 | 2721 | 906 | 18,135 |
| Leaf mosaic virus | 7094 | 1329 | 443 | 8866 |
| Leaf curl | 5096 | 956 | 318 | 6370 |
| Stem rot | 4430 | 832 | 276 | 5538 |
| Semilooper | 7791 | 1459 | 487 | 9737 |
| Hairy caterpillar | 5970 | 1119 | 373 | 7462 |
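The counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the authors' pipeline; the dictionary layout and function name are illustrative. It confirms that each row sums to its total, that the splits are roughly 80/15/5, and that the dataset is exactly 13 times the 4316 collected images. Reading that factor as "one original plus twelve augmented variants" (the twelve approaches of Fig. 2) is an assumption, not something the text states.

```python
# Per-class counts copied from Table 1: (training, validation, testing, total).
JL_DATASET = {
    "Healthy leaf": (14508, 2721, 906, 18135),
    "Leaf mosaic virus": (7094, 1329, 443, 8866),
    "Leaf curl": (5096, 956, 318, 6370),
    "Stem rot": (4430, 832, 276, 5538),
    "Semilooper": (7791, 1459, 487, 9737),
    "Hairy caterpillar": (5970, 1119, 373, 7462),
}
COLLECTED_IMAGES = 4316  # images captured before augmentation


def check_dataset(rows):
    """Verify each row's total and return the four column sums."""
    for name, (train, val, test, total) in rows.items():
        assert train + val + test == total, name
    return tuple(sum(column) for column in zip(*rows.values()))


train_total, val_total, test_total, grand_total = check_dataset(JL_DATASET)
print(train_total, val_total, test_total, grand_total)  # 44889 8416 2803 56108

# Fraction of the dataset in each split.
shares = [round(n / grand_total, 2) for n in (train_total, val_total, test_total)]
print(shares)

# Ratio of generated images to collected images.
print(grand_total / COLLECTED_IMAGES)
```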
Fig. 1 Instances of JL dataset: (1) healthy leaf, (2) leaf mosaic virus, (3) leaf curl, (4) stem rot, (5) semilooper, and (6) hairy caterpillar
Fig. 2 Samples of image enhancement approaches: (1) high brightness, (2) low brightness, (3) high contrast, (4) low contrast, (5) high saturation, (6) low saturation, (7) cropping, (8) rotation 90°, (9) rotation 180°, (10) rotation 270°, (11) vertical flip, and (12) horizontal flip
All images of the JL dataset were randomly split into three sets: training, validation, and testing. Twelve image enhancement approaches were used in this study, which are presented in Fig. 2. Image enhancement approaches significantly reduce overfitting and help CNNs improve their anti-interference ability during the training phase [3].
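A minimal sketch of some of the position and brightness transformations named in Fig. 2 is given below, using plain nested lists as a grayscale image so that it runs without any imaging library. The brightness factors (1.5 and 0.5), the crop size, and all function names are hypothetical; the paper does not state which tools or parameter values were used, and contrast, saturation, and hue adjustments are left out.

```python
def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(row[::-1]) for row in img]


def adjust_brightness(img, factor):
    """Scale every pixel by `factor`, clipping to the 0..255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def center_crop(img, height, width):
    """Cut a height x width window from the middle of the image."""
    top = (len(img) - height) // 2
    left = (len(img[0]) - width) // 2
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img):
    """Return several augmented variants of one image (factors are illustrative)."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return {
        "high_brightness": adjust_brightness(img, 1.5),
        "low_brightness": adjust_brightness(img, 0.5),
        "cropping": center_crop(img, len(img) - 2, len(img[0]) - 2),
        "rotation_90": r90,
        "rotation_180": r180,
        "rotation_270": r270,
        "vertical_flip": flip_vertical(img),
        "horizontal_flip": flip_horizontal(img),
    }


sample = [[10 * r + c for c in range(4)] for r in range(4)]  # toy 4 x 4 image
variants = augment(sample)
print(sorted(variants))
```

Each variant keeps the class label of the source image, so a routine of this kind multiplies the size of every class by the same factor.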
3.2 JuteNet

State-of-the-art CNNs are now widely used in computer vision for recognizing images with higher accuracy than classical approaches. Pretrained CNN models, namely Xception, InceptionResNetV2, and InceptionV3, were used in this study to construct the JuteNet framework through transfer learning. JuteNet applies multi-scale feature fusion, in which Xception, InceptionResNetV2, and InceptionV3 extract features from images of the JL dataset. Xception is a 71-layer deep pretrained CNN model built with depthwise separable convolution layers, and it requires input images of 299 × 299 pixels [19]. The Xception network consists of three flows: data enters the network via the entry flow, passes through the middle flow, which is repeated eight times, and leaves via the exit flow. InceptionResNetV2 was developed with hybrid Inception-ResNet modules and contains 164 layers; the input image size of this architecture is 299 × 299 pixels [20]. A reduction module was used in all hybrid Inception-ResNet modules to reduce the representation dimension. InceptionV3 is a 48-layer deep pretrained CNN model that also requires input images of 299 × 299 pixels, and it is built with symmetric and asymmetric building blocks such as convolutions, max pooling, average pooling, dropouts, concatenation (CL), and fully connected (FC) layers [21]. Batch normalization (BM) layers are widely applied in this architecture, and factorization is used to reduce the number of parameters and connections without decreasing the recognition ability of the model.

In this study, the pretrained CNN models separately extracted features from the images of jute leaf and stem, and a 2D global average pooling (GAP) layer reduced the output of each CNN to a vector by taking the mean of each output channel. Afterward, a concatenate layer (CL) combined the individual vectors of the CNNs into a single vector. The framework of JuteNet is graphically represented in Fig. 3. After the CL, three blocks were used in the JuteNet framework to fine-tune the output of the CL for recognizing the six classes of the JL dataset, and each block contains one dropout, one BM, and one FC layer. The three dropout layers were used to counter overfitting during the training phase, and the BM layers were used to make the JuteNet framework faster and more stable. The FC layer of the last block has six neurons and uses the Softmax activation function.
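The fusion step described above (global average pooling per backbone, concatenation, then a classification head) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the three "backbone outputs" are tiny hand-written feature maps with hypothetical channel counts, the weights are arbitrary, and the dropout and batch normalization layers of the three blocks are omitted for brevity.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each channel's H x W map to its mean, giving one value per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def concatenate(*vectors):
    """Join the pooled vectors of all backbones into a single feature vector."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over the class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical backbone outputs: lists of channels, each channel a 2 x 2 map.
backbone_a = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]   # 2 channels
backbone_b = [[[2.0, 2.0], [2.0, 2.0]]] * 3                          # 3 channels
backbone_c = [[[0.0, 4.0], [4.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]   # 2 channels

fused = concatenate(*(global_average_pool(b)
                      for b in (backbone_a, backbone_b, backbone_c)))

# Arbitrary illustrative weights for a six-neuron output layer.
weights = [[0.1 * (k + 1)] * len(fused) for k in range(6)]
biases = [0.0] * 6
probabilities = softmax(dense(fused, weights, biases))
print(fused)
print(probabilities)
```

Because each backbone contributes a fixed-length vector after pooling, the fused vector length is simply the sum of the three channel counts, independent of the spatial size of the feature maps.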
Fig. 3 The schematic representation of the JuteNet framework
4 Experiments

The JuteNet framework was trained via the transfer learning strategy with GPU support from Google Colab. Transfer learning helps networks reduce training time and also improves the baseline performance of the framework. The 44,889 images of the training set and the 8416 images of the validation set were used to train and fit the JuteNet framework, and the 2803 randomly chosen images of the testing set were used to examine its efficiency. During training, Adam was used as the optimization method with a learning rate of 0.0001, categorical cross-entropy was used as the loss function, the batch size was set to 32, and the number of epochs was set to 50. No major fluctuations were seen in the training and validation accuracy and loss curves. Besides the JuteNet framework, the single CNNs, namely Xception, InceptionResNetV2, and InceptionV3, were trained separately on the JL dataset to compare the efficiency of the individual CNNs with the multi-scale feature fusion approach. To examine the efficiency of the JuteNet framework and the single models, four statistical parameters were used: sensitivity (Sen, also known as recall), specificity (Spe), accuracy (Acc), and precision (Pre). They were computed from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts and are given in Eqs. 1–4, where ju denotes a class of the testing set of the JL dataset.
Here, for a class ju, TP is the number of testing images of that class that are recognized correctly, TN is the number of images of the other classes that are not assigned to ju, FP is the number of images of the other classes that are wrongly assigned to ju, and FN is the number of images of ju that are wrongly assigned to another class [3]. For a class ju,

$$\mathrm{Pre}(ju) = \frac{TP(ju)}{TP(ju) + FP(ju)} \tag{1}$$

$$\mathrm{Sen}(ju) = \frac{TP(ju)}{TP(ju) + FN(ju)} \tag{2}$$

$$\mathrm{Spe}(ju) = \frac{TN(ju)}{TN(ju) + FP(ju)} \tag{3}$$

$$\mathrm{Acc}(ju) = \frac{TP(ju) + TN(ju)}{TP(ju) + TN(ju) + FP(ju) + FN(ju)} \tag{4}$$
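As a worked illustration of Eqs. 1–4, the sketch below derives the one-vs-rest counts for a class from a confusion matrix and evaluates the four parameters. The 3 × 3 matrix is a made-up toy example, not data from this study, and the function names are illustrative; rows are assumed to hold the true classes and columns the predicted classes.

```python
def counts_from_confusion(matrix, k):
    """One-vs-rest TP, TN, FP, FN for class k (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # class k images sent elsewhere
    fp = sum(row[k] for row in matrix) - tp   # other images assigned to k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def class_metrics(tp, tn, fp, fn):
    """Precision, sensitivity, specificity, and accuracy as in Eqs. 1-4."""
    return {
        "Pre": tp / (tp + fp),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical confusion matrix for three classes.
CONFUSION = [
    [50, 2, 1],
    [3, 40, 2],
    [0, 5, 45],
]

for k in range(len(CONFUSION)):
    metrics = class_metrics(*counts_from_confusion(CONFUSION, k))
    print(k, {name: round(100 * value, 2) for name, value in metrics.items()})
```

Note that the per-class accuracy of Eq. 4 counts true negatives, so it is usually higher than the overall fraction of correctly recognized images.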
5 Result and Discussion

This study was conducted to develop an efficient recognition approach for the initial identification of diseases and pests of jute, in which the fusion of multi-scale features was used to build a robust hybrid model, namely JuteNet, that acquired 99.47% test accuracy. In comparison, the single CNNs Xception, InceptionResNetV2, and InceptionV3 individually acquired 91.83%, 96.11%, and 98.86% accuracy, respectively, in recognizing the 2803 images of the test set, which contains six classes. The four normalized confusion matrices obtained after multiclass classification with JuteNet and the three other CNNs are given in Fig. 4, where the normalized confusion matrix of the JuteNet framework shows its strong recognition efficiency.
Fig. 4 Normalized confusion matrix: (1) Xception, (2) InceptionResNetV2, (3) InceptionV3, (4) JuteNet, and (C1) healthy leaf, (C2) leaf mosaic virus, (C3) leaf curl, (C4) stem rot, (C5) semilooper, (C6) hairy caterpillar
To illustrate the classification efficiency of the single CNNs and JuteNet more clearly, class-wise recognition performance was evaluated using the statistical parameters. Xception acquired its highest sensitivity and precision in the healthy leaf class, 94.84% and 97.46%, respectively. In the leaf curl class, it obtained higher specificity and accuracy than in the other classes, 99.31% and 97.72%, respectively. The class-wise recognition efficiency of Xception is presented in Table 2. InceptionResNetV2 acquired higher sensitivity in the healthy leaf class than in the other classes, 98.12%, and higher specificity in the stem rot class than in the other classes, 99.48%. Moreover, it obtained higher accuracy in the hairy caterpillar class than in the other classes, 98.93%, and its precision in the healthy leaf class, 98.12%, is higher than in the others. The class-wise recognition performance of InceptionResNetV2 and InceptionV3 is given in Tables 3 and 4, respectively.

Table 2 Class-wise recognition performance of Xception

| Class | Sen (%) | Spe (%) | Acc (%) | Pre (%) |
|---|---|---|---|---|
| Healthy leaf | 94.84 | 98.77 | 97.47 | 97.46 |
| Leaf mosaic virus | 90.00 | 98.76 | 97.32 | 93.45 |
| Leaf curl | 86.49 | 99.31 | 97.72 | 94.65 |
| Stem rot | 89.92 | 98.27 | 97.50 | 84.06 |
| Semilooper | 93.17 | 97.28 | 96.61 | 86.86 |
| Hairy caterpillar | 91.19 | 97.88 | 97.04 | 86.06 |
Table 3 Class-wise recognition performance of InceptionResNetV2

| Class | Sen (%) | Spe (%) | Acc (%) | Pre (%) |
|---|---|---|---|---|
| Healthy leaf | 98.12 | 99.10 | 98.79 | 98.12 |
| Leaf mosaic virus | 95.91 | 99.11 | 98.61 | 95.26 |
| Leaf curl | 95.42 | 98.96 | 98.57 | 91.82 |
| Stem rot | 92.61 | 99.48 | 98.79 | 91.82 |
| Semilooper | 95.33 | 99.22 | 98.54 | 96.30 |
| Hairy caterpillar | 95.73 | 99.42 | 98.93 | 96.25 |
Table 4 Class-wise recognition performance of InceptionV3

| Class | Sen (%) | Spe (%) | Acc (%) | Pre (%) |
|---|---|---|---|---|
| Healthy leaf | 99.78 | 99.84 | 99.82 | 99.67 |
| Leaf mosaic virus | 98.65 | 99.79 | 99.61 | 98.87 |
| Leaf curl | 98.42 | 99.76 | 99.61 | 98.11 |
| Stem rot | 97.82 | 99.72 | 99.54 | 97.46 |
| Semilooper | 98.17 | 99.78 | 99.50 | 98.97 |
| Hairy caterpillar | 98.92 | 99.75 | 99.64 | 98.39 |
According to the class-wise recognition performance, InceptionV3 and JuteNet performed better in the healthy leaf class than in the other classes. In the healthy leaf class, InceptionV3 acquired 99.78% sensitivity, 99.84% specificity, 99.82% accuracy, and 99.67% precision. The class-wise recognition performance of JuteNet was better than that of the three single CNNs, and it obtained 100.00% sensitivity in the healthy leaf class, along with 99.95% specificity, 99.96% accuracy, and 99.89% precision. However, the class-wise recognition performance of InceptionV3 and JuteNet was very close across the six classes. The recognition performance of JuteNet for each class validates the significant recognition ability of this hybrid model. Among the single CNNs, InceptionV3 performed better than Xception and InceptionResNetV2 in the class-wise evaluation. The class-wise recognition performance of JuteNet is given in Table 5. Moreover, Xception misclassified 64 images of the semilooper class, which was the highest misclassification number in this study. InceptionResNetV2 wrongly predicted 26 images of the leaf curl class, which was its highest misclassification number among the six classes. InceptionV3 wrongly classified 3 images of the healthy leaf class, which was its lowest misclassification number among the six classes. The proposed JuteNet wrongly predicted 1 image of the healthy leaf class, which was the lowest false prediction number. In terms of the misclassification number of each class, JuteNet performed remarkably better than the single CNNs. The class-wise misclassification of the four models is given in Table 6.

Table 5 Class-wise recognition performance of JuteNet

| Class | Sen (%) | Spe (%) | Acc (%) | Pre (%) |
|---|---|---|---|---|
| Healthy leaf | 100.00 | 99.95 | 99.96 | 99.89 |
| Leaf mosaic virus | 99.10 | 99.87 | 99.75 | 99.32 |
| Leaf curl | 99.06 | 99.92 | 99.82 | 99.37 |
| Stem rot | 99.27 | 99.84 | 99.79 | 98.55 |
| Semilooper | 99.18 | 99.91 | 99.79 | 99.59 |
| Hairy caterpillar | 99.46 | 99.88 | 99.82 | 99.20 |

Table 6 Number of class-wise wrongly classified images of models

| Class | Xception | InceptionResNetV2 | InceptionV3 | JuteNet |
|---|---|---|---|---|
| Healthy leaf | 23 | 17 | 3 | 1 |
| Leaf mosaic virus | 29 | 21 | 5 | 3 |
| Leaf curl | 17 | 26 | 6 | 2 |
| Stem rot | 44 | 13 | 7 | 4 |
| Semilooper | 64 | 18 | 5 | 2 |
| Hairy caterpillar | 52 | 14 | 6 | 3 |

The recognition time of JuteNet and the single CNNs was also evaluated in this study using eighteen images of the six classes. JuteNet recognized all images accurately within 11.03 s. Xception recognized sixteen images correctly within 14.68 s, whereas InceptionResNetV2 and InceptionV3 misclassified one image and took 12.94 and 13.01 s, respectively. Several evaluation experiments were conducted using the 2803 testing images, in which JuteNet performed significantly better than the single CNNs, which supports the recognition efficiency of the fusion of multi-scale features (FMF) approach. The accuracy and loss curves for both training and validation of JuteNet with the training and validation sets of the JL dataset are presented in Figs. 5 and 6. The recognition efficiency of the proposed JuteNet framework was evaluated by comparing it with approaches previously used to identify different crop diseases and pests with several computer vision techniques, and the comparison is given in Table 7.

Fig. 5 The curve of training and validation accuracy of JuteNet
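The overall test accuracies can be related to the class-wise misclassification counts of Table 6 with a short calculation. The sketch below assumes that every misclassified test image is counted exactly once in Table 6, which the text does not state explicitly; the names are illustrative.

```python
TEST_IMAGES = 2803  # size of the testing set

# Misclassified images per class, copied from Table 6 (same class order).
WRONG = {
    "Xception": [23, 29, 17, 44, 64, 52],
    "InceptionResNetV2": [17, 21, 26, 13, 18, 14],
    "InceptionV3": [3, 5, 6, 7, 5, 6],
    "JuteNet": [1, 3, 2, 4, 2, 3],
}


def overall_accuracy(wrong_per_class, total):
    """Percentage of test images that were recognized correctly."""
    return 100 * (total - sum(wrong_per_class)) / total


accuracy = {model: round(overall_accuracy(counts, TEST_IMAGES), 2)
            for model, counts in WRONG.items()}
print(accuracy)
```

Under that assumption, the three single CNNs reproduce the reported 91.83%, 96.11%, and 98.86%. For JuteNet, 15 errors out of 2803 images give 99.46% when rounded to two decimals, marginally below the reported 99.47%.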
Fig. 6 The curve of training and validation loss of JuteNet

Table 7 Comparison of several crop disease and pest recognition approaches

| Study | Method | Number of images | Number of classes | Accuracy (%) |
|---|---|---|---|---|
| Hridoy et al. [3] | CNN | 16,980 | 3 | 97.11 |
| Hasan et al. [4] | CNN | 600 | 3 | 96.00 |
| Reza et al. [5] | M-SVM | Not mentioned | 5 | 86.00 |
| Habib et al. [7] | RF | 480 | 5 | 89.59 |
| Sholihati et al. [8] | VGG16 | 5100 | 5 | 91.31 |
| Senan et al. [9] | CNN | 3355 | 4 | 96.60 |
| Bhowmik et al. [10] | CNN | 2341 | 3 | 95.94 |
| Hridoy et al. [11] | CNN | 10,662 | 4 | 96.02 |
| Hussain et al. [12] | FMF | 2500 | 5 | 96.50 |
| Trang et al. [13] | CNN | 394 | 4 | 88.46 |
| Fenu et al. [14] | InceptionV3 | 3057 | 4 | 90.68 |
| Zaki et al. [15] | MobileNetV2 | 4671 | 4 | 95.94 |
| Proposed | FMF | 56,108 | 6 | 99.47 |

6 Conclusion

Diseases and pests of jute cause enormous financial losses to cultivators and the jute industry, and are also a major threat to the sustainable development of this industry. A robust hybrid recognition model, namely JuteNet, is addressed in this paper to recognize jute diseases and pests during the initial stage. In the JuteNet framework, the fusion of multi-scale features was used, where Xception, InceptionResNetV2, and InceptionV3 were utilized for extracting features from images of jute leaf and stem. A dataset of 56,108 images containing six classes was generated in this study, and several evaluation experiments were conducted using the 2803 images of the test set. The proposed JuteNet framework obtained 99.47% accuracy, whereas Xception, InceptionResNetV2, and InceptionV3, trained individually with the JL dataset, acquired 91.83%, 96.11%, and 98.86% accuracy, respectively. These four models were trained via the transfer learning strategy for the purpose of this study. In addition, the class-wise recognition performance of JuteNet and the single CNNs was evaluated using statistical parameters, and JuteNet performed better than the single CNNs. Moreover, JuteNet misclassified 1 image of the healthy leaf class, whereas Xception, InceptionResNetV2, and InceptionV3 wrongly classified 23, 17, and 3 images, respectively. Among the single CNNs, the class-wise recognition performance of InceptionV3 was the best. The recognition time of JuteNet was also lower than that of the single CNNs, and it obtained 100.00% accuracy in recognizing eighteen images within 11.03 s. The outcome of the evaluation studies demonstrates the significant recognition efficiency of the fusion of multi-scale features. However, JuteNet can only recognize common diseases and pests of jute. In future work, we plan to enhance the JL dataset by including images of other crops and to extend the number of images and classes to develop a more robust recognition approach for several crop diseases.
References

1. Islam MM (2019) Advanced production technology and processing of jute. In: Hasanuzzaman M (eds) Agronomic crops. Springer, Singapore
2. Selvaraj K, Gotyal BS, Gawande SP, Satpathy S, Sarkar SK (2016) Arthropod biodiversity on jute and allied fibre crops. In: Chakravarthy A, Sridhara S (eds) Economic and ecological significance of arthropods in diversified ecosystems. Springer, Singapore
3. Hridoy RH, Rakshit A (2022) BGCNN: a computer vision approach to recognize of yellow mosaic disease for black gram. In: Smys S, Bestak R, Palanisamy R, Kotuliak I (eds) Computer networks and inventive communication technologies. Lecture notes on data engineering and communications technologies, vol 75. Springer, Singapore
4. Hasan MZ, Ahamed MS, Rakshit A, Hasan KMZ (2019) Recognition of jute diseases by leaf image classification using convolutional neural network. In: 2019 10th International conference on computing, communication and networking technologies (ICCCNT), pp 1–5
5. Reza ZN, Nuzhat F, Mahsa NA, Ali MH (2016) Detecting jute plant disease using image processing and machine learning. In: 3rd International conference on electrical engineering and information communication technology (ICEEICT), pp 1–6
6. Asnaoui KE (2021) Design ensemble deep learning model for pneumonia disease classification. Int J Multimed Info Retr 10:55–68
7. Habib MT, Mia J, Uddin MS, Ahmed F (2020) An in-depth exploration of automated jackfruit disease recognition. J King Saud Univ – Comput Inf Sci
8. Sholihati RA, Sulistijono IA, Risnumawan A, Kusumawati E (2020) Potato leaf disease classification using deep learning approach. In: 2020 International electronics symposium (IES), pp 392–397
9. Senan N, Aamir M, Ibrahim R, Taujuddin NSAM, Muda WHNW (2020) An efficient convolutional neural network for paddy leaf disease and pest classification. Int J Adv Comput Sci Appl (IJACSA) 11(7)
10. Bhowmik S, Talukdar AK, Sarma KK (2020) Detection of disease in tea leaves using convolution neural network. In: 2020 Advanced communication technologies and signal processing (ACTS), pp 1–6
11. Hridoy RH, Habib T, Jabiullah I, Rahman R, Ahmed F (2021) Early recognition of betel leaf disease using deep learning with depth-wise separable convolutions. In: 2021 IEEE region 10 symposium (TENSYMP), pp 1–7
12. Hussain N, Khan MA, Tariq U, Kadry S, Yar MAE, Mostafa AM, Alnuaim AA, Ahmad S (2022) Multiclass cucumber leaf diseases recognition using best feature selection. CMC-Comput Mater Continua 70(2):3281–3294
13. Trang K, TonThat L, Thao NGM, Thi NTT (2019) Mango diseases identification by a deep residual network with contrast enhancement and transfer learning. In: IEEE Conference on sustainable utilization and development in engineering and technologies (CSUDET), pp 138–142
14. Fenu G, Malloci FM (2021) Using multioutput learning to diagnose plant disease and stress severity. In: Complexity, vol 2021
15. Zaki SZM, Zulkifley MA, Stofa MM, Kamari NAM, Mohamed NA (2020) Classification of tomato leaf diseases using MobileNet V2. IAES Int J Artif Intell (IJ-AI) 2(2):290–296
16. Vijayakumar T (2019) Comparative study of capsule neural network in various applications. J Artif Intell 1(01):19–27
17. Ranganathan G (2021) A study to find facts behind preprocessing on deep learning algorithms. J Innov Image Process (JIIP) 3(01):66–74
18. Jacob IJ (2019) Capsule network based biometric recognition system. J Artif Intell 1(02):83–94
19. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1800–1807
20. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Proceedings of the thirty-first AAAI conference on artificial intelligence. AAAI Press, pp 4278–4284
21. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
Assessment of Cardiovascular System Through Cardiovascular Autonomic Reflex Test

E. S. Selva Priya, L. Suganthi, R. Anandha Praba, and R. Jeyashree
Abstract The head-up tilt (HUT) test is one of the cardiovascular autonomic reflex tests (CART) used to diagnose autonomic dysfunction. Healthy participants had their heart rate variability (HRV) and arterial blood pressure (ABP) measured during a head-up tilt test to better understand the physiological events elicited during the four phases of rapid tilt activity: supine, rapid tilt up (RTUP), at rapid tilt (ATRT), and rapid tilt down (RTDOWN). We analysed the ECG, ABP, and angle data during rapid tilt operation of 10 normal healthy subjects aged 27–32 years using Physionet data. Each recording lasts for one hour. The signals were sampled at a frequency of 250 Hz. We calculated HRV and ABP morphological parameters during HUT. Certain parameters, such as the standard deviation of normal-to-normal (NN) intervals (SDNN), root mean square of successive differences of NN intervals (RMSSD), number of successive NN interval differences greater than 50 ms (NN50), systolic pressure (SP), pulse pressure (PP), baroreflex sensitivity index (BRS), –HF, and –LF, decrease during ATRT. According to this study, during a head-up tilt test, blood pressure drops in the upper body but rises in the lower limbs due to gravitational pooling of blood in the lower limbs. As a result, baroreflex control occurs, which constricts blood vessels and raises HR in order to restore BP.

Keywords Baroreflex sensitivity (BRS) · Cardiovascular autonomic reflex test (CART) · Head-up tilt test (HUT) · Heart rate variability (HRV) · Arterial blood pressure (ABP) · Rapid tilt down (RTDOWN) · Rapid tilt up (RTUP) · Heart rate (HR)
E. S. Selva Priya (B) · L. Suganthi · R. Anandha Praba
BioMedical Engineering, SSN College of Engineering, OMR Road, Kalavakkam, Chennai, Tamilnadu 603110, India

R. Jeyashree
Electronics and Communication Engineering, Meenakshi College of Engineering, Chennai-78, Tamilnadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_38
1 Introduction

The autonomic nervous system (ANS) plays a vital role in regulating body functions such as metabolism, heart rate, and blood pressure. Denervation of the system leads to several disorders [1, 2]. The ANS is the part of the peripheral nervous system that branches into the sympathetic nervous system (SNS) and the parasympathetic nervous system (PSNS); working antagonistically, they allow the cardiovascular system to adapt during various activities [3]. Both sympathetic and parasympathetic activity alter the electrical features of the heart. The SNS raises heart rate, myocardial contractility, and peripheral resistance, while the PSNS decreases heart rate with minimal effect on contractility [1, 3]. In humans, autonomic function is modulated via the baroreflex control system, which regulates arterial blood pressure. It is a negative feedback mechanism that incorporates baroreceptors located in the aortic arch and carotid sinus. The baroreceptors in the carotid sinus detect a reduction in blood pressure, which causes sympathetic activation and parasympathetic withdrawal. As a result, heart rate increases. The receptors in the aortic arch experience a rise in pressure, which induces a drop in heart rate in most people [4–7]. Disorders of autonomic function contribute to worsening the pathophysiological symptoms of cardiovascular diseases [1]. Orthostatic hypotension is a symptom of autonomic dysfunction [8, 9]. It is defined as a reduction in systolic blood pressure of more than 20 mmHg or a reduction in diastolic blood pressure of more than 10 mmHg during a postural change from supine to standing [8, 10]. A disordered sympathetic response to postural change leads to orthostatic hypotension, which is one of the risk factors for developing cardiovascular autonomic neuropathy (CAN) [9]. The cardiovascular autonomic reflex test is the standard protocol for assessing autonomic function.
The head-up tilt (HUT) test of the CART protocol appears to be a more appropriate test for diagnosing orthostatic hypotension [11]. HUT is often used to evaluate a patient's ability to regulate blood pressure, in particular for patients suffering from vasovagal syncope, light-headedness or dizziness, diabetes mellitus, and autism [4, 12, 13]. During this procedure, non-invasive signals such as continuous beat-to-beat measurements of ABP and ECG are recorded. Initially, the patient is placed on a tilt table in the supine position. After a specific duration, the table is tilted to a certain angle. Gravity causes blood to pool in the lower extremities when tilted, lowering cardiac output [5]. The change in cardiac output leads to a drop in blood pressure in the upper body, while blood pressure in the lower body increases. Baroreflex regulation then restores blood pressure by constricting blood vessels and increasing heart rate [14]. Heart rate variability and blood pressure variability analysis are techniques for assessing autonomic function. Heart rate variability (HRV) analysis is a non-invasive assessment of cardiovascular control mechanisms. HRV is measured as the variation in the beat-to-beat interval (R-R interval) [15–17]. The power in each frequency band can be calculated by estimating the power spectral density [18]. The arterial blood pressure (ABP) signal contains clinical information about the cardiovascular system
including systolic pressure, diastolic pressure, and the dicrotic notch. This clinical information is widely used to assess properties of the arterial vessel wall. The contraction and relaxation of the heart correspond to the blood pressure pulse, and the duration of the blood pressure waveform is split into a systolic phase and a diastolic phase. The systolic phase denotes contraction and the diastolic phase denotes relaxation. The fiducial points of the arterial blood pressure waveform are detected with the delineator proposed in [19]. The second derivative of arterial blood pressure (SDABP) provides an exact representation of the morphological features of the original waveform. The indices of SDABP provide additional characteristic parameters for further analysis [19, 20]. The baroreflex sensitivity index obtained from systolic pressure and the R-R interval provides deep insight into cardiovascular regulation [21, 22]. Clinical users expect robust, sophisticated real-time output with more clarity [23]. This paper examines the use of the wavelet transform to pre-process ECG signals, as well as a systematic study of HRV test parameters in the time and frequency domains. This study also examines the variability in morphological BP parameters and baroreflex sensitivity by measuring the baroreflex sensitivity index during different stages of the head-up tilt test.
2 Materials and Methods

2.1 Database Protocol

Sympathetic and parasympathetic modulation can be determined by analysing the ECG and ABP during HUT [11, 24, 25]. The ECG and ABP data of normal subjects were taken from Physionet, which hosts a wide database of recorded physiological signals. The physiological response to a change in posture database [5] includes the ECG, ABP, and corresponding tilt angle of ten healthy subjects (five male and five female) aged 27 to 32 years. Each recording lasts for one hour. The signals were sampled at a frequency of 250 Hz. The protocol used in the database is described below.

1. ECG and ABP waveforms were captured continuously during the experiment.
2. After instrumentation, the subjects on the tilt table went through a series of six posture shifts.
3. There were two stand-ups and two rapid HUTs (75° HUT in 2 s).
4. Two slow HUTs (75° HUT over 50 s) were performed for three minutes.

As shown in Fig. 1, analysis is performed for supine and rapid tilt operation over a period of 6 min.
The head-up tilt (HUT) test is shown in Fig. 2 which seems to be a more appropriate test in diagnosing orthostatic hypotension. During this procedure, non-invasive
Fig. 1 ECG, ABP and angle during tilt activity
Fig. 2 Head-up tilt (HUT) set up
signals are measured including continuous beat-to-beat recordings of arterial blood pressure and heart rate. Initially, the patient was placed on a tilt table in the supine position. After specific durations, the table is tilted to a certain angle. During this activity, pressure and heart rate are acquired. Most people’s aortic arch receptors detect an increase in pressure, which produces a reduction in heart rate or vice versa.
2.2 Pre-Processing

The pre-processing phase is required to remove unwanted noise such as baseline wandering and motion artefact. These noises are caused by electrode movement, patient movement, and excessive electrode interaction during the recording of biological signals. In the ECG signal, the Fourier transform is used to determine the frequency spectrum of baseline wandering and motion artefact. For a signal x(n) of N samples, the transform computed by the FFT is shown in Eq. (1).

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \tag{1}$$
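To make the role of the transform concrete, the sketch below evaluates the discrete Fourier transform directly from its definition and reports the strongest non-DC frequency of a signal. It is a plain O(N²) evaluation for illustration, whereas an FFT computes the same values faster. The test signal, a slow 0.5 Hz drift plus a smaller 5 Hz component sampled at a hypothetical 50 Hz, is synthetic and stands in for baseline wander; none of the names come from the paper.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform evaluated directly from its definition."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest spectral peak, ignoring the DC bin."""
    spectrum = dft(x)
    half = len(x) // 2  # bins above this mirror the lower half for real input
    k_peak = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return k_peak * fs / len(x)


FS = 50        # hypothetical sampling rate in Hz, kept small for speed
N_SAMPLES = 200
signal = [
    1.0 * math.sin(2 * math.pi * 0.5 * n / FS)    # slow drift (baseline wander)
    + 0.2 * math.sin(2 * math.pi * 5.0 * n / FS)  # faster, weaker component
    for n in range(N_SAMPLES)
]
print(dominant_frequency(signal, FS))
```

With a frequency resolution of fs/N, a longer recording is needed to separate components that lie close together at the low end of the spectrum.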
Wavelet-based denoising is used to eliminate artefacts in ECG signals in the frequency range of 0.5 Hz [26], and a third-order low-pass Bessel filter is used in the case of ABP. The discrete wavelet transform is implemented with filter banks to minimise computation time.
2.3 HRV Analysis

According to the protocol shown in Fig. 3, HRV is measured in ten subjects during the four phases of rapid tilt operation. The signal durations analysed for each phase are supine (120 s), RTUP (60 s), ATRT (120 s), and RTDOWN (60 s). Using the Kubios HRV tool, HRV analysis was performed for the four phases of rapid tilt operation.

Fig. 3 HRV analysis protocol

Figure 4a shows the raw ECG signal. The ECG signal has a frequency range of 0.05–150 Hz and an amplitude of 0.1–5 mV. The amplitude of the QRS complex in a typical subject is about 1 mV, and its duration is 0.06–0.12 s. For normal people, the R-R interval lasts 0.6–1.2 s. Peak detection is done using thresholding after pre-processing. The steps involved in R peak detection are as follows.

• Consider a signal sample x(n).
• To detect a peak, x(n) must satisfy the following conditions: x(n) > x(n − 1), x(n) > x(n + 1), and x(n) > TH, where TH = 0.5 × max(x).
• Save the R peak's time position.

The R peak detection is shown in Fig. 4b. After the R peaks are found, the R-R interval is calculated as the difference in time between the R peaks of two successive QRS complexes. The time domain and frequency domain indices of HRV are calculated from this interval in the Kubios HRV tool. Matrix Laboratory (MATLAB) is used to calculate the R-R interval in the four phases of rapid tilt operation.

Fig. 4 a Raw ECG signal, b detection of R peak in ECG signal

For time domain analysis, the following features were extracted from the R-R interval: (1) SDNN, (2) RMSSD, (3) NN50, (4) pNN50, the proportion obtained by dividing NN50 by the total number of NN intervals, (5) mean R-R interval, and (6) mean HR. In spectral analysis, HR is expressed as a function of frequency rather than time. Spectral analysis of R-R intervals reveals three distinct bands: very low frequency (VLF, less than 0.04 Hz), low frequency (LF, 0.04–0.15 Hz), and high frequency (HF, 0.15–0.4 Hz). For frequency domain analysis, the following features were extracted from the R-R interval: Sympathovagal
relationships are described by the LF/HF ratio; LF indicates sympathetic activity, and HF indicates parasympathetic activity.
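The threshold-based R peak detection and the time-domain features listed in this section can be sketched as follows. This is an illustrative Python version, not the MATLAB or Kubios processing used in the study: the function names are hypothetical, the peak test uses the common local-maximum reading (a sample larger than both neighbours and above half the signal maximum), pNN50 is divided by the number of NN intervals as described in the text (some tools divide by the number of successive differences instead), and mean HR is taken as 60,000 divided by the mean R-R interval in milliseconds.

```python
import math


def detect_r_peaks(x, fs):
    """Return times (s) of samples that are local maxima above 0.5 * max(x)."""
    threshold = 0.5 * max(x)
    return [
        n / fs
        for n in range(1, len(x) - 1)
        if x[n] > threshold and x[n] > x[n - 1] and x[n] > x[n + 1]
    ]


def rr_intervals(peak_times):
    """Differences between successive R peak times."""
    return [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]


def time_domain_hrv(rr_ms):
    """Time-domain HRV features from R-R intervals given in milliseconds."""
    mean_rr = sum(rr_ms) / len(rr_ms)
    sdnn = math.sqrt(sum((v - mean_rr) ** 2 for v in rr_ms) / (len(rr_ms) - 1))
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    return {
        "SDNN": sdnn,
        "RMSSD": rmssd,
        "NN50": nn50,
        "pNN50": 100 * nn50 / len(rr_ms),  # percentage of all NN intervals
        "mean_RR": mean_rr,
        "mean_HR": 60000 / mean_rr,        # beats per minute
    }


# Toy signal sampled at a hypothetical 4 Hz: two tall spikes and one small bump.
toy_ecg = [0, 0, 1.0, 0, 0, 0, 0.4, 0, 0, 1.2, 0, 0]
peaks = detect_r_peaks(toy_ecg, fs=4)
print(peaks, rr_intervals(peaks))

# Hypothetical R-R series in milliseconds.
features = time_domain_hrv([800, 900, 800, 900])
print(features)
```

A fixed threshold of half the global maximum works on clean, pre-processed segments; on long recordings with amplitude drift, the threshold would normally be computed per window.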
2.4 Arterial Blood Pressure Analysis

Figure 5a shows the raw ABP signal. Many signal processing techniques and delineation algorithms for detecting fiducial points of ABP waveforms, such as the onset, systolic peak, and dicrotic notch, have been published in the literature [13, 22, 24, 27]. An automated delineator for the fiducial points of ABP waveforms was proposed to aid computerised analysis [19]. As shown in Fig. 5b, a peak valley detection algorithm is used to extract features from ABP in the proposed work. For blood pressure analysis, the features derived from the ABP signal are systolic pressure (SP), diastolic pressure (DP), pulse pressure (PP), mean arterial pressure (MAP), and crest time (CT). The SP is the highest arterial pressure, caused by the contraction of the left ventricle. The DP is the lowest arterial pressure and occurs during ventricular relaxation. The normal values for SP and DP are about 120 and 80 mmHg, respectively. The PP is the difference between SP and DP and should be between 40 and 50 mmHg. MAP is the mean arterial pressure over the course of a cardiac cycle; Equation (2) gives the formula for calculating it. The standard range for MAP is 70–110 mmHg. The crest time (CT) is the time between the foot of the wave and its peak. BP is calculated as the difference between the onset of the systolic
Fig. 5 a Raw ABP signal, b detection of peaks in ABP signal
E. S. Selva Priya et al.
Fig. 6 a Second derivative of arterial blood pressure, b schematic representation of computation of pulse transit
and diastolic peaks.

$$\mathrm{MAP} = \mathrm{DP} + \frac{\mathrm{SP} - \mathrm{DP}}{3} \tag{2}$$
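As a worked illustration of these ABP features, the following Python sketch derives SP, DP, PP, MAP, and CT from a single beat. It is only a sketch under stated assumptions: the beat is already segmented, the foot is taken as the minimum before the systolic peak, and MAP uses the standard rule DP + (SP − DP)/3. The function name `abp_beat_features` is hypothetical and is not the peak valley detector used in the study.

```python
def abp_beat_features(beat, fs):
    """Features of one ABP beat (pressure samples in mmHg, sampled at fs Hz)."""
    # Systolic peak: the highest sample of the beat.
    peak = max(range(len(beat)), key=lambda i: beat[i])
    # Foot (onset): the lowest sample at or before the systolic peak.
    foot = min(range(peak + 1), key=lambda i: beat[i])
    sp, dp = beat[peak], beat[foot]
    pp = sp - dp
    return {
        "SP": sp,
        "DP": dp,
        "PP": pp,
        "MAP": dp + pp / 3.0,        # MAP = DP + (SP - DP) / 3
        "CT": (peak - foot) / fs,    # crest time in seconds
    }
```

For SP = 120 mmHg and DP = 80 mmHg this gives PP = 40 mmHg and MAP of about 93.3 mmHg, both inside the ranges quoted in the text.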
Though these characteristics are sufficient for studying the variability among the four phases of tilt operation, the pulse transit time (PTT) and second derivative of arterial blood pressure (SDABP) provide additional clinical knowledge, with the PTT having a higher potential for detecting orthostatic hypotension earlier [28]. Double differentiation of the ABP waveform yields SDABP [19, 20]. The morphology of the waveform varies depending on the tilt operation. Positive peaks ‘a’ and ‘c’ reflect waveform acceleration, while negative peaks ‘b’ and ‘d’ represent waveform deceleration, as shown in Fig. 6a. The amplitude of the peaks is normalised to the value of ‘a’ to obtain the ratios b/a, c/a, and d/a. The PTT is computed as the time delay between the occurrence of the ‘R’ peak in the ECG signal and the occurrence of the ‘a’ peak in the SDABP waveform, as shown in Fig. 6b.
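The SDABP and PTT computations can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes a central second difference as the numerical second derivative and takes the peak amplitudes and times as already detected; the function names are hypothetical.

```python
def second_derivative(x, fs):
    """Central second difference of x sampled at fs Hz.

    Output element i corresponds to input sample i + 1.
    """
    return [(x[n + 1] - 2.0 * x[n] + x[n - 1]) * fs * fs
            for n in range(1, len(x) - 1)]

def sdabp_ratios(a, b, c, d):
    """Normalise the SDABP peak amplitudes b, c, d to the amplitude of peak a."""
    return {"b/a": b / a, "c/a": c / a, "d/a": d / a}

def pulse_transit_time(t_r_peak, t_a_peak):
    """PTT in seconds: delay from the ECG R peak to the SDABP 'a' peak."""
    return t_a_peak - t_r_peak
```

In practice the ABP waveform is usually smoothed before differentiation, because the second difference amplifies noise.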
2.5 Baroreflex Sensitivity Assessment

The arterial baroreflex is responsible for maintaining cardiovascular stability and haemodynamic control. The baroreflex sensitivity index [21, 22], which is defined as the change in R-R interval in milliseconds per unit change in BP, has been shown to
have greater prognostic value even when assessed with different activities. As shown in Eq. (3), the baroreflex sensitivity (BRS) index is determined as the square root of the ratio of the spectral power of the R-R interval to the spectral power of systolic BP, which gives one coefficient for the low-frequency band and one for the high-frequency band.
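$$\alpha_{\mathrm{LF}} = \sqrt{\frac{\mathrm{RR}_{\mathrm{LF\,Power}}}{\mathrm{SBP}_{\mathrm{LF\,Power}}}}, \qquad \alpha_{\mathrm{HF}} = \sqrt{\frac{\mathrm{RR}_{\mathrm{HF\,Power}}}{\mathrm{SBP}_{\mathrm{HF\,Power}}}} \tag{3}$$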
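The band powers and the α coefficients can be illustrated with a small Python sketch. It is not the Kubios spectral estimator: it assumes both series are already resampled to a common, even sampling rate `fs`, and it uses a plain periodogram. The names `band_power` and `alpha_index` are hypothetical.

```python
import math

def band_power(x, fs, f_lo, f_hi):
    """Periodogram power of x (evenly sampled at fs Hz) for f_lo <= f < f_hi."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean so the DC term does not leak
    total = 0.0
    for k in range(1, (n + 1) // 2):  # positive frequencies below Nyquist
        f = k * fs / n
        if f_lo <= f < f_hi:
            re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(xc))
            im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(xc))
            # Factor 2 folds in the mirrored negative-frequency bin.
            total += 2.0 * (re * re + im * im) / (n * n)
    return total

def alpha_index(rr, sbp, fs, f_lo, f_hi):
    """BRS alpha coefficient: sqrt(RR band power / SBP band power), in ms/mmHg."""
    return math.sqrt(band_power(rr, fs, f_lo, f_hi) / band_power(sbp, fs, f_lo, f_hi))
```

Calling `alpha_index(rr, sbp, fs, 0.04, 0.15)` gives the LF coefficient and `alpha_index(rr, sbp, fs, 0.15, 0.4)` the HF coefficient; the LF/HF ratio of the R-R series is the quotient of the two `band_power` values for the same bands.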
2.6 Statistical Analysis

The non-parametric Wilcoxon signed-rank test was performed using the Statistical Package for Social Sciences (SPSS) to test whether the ECG and ABP parameters increase or decrease significantly across the four phases of rapid tilt activity. A result is considered significant if p < 0.05.
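For readers without SPSS, the test can be illustrated with a short Python sketch. It is not the SPSS procedure: it assumes paired samples, drops zero differences, averages ranks over ties, and computes an exact two-sided p-value by enumerating all sign patterns, which is only practical for small samples (roughly n ≤ 20). The function name `wilcoxon_signed_rank` is hypothetical.

```python
from itertools import product

def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W, p) where W = min(W+, W-).
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):  # assign average ranks to tied |differences|
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    rank_sum = sum(ranks)
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w = min(w_plus, rank_sum - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    total = 0
    for signs in product((0, 1), repeat=len(d)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if min(wp, rank_sum - wp) <= w:
            hits += 1
    return w, hits / total
```

With seven pairs that all change in the same direction, the smallest attainable two-sided p-value is 2/128 ≈ 0.016, which is below the 0.05 threshold used here.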
3 Results

3.1 Time Domain Analysis Result

HRV analysis consists of time and frequency domain analysis. According to the results of the time domain analysis shown in Figs. 7a, b and 8b, c, there was a significant difference in HR (p < 0.05), R-R interval (p < 0.05), NN50 (p < 0.05), pNN50 (p < 0.05), and
Fig. 7 a Mean of HR b R-R interval variation during four phases of rapid tilt activity
Fig. 8 Time domain parameters of HRV analysis a RMSSD, b NN50, c pNN50, d SDNN during four phases
there was no significant difference in RMSSD (p > 0.05) or SDNN (p > 0.05), as shown in Fig. 8a, d, during tilting in comparison with supine. Table 1 shows the mean heart rate and R-R interval during the four phases of rapid tilt activity.
Fig. 9 a Mean of HF and LF during four phases, b mean of ratio of LF/HF variation
Table 1 Mean of heart rate and R-R interval variations during four phases for rapid tilt activity

| Mean | Supine | RTUP | ATRT | RTDOWN |
|---|---|---|---|---|
| Heart rate (beats/min) | 66.17143 | 75.52714 | 80.94429 | 74.54143 |
| R-R interval (ms) | 848.1971 | 763.52 | 732.7914 | 780.1657 |
3.2 Frequency Domain Analysis Results

The frequency domain parameters are shown in Fig. 9a. There was a significant difference in LF and HF (p < 0.05) between RTUP and ATRT. The LF/HF ratio shown in Fig. 9b differs significantly between ATRT and RTDOWN. Table 2 lists the mean LF, HF, and LF/HF ratio during the four phases.
Fig. 10 Detection of second derivative ABP peaks (a–c)
Table 2 Comparison of mean between LF, HF, and LF/HF ratio during four phases

| Mean | LF | HF | LF/HF |
|---|---|---|---|
| Supine | 923.2857 | 1339.143 | 1.736857 |
| RTUP | 1713.143 | 2218.571 | 1.58571 |
| ATRT | 1041.857 | 1137.143 | 2.130571 |
| RTDOWN | 2081.143 | 1781.857 | 2.070286 |
3.3 Blood Pressure Analysis Result

The arterial BP waveform carries clinical information about the cardiovascular system, including the systolic pressure, diastolic pressure, and dicrotic notch. Figure 10 depicts the identification of ABP and SDABP morphological characteristics. The morphological features of ABP and the ratios of SDABP were analysed. There was a significant difference in SP (P < 0.05), DP (P < 0.05), MAP (P < 0.05), PP (P < 0.05), and the ratio c/a during tilt compared to supine, but no significant difference in PTT or the ratio d/a. Between ATRT and RTDOWN, the crest time (P < 0.05) and the ratio b/a (P < 0.05) differ significantly. There is no significant difference in the alpha coefficients of the BRS index between supine and ATRT, but there is a significant difference in α-LF (p < 0.05) and α-HF (p < 0.05) between RTUP and RTDOWN.
4 Discussions

During tilting, gravity causes blood to pool in the lower extremities. Blood pressure in the lower extremities therefore increases, which leads to a decrease in venous return, cardiac contractility, and cardiac output. As cardiac contractility decreases, systolic pressure decreases and diastolic pressure increases. The pressure in the upper part of the body drops due to the change of volume. These blood pressure changes are monitored by baroreceptors located in the aortic arch and carotid sinus, which sense the drop in blood pressure and send signals to the medulla; the medulla then inhibits parasympathetic activity and activates sympathetic activity. The walls of the blood vessels constrict and the heart rate increases, which tends to increase blood pressure (Williams et al. 2013). The changes in HRV and BPV during Supine (120 s), RTUP (60 s), ATRT (120 s), and RTDOWN (60 s) were the important findings of this research. Figure 11 illustrates the box plot of the SP, DP, PP, MAP, and CT. Figure 12 illustrates the box plot of the ratios b/a, c/a, and d/a. The increase in diastolic pressure and mean arterial pressure while the angle is sustained at 75° indicates increased orthostatic stress. The systolic pressure and pulse pressure decrease during tilt [29]. The effect of the posture shift from supine to tilt is shown in Fig. 13 as a reduction in the BRS index during tilt. Finally, the study shows that HR rises in response to the drop in blood pressure during rapid tilt [29]. The spectral estimates of the low-frequency and high-frequency components increase while the tilt angle increases from 0° to 75° (RTUP) and decrease when the table is maintained at 75° (ATRT), indicating that sympathetic activity predominates in RTUP compared to supine, but sympathetic activity is lower in RTDOWN (decreasing tilt angle from 75° to 0°). The ratio c/a of SDABP indicates an increase in arterial stiffness [30, 31].
Fig. 11 Box plot of morphological features of ABP variation during four phases. a SP, b DP, c PP, d MAP, e CT
Our study is limited to healthy individuals, and the data were taken from PhysioNet. Although the results are not statistically significant, the data obtained during the procedure could be used to develop a patient-specific mathematical model. Physiological modelling, in combination with experimental approaches and signal processing techniques, provides a detailed understanding of the electrophysiology and pathophysiology of the cardiovascular system. The head-up tilt test is a non-invasive test that helps in diagnosing syncope, orthostatic intolerance, postural orthostatic tachycardia syndrome, and cardiovascular autonomic neuropathy in diabetes mellitus. It also helps in investigating the regulation of autonomic nervous system function in control and diseased conditions, thereby supporting therapeutic decisions. Besides the head-up tilt test, there are several other cardiovascular autonomic reflex tests for diagnosing autonomic dysfunction, such as the Valsalva manoeuvre, deep breathing, active standing, and hand grip.
Fig. 12 Box plot of variation in ratios obtained from SDABP during four phases of rapid tilt activity a b/a ratio, b c/a ratio, c d/a ratio
Fig. 13 Box plot of α coefficient BRS index during four phases of tilt activity a α-HF index b α-LF index
5 Conclusion

HRV analysis is a non-invasive method of evaluating cardiovascular control mechanisms. It is used to estimate various time and frequency domain parameters in order to study the role of the ANS. The properties of the arterial vessel wall are assessed using morphological parameters from ABP signal analysis. The means of the steady-state values of HR, DP, MAP, LF, and the LF/HF ratio increase during ATRT, while the means of the steady-state values of SP, PP, and HF decrease. As a result, sympathetic function is stimulated, resulting in an increase in HR. This research could be expanded upon to create a patient-specific physiological model for HUT.
References

1. La Rovere MT, Bigger Jr JT, Marcus FI, Mortara A, Schwartz PJ, ATRAMI (Autonomic Tone and Reflexes After Myocardial Infarction) Investigators (1998) Baroreflex sensitivity and heart-rate variability in prediction of total cardiac mortality after myocardial infarction. The Lancet 351(9101):478–484
2. Aggarwal S, Tonpay PS, Trikha S, Bansal A (2011) Prevalence of autonomic neuropathy in diabetes mellitus. Curr Neurobiol 2:101–105
3. Shen MJ, Zipes DP (2014) Role of the autonomic nervous system in modulating cardiac arrhythmias. Circ Res 114(6):1004–1021
4. Head HR (2013) Int J Biol Med Res 4(2):3057–3061
5. Heldt T, Oefinger MB, Hoshiyama M, Mark RG (2003) Circulatory response to passive and active changes in posture. In: Computers in cardiology, 2003 Sep 21. IEEE, New York, pp 263–266
6. Barzilai M, Jacob G (2015) The effect of ivabradine on the heart rate and sympathovagal balance in postural tachycardia syndrome patients. Rambam Maimonides Med J 6(3)
7. Williams ND, Wind-Willassen O, Wright AA, Mehlsen J, Ottesen JT, Olufsen MS (2014) Patient-specific modelling of head-up tilt. Math Med Biol: J IMA 31(4):365–392
8. Lanier JB, Mote MB, Clay EC (2011) Evaluation and management of orthostatic hypotension. Am Fam Physician 84(5):527–536
9. van Hateren KJ, Kleefstra N, Blanker MH, Ubink-Veltmaat LJ, Groenier KH, Houweling ST, Kamper AM, van der Meer K, Bilo HJ (2012) Orthostatic hypotension, diabetes, and falling in older patients: a cross-sectional study. Br J Gen Pract 62(603):e696-702
10. Zygmunt A, Stanczyk J (2010) Methods of evaluation of autonomic nervous system function. Arch Med Sci: AMS 6(1):11–18. https://doi.org/10.5114/aoms.2010.13500
11. Teodorovich N, Swissa M (2016) Tilt table test today-state of the art. World J Cardiol 8(3):277
12. Likar P (2014) Spectral analysis of heart rate variability for assessment of autonomic nervous system activity during head-up tilt table testing. In: Passamonti S, Gustincich S, Lah Turnšek T, Peterlin B, Pišot R, Storici P (eds) Cross-border Italy-Slovenia biomedical research: are we ready for horizon 2020? Conference proceedings with an analysis of innovation management and knowledge transfer potential for a smart specialization strategy. Trieste, EUT Edizioni Università di Trieste, pp 159–163
13. Elgendi M, Norton I, Brearley M, Abbott D, Schuurmans D (2013) Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions. PLoS ONE 8(10):e76585
14. Hall JE, Guyton and Hall textbook of medical physiology, vol 12. Saunders, Philadelphia, PA
15. Orini M, Mainardi LT, Gil E, Laguna P, Bailón R (2010) Dynamic assessment of spontaneous baroreflex sensitivity by means of time-frequency analysis using either RR or pulse interval variability. In: 2010 Annual international conference of the IEEE engineering in medicine and biology 2010 Dec 1. IEEE, New York, pp 1630–1633
16. Subbalakshmi NK, Bhat MR, Basha AA, Validity of frequency domain method in assessment of cardiac autonomic function during controlled breathing in healthy subjects
17. Tripathi KK (2004) Respiration and heart rate variability: a review with special reference to its application in aerospace medicine. Ind J Aerosp Med 48(1):64–75
18. Vijayakumar T, Vinothkanna R, Duraipandian M (2021) Fusion based feature extraction analysis of ECG signal interpretation–a systematic approach. J Artif Intell 3(01): –16
19. Li BN, Dong MC, Vai MI (2010) On an automatic delineator for arterial blood pressure waveforms. Biomed Signal Process Control 5(1):76–81
20. Simek J, Wichterle D, Melenovsky V, Malik J, Svacina S, Widimsky J (2005) Second derivative of the finger arterial pressure waveform: an insight into dynamics of the peripheral arterial pressure pulse. Physiol Res 54(5):505
21. Di Rienzo M, Castiglioni P, Parati G (2006) Arterial blood pressure processing. Wiley Encyclopedia Biomed Eng
22. Choi Y, Ko SB, Sun Y (2006) Effect of postural changes on baroreflex sensitivity: a study on the EUROBAVAR data set. In: 2006 Canadian conference on electrical and computer engineering 2006 May 7. IEEE, New York, pp 110–114
23. Adam EEB (2021) Survey on medical imaging of electrical impedance tomography (EIT) by variable current pattern methods. J ISMAC 3(02):82–95
24. Kuntamalla S, Reddy LR (2014) An efficient and automatic systolic peak detection algorithm for photoplethysmographic signals. Int J Comput Appl 97(19)
25. Efremov K, Brisinda D, Venuti A, Iantorno E, Cataldi C, Fioravanti F, Fenici R (2014) Heart rate variability analysis during head-up tilt test predicts nitroglycerine-induced syncope. Open Heart 1(1):e000063
26. Lin HY, Liang SY, Ho YL, Lin YH, Ma HP (2013) Discrete-wavelet-transform-based noise reduction and R wave detection for ECG signals. In: 2013 IEEE 15th International conference on e-health networking, applications and services (Healthcom 2013) 2013 Oct 9. IEEE, New York, pp 355–360
27. Singh O, Sunkaria RK (2017) Detection of onset, systolic peak and dicrotic notch in arterial blood pressure pulses. Meas Control 50(7–8):170–176
28. Chan GS, Middleton PM, Celler BG, Wang L, Lovell NH (2007) Change in pulse transit time and pre-ejection period during head-up tilt-induced progressive central hypovolaemia. J Clin Monit Comput 21(5):283–293
29. Zaidi A, Benitez D, Gaydecki PA, Vohra A, Fitzpatrick AP (2000) Haemodynamic effects of increasing angle of head up tilt. Heart 83(2):181–184
30. Sung J, Choi SH, Choi YH, Kim DK, Park WH (2012) The relationship between arterial stiffness and increase in blood pressure during exercise in normotensive persons. J Hypertens 30(3):587–591
31. Guzik P, Bychowiec B, Gielerak G, Greberski K, Rzetecka K, Wykretowicz A, Wysocki H (2005) Assessment of arterial compliance and elasticity during graded head-up tilt in healthy people. Polski merkuriusz lekarski: organ Polskiego Towarzystwa Lekarskiego 18(103):36–40
Detection of Glaucoma Using HMM Segmentation and Random Forest Classification

Chevula Maheswari, Gurukumar Lokku, and K. Nagi Reddy
Abstract Glaucoma is a retinal disease and a leading cause of blindness worldwide. Like cataracts, Glaucoma shows few symptoms while it damages the retina and, as a result, reduces visual acuity. Interpreting a 2D fundus picture is therefore a complex undertaking. Timely treatment is critical in preventing patients from losing their vision. Many studies have addressed the analysis of retinal fundus images using various image processing techniques. In this research, we applied Hidden Markov Model (HMM) image segmentation to retinal pictures and then classified them using Random Forest, achieving an accuracy of 98.15%, a sensitivity of 85.5%, and a specificity of 92.9%.

Keywords Glaucoma · Image processing · HMM segmentation · Random forest
C. Maheswari · K. Nagi Reddy
Department of ECE, N.B.K.R. Institute of Science and Technology, S.P.S.R., Nellore, A.P., India
e-mail: [email protected]

G. Lokku (B)
J.N.T. University Ananthapuramu, Anantapur, A.P., India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_39

1 Introduction

The human eye is the organ responsible for our perception of sight. By analyzing the light that objects reflect or emit, the eye helps us to perceive and understand their forms, colors, and dimensions in the environment. The retina is an essential component of the human visual system. Glaucoma, which is a leading cause of blindness, affects the retina. Glaucoma is a chronic eye disease that causes vision loss. It is caused by a rise in intraocular pressure, which occurs due to a problem with the eye's drainage system. Increased intraocular pressure affects the optic nerve, which carries visual signals to the brain, where they are perceived as an image [1]. To minimize permanent vision loss, it is vital to identify and treat Glaucoma as soon as feasible. According to World Health Organization data, it is the second most prevalent cause of permanent
Fig. 1 Labeled retina fundus of the eye
blindness, behind cataract. Once the damage has occurred, it cannot be fully reversed; treatment can only stop further vision loss. In reality, everyone is at risk of developing Glaucoma; the best way to protect the eyes from irreversible vision loss is to visit an ophthalmologist as soon as any vision-related problem is noticed [2]. Glaucoma is a primary cause of lifelong blindness. It is estimated to affect 3.5% of adults over 45, or around 64.3 million individuals worldwide. This number is expected to rise to 80 million by 2020 and 111.8 million by 2040 due to population growth and aging. With early detection and treatment, the majority of Glaucoma-related vision loss can be avoided. As a result, detecting Glaucoma at an early stage is crucial [3]. Glaucoma is defined by a change in the anatomy of the optic nerve caused by the expansion of the optic cup, shown in Fig. 1; hence, Glaucoma prediction or classification seeks to describe the cup and disk regions [4, 5]. Glaucoma is the leading cause of irreversible blindness because it damages the eye's optic nerve. In most situations, patients do not notice symptoms of visual loss until the disease has progressed to an advanced stage [6]. Image segmentation has been a long-standing topic of interest, inspiring many methodologies over the last few decades. The authors of [7] conduct a global analysis of a given image, using an inter-block HMM to account for inter-block statistical information. Segmentation with the Markov Random Field Model (MRFM), an earlier relative of the HMM, can be achieved by maximizing the a posteriori probability, following Huang et al. [8]. Recent research reports state that 38% diabetic, 19% hypertensive, and 49% of the individuals are suffering from various types of Glaucoma. Medical image segmentation is complex because of the picture modality, organ type, and segmentation framework.
The use of various scanning methods, inadequate resolution, image noise, and insufficient contrast complicates the process even further in many cases. Most RF-based segmentation algorithms rely solely on probability maps to segment the targeted organ. RFs can provide much more information from the training process than that. We may use RFs to evaluate the training method and quantify the impact of various features in the classification task [9, 10]. Optionally, the Random Forest software generates two more pieces of data: a measure of
the relevance of the predictor variables and a measure of the internal structure of the data [11]. This research provides a method of studying RF classification training and applying feature selection techniques to HMM-segmented retina images.
2 Related Works

Shyam et al. [12] tested both Glaucomatous and normal eyes. Ten photos of each were taken from the database, and the ISNT ratio was determined for each. The technique helps recognize Glaucoma with less complexity and by simpler means, using the volume of blood vessels in each quadrant of the eye. Smith et al. [13] propose a technique based on Random Forest classification. A variety of possible segmentation scales are examined statistically in order to select which segmentation scale(s) best predict the land cover classes. The selection process was applied to determine the three most important object scales, which produced the highest accuracy in land cover classification when combined. Once the segmentation scale was chosen, Random Forest classification was applied to 11 SPOT image scenes to assign land cover classes across North and South Dakota, with an overall accuracy of 85.2%. Niu et al. [14] presented a new HMM approach that uses thresholding and voting to segment and recognize complex activities automatically and efficiently. Experiments on a database of video clips of several activities show that their strategy works. Various statistical data related to detection of Glaucoma are given in Table 1.

Table 1 Literature survey summary report based on various ML techniques

| Authors | Methodology | Datasets | Accuracy |
|---|---|---|---|
| Shanbehzadeh et al. [15] | SOM NN | SEED-DE, RIM-ONE | 97.50% |
| Lim et al. [16] | CNN (3-class) | -N/A- | 83% |
| Al-Bander et al. [17] | CNN | MESSIDOR | 96.89% |
| Tan et al. [18] | CNN (7-HL) | DRIVE | 92.68% |
| Raja et al. [19] | Wavelets | RIM-ONE | 81% |
| Orlando et al. [20] | Potts model | STARE | -N/A- |
| Deng et al. [21] | Adaptive Gaussian | Private dataset | 88% |

Fondón et al. [22] propose a new approach for detecting the optic cup in retinal fundus images that may be employed in a computer-aided diagnostic tool for Glaucoma. The process is based on Random Forest classification to obtain cup-edge pixels and on the JCh color space of CIECAM02 (the International Commission on Illumination Color Appearance Model), which accounts for human vision and viewing conditions. The classifier does not examine all of the pixels in the picture, since vessels bend at the cup's edge. In practice, only vessels with large curvature among
their neighbors are considered. The cup region described in Fig. 1 typically has a bright yellow color, another piece of prior information employed in the proposed process. The curvature, color, and position of the candidate pixel with respect to the OD center are included in the classifier's input feature vector. Finally, a simple step is used to construct a smooth curve that joins the selected pixels. The approach was evaluated using 35 images for training and 55 for testing on the GlaucomaRepo database, which is open to the public. The authors computed five numerical parameters and compared them to three other color schemes. The results demonstrate that the strategy works.
3 Proposed Method

The whole process of detecting anomalies is divided into phases: preprocessing, which includes the median filter and the Wiener filter; feature extraction; HMM segmentation; and classification with an RF classifier. Figure 2 depicts the block diagram of the proposed Glaucoma detection. In the preprocessing stage, the median and Wiener filters are applied to enhance the input image for better classification. Approximately 500 iterations are carried out to find the Glaucoma ROI. The iterated preprocessed image is then segmented using the HMM [23], as shown in Fig. 5. Finally, classification is carried out using the RF classifier [24].
[Block diagram: Input Image → Pre-Processing → Feature Extraction → HMM Segmentation → Random Forest Classifier]
Fig. 2 Block diagram of the Glaucoma detection using HMM segmentation and Random Forest classification
Fig. 3 Retinal fundus input images. Courtesy RIGA Data set (Retinal Fundus Images for Glaucoma Analysis)
3.1 Preprocessing

Preprocessing is a technique for removing noise and improving image quality. Preprocessing poor-quality images makes an appropriate degree of success in automated anomaly identification possible. Because retinal abnormalities in the images taken from the RIGA dataset (Fig. 3) are better visualized in grayscale than in color, the color photographs are converted into grayscale pictures, as portrayed in Fig. 4. The grayscale images are then subjected to preprocessing steps that enhance the picture. The median filter is a nonlinear digital filtering method that reduces noise in images. The median is obtained by sorting all pixel values in the window numerically and then replacing the pixel under consideration with the middle (median) value. The Wiener filter removes additive noise while also inverting blurring. Its inverse filtering and noise smoothing operations minimize the overall mean square error.
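The grayscale conversion and the median filter can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes ITU-R BT.601 luma weights for the conversion, a fixed 3 × 3 window, and unchanged border pixels, and it omits the Wiener filter. The function names are hypothetical.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to luma using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def median_filter3(img):
    """3x3 median filter on a 2-D list; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Rank the nine values in the window and keep the middle one.
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]
    return out
```

A single bright outlier surrounded by uniform pixels is replaced by the surrounding value, which is why the median filter suits impulse-like noise.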
3.2 Feature Extraction

The feature extraction [25] approach detects anything in a picture with Glaucoma-like characteristics. To achieve precise segmentation [26] using the edge approach, an initial contour is created beyond the object's bounds. Two types of forces act on the contour: internal forces, $E_{\text{Internal size}}$, and external forces, $E_{\text{External size}}$. Internal forces keep the contour smooth during the deformation process and capture the object's curvature. External
Fig. 4 Preprocessing and enhancement applied to the input image
Fig. 5 Segmentation with HMM
forces are defined as the factors that cause the model to move in the direction of the desired outcome. The active contour model represents the pattern matching process. Contour initialization is the first stage of an accurate active contour segmentation. When an optic nerve fiber is damaged, the bright center zone, known as the cup, expands, causing the optic disk to move. The model captures the uneven disk area more efficiently. Equations (1) and (2) define the active contour model, where V(s) is the contour and ds is the differential element along it.

$$E_t = \int E_{\text{total}}(V(s))\, ds \tag{1}$$

$$E_{\text{total}} = E_{\text{Internal size}} + E_{\text{External size}} \tag{2}$$
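A discrete version of the contour energy can be sketched as follows. This is an illustration under assumptions that are not in the text: the contour is a closed polygon of integer pixel coordinates, the internal term is the squared distance between neighbouring points weighted by a hypothetical parameter `alpha`, and the external term is the negative of a precomputed edge-strength map, so that strong edges lower the energy.

```python
def contour_energy(points, edge_strength, alpha=1.0):
    """Total energy of a closed contour given as a list of (x, y) pixels."""
    n = len(points)
    internal = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        internal += alpha * ((x1 - x0) ** 2 + (y1 - y0) ** 2)
    # Points sitting on strong edges reduce the total energy.
    external = -sum(edge_strength[y][x] for x, y in points)
    return internal + external
```

An active contour algorithm would move the points iteratively to lower this value; the iteration itself is not shown here.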
3.3 HMM

Because we wish to reconstruct a picture with statistically homogeneous regions, a reasonable choice is a hidden variable $y = (y(1), \ldots, y(S)) \in \{1, \ldots, K\}^S$ that signals a common classification of the images $e_i$. The task at hand is to estimate the set of variables $(e, y)$ using a Bayesian approach:

$$p(e, y \mid g) = p(e \mid y, g)\, p(y \mid g) \tag{3}$$

To express $p(e, y \mid g)$ using the Bayes formula, first define $p(g_i \mid e_i)$ and $p(e_i \mid y)$ for $p(e \mid y, g)$, and $p(g_i \mid y)$ and $p(y)$ for $p(y \mid g)$. To assign $p(e_i \mid y)$, we must first define the pixel classes $R_k$ and $e_{ik}$ given in Eqs. (4) and (5):

$$R_k = \{r : y(r) = k\}, \qquad |R_k| = n_k \tag{4}$$

$$e_{ik} = \{e_i(r) : y(r) = k\} \tag{5}$$

We begin by assuming that all pixels $e_{ik}$ of an image $e_i$ belonging to the same class $k$ share a common mean, so that the class mean vector is a multiple of $\mathbf{1}$, a vector whose components are all equal to one. We will extend this model to the case where pixels in different regions are assumed to be independent, while a Gauss–Markov model accounts for their local correlation inside any homogeneous region (Fig. 5).
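The pixel classes of Eqs. (4) and (5) amount to grouping pixel values by their hidden label. The Python sketch below shows that grouping together with per-class statistics. It is only an illustration: it assumes the labels are already known and treats the image as a flat list of pixel values, whereas the HMM segmentation estimates the labels; the function name `class_statistics` is hypothetical.

```python
def class_statistics(e, y):
    """Group pixel values e(r) by label y(r).

    Returns {k: (n_k, mean_k, variance_k)} for every label k that occurs.
    """
    groups = {}
    for value, label in zip(e, y):
        groups.setdefault(label, []).append(value)  # builds the set e_k
    stats = {}
    for k, vals in groups.items():
        n_k = len(vals)                             # |R_k|
        mean_k = sum(vals) / n_k
        var_k = sum((v - mean_k) ** 2 for v in vals) / n_k
        stats[k] = (n_k, mean_k, var_k)
    return stats
```

The class means and variances are the kind of per-region parameters a Gaussian class model would need.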
3.4 Random Forest

An RF is a collection of decision trees, each generally trained on a distinct training set (“bagging”). A sample is passed through a binary test at every inner node on the way from the root to a leaf. A binary test compares a feature with a set of parameters. During forest training, the optimal split of the training data must be identified at each node. A random feature subset is inspected at each node to identify the optimal split. This reduces the correlation between individual tree outputs while improving the forest's overall performance. There are multiple cost functions and semi-supervised strategies for dividing the training samples at the nodes. The splitting is repeated until the number of samples in a node falls below a threshold, a maximum tree depth is reached, or all samples in the node belong to the same class (Fig. 6). During testing, the tests along the path from the root node to a leaf are applied to the new sample. When the sample reaches a leaf node, the tree votes for the class assigned to that node during the training stage. The test sample is finally assigned the class with the most votes. Moreover, the probability of a class for a test sample is given by the percentage of trees that voted for that class. In this way the detection of Glaucoma is achieved, as shown in Fig. 7.
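The voting step at test time can be illustrated with a short Python sketch. It assumes each tree has already produced one class label for the test sample; training the trees is not shown, and the function name `forest_predict` is hypothetical.

```python
from collections import Counter

def forest_predict(tree_votes):
    """Aggregate per-tree class votes into a label and class probabilities."""
    counts = Counter(tree_votes)
    label, _ = counts.most_common(1)[0]   # class with the most votes
    # Class probability = fraction of trees voting for that class.
    probs = {c: n / len(tree_votes) for c, n in counts.items()}
    return label, probs
```

For example, if three of four trees vote "glaucoma", the sample is labelled "glaucoma" with an estimated probability of 0.75.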
Fig. 6 Normal—Glaucoma-free detection
Fig. 7 Glaucoma detection
Algorithm 1. Random Forest classifier
createTree():
  create a new node n;
  if the halting requirements are satisfied then
    return n as a leaf node;
  else
    for j = 1 to M do
      calculate the informativeness metric corr(Aj, Y);
    end for
    compute feature weights w1, w2, ..., wM;
    apply the feature weighting technique to choose m features at random;
    use these m features as candidates to get the optimal split for the node;
    for each split, call createTree();
  end if
  return n;
4 Experiment Results

4.1 Sensitivity or Recall (S)

Sensitivity is the number of pixels segmented by the algorithm that match the gold-standard segmentation, divided by the total number of pixels in the gold-standard segmentation; it identifies under-segmentation. S, defined in Eq. (6), is a value between 0 and 1, where 1 is an optimal segmentation and 0 the worst result.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{7}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
Here, “TP” is the number of true positives, “TN” the number of true negatives, “FP” the number of false positives, and “FN” the number of false negatives. The Positive Predictive Value (PPV) is the ratio between the number of pixels segmented by the technique that match the gold-standard segmentation and the total number of pixels segmented by the technique; it identifies over-segmentation. Like “S,” it takes values between 0 and 1. “F1” is a classification measure that can be read as a weighted average of PPV and S; an F1 score of 1 is the best and 0 the worst.
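The measures of Eqs. (6)–(8), together with PPV and F1, can be computed directly from the four confusion counts. The Python sketch below is illustrative: it takes F1 to be the harmonic mean of PPV and sensitivity (the usual definition), assumes no denominator is zero, and uses the hypothetical function name `segmentation_metrics`.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, PPV, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)                    # Eq. (6)
    specificity = tn / (tn + fp)                    # Eq. (7)
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (8)
    ppv = tp / (tp + fp)                            # positive predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "ppv": ppv, "f1": f1}
```

With TP = 8, TN = 9, FP = 1, and FN = 2, the sensitivity is 0.8, the specificity 0.9, and the accuracy 0.85.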
4.2 Specificity
The true negative rate, also known as specificity (SPC), is the proportion of negatives that are correctly classified and is given by Eq. (7). On a scale of 0 to 1, a flawless outcome would be 1.
4.3 Accuracy
Accuracy (ACC) is a measure of how well positive and negative cases are recognized correctly and is given by Eq. (8). The best value is again one. Based on the output of the proposed methodology shown in Fig. 6 and the statistics in Table 2, the proposed method attains a Glaucoma detection accuracy of 98.15% when compared with state-of-the-art methods (Fig. 8).

Table 2 Accuracy of Glaucoma detection on various approaches
Authors                 No. of fundus images in dataset    Accuracy (%)
Carrillo et al. [27]    26                                 88.50
Khan et al. [28]        50                                 94
Sengar et al. [29]      140                                93.57
Poshtyar et al. [30]    300                                89.32
Salam et al. [31]       50                                 92
Proposed method         750                                98.15
Fig. 8 Comparison of accuracy on various Glaucoma detection techniques (vertical axis: Accuracy (%))
Using a Random Forest classifier on HMM-segmented images, an average accuracy of 98.15%, a sensitivity of 85.5%, and a specificity of 92.9% are achieved. The final experimental results for the detection of Glaucoma-free and Glaucoma fundus images are shown in Fig. 6 and Fig. 7, respectively.
5 Conclusion
This study presented a new approach for detecting Glaucoma using a Random Forest classifier on HMM-segmented images. The median and Wiener filters were used to enhance the fundus images. The proposed method was assessed for accuracy, specificity, and sensitivity, and the experimental findings for Glaucoma detection were positive on all three measures. Glaucoma detection using a Random Forest classifier on HMM-segmented images has an average accuracy of 98.15%, with a sensitivity of 85.5% and a specificity of 92.9%.
References
1. Deepika E, Maheswari S (2018, January) Earlier glaucoma detection using blood vessel segmentation and classification. In: 2018 2nd International conference on inventive systems and control (ICISC). IEEE, New York, pp 484–490
2. Eswari MS, Karkuzhali S (2020, January) Survey on segmentation and classification methods for diagnosis of Glaucoma. In: 2020 International conference on computer communication and informatics (ICCCI). IEEE, New York, pp 1–6
3. Li L, Xu M, Liu H, Li Y, Wang X, Jiang L, Wang Z, Fan X, Wang N (2019) A large-scale database and a CNN model for attention-based Glaucoma detection. IEEE Trans Med Imaging 39(2):413–424
4. Krishnan R, Sekhar V, Sidharth J, Gautham S, Gopakumar G (2020, July) Glaucoma detection from retinal fundus images. In: 2020 International conference on communication and signal processing (ICCSP). IEEE, New York, pp 0628–0631
5. Sungheetha A, Sharma R (2021) Design an Early detection and classification for diabetic retinopathy by deep feature extraction based convolution neural network. J Trends Comput Sci Smart Technol (TCSST) 3(02):81–94
6. Carrillo J, Bautista L, Villamizar J, Rueda J, Sanchez M (2019, April) Glaucoma detection using fundus images of the eye. In: 2019 XXII Symposium on image, signal processing and artificial vision (STSIVA). IEEE, New York, pp 1–4
7. Lu J, Carin L (2002, May) HMM-based multiresolution image segmentation. In: 2002 IEEE international conference on acoustics, speech, and signal processing, vol 4. IEEE, New York, pp IV-3357
8. AlZu’bi S, AlQatawneh S, ElBes M, Alsmirat M (2020) Transferable HMM probability matrices in multi-orientation geometric medical volumes segmentation. Concurr Comput: Pract Exp 32(21):e5214
9. Lokku G, Harinatha Reddy G, Prasad G (2020) Discriminative feature learning framework for face recognition using deep convolution neural network. Solid State Technol 63(6):18103–18115
10. Lokku G, Harinatha Reddy G, Prasad G (2020) OPFaceNet: optimized face recognition network for noise and occlusion affected face images using hyperparameters tuned convolutional neural network. Appl Soft Comput 117(2022):108365
11. Kumar Lokku G, Reddy GH, Prasad MNG (2019) Automatic face recognition for various expressions and facial details. Int J Innov Technol Explor Eng 8(9 Special3):264–268
12. Shyam L, Kumar GS (2016, July) Blood vessel segmentation in fundus images and detection of Glaucoma. In: 2016 International conference on communication systems and networks (ComNet). IEEE, New York, pp 34–38
13. Smith A (2010) Image segmentation scale parameter optimization and land cover classification using the Random Forest algorithm. J Spat Sci 55(1):69–79
14. Niu F, Abdel-Mottaleb M (2005, July) HMM-based segmentation and recognition of human activities from video sequences. In: 2005 IEEE international conference on multimedia and expo. IEEE, New York, pp 804–807
15. Shanbehzadeh J, Ghassabi Z, Nouri-Mahdavi K (2018) A unified optic nerve head and optic cup segmentation using unsupervised neural networks for glaucoma screening. In: EMBC. IEEE, New York, pp 5942–5945
16. Lim G, Cheng Y, Hsu W, Lee ML (2015) Integrated optic disc and cup segmentation with deep learning. In: ICTAI. IEEE, New York, pp 162–169
17. Al-Bander B, Al-Nuaimy W, Williams BM, Zheng Y (2018) Multiscale sequential convolutional neural networks for simultaneous detection of fovea and optic disc. Biomed Sign Process Control 40:91–101
18. Tan JH, Bhandary SV, Sivaprasad S, Hagiwara Y, Bagchi A, Raghavendra U et al (2018) Age-related macular degeneration detection using deep convolutional neural network. Future Gener Comput Syst 87:127–135
19. Raja C, Gangatharan N (2013) Glaucoma detection in fundal retinal images using trispectrum and complex wavelet-based features. Eur J Sci Res 97:159–171
20. Orlando JI, Prokofyeva E, Blaschko MB (2016) A discriminatively trained fully connected conditional random field model for blood vessel segmentation in fundus images. IEEE Trans Biomed Eng 64:16–27
21. Deng G, Cahill L (1993) An adaptive Gaussian filter for noise reduction and edge detection. In: 1993 IEEE conference record nuclear science symposium and medical imaging conference. IEEE, New York, pp 1615–1619
22. Fondón I, Valverde JF, Sarmiento A, Abbas Q, Jiménez S, Alemany P (2015, September) Automatic optic cup segmentation algorithm for retinal fundus images based on random forest classifier. In: IEEE EUROCON 2015-international conference on computer as a tool (EUROCON). IEEE, New York, pp 1–6
23. Bo-ping Z, Rong W (2010) An HMM segmentation method based statistical layered model for an image of vehicle. In: 2010 International conference on networking and digital society, pp 385–389. https://doi.org/10.1109/ICNDS.2010.5479213
24. Divya S, Vignesh R, Revathy R (2019) A distinctive model to classify tumor using random forest classifier. In: 2019 Third international conference on inventive systems and control (ICISC), pp 44–47. https://doi.org/10.1109/ICISC44355.2019.9036473
25. Lokku G, Harinatha Reddy G, Giri Prasad MN (2021) Optimized scale-invariant feature transform with local tri-directional patterns for facial expression recognition with deep learning model. Comput J bxab088. https://doi.org/10.1093/comjnl/bxab088
26. Lokku G, Harinatha Reddy G, Giri Prasad MN (2021) A robust face recognition model using deep transfer metric learning built on AlexNet convolutional neural network. In: 2021 International conference on communication, control and information sciences (ICCISc), pp 1–6. https://doi.org/10.1109/ICCISc52257.2021.9484935
27. Carrillo J, Bautista L, Villamizar J, Rueda J, Sanchez M, Rueda D (2019) Glaucoma detection using fundus images of the eye. In: 2019 XXII Symposium on image, signal processing and artificial vision (STSIVA). IEEE, New York, pp 1–4
28. Khan F, Khan SA, Yasin UU, ul Haq I, Qamar U (2013) Detection of glaucoma using retinal fundus images. In: The 6th 2013 Biomedical engineering international conference. IEEE, New York, pp 1–5
29. Sengar N, Dutta MK, Burget R, Ranjoha M (2017) Automated detection of suspected glaucoma in digital fundus images. In: 2017 40th International conference on telecommunications and signal processing (TSP). IEEE, New York, pp 749–752
30. Poshtyar A, Shanbehzadeh J, Ahmadieh H (2013) Automatic measurement of cup to disc ratio for diagnosis of glaucoma on retinal fundus images. In: 2013 6th International conference on biomedical engineering and informatics. IEEE, New York, pp 24–27
31. Salam AA, Akram MU, Wazir K, Anwar SM, Majid M (2015) Autonomous glaucoma detection from fundus image using cup to disc ratio and hybrid features. In: ISSPIT. IEEE, New York, pp 370–374
32. Maruthi Kumar D, Guru kumar L, Kannaiah K (2020) A conceal fragment visible image broadcast through montage images with revocable colour alterations. In: Hitendra Sarma T, Sankar V, Shaik R (eds) Emerging trends in electrical, communications, and information technologies. Lecture notes in electrical engineering, vol 569. Springer, Singapore. https://doi.org/10.1007/978-981-13-8942-9_57
A UAV-Based Ad-Hoc Network for Real-Time Assessment and Enforcement of Vehicle Standards in Smart Cities Vijay A. Kanade
Abstract The automobile industry plays a key role in a smart city environment. This includes public transportation, personal cars, autonomous vehicles, and rental cars. Governments regularly come up with new vehicle rules and standards such as emission standards, fuel economy standards, gaseous pollutant rules, free acceleration smoke, and others to ensure safety on roads. While drafting new vehicle policies is critical, assessing and enforcing them can pose a greater challenge to the concerned agencies. The research paper proposes a unique UAV-based network that regulates the vehicle standards in smart cities and alerts the authorities in case of any anomaly in real time. Keywords Multispectral light · Polarization · Optical filters · Unmanned aerial vehicle (UAV) · UAV network · Automobile industry · Vehicle standards
V. A. Kanade (B), Pune, India, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_40

1 Introduction
As cities get smarter and more livable, technology is gaining control over the urban environment. Smart IoT gadgets, smart automotive systems, and digital interfaces are installed on traditional infrastructure to streamline city operations. This 'smartness' enables city dwellers to make better decisions and lead a better quality of life. The automobile industry has further propelled this development, as it has almost transformed the face of cities. The new entrants in the automotive market include autonomous and electric vehicles (EVs). Autonomous cars use a variety of sensors situated on the vehicle to create a map of their surroundings for easier navigation. Electric cars, on the other hand, use a battery pack to power the motor instead of an internal combustion engine. This results in better fuel economy than traditional vehicles. Irrespective of the technology used for powering the engines, all vehicle types have to abide by the regulations set forth by government agencies to ensure road safety [1]. The vehicle standards have far-reaching implications for the natural habitat and can impact the overall city climate. Vehicle manufacturers thus take these rules into consideration while manufacturing automobiles. Although manufacturers implement these guidelines on upcoming vehicles, assessing and enforcing them on old cars or models can be tricky. Also, satellite-based remote sensing is being practiced for extracting minerals, developing 3D city models, disaster mitigation, and agricultural development [2]. However, such remote sensing models have not been explored in great depth in the direction of road safety. Thus, as government regulations often go unheeded, there is a long-standing need for a system that can evaluate these vehicle standards in real time and alert the respective agencies.
2 Indicators of Automobile Condition
Taking care of an automobile is important while it is in operation. However, we tend to ignore, overlook, or dismiss warning signs such as suspicious noises, smoke, or other visible signs that can be detrimental to any vehicle. Thus, recognizing the signs that highlight the condition of the automobile is necessary to prevent further damage to the vehicle, environment, or climate. Some of the key indicators of an improperly maintained vehicle are discussed below [3]:
1. Rare noises
A vehicle needing attention sometimes makes suspicious noises and sounds different, not just to the driver but also to the outside world. One such noise is a knocking or thumping sound. This noise indicates that the engine rod bearings are worn out or, in some cases, have become looser than usual. Such bearings are likely to fail soon, thereby demanding immediate action.
Another noise associated with the engine is squealing. If the vehicle's engine is regularly hitting those high notes, then the fan within the engine is taking a toll. This implies that the belt has either come loose or worn down completely. It is an alarming signal for any vehicle on the road.
Grinding noise also relates to the maintenance of the vehicle. Such noise implies that the front brake pads are wearing out further every time the vehicle hits the road. If this continues, the metal backing plate may eventually clamp directly onto the brake disk. This can reduce braking effectiveness considerably, thereby making the vehicle unsafe on roads.
2. Different smoke types
Smoke is another vital indicator of a ‘not so well-maintained’ vehicle. There are various smoke types, and each one identifies a particular problem.
Blue smoke: Blue smoke signifies the leakage of oil from the engine. It is also possible that oil is being burned along with fuel.
White smoke: White smoke indicates that condensed water or antifreeze has accidentally entered and mixed with the fuel supply.
Black smoke: Black smoke observed when the engine warms up indicates that the engine air filter is clogged.
3. Fluid (or oil) beneath the car
Oil or fluid under the car can be an indicator of an engine oil or coolant/antifreeze leak.
4. Change in engine power
If the vehicle feels different from normal, for example there is a slight delay upon pressing the accelerator, this could indicate a serious engine problem.
5. Emissions
Government policies specify emission standards, such as limits on CO, HC, and NO emissions, for vehicles [1]. If these standards are not met, it could hint at underperformance of the motor vehicle.
3 UAV-Based Ad-Hoc Network
The UAV-based ad-hoc network deploys a fleet of UAVs over a city area that communicate wirelessly [4]. They keep track of vehicles moving around the city and hold the latest government vehicle standards. These rules or guidelines are updated as and when the regulatory bodies change them. The UAVs are also configured to alert government authorities in real time if any vehicle is spotted violating the vehicle standards or is in bad shape. This enables the respective bodies to take immediate action on the identified motor vehicle and thereby ensure that the vehicular safety standards are followed.
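The compare-and-alert behaviour described above might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `Reading` structure, the number plates, the units, and especially the limit values, which are placeholders and not real regulatory figures.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One hypothetical emission observation of a vehicle by a UAV."""
    plate: str
    co: float
    hc: float
    no: float


# Placeholder limits for illustration only; NOT real regulatory values.
PLACEHOLDER_LIMITS = {"co": 1.0, "hc": 0.1, "no": 0.06}


def violations(reading, limits):
    """Return the sorted names of pollutants whose reading exceeds its limit."""
    return sorted(gas for gas, limit in limits.items()
                  if getattr(reading, gas) > limit)


def build_alerts(readings, limits):
    """Collect (plate, exceeded pollutants) for every non-compliant vehicle."""
    alerts = []
    for reading in readings:
        exceeded = violations(reading, limits)
        if exceeded:
            alerts.append((reading.plate, exceeded))
    return alerts


readings = [
    Reading("MH-12-AB-1234", co=0.8, hc=0.05, no=0.04),  # made-up plate
    Reading("MH-14-CD-5678", co=1.6, hc=0.05, no=0.09),  # made-up plate
]
alerts = build_alerts(readings, PLACEHOLDER_LIMITS)
print(alerts)
```

Keeping the limits in a plain dictionary reflects the idea that the standards are data that can be replaced whenever the regulatory bodies revise them, without touching the checking logic.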
4 Techniques to Assess Vehicular Condition The UAVs are fitted with multispectral sensors, monochromatic light sources, and optical detectors that help to detect the vehicle condition [5]. The two techniques
Fig. 1 Visualization showing the plane of polarization making an angle θ with the y-axis (i.e., transmission axis) of the polarization filter
used to gauge the maintenance status of vehicles include multispectral band sensing and optoacoustics.
4.1 Multispectral Band Sensing
In multispectral band sensing, the specialized sensors installed on UAVs capture information that cannot be seen by the human eye or an RGB camera. They capture scenes with a higher spectral resolution. Here, each pixel records a vector of intensity values rather than the R, G, B triplet of a traditional camera. Each value corresponds to the incoming light within a small wavelength range. This is achieved by capturing separate bands with respective color filters. After capture, the incoming light is passed through small polarizers within the UAVs. Polarizers here refer to optical filters that allow light of a specific polarization to pass through while blocking light of other polarizations. Thus, polarization further enhances the captured images by segregating separate light bands.
Mechanism of polarizing filter
A beam of monochromatic plane-polarized light is incident on a polarizing filter after being captured by multispectral sensors, as seen in Fig. 1. The intensity of light transmitted through the polarizing filter is directly related to the intensity of the incident light [6]. This is given by the empirical law,
$$I_{\text{transmitted}} = I_{\text{incident}} \cos^2\theta \tag{1}$$
where $I$ = light intensity and $\theta$ = the angle made by the plane of polarization of the incident light beam with the polarizer transmission axis. Light carries electric and magnetic fields as it propagates. Here we focus on the electric field vector (i.e., $\vec{E}$), as it bears a relationship with light intensity as seen in the equation below,
$$I \propto |\vec{E}|^2 \tag{2}$$
Here, Eq. (1) describes the polarizer action, which allows only a selected light band to be transmitted, i.e., it transmits only the component of $\vec{E}$ that is parallel to the transmission axis, as represented in Eq. (3).
$$\vec{E}_{\text{incident}} \equiv \vec{E} = \hat{n}_x E \sin\theta + \hat{n}_y E \cos\theta \tag{3}$$
Figure 1 depicts a plane-polarized beam of monochromatic light incident on a polarizing filter. Figure 2 depicts a graphical view of polarized light described by the electric field vector in Eq. (3). Note: The arrows in the polarization figure (Fig. 2) indicate the electric field vector of light. The magnetic field vector is perpendicular to $\vec{E}$. The mechanism of the polarizing filter can be understood from Eq. (3). Fundamentally, the polarizing filters are made up of molecules that have large dipole
Fig. 2 A graphical view of polarized light where $\hat{n}_x$ and $\hat{n}_y$ denote unit vectors along the x- and y-axes [6]
moments, all of which are aligned perpendicular to the transmission axis. Such molecules do not absorb light polarized perpendicular to their dipole moments [6]. Also, the electric field vector of the light transmitted (i.e., specific band) through the polarizer is given by Eq. (4):
$$\vec{E}_{\text{transmitted}} = \hat{n}_y E \cos\theta \tag{4}$$
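To make the link between Eqs. (1)–(4) concrete, the short Python sketch below resolves the incident field into its x- and y-components, keeps only the component along the transmission axis, and checks that squaring the transmitted amplitude reproduces the cos²θ intensity law. The function names are illustrative and the proportionality constant in Eq. (2) is taken as 1.

```python
import math


def field_components(e_amplitude, theta):
    """Eq. (3): (x, y) components of the incident field; theta in radians."""
    return e_amplitude * math.sin(theta), e_amplitude * math.cos(theta)


def transmitted_field(e_amplitude, theta):
    """Eq. (4): only the component along the transmission (y) axis passes."""
    _, e_y = field_components(e_amplitude, theta)
    return e_y


def transmitted_intensity(i_incident, theta):
    """Eq. (1): transmitted intensity for polarization angle theta."""
    return i_incident * math.cos(theta) ** 2


theta = math.radians(60.0)
e0 = 2.0
i0 = e0 ** 2                                  # Eq. (2) with unit constant
i_from_field = transmitted_field(e0, theta) ** 2
i_from_law = transmitted_intensity(i0, theta)
print(i_from_field, i_from_law)               # both are one quarter of i0
```

At θ = 60° the filter passes a quarter of the incident intensity, at θ = 0° all of it, and at θ = 90° none, which is the selective behaviour the text relies on.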
Thus, polarization of light received via multispectral sensors plays a key role in detecting an unusual light pattern that originates from vehicles needing attention. The multispectral sensing used here also considers reflectance phenomena to better interpret the received light. The method facilitates detection of the different types of vehicular smoke, i.e., blue, black, and white smoke. It also keeps emission standards in check throughout. Further, the sensors keep track of any oil or fluid leak occurring on roads as the car traverses the city. In addition, the UAVs record the acceleration of a vehicle over a period of time, i.e., days to months. If continued monitoring identifies a significant change in the acceleration of a particular vehicle during that period, the vehicle is red-flagged as one having issues. In this way, multispectral sensing and analysis exploit the properties of light to ascertain the condition of vehicles operating in the city.
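The acceleration-based red-flagging described above could be sketched as below. The comparison rule (mean of an early baseline window versus mean of the most recent window), the window length, and the 20% threshold are all assumptions chosen for illustration; the paper does not state how a "significant change" is defined.

```python
from statistics import mean


def acceleration_changed(daily_accel, window, threshold):
    """Flag a vehicle whose recent mean acceleration departs from its baseline.

    daily_accel: chronological per-day mean acceleration values (m/s^2)
    window:      number of days in the baseline and in the recent window
    threshold:   relative change (e.g. 0.2 for 20%) treated as significant
    """
    if len(daily_accel) < 2 * window:
        return False  # not enough history to compare two windows
    baseline = mean(daily_accel[:window])
    recent = mean(daily_accel[-window:])
    if baseline == 0:
        return recent != 0
    return abs(recent - baseline) / abs(baseline) > threshold


# Hypothetical observation series for two vehicles.
degrading = [2.5, 2.4, 2.6, 2.5, 1.9, 1.8, 1.7, 1.8]
stable = [2.5, 2.4, 2.6, 2.5, 2.5, 2.4, 2.6, 2.5]
print(acceleration_changed(degrading, window=4, threshold=0.2))
print(acceleration_changed(stable, window=4, threshold=0.2))
```

A relative rather than absolute threshold is used so that the same rule can apply to vehicle classes with very different typical acceleration.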
4.2 Optoacoustics
Optoacoustics is used specifically to optically sense the different sounds generated by a faulty vehicle. That is, the suspicious noises associated with ‘not so well-maintained’ vehicles are localized and isolated based on the optoacoustic principle [7]. Here, we employ a refractometry method that uses the photoelastic principle. The acoustic waves arising from the faulty vehicle interact with the surrounding air. The waves induce mechanical stress in the air, which changes its refractive index (RI) in proportion to the mechanical pressure. In this module, a monochromatic light source located on a UAV directs a light beam onto the vehicle to be monitored. The emitted light beam is disturbed by the changes in the RI of the air caused by the unwanted sound waves propagating outward from the vehicle. The light source is accompanied by an optical detector that records parameters such as changes in the intensity, deflection angle, or phase of the light beam as it passes through the air carrying the acoustic waves. The recorded information provides essential insights into the sound signals interrogated in the vicinity of the vehicle. In this way, optoacoustics helps to optically sense the sound waves generated by faulty vehicle parts.
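A rough feel for the size of the effect can be obtained with the following Python sketch. It is not the paper's model: it assumes the refractivity of air (n − 1) scales with density, that sound compresses air adiabatically, and that the beam crosses a uniform sound field of a given path length. The constants and the function names are illustrative assumptions.

```python
import math

# Assumed constants (approximate values for air and visible light).
N0_MINUS_1 = 2.7e-4      # refractivity of air, n0 - 1
GAMMA = 1.4              # adiabatic index of air
P0 = 101_325.0           # ambient pressure in Pa


def refractive_index_change(sound_pressure_pa):
    """RI change proportional to acoustic pressure (adiabatic assumption)."""
    return N0_MINUS_1 * sound_pressure_pa / (GAMMA * P0)


def phase_shift(sound_pressure_pa, path_length_m, wavelength_m):
    """Optical phase shift (radians) accumulated along the disturbed path."""
    delta_n = refractive_index_change(sound_pressure_pa)
    return 2 * math.pi * path_length_m * delta_n / wavelength_m


# Example: 1 Pa sound pressure, 1 m path, 633 nm laser (all assumed values).
dn = refractive_index_change(1.0)
phi = phase_shift(1.0, path_length_m=1.0, wavelength_m=633e-9)
print(f"delta n = {dn:.3e}, phase shift = {phi:.4f} rad")
```

The resulting index change is of the order of 1e-9 per pascal, which suggests why the detector has to resolve very small changes in phase, deflection, or intensity.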
Fig. 3 UAV-based ad-hoc network for real-time assessment and enforcement of vehicle standards in smart cities
Figure 3 discloses the proposed UAV-based ad-hoc network used for real-time assessment and enforcement of vehicle standards in smart cities.
5 Preliminary Results To validate the research proposal, we deployed a single drone equipped with a sensor having an array of filters and a polarizing filter to capture and process vehicular images. In the first step, the drone sensor captures the light originating from the surroundings where vehicles are placed as seen in Fig. 4a. The received light then passes through a series of filters as seen in Fig. 4b to produce a multispectral image. After passing through the filters, the multispectral light then passes through an optical filter that serves as a polarizer. This is observed in Fig. 4c. The captured image presents various light bands that seem to generate insights into the vehicular condition. Although the image isn’t conclusive enough in its current form, it reveals a greater deal of information than the traditional cameras.
6 Use Cases
The potential use cases of the proposed UAV-based network can be explored across various disciplines. Some of these are elaborated below:
1. Compliance authorities
The research proposal can be handy for government authorities that are responsible for enforcing vehicle standards within a geography. The agencies can upload vehicle
Fig. 4 Drone images
guidelines onto the UAVs in a network and thereby locate, in real time, the vehicles that do not comply with the vehicular standards. The government bodies can further impose penalties and/or surcharges on such non-compliant vehicles depending on the distance traversed by them. Also, as governments continually tighten vehicle norms to meet emission standards, the UAVs can keep a check on fuel economy and emission standards with minimal human intervention. As a consequence, the method can help to control pollution from motor vehicles.
2. Insurance agencies
The network can be useful for insurance agencies, which can offer various insurance packages to motor vehicles depending on the criticality of the vehicle condition. The insurers can also provide immediate compensation assistance to their customers on the fly as soon as they identify that a vehicle with a specific number plate has met with an accident (Fig. 5).
3. Avoid accidents
The UAV network can also help in avoiding future accidents that could occur due to improperly maintained vehicles. It can thus improve overall road safety.
7 Conclusion The paper proposes a UAV-based network that assesses and enforces vehicle standards in smart cities. The technology provides real-time insights and better visualization of the urban vehicular reality. The network uses multispectral band sensing
Fig. 5 UAV-based network for insurers
and optoacoustics as a foundation for monitoring motor vehicles in the city area. The technique helps ensure that vehicles comply with the government guidelines and pushes the road safety agenda forward. As a consequence, the research can help reduce accidents in urban localities and also offers benefits to insurance companies.
8 Future Work
In the future, we intend to further optimize the proposed research by using higher-resolution multispectral sensors and polarizers. Additionally, we intend to add more drones to the UAV-based network to cover a larger city area.
Acknowledgements I would like to extend my sincere gratitude to Dr. A. S. Kanade for his relentless support during my research work.
Conflict of Interest The authors declare that they have no conflict of interest.
References
1. Yang Z et al, Global baseline assessment of compliance and enforcement programs for vehicle emissions and energy efficiency. In: The international council on clean transportation (ICCT)
2. Applications of satellite imagery & remote sensing data, Jan 16, 2017
3. REDEX, For a better drive, 10 ways to recognise the signs of car engine damage, July 19, 2017
4. Ahmad A et al (2020) Noise and restoration of UAV remote sensing images. (IJACSA) Int J Adv Comput Sci Appl 11(12)
5. Hawk, Capturing multispectral data using drones
6. Heyde K, Wood JL (2020) Quantum mechanics for nuclear structure, vol 1, A theory of polarized photons. IOP Publishing Ltd
7. Wissmeyer et al (2018) Light: Sci Appl 7:53. Official Journal of the CIOMP 2047-7538. https://doi.org/10.1038/s41377-018-0036-7
ARTFDS–Advanced Railway Track Fault Detection System Using Machine Learning Lakshmisudha Kondaka, Adwait Rangnekar, Akshay Shetty, and Yash Zawar
Abstract Indian Railways are an integral part of India’s economic ecosystem. The annual ridership of Indian Railways is approximately 9.16 billion with the total length of railway tracks being 1150 billion km, and the total freight/goods transported annually stand at 1.1 billion tonnes. Hence, we need a reliable, accurate and agile method of finding complications in railway tracks as both lives and goods are at stake. An advanced railway track fault detection system (ARTFDS) is intended for monitoring faults in railway tracks. It is an IoT-based application of ultrasonic sensors and utilizes an ML model to identify nature of defects in the track and hence ascertain the severity of the defects. The bot has added an Android application interface and SIM-based GSM message communication to provide exact GPS location enabled via Google Maps on a mobile device. The data stored on the application is backed up on Firebase real-time database which is a cloud-based solution that keeps data secure from crashes and other intrusion-related threats. Keywords Railway tracks · Sensor technology · Image classification · Data classification · GPS · Machine learning
L. Kondaka (B) · A. Rangnekar · A. Shetty · Y. Zawar Department of Information Technology, SIES Graduate School of Technology, Nerul, Navi Mumbai, India e-mail: [email protected] A. Rangnekar e-mail: [email protected] A. Shetty e-mail: [email protected] Y. Zawar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_41
1 Introduction
The railways currently employ an error-prone and tedious method of finding faults in tracks. It involves a widespread, chaotic manhunt that usually takes place after a fault has been noticed by a motorman. This process disturbs the traffic schedule and results in extensive losses. The process can be simplified by employing a safer, synchronized, and dependable system. The inefficiency of the conventional method of crack finding can be overcome using a dedicated bot-network system, which provides reliable and technologically backed results in order to improve the accuracy of fault detection. By employing a partially automated process, a lot of time can be saved compared with the chaotic manhunt-based method seen currently. This system is scalable to other equally important applications in the transport and exploration industries.
Indian Railways is in dire need of an advanced fault detection system that can solve problems quickly and efficiently. This can be achieved by engaging a synchronized, distributed system of easy-to-use bots specialized in detecting faults in the tracks. Using a range of sensors, we can bring efficiency to this process by significantly increasing detection accuracy, backed by digital records and location positioning. These bots are compact, consume little power, and are easy to use.
The proposed system involves developing an automated movable bot that moves along a railway track in the forward and backward directions. The bot uses a range of ultrasonic sensors to detect faults, defects, or cracks in the railway tracks. Upon finding a crack, it communicates using a SIM-based GSM module, providing latitude and longitude coordinates obtained from a GPS module. Next, the CPU runs an ML model, an image-processing classifier that classifies the defect on the basis of magnitude and severity. The bot is also assisted by a web-app interface.
A defect detection system helps save lives by predicting vulnerable areas so that mitigation and evacuation measures can be deployed properly. It enables us to accurately analyse the magnitude of damage and the severity of a defect using an image-classifying ML model and GPS technology, which helps keep a record of any damage that has happened or needs attention. Moreover, economic losses, i.e. freight, goods, and construction losses, can be avoided. The paper is organized as follows: Section 2 presents the literature review. The proposed system is presented in Sect. 3. Experimental results are presented in Sect. 4. Future work is presented in Sect. 5, and finally the conclusion is given in Sect. 6.
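The detect-then-report flow outlined in this introduction can be sketched in Python as follows. This is a simplified illustration under assumptions, not the authors' firmware: a crack is assumed to show up as an ultrasonic range reading noticeably longer than the normal sensor-to-rail distance, the thresholds and coordinates are arbitrary, and actual sensor, GPS, and GSM access is left out. The Google Maps link uses the public Maps URL search format.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C


def echo_distance_cm(echo_time_s):
    """Convert a round-trip ultrasonic echo time into a one-way distance."""
    return SPEED_OF_SOUND_M_S * echo_time_s / 2 * 100


def is_crack(distance_cm, baseline_cm, tolerance_cm):
    """Assumed rule: a gap reads as a distance well beyond the rail surface."""
    return distance_cm - baseline_cm > tolerance_cm


def alert_message(lat, lon, distance_cm):
    """Build the text a GSM module could send, with a Google Maps link."""
    link = ("https://www.google.com/maps/search/?api=1"
            f"&query={lat:.6f},{lon:.6f}")
    return (f"Track fault suspected (sensor reading {distance_cm:.1f} cm). "
            f"Location: {link}")


# Hypothetical reading: 0.5 ms echo, 5 cm normal rail distance, 2 cm tolerance.
distance = echo_distance_cm(0.0005)
if is_crack(distance, baseline_cm=5.0, tolerance_cm=2.0):
    message = alert_message(19.0330, 73.0297, distance)  # arbitrary coordinates
    print(message)
```

Separating the distance conversion, the decision rule, and the message formatting keeps each part testable on a desktop before it is moved onto the bot's controller.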
2 Literature Review In 2014, Molodova et al. proposed a method [1] to detect short-wave faults in railway tracks with the help of visual inspection and eddy current measurements. The process for detecting the squats included data acquisition and pre-processing of the measured
ARTFDS–Advanced Railway Track Fault Detection …
611
data for the detection of squats was explained in this paper. The continuous monitoring, management of the large amount of data and accuracy of automatic detection under various track structures are not addressed in this paper. Some solutions seek the use of LCD display and optical encoder [2] to show the coordinates and working status of other integrated devices and to measure the speed of robot in revolution per minute (RPM). Pranav et al. discussed about a system [3], which uses a passive infrared (PIR) sensor to keep away manual patrolling and finding of living beings across the tracks. This can operate during the night as well as the daytime which can help to operate the system more efficiently. In this paper, the ideas inferred in designing a railway track fault detection system are an autonomous vehicle using microcontroller and an IR obstacle sensors assembly system that detects the faults along its path. Chandan et al. proposed a system [4] by employing an LDR sensor and LED to detect faults with the vehicle stopping there and using a connected GSM and GPS module to send latitude and longitude of the fault file to the concerned authorities. Nisthul et al. [5] proposed a significant development in terms of a spur gears mechanism to be used in bodies that are designed to transmit motion and power between parallel shafts. Another system [6] proposes the use of an IR receiver to give a signal to the GPS receiver which will pinpoint the latitude and the longitude coordinates and sends them to the GSM module which will send a message to authorities. Dinesh et al. designed an enhanced fault detection system for railway track [7], which focuses on continuous wireless monitoring of railway tracks via an assembly of IR and fire sensors. The proposal aims for an application that can be useful during the day as well as at night. Nayankumar et al. proposal aims to use an array of IR sensors along with an ultrasonic distance sensor. 
This system also proposes the use of a motor driver and GSM module for added assistance [8]. An efficient framework to detect cracks in rail tracks using a neural network classifier was proposed by Rajagopal et al. [9], in which cracks in railway tracks are detected using a neural network classification approach. A deep learning technique is proposed in [10], in which an adaptive histogram equalization technique is used to improve the track image, and additional features are then extracted from the improved rail track image. These extracted features are then used to train a neural network classifier, which divides the railroad track images into crack-free and cracked images. Traditional learning-based approaches have difficulty obtaining representative training samples. The methodology proposed in [11] uses a new unsupervised multi-scale fusion crack detection (MFCD) algorithm that does not require training data. The proposed MFCD algorithm improves the crack detection performance compared with the single-scale WMIP-based method. A new autonomous mobile robotic system for bridge deck inspection and evaluation was proposed in [12]. The robot is integrated with several nondestructive evaluation (NDE) sensors and a navigation control algorithm that allows it to accurately and autonomously manoeuvre on the bridge deck to collect visual images and conduct NDE measurements.
L. Kondaka et al.
Another paper discussed a GPS tracking system called Goo-Tracking [13], which is composed of commodity hardware, open-source software and an easy-to-manage user interface via a web server with Google Maps or via Google Earth software. The system includes a GPS/GPRS module for location acquisition and message transmission, an MMC to temporarily store location information and an 8-bit AVR microcontroller. However, current research articles have a number of flaws that were noted in a survey paper [14]. Data quality difficulties, such as extremely imbalanced datasets, limitations of the manual labelling process and the lack of a comprehensive public database for training and comparing different algorithms, are limiting progress on the research community’s side. On the deployment side of ML research, progress is limited by the difficulty of describing how an algorithm detects errors, which is required to gain the industry’s trust, and of incorporating domain knowledge into ML approaches. The authors of [15] propose a classification-based method that uses acceleration data to detect rail defects, such as localized surface collapse and rail end batter, or rail components, such as junctions, turning points and crossings. They also present a deep learning-based strategy for the detection of rail joints or defects using a convolutional neural network (CNN) to increase the performance of the classification-based models and their application in practice. The determination of maintenance and repair schedules and the selection of the most economical maintenance practices have been a great concern in the field of railway engineering [16]. A novel CNN can be used to increase the performance of detectors that use local binary patterns (LBPs) and histograms of oriented gradients (HOGs) [17]. Sharma et al. have examined several methods to classify the conditions of a video frame.
The pre-processed CNN + SVM architecture provides good accuracy with higher efficiency and less loss than other combinations and single classifiers [18]. CapsNet is an emerging architecture that could be applied to a broader range of applications associated with object detection, image segmentation and classification [19]. In various real-time applications, several assisted services are provided by human–robot interaction (HRI) [20]. Indian Railways has not directly utilized any technical methods of fault-finding in the past. The railways eagerly seek a system that provides reliable and technologically backed-up results to improve the accuracy of fault detection. We have proposed an advanced railway track fault detection system (ARTFDS) using ML. The proposed system ensures the following results:
• The inefficiency of the conventional method of crack finding can be overcome using a dedicated bot-network system.
• The system provides reliable and technologically backed-up results to improve the accuracy of fault detection.
• By employing a partially automated process, a lot of the time otherwise spent on a chaotic man-hunt-based method can be saved.
• The system is scalable to other equally important applications in the transport and exploration industries.
3 Proposed System
The main design and modelling process involves a hardware relay chassis on the bot fitted with an ultrasonic sensor array that helps to detect faults and to measure the depth of railway track faults. Detected faults are notified to the authorities using a global positioning system (GPS) module that provides latitude and longitude coordinates viewable on Google Maps; the coordinates are communicated via a phone message sent using SIM-based communication over GSM. An app deploys an advanced machine learning model with image classification features to categorize the faults into different severities and ascertain the magnitude of the damage. The bot is complemented by a simple and smooth Android application interface that helps to keep track of prior damage and provides better guidance on already existing faults.
3.1 Android Application Interface
The Android application, developed in Android Studio, runs on most modern Android mobile devices and uses a cloud-based Firebase database as its back end. It was developed for the bot operator to journal entries and keep a record of faults in the tracks. The application can also be accessed by an authority designated to resolve the fault in the track, which allows authorities to work in tandem with bot operators to find, report and resolve faults in the tracks. The cloud-based database provides large data capacity as well as secure data that is easily accessible from all devices without requiring any pre-existing software. The bot operator can enter records and view prior records along with the exact location of the faults, which is provided by the GPS on a mobile device and accessible through Google Maps. The application also provides an intermediate magnitude of the fault based on the data provided by the operator. Once a fault has been fixed, the operator can inform the authorities of the status using the option available in the application. The bot deploys an advanced machine learning model with image processing features to categorize the defects into different severities and ascertain the magnitude of the damage. The Android application workflow and bot operation are shown in Fig. 1.
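A fault record of the kind described above can be sketched in Python as follows. This is only an illustration: the field names, the `severity` rule and both depth thresholds are assumptions, since the paper does not state how the application derives the magnitude of a fault from the operator's data.

```python
from dataclasses import dataclass, asdict

# Hypothetical severity thresholds in centimetres; the paper does not give them.
MODERATE_DEPTH_CM = 1.0
SEVERE_DEPTH_CM = 3.0


@dataclass
class FaultRecord:
    latitude: float
    longitude: float
    depth_cm: float      # depth entered by the operator from the ultrasonic reading
    fixed: bool = False  # set to True once the fault has been repaired

    def severity(self) -> str:
        """Map the entered depth to a coarse severity label (assumed rule)."""
        if self.depth_cm >= SEVERE_DEPTH_CM:
            return "severe"
        if self.depth_cm >= MODERATE_DEPTH_CM:
            return "moderate"
        return "minor"

    def to_document(self) -> dict:
        """Return a plain dictionary that could be stored in a cloud database."""
        document = asdict(self)
        document["severity"] = self.severity()
        return document
```

A record such as `FaultRecord(12.97, 77.59, 3.0)` would then be stored with its coordinates, depth, repair status and a severity label.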
3.2 Image Classification Platform
The system uses industry-accepted and tested technologies such as machine learning (ML), image classification (IC) and computer vision to develop a reliable, long-standing system. The model requires the camera of the mobile device that runs the application.
Fig. 1 Flowchart of Android application work flow and bot operation
The bot and Android application interface are further aided in finding accurate results by a machine learning model that classifies faults. The model uses three distinct datasets to help classify faults accurately. The model is deployed through an application interface, is accessible directly through a mobile device and runs on any Android platform. The bot operator can use the model for detailed analysis, and this process helps to increase the overall accuracy of the system.
3.3 Pre-processing and Cleaning Data
Initial development of the model involved searching for and developing datasets for preliminary classification. The final datasets were selected through a curated procedure using JPEG-format images with a size under 400 KB. The datasets correspond to three distinct feature-based selection criteria. The first dataset is a set of random images showing a mix of everyday objects and surfaces. This dataset helps to decrease the chance of errors and greatly lowers the probability of the bot mistaking random objects for faults. The second dataset is a set of plane surfaces of various colours, textures and gradients. This dataset helps in the training stage of the model and increases the accuracy of spotting a fault.
The third dataset is the primary dataset used to train and test the model to predict faults precisely. This dataset is the most crucial and contributes the most to the accuracy of the prediction. It also helps in comparing faults of various natures and is curated with the most attention. The datasets are assembled, processed and cleaned in a way that ensures maximum precision of the model. The overall development of the model is dynamic, and constant rectifications help to increase precision.
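The selection criterion stated above (JPEG images under 400 KB) can be sketched with the Python standard library. The function name, the accepted file suffixes and the reading of 1 KB as 1024 bytes are assumptions made for illustration; the paper does not describe its tooling.

```python
from pathlib import Path

MAX_SIZE_BYTES = 400 * 1024          # "under 400 KB", taking 1 KB = 1024 bytes (assumption)
JPEG_SUFFIXES = {".jpg", ".jpeg"}    # assumed file extensions for JPEG images


def select_images(folder):
    """Return sorted paths of JPEG files in `folder` that are below the size limit."""
    selected = []
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        if path.suffix.lower() not in JPEG_SUFFIXES:
            continue
        if path.stat().st_size < MAX_SIZE_BYTES:
            selected.append(path)
    return sorted(selected)
```

Running the same function over one folder per dataset would give the three curated image sets.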
3.4 Testing and Training Model
After processing and cleaning the records in the datasets for which the parameters had missing or erroneous values, the final datasets contain a total of 310 records belonging to three distinct classes. The first dataset contains 70 random images, the second contains 90 images of plane surfaces, and the third contains 150 images of faults. Together, these 310 records support the complete development of the model. The training phase involves feeding the images to an algorithm with three different classes and establishing the differences between the three datasets. The testing phase involves running the trained model through a library and obtaining initial results for classifying the images. This phase also lowers the probability of wrong classification by the model. The final testing and training phases are performed after rigorous trial phases, and a standard model is developed for use.
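As a sketch of how the 310 records could be divided for training and testing, the following Python snippet performs a per-class (stratified) split. The class names, the 20% test fraction and the fixed seed are assumptions; the paper reports only the class sizes.

```python
import random

# Class sizes reported in the text; the class names are illustrative labels.
CLASS_SIZES = {"random": 70, "plane_surface": 90, "fault": 150}


def stratified_split(class_sizes, test_fraction=0.2, seed=0):
    """Split record ids per class so each class keeps the same test share."""
    rng = random.Random(seed)
    train, test = [], []
    for label, count in class_sizes.items():
        ids = [(label, i) for i in range(count)]
        rng.shuffle(ids)
        n_test = round(count * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


train_set, test_set = stratified_split(CLASS_SIZES)
print(len(train_set), len(test_set))
```

Splitting per class keeps the smaller datasets represented in the test set, which matters when the classes are as uneven as 70, 90 and 150 images.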
3.5 Classification and Prediction
Classification is the step of assigning class labels to a record. A record is assigned to a class depending on its similarity to other records of that class. The system is developed by following a two-step model. The first step is the division of the data into a training set, used for predicting relations, and a testing set, used for assessing the strength and accuracy of the relations predicted. In the next step, the training set is used to build the classifier model and the testing set is used to validate the model built. This step uses a “supervised learning” image classification model. The classification is assisted by the Keras library and uses the TensorFlow library for training and inference. Both Keras and TensorFlow are open-source, dynamic and widely used. The prediction model is set to provide the highest accuracy possible. The model classifies a user-fed record into one of two classes: “Fault” or “Not a Fault”. The accuracy achieved for both the models is approximately 90%. The use of supervised learning helps us to maintain steady results in accuracy and minimizes variation.
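The text describes training on three classes while reporting two labels to the user. One way this mapping could work is sketched below in plain Python; the class order, the class names and the function itself are assumptions for illustration and are not taken from the authors' Keras model.

```python
# Class order is an assumption for illustration; it mirrors the three datasets.
CLASS_NAMES = ["random_object", "plane_surface", "fault"]


def predict_label(probabilities):
    """Map three class probabilities to the two labels shown to the user."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    # Pick the class with the highest probability.
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    label = "Fault" if CLASS_NAMES[best_index] == "fault" else "Not a Fault"
    return label, probabilities[best_index]
```

Under this assumption, anything the model assigns to the random-object or plane-surface class is reported as “Not a Fault”.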
3.6 Hardware Synthesis
The main design and modelling process involves designing a hardware relay bot fitted with an ultrasonic sensor array that helps to detect faults and measure their depth. The bot is developed as an automated mobile assembly that moves along a railway track in the forward and backward directions. The bot uses a range of ultrasonic sensors to detect faults in the railway tracks. We make use of an inbuilt Arduino library “sonar ping”, which is a multipurpose single beam echosounder. Upon finding a fault, the bot communicates using SIM-based messaging over a GSM module, providing latitude and longitude coordinates obtained from a GPS module. These coordinates are accessible via Google Maps on any mobile device. The bot is powered by two 9-V batteries that run a twin-motor rear-wheel setup; hence, the bot conserves energy and is capable of forward–backward movement. The motor-battery relay is connected via a switch. The switch, motors and individual modules (GPS and GSM) are all centrally connected to the Arduino UNO R3 microcontroller. The chassis is a combination of rust-free metal and a solderless pluggable breadboard, which allows the bot to be scaled for further extensive uses.
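The sensing and reporting chain described above can be sketched in Python, although the bot itself is programmed on an Arduino. The snippet assumes a standard time-of-flight conversion for the ultrasonic sensor, a known sensor-to-rail baseline distance and the Google Maps search URL format; the function names and the message wording are hypothetical.

```python
from urllib.parse import urlencode

SPEED_OF_SOUND_CM_PER_US = 0.0343  # roughly 343 m/s in air at about 20 °C


def echo_to_distance_cm(echo_time_us):
    """Convert a round-trip ultrasonic echo time (microseconds) to distance (cm)."""
    return echo_time_us * SPEED_OF_SOUND_CM_PER_US / 2


def fault_depth_cm(distance_cm, baseline_cm):
    """Depth of a fault: extra distance beyond the sensor-to-rail baseline."""
    return max(0.0, distance_cm - baseline_cm)


def maps_link(latitude, longitude):
    """Build a Google Maps search URL for a coordinate pair."""
    query = urlencode({"api": 1, "query": f"{latitude},{longitude}"})
    return f"https://www.google.com/maps/search/?{query}"


def fault_message(latitude, longitude, depth_cm):
    """Compose the text of the alert message (wording is illustrative)."""
    link = maps_link(latitude, longitude)
    return f"Track fault detected, depth {depth_cm:.1f} cm: {link}"
```

The echo time is halved because the pulse travels to the rail surface and back, so a longer echo over a cut-out shows up as a positive depth relative to the baseline.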
3.7 Integration of Hardware and Software
The hardware relay is interconnected and is programmed in the Arduino editor, which uses C++ as the base programming language and provides specialized methods and functions. The hardware and software are to be used in synchronization, and the bot operator needs training to use the bot to its maximum potential. The bots are to be used in a synchronized manner, and they are to be distributed so that they cover the maximum length of track while keeping in mind the time, effort and energy required. The application interface adds value to the entire process, as it provides the ability to record entries and increase the accuracy of the bot. The bot and the application provide results with maximum efficiency when operated in conjunction.
4 Experimental Results
Upon exhaustive research and development, the proposed system demonstrates a satisfactory result. The system delivers results that can be scaled to widespread applications in railways as well as other domains. The technology stack employed is up to date and can be dynamically updated with advances in technology. Moreover, the development of a bot fulfils the need for an easy-to-use, compact and affordable solution to a long-standing problem. The developed system addresses the need for a permanent fault-finding bot that also minimizes the chances of human error and can be used without any obstacles to its operation. The bot also enables the railways to save time, energy and effort, which further increases the efficiency of operation, traffic handling and the transportation framework that it runs. The proposed system provides consistent results and has been back-tested over a prototype track to improve the results. The results obtained from trial runs show consistency in finding and reporting faults. The application is secure, and diagnostics point towards very rare instances of data loss or crashes. Moreover, the recorded data is always backed up on the Firebase cloud database. Table 1 presents the empirical evidence observed from trials of the bot conducted over a prototype railway track. The prototype railway track was prepared from wood with a 3 cm cut-out resembling a fault. Table 2 presents the accuracy of the image classification model in terms of the percentage probability of correctly predicting whether a surface has a fault or not. The tables present experimental evidence observed from trial runs of both the bot and the application. The images of the bot with live connections and the prototype railway track are shown in Fig. 2. These images clearly show the chassis, hardware relay, sensors, battery-switch setup and rear-wheel twin-motor setup. The images of the Android application interface and the image classification application interface are shown in Figs. 3, 4, 5, 6, 7, 8, 9 and 10. The images show the user interface of the app and also demonstrate the usage and accuracy of the image classification application.

Table 1 Bot trials with prototype
Use cases | Parameters of trials
          | Number of trials | Recorded values | Accuracy (%)
Fault-finding | 100 | NA | 95
Recorded depth | 100 | 3 cm | 90

Table 2 Image classification model trials
Use cases | Parameters of trials
          | Number of trials (a) | Accuracy (%)
Fault | 100 | 95
Not a Fault | 100 | 95
(a) The trials are conducted on different surfaces with a mobile device camera
Fig. 2 Microcontroller connected with sensor and other components
Fig. 3 Primary dataset consisting of images of faults
Fig. 4 Image showing message sent to mobile device
Fig. 5 Image showing dashboard of the Android app
5 Future Work
A fault detection system helps to save lives by identifying vulnerable railway tracks, so that mitigation as well as evacuation measures can be deployed properly. The system also assists in analysing the magnitude and severity of the damage. Software advancement aids in keeping records of any damage that has happened or needs attention. Moreover, economic losses, i.e. losses of freight, goods and constructions, can be avoided. Predicting the expiry of tracks, wear and tear, etc. are innovative extended applications. Faults usually develop due to wear and tear; with an accurate understanding of the data, we can predict a pattern of wear and tear and hence guide proper maintenance of railway tracks.
Fig. 6 Image demonstrating record entry in Android app with photograph and location. The severity is also provided based on user-entered data from ultrasonic sensor
Fig. 7 Image showing app notification on mobile device
Fig. 8 Image showing classification on ML model
Fig. 9 Image showing details of fault image
Fig. 10 Image showing fault location on Google Maps
The future extensions of the bot can be equipped for various scaled-out applications in different domains such as construction, core engineering, architecture and transport.
6 Conclusion
Thus, we have fulfilled the primary objective of the proposed system, which is to build an error-less system of fault detection. Railways are one of the most important modes of transport in India. The safety of these services is of utmost priority, as train accidents or derailments can lead to direct and indirect disruption in various other industries. A system of guided and manoeuvred bots that eases the detection of railway track damage is an apt solution to the problem. When manufactured at sufficient volume and scale, the cost of the bot can be controlled in proportion to its functioning life. The solution also showcases industrial uses of remote-control systems and navigation systems, and hence, such a synchronized system is the need of the hour. Moreover, a dedicated model can help extend the applications of the bots to different industries.
References
1. Molodova M, Li Z, Núñez A, Dollevoet R (2014) Automatic detection of squats in railway infrastructure. IEEE Trans Intell Transp Syst 15(5):1980–1990
2. Mahfuz N, Dhali OA, Ahmed S, Nigar M (2017) Autonomous railway fault detector robot for Bangladesh: SCANOBOT. In: 2017 IEEE region 10 humanitarian technology conference (R10-HTC), Dhaka, pp 524–527
3. Lad P, Pawar M (2016) Evolution of railway track fault detection system. In: 2016 2nd IEEE international symposium on robotics and manufacturing automation (ROMA), Ipoh, pp 1–6
4. Jha CK, Singh SK, Sainath TT, Sumanth S (2017) Railway track fault detection vehicle. IJESC 7(4)
5. Nisthul G, George L, Varghese N, Jose S, John N, Nandhumon KR (2017) Automatic railway track fault detection system. New Arch Int J Contemp Arch 3(4). ISSN: 2454-1362
6. George R, Jose D, Gokul TG, Sunil K, Varun AG (2015) Automatic broken track detection using IR transmitter and receiver. Int J Adv Res Electr Electron Instr Eng 4(4). ISSN: 2320-3765
7. Muralidharan V, Dinesh V, Manikandan P (2015) An enhanced fault detection system for railway track. IJETT 21
8. Khatawkar N, Bhat D, Kadli N, Veergoudar D, Doddmani S (2015) An inspection system for detection of cracks on the railway track using a mobile robot. Int J Eng Res Technol (IJERT) 4(5). https://doi.org/10.17577/IJERTV4IS050513
9. Rajagopal M, Balasubramanian M, Palanivel S (2018) An efficient framework to detect cracks in rail tracks using neural network classifier. Computación y Sistemas 22(3):943–952
10. Thendral R, Ranjeeth A (2021) Computer vision system for railway track crack detection using deep learning neural network. In: 2021 3rd International conference on signal processing and communication (ICPSC)
11. Yong S, Limeng C, Zhiquan Q, Fan M, Zhensong C (2016) Automatic road crack detection using random structured forests. IEEE Trans Intell Transp Syst 17(12):3434–3445
12. La HM, Gucunski N, Dana K, Kee S-H (2017) Development of an autonomous bridge deck inspection robotic system. J Field Rob 34(8):1489–1504
13. Chadil N, Russameesawang A, Keeratiwintakorn P (2008) Real-time tracking management system using GPS, GPRS and Google earth. In: 2008 5th International conference on electrical engineering/electronics, computer, telecommunications and information technology, pp 393–396
14. Chenariyan Nakhaee M, Hiemstra D, Stoelinga M, van Noort M (2019) The recent applications of machine learning in rail track maintenance: a survey. In: Collart-Dutilleul S, Lecomte T, Romanovsky A (eds) Reliability, safety, and security of railway systems. Modelling, analysis, verification, and certification. RSSRail 2019. Lecture Notes in Computer Science, vol 11495. Springer
15. Yang C, Sun Y, Ladubec C, Liu Y (2021) Developing machine learning based models for railway inspection. Appl Sci 11:13
16. Sadeghi J, Askarinejad H (2012) Application of neural networks in evaluation of railway track quality condition. J Mech Sci Technol 26:113–122
17. Karuppusamy P (2021) Building detection using two-layered novel convolutional neural networks. J Soft Comput Paradigm (JSCP) 3(01):29–37
18. Sharma R, Sungheetha A (2021) An efficient dimension reduction based fusion of CNN and SVM model for detection of abnormal incident in video surveillance. J Soft Comput Paradigm (JSCP) 3(02):55–69
19. Vijayakumar T (2019) Classification of brain cancer type using machine learning. J Artif Intell Capsule Netw 1(2):105–113
20. Sungheetha A, Sharma R (2021) 3D image processing using machine learning based input processing for man-machine interaction. J Innov Image Process (JIIP) 3(01):1–6
Recommendation System for Agriculture Using Machine Learning and Deep Learning
K. SuriyaKrishnaan, L. Charan Kumar, and R. Vignesh
Abstract In India, the largest source of livelihood is agriculture and its allied sectors. In rural regions, about 82% of farmers are small and marginal, and 70% of rural households depend primarily on agriculture. Agriculture also plays a significant role in the Indian economy, contributing 17% of India’s GDP. Picking the right crop for the land, cultivating it and obtaining a prosperous yield with the right fertilizer is a great challenge. The proposed system recommends suitable crops for lands with varied soil nutrients. It also recommends fertilizers that are suitable for the specific soil nutrients and the crop sown. Plant physiology can be damaged by fungal, viral or bacterial diseases, and plants affected by these pathogens are detected. The random forest classifier gives an accuracy of 98% for the recommendation system, and the PyTorch neural network gives an accuracy of 99.2% for disease prediction.
Keywords Crop recommendation · Fertilizer recommendation · Disease prediction · Random forest classifier · PyTorch neural network
K. SuriyaKrishnaan · L. C. Kumar (B) · R. Vignesh
Sona College of Technology, Salem, Tamil Nadu, India
e-mail: [email protected]
R. Vignesh e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_42

1 Introduction
Rigorous cultivation and improper nutrient replenishment have affected soil fertility. Different crops need different types of soil and climatic conditions along with essential irrigation facilities. To obtain an optimized agricultural outcome, situational considerations such as the selection of appropriate crops for the land, taking land availability into account, and the proper use of fertilizers are significant. Misuse of fertilizers has harmful consequences for humans. Food habits are affected by the intense usage of inorganic fertilizers. Haemoglobin disorders, Alzheimer’s disease, diabetes mellitus and other stomach-related diseases, along with many nutrient deficiencies, are caused. Some chronic nerve-related diseases are also caused by synthetic fertilizers. To assist farmers in improving crop yield, an application is proposed that recommends crops and fertilizers and predicts associated diseases. At present, there is no system that recommends crops and fertilizers and predicts diseases as a whole. The paper proposes a system that recommends a suitable crop and fertilizer for a specific region based on the available climatic factors. The dataset for the recommendation system consists of the NPK content of the soil along with climate and rainfall data. The system also includes a disease prediction system to identify the diseases affecting the crops, thereby providing a suitable fertilizer to cure the crop disease. The dataset for disease prediction contains approximately 87 K RGB photographs of healthy and diseased crop leaves divided into 38 classes. The recommendation system for crops and fertilizers uses the random forest classifier algorithm. The disease prediction system uses a PyTorch neural network. The system recommends what farmers need based on climatic conditions and soil nutrient contents.
2 Literature Review
In this literature review, several studies on previously designed recommendation systems and crop prediction systems are examined. Different agricultural parameters are considered in order to obtain an optimized agricultural outcome. A study on crop recommendation led to case studies on vegetation mapping; its aim is to maintain sustainability in agriculture by allocating optimal resources and reducing inorganic fertilizers [1]. In paper [2], three steps are described: soil classification, crop yield prediction and fertilizer recommendation. SVM is used for crop yield prediction and fertilizer recommendation, and random forest is used for soil classification. Crops grown in the specific region are selected. This existing system has an accuracy of 97% and includes only crop and fertilizer recommendations, whereas the proposed system includes an additional feature, disease prediction, with an accuracy of 99%. Paper [3] is referenced for disease prediction; it used AlexNet and GoogLeNet and obtained an accuracy of 99%. Its aim is to classify both the crop species and the diseases present, even on images that the model has not seen during training, using deep convolutional neural networks. That work focuses on two popular architectures, AlexNet and GoogLeNet, which were designed in the context of the ‘large-scale visual recognition challenge’ for the ImageNet dataset, whereas the proposed system uses a large dataset with 38 classes for disease prediction. Paper [4] proposes ensemble modelling using random tree, K-nearest neighbour, CHAID and Naive Bayes; the main goal is to reduce wrong choices of crop for the field. In paper [5], a specific crop and fertilizer are recommended for a specific soil using pH, temperature, moisture and light data from sensors, which is further analysed in the ThingSpeak cloud from a MySQL database. The data in the database is searched using the Fibonacci algorithm. With these live data, the specific crop and fertilizer are recommended to farmers. The implementation of the present neural network for disease prediction follows a blog [6] that shows the step-by-step process for training. To increase the productivity of maize crops, [7] is referred to for analysing the nutrients required for maize and pea crops, and [8] is studied for recommending fertilizer for maize and pea crops. Paper [9] focusses only on soil nutrients and crop production in medium highland and highland regions, where clustering analysis and a generalized linear model are used. Associated work on crop recommendation systems includes [10], which uses the random forest algorithm to predict suitable crops and recommends suitable fertilizers based on the type of soil in the lands of Maharashtra state. In paper [11], paddy crop production is analysed using KNN and SVM, and the analysis shows that KNN provides better accuracy than SVM. In [12], supervised machine learning algorithms and a geographic information system (GIS) are used for crop selection, where the crop for a specific land is predicted from historical data using artificial intelligence. Shanthi et al. [13] is referred to for solving the classification problem.
3 Proposed Methodology
The goal of the proposed system is to provide better insights for farmers in choosing the right crop and fertilizer for their lands to obtain a better outcome. The proposed system integrates three modules, namely
(1) Crop recommendation: Crop recommendation is done using the random forest classifier, which is a machine learning algorithm. A dataset comprising various soil and weather parameters is collected, and the most suitable crop for farming is recommended.
(2) Fertilizer recommendation: Fertilizers are recommended based on the nutrient content of the soil and the crop that is sown. The random forest classifier is used for fertilizer recommendation, where a dataset with weather and soil parameters is collected and used for training.
(3) Disease prediction: Disease prediction is done using convolutional neural networks trained on images of diseased crops. About 38 different classes of crop images are trained using PyTorch neural networks (Fig. 1).
3.1 Crop and Fertilizer Recommendation System
Crop recommendation is done using the random forest algorithm, which is a supervised learning algorithm. The bagging concept covers the idea of combining learning models, which in turn increases the overall performance. Hence, the random forest works by building a collection of decision trees and combining them in order to make stable and accurate predictions. Crops are recommended for a specific soil based on nitrogen, potassium, phosphorus, humidity, rainfall and pH. The environmental parameters for a specific location are obtained from the openweathermap API, which is publicly accessible. The random forest classifier with 20 estimators recommends the crop with 98% accuracy. This will help farmers to cultivate the crop. Due to factors like soil erosion and unpredictable climatic conditions, crop cultivation is becoming very complex, so farmers are losing money. Hence, the proposed system helps them to obtain the maximum outcome.
Fertilizer recommendation is also done using the random forest classifier algorithm. Soil fertility is analysed in a laboratory, and the laboratory results provide insights into the nutrient content, moisture content, dry density of the soil, etc. Since the specific fertilizer to use cannot be known from the laboratory results alone, these results are given as input to the application, and suitable organic fertilizer suggestions are given to the users. Farmers are also notified about deficient soil nutrients. This will help farmers to cultivate the crop with the best fertilizer. The recommendation is based on NPK values. If the value of N is high, green manure, coffee grinds, peas, beans and soybeans are the best options to farm. If the N value is low, tomatoes, corn, broccoli, cabbage and spinach are examples of plants that thrive on nitrogen and will deplete the nitrogen in the soil. If the value of potassium is high, stop applying potassium-rich commercial fertilizer and apply only commercial fertilizer that has a ‘0’ in the final number field. Commercial fertilizers use a three-number system for the levels of nitrogen, phosphorus and potassium, and the last number stands for potassium. Another option is to stop using commercial fertilizers altogether and to begin using only organic matter to enrich the soil. The research on plants that use more or less nitrogen is referred to in many resources.

Fig. 1 Block diagram for proposed system
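The three-number fertilizer label described above can be checked programmatically. The following Python sketch parses a label and keeps only the potassium-free products; the function names and the example labels are illustrative assumptions and do not come from the paper.

```python
def parse_npk_label(label):
    """Split a fertilizer label such as '10-5-0' into (N, P, K) numbers."""
    parts = label.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three numbers separated by '-', got {label!r}")
    n, p, k = (float(part) for part in parts)
    return n, p, k


def is_potassium_free(label):
    """True when the last (potassium) number of the label is zero."""
    return parse_npk_label(label)[2] == 0


# Illustrative labels only; they are not taken from the paper.
labels = ["10-10-10", "21-0-0", "16-20-0"]
suitable = [label for label in labels if is_potassium_free(label)]
```

For a soil with a high potassium value, only the labels left in `suitable` would match the advice given in the text.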
The impact of fertilizer around the world is taken from [14].
Data Collection. Table 1 shows a sample of the dataset (input data) for the crop and fertilizer recommendation model, which is built using the soil nutrient
Recommendation System for Agriculture …
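Table 1 Sample dataset table (input data)

|   | N  | P  | K  | Temperature | Humidity  | pH       | Rainfall   | Label |
|---|----|----|----|-------------|-----------|----------|------------|-------|
| 0 | 90 | 42 | 43 | 20.879744   | 82.002744 | 6.502985 | 202.935536 | rice  |
| 1 | 85 | 58 | 41 | 21.770462   | 80.319644 | 7.038096 | 226.655537 | rice  |
| 2 | 60 | 55 | 44 | 23.004459   | 82.320763 | 7.840207 | 263.964248 | rice  |
| 3 | 74 | 35 | 40 | 26.491096   | 80.158363 | 6.980401 | 242.864034 | rice  |
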
content such as nitrogen, phosphorus and potassium, and on environmental conditions such as temperature, humidity and rainfall. The weather conditions can be fetched from the openweathermap API based on state and city. Openweathermap is an open source API for fetching weather conditions for a particular location. The dataset is then collected and read.
Training the model. After the data is collected, it is preprocessed and the model is trained on 70% of the data with different algorithms, each of which gives different accuracies and errors. Random forest is observed to give the best accuracy based on the induced rules for recommendation.
Random Forest Classifier. The random forest classifier uses a bagging technique in which every estimator gives an output, and the algorithm makes the final decision from those outputs. Table 2 shows the precision, recall and F1-score calculated for each class.
Testing the model. After training, the model is tested with the remaining 30% of the data, and validation and accuracy are then reported.
Confusion matrix. The performance of the classification model is described using a confusion matrix, an M × M matrix where M is the number of target classes. The matrix compares the actual target values with the predictions of the machine learning model. The rows represent the predicted values of the target. The performance of the random forest model is shown in the confusion matrix (Fig. 2).
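As a minimal sketch of how a confusion matrix relates to the precision, recall and F1-score reported per class, the following example builds the matrix from toy labels and derives the three scores. The labels are invented for illustration and are not the paper's data; the rows-as-predictions convention follows the text above, and some libraries use the transpose.

```python
def confusion_matrix(actual, predicted, classes):
    """Build an M x M count matrix; rows are predicted classes and
    columns are actual classes (assumed convention, see lead-in)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


def per_class_scores(matrix, classes):
    """Return {class: (precision, recall, f1)} computed from the matrix."""
    scores = {}
    for i, name in enumerate(classes):
        true_pos = matrix[i][i]
        predicted_as_i = sum(matrix[i])              # row total
        actually_i = sum(row[i] for row in matrix)   # column total
        precision = true_pos / predicted_as_i if predicted_as_i else 0.0
        recall = true_pos / actually_i if actually_i else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[name] = (precision, recall, f1)
    return scores


# Toy labels: one rice sample is misclassified as jute.
classes = ["rice", "jute", "maize"]
actual = ["rice", "rice", "rice", "jute", "jute", "maize"]
predicted = ["rice", "rice", "jute", "jute", "jute", "maize"]

cm = confusion_matrix(actual, predicted, classes)
scores = per_class_scores(cm, classes)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(actual)

for name, (p, r, f1) in scores.items():
    print(f"{name:6s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this toy case the misclassified sample lowers the recall of rice and the precision of jute while leaving the other scores at 1.00, which is the kind of pattern a per-class report makes visible.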
3.2 Disease Prediction
Disease prediction is done by capturing real-time images in the field; a PyTorch model then predicts the associated disease. The plant disease is classified using ResNet-9, a neural network implemented in PyTorch.
Dataset Collection. The dataset for disease prediction is collected from Kaggle, and it consists of 38 different diseases of 14 crops. The dataset is split into two parts: 80% for training and 20% for testing. Later, for prediction purposes, a new directory containing 33 test photographs is created. Before preprocessing, the dataset is
K. SuriyaKrishnaan et al.
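Table 2 Classification report (mathematical model)

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| Apple        | 1.00      | 1.00   | 1.00     | 13      |
| Banana       | 1.00      | 1.00   | 1.00     | 17      |
| Blackgram    | 0.94      | 1.00   | 0.97     | 16      |
| Chickpea     | 1.00      | 1.00   | 1.00     | 21      |
| Coconut      | 1.00      | 1.00   | 1.00     | 21      |
| Coffee       | 1.00      | 1.00   | 1.00     | 22      |
| Cotton       | 1.00      | 1.00   | 1.00     | 20      |
| Grapes       | 1.00      | 1.00   | 1.00     | 18      |
| Jute         | 0.90      | 1.00   | 0.95     | 28      |
| Kidneybeans  | 1.00      | 1.00   | 1.00     | 14      |
| Lentil       | 1.00      | 1.00   | 1.00     | 23      |
| Maize        | 1.00      | 1.00   | 1.00     | 21      |
| Mango        | 1.00      | 1.00   | 1.00     | 26      |
| Mothbeans    | 1.00      | 0.95   | 0.97     | 19      |
| Mungbean     | 1.00      | 1.00   | 1.00     | 24      |
| Muskmelon    | 1.00      | 1.00   | 1.00     | 23      |
| Orange       | 1.00      | 1.00   | 1.00     | 29      |
| Papaya       | 1.00      | 1.00   | 1.00     | 19      |
| Pigeonpeas   | 1.00      | 1.00   | 1.00     | 18      |
| Pomegranate  | 1.00      | 1.00   | 1.00     | 17      |
| Rice         | 1.00      | 0.81   | 0.90     | 16      |
| Watermelon   | 1.00      | 1.00   | 1.00     | 15      |
| Accuracy     |           |        | 0.99     | 440     |
| Macro avg    | 0.99      | 0.99   | 0.99     | 440     |
| Weighted avg | 0.99      | 0.99   | 0.99     | 440     |
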
Fig. 2 Confusion matrix
supplemented with data from the original dataset. Below code is for the dataset collection and reading (Fig. 3).
Preprocessing. The plant leaf images are preprocessed to a size of (3, 256, 256), and other filters are applied before training. The images are then grouped into batches of 32.
ResNet Architecture. The basic principle behind residual networks is that a deeper network can be created from a shallow network by copying the weights of the shallow network and setting the other layers of the deeper network to identity mappings. By this construction, the deeper model should not have a larger training error than the shallow model. A simple residual block is shown in Fig. 4.
Training. The model, a PyTorch neural network with the ResNet architecture, is trained on 80% of the data. The training steps and validation processes are explained below.
Fig. 3 Training dataset
Fig. 4 ResNet architecture
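The residual principle described above can be illustrated with a framework-free sketch. This is not the ResNet-9 model used in the paper: the function names, the tiny dense layers and the weights are assumptions made only for illustration, whereas real residual blocks use convolutions and normalization. The point shown is that when the learned branch F contributes nothing, the block reduces to the identity mapping.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_block(x, w1, w2):
    """Compute y = x + F(x), where F(x) = w2 * relu(w1 * x)."""
    hidden = relu(matvec(w1, x))
    fx = matvec(w2, hidden)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0, 3.0]
half = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero = [[0.0] * 3 for _ in range(3)]

# With a zero second layer, F(x) = 0 and the block passes x through unchanged.
identity_output = residual_block(x, half, zero)

# With non-zero weights, the block adds a learned correction to x.
corrected_output = residual_block(x, half, eye)

print(identity_output, corrected_output)
```

Because the skip connection carries `x` forward unchanged, extra blocks can start out as identity mappings, which is the reason given in the text for why a deeper model need not train worse than a shallow one.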
training_step. As the name suggests, the training steps happen in the training_step function. It computes a measure of how wrong the model is, which is used to improve the model throughout training [11].
validation_step. The fact that an accuracy metric cannot be used during model training does not mean that it should not be applied. In this scenario, accuracy is determined by a threshold: a prediction is counted as correct if the difference between the model's prediction and the actual label is less than that threshold.
validation_epoch_end. The validation losses/accuracies and train losses are tracked after each epoch, and each time this is done the gradient is not recorded.
epoch_end. After each epoch, the validation losses/accuracies, train losses and learning rate are printed; the learning rate is included because a learning rate scheduler (which changes the learning rate after each batch of training) is used.
An accuracy function is also defined, which computes the model's overall accuracy on an entire batch of outputs and can be used as a measure in fit one cycle. Before training the model, a utility function is defined, together with an evaluate function, which performs the validation step, and a fit one cycle function, which carries out the full training procedure. Because the weights are randomly initialized, the accuracy before training is close to 0.019 (i.e. a 1.9% chance of getting the right answer; in other words, the model chooses a class at random). Some hyperparameters for training are then declared; if the results are not satisfactory, they can be altered (Figs. 5 and 6). The validation accuracy is calculated as a function of the number of epochs. On this dataset, we obtained an accuracy of 99.2%. ResNets perform well for image classification when some of the parameters are tuned and techniques such as learning rate scheduling, gradient clipping and weight decay are applied. The model is able to predict every image in the test set without any errors.
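The batch accuracy function mentioned above can be sketched without any framework. The function name, the score lists and the labels below are hypothetical; the authors' implementation operates on PyTorch tensors and is not reproduced here.

```python
def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label.

    outputs: list of per-class score lists, one per sample.
    labels:  list of integer class indices, one per sample.
    """
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return correct / len(labels)


# Toy batch of four samples and three classes (illustrative values).
batch_outputs = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.4, 0.3, 0.2],
]
batch_labels = [1, 0, 2, 1]

batch_accuracy = accuracy(batch_outputs, batch_labels)
print(batch_accuracy)
```

The last sample is scored highest for class 0 but labelled 1, so three of the four predictions count as correct.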
A cure recommendation is given for every disease, and instructions to prevent those diseases are also suggested. For example, apple scab development is favoured by wet, cool weather. The suggested cures include the following: rake under trees and destroy the infected leaves to reduce the number of fungal spores available to start the disease cycle over again next spring; avoid overhead irrigation; spread a 3- to 6-inch layer of compost under trees, keeping it away from the trunk, to cover the soil and prevent splash dispersal of the fungal spores. Article [8] is referred to for the remedies of the plant diseases.
Fig. 5 Accuracy versus epoch
Fig. 6 Loss versus epoch
4 Results and Discussion
The recommendation system was trained with several algorithms, and their results were compared. The highest accuracy is obtained with the random forest classifier. Seventy percent of the dataset is used for training the model, and 30% is used for testing it. Only a small number of class samples of diseased plants are used. The training process is optimized to save computation time while maintaining good overall performance. Figure 7 shows the accuracy comparison of different algorithms for the crop and fertilizer recommendation systems. Compared to the other algorithms, the random forest with 20 estimators gives higher accuracy, because the precision and recall values of the other algorithms are less accurate, whereas those of random forest are accurate. The error for random forest is very low, nearly zero. Random forest is therefore chosen. For disease prediction, the accuracy is 99% over 38 different diseases, which is better than the existing system.
Fig. 7 Accuracy comparison
5 Application Development
A web application for the proposed system is developed using Flask, a small, lightweight framework written in Python that works through requests and routing. The frontend is designed using HTML, CSS, JavaScript and Bootstrap. The frontend and backend are linked using the Flask library in Python.
6 Conclusion
The overall recommendation system and disease prediction model form a complete product for farmers with a user-friendly GUI. The soil and environmental data from the openweathermap API and other data sources help in the accurate prediction and suggestion of crops and fertilizers. The disease prediction model is created with a PyTorch neural network, and the three models are embedded in a single web application. The plant disease classifier can also be made into an integrated web application that helps people identify a plant disease on their farm by simply providing an image of the diseased plant as input and obtaining the disease name as output. Further, the proposed system can be combined with crop monitoring and growth tracking systems to track the periodic growth of the crop. More classes of plant disease can also be added so that more diseases are predicted accurately.
Acknowledgements We would like to thank our mentor Dr. Suriya Krishnaan for his continuous support and for motivating us to improve our idea into a product. The study of journal papers and various documentation gave us immense knowledge.
References
1. Suh S-H, Jhang J-E, Won K, Shin S-Y, Sung CO (2018) Development of vegetation mapping with deep convolutional neural network. In: Proceedings of the 2018 conference on research in adaptive and convergent systems
2. Bondre DA (2019) Prediction of crop yield and fertilizer recommendation system using machine learning algorithms. Int J Eng Appl Sci Technol 4(5). ISSN No. 2455-2143
3. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. J Front Plant Sci
4. Puthumalar S, Ramanujam E, Harine Rajashree R, Kavya C, Kiruthika T, Nisha J (2016) Crop recommendation system for precision agriculture. In: IEEE eighth international conference on advanced computing
5. Aswathy S, Manimegalai S, Maria Rashmi Fernando M, Frank Vijay J (2018) Smart soil testing. Int J Innov Res Sci Eng Technol 7(4)
6. Towards Data Science, An overview of ResNet and its variants. https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
7. Shakya S (2021) Analysis of soil nutrients based on potential productivity tests with balanced minerals for maize-chickpea crop. J Electron 3(01):23–35
8. Cure for different plant diseases is referred. www.finegardening.com/articles/35-pest-diseaseremedies
9. Afrin S, Khan AT, Mahia M, Ahsan R, Mishal MR, Ahmed W, Rahman RM (2018) Analysis of soil properties and climatic data to predict crop yields and cluster different agricultural regions of Bangladesh. J IEEE
10. Chougule A, Jha VK, Mukhopadhyay D (2019) Crop suitability and fertilizers recommendation using data mining techniques. J Springer Nature Singapore Pte Ltd
11. Bhambri P, Dhanoa IS, Sinha VK, Kaur J (2020) Paddy crop production analysis based on SVM and KNN classifier. Int J Recent Technol Eng 8(5)
12. Tamsekar P, Deshmukh N, Bhalchandra P, Kulkarni G, Hambarde K, Husen S (2019) Comparative analysis of supervised machine learning algorithms for GIS-based crop selection prediction model. J Springer Nature Singapore Pte Ltd
13. Shanthi T, Sabeenian RS, Modified Alexnet architecture for classification of diabetic retinopathy images. Comput Electr Eng 76:56–64. https://doi.org/10.1016/j.compeleceng.2019.03.004
14. Use and impact of fertilizer around the world. https://ourworldindata.org/fertilizers
A Machine Learning Approach for Daily Temperature Prediction Using Big Data
Usha Divakarla, K. Chandrasekaran, K. Hemanth Kumar Reddy, R. Vikram Reddy, and Manjula Gururaj
Abstract Due to global warming, weather forecasting has become a complex problem that is affected by many factors such as temperature, wind speed, humidity, year, month and day. Weather prediction depends on historical data and on the computational power available to analyze it. It helps in many areas such as astronomy, agriculture, and the prediction of tsunamis and droughts, which allows us to prepare in advance for any kind of disaster. With the rapid development of the computational power of high-end machines and the availability of enormous amounts of data, weather prediction is becoming more and more popular. However, handling such huge data becomes an issue for real-time prediction. In this paper, we introduce a machine learning-based prediction approach on Hadoop clusters. The extensive use of the map-reduce function helps us distribute the big data across different clusters, as Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. An ensemble of distributed machine learning algorithms is employed to predict the daily temperature. The experimental results show that the proposed model outperforms the techniques available in the literature.
Keywords Hadoop · Random forest · Support vector machine · Multilayer perceptron · Ensemble model · Weather forecast
U. Divakarla (B) · K. H. K. Reddy · R. V. Reddy · M. Gururaj
NMAM Institute of Technology, Nitte, Karkala, Karnataka, India
e-mail: [email protected]
K. H. K. Reddy
e-mail: [email protected]
R. V. Reddy
e-mail: [email protected]
M. Gururaj
e-mail: [email protected]
K. Chandrasekaran
National Institute of Technology Karnataka, Surathkal, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_43
1 Introduction
Weather forecasting is the process of analyzing weather data and predicting the behaviour of the atmosphere, which is very useful for sectors such as farming and disaster management. The meteorology department plays a major role in forecasting the weather and predicting major disasters. A number of techniques exist in the literature, but there are several limitations in implementing these forecasting techniques; for example, short-term prediction is not effective with data mining techniques. Because of its dynamic nature, it is quite difficult to predict the weather on a daily basis, and over the last two decades weather forecasting approaches have been used extensively because of climate change. Effects such as rising sea levels, less rainfall and increased humidity are observed due to climate change. If we can predict these in advance, we can improve the quality of our environment. Artificial intelligence plays an important role in real-time prediction. The dynamic integrated forecast system is one of the earliest forecasting approaches proposed using artificial intelligence [1]. Even after that, finding an effective prediction approach for real-time dynamic environments remained a challenging task for researchers; the most important reason is that understanding all the factors that influence the prediction matters more than the prediction technique itself [2]. In [2], different prediction techniques were proposed to deal with different types of dataset, such as predictive and descriptive. In [1], Moosavi et al. proposed machine learning prediction models such as random forests and artificial neural networks to study the uncertainty in numerical weather prediction. According to Holmstrom et al. [3], most forecasting models are proposed as physical models that are not stable under perturbation, and for this reason most of the proposed prediction approaches are not accurate on big datasets. Having identified this cause of inaccuracy, Holmstrom et al. proposed a linear regression model and its variations for predicting maximum and minimum temperature over a short period with better performance.
These types of short-term prediction models are very useful for weather forecasting for airlines. In [4], supervised machine learning algorithms and data mining techniques are used to predict airline delays due to inclement weather conditions. In most cases, an imbalanced training dataset can influence the prediction results; therefore, sampling techniques are used to balance the training data, and different forecasting techniques can be used to build the prediction model for individual flights. Based on the accuracy of the individual prediction algorithms, it can be predicted whether a scheduled flight will be delayed or on time. In complex systems such as airlines, agriculture and the defense sector, enormous amounts of data need to be considered for accurate prediction. In these scenarios, big data plays a major role in storing the huge dataset and applying the proposed prediction algorithms. In [5], the Hadoop framework was used to handle the unstructured dataset before prediction, and forecasting techniques such as fuzzy logic and artificial neural networks were investigated. In [6], Hadoop framework-based prediction techniques are applied to predict rainfall on a huge chunk of data, where minimum, maximum and average rainfall are predicted in an efficient manner.
Rainfall prediction is very useful for agriculture, and agricultural growth leads to the economic growth of any country. In [7], an "Agi algorithm" was proposed that predicts the suitable and profitable crop for a particular soil, where soil information, weather information of previous years and climatic conditions were considered during prediction. Statistical linear regression and support vector machine (SVM) approaches were proposed for predicting the climate conditions of the next few days [8]. In [9], a smart farming approach using a genetic algorithm is proposed, which helps farmers decide the pre- and post-agriculture activities for a particular area. These activities are predicted based on the soil condition and weather information. In [10], Wang et al. proposed an effective perturbation algorithm that uses optimal geometric transformation to address the privacy of the gathered data. In [11], Valanarasu et al. presented an ML-based model for predicting the personality of job applicants. It uses dynamic multi-context information and information from multiple platforms such as Facebook, Twitter and YouTube for prediction.
The forecasting techniques discussed in the literature are aligned to specific scenarios, and all these approaches perform either short-term or long-term prediction using the Hadoop framework. The objective of this work, in contrast, is to predict the weather on a daily basis. This article proposes an ensemble prediction model for daily weather forecasting using the Hadoop framework, whose prediction accuracy outperforms the techniques in the literature. The rest of the paper is organized as follows. Section 2 presents the proposed forecasting model; implementation details and data preprocessing are presented in Sect. 3, and the results are discussed in Sect. 4. Finally, the performance of the proposed model is compared with others and the paper is concluded in Sect. 5.
2 ML-Based Ensemble Forecasting Model (MLEF)
The proposed model uses machine learning algorithms for weather prediction, and the Hadoop framework is used for efficient data placement. The performance of any Hadoop-based model also depends on the distribution of data and its placement [12, 13]. Moreover, task and data co-allocation and cluster node profiling also improve Hadoop system performance [14], whereas in heterogeneous distributed environments, speculative execution strategies also give better performance [15]. In this paper, a combined weather forecasting model is introduced that ensembles different machine learning techniques, namely regression, random forest regression, SVM and multilayer perceptron (MLP), to predict the future weather. Figure 1 shows the block diagram of the machine learning-based ensemble forecasting model, where the preprocessed dataset is supplied to each individual machine learning algorithm, and the outputs generated by these techniques are given as input to an artificial neural network for accurate prediction in all the different scenarios. The details of the proposed framework are presented in the following subsections.
Fig. 1 An ensemble forecast model
Hadoop is used for handling big data. The weather dataset consists of attributes such as temperature, rainfall, humidity and wind speed. The data obtained from the weather forecast is a continuous and huge dataset, which helps the model predict correctly. Our data is also distributed in nature. The obtained data is stored in Hadoop, which uses HDFS to store the distributed data. The obtained data contains a lot of noise. The first step is to preprocess the data to remove noise such as missing values and outliers, and to normalize it for scaling. The next step is to convert the weekdays to a binary coding with the help of Hadoop, which can scale up easily through parallel computing. The pandas library in Python is mainly used for data handling. The data is held in data frames, which resemble Excel spreadsheets with rows and columns. The number of phases for training depends on the data collected. Some data operations are required for the machine learning applications. We used the numpy library to convert Python lists into arrays so that the data can be manipulated easily, divided the data into training and testing sets, and converted these sets into numpy arrays. After processing, the data is fed in parallel into the four techniques of the proposed model.
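The idea of feeding the outputs of the four base techniques into a second-stage learner can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the base predictions are synthetic stand-ins rather than outputs of trained regression, random forest, SVM and MLP models, and the artificial neural network of the paper is replaced by a single linear unit fitted by least squares so that the example stays short and deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily temperatures and four imperfect base predictions
# (stand-ins for the regression, random forest, SVM and MLP outputs).
actual = 20.0 + 5.0 * rng.standard_normal(200)
noise_scales = (2.0, 1.5, 2.5, 1.8)
base_predictions = np.column_stack(
    [actual + scale * rng.standard_normal(200) for scale in noise_scales]
)

# Second stage: one linear unit (four weights plus a bias) fitted by
# least squares on the stacked base predictions.
design = np.column_stack([base_predictions, np.ones(len(actual))])
weights, *_ = np.linalg.lstsq(design, actual, rcond=None)
ensemble_prediction = design @ weights


def rmse(prediction, target):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((prediction - target) ** 2)))


base_rmse = [rmse(base_predictions[:, j], actual) for j in range(4)]
ensemble_rmse = rmse(ensemble_prediction, actual)
print(base_rmse, ensemble_rmse)
```

On the data it was fitted to, the least-squares combination cannot have a larger squared error than any single base column, since using one column alone is one of the combinations it can choose. Whether the same holds on unseen data has to be checked on a held-out test set.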
3 Data Preparation
3.1 Data Collection
To measure the efficacy of the proposed model, we used the National Climatic Data Center (NCDC) weather dataset. The United States National Climatic Data Center, previously known as the National Weather Records Center, in Asheville, North Carolina, was the world's largest active archive of weather data [16].
3.2 Data Cleaning
Data cleaning is the process of removing unwanted or faulty data from a dataset. In this step, we discard the features we do not need and store the features we will use in a txt file. There may also be noise in the data, in the form of outliers, missing values, etc. Such noise should be corrected or removed, as it may degrade the performance of the model. We predict the temperature of a given day, and to check accuracy we have the actual temperature data in the dataset. The rest of the features are discarded. There are no missing values, so no rows need to be removed from the data.
3.3 Data Standardization
Data standardization is the process of bringing the data into the format we need. Not all features can be used directly; they need to be normalized. Different features may have different scales, which causes problems in prediction because the model cannot learn effectively. In this process, we normalize the data using its mean $\mu$ and standard deviation $\sigma$:

$$X = \frac{x - \mu}{\sigma} \qquad (1)$$
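Equation (1) can be sketched in a few lines of Python. The function name and the sample temperatures are illustrative, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Apply X = (x - mu) / sigma to every value of one feature."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]


temperatures = [10.0, 20.0, 30.0]  # illustrative values
standardized = standardize(temperatures)
print(standardized)
```

After the transformation the feature has zero mean and unit standard deviation, so features measured on different scales become comparable.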
The day-of-the-week feature needs to be converted because it is a string, whereas all the other features are floating-point numbers, so we convert it into a numeric format. This is done through the one hot encoding technique. A one hot encoding is a representation of categorical variables as binary vectors. The categorical variables here are the days of the week and the months of the year. These categorical values are converted into a numerical representation that does not contain any arbitrary ordering. The days of the week could simply be numbered from 1 to 7 for Sunday to Saturday, but for machine learning we encode each day as a binary vector, as shown in Table 1. The Hadoop framework is used for data preparation, since Hadoop is very efficient at working on big data. It is also scalable, which is useful because weather prediction uses a lot of data that may not fit on a single computer, so distributed computing can be used.
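Table 1 Week feature converted to binary

| Day | After one hot coding | Day | After one hot coding |
|-----|----------------------|-----|----------------------|
| Mon | 0100000              | Fri | 0000010              |
| Tue | 0010000              | Sat | 0000001              |
| Wed | 0001000              | Sun | 1000000              |
| Thu | 0000100              |     |                      |
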
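The encoding in Table 1 can be reproduced with a short sketch. The names `DAYS` and `one_hot_day` are hypothetical; the Sunday-first ordering is taken from the table, where Sunday maps to 1000000.

```python
# Sunday-first ordering, matching the bit positions used in Table 1.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def one_hot_day(day: str) -> str:
    """Return a 7-character binary string with a single 1 for the given day."""
    position = DAYS.index(day)  # raises ValueError for an unknown day
    return "".join("1" if i == position else "0" for i in range(len(DAYS)))


encoded = {day: one_hot_day(day) for day in DAYS}
for day, code in encoded.items():
    print(day, code)
```

Each day gets its own position, so no day is numerically larger or smaller than another, which is the property the text asks of the representation.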
3.4 Data Storage and Management
Data storage is an important process, since the data must be available at any point in time. If the server containing the data goes down, there will be no access to the data. To avoid this, the data is stored redundantly so that if any server goes down the data can be accessed from another server. Another concern is the security of the data. Only privileged people should be able to access the data. Security is needed for the integrity of the data, so that no one can tamper with it. After cleaning and standardizing the data, we store it in a csv file. This can then be fed to our machine learning model to predict the target output.
4 Results and Analysis
After training, we tested the model with the test data. To assess the results, a baseline for our prediction has to be set. The baseline has to be sensible, and our model has to beat it. If the model does not improve on the baseline, then the model is a failure. The historical maximum temperature averages serve as the baseline prediction in our scenario. In other words, our baseline is the error we obtain using the average temperature from the prior year for all days. The baseline error for our model is 5.0°. If our model shows a lower error than 5.06, we can say it works better than the baseline model. Our model's average estimate is off by 3.34°. In other words, our model is 25% better than the baseline. The RMSE score is 5.01, and the ensemble model result is compared with different models such as regression, random forest, SVM and MLP. Table 2 shows the mean absolute error and RMSE values of these techniques.

Table 2 Error metrics

| Metric                  | Regression | Random forest | SVM  | MLP  | Ensemble model |
|-------------------------|------------|---------------|------|------|----------------|
| Mean absolute error     | 3.95       | 3.83          | 4.23 | 3.89 | 3.34           |
| Root mean squared error | 5.58       | 5.06          | 5.62 | 5.59 | 5.01           |

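The two error metrics in Table 2, and the comparison against a baseline, can be sketched as follows. All temperatures below are toy values invented for illustration, and the constant baseline is a simplified stand-in for the historical-average baseline described in the text; none of the numbers reproduce the paper's results.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Toy data (degrees); not taken from the NCDC dataset.
actual_temps = [30.0, 32.0, 28.0, 35.0]
model_preds = [31.0, 30.0, 28.0, 39.0]
baseline_preds = [31.25] * 4  # constant average used as a toy baseline

model_mae = mean_absolute_error(actual_temps, model_preds)
model_rmse = root_mean_squared_error(actual_temps, model_preds)
baseline_mae = mean_absolute_error(actual_temps, baseline_preds)

print(model_mae, model_rmse, baseline_mae)
print("model beats baseline:", model_mae < baseline_mae)
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two rows of Table 2 can rank models differently.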
Fig. 2 Actual temperature versus predicted: RF
Fig. 3 Actual temperature versus predicted: SVM
Figures 2, 3 and 4 show the plots of predicted versus actual temperatures for random forest regression, SVM and MLP. We can see that random forest regression works better than the other models. It gives slightly better results than MLP and much better results than SVM. Since we have little data, the multilayer perceptron may be limited; with more data, its performance may increase. In Figs. 2, 3 and 4, the Y-axis and X-axis show the actual temperature and the predicted temperature, respectively.
5 Conclusion and Future Work
Weather prediction is one of the most important problems to solve. Since it has various applications, many fields of work can use weather knowledge. The proposed ensemble machine learning model uses the power of the Hadoop framework to analyze big data and prepare the data for weather prediction. Hadoop has the power to scale.
Fig. 4 Actual temperature versus predicted using MLP
Even if the data increases, Hadoop can easily scale up to analyze this big data with its map-reduce function. Furthermore, machine learning algorithms are applied to predict the temperature. We have compared different machine learning techniques with the proposed ensemble model and found that the MAE and RMSE of the ensemble model are better than those of the individual techniques. The accuracy of this model can be improved further; utilizing hybrid approaches and applying deep neural networks (DNN) to vast meteorological data for advanced and accurate prediction are some of the key areas where it has room for improvement.
References
1. Moosavi A, Rao V, Sandu A (2021) Machine learning based algorithms for uncertainty quantification in numerical weather prediction models. J Comput Sci 50:101295
2. Sanhudo L, Rodrigues J, Vasconcelos Filho Ê (2021) Multivariate time series clustering and forecasting for building energy analysis: application to weather data quality control. J Build Eng 35:101996
3. Murugan Bhagavathi S, Thavasimuthu A, Murugesan A, George Rajendran CPL, Raja L, Thavasimuthu R (2021) Weather forecasting and prediction using hybrid C5.0 machine learning algorithm. Int J Commun Syst 34(10):e4805
4. Choi S, Kim YJ, Briceno S, Mavris D (2016) Prediction of weather-induced airline delays based on machine learning algorithms. In: 2016 IEEE/AIAA 35th digital avionics systems conference (DASC). IEEE, pp 1–6
5. Pandey AK, Agrawal CP, Agrawal M (2017) A hadoop based weather prediction model for classification of weather data. In: 2017 Second international conference on electrical, computer and communication technologies (ICECCT). IEEE, pp 1–5
6. Navadia S, Yadav P, Thomas J, Shaikh S (2017) Weather prediction: a novel approach for measuring and analyzing weather data. In: 2017 International conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC). IEEE, pp 414–417
7. Kushwaha AK, Bhattachrya S (2015) Crop yield prediction using agro algorithm in Hadoop. Int J Comput Sci Inf Technol Secur (IJCSITS) 5(2):271–274
8. Madan S, Kumar P, Rawat S, Choudhury T (2018) Analysis of weather prediction using machine learning & big data. In: 2018 International conference on advances in computing and communication engineering (ICACCE). IEEE, pp 259–264
9. Oury DTM, Singh A (2018) Data analysis of weather data using hadoop technology. In: Smart computing and informatics. Springer, Singapore, pp 723–730
10. Haoxiang W, Smys S (2021) Big data analysis and perturbation using data mining algorithm. J Soft Comput Paradigm (JSCP) 3(01):19–28
11. Valanarasu MR (2021) Comparative analysis for personality prediction by digital footprints in social media. J Inf Technol 3(02):77–91
12. Reddy KHK, Roy DS (2015) Dppacs: a novel data partitioning and placement aware computation scheduling scheme for data-intensive cloud applications. Comput J 59(1):64–82
13. Paik SS, Goswami RS, Roy DS, Reddy KH (2017) Intelligent data placement in heterogeneous Hadoop cluster. In: International conference on next generation computing technologies. Springer, Singapore, pp 568–579
14. Hussain MW, Reddy KHK, Roy DS (2019) Resource aware execution of speculated tasks in Hadoop with SDN. Int J Adv Sci Technol 28(13):72–84
15. Wu H, Li K, Tang Z, Zhang L (2014) A heuristic speculative execution strategy in heterogeneous distributed environments. In: 2014 Sixth international symposium on parallel architectures, algorithms and programming (PAAP). IEEE, pp 268–273
16. https://www.ncdc.noaa.gov/. Data set. Last access on 2 Oct 2021
Keyword Error Detection on Product Title Data Using Approximate Retrieval and Word2Vec
Duc-Hong Pham
Abstract In the past two decades, the problem of word error detection in documents has attracted the attention of many researchers. It is an important problem in data science and natural language processing, with many practical applications. This paper presents a solution to one problem of this class. Given a set of product titles, for each product title we need to identify the keywords related to product brands and determine whether each keyword is faulty or not. We propose a method with two main stages: first, a fast approximate retrieval method is used to identify raw error keyword candidates; then, the most promising candidates are selected by a ranking algorithm and verified as keyword errors or not by the word2vec model. We conduct experiments on a dataset consisting of 10,000 product titles, and the obtained results show the effectiveness of the proposed method.
Keywords Approximation retrieval · Word2vec · Keyword error detection · Product brand keyword · Simword · Approximate string
1 Introduction E-commerce is built on the Internet, where customers can access an online store to browse, search for, and select products. They can then order goods or services from their own devices. This way of trading has grown considerably and gives customers a wide choice whenever they need to buy goods. To offer a wide range of products and categories at competitive prices, e-commerce systems allow sellers to open virtual stores and post product information in them. However, because the environment is virtual, controlling product information is very difficult for the managers of e-commerce websites; the risks include counterfeit, imitation, and poor-quality goods being sold.
D.-H. Pham (B) Faculty of Information Technology, Electric Power University, Hanoi, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_44
To better support marketplace managers, many problems have been addressed, such as determining the quality score of product titles on the Lazada e-commerce platform [1], identifying and recognizing logos [2], detecting anomalies on e-commerce websites [3], detecting fraud in transactions [4], query autocompletion [5], generating product titles and identifying their quality score [6], product duplicate detection [7], a multimodal model for predicting attributes and extracting values from product documents [8], and sensitive keyword detection on product text [9]. Other studies have been performed on the Web of Science datasets [10]. Approximate matching [11, 12], also known as approximate retrieval, is a technique that uses a fast search algorithm to find the strings (i.e., terms, words, and keywords) in a dictionary that approximate a query. It has been used in many problems, such as spell checking and error correction, fast dictionary retrieval, document linkage and duplicate detection [13], bioinformatics [14], and extracting text from websites [15]. Word2Vec [16] is a model that learns word vector representations from a large dataset. It has two architectures, continuous bag-of-words (CBOW) and skip-gram, which have been applied successfully in many natural language processing tasks. The skip-gram model is built on the assumption that context words are generated from the current word, whereas the CBOW model is built on the idea that the current word is predicted from its context words. There are two basic types of word errors in textual documents: a 'non-word' error, which is a string that is not a valid word, and a 'real-word' error, which is a valid word that is used incorrectly in the context of the document.
Word error detection has been addressed in many studies. The method in [17] uses an N-gram model and the probability of words appearing to the left and right of the word under consideration to find word error candidates; based on the probability score, it then suggests a correction. Hu et al. [18] use a pre-trained BERT model: they treat erroneous words as masked tokens and use an edit distance algorithm to select the words closest to the erroneous word. Bravo-Candel et al. [19] proposed a Seq2seq deep learning model to find and correct word errors in sentences, using rules to inject different error types into correct sentences. Prabhakar [20] proposed using the Wikipedia dataset and manually curated video subtitles to generate a language dictionary and probabilistic N-gram dictionaries, from which a list of candidate corrections is suggested for a misspelled word. Hao et al. [21] used a recurrent neural network with a nested architecture to correct English spelling errors and built pseudo data based on phonetic similarity to train it.
In this paper, we study the problem of detecting keyword errors in product title data. For each product title, we first apply the approximate dictionary matching algorithm [11] to identify keyword error candidates; the results are then ranked to select the most promising candidates. We then use a word2vec model to find the top words closest to the context of the words in the title, and these top words are used to check each keyword error candidate. The rest of this paper is organized as follows: Sect. 2 presents the proposed method and the framework of the keyword error detection system, detailing each task and technique; Sect. 3 describes our experiments and results; the last section presents some conclusions.
2 The Proposed Method The e-commerce platform allows sellers to compose and post information about a wide variety of items they want to sell on the website. Because brand keywords are numerous and semantically diverse, keyword errors can be introduced during editing. We need to determine whether there is a keyword error in each product title. Figure 1 depicts the workflow of the proposed method. For a textual title, we perform two steps: (1) keyword error candidate detection, which uses an algorithm based on approximate retrieval and ranking to identify keyword error candidates, and (2) keyword error detection, which uses a word2vec model to find the top words closest to the context of the words in the title and then uses these top words to check each keyword error candidate.
Fig. 1 Basic workflow in the proposed method
2.1 Keyword Error Candidate Detection Let D = {d1, d2, ..., d|D|} be a set of textual titles, where each title is preprocessed by normalizing spaces, converting all words to lowercase, and segmenting words. Let K = {k1, k2, ..., k|K|} be a list of product brand keywords. To detect keyword error candidates in each textual title d ∈ D, we propose an algorithm with two stages: (1) use the approximate string retrieval algorithm [11] (which we call SimWord) to identify raw error candidates and (2) select the most promising keyword error candidates. The computational steps are shown in Algorithm 1. SimWord takes a similarity threshold α and the m-gram size m used to extract features from its input string. Its inputs are the 1-gram, 2-gram, 3-gram, and 4-gram strings built from the textual title d by the function m-gram, and it is indexed on the list K. Lines 1–6 form stage 1 of the proposed algorithm; the results are stored in the variable R. SimWord may return many candidates for one word error, so we need to select the most suitable ones. We observe that the way words occur in the title affects the quality of the candidates. For a keyword u whose list of variants is list_u, we need to determine which variants in list_u are actually the best variants of u. Our general principle is that the candidates with the highest similarity to the keyword u are considered first. Taking u1 and u2 as two candidates, there are three cases: (1) if u2 contains u1 and cos(u1, u) = 1, then u2 is removed from list_u; (2) if u1 contains u2 or u2 contains u1, and cos(u1, u) > cos(u2, u), then u2 is also removed from list_u; (3) if u has only one candidate in list_u, this candidate is taken as the best candidate by default.
Lines 7–29 form stage 2 of the proposed algorithm; the results are stored in the variable C. The function get_candidate(u, R) returns the list of candidates for the keyword u in R (line 9). If there is more than one candidate, the best candidate is selected. Specifically, line 10 initializes the list u_remove, which stores the candidates removed from list_u. Lines 15–22 check the suitability of each candidate. Unsuitable candidates are removed and put into u_remove, while qualified candidates are put into the set C (lines 23–28). If there is only one candidate, it is put into C without any check.
Algorithm 1: Keyword error candidate detection.
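The listing of Algorithm 1 is not reproduced in the text, so the two stages described above can only be sketched here. The following is a minimal illustration, not the author's implementation: it assumes a set-based cosine similarity over character bigrams padded with a hypothetical `$` marker, a brute-force scan in place of the indexed SimWord lookup, and function names (`raw_candidates`, `select_candidates`) chosen for this example.

```python
from math import sqrt

def char_ngrams(s, n=2):
    """Character n-grams of s, padded with '$' (an assumed boundary marker)."""
    padded = "$" * (n - 1) + s + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def cosine(a, b, n=2):
    """Set-based cosine similarity between the n-gram sets of two strings."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / sqrt(len(x) * len(y))

def word_mgrams(tokens, max_m=4):
    """All runs of 1..max_m consecutive title tokens, joined by a space."""
    return [" ".join(tokens[i:i + m])
            for m in range(1, max_m + 1)
            for i in range(len(tokens) - m + 1)]

def raw_candidates(tokens, keywords, alpha=0.6):
    """Stage 1: every (keyword, span, score) whose similarity reaches alpha."""
    found = []
    for span in word_mgrams(tokens):
        for k in keywords:
            score = cosine(span, k)
            if score >= alpha:
                found.append((k, span, score))
    return found

def select_candidates(raw):
    """Stage 2: per keyword, drop overlapping spans according to cases (1)-(3)."""
    by_keyword = {}
    for k, span, score in raw:
        by_keyword.setdefault(k, {})[span] = score
    chosen = []
    for k, scores in by_keyword.items():
        removed = set()
        for u1, s1 in scores.items():
            for u2, s2 in scores.items():
                if u1 == u2:
                    continue
                overlap = u1 in u2 or u2 in u1
                # Case 1: u1 matches the keyword exactly and u2 contains it.
                exact_inside = u1 in u2 and s1 >= 1.0 - 1e-9
                # Case 2: of two overlapping spans, the lower-scoring one goes.
                if exact_inside or (overlap and s1 > s2):
                    removed.add(u2)
        # Case 3 falls out naturally: a single candidate is never removed.
        chosen += [(k, u, s) for u, s in scores.items() if u not in removed]
    return chosen
```

For the title tokens `["pink", "luccy", "dress"]` and the keyword `lucky`, only the span `luccy` reaches the 0.6 threshold under these assumptions; longer spans that contain it score lower because the extra words dilute the bigram overlap.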
2.2 Keyword Error Detection After identifying potential keyword error candidates, we need to determine whether each candidate is really a keyword error. To do this, we assume that a candidate is an error keyword if it is not in the list of keywords but is generated from a context of the title. The CBOW architecture of word2vec generates word representations from word contexts, so given a word2vec model learned from a large dataset and a specific context, we can apply it to find the list of words closest to this context. In line 3 of Algorithm 2 below, the function getTopN_words finds the list of words closest to the context in the title d.
Algorithm 2: Keyword error detection.
The results in X are shown to sellers as suggestions to help them check spelling errors when they submit product documents online, so that they can correct these errors.
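The body of Algorithm 2 is likewise not reproduced, so the sketch below rests on stated assumptions rather than on the author's listing. It assumes that a candidate is flagged when it is not itself a brand keyword and its intended keyword appears among the top-N words closest to the title context; it also approximates "closest to the context" by cosine similarity to the mean of the context word vectors. The names `get_top_n_words` and `detect_keyword_errors` and the toy two-dimensional vectors are illustrative only.

```python
from math import sqrt

def cos(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def get_top_n_words(context, vectors, n=10):
    """Rank vocabulary words by similarity to the mean context vector."""
    known = [vectors[w] for w in context if w in vectors]
    if not known:
        return []
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    ranked = sorted(vectors, key=lambda w: cos(mean, vectors[w]), reverse=True)
    return ranked[:n]

def detect_keyword_errors(title_tokens, candidates, keywords, vectors, n=10):
    """Return X: the (keyword, candidate) pairs judged to be keyword errors."""
    errors = []
    for keyword, cand in candidates:
        if cand in keywords:
            continue  # the candidate is itself a valid brand keyword
        context = [t for t in title_tokens if t != cand]
        # Assumed rule: the context must predict the intended keyword.
        if keyword in get_top_n_words(context, vectors, n):
            errors.append((keyword, cand))
    return errors
```

With toy vectors in which `perfume` and `jade` point in nearly the same direction, the title `["perfume", "jadore"]` yields the pair `("jade", "jadore")`, while the same candidate in a title about a `mouse` is not flagged.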
3 Experiments 3.1 Data and Preprocessing We use a dataset of 10,000 product titles covering 1000 product brand keywords, collected from the e-commerce platform sendo.vn. Each textual title contains a keyword error candidate with a label of 0 or 1 assigned by two annotators: 1 means that the candidate is indeed a keyword error, and 0 means that it is not. The dataset is preprocessed as follows: a list of standard words is used to remove special characters, stop words, and nonsense words that do not affect the main content of the product title, and uppercase words are normalized to lowercase. Then the longest matching algorithm is used together with a dictionary of 40,659 bigrams and 22,327 trigrams to segment the words of each title. Table 1 shows statistics of the data used in our experiments.
Table 1 Summary of experimental data
Number of product titles | 10,000
Average number of words in a textual title | 4
Number of product brand keywords | 1000
3.2 Learning Word2vec Model Given any context extracted from a title, we need a good word2vec model in order to automatically find the top words closest to that context. We use a dataset of 30 M products, also collected from the website sendo.vn, and the continuous bag-of-words (CBOW) architecture of the word2vec model in the Gensim tool (footnote 1). The context window size is 8, the word frequency threshold is 15, and the word vector size is 50.
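As a rough illustration, the settings above map onto Gensim's `Word2Vec` constructor as follows. This is a sketch under assumptions: the parameter name `vector_size` is the Gensim 4.x spelling (earlier releases used `size`), the helper names are hypothetical, and the use of `predict_output_word` to obtain context-based top words is one plausible choice that the paper does not confirm.

```python
# Hyperparameters stated in Sect. 3.2, using gensim >= 4.0 keyword names.
W2V_PARAMS = {
    "sg": 0,            # 0 selects the CBOW architecture
    "window": 8,        # context window size
    "min_count": 15,    # word frequency threshold
    "vector_size": 50,  # dimensionality of the word vectors
}

def train_cbow(tokenized_titles):
    """Train a CBOW model on an iterable of token lists (one list per title)."""
    from gensim.models import Word2Vec  # third-party; imported lazily
    return Word2Vec(sentences=tokenized_titles, **W2V_PARAMS)

def top_words(model, context_tokens, topn=10):
    """Words the trained model considers most probable given the context."""
    return model.predict_output_word(context_tokens, topn=topn)
```

Keeping the hyperparameters in one dictionary makes it easy to log them alongside results when the window size or frequency threshold is varied.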
3.3 Experimental Result To integrate approximate matching into the keyword error candidate detection algorithm (i.e., Algorithm 1), we use the SimString tool (footnote 2) with a threshold of 0.60, an N-gram size of 2 for feature extraction, and a dictionary indexed on the 1000 keywords. Then, for each product title, we run Algorithm 1, and its results are the input of Algorithm 2. Table 2 shows ten keyword-candidate pairs detected by the system, in which the error candidates are detected from the titles and the keywords are given. Table 2 shows that all keyword error candidates detected by the system have a similarity score greater than 0.60. Most of the candidate words do not appear in the top 10 closest words, except for the candidates ‘luci’ and ‘kio’. This shows that although ‘luci’ and ‘kio’ are error keywords, they can still be derived from the context of the root keywords ‘caluci’ and ‘kiko’. The cause is that the word2vec training data is noisy: it includes titles or sentences containing keyword errors. In addition, the original keywords tend to appear in the top list of words close to the candidate word’s context. However, some words unrelated to the keyword still appear in the top 10 closest words. To evaluate the proposed method, we re-implement the method of [17], which uses local word bigrams and trigrams, for the keyword error detection task; we name this method ‘approximate matching + N-gram probabilities’. For its keyword error candidate detection task, we also use the proposed Algorithm 1. For the keyword error detection task, we calculate the probability of the words close to the keyword error candidates according to N-gram models that include the left bigram, right bigram, and trigram. We then use the weighted combination score to select the candidate as the actual keyword error. The common metrics used for evaluation are precision, recall, and F1-score, and the results achieved by each method are given in Table 3. Table 3 shows that the proposed method using approximate matching + word2vec gives better results than approximate matching + N-gram probabilities on two metrics, precision and F1-score. On precision alone, the result improves by 5.62 percentage points. This shows the important role of word2vec in the proposed method.
Table 2 Ten keyword-candidate pairs
Product title | Keyword | Candidate | Top 10 words closest to context
kẹo nhai bổ_sung omga epa dha cho người_lớn spring | omega | omga | [‘dha’, ‘blackmores’, ‘calci’, ‘kẹo_dẻo’, ‘calcium’, ‘canxi’, ‘sữa_bột’, ‘omega’, ‘drops’, ‘omega3’]
đầm công_sở pink luccy | lucky | luccy | [‘dự_tiệc’, ‘đầm_xòe’, ‘lucky’, ‘công_sở’, ‘eden’, ‘maxi’, ‘satin’, ‘dress’, ‘tiểu_thư’, ‘lamer’]
áo_quần bé_trai burbery cực sang | burberry | burbery | [‘chảnh’, ‘bé_trai’, ‘bé_gái’, ‘đồ_bộ’, ‘công_tử’, ‘ba_lô’, ‘burberry’, ‘siêu’, ‘ge05’, ‘dép_quai_hậu’]
áo_khoác vest playe | player | playe | [‘khoác’, ‘áo_khoác’, ‘bomber’, ‘mangto’, ‘blazer’, ‘aristino’, ‘player’, ‘măng_tô’, ‘lamer’, ‘cardigan’]
đầm ren hoa angelina | angela | angelina | [‘đầm_xòe’, ‘bèo’, ‘maxi’, ‘lamer’, ‘đầm’, ‘angela’, ‘voan’, ‘ren’, ‘xòe’, ‘babydoll’]
áo_thun nam cộc tay arizona chất vải cotton thấm_hút mồ_hôi | ariza | arizona | [‘cotton’, ‘aristino’, ‘mồ_hôi’, ‘cộc’, ‘thấm_hút’, ‘mềm_mát’, ‘thun’, ‘ariza’, ‘dệt’, ‘đũi’]
áo_len nam luci freeship cao_cấp cổ tim | caluci | luci | [‘aristino’, ‘lamer’, ‘unisex’, ‘mmoutfit’, ‘biluxury’, ‘áo_len’, ‘luci’, ‘áo_sơ_mi’, ‘áo_thun’, ‘narsis’]
nước_hoa omnia green jadore 65 ml | jade | jadore | [‘nước_hoa’, ‘musk’, ‘intense’, ‘perfume’, ‘eau’, ‘jade’, ‘bvlgari’, ‘homme’, ‘edt’, ‘mist’]
lindashop áo_sơ_mi kio cổ sen phối ren | kiko | kio | [‘lamer’, ‘bèo’, ‘mmoutfit’, ‘cổ’, ‘đầm_xòe’, ‘kio’, ‘rosara’, ‘ren’, ‘hoa_cúc’, ‘sơ_mi’]
chuột không_dây logitêch m325 | logitech | logitêch | [‘forter’, ‘dareu’, ‘logitech’, ‘fuhlen’, ‘chuột_quang’, ‘bosston’, ‘chuột’, ‘gaming’, ‘chuột_máy_tính’, ‘v181’]
1 https://pypi.org/project/gensim/
2 https://pypi.org/project/simstring-pure/
Table 3 Experimental results and comparison for detecting keyword errors
Method | Precision | Recall | F1-score
Approximate matching + N-gram probabilities | 77.99 | 80.30 | 79.13
Our proposed method | 83.61 | 79.45 | 81.47
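The F1-scores in Table 3 can be cross-checked against the reported precision and recall using the standard harmonic-mean definition. The short sketch below does this; because the inputs are the rounded table values, agreement is only expected to within rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# Values copied from Table 3.
baseline_f1 = f1_score(77.99, 80.30)  # approximate matching + N-gram probabilities
proposed_f1 = f1_score(83.61, 79.45)  # approximate matching + word2vec
precision_gain = 83.61 - 77.99        # difference in percentage points
```

The precision gain computed this way is an absolute difference in percentage points, not a relative improvement.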
4 Conclusion In this paper, we proposed a method that uses an approximate retrieval technique and a word2vec model to detect keyword errors in product titles. The experimental results show that the proposed method is effective when word2vec is used to find the top words closest to the context of an error candidate. In the future, we aim to study deep learning models (e.g., BERT models) to build context representations that capture semantics and to find the top close words with better context. We also aim to study rules that capture the failure cases of the SimWord algorithm and to build a keyword error detection system that can be used effectively in practice. In addition, these results can be used in other systems, such as automatic detection of keyword variants and autocompletion of input strings in search systems.
References
1. Singh K, Sunder V (2017) Lazada product title quality challenge: an ensemble of deep and shallow learning to predict the quality of product titles. In: International conference on information and knowledge management, AnalytiCup
2. Mudumbi T, Bian N, Zhang Y, Hazoume F (2019) An approach combined the faster RCNN and mobilenet for logo detection. J Phys Conf Ser
3. Bozbura M, Tunc H, Kusak M, Sakar C (2019) Detection of e-commerce anomalies using LSTM-recurrent neural networks. In: Proceedings of the 8th international conference on data science, technology and applications, pp 217–224
4. Cao S, Yang X, Chen C, Zhou J, Li X, Qi Y (2019) TitAnt: online real-time transaction fraud detection in ant financial. Proc VLDB Endowment 12(12):2082–2093
5. Ramachandran L, Murthy U (2019) Ghosting: contextualized query auto-completion on Amazon search. In: Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, pp 1377–1378
6. Mane MR, Kedia S, Mantha A, Guo S, Achan K (2020) Product title generation for conversational systems using BERT. arXiv preprint arXiv:2007.11768
7. Hartveld A, Keulen MV, Mathol D, Noort TV, Plaatsman T, Frasincar F, Schouten K (2018) An LSH-based model-words-driven product duplicate detection method. In: International conference on advanced information systems engineering, pp 409–423
8. Zhu T, Wang Y, Li H, Wu Y, He X, Zhou B (2020) Multimodal joint attribute prediction and value extraction for e-commerce product. In: Proceedings of the 2020 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 2129–2139
9. Pham D (2021) Sensitive keyword detection on textual product data: an approximate dictionary matching and context-score approach. Indian J Comput Sci Eng (IJCSE) 12(3):653–660
10. Manoharan JS (2021) Capsule network algorithm for performance optimization of text classification. J Soft Comput Paradigm (JSCP) 3(1):1–9
11. Okazaki N, Tsujii J (2010) Simple and efficient algorithm for approximate dictionary matching. In: Proceedings of the 23rd international conference on computational linguistics, pp 851–859
12. Cislak A, Grabowski S (2017) A practical index for approximate dictionary matching with few mismatches. In: Computing and informatics, vol 36, no 5, pp 1088–1106
13. Manku GS, Singh G, Jain A, Sarma AD (2007) Detecting near-duplicates for web crawling. In: WWW ’07: proceedings of the 16th international conference on world wide web, pp 141–150
14. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 29(22):4633–4642
15. Manku GS, Jain A, Das Sarma A (2007) Detecting near-duplicates for web crawling. In: Proceedings of the 16th international conference on world wide web, pp 141–150
16. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of workshop at ICLR
17. Bidyut B, Chaudhuri A (2013) A simple real-word error detection and correction using local word bigram and trigram. In: Proceedings of ROCLING, pp 211–220
18. Hu Y, Jing X, Ko Y, Rayz JT (2021) Misspelling correction with pre-trained contextual language model. arXiv preprint arXiv:2007.11768
19. Bravo-Candel D, López-Hernández J, García-Díaz JA, Molina-Molina F, García-Sánchez F (2021) Automatic correction of real-word errors in Spanish clinical texts. Sensors
20. Prabhakar G (2020) A context-sensitive real-time Spell Checker with language adaptability. In: Proceedings of 14th international conference on semantic computing (ICSC), pp 116–122
21. Hao L, Wang Y, Liu X, Sheng Z, Wei S (2018) Spelling error correction using a nested rnn model and pseudo training data. arXiv preprint arXiv:1811.00238
AI-Based Stress State Classification Using an Ensemble Model-Based SVM Classifier Dongkoo Shon, Kichang Im, and Jong-Myon Kim
Abstract The EEG signal is an electrical flow between brain neurons, and it varies with mental and physical state. In this paper, stress is classified by analyzing EEG signals with artificial intelligence. Using the DEAP dataset, the stress state and the non-stress state were separated and used to train an artificial intelligence algorithm. Statistical, power spectrum density (PSD), and higher-order crossings (HOC) features are used as input to the algorithm. An ensemble-based SVM classifier was trained on these features for each subject. To assess classification accuracy, we compared the results with those of a feature selection algorithm based on GA. The ensemble-based SVM classifier showed better accuracy: 77.51%, compared with 71.76% for GA-based feature selection. Keywords Stress detection · Support vector machine · Machine learning · Brain wave
1 Introduction According to the Early Warning of Major Disasters related to the shipbuilding industry [1] released by the Korea Occupational Safety and Health Agency (KOSHA), serious accidents continue to occur every year. Figure 1 shows the number of serious accidents per year in the shipbuilding industry. Serious accidents are among the industrial accidents with a high degree of fatality, so it is urgent to reduce their continued occurrence. According to McSween [2], the majority of occupational accidents are due to unsafe behavior of workers, and 76% of injuries at DuPont over 10 years were caused by unsafe behaviors. An inappropriate physical or mental state of workers causes stress, and stress affects brain waves, heart rate, hormones, and more [3, 4]. Brain waves are electrical signals that indirectly reveal the activity of the brain. They can be obtained by methods such as electroencephalography (EEG) and electrocorticography (ECoG) [5]. ECoG measures brain waves by attaching electrodes directly to the cerebral cortex. It yields the cleanest signal but has the disadvantage of requiring surgery to implant the electrodes. EEG measures brain waves by attaching electrodes to the scalp, outside the skull. For brain-wave-based operator monitoring, EEG is suitable because it is relatively easy to apply. If all features are applied to a single classifier, optimal classification performance may not be obtained, so higher performance can be expected when an individual classification model is used for each feature type. When an SVM classification model is generated for each feature type, each model can classify well on its own feature type, and the overall classification performance can be expected to improve. Various studies [6–9] using ensemble techniques such as ensemble model-based SVMs are in progress in the field of machine learning. In this paper, an ensemble-based SVM method is proposed to classify stress status. Classification accuracy was obtained by classifying stressed and non-stressed states using individually trained SVMs and, for performance verification, compared with the classification accuracy of GA-based feature selection.
Fig. 1 Number of serious accidents per year in the shipbuilding industry (x-axis: year, 2013–2020; y-axis: number of accidents)
D. Shon (B) · J.-M. Kim Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Korea e-mail: [email protected] J.-M. Kim e-mail: [email protected] K. Im ICT Convergence Safety Research Center, University of Ulsan, Ulsan, South Korea e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_45
2 Materials and Methods 2.1 Overall The overall process of the proposed stress classification is shown in Fig. 2. The experiments consist of data annotation, which labels two states (stress and calm) in the dataset, and classification, which distinguishes the stress and calm states. For classification, a separate SVM is trained on each of three feature types (statistical, PSD, and HOC features), and the classification results are then input to a decision function to derive the final classification result.
Fig. 2 Overall process
Fig. 3 Valence-arousal space
2.2 Dataset In this paper, we used the DEAP dataset [10], an open brain wave dataset created for emotion classification experiments. In this dataset, EEG signals were acquired while subjects watched music videos. Koelstra et al. [10] provide ratings such as arousal and valence, and emotions in this dataset are expressed in the valence-arousal space, as shown in Fig. 3. The preprocessed data provided in [10] was used for the experiments in this paper.
2.3 Data Annotation The stress and calm states were determined with the method used in the experiments of [11].
$$(\text{arousal} < 4) \cap (4 < \text{valence} < 6) \tag{1}$$
As shown in Eq. 1, if the arousal was less than 4 and the valence was between 4 and 6, the data were labeled as calm.
$$(\text{arousal} > 5) \cap (\text{valence} < 3) \tag{2}$$
As shown in Eq. 2, if the arousal was greater than 5 and the valence was less than 3, the data were labeled as stress. Figure 4 provides a visual representation of the data annotation criteria for the stress and calm states. After separating the stress and calm states for each subject, 7 of the 32 subjects had data in only one of the two states and were excluded from the training data. Therefore, the following experiments were performed on the remaining 25 subjects.
Fig. 4 The data annotation criteria of stress and calm state
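The annotation rules of Eqs. 1 and 2, together with the exclusion of subjects that end up with only one state, can be sketched as follows. The function names and the input layout (a mapping from subject ID to a list of (valence, arousal) ratings) are assumptions made for this illustration, as is the choice to drop trials that satisfy neither rule.

```python
def label_trial(valence, arousal):
    """Return 'calm' (Eq. 1), 'stress' (Eq. 2), or None if neither rule applies."""
    if arousal < 4 and 4 < valence < 6:
        return "calm"
    if arousal > 5 and valence < 3:
        return "stress"
    return None

def usable_subjects(ratings_by_subject):
    """Keep only subjects whose labeled trials contain both states."""
    kept = {}
    for subject_id, ratings in ratings_by_subject.items():
        labels = [label_trial(v, a) for v, a in ratings]
        labels = [lab for lab in labels if lab is not None]
        if {"calm", "stress"} <= set(labels):
            kept[subject_id] = labels
    return kept
```

A subject whose trials are all calm (or all stress) cannot be used to train a two-class classifier, which is why such subjects are filtered out before training.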
2.4 Feature Extraction The EEG signals in DEAP are separated by trial, and the signal from each trial is 63 s long. Since each subject has only 40 trials, the training data for the machine learning algorithm are insufficient. Insufficient training data may cause overfitting, so that the algorithm performs poorly on test data or real data. Therefore, the 8064 samples of each trial were divided into 32 equal parts, so that each input used for training and classification is about 2 s long (252 samples). In this way, the EEG signal of one trial, originally a single data item, was divided into 32 items to facilitate training. The extracted features are statistical features, frequency domain features, and higher-order crossings, which are commonly used for EEG emotion analysis in previous studies [11–17]. Statistical Feature. In this paper, the following six statistical features from [8, 9] were used.
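$$\mu_x = \frac{1}{N}\sum_{n=1}^{N} X(n) \tag{3}$$
$$\sigma_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X(n)-\mu_x\right)^2} \tag{4}$$
$$\delta_x = \frac{1}{N-1}\sum_{n=1}^{N-1}\left|X(n+1)-X(n)\right| \tag{5}$$
$$\tilde{\delta}_x = \frac{\delta_x}{\sigma_x} \tag{6}$$
$$\gamma_x = \frac{1}{N-2}\sum_{n=1}^{N-2}\left|X(n+2)-X(n)\right| \tag{7}$$
$$\tilde{\gamma}_x = \frac{\gamma_x}{\sigma_x} \tag{8}$$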
where X(n) in Eqs. 3–8 is a segment obtained by dividing the preprocessed EEG data [10] into 32 equal parts, so N = 252. Power Spectrum Density Feature. One widely used feature is the power spectrum density (PSD) [13]. In EEG research, the frequency domain is generally divided into five bands [13–15], and the range of each band can be adjusted slightly according to the researcher's definition. In this paper, 7 bands were used with reference to [12], and the PSD was extracted with Welch's method [16]. Since 7 features were obtained for each of the 32 EEG channels, the number of power spectrum density features obtained from one data segment is 192. Higher-Order Crossing Feature. Higher-order crossings (HOC), mentioned in [17] as being related to the emotional state, were used as a feature. Zero-crossing counts the number of times a signal passes through zero. Because the operation takes little time, it is an efficient method for spectrum analysis. The HOC feature is defined as a sequence of D, as shown in the following equation.
$$\mathrm{HOC} = [D_1, D_2, \ldots, D_M] \tag{9}$$
where M is the order of the HOC, set to 5 in this paper. Since 5 features were extracted for each of the 32 EEG channels, the number of HOC features extracted from one data segment is 160.
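The segmentation, the six statistical features of Eqs. 3–8, and the HOC sequence of Eq. 9 can be sketched in plain Python for a single channel. Two points are assumptions rather than statements from the paper: the standard deviation uses the 1/N normalization with a square root, and D_k is taken to be the number of zero crossings after k−1 applications of the backward difference to the mean-removed signal, which is a common definition that the text does not spell out. The function names are illustrative.

```python
from math import sqrt

def split_segments(signal, parts=32):
    """Divide one trial (e.g. 8064 samples) into equal parts (252 samples each)."""
    size = len(signal) // parts
    return [signal[i * size:(i + 1) * size] for i in range(parts)]

def statistical_features(x):
    """Mean, standard deviation, and raw/normalized 1st and 2nd differences."""
    n = len(x)
    mu = sum(x) / n
    sigma = sqrt(sum((v - mu) ** 2 for v in x) / n)  # undefined ratios if sigma == 0
    delta = sum(abs(x[i + 1] - x[i]) for i in range(n - 1)) / (n - 1)
    gamma = sum(abs(x[i + 2] - x[i]) for i in range(n - 2)) / (n - 2)
    return [mu, sigma, delta, delta / sigma, gamma, gamma / sigma]

def hoc_features(x, order=5):
    """Zero-crossing counts D_1..D_order of successively differenced signals."""
    mu = sum(x) / len(x)
    z = [v - mu for v in x]
    counts = []
    for _ in range(order):
        signs = [v >= 0 for v in z]
        counts.append(sum(a != b for a, b in zip(signs, signs[1:])))
        z = [b - a for a, b in zip(z, z[1:])]  # apply the difference operator
    return counts
```

Applied to each of the 32 channels of a 252-sample segment, `hoc_features` with `order=5` yields 5 × 32 = 160 values, matching the HOC feature count given above.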
2.5 Classification To improve classification performance, each SVM was trained individually on the feature type suited to it. The classification results from the SVMs trained on different features are input to the decision function to derive the final classification result. Figure 5 shows the ensemble model-based SVM concept. In the decision function stage, the results from each SVM are collected, and the following equations are used for state classification. In other words, a majority vote is used to aggregate the classification results from the three SVMs. Equations 10 and 11 show the decision function that determines whether the state is stress or not.
$$\text{Decision} = \sum_{n=1}^{3} \text{SVM}_n \tag{10}$$
$$\text{Status} = \begin{cases} \text{normal}, & \text{if Decision} \ge 2 \\ \text{stress}, & \text{otherwise} \end{cases} \tag{11}$$
where n indexes the three SVM models.
Fig. 5 Ensemble model-based SVM concept diagram
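The decision function of Eqs. 10 and 11 amounts to a few lines of code. The sketch below assumes that each SVM outputs 1 for the normal (calm) state and 0 for the stress state, which is the encoding under which a threshold of 2 on the sum of three outputs is a majority vote; the function name is illustrative.

```python
def decide(svm_outputs):
    """Majority vote over three 0/1 predictions (1 = normal, 0 = stress)."""
    decision = sum(svm_outputs)                      # Eq. 10
    return "normal" if decision >= 2 else "stress"   # Eq. 11

# Example: the statistical and PSD models vote normal, the HOC model votes stress.
status = decide([1, 1, 0])
```

With three voters and two classes a tie is impossible, so no tie-breaking rule is needed.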
3 Experimental Results To evaluate the classification results of the proposed method, they were compared with the results of the method in [18], which uses KNN as the classifier. Table 1 and Fig. 6 show the classification performance of the ensemble model-based SVM and the method of [18] for each subject. The proposed method showed a classification accuracy of at least 63.91% and at most 91.25%, depending on the subject. In contrast, the minimum accuracy of [18] was 52.60% and the maximum was 91.42%. This result means that the ensemble model-based SVM is better overall for stress classification. On average, the ensemble-based SVM showed 5.75 percentage points higher classification performance than [18]. Table 2 shows the average classification performance of the ensemble model-based SVM and the GA-based feature selection method. Also, Fig. 7 shows that the ensemble model-based SVM technique has better overall performance.
Table 1 Comparison of [18] method and ensemble model-based SVM classification performance
Participant no. | GA-based feature selection + KNN [18] precision (%) | Proposed method precision (%)
1 | 87.13 | 85.00
2 | 61.64 | 82.50
4 | 52.60 | 75.21
5 | 68.15 | 69.75
8 | 72.51 | 76.75
10 | 61.83 | 80.18
11 | 75.72 | 79.52
12 | 72.45 | 73.41
13 | 78.84 | 77.21
14 | 83.65 | 84.71
15 | 91.42 | 91.25
16 | 64.84 | 74.90
18 | 79.68 | 78.75
19 | 61.22 | 75.00
20 | 62.49 | 63.91
21 | 72.64 | 67.34
22 | 62.14 | 69.56
24 | 77.08 | 77.78
25 | 60.56 | 65.31
26 | 89.11 | 89.38
27 | 88.19 | 88.89
28 | 68.23 | 69.17
29 | 68.23 | 83.54
31 | 57.81 | 76.98
Fig. 6 Comparison graph of GA-based feature selection method and ensemble model-based SVM classification performance
Table 2 Comparison of ensemble model-based SVM average classification performance
Classification | Average precision (%)
GA-based feature selection + KNN [18] | 71.76
Proposed method | 77.51
Fig. 7 Comparison chart of ensemble model-based SVM average classification performance
4 Conclusion In this paper, an ensemble model-based SVM is proposed as an algorithm for classifying stress from EEG. The experiments used DEAP, an EEG dataset released for emotion classification experiments, and three SVMs were trained, one for each feature type: statistical features, power spectrum density features, and higher-order crossings. A majority vote was used to aggregate the classification results from the three SVMs. In the stress classification experiment, the ensemble model-based SVM achieved an average classification success rate of 77.51%, which is 5.75 percentage points higher than the 71.76% obtained with the method of [18]. Therefore, the proposed method showed better classification performance overall. Future work will be devoted to conducting further experiments to evaluate the accuracy of the proposed method. In addition, we intend to develop this work into a study that can be used in practice in industrial fields, through stress classification experiments with EEG collected using wearable measuring equipment. Finally, we would like to study the classification of fatigue and stress, which are other factors that can cause industrial accidents. Acknowledgements This work was supported by the Ulsan City & Electronics and Telecommunications Research Institute (ETRI) grant funded by the Ulsan City [21AS1600, the development of intelligentization technology for the main industry for manufacturing innovation and Human-mobile-space autonomous collaboration intelligence technology development in industrial sites]. This work was also supported by the Technology Infrastructure Program funded by the Ministry of SMEs and Startups (MSS, Korea).
References
1. Korea Occupational Safety and Health Agency (KOSHA) Early warning of major disasters related to the shipbuilding industry. https://www.kosha.or.kr/kosha/data/shipbuilding_A.do
2. McSween TE (2003) The values-based safety process: improving your safety culture with behavior-based safety. Wiley
3. Pickering TG (2001) Mental stress as a causal factor in the development of hypertension and cardiovascular disease. Curr Hypertens Rep 3:249–254
4. Kim HG, Cheon EJ, Bai DS, Lee YH, Koo BH (2018) Stress and heart rate variability: a meta-analysis and review of the literature. Psychiatry Investig 15(3):235
5. Graimann B, Townsend G, Huggins JE, Schlögl A, Levine SP, Pfurtscheller G (2005) A comparison between using ECoG and EEG for direct brain communication. In: Proceedings of EMBEC05
6. Kim T-J, Jang H-Y, Park J, Hwang S, Zhang B-T (2014) Ensemble methods with increasing data for online handwriting recognition. J KIISE 41(2):164–170
7. Seo M-J, Kim M (2019) Ensemble method of emotion classifier for speech emotion recognition. J Korea Soc Inf Technol Policy Manage 11(2):1187–1193
8. Vijayakumar T, Vinothkanna R, Duraipandian M (2021) Fusion based feature extraction analysis of ECG signal interpretation - a systematic approach. J Artif Intell 3(1):1–16
9. Samuel MJ (2021) Study of variants of extreme learning machine (ELM) brands and its performance measure on classification algorithm. J Soft Comput Paradigm (JSCP) 3(2):83–95
10. Koelstra S, Muhl C, Soleymani M, Lee J, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2012) DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput 3:18–31
11. Bastos-Filho TF, Ferreira A, Atencio AC, Arjunan S, Kumar D (2012) Evaluation of feature extraction techniques in emotional state recognition. In: 4th international conference on intelligent human computer interaction (IHCI). IEEE, pp 1–6
12. Jenke R, Peer A, Buss M (2014) Feature extraction and selection for emotion recognition from EEG. IEEE Trans Affect Comput 5(3):327–339
13. Lin YP, Wang CH, Jung TP, Wu TL, Jeng SK, Duann JR, Chen JH (2010) EEG-based emotion recognition in music listening. IEEE Trans Biomed Eng 57:1798–1806
14. Rozgić V, Vitaladevuni SN, Prasad R (2013) Robust EEG emotion classification using segment level decision fusion. In: Acoustics, speech and signal processing (ICASSP), pp 1286–1290
15. Ackermann P, Kohlschein C, Bitsch JA, Wehrle K, Jeschke S (2016) EEG-based automatic emotion recognition: feature extraction, selection and classification methods. In: e-health networking, applications and services, pp 1–6
16. Welch P (1967) The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans Audio Electroacoustic 15:70–73
17. Petrantonakis PC, Hadjileontiadis LJ (2009) Emotion recognition from EEG using higher order crossings. IEEE Trans Inf Technol Biomed 14(2):186–197
18. Shon D, Im K, Park JH, Lim DS, Jang B, Kim JM (2018) Emotional stress state detection using genetic algorithm-based feature selection on EEG signals. Int J Environ Res Public Health 15(11):2461
Designing a Secure Vehicular Internet of Things (IoT) Using Blockchain
Atul Lal Shrivastava and Rajendra Kumar Dwivedi
Abstract Smart vehicles are interconnected and deliver a variety of sophisticated services to their owners, transit authorities, automobile manufacturers, and other service providers. Smart cars can be exposed to a number of security and privacy risks, including GPS tracking and remote vehicle hijacking. Blockchain is a game-changing technology that can be used for everything from cryptocurrencies to smart contracts, and it may be a viable solution for implementing security in vehicular IoT. In this paper, we present a blockchain-based methodology that preserves users' privacy while also improving vehicle security. The proposed model upgrades the services of vehicular IoT.

Keywords Blockchain · Vehicular networks · IoT · Cloud computing
1 Introduction
Automobiles and roadside units (RSUs) form self-organizing wireless networks called vehicular ad hoc networks (VANETs). Real-time dynamic communication between vehicles and RSUs provides efficient and long-lasting data transmission. As a result of the broad deployment of VANETs, intelligent transportation systems (ITSs) are now feasible [1, 2]. A varied collection of VANET-based applications, which can be classified as safety-related or commercial, increases not only driving safety but also driving enjoyment. Safety-related applications include emergency vehicle warnings, traffic management reports, road accident notification, and speed monitoring [3, 4]. Commercial applications that provide convenience and entertainment include weather forecasting, broadcasting information from neighboring gas stations and restaurants, navigation, and Internet connectivity. We present a secure certificateless authentication solution for vehicular ad hoc networks in this paper. Centralized and decentralized networks in the IoT are illustrated in Fig. 1, and blockchain is illustrated in Fig. 2.

A. L. Shrivastava (B) · R. K. Dwivedi
Department of Information Technology and Computer Application, MMMUT, Gorakhpur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_46
Fig. 1 Centralized and decentralized network in IoT
Fig. 2 Blockchain
This paper is organized as follows. Section 2 reviews the literature. Section 3 describes the research gaps. Section 4 describes the background and preliminaries of vehicular IoT using blockchain. Section 5 describes the system model. Section 6 describes the methodology to fill the gap, including the proposed algorithm and the simulation tool. Section 7 presents the result. Section 8 concludes this paper and describes future directions.
2 Literature Review
Security in vehicular Internet of Things (IoT) using blockchain for VANETs has received a lot of attention in recent years. Lu et al. [5] presented a dynamic key management approach for location-based services (LBSs) in which user privacy is protected and keys are updated. Each LBS session is broken down into a number of time slots, each with its own set of activities. Following that, a vehicular data authentication mechanism with probabilistic verification was given in [6] for malicious behavior detection. In addition, a hashed message authentication code (HMAC) has been used as the authentication code in order to avoid the delays caused by checking the certificate revocation list (CRL) and by group signatures. Chuang and Lee [7] developed a decentralized authentication security system (TEAM) for V2V communication. Notably, its results are improved by using a transitive trust relationship. Schemes that emphasize privacy preservation and lightweight operation have also been proposed [8, 9]. For VANET authentication, identity-based public-key cryptography (ID-PKC) [10] in particular has been frequently used for certificate management, and several efficient authentication methods have recently been introduced. Zhang et al. [11] proposed an identity-based batch signature verification scheme for V2R communications. Jung et al. [12] devised an identity-based universal re-encryption strategy. This technique, however, is susceptible to replay attacks [13]. Meanwhile, ACPN [14] offers a new VANET authentication framework with conditional privacy preservation and non-repudiation, in which self-generated PKC-based pseudo IDs are used. He et al. [15] then devised an efficient identity-based conditional privacy-preserving authentication (CPPA) scheme for VANETs. It is worth mentioning that bilinear pairing operations are not used, so the computation cost is fairly small. Two similar CPPA schemes for VANETs were proposed in [1] and [16].
3 Research Gaps
On the basis of the literature survey, we identified the following research gaps:
• Proximity: Because vehicles are mobile and possibly fast-moving nodes in a network, they need to change parent nodes frequently. Connections to adjacent peer nodes are more reliable than connections to distant parent nodes such as a cellular tower. A DSRC protocol satisfies the physical range criteria, as indicated in the preceding section [17, 18].
• Latency: Low latency is crucial for vehicles given their potential speed. Latency between any two peers is reduced if nodes interact with one another (or even via each other) using DSRC [3, 19].
• Decentralization: When connections are spread across different paths, network traffic bottlenecks are reduced.
• Fault tolerance: A network with more connections between nodes is better able to withstand disturbances. This is one of the main advantages of the peer-to-peer (P2P) approach. Because they are not connected to the grid, vehicle peers are unaffected by power or wired network disruptions [15, 16].
On the basis of the identified research gaps, we use blockchain to create a secure asymmetric cryptographic system. Asymmetric cryptography is commonly used to secure sensitive data and enables public-key encryption. It is especially beneficial when delivering data across an insecure network such as the Internet.
4 Background and Preliminaries
This section describes the preliminaries of our work.
4.1 Elliptic Curve Cryptography (ECC)
Let $F_p$ be a finite field of order $p$, where $p > 3$ is a prime number. An elliptic curve $E_p(a, b)$ over the finite field $F_p$ is described by the equation

$y^2 = x^3 + ax + b$

where $a, b \in F_p$ must satisfy $4a^3 + 27b^2 \neq 0$.

Key Generation: Select a number $d$ in the range 1 to $n - 1$ and compute

$Q = d \cdot P$

where $Q$ is the public key and $d$ is the private key. Here $P$ is a base point on the curve, and $n$ is taken to be its order.

Encryption: The agreement on a common integer (the key) lies at the heart of all encryption. The agreed number is then used to encrypt a message by shifting the characters. For a message $M$ and a randomly chosen integer $k$, two ciphertexts $s_1$ and $s_2$ are generated:

$s_1 = k \cdot P$ (1)

$s_2 = M + k \cdot Q$ (2)

Decryption: The receiving end uses the same agreed number to decrypt the message. ECC is a way of securing the agreement on this key. The message is recovered as

$M = s_2 - d \cdot s_1$ (3)
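Equations (1) to (3) can be checked on a toy curve. The sketch below is an illustration under stated assumptions, not part of the proposed system: the curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $P = (5, 1)$ of order 19 is a textbook-sized example chosen here, far too small to be secure, and the message is assumed to be already encoded as a curve point. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy parameters (assumed for illustration only, NOT secure):
# curve y^2 = x^3 + 2x + 2 over F_17, base point P = (5, 1) of order n = 19.
P_MOD, A, B = 17, 2, 2
BASE = (5, 1)
ORDER = 19
INF = None  # point at infinity, the identity element of the group


def on_curve(pt):
    """Check that a point satisfies y^2 = x^3 + ax + b (mod p)."""
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def inv(v):
    """Modular inverse in F_p."""
    return pow(v % P_MOD, -1, P_MOD)


def add(p1, p2):
    """Add two curve points."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def neg(pt):
    """Negate a curve point."""
    return INF if pt is INF else (pt[0], (-pt[1]) % P_MOD)


def mul(k, pt):
    """Scalar multiplication k * pt by double-and-add."""
    result, addend = INF, pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def encrypt(m_point, public_q, k):
    """Eqs. (1) and (2): s1 = k*P, s2 = M + k*Q."""
    return mul(k, BASE), add(m_point, mul(k, public_q))


def decrypt(s1, s2, d):
    """Eq. (3): M = s2 - d*s1."""
    return add(s2, neg(mul(d, s1)))


d = 7                        # private key, 1 <= d <= n - 1
Q = mul(d, BASE)             # public key Q = d*P
M = mul(3, BASE)             # message, assumed already encoded as a point
s1, s2 = encrypt(M, Q, k=5)  # k would be random in practice
recovered = decrypt(s1, s2, d)
```

Decryption works because $s_2 - d \cdot s_1 = M + k \cdot d \cdot P - d \cdot k \cdot P = M$.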
4.2 Hash Function
A one-way hash function $h$ is deemed secure if it fits the following criteria [20]. (1) Given a message $x$ of any length, it is simple to compute a message digest $h(x)$ with a fixed-length output. (2) Given $y$, calculating $x = h^{-1}(y)$ is hard. (3) Given $x$, it is computationally infeasible to compute $x' \neq x$ such that $h(x') = h(x)$.

The Chinese remainder theorem (CRT): assume that $k_1, \ldots, k_n$ are positive integers that are pairwise relatively prime. For any given set of integers $a_1, \ldots, a_n$, the system of congruences $x \equiv a_i \bmod k_i$, $i \in [1, n]$, has a solution [21], and there is only one solution modulo $g = \prod_{i=1}^{n} k_i$. The solution is $C = \sum_{i=1}^{n} a_i \alpha_i \beta_i \bmod g$, where $\alpha_i = g / k_i$ and $\alpha_i \beta_i \equiv 1 \bmod k_i$.

The speed of the time sequences can vary. The DTW method is frequently used to calculate the distance or similarity between time series automatically.
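The CRT construction of Sect. 4.2 can be verified numerically. The following sketch computes $C = \sum a_i \alpha_i \beta_i \bmod g$ directly from the definitions; the function name `crt` and the sample residues are illustrative choices, and the moduli are assumed to be pairwise relatively prime (Python 3.8 or later).

```python
from math import prod


def crt(residues, moduli):
    """Solve x = a_i (mod k_i) for pairwise coprime moduli k_i."""
    g = prod(moduli)                # g = k_1 * ... * k_n
    total = 0
    for a_i, k_i in zip(residues, moduli):
        alpha = g // k_i            # alpha_i = g / k_i
        beta = pow(alpha, -1, k_i)  # alpha_i * beta_i = 1 (mod k_i)
        total += a_i * alpha * beta
    return total % g


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
solution = crt([2, 3, 2], [3, 5, 7])
print(solution)
```

Each term $a_i \alpha_i \beta_i$ is congruent to $a_i$ modulo $k_i$ and to 0 modulo every other modulus, which is why the sum satisfies all congruences at once.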
4.3 Dynamic Time Warping (DTW)
Dynamic time warping (DTW) [22] is an effective approach for obtaining the best alignment between two time-dependent sequences (time series). It is important to keep in mind that the length and collected and forwarded to TA for analysis.
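As an illustration of how DTW aligns two sequences of different length, the sketch below implements the classic dynamic-programming recurrence with an absolute-difference local cost. It is a generic textbook formulation, assumed here for clarity rather than taken from the paper, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b
                cost[i][j - 1],      # b[j-1] also matches the previous a
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


# The second series is a slowed-down copy of the first one.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because one sample may be matched to several samples of the other series, a sequence and its slowed-down copy are at distance zero, whereas a point-by-point comparison is not even defined for unequal lengths.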
TA also stores the sensitive keys assigned to RSUs and vehicles. TA is assumed to have sufficient storage and processing capabilities for this purpose. Furthermore, because TA is the only valid verifier for the whole VANET, all participating vehicles must first be verified by it. The fast growth of cloud computing, in particular, makes it easier to connect traditional VANETs to cloud servers. As a result, vital information and sensitive user data can be stored on various cloud servers, and the computation capabilities of TA can be improved at the same time. In recent years, cloud-assisted VANET research has attracted a lot of attention [23, 24].
5 System Model
The three main components of the VANET system are described in this section. The design of the proposed VANET system is shown in Fig. 2.
5.1 Trusted Authority (TA)
The trusted authority (TA) is the trusted command and control center of the VANET system. TA is in charge of vehicle registration and verification, key management, and other major activities. We assume that TA is always honest and trustworthy. As seen in Fig. 1, TA provides a variety of services to authorized vehicles, including Internet access, weather forecasts, and navigation. Meanwhile, vehicle data such as traffic congestion statistics are gathered by TA.
5.2 Road-Side Unit (RSU)
The RSU is a dedicated facility that acts as the connection between TA and vehicles on the road. Its job is to communicate with approaching vehicles via dedicated short-range communication (DSRC). In real-world settings, RSUs are stationed in remote places far from TA, some of them in hazardous environments. As a result, if these RSUs are not regularly maintained, they can easily be compromised or disabled, resulting in data leakage from the affected vehicles. Malicious attackers may be able to obtain sensitive vehicle data stored on compromised RSUs. To account for this, RSUs are designed as semi-trusted entities with limited access to vehicle data. TA collects and processes the vehicle data and other crucial information.
5.3 Vehicles
Vehicles collect data and receive VANET services. Each vehicle has an on-board unit (OBU) that allows it to communicate with road-side units as well as with other vehicles. In our system concept, the vehicle plate number serves as a unique identifier provided by TA, and each plate number is clearly linked to a single driver. The driver's fingerprint or certificate card is also used for further protection, guaranteeing that the driver and the connected vehicle are linked each time the driver starts the vehicle. For the sake of clarity, we regard the driver and the vehicle as a single entity in this research. Security in vehicular IoT using blockchain is shown in Fig. 3.
6 Methodology to Fill the Gap
Blockchain combines consensus procedures with peer-to-peer computing, and in doing so it provides a decentralized and safe platform for sharing information. Digital encryption technologies are integral to blockchain technology, which has brought blockchain cryptography to the forefront. Blockchain can use cryptographic techniques such as the RSA encryption algorithm.
Fig. 3 System model
6.1 Proposed Algorithm
The RSA algorithm is used to produce the public and private keys. The steps of this algorithm are as follows.
(1) Choose two large prime numbers $p$ and $q$.
(2) Multiply these numbers to produce $x = p \times q$, where $x$ is the modulus of encryption and decryption.
(3) Select an integer $e$ with $1 < e < \varphi(x)$, where $\varphi(x) = (p-1) \times (q-1)$, such that $e$ is coprime to $\varphi(x)$, i.e., $\gcd(e, \varphi(x)) = 1$. This means that, except for 1, $e$ and $(p-1) \times (q-1)$ have no common factor.
(4) The public key is $\langle e, x \rangle$. It is used to encrypt a plaintext message $y$, and the ciphertext $C$ is generated from the plaintext as $C = y^e \bmod x$. Here $y$ must be less than $x$. A message longer than $x$ is handled as a collection of messages, each of which is encrypted separately.
(5) To determine the private key, compute $d$ such that $d \cdot e \bmod \varphi(x) = 1$. The private key is $\langle d, x \rangle$.
(6) The private key is used to decrypt a ciphertext message $C$, and the plaintext $y$ is calculated from the ciphertext as $y = C^d \bmod x$.
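The RSA steps above can be traced with small numbers. The sketch below keeps the notation of this section ($x$ for the modulus, $y$ for the plaintext); the primes 61 and 53 and the exponent 17 are a common textbook example chosen here for illustration, and real deployments need much larger primes together with a padding scheme, neither of which is shown (Python 3.8 or later).

```python
from math import gcd


def make_keys(p, q, e):
    """Return ((e, x), (d, x)) for primes p, q and public exponent e."""
    x = p * q                  # modulus
    phi = (p - 1) * (q - 1)    # phi(x)
    if not (1 < e < phi and gcd(e, phi) == 1):
        raise ValueError("e must satisfy 1 < e < phi(x) and gcd(e, phi(x)) = 1")
    d = pow(e, -1, phi)        # d * e mod phi(x) = 1
    return (e, x), (d, x)


def encrypt(y, public_key):
    """C = y^e mod x, with the plaintext y smaller than x."""
    e, x = public_key
    if not 0 <= y < x:
        raise ValueError("plaintext block must be smaller than the modulus")
    return pow(y, e, x)


def decrypt(c, private_key):
    """y = C^d mod x."""
    d, x = private_key
    return pow(c, d, x)


public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
plaintext = decrypt(ciphertext, private_key)
```

Using the three-argument `pow` keeps every intermediate value below $x^2$, which is what makes modular exponentiation practical for large keys.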
6.2 Simulation Tool
Ethereum is a completely decentralized blockchain network. Its community is lively and responsive, and its documentation is abundant. The blockchain promises that connected objects will become completely autonomous and belong to themselves. They will be able to use code: in exchange for money (a type of code), a door will grant access (through code) for the duration of a specified time.
7 Result
The proposed methodology addresses the inefficiency, as well as the data security and privacy problems, of traditional vehicular IoT, and the suggested framework is aimed at establishing a secure vehicular IoT. In the study of vehicle data, the most important consideration is the trustworthiness or authenticity of the data. When data are generated and stored using the blockchain framework, we can be confident that the data are genuine, because they were added to the chain by various stakeholders rather than by a single controlling body. The change from manual to remote monitoring and environment control is generally seen as a more guided and successful approach. End-to-end monitoring was not possible in the traditional method, which may lead to many discrepancies. As a result, a remote network capable of detecting all connected data and alerting vehicles in the event of an irregularity could be considered one of the best alternatives.
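The authenticity argument rests on each block committing to the hash of its predecessor, so that altering stored data becomes detectable. The sketch below illustrates only this hash-linking idea with Python's standard `hashlib`; it is not an Ethereum client, it omits consensus, signatures, and networking, and the record fields (`plate`, `speed`) are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link to the predecessor."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["index"] != i or block["prev"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"plate": "VEH-001", "speed": 42})  # hypothetical records
append_block(ledger, {"plate": "VEH-002", "speed": 57})
append_block(ledger, {"plate": "VEH-001", "speed": 40})
valid_before = is_valid(ledger)

ledger[1]["data"]["speed"] = 20  # tamper with a stored record
valid_after = is_valid(ledger)
```

In a deployed blockchain the copies held by many independent stakeholders are what prevent an attacker from simply recomputing all later hashes; the single-list version here shows detection only.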
8 Conclusion and Future Directions
This article examined the possibility of employing blockchain for autonomous vehicle networks and proposed a blockchain model. The decentralized approach provides a number of advantages that are not present in traditional client–server architectures. While vehicular networking is not the ideal application for blockchain technology, examining prospective applications is nevertheless beneficial. As more powerful ITS systems are built for real-world application, a disruptive technology like blockchain will almost certainly find its way into multiple components, even if it is not a core function. Future research can extend the proposed architecture to a broader range of applications and improve its proximity, latency, decentralization, and fault tolerance.
References
1. Lo N-W, Tsai J-L (2016) An efficient conditional privacy-preserving authentication scheme for vehicular sensor networks without pairings. IEEE Trans Intell Transp Syst 17(5):1319–1328
2. Horng S-J, Tzeng S-F, Huang P-H, Wang X, Li T, Khan MK (2015) An efficient certificateless aggregate signature with conditional privacy preserving for vehicular sensor networks. Inf Sci 317:48–66
3. Shen J, Zhou T, Liu X, Chang Y-C (2018) A novel latin-square-based secret sharing for M2M communications. IEEE Trans Ind Informat 14(8):3659–3668
4. Liu B, Jia D, Wang J, Lu K, Wu L (2017) Cloud-assisted safety message dissemination in VANET-cellular heterogeneous wireless network. IEEE Syst J 11(1):128–139
5. Lu R, Lin X, Liang X, Shen X (2012) A dynamic privacy-preserving key management scheme for location-based services in VANETs. IEEE Trans Intell Transp Syst 13(1):127–139
6. Molina-Gil J, Caballero-Gil P, Caballero-Gil C (2014) Aggregation and probabilistic verification for data authentication in VANETs. Inf Sci 262:172–189
7. Chuang M-C, Lee J-F (2014) TEAM: Trust-extended authentication mechanism for vehicular ad hoc networks. IEEE Syst J 8(3):749–758
8. Zhang L, Wu Q, Domingo-Ferrer J, Qin B, Hu C (2017) Distributed aggregate privacy-preserving authentication in VANETs. IEEE Trans Intell Transport Syst 18(3):516–526
9. Shen J, Gui Z, Ji S, Shen J, Tan H, Tang Y (2018) Cloud-aided lightweight certificateless authentication protocol with anonymity for wireless body area networks. J Netw Comput Appl 106:117–123
10. Shamir A (1984) Identity-based cryptosystems and signature schemes. In: Advances in cryptology. Springer, Berlin, Germany, pp 47–53
11. Zhang C, Lu R, Lin X, Ho P-H, Shen X (2008) An efficient identity-based batch verification scheme for vehicular sensor networks. In: Proceedings of 27th conference on computer communications (INFOCOM), pp 246–250
12. Jung CD, Sur C, Park Y, Rhee K-H (2009) A robust and efficient anonymous authentication protocol in VANETs. J Commun Netw 11(6):607–614
13. Lee C-C, Lai Y-M (2013) Toward a secure batch verification with group testing for VANET. Wireless Netw 19(6):1441–1449
14. Li J, Lu H, Guizani M (2015) ACPN: a novel authentication framework with conditional privacy-preservation and non-repudiation for VANETs. IEEE Trans Parallel Distrib Syst 26(4):938–948
15. He D, Zeadally S, Xu B, Huang X (2015) An efficient identity-based conditional privacy-preserving authentication scheme for vehicular ad hoc networks. IEEE Trans Inf Forensics Secur 10(12):2681–2691
16. Sun J, Zhang C, Zhang Y, Fang Y (2010) An identity-based security system for user privacy in vehicular ad hoc networks. IEEE Trans Parallel Distrib Syst 21(9):1227–1239
17. Gao T, Deng X, Wang Y, Kong X (2018) PAAS: PMIPv6 access authentication scheme based on identity-based signature in VANETs. IEEE Access 6:37480–37492
18. Tan H, Chung I (2018) A secure and efficient group key management protocol with cooperative sensor association in WBANs. Sensors 18(11):3930
19. Jiang Q, Huang X, Zhang N, Zhang K, Ma X, Ma J (2019) Shake to communicate: Secure handshake acceleration-based pairing mechanism for wrist worn devices. IEEE Internet Things J 6(3):5618–5630
20. Tan H, Gui Z, Chung I (2018) A secure and efficient certificateless authentication scheme with unsupervised anomaly detection in VANETs. IEEE Access 6:74260–74276
21. Dwivedi RK, Kumari N, Kumar R (2020) Integration of wireless sensor networks with cloud towards efficient management in IoT: a review. In: Part of the lecture notes in networks and systems book series (LNNS), vol 94, Springer Singapore, pp 97–107
22. Haoxiang W, Smys S (2019) QoS enhanced routing protocols for vehicular network using soft computing technique. J Soft Comput Paradigm (JSCP) 1(02):91–102
23. Zhu X, Jiang S, Wang L, Li H (2014) Efficient privacy-preserving authentication for vehicular ad hoc networks. IEEE Trans Veh Technol 63(2):907–919
24. Khattak HA, Islam SU, Din IU, Guizani M (2019) Integrating fog computing with VANETs: a consumer perspective. IEEE Commun Stand Mag 3(1):19–25
25. Tan H, Choi D, Kim P, Pan S, Chung I (2018) An efficient hash-based RFID grouping authentication protocol providing missing tags detection. J Internet Technol 19(2):481–488
26. Khan AA, Abolhasan M, Ni W (2018) 5G next generation VANETs using SDN and fog computing framework. In: Proceedings of 15th IEEE annual consumer communications & networking conference (CCNC), pp 1–6
27. Ullah A, Yaqoob S, Imran M, Ning H (2019) Emergency message dissemination schemes based on congestion avoidance in VANET and vehicular FoG computing. IEEE Access 7:1570–1585
28. Song J, He C, Zhang L, Tang S, Zhang H (2014) Toward an RSU-unavailable lightweight certificateless key agreement scheme for VANETs. China Commun 11(9):93–103
29. Tan H, Choi D, Kim P, Pan S, Chung I (2018) Secure certificateless authentication and road message dissemination protocol in VANETs. Wireless Commun Mobile Comput 2018:1–13
30. Gayathri N, Thumbur G, Reddy PV, Ur Rahman MZ (2018) Efficient pairing-free certificateless authentication scheme with batch verification for vehicular ad-hoc networks. IEEE Access 6:31808–31819
31. Zheng D, Jing C, Guo R, Gao S, Wang L (2019) A traceable blockchain based access authentication system with privacy preservation in VANETs. IEEE Access 7:117716–117726
32. Madhusudhan R, Hegde M, Memon I (2018) A secure and enhanced elliptic curve cryptography-based dynamic authentication scheme using smart card. Int J Commun Syst 31(11):e3701
33. Tan H, Chung I (2019) Secure authentication and group key distribution scheme for WBANs based on smartphone ECG sensor. IEEE Access 7:151459–151474
34. Malip A, Ng S-L, Li Q (2014) A certificateless anonymous authenticated announcement scheme in vehicular ad hoc networks. Secur Commun Netw 7(3):588–601
35. Jiang Q, Ma J, Yang C, Ma X, Shen J, Chaudhry SA (2017) Efficient end-to-end authentication protocol for wearable health monitoring systems. Comput Electr Eng 63:182–195
36. Tan H, Choi D, Kim P, Pan S, Chung I (2018) Comments on 'dual authentication and key management techniques for secure data transmission in vehicular ad hoc networks.' IEEE Trans Intell Transp Syst 19(7):2149–2151
37. Ming Y, Shen X (2018) PCPA: a practical certificateless conditional privacy preserving authentication scheme for vehicular ad hoc networks. Sensors 18(5):1573
38. Jiang S, Zhu X, Wang L (2016) An efficient anonymous batch authentication scheme based on HMAC for VANETs. IEEE Trans Intell Transport Syst 17(8):2193–2204
39. Luo G, Yuan Q, Zhou H, Cheng N, Liu Z, Yang F, Shen XS (2018) Cooperative vehicular content distribution in edge computing assisted 5G-VANET. China Commun 15(7):1–17
40. Xie L, Ding Y, Yang H, Wang X (2019) Blockchain-based secure and trustworthy Internet of Things in SDN-enabled 5G-VANETs. IEEE Access 7:56656–56666
41. Zhang X, Wang D (2019) Adaptive traffic signal control mechanism for intelligent transportation based on a consortium blockchain. IEEE Access 7:97281–97295
42. Butt TA, Iqbal R, Salah K, Aloqaily M, Jararweh Y (2019) Privacy management in social Internet of vehicles: review, challenges and blockchain based solutions. IEEE Access 7:79694–79713
43. Tan H, Song Y, Xuan S, Pan S, Chung I (2019) Secure D2D group authentication employing smartphone sensor behavior analysis. Symmetry 11(8):969
44. Lu Z, Liu W, Wang Q, Qu G, Liu Z (2018) A privacy-preserving trust model based on blockchain for VANETs. IEEE Access 6:45655–45664
45. Al-Riyami SS, Paterson KG (2003) Certificateless public key cryptography. In: Advances in cryptology. Springer, Berlin, Germany, pp 452–473
46. Dhaya R, Kanthavel R (2021) Bus-based VANET using ACO multipath routing algorithm. J Trends Comput Sci Smart Technol (TCSST) 3(01):40–48
Survey on Machine Learning Algorithm for Leaf Disease Detection Using Image Processing Techniques
A. Dinesh, M. Maragatharajan, and S. P. Balakannan
Abstract Maintaining croplands is one of the critical problems for the Indian economy. Disease identification in plants therefore plays a critical role in the agriculture sector, as disease in crops is a natural occurrence. High-quality agricultural production is a must for any country's economic development, and recognizing perishing fields can be considered a way of avoiding crop loss and productivity decrease. The conventional approach to disease identification and classification requires a significant amount of time, a great deal of effort, and ongoing farm surveillance. In recent years, technological advancements and researchers' emphasis on this field have made it possible to obtain an optimized solution. Various common methods from the fields of machine learning, image analysis, and classification have been used to recognize and detect diseases in agricultural products. This study outlines different existing strategies for detecting agricultural product diseases. In addition, the paper examines the methodologies that have been used to diagnose diseases, segment the affected areas, and classify diseases. It also describes different techniques for (i) feature extraction, (ii) segmentation, and (iii) classification, together with their advantages and disadvantages.

Keywords Machine learning · Feature extraction · Segmentation · Classifier · Disease detection
A. Dinesh (B) Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil 626126, India e-mail: [email protected] M. Maragatharajan School of Computing Science and Engineering, VIT Bhopal University, Bhopal, India e-mail: [email protected] S. P. Balakannan Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_47
Fig. 1 General steps for crop detection
1 Introduction
Agriculture is an important field for the economic development of India. Various types of soil and weather conditions are available, and the farmer can select a suitable soil to cultivate the crop. Due to population growth, changing weather, and political uncertainty, agricultural industries began to look for new ways to increase food production. Researchers therefore seek new, effective, and accurate technologies to increase productivity. With precision agriculture, farmers can use information technology to collect data and information and make the right decisions for increased farm production. Precision agriculture is a relatively new technology that offers advanced techniques for increasing farm productivity in order to achieve economic development in agriculture. It can be used for a variety of purposes, including plant pest identification, weed detection, crop yield generation, and plant disease detection. Farmers use pesticides to manage pests, avoid diseases, and increase crop yield. Crop diseases cause low output and financial losses for farmers and agricultural industries, so it is important to identify a disease and its severity. In a competitive farming system, disease identification in plants is critical. A farmer typically identifies disease symptoms in plants by unaided eye observation, which requires constant monitoring. In large plantations, however, this method is more costly, and it can also be less effective. Farmers in some countries, such as India, are required to present specimens to experts, which takes time and money. The subject of this survey is the general steps followed for the automated detection of plant diseases (Fig. 1) and the machine learning classification strategies for recognizing and classifying plant diseases.
2 Related Work

In [1], the healthy and diseased sections of cotton leaves were classified using back-propagation (BP) neural network classifiers. The training and test samples
Survey on Machine Learning Algorithm for Leaf Disease Detection …
of rice brown spot images were obtained from the northern part of the Ningxia Hui Autonomous Region. Using image recognition and a BP neural network classifier, the scheme successfully classified rice brown spots [2]. To select features of cotton leaf disease, the fuzzy feature selection approaches of fuzzy curves (FC) and fuzzy surfaces (FS) were used. The fuzzy feature selection method identified a subset of independent, significant features that gave the best information for diagnosis and identification. The analysis followed a two-step procedure. In the first step, FC automatically isolates the significant and essential features from the original feature set and eliminates the imprecise ones. In the second step, FS is applied to the significant features retained by the first step. The method was especially useful for reducing the dimensionality of the feature space, which makes classification applications simpler to implement [3]. Another study mainly highlighted the steps of image acquisition, image preprocessing, image segmentation, feature extraction, and classification, and its author discussed various methods of using leaf images for plant disease detection; segmentation and feature extraction were used to detect plant disease. According to the paper [4], rapid developments in sensing technology and machine learning techniques offer a comprehensive and cost-effective solution for better assessment of environmental and crop conditions. Precision agriculture (PA) is likely to involve more applications built on sensor platforms and machine learning techniques, such as the fusion of different sensor modalities with expert knowledge, hybrid integrated systems, and the further development of machine learning and signal processing techniques [5].
As the season progressed, the model improved owing to the availability of more within-season data (e.g., rainfall). Longer time-series yield data for a particular region will give a stronger prediction. Because the method is generic, it can be extended to other agricultural systems with yield monitor data. Future research should look at integrating more data sources into the models, forecasting at finer spatial resolutions within fields, and using production forecasts to guide management decisions [6]. The article [7] not only offers a detailed review of different methods but also concisely addresses key concepts in image analysis and machine learning as they relate to plant virus detection systems. Its main purpose was to survey studies on rice insect pests as well as on other crops and vegetables. The survey criteria include image dataset size, number of classes (diseases), preprocessing, segmentation techniques, classifier types, classifier accuracy, and so on. The review in [8] summarizes recent developments in the use of non-invasive optical sensors to detect, identify, and quantify plant diseases at various scales. Thermography, chlorophyll fluorescence, and hyperspectral sensors are the most promising sensor types. Imaging systems were favored over non-imaging systems for the identification and control of
plant disease. The differences between these techniques, as well as their main advantages, are discussed. The paper [9] begins with a brief overview of important big data research activities in crop protection, with an emphasis on weed control and management, and then moves on to possible applications. It also reviews some machine learning approaches for big data analytics. The potential of Markov random fields (MRF), which take the spatial component into account through adjacent sites, for modeling herbicide resistance in ryegrass was investigated; no earlier work had used MRF to model herbicide resistance. The excessive use of pesticides to treat rice crop diseases raises costs and pollutes the environment, so pesticides must be used sparingly. This can be accomplished by estimating disease incidence and treating the diseased area with the required quantity and concentration of pesticide. In paper [10], the degree of disease severity of leaves in rice crops was computed using fuzzy logic and the K-means segmentation technique, and the approach had an accuracy of 86.35%. Various modern convolutional neural network (CNN) architectures were evaluated, using three learning strategies, on a public dataset for plant disease classification. With an accuracy of 99.76%, these architectures outperformed the state-of-the-art plant disease classification results. The use of saliency maps as a visualization tool for better understanding and interpreting the CNN classification mechanism was also suggested.
3 Segmentation Techniques

The paper [11] implemented an image segmentation mechanism for the automatic detection and classification of plant leaf disease. It also surveys various disease classification methods used to identify plant leaf disease. A genetic algorithm was employed to perform the image segmentation, which is an important step in plant leaf disease detection. Early identification of pests from images is essential for successful pest control management. In [12], a color-based image segmentation approach is used to detect pests; according to comprehensive simulation results on various pest images, the method outperforms Otsu's method.
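For readers unfamiliar with the Otsu baseline mentioned above, the following is a minimal sketch of Otsu's thresholding on a flat list of 8-bit gray levels. It is not the implementation used in [12]; the function name `otsu_threshold` and the toy pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels <= t form the background class, pixels > t the foreground class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of background pixels accumulated so far
    sum_bg = 0  # sum of background gray levels accumulated so far
    for t in range(levels):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already background
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical toy image: 50 dark pixels and 50 bright pixels.
toy_pixels = [10] * 50 + [200] * 50
toy_t = otsu_threshold(toy_pixels)
toy_mask = [p > toy_t for p in toy_pixels]  # True marks the bright region
```

Because the criterion depends only on the histogram, spatial information is ignored, which is the limitation that color-based and region-based approaches try to address.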
4 Feature Extraction

Table 1 compares various methodologies. In feature extraction, object tracking is applied to a previously developed detection algorithm, which relies on a combination of color spaces and shape analysis, with the objective of obtaining a robust crop detection system. Although the previous algorithm worked well in general, it did not perform well in sunny conditions, leaving room for improvement [13].
Table 1 Comparison of current methodologies

| Segmentation technique | Definition | Merits | Demerits |
|---|---|---|---|
| Threshold method [14] | The most basic form of image segmentation, which separates image pixels according to their intensity level. The peak of the image histogram is used to determine the threshold value | (i) No prior knowledge of the image is needed (ii) Fast, simple, and computationally inexpensive (iii) Easy to use and suitable for real-life scenarios | (i) The segmented regions are not guaranteed to be contiguous, since spatial information can be overlooked (ii) The choice of threshold is critical (iii) Extremely sensitive to noise |
| Clustering method [15] | Pixels in an image with identical characteristics are segmented into the same cluster. Based on the features of the image, it is divided into different sections. This approach is usually implemented using the K-means algorithm | (i) It is easy to obtain homogeneous regions (ii) Faster in terms of computation (iii) The smaller the value of K, the better K-means operates | (i) Worst-case behavior is poor (ii) It requires clusters of similar size so that assignment to the nearest cluster center is correct |
| Edge detection method [16] | Detects all edges first and then connects the edge points to form the boundary of the region to be segmented. It is designed to detect discontinuities at edges | (i) Works well with images that have higher contrast between regions | (i) Performs poorly on an image with many edges (ii) It is difficult to find the correct object edge |
| Regional method [17] | Segmentation regions are constructed by the association and dissociation of neighboring pixels. It operates on the concept of homogeneity, with neighboring pixels within a region grouping together and having characteristics that differ from those of pixels in other regions | (i) Allows a choice between interactive and automated image segmentation techniques (ii) Growing from an inner point to the outer region establishes more distinct object boundaries (iii) Compared with other approaches, it provides more reliable performance | (i) More computation time and memory are required, and the process is sequential (ii) Noisy user seed selection results in faulty segmentation (iii) Split segments appear square because of the region-splitting scheme |
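The clustering row of Table 1 can be illustrated with a small K-means sketch that groups RGB pixels by color. This is only an illustration under stated assumptions: the function name `kmeans`, the fixed random seed, and the toy leaf and lesion colors are hypothetical and do not come from the surveyed papers.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Hypothetical pixels: three greenish (healthy) and three brownish (lesion).
toy_leaf_pixels = [(34, 139, 34), (40, 150, 40), (30, 130, 30),
                   (139, 69, 19), (150, 75, 25), (130, 60, 15)]
toy_centers, toy_labels = kmeans(toy_leaf_pixels, k=2)
```

The sketch also makes the demerits in Table 1 visible: the result depends on the initial centers and on the chosen K, and no spatial contiguity is enforced.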
The L*a*b* color space has one luminance channel, L, and two chromaticity channels, a and b. Lightness is represented by dimension L, while a and b are the color-opponent dimensions. Its merits are that color and intensity can be controlled separately and that minute color variations can be detected. Its disadvantage is the singularity problem of the nonlinear transformation [18]. In three dimensions, HSV can be represented as a hexagonal cone, with the central vertical axis representing intensity. Colors are defined by hue, saturation, and value (brightness). Its merits are: (i) accuracy is higher, and (ii) real-time implementations are possible. Its demerit is that it is less sensitive to lighting [19]. RGB is a color space made up of three separate image planes, one for each primary color: red, green, and blue. It is an additive model. Its advantage is that it is well suited to display; its disadvantage is that its channels are highly correlated, so it is not recommended for color image processing [20].
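To make the difference between RGB and HSV concrete, the sketch below converts 8-bit RGB pixels to HSV with Python's standard `colorsys` module. The helper name `rgb255_to_hsv` and the two sample pixels are assumptions chosen for illustration; the surveyed papers do not prescribe this code.

```python
import colorsys


def rgb255_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, value 0..1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Two hypothetical leaf pixels: the same green under bright and dim lighting.
bright_green = rgb255_to_hsv(0, 255, 0)
dim_green = rgb255_to_hsv(0, 128, 0)
```

In this example both pixels share the same hue and saturation and differ only in value, whereas in RGB the green channel itself changes with the lighting. This separation of color from brightness is one reason HSV is often preferred to RGB for color-based segmentation.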
5 Classifier

Traditional farming practices include manually collecting data, dealing with inclement weather, spraying pesticides against diseases, and other practices that put farmers' lives in jeopardy, especially in drought-prone areas. Given the current situation in conventional farming, there is an urgent need for predictive data that can assist farmers in identifying and responding to problems in real time. To help them solve their problems, a method that uses a decision tree classifier to predict cotton crop diseases based on temperature, soil moisture, and other variables was proposed [21].
5.1 Naive Bayes Classifier

The article [22] concentrates on supervised machine learning methods for maize plant disease detection from images of the plant, such as Naive Bayes (NB), decision tree (DT), K-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF). These classification strategies are studied and compared to determine the model that predicts plant disease with the highest accuracy. The characteristics of the Naive Bayes classifier are: (i) It is a probabilistic classification scheme. (ii) It applies Bayes' theorem with a strong independence assumption. (iii) The value of a given feature is assumed to be unaffected by the value of any other feature [22].
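The three characteristics listed above can be sketched with a small Gaussian Naive Bayes classifier. This is a minimal illustration, not the model evaluated in [22]; the Gaussian likelihood, the function names, and the two toy features (mean hue and lesion fraction) are all assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means, and feature variances of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical features per leaf image: (mean hue, lesion fraction).
toy_X = [[0.30, 0.02], [0.32, 0.03], [0.31, 0.01],
         [0.12, 0.40], [0.10, 0.35], [0.14, 0.45]]
toy_y = ["healthy"] * 3 + ["diseased"] * 3
toy_nb = fit_gaussian_nb(toy_X, toy_y)
```

Calling `predict_gaussian_nb(toy_nb, [0.29, 0.02])` classifies a new sample by comparing the summed log-likelihoods of the two classes.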
5.2 K-Nearest Neighbor

The paper presents two new techniques for automatic fruit counting in images of fruit tree canopies, one based on texture-based dense segmentation and the other on shape-based fruit detection, and compares them with two existing techniques: (a) a system based on contour segmentation and K-nearest neighbor pixel classification, and (b) a support vector machine-based approach based on super-pixel over-segmentation and classification [23].
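K-nearest neighbor pixel classification, as used by the baseline in [23], can be sketched as follows. The function name `knn_predict`, the choice of squared Euclidean distance in RGB space, and the labeled toy pixels are assumptions for illustration only.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical labeled RGB pixels sampled from fruit and leaf regions.
toy_train = [((200, 40, 40), "fruit"), ((210, 50, 45), "fruit"),
             ((190, 35, 50), "fruit"), ((40, 160, 50), "leaf"),
             ((50, 150, 40), "leaf"), ((45, 170, 60), "leaf")]
```

Applying `knn_predict` to every pixel of an image yields a fruit/leaf mask; connected fruit regions in that mask could then be counted.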
5.3 Support Vector Machines

Support vector machines and artificial neural networks are used to enable a vision system to detect weeds based on their pattern. Four species of common weeds in sugar beet fields were studied. The characteristics of the support vector machine approach are: (a) It is based on decision planes, which define decision boundaries. (b) It has two phases of operation: (1) an offline procedure and (2) an online procedure. (c) For training and classification, a multi-class support vector machine, which is a set of binary support vector machines, is used. Fourier descriptors and moment invariant features were among the shape feature sets. The overall classification accuracy of the ANN was 92.92%, with 92.50% of weeds correctly identified [24].
5.4 Decision Tree

Machine learning techniques combined with suitable image processing principles have great potential for providing the intelligence needed to design an automation system that can distinguish fruits based on their shape, variety, maturity, and intactness [25].
5.5 ANN

Several features are extracted using image processing techniques. The system classifies the species of the herbal medicinal plant leaf being tested using an artificial neural network, which functions like an autonomous brain network. The system then shows which diseases the herbal plant can help to cure. A feature dataset of 600 images was used for training, with 50 images per herbal plant [26].
6 Conclusion

This paper presents a survey on the identification and classification of various agricultural product diseases using image processing and machine learning techniques. Various color and texture-based feature extraction methods are outlined along with their benefits and drawbacks. Furthermore, various segmentation strategies, as well as their advantages and disadvantages, are explored. Some of the approaches mentioned in this paper may be used in future work as an extended study.
References

1. Sarangdhar AA, Pawar VR (2017) Machine learning regression technique for cotton leaf disease detection and controlling using IoT. In: International conference on electronics, communication and aerospace technology, ICECA 2017
2. Huang et al (2014) New optimized spectral indices for winter wheat diseases. IEEE J Sel Top Appl Earth Observ Remote Sens, pp 128–135
3. Delwiche SR, Kim MS (2000) Hyperspectral imaging for detection of scab in wheat. Biol Qual Precis Agric 4203:13–20
4. Yang C (2012) A high-resolution airborne four-camera imaging system for agricultural remote sensing. In: Computers and electronics in agriculture, pp 13–24
5. Qin Z, Zhang M, Christensen T, Li W (2003) Remote sensing analysis of rice disease stresses for farm pest management using wide-band airborne data. In: Geoscience and remote sensing symposium, pp 7–13
6. Rothe PR, Kshirsagar RV (2015) Cotton leaf disease identification using pattern recognition techniques. In: International conference on pervasive computing (ICPC). IEEE, pp 1–6
7. Gulhane VA, Kolekar MRH (2014) Diagnosis of diseases on cotton leaves using principal component analysis classifier. In: Annual IEEE India conference (INDICON)
8. Texas plant disease handbook, [online] Available: http://plantdiseasehandbook.tamu.edu/industryspecialtylfiber-oil-specialty/cotton/
9. Plant village Cotton, [online] Available: https://www.plantvillage.org/en/topics/cotton
10. Revathi P, Hemalatha M (2012) Advance computing enrichment evaluation of cotton leaf spot disease detection using image edge detection. IEEE
11. Ashourloo D, Aghighi H, Matkan AA, Mobasheri MR, Rad AM (2016) An investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE J Sel Top Appl Earth Observ Remote Sens, pp 1–7
12. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot disease using image processing edge detection technique. In: International conference on emerging trends in science engineering and technology, IEEE, pp 169–173
13. Li H et al (2019) Combined forecasting model of cloud computing resource load for energy-efficient IoT system. IEEE Access 7:149542–149553
14. Saranya A, Kottursamy K, AlZubi AA, Bashir AK (2021) Analyzing fibrous tissue pattern in fibrous dysplasia bone images using deep R-CNN networks for segmentation. Soft Comput, pp 1–15
15. AL-Khaleefa AS, Hassan R, Ahmad MR, Qamar F, Wen Z, Aman AHM, Yu K (2021) Performance evaluation of online machine learning models based on cyclic dynamic and feature-adaptive time series. IEICE Trans Inf Syst E104D(8):1172–1184
16. Sato T, Qi X, Yu K, Wen Z, Katsuyama Y, Sato T (2021) Position estimation of pedestrians in surveillance video using face detection and simple camera calibration. In: 2021 17th International conference on machine vision and applications (MVA), pp 1–5
17. Zhen L, Sun T, Lu G, Yu K, Ding R (2020) Preamble design and detection for 5G enabled satellite random access. IEEE Access 8:49873–49884
18. Prakash UM, Kottursamy K, Cengiz K, Kose U, Hung BT (2021) 4x-expert systems for early prediction of osteoporosis using multi-model algorithms. Measurement 180:109543
19. Saranya A, Kottilingam K (2021) An efficient combined approach for denoising fibrous dysplasia images. In: 2021 International conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–6
20. Saranya A, Kottilingam K (2021) A survey on bone fracture identification techniques using quantitative and learning based algorithms. In: 2021 International conference on artificial intelligence and smart systems (ICAIS). IEEE, pp 241–248
21. Liu Q et al (2020) Contour-maintaining-based image adaption for an efficient ambulance service in intelligent transportation systems. IEEE Access 8:12644–12654
22. Zhang J et al (2020) 3D reconstruction for super-resolution CT images in the internet of health things using deep learning. IEEE Access 8:121513–121525
23. Li H, Fan J, Yu K, Qi X, Wen Z, Hua Q, Zhang M, Zheng Q (2020) Medical image coloring based on Gabor filtering for internet of medical things. IEEE Access 8:104016–104025
24. Zhang L, Zheng X, Yu K, Li W, Wang T, Dang X, Yang B, Modular-based secret image sharing in Internet of Things: a global progressive-enabled approach. In: Concurrency and computation: practice & experience
25. Zhu H, Gowen A, Feng H, Yu K, Xu JL (2020) Deep spectral-spatial features of near infrared hyperspectral images for pixel-wise classification of food products. Sensors 20(18):1–20
26. Hao S, An B, Wen H, Ma X, Yu K (2020) A heterogeneous image fusion method based on DCT and anisotropic diffusion for UAVs in future 5G IoT scenarios. Wirel Commun Mob Comput 2020(8816818):1–11. https://doi.org/10.1155/2020/8816818
Deep Neural Networks and Black Widow Optimization for VANETS

Shazia Sulthana and B. N. Manjunatha Reddy
Abstract In today's modern world, the intelligent communication system known as the vehicular ad hoc network (VANET) plays an important role in sharing live messages concerning traffic congestion, highway safety, and position-related services to improve the driving behavior of drivers. In this setting, privacy together with security is the major challenge to be addressed. To address this issue, the Trusted Authority (TA) provides dual authentication to maintain the authenticity of messages between the TA and the VANET nodes. In this work, the TA classifies the vehicles into prime users, secondary users, and illegal users under the roadside units (RSUs). The proposed method combines the advantages of the Black Widow Optimization (BWO) technique, which optimizes the weights to improve the network performance parameters, and deep neural networks, which are used for intrusion detection. The results show that the proposed scheme (BWO-DNN) is computationally efficient in terms of Key Computation Time (121 ms) and Key Recovery Time (3.82 ms) compared with the existing approaches.

Keywords VANET · Privacy · Trusted Authority · Dual Authentication · Deep neural network · Black Widow Optimization
S. Sulthana (B) · B. N. M. Reddy
Department of Electronics and Communication Engineering, Global Academy of Technology, Bengaluru, India
e-mail: [email protected]
B. N. M. Reddy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_48

1 Introduction

A VANET is a distributed intelligent wireless communication network constructed from the TA, RSUs, and smart vehicles. Because of the open environment, message transmission has become a major privacy concern, and many researchers are doing extensive research in this regard. Normally, a VANET contains three main components, namely the TA, RSUs, and vehicles [1]. The TA is responsible for providing high-level security
S. Sulthana and B. N. M. Reddy
toward RSUs and vehicles. The RSUs are fixed infrastructure that connects the TA to the vehicles, through which vehicles obtain premium services from the TA. Every vehicle is fitted with an intelligent unit called the on-board unit (OBU), which is equipped with either a directional antenna, which covers a particular direction, or an omnidirectional antenna, which covers all directions. Various research studies reveal that road accidents pose serious threats to people and that road congestion wastes an enormous amount of time and fuel [2]. To address these issues and improve driving comfort, traffic-related messages should be delivered in a protected manner. Hence, the VANET is designed to support both safety and non-safety applications, such as emergency vehicle warnings, traffic violation warnings, and curve speed warnings. In addition, it offers comfort services such as the location of petrol stations, the location of hotels, and many more. These services depend on the intelligent system incorporated in the VANET and on privacy-preserving protocols to enhance driving safety. Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) are the two types of communication performed in a VANET, in which the smart vehicles communicate with each other or with the RSUs located along the highways. In an open-space environment, the communication takes place through the DSRC protocol via the antenna. Since V2V and V2I communication takes place over a wireless channel, it is vulnerable to various types of attacks, such as denial-of-service attacks, attacks on confidentiality and integrity, and black hole attacks. To achieve privacy in communication, authentication is required to check the authenticity of each vehicle, and it is a major step toward privacy [3–5]. In this work, a dual authentication scheme is provided to prevent malicious attackers from entering the network.
During the authentication process, the TA sends the information to the authenticated vehicles using a deep neural network (DNN) learning mechanism, and the proposed black widow optimization (BWO) is used to obtain authentication with better network performance. In this method, two different group keys are generated, one for the prime users and one for the secondary users. One group key is distributed from the TA to the prime users through the RSUs, and the other is distributed from the prime users to the secondary users [6, 7]. Through a proper racing mechanism, an old member will not have access to present communication, and a new member will not have access to previous communication.
2 Previous Works

Vijayakumar et al. [1] explained a scheme for distributing a key to a group of users and for updating such keys when a user joins or leaves the network. Johnson et al. [2] discussed the elliptic curve digital signature algorithm, which uses an asymmetric key pair for private and public key generation. Wasef et al. [3] explained a technique in which each vehicle is issued a short-lifetime certificate that can be renewed from any RSU; because the certificates are updated frequently, additional overhead is created. Shen et al. [4] explained a method in which the communication complexity increases as the traffic density of vehicles increases. The problem is that no messages will be verified, which results in consumption
Deep Neural Networks and Black Widow …
of malicious messages. Syamsuddin et al. [5] discussed hash functions, where the protocol uses a hash chain to improve security and privacy for RFID authentication. Perrig et al. [6] discussed an efficient protocol that uses symmetric keys rather than asymmetric keys, since symmetric keys are more reliable than digital signatures. Guo et al. [7] discussed group signatures, in which one group public key is associated with numerous group private keys. In this group signature method, an attacker can easily discover a message sent by the group; however, it is impossible to trace the sender of the messages. Lin et al. [8] emphasized a scheme in which a vehicle first sends a hash chain to its neighbors and then generates a MAC based on the hash chain, through which the neighbors can validate the vehicle's communication. Wang et al. [9] presented deep learning concepts, convolutional neural network architectures, challenges, applications, and future directions. Kanthimathi et al. [10] proposed an optimized routing path by modifying the ad hoc routing protocol for VANETs. Moghanian et al. [11] proposed using an artificial neural network (ANN) as the learning technique in intrusion detection. Ramya Devi et al. [12] proposed a method for message reliability and communication efficiency that automatically adjusts the partition and signal to reduce retransmissions. Montavon et al. [13] gave an overview of techniques for interpreting complex machine learning models, with a focus on deep neural networks (DNNs). The results of Sakr et al. [14] show that different machine learning techniques can differ drastically in their performance across different evaluation metrics. Hayyolalam et al. [15] described the Black Widow Optimization (BWO) algorithm, which is inspired by the unique mating behavior of black widow spiders. This algorithm includes a special phase called cannibalism. Shakya et al. [16] explained the network devices and the routing path for an ad hoc network that communicates wirelessly.
3 System Overview

3.1 System Model

The system model comprises three main entities, namely the trusted authority, roadside units, and intelligent vehicles, as shown in Fig. 1.

Trusted Authority (TA): The main function of the TA is to register the RSUs, the on-board units mounted on vehicles (which make them smart vehicles), and the dynamic users. It is also responsible for key generation, for sharing keys among the prime users, and for maintaining secure premium services in the network. In the proposed work, each state in the nation has a TA. When a vehicle moves from one state to another, the vehicle's credentials are verified by the TA, which also identifies the status of the vehicle and the state in which it is currently roaming. For simplicity, Fig. 1 shows a single TA [1]. In addition, each TA validates
Fig. 1 System model
the identity of each vehicle's OBU and of each RSU to keep malicious vehicles from entering the intelligent system.

Roadside Unit (RSU): RSUs are deployed along the highways, and they are connected to and regularly supervised by the TA. These units act as communication pathways between the TA and the smart entities. The RSUs are linked to the TA through a transport layer protocol over a connected link and to the smart vehicles through a wireless link using a dedicated short-range protocol.

Smart vehicles: Each smart vehicle in the VANET is embedded with an intelligent system, through which it can communicate with other vehicles and with RSUs. The vehicles are connected to the central point through the fixed units mounted with antennas.
4 Proposed Algorithm: DNN-BWO

4.1 Deep Neural Network (DNN) Used as the Learning Technique for Intrusion Detection

This network is aimed at identifying authenticated vehicles through a learning process [9]. A deep neural network (DNN) is a collection of neurons arranged in a sequence of several layers. Each neuron receives as input the neuron activations from the previous layer and carries out a simple computation, for example a weighted sum of the inputs followed by a nonlinear activation [11]. Together, the neurons implement a nonlinear mapping from the input to the output. This mapping is learned from the data by adapting the weights of each neuron using a technique called error back propagation. Figure 2 shows the concept of DNN;
Fig. 2 Neural network composed of any interconnected neurons
the inputs are mapped to the output through the weights. The concept to be interpreted is generally represented by a neuron in the top layer [13]. Because top-layer neurons are abstract, a prototype is built in the input domain (for example, an image or text) that is interpretable and representative of the abstract learned concept [9]. Building the prototype can be formulated within the activation maximization framework.
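The layer-by-layer computation described in this subsection (a weighted sum of the previous layer's activations followed by a nonlinear activation) can be sketched as a plain forward pass. The sigmoid activation, the function names, the 3-2-1 layer sizes, and the example weights are assumptions for illustration; the paper does not specify its network architecture.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Propagate input vector x through a list of (weights, biases) layers.

    weights[j] holds the incoming weights of neuron j in that layer.
    """
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


# Hypothetical network: 3 inputs (delay, delivery ratio, drop), 2 hidden, 1 output.
toy_layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
toy_output = forward([0.2, 0.9, 0.1], toy_layers)
```

In training, error back propagation would adjust the entries of `toy_layers`; in the proposed scheme those weights are the quantities that BWO is later used to optimize.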
4.2 The Proposed Black Widow Optimization Algorithm

Figure 3 shows the flowchart of the proposed algorithm. Like other evolutionary algorithms, the proposed algorithm starts with an initial spider population, in which every spider represents a potential solution:

$$X_{N,d} = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \dots & x_{1,d} \\ x_{2,1} & x_{2,2} & x_{2,3} & \dots & x_{2,d} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{N,1} & x_{N,2} & x_{N,3} & \dots & x_{N,d} \end{bmatrix}, \qquad lb \le X_i \le ub \tag{1}$$

Here $X_{N,d}$ is the population of black widow spiders, $d$ is the number of decision variables of the problem, $N$ is the population size, $lb$ is the lower bound of the population, and $ub$ is the upper bound. The potential solutions in the population ($X_{N,d}$) are used to minimize or maximize the objective function in Eq. (2):

$$\text{Objective function} = f(X_{N,d}) \tag{2}$$
Fig. 3 Flowchart of the black widow optimization algorithm
Procreate

As the pairs are independent of one another, they begin to mate to produce the new generation; similarly, in nature each pair mates in its own web, separately from the other spiders. In nature, around 1000 offspring are produced in each mating, but in the end only a few of the strongest spider offspring survive. In this algorithm, to reproduce, an array called µ is formed, of the same length as the widow array and containing random numbers; children are then produced using µ with the following equations, in which $x_1$ and $x_2$ are the parents and $y_1$ and $y_2$ are the children [14]:

$$y_1 = \mu \times x_1 + (1 - \mu) \times x_2 \tag{3}$$

$$y_2 = \mu \times x_2 + (1 - \mu) \times x_1 \tag{4}$$

Here $y_1$ and $y_2$ are the young spiders produced by reproduction, $i$ and $j$ are random numbers between 1 and $N$, and µ is a random number between 0 and 1.

Cannibalism

There are three kinds of cannibalism. The first is sexual cannibalism, in which the female black widow eats the male during or after mating. In the proposed algorithm, the male and the female are identified by their fitness values. The second is sibling cannibalism, in which the strong siblings eat their weaker siblings. In this
Fig. 4 Mutation
algorithm, to find the number of survivors, cannibalism rate is set. In a quantity of cases, the third type of cannibalism is frequently experimented in which the child spiders eat their own mother. The optimized fitness value is used to conclude wellbuilt or weak spiders [12]. Mutation In this stage, by randomly selecting the mute pop number of individuals as population. As Fig. 4 illustrates, each of the selected solution is arbitrarily interchanging any two elements in the array. Mute pop is calculated by the mutation rate [15]. Convergence In the beginning algorithms, stop conditions which are three in number are accounted (i) an iteration which is predetermined predefined. (ii) Observation of no updating in the fitness value of the finest widow for numerous iterations. (iii) Attaining to the specific point of accurateness. Pseudocode Input: VANET parameters of delay, delivery ratio, drop Output: Black Hole attack detection /*DNN*/ Collect the parameters Training phase Train the network by VANET parameters Weight updating /*BWO*/ Initial population of DNN weight parameters Each population is a dimensional array Set number of iterations Evaluate fitness function
698
S. Sulthana and B. N. M. Reddy
Procreating and cannibalism Random selection of two solution Random generation of children Omit some children by cannibalism rate Save the solution Mutation Based on mutation rate, compute the mutation children Generate an optimal solution Save the solution End for Updating Position updating process Save the optimal weight parameters of DNN /*DNN-BWO*/ Testing phase Optimal weighting factor is used for training Test the network Compute the black hole attack in VANET.
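The procreate, cannibalism, and mutation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `sphere` fitness (a hypothetical stand-in for the DNN loss), the parameter values, and the choice to keep the best individuals of the combined pool for the next generation are all assumptions, and fitness is treated as a quantity to minimise.

```python
import random

def procreate(x1, x2, rng):
    """Create two children from two parents following Eqs. (3) and (4)."""
    mu = [rng.random() for _ in x1]  # one random mu in [0, 1) per element
    y1 = [m * a + (1 - m) * b for m, a, b in zip(mu, x1, x2)]
    y2 = [m * b + (1 - m) * a for m, a, b in zip(mu, x1, x2)]
    return y1, y2

def cannibalize(children, fitness, cannibalism_rate):
    """Keep only the fittest children; the remaining fraction is discarded."""
    keep = max(1, round(len(children) * (1 - cannibalism_rate)))
    return sorted(children, key=fitness)[:keep]

def mutate(widow, rng):
    """Swap two randomly chosen elements in a copy of the widow array."""
    i, j = rng.sample(range(len(widow)), 2)
    mutant = list(widow)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

def bwo_generation(population, fitness, cannibalism_rate, mutation_rate, rng):
    """Run one procreate / cannibalism / mutation round (minimisation)."""
    children = []
    for _ in range(len(population) // 2):
        x1, x2 = rng.sample(population, 2)  # random selection of two parents
        children.extend(procreate(x1, x2, rng))
    survivors = cannibalize(children, fitness, cannibalism_rate)
    n_mutants = round(mutation_rate * len(population))  # "Mutepop"
    mutants = [mutate(w, rng) for w in rng.sample(population, n_mutants)]
    # Assumption: the best individuals of the combined pool survive.
    pool = population + survivors + mutants
    return sorted(pool, key=fitness)[:len(population)]

def sphere(weights):
    """Hypothetical fitness: stands in for the DNN loss on the VANET data."""
    return sum(w * w for w in weights)

rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
initial_best = min(map(sphere, population))
for _ in range(20):  # stop condition (i): a fixed number of iterations
    population = bwo_generation(population, sphere, 0.5, 0.4, rng)
final_best = min(map(sphere, population))
print(initial_best, final_best)
```

Because the previous population is kept in the pool, the best fitness can never get worse from one generation to the next; a variant that replaces the parents outright would lose that guarantee.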
5 Results and Performance Validation
This section examines the performance of the proposed DNN-BWO technique. The simulation is carried out in the NS-2 simulator. Figure 5 shows one TA communicating with eight RSUs; the RSUs in turn communicate with the prime users by broadcasting the group key, and the prime users broadcast the group keys to the secondary users. The simulation uses an area of 2824 × 2000, the two-ray propagation model, a packet size of 500, the ad hoc on-demand multipath distance vector (AOMDV) routing protocol, and a simulation time of 100 s. Figure 6 shows the mobility of the vehicles, which is implemented through the SUMO simulator and reproduces a real-world scenario with each vehicle's starting and ending points and the time at which it connects to the network.
Fig. 5 Communication between trusted authority and RSU
Fig. 6 Scenario showing real-world moving vehicles
5.1 Comparison of KeySize Versus Key Computation Time and Key Recovery Time
The simulation results in Fig. 7, with KeySize (bits) on the X-axis and Key Computation Time (msec) on the Y-axis, compare the group key computation time of the TA with that of the existing schemes. The proposed DNN-BWO is compared with VANET group key management (VGKM), Elgamal group key management (EGKM), and key-tree Chinese remainder theorem (KCRT). From Fig. 7, it is observed that when the key size is 512 bits, the group key computation time of the TA is 121 ms in the proposed scheme, whereas it is 1937 ms in VGKM, 6000 ms in KCRT, and 12300 ms in EGKM.
Fig. 7 Key Computation Time at trusted authority side under the black-hole attack (X-axis: KeySize in bits, 64 to 512; Y-axis: Key Computation Time in msec; series: VGKM, EGKM, KCRT, Proposed)
The results in Fig. 8, with KeySize (bits) on the X-axis and Key Recovery Time (msec) on the Y-axis, compare the prime user's Key Recovery Time in the proposed method with that of the existing methods. It is observed that when the key size is 512 bits, the Key Recovery Time of a user is 3.82 ms in the proposed approach, whereas it is 5.82 ms in VGKM, 6.82 ms in KCRT, and 10.22 ms in EGKM.
Fig. 8 Key Recovery Time at prime users under the black-hole attack (X-axis: Key Size in bits, 64 to 512; Y-axis: Key Recovery Time in msec; series: VGKM, EGKM, KCRT, Proposed)
6 Conclusion
This work proposes a dual authentication method for improving the security of vehicles through black-hole detection, in which the BWO-DNN technique is used for communication in the VANET environment. The proposed deep neural network with black widow optimization works well in mass simulation, with the DNN used for network intrusion detection and BWO as the optimization algorithm. The dual authentication scheme demonstrated in this work shows optimized results that enhance secure data communication between the Trusted Authority and the prime users and from the prime users to the secondary users. The proposed work, simulated using NS-2, shows better key computation and key recovery times than the existing schemes.
References 1. Vijayakumar P, Azees M, Kannan A, Deborah LJ (2015) Dual authentication and key management techniques for secure data transmission in vehicular ad hoc network. IEEE Trans Intell Transp Syst, pp 1524–9050 2. Johnson D, Menezes A, Vanstone S (2001) The elliptic curve digital signature algorithm (ECDSA). Int J Inf Secur 1(1):36–63 3. Wasef A, Jiang Y, Shen X (2008) ECMV: Efficient certificate management scheme for vehicular networks. In: Proceedings of IEEE GLOBECOM, New Orleans, LA, USA, pp 1–5 4. Shen W, Liu L, Cao X (2013) Cooperative message authentication in vehicular cyber-physical systems. IEEE Trans Emerg Top Comput 1(1):84–97 5. Syamsuddin I, Dillon T, Chang E, Han S (2008) A survey of RFID authentication protocols based on hash method. In: Proceedings of 3rd ICCIT, vol 2, pp 559–564 6. Perrig A, Canetti R, Tygar JD, Song D (2002) The TESLA broadcast authentication protocol. RSA Crypto 5(2):2–13 7. Guo J, Baugh JP, Wang S (2007) A group signature based secure and privacy preserving vehicular communication framework. In: Proceedings of IEEE INFOCOM, Anchorage, AK, USA, pp 103–108 8. Lin X, Sun X, Wang X, Zhang C, Ho PH, Shen X (2008) TSVC: timed efficient and secure vehicular communications with privacy preserving. IEEE Trans Wireless Commun 7(12):4987–4998 9. Wang Y, Menkovski V, Ho IW-H, Pechenizkiy M (2019) VANET meets deep learning: the effect of packet loss on the object detection performance. 978-1-7281-1217 10. Kanthimathi S, Jhansi Rani P (2021) Optimal routing based load balanced congestion control using MAODV in WANET environment. (IJACSA) Int J Adv Comput Sci Appl 12(3) 11. Moghanian S, Saravi FB, Javidi G, Sheybani EO (2020) GOAMLP: network intrusion detection with multilayer perceptron and grasshopper optimization algorithm. In: Digital object identifier. IEEE Access. https://doi.org/10.1109/ACCESS 12.
Devi MR, Jeya JS (2021) Black widow optimization algorithm and similarity index based adaptive scheduled partitioning technique for reliable emergency message broadcasting in VANET. Research square 13. Montavon G, Samek W, Müller KR (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process 73:1–15. SCIENCE DIRECT 14. Sakr S, Elshawi R, Ahmed AM, Qureshi WT, Brawner CA, Keteyian SJ, Blaha MJ, Al-Mallah MH (2017) Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercise testing (FIT) project. Med Inf Decis Making 17:174. https://doi.org/10.1186/s12911-017-0566-6
15. Hayyolalam V, Kazem AAP (2020) Black widow optimization algorithm: a novel metaheuristic approach for solving engineering optimization problems. Eng Appl Artif Intell 87:103249 16. Shakya S, Pulchowk LN (2020)Intelligent and adaptive multi- objective optimization in WANET using bio inspired algorithms. J Soft Comput Paradigm (JSCP) 2(1):13–23
An Efficient Automated Intrusion Detection System Using Hybrid Decision Tree B. S. Amrutha, I. Meghana, R. Tejas, Hrishikesh Vasant Pilare, and D. Annapurna
Abstract The remarkable development of network and communication technologies has increased human activities in cyberspace. This change has incited an open, baffling, and uncontrolled system of the Internet which engages an astonishing stage for the cyberattack. Due to the phenomenal increase in cyberattack incidents, the development of more innovative and effective detection mechanisms has been regarded as an immediate requirement. Consequently, intrusion detection systems (IDSs) have become a necessary component of network security. There exist various approaches to detecting intrusions, but none are entirely reliable, which calls for the need for an improvement on the existing models. Traditional signature-based detection methods are not very effective. Therefore, machine learning (ML) algorithms are used to classify network traffic. To perform the classification of network traffic, five ML algorithms—decision tree, AdaBoost, random forest, Gaussian Naive Bayes, and KNN— were built. To improve the classification model, a hybrid model was built using three decision trees. The hybrid model yielded the best results, exhibiting the highest accuracy and the lowest execution time. Keywords Cyberattacks · Network security · Intrusion detection system · Machine learning · Hybrid decision tree · Malicious · Benign · Network traffic · CIC-IDS2017 dataset
B. S. Amrutha (B) · I. Meghana · R. Tejas · H. V. Pilare · D. Annapurna Department of Computer Science and Engineering, PES University, Bengaluru, India e-mail: [email protected] I. Meghana e-mail: [email protected] D. Annapurna e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_49
1 Introduction
Computers, tablets, smartphones, and the Internet have become such an intrinsic part of everyday life that it is hard to envision how one would go about the day without them. The dawn of the “age of the Internet”, while changing the course of humankind, also spawned a myriad of attacks that come under the umbrella of what is known as cybercrime. Cybercrime involves criminal activity that uses or targets a computer, a computer network, or a network of devices, which creates a paramount need for cybersecurity. Cybersecurity is the application of technologies, algorithms, and processes to protect systems, networks, programs, devices, and data from cybercrime. Its goal is to mitigate the risk of cyberattacks and safeguard against the unauthorized use of systems, networks, and technologies. The proposed work focuses on one key area of cybersecurity: the automated intrusion detection system, or IDS. To illustrate how indispensable an IDS is to one’s safety on the Internet, consider the cyberattack that targeted the social media giant Twitter in July 2020. The incident involved a data breach in which the accounts of several celebrities were hacked. Former President Barack Obama, Amazon’s CEO Jeff Bezos, and Tesla and SpaceX CEO Elon Musk were all victims of the attack [1, 2]. Malicious agents used the hacked profiles to tweet out Bitcoin scams that earned them a hefty sum. IDSs are built to prevent such attacks. Incidents like this one underscore the need for cyber resiliency and for IDSs that adapt quickly enough to keep up with today’s continually evolving technologies and with advances in cybercrime.
The proposed project aims to construct a network intrusion detector: a predictive model that can distinguish malicious activity or attacks, often referred to as intrusions, from normal network activity. Intrusion detection embraces network-based and host-based methods. Many off-the-shelf IDSs are available, but most of them are built on various ML models, and the majority use principal component analysis to realize their goals. The proposed IDS is unique in that it uses a hybrid decision tree classifier (HDTC) model to implement its features. Decision trees are especially desirable as they force the consideration of all possible outcomes, leaving very little room for error. The HDTC model is described in the subsequent sections.
2 Related Work
Hoque et al. [3] used a genetic algorithm to build an IDS that effectively detects different types of network intrusions. The dataset used was the KDD Cup 1999
dataset. The paper also provided a brief overview of network-related attacks, various components of IDS, and existing IDSs. Elmrabit et al. [4] evaluated twelve ML algorithms in terms of their ability and efficiency to detect anomalous behaviors in the network. The paper also provided an overview of all twelve ML algorithms, including their respective advantages and disadvantages. The assessment was performed on the UNSW-NB15, CIC-IDS2017, and industrial control system (ICS) cyberattack datasets. In the end, a complete analysis of the ML algorithms was provided and discussed. Ashoor et al. [5] highlighted the importance and vital role of IDS in the current world. The paper provides an overview of the history of IDS, its categories, including signature-based and anomaly-based IDSs, and its classifications, including host-based, network-based, and hybrid IDSs. Kumar et al. [6] focused on developing a signature-based IDS (S-IDS) using Snort. The paper provides an overview of S-IDS, including its working, advantages, and disadvantages. A description of the Snort software, its components, and its rule structure is also provided, and the installation procedure of the developed S-IDS is given in detail. Vijayanand et al. [7] used multiple multi-layer deep learning algorithms to build an IDS that classifies smart meter network traffic. The algorithms are organized in stratified order according to the concern of attacks, and each algorithm is trained to detect a particular type of attack. Hossain et al. [8] focused on developing a cyberattack detection model (CADM) using ML techniques. The feature extraction was done using LASSO, and an ensemble classification method was adopted. The ensemble was built by integrating random forest and gradient boosting. Russell et al. 
[9] surveyed various rule-based approaches, anomaly-based approaches, a distribution detection-based approach, a cluster-based approach, and a hybrid approach used to construct efficient IDSs that detect various types of attacks, including DoS attacks, in the IoT sector. Eltom [10] briefly explained the necessity, function, and classification of the IDS. The investigation techniques common to all IDSs and the metrics that can be used to assess the efficiency of IDSs were described in detail. The IDS was simulated using VMWare’s virtualization platform. Lee et al. [11] used a host of deep learning techniques to detect malicious attacks. The KDD and NSL-KDD datasets were used. The data preprocessing included a conversion of categorical attributes into their numeric counterparts. The accuracies of the autoencoder and long short-term memory (LSTM) models were compared. The results showed that the autoencoder performed significantly better, exhibiting an accuracy of 98.9. The LSTM model, however, produced a much lower accuracy of 79.2. Stiawan et al. [12] stressed the feature selection task of the data preprocessing stage. The significant features of enormous network traffic were analyzed to improve the overall efficiency of traffic anomaly detection. The significant features were chosen using methods such as ranking and information gain.
Shah et al. [13] implemented an anomaly-based multiclass network intrusion detection system (NIDS) using a neural network approach. The dataset used was advanced security network metrics and tunneling obfuscations. Other ML algorithms were also implemented and tested against the same dataset. Guowei et al. [14] proposed a random forest algorithm to detect network intrusions into power systems. Several decision tree classifiers were built, the classifiers were pruned using the CART algorithm, and majority voting was used to combine the outputs of all the classifiers. Finally, the feasibility of the proposed method was tested. Yedukondalu [15] implemented an IDS using two algorithms on the NSL-KDD dataset: support vector machine (SVM) and artificial neural network (ANN). Algorithms based on chi-squared and correlation were used for feature selection. The results show that the ANN algorithm outperforms the SVM algorithm. Widulinski et al. [16] developed an algorithm to locate infections in an operating system. It is based on a negative selection algorithm, inspired by mechanisms of the human immune system, and consists of two phases: creation of receptors and detection of anomalies. Finally, the experimental results of the proposed system are shown. Chen et al. [17] proposed a method to detect financial fraud using deep learning algorithms. A real-time credit card fraud detection dataset was used. Detailed comparisons were made between the proposed model and existing ML, autoencoder, and other deep learning models. Sathesh [18] proposed an improved soft computing technique that combines fuzzy rule-based preprocessing, decision tree-based feature reduction, K-means-EM clustering, and SVM-based classification to detect intrusions in the social network. The model was tested on the KDD-NSL and DARPA datasets. Mugunthan [19] proposed the use of the hidden Markov model to safeguard various cloud environments, which are particularly susceptible to DDOS attacks. 
Metrics like sensitivity, accuracy, specificity, precision, and the F-test were used to validate the model built. The KDD-Cup99 dataset was used to test the proposed model. Sharma et al. [20] proposed the use of capsule neural networks (CNNs) and an SVM model to detect atypical activity in video surveillance footage. Approaches such as the background subtraction method and spatial feature extraction approach were used to construct a hybrid model, combining both CNN and SVM architecture, to successfully detect unusual activity. A detailed review of the previous and on-going work in the field inspired the proposed novel method–a hybrid decision tree classifier, to build an efficient and improved intrusion detection system.
3 Dataset
The CIC-IDS2017 dataset compiled by the Canadian Institute for Cybersecurity [21] was selected to test and evaluate the proposed model. It comprises benign traffic and up-to-date common attacks, resembling true real-world data (PCAPs). It incorporates 2,830,743 entries and 79 features; 557,646 of the entries are flagged as malicious and the rest are benign. Table 1 summarizes the various attack categories included in the dataset [21].

Table 1 Summary of attacks in the dataset
Sl. No. | Attack category | No. of records
1 | Benign | 2,273,097
2 | DoS Hulk | 231,073
3 | Port scan | 158,930
4 | DDoS | 128,027
5 | DDoS GoldenEye | 10,293
6 | FTP-Patator | 7938
7 | SSH-Patator | 5897
8 | DDoS Slowloris | 5796
9 | DDoS SlowHTTPTest | 5499
10 | Bot | 1966
11 | Web attack brute force | 1507
12 | Web attack XSS | 652
13 | Infiltration | 36
14 | Web attack SQL injection | 21
15 | Heartbleed | 11
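As a quick consistency check, the per-category record counts of Table 1 can be added up and compared with the totals quoted in the text. The short sketch below only restates the numbers given in the table; the variable names are illustrative.

```python
# Record counts per attack category, copied from Table 1.
attack_records = {
    "DoS Hulk": 231_073,
    "Port scan": 158_930,
    "DDoS": 128_027,
    "DDoS GoldenEye": 10_293,
    "FTP-Patator": 7_938,
    "SSH-Patator": 5_897,
    "DDoS Slowloris": 5_796,
    "DDoS SlowHTTPTest": 5_499,
    "Bot": 1_966,
    "Web attack brute force": 1_507,
    "Web attack XSS": 652,
    "Infiltration": 36,
    "Web attack SQL injection": 21,
    "Heartbleed": 11,
}
benign_records = 2_273_097

malicious_records = sum(attack_records.values())
total_records = benign_records + malicious_records
malicious_share = malicious_records / total_records

print(malicious_records, total_records, f"{malicious_share:.1%}")
```

The sums reproduce the 557,646 malicious entries and the 2,830,743 total entries stated above, so roughly one record in five is malicious and the classes are strongly imbalanced.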
4 Experimental Setup
A. Hardware and Software Requirements
The proposed IDS is simulated in Python version 3.8 on a system with an i3 processor (base frequency of 2.30 GHz), 12 GB of RAM, and Ubuntu OS.
B. Data Preprocessing
The dataset is first cleaned by removing records with NaN or missing values. Next, the attack category labels are encoded to numerical values using label encoding. The dataset is then normalized, and the K best features (K is user-defined) are selected. It is then split into training and testing data in the ratio of 70:30. The models are trained on the training dataset and tested on the testing dataset. Finally, the models are ranked according to the accuracy of the results obtained.
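The preprocessing steps listed above (cleaning, label encoding, normalization, and a 70:30 split) can be sketched with the Python standard library as follows. This is a minimal sketch under stated assumptions, not the authors' code: the paper does not say which normalization is used, so min-max scaling is an assumption, and the helper names and the tiny sample rows are hypothetical. The K-best feature selection step is left out here.

```python
import math
import random

def is_missing(value):
    """True for None and for floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def clean(rows):
    """Drop every (features, label) row that has a missing or NaN feature."""
    return [(f, y) for f, y in rows if not any(is_missing(v) for v in f)]

def encode_labels(labels):
    """Label encoding: map each category name to an integer code."""
    mapping = {name: code for code, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

def min_max_normalize(features):
    """Scale every column to [0, 1] (assumed normalization scheme)."""
    columns = list(zip(*features))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in features]

def split_train_test(features, labels, test_fraction=0.30, seed=0):
    """Shuffle the row indices and split them into training and testing parts."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return ([features[i] for i in train], [features[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])

# Hypothetical sample: (destination port, flow duration) and a category label.
rows = [([80, 120.0], "Benign"), ([443, 310.0], "Benign"),
        ([80, 9000.0], "DoS Hulk"), ([22, 45.0], "SSH-Patator"),
        ([80, float("nan")], "Benign"), ([21, 60.0], "FTP-Patator"),
        ([80, 8700.0], "DoS Hulk"), ([53, 20.0], "Benign"),
        ([443, 280.0], "Benign"), ([22, 50.0], "SSH-Patator"),
        ([80, 150.0], "Benign")]

cleaned = clean(rows)
codes, mapping = encode_labels([y for _, y in cleaned])
scaled = min_max_normalize([f for f, _ in cleaned])
X_train, X_test, y_train, y_test = split_train_test(scaled, codes)
print(len(cleaned), mapping, len(X_train), len(X_test))
```

In practice a library such as scikit-learn provides equivalents (`LabelEncoder`, `MinMaxScaler`, `train_test_split`); whether the authors used these particular classes is not stated in the text.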
5 Feature Extraction
Feature extraction is imperative to the functioning of an algorithm, as it improves efficiency and shortens execution times. Out of a total of 79 features, the best 38 were selected using the SelectKBest algorithm. Using all 79 features would slow the algorithm down, whereas working with a selected subset of the features makes it faster and more efficient. Table 2 lists the 38 features selected to construct the classification model.
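The idea behind selecting the K best features can be sketched as follows: score every feature column against the class labels and keep the K highest-scoring columns. The text does not state which scoring function was used with SelectKBest, so the one-way ANOVA F-statistic below is an assumption (it is the default score function of scikit-learn's `SelectKBest`), and the function names and sample data are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature column against the labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def select_k_best(features, labels, k):
    """Return the indices of the k best-scoring columns and all scores."""
    columns = list(zip(*features))
    scores = [anova_f(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores

# Hypothetical data: column 0 separates the classes well, column 1 not at all.
features = [[1.0, 3.0, 1.0], [1.1, 1.0, 2.0], [0.9, 2.0, 3.0],
            [5.0, 2.0, 2.0], [5.1, 3.0, 3.0], [4.9, 1.0, 4.0]]
labels = [0, 0, 0, 1, 1, 1]

selected, scores = select_k_best(features, labels, k=2)
print(selected, scores)
```

With the paper's setting, `k` would be 38 and the input would have 79 columns; the uninformative column in the example gets a score of zero and is dropped.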
6 Proposed Model and Alternative Approaches
This section describes the HDTC model and briefly summarizes the alternative ML algorithms that were implemented.
A. Decision Tree Classifier (DTC)
This algorithm works by dividing nodes into sub-nodes. It adopts a greedy search strategy, in which the best course of action is chosen at each step. The decision tree classifier splits the nodes on the provided parameters and then chooses the split that yields the most uniform sub-nodes [22]. The decision tree classifier model exhibited training and testing accuracies of 99.90 and 99.70, respectively.
B. AdaBoost
AdaBoost, or adaptive boosting, is an existing ensemble technique that is commonly used in the construction of IDSs. It performs classification by assigning weights to the classifiers: the more accurate the results a classifier yields, the more weight is assigned to it. AdaBoost essentially builds a strong classifier from numerous weak classifiers [23]. The training and testing accuracies exhibited are 99.95 and 99.30, respectively. However, the execution times were much longer than is desirable for an efficient IDS.
C. Random Forest
Random forest is an ensemble learning technique for classification. It builds decision trees and consolidates them into an ensemble of decision trees [24]. It works with only the selected subset of best features (selected using feature extraction) rather than with all the features of the dataset. The training and testing accuracies exhibited are 99.35 and 99.33, respectively.
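The node-splitting step described for the decision tree classifier, which the trees of a random forest also rely on, can be illustrated with a greedy search for the threshold that gives the most uniform sub-nodes. The text does not name the uniformity measure, so Gini impurity is used here as an assumption; the function names and the sample values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Greedy search for the threshold on one feature that minimises the
    weighted Gini impurity of the two sub-nodes (value <= threshold, else)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical "max packet length" values for benign and attack flows.
values = [10, 20, 30, 800, 900, 1000]
labels = ["Benign"] * 3 + ["Attack"] * 3
print(best_split(values, labels))
```

A full tree repeats this search over every feature at every node and recurses into the two sub-nodes until they are pure or a depth limit is reached.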
Table 2 List of features selected for classification
S. No. | Feature | Datatype
1 | Bwd packet length mean | float64
2 | Avg bwd segment size | float64
3 | Bwd packet length std | float64
4 | Bwd packet length max | int64
5 | Packet length std | float64
6 | Fwd IAT std | float64
7 | Idle min | int64
8 | Max packet length | int64
9 | Idle mean | float64
10 | Packet length mean | float64
11 | Average packet size | float64
12 | Idle max | int64
13 | Flow IAT max | int64
14 | Fwd IAT max | int64
15 | Packet length variance | float64
16 | Flow IAT std | float64
17 | PSH flag count | int64
18 | Fwd IAT total | int64
19 | Flow duration | int64
20 | Flow IAT mean | float64
21 | FIN flag count | int64
22 | min_seg_size_forward | int64
23 | ACK flag count | int64
24 | Min packet length | int64
25 | Fwd IAT mean | float64
26 | Active min | int64
27 | Bwd packet length min | int64
28 | Active mean | float64
29 | Bwd IAT std | float64
30 | Flow IAT min | int64
31 | Bwd IAT max | int64
32 | Init_Win_bytes_forward | int64
33 | Active max | int64
34 | Bwd IAT mean | float64
35 | Total length of fwd packets | int64
36 | Subflow fwd bytes | int64
37 | Active std | float64
38 | Destination port | int64
D. Gaussian Naive Bayes
This algorithm most commonly works on continuous data that is assumed to follow a Gaussian (normal) distribution. It assumes that the features of the dataset are independent of each other, and it uses statistical values like the mean and standard deviation to perform the required classification [25]. This algorithm was the least effective and yielded the lowest accuracies of all the algorithms implemented. The training and testing accuracies exhibited are 71.88 and 71.94, respectively.
E. K-Nearest Neighbor (KNN)
This is a supervised machine learning algorithm. It hinges on the concept of feature similarity, where each data point is most similar to the data points closest to it in distance [26]. The training and testing accuracies are 99.70 and 99.73, respectively.
F. Hybrid Decision Tree Classifier (HDTC)
The proposed HDTC algorithm emulates human learning by using a set of decision trees and hence performs both qualitative and quantitative learning to implement the features of the proposed IDS. The algorithm builds three decision trees to identify and classify cyberattacks: the first decision tree (DTC-1) identifies a packet as benign or malicious, the second (DTC-2) labels the type of attack if a malicious packet is encountered, and the third (DTC-3) classifies DoS attacks by their type if DTC-2 has classified the attack as a DoS attack. Together, the three decision trees work in tandem to detect and classify all threats to the network. The working of the HDTC model is illustrated in Fig. 1. As the results show, the HDTC algorithm proved to be the most effective, exhibiting the highest training and testing accuracies of all the algorithms implemented. The HDTC algorithm was also implemented using training and testing ratios of 60:40, 80:20, and 90:10. The amenable 70:30 ratio yielded efficient results. The detailed results for the various split ratios are given in Table 4. The training and testing accuracies exhibited are 99.97 and 99.80, respectively.
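The three-level flow of the HDTC model can be sketched as a simple cascade. The sketch below is illustrative only: the three classifiers are passed in as plain callables, the threshold rules standing in for the trained trees are hypothetical, and the set of categories routed to DTC-3 is an assumption based on the category names in Table 1.

```python
# Assumption: the categories that DTC-3 is expected to tell apart.
DOS_TYPES = {"DoS Hulk", "DDoS GoldenEye", "DDoS Slowloris", "DDoS SlowHTTPTest"}

def stage_targets(label, dos_types=DOS_TYPES):
    """Derive the training target of each of the three trees from one label."""
    stage1 = "Benign" if label == "Benign" else "Malicious"
    if label == "Benign":
        stage2 = None  # DTC-2 is only trained on malicious records
    else:
        stage2 = "DoS" if label in dos_types else label
    stage3 = label if label in dos_types else None  # DTC-3: DoS records only
    return stage1, stage2, stage3

def hdtc_predict(flow, dtc1, dtc2, dtc3):
    """Cascade: benign/malicious, then attack type, then DoS sub-type."""
    if dtc1(flow) == "Benign":
        return "Benign"
    attack = dtc2(flow)
    return dtc3(flow) if attack == "DoS" else attack

# Hypothetical rules standing in for the three trained decision trees.
def dtc1(flow):
    return "Benign" if flow["packets_per_s"] < 100 else "Malicious"

def dtc2(flow):
    return "DoS" if flow["dst_port"] == 80 else "Port scan"

def dtc3(flow):
    return "DoS Hulk" if flow["packets_per_s"] > 1000 else "DDoS GoldenEye"

flows = [{"packets_per_s": 10, "dst_port": 443},
         {"packets_per_s": 500, "dst_port": 22},
         {"packets_per_s": 5000, "dst_port": 80}]
predictions = [hdtc_predict(f, dtc1, dtc2, dtc3) for f in flows]
print(predictions)
```

One consequence of this design is that most traffic, being benign, is resolved by the first tree alone, while only DoS traffic passes through all three levels.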
7 Results and Analysis
The HDTC algorithm yielded a training accuracy of 99.97. It performed better than all the other algorithms implemented because the decision trees consider all possibilities and analyze the 38 extracted features, leaving very little room for error. The multiple decision trees serve to better handle the different levels of classification required to obtain higher accuracies. Unlike the other algorithms, HDTC uses three decision trees, one at each level, to realize its goals. DTC-1 classifies packets as benign or malicious. DTC-2 identifies the type of attack in the case of malicious packets. DTC-3 identifies the type of DoS attack in the case of DoS attack packets. Handling each task at a different level provides higher training and testing accuracies. The training and testing accuracies for each ML model are given in Table 3.
Fig. 1 A data flow diagram illustrating the working of the HDTC model
The following graphs were plotted to illustrate the experimental results obtained:
(1) Out of all the algorithms implemented, AdaBoost exhibited the longest execution time, which greatly reduced the overall efficacy of the algorithm. This is illustrated in Fig. 2. The AdaBoost model builds multiple decision trees, assigns variable weights to each decision tree, and finally classifies the traffic, which takes a lot of time. HDTC, on the other hand, uses only three decision trees and thus takes less time.
(2) Although the Gaussian Naive Bayes and HDTC models exhibited similar execution times, the accuracy of HDTC was significantly higher, a fact that was taken into consideration when choosing HDTC over all the other algorithms implemented. This is clearly visible in Fig. 3. Once the HDTC model was decided upon, it was implemented with different training and testing split ratios: 60–40, 70–30, 80–20, and 90–10. The training and testing accuracies and the execution times (in secs) for each split ratio are given in Table 4.
(3) The HDTC model with a 60–40 split ratio yielded the best training accuracy and training execution time in comparison with the other split ratios. The execution time was lower because fewer records had to be processed, and the accuracy was higher because fewer records had to be fitted, which in turn reduced the potential for error. On the other hand, the 90–10 split ratio exhibited the best testing accuracy and testing execution time, as it had to classify fewer records; hence, there was less room for error and the execution time was shorter. These results are shown in Figs. 4 and 5.

Table 3 Training and testing accuracy of algorithms
Algorithm | Training accuracy | Testing accuracy
HDTC | 99.97 | 99.80
AdaBoost | 99.95 | 99.40
Random forest | 99.35 | 99.33
Decision tree | 99.90 | 99.70
KNN | 99.70 | 99.73
Gaussian Naive Bayes | 71.88 | 71.94

Table 4 Training and testing accuracy and execution time for the HDTC model
Split ratio (training–testing) | Training accuracy (%) | Testing accuracy (%) | Training time (s) | Testing time (s)
60–40 | 99.97 | 99.80 | 416 | 5.34
70–30 | 99.80 | 99.80 | 466 | 3.33
80–20 | 99.81 | 99.81 | 600 | 2.18
90–10 | 99.81 | 99.82 | 489 | 0.84

Fig. 2 Model execution time for each algorithm
Fig. 3 Bar plot of training and testing accuracies of various algorithms (GNB: Gaussian Naive Bayes, HDTC: hybrid decision tree classifier, DT: decision tree, KNN: K-nearest neighbors, RF: random forest, AB: AdaBoost)
Fig. 4 Plot of HDTC model accuracies for different split ratios
Fig. 5 Plot of HDTC model execution time for different split ratios
8 Conclusion
The ubiquity of computers and the Internet has raised an unavoidable need for cybersecurity measures to combat the mounting number of cyberattacks. Intrusion detection systems will soon become a prerequisite for all Internet and computer applications. The project implemented five existing algorithms and compared the accuracy and execution time of each against the proposed model. Graphs were plotted, and relevant conclusions were drawn. The implementation showed that the proposed model, HDTC, performed better than all the other models implemented, exhibiting a higher accuracy and a lower model execution time. The IDS offers an efficient solution to today’s problem of cyberattacks. It is a carefully constructed automated intrusion detection system built using a hybrid decision tree, and it leverages the best of two worlds: machine learning and cybersecurity.
9 Limitations and Future Scope
The proposed model is restricted to the attacks contained in the CIC-IDS2017 dataset. The IDS might also generate a few false positives. The project may be extended to construct an intrusion prevention system and can further be integrated into various antivirus software systems. The proposed model may also be adapted to offer cybersecurity to specific sectors, such as IoT, the financial sector, and cloud computing.
A Medical Image Enhancement to Denoise Poisson Noises Using Neural Network and Autoencoders V. Sudha, K. Kalyanasundaram, R. C. S. Abishek, and R. Raja
Abstract Poisson noise is common in underexposed radiographs because too few photons reach the detector. Scattered radiation also degrades radiographs, and the severity of the image quality degradation is determined by the amount of scatter reaching the detector. To predict scatter and reduce Poisson noise, a convolutional neural network (CNN) method and autoencoders are employed in this study. Autoencoders (AEs) are neural networks whose goal is to copy their inputs to their outputs. They function by compressing the input into a latent-space representation and reconstructing the output from this representation. The radiographs were underexposed by reducing the radiation exposure by 60%. Poisson noise is successfully minimised, and image contrast and details are increased, thereby enhancing the image. After applying the CNN algorithm, the contrast and details in the radiograph were considerably improved and became adequate for establishing a diagnosis, allowing a 60% reduction in radiation exposure. As demonstrated in this study, the quality of radiographs can be enhanced by minimising scatter and Poisson noise.
Keywords Convolutional neural network · Radiographs · Autoencoders · Cone-beam computed tomography · Max pooling layer
1 Introduction
Visual perception begins the moment light strikes the retina. The retina is made up of a layer of photoreceptors that convert light into electrical signals, which are then delivered to the primary visual cortex via the optic nerve. In computer science, the signal transmission of visual perception can be simulated using deep neural network algorithms that mimic neural processes. Multilayer perceptrons are neural
V. Sudha (B) · K. Kalyanasundaram · R. C. S. Abishek Sona College of Technology, Salem 636201, India e-mail: [email protected] R. Raja Muthayammal Engineering College, Rasipuram 637408, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_50
networks in which each neuron in one layer is connected to all neurons in the subsequent layer. Deep neural networks, which are used for tasks such as speech recognition, facial recognition and language translation, can be trained on existing data and then used to predict outcomes. Convolutional neural networks (CNNs) are a subset of deep neural networks. CNNs mimic animal visual processes in which a single visual-cortex neuron responds to inputs only in a small portion of the visual field. This portion, the receptive field of the neuron, is a confined region that partially overlaps with the receptive fields of surrounding neurons, so that the full visual field is covered by the receptive fields of all neurons. CNNs are used in medical imaging to classify lung diseases and to diagnose from mammograms. However, the majority of medical images still require confirmation from a human specialist before being accepted. Human experts must examine the images and give a diagnosis based on the data displayed in them. The goal of medical imaging is to produce the best images with the least amount of exposure. To make solid diagnoses, human professionals rely on sufficient image quality, which for an X-ray image is determined by a number of factors. Image noise from scatter and/or underexposure is a significant obstacle to decreasing radiation exposure: noisy images are assessed as too poor in quality for a confident diagnosis and are rejected, necessitating a retake of the image, which increases the patient's radiation dose. The development of intelligent image processing algorithms that enhance images degraded by scatter or improper exposure is therefore a feasible step towards reducing radiation exposure. In cone-beam computed tomography (CBCT) projections, certain CNN applications in medical imaging have demonstrated promising results in modelling scatter (Fig. 1).
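The link between underexposure and image noise can be made concrete with a minimal sketch. It relies on the standard property of Poisson statistics that a count with mean N has standard deviation sqrt(N), so its signal-to-noise ratio (SNR) is sqrt(N). The photon count of 10,000 and the function names are illustrative assumptions, not values from this study.

```python
import math


def poisson_snr(mean_photons):
    """SNR of a Poisson-distributed photon count: mean divided by sqrt(mean)."""
    return mean_photons / math.sqrt(mean_photons)


def relative_snr(exposure_fraction):
    """SNR at a reduced exposure, relative to the SNR at full exposure."""
    return math.sqrt(exposure_fraction)


# Hypothetical detector pixel that receives 10,000 photons on average at full exposure.
full = poisson_snr(10_000)
# The same pixel when the exposure is cut by 60%, i.e. 40% of the photons remain.
reduced = poisson_snr(0.4 * 10_000)

print(f"full exposure SNR:    {full:.1f}")
print(f"reduced exposure SNR: {reduced:.1f}")
print(f"relative SNR:         {relative_snr(0.4):.3f}")
```

Under this simple model, a 60% reduction in exposure lowers the SNR to about 63% of its full-exposure value, which is the loss a denoising method has to make up for.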
Image segmentation is a crucial and challenging aspect of image processing. In the realm of image interpretation, it has become a hotspot. This is also a bottleneck that prevents 3D reconstruction and other technologies from being used.
Fig. 1 CT scan images. a Noisy image, b clear CT scan
Image segmentation separates an image into several sections with comparable characteristics.
2 Literature Survey
De Bruijne [1] reviewed the substantial contribution that deep learning can make to medical imaging. The purpose of the review is to survey the literature on deep learning in medical imaging and to discuss its possibilities for future medical imaging research. The first section outlines how classical machine learning evolved into deep learning. Second, a survey of deep learning's applicability in medical imaging research is presented. Third, well-known deep learning software tools are examined. Finally, findings are presented, along with limitations and future approaches for deep learning in medical imaging.
Wernick et al. [2] covered a variety of machine learning applications and showed, through examples, how these concepts apply to medical imaging. Although the name "machine learning" is relatively new, the concepts of machine learning have been used in medical imaging for decades, most notably in the fields of computer-aided diagnosis (CAD) and functional brain mapping. Rather than surveying the extensive literature in this field, their objectives were to (1) familiarise the reader with certain cutting-edge machine learning techniques and (2) demonstrate how these techniques might be applied to medical imaging in a variety of ways.
Aerts et al. [3] described a radiomic analysis of 440 features that assess tumour image intensity, shape and texture, extracted from computed tomography data of 1019 patients with lung or head-and-neck cancer. They discovered that a large number of radiomic features, many of which had never been identified as relevant before, have prognostic potential in separate data sets of lung and head-and-neck cancer patients. According to radiogenomics analyses, a prognostic radiomic signature representing intratumour heterogeneity is linked to underlying gene-expression patterns. These findings imply that radiomics can detect a common prognostic characteristic in lung and head-and-neck cancers.
Palani et al. [4] proposed gathering all essential information from two or more images and combining it into fewer images, usually a single one, in order to raise image quality and reduce redundancy and thereby improve the clinical uses of the images. This process is called medical image fusion. The sub-images are partitioned with the watershed algorithm to obtain individual regions, and these regions are used to guide the fusion process. The watershed algorithm performs the image segmentation so that the basic image statistics are preserved.
Sudha et al. [5] used a VGG-19 network that achieved a sensitivity of 82% and an accuracy of 96% after being trained with features extracted from 20,000 images and tested with features extracted from 5000 images. The proposed system was capable of automatically labelling and classifying diabetic retinopathy (DR) grades.
Sudha et al. [6] used segmentation algorithms based on a deep neural network (DNN) to detect retinal abnormalities such as exudates, haemorrhages and microaneurysms in digital fundus images and then reliably classify the condition as mild, moderate, severe, no PDR or PDR in DR. Initially, colour images are subjected to saliency detection in order to separate the most salient foreground items from the background.
Archana et al. [7] discussed training and testing an artificial neural network, which gives a brief idea of how neural network concepts are used in the field of network security. A feed-forward back-propagation network with gradient descent with momentum training, a sigmoid transfer function and supervised learning was used to train the model for predicting fraudulent attacks.
Ahmed et al. [8] compared two unsupervised algorithms for denoising magnetic resonance images (MRIs) in the complex image space using the raw information that k-space contains. The first approach uses Stein's unbiased risk estimator, whereas the second uses a blindspot network, which restricts the network's receptive field. Both approaches were tested on two data sets: one with real knee MRI and the other with synthetic brain MRI.
The above works present the use of neural networks for medical image enhancement. The performance of the networks was compared using quantitative and qualitative measures of denoising efficiency. In this paper, autoencoders are used for image enhancement by removing Poisson noise, for better diagnosis of diseases.
3 Existing System Design
To denoise the image, each layer of the neural network processes the input image in successive stages, as feature matrices. The model is then trained on a series of low-resolution images and their corresponding high-resolution images so that a specially developed encoder extracts the characteristics of those images. After training, the learned weights of the decoded images are saved as feature vectors. The results of the trained models are observed using a low-quality input test image. The data set contains 1000 images of various human internal organs, each accompanied by a high-resolution image of the same content. The network begins with two two-dimensional convolutional layers and an activation layer that applies the rectified linear unit (ReLU) function:
$$y = \max(0, x) \tag{1}$$
These two layers are stored separately in layer 1 and extract essential features from the input image before moving down to smaller feature space. The max pooling layer is connected to the previous two-dimensional convolution layer and compresses the
dimensionality into smaller features with an output shape of (128, 128, 64). A dropout layer with an output shape of (128, 128, 64) is used to regularise the complex neural network layers. Two two-dimensional convolutional layers are then applied one after the other, each with an output shape of (128, 128, 128) and the ReLU activation function of Eq. (1). For further processing, the layers up to this point are saved as outer layer 2. The max pooling layer connected to the previous two-dimensional convolution layer has an output shape of (64, 64, 128). After this max pooling layer, a two-dimensional convolution layer with an output shape of (64, 64, 256) is applied. The upsampling layer then expands the small feature map into a bigger one with an output shape of (128, 128, 256). Two two-dimensional convolutional layers with an output shape of (128, 128, 128) and ReLU activation are applied one after the other. Finally, the output of the last convolutional layer is added to outer layer 2, which was saved earlier and also has the shape (128, 128, 128).
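The layer sequence above can be checked with a small shape-tracing sketch. It does not build a trainable network; it only propagates (height, width, channels) shapes. The 256 × 256 × 3 input size, the "same" padding, the 2 × 2 pooling and upsampling factors, and all function names are assumptions chosen so that the shapes reported in the text are reproduced.

```python
def conv2d_same(shape, filters):
    """Convolution with 'same' padding: height and width are kept, channels become `filters`."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, size=2):
    """Non-overlapping max pooling: height and width are divided by `size`."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def upsample(shape, factor=2):
    """Upsampling: height and width are multiplied by `factor`."""
    height, width, channels = shape
    return (height * factor, width * factor, channels)


def trace_shapes(input_shape=(256, 256, 3)):  # assumed input size, not stated in the text
    shapes = {}
    x = conv2d_same(conv2d_same(input_shape, 64), 64)  # two conv + ReLU layers
    shapes["layer 1"] = x
    x = max_pool(x)  # the dropout layer that follows does not change the shape
    shapes["max pool 1"] = x
    x = conv2d_same(conv2d_same(x, 128), 128)
    shapes["outer layer 2"] = x  # saved for the later addition
    x = max_pool(x)
    shapes["max pool 2"] = x
    x = conv2d_same(x, 256)
    shapes["conv 256"] = x
    x = upsample(x)
    shapes["upsample"] = x
    x = conv2d_same(conv2d_same(x, 128), 128)
    # Element-wise addition requires identical shapes on both branches.
    assert x == shapes["outer layer 2"]
    shapes["add"] = x
    return shapes


for name, shape in trace_shapes().items():
    print(f"{name}: {shape}")
```

The final assertion makes the role of the saved branch explicit: the addition is only defined because the decoder output and outer layer 2 both have shape (128, 128, 128).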
4 System Implementation
The implementation stage of the project is when the theoretical design is translated into a workable system. This is the most crucial and the last phase of the system life cycle. It is the process of turning the new system into a functioning system (Fig. 2). The block diagram shows how the enhanced output image is obtained. The noisy input CT image is fed to the encoding block, which passes it through convolution layers for feature extraction. The model is then trained and tested on the data set to match with
Fig. 2 Block diagram to obtain enhanced image
the extracted features. Autoencoders are used to get noise-free enhanced images as output.
5 Simulation Results
When a noisy CT image is given to the trained model, the model denoises it with a good level of efficiency. The output is the enhanced image depicted in Fig. 3.
Fig. 3 Simulated output images. a Input CT image, b noisy image, c denoised image, d output enhanced image
Fig. 4 Model accuracy
Fig. 5 Model loss
The performance of the algorithm is measured and plotted as accuracy versus epoch curves. As observed in Fig. 4, the accuracy increases as the number of epochs increases.
Fig. 6 ROC curve
Table 1 Performance metrics

| S. no. | Performance measures | Metrics obtained |
|---|---|---|
| 1 | Sensitivity | 96.40 |
| 2 | Specificity | 99.73 |
| 3 | Positively predicted value | 99.25 |
| 4 | Negatively predicted value | 98.82 |
Model loss is plotted against the number of epochs in Fig. 5. The plot shows that the training loss is lower than the validation loss. The receiver operating characteristic (ROC) curve plots the true-positive rate against the false-positive rate. From the curve, the area under the curve is 0.875 (Fig. 6). Table 1 lists the performance measures obtained: sensitivity, specificity, positively predicted value and negatively predicted value.
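For reference, the four measures in Table 1 and the area under the ROC curve follow from standard definitions, sketched below. The confusion-matrix counts and ROC points are hypothetical values chosen for illustration; they are not the data behind Table 1 or Fig. 6, and the function names are assumptions.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard measures computed from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve whose points are sorted by increasing false-positive rate."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


# Hypothetical counts: 90 true positives, 5 false positives,
# 95 true negatives and 10 false negatives.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")

# Hypothetical three-point ROC curve.
print("AUC:", auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.75, 1.0]))
```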
6 Conclusion
Medical image processing will greatly benefit from deep learning approaches, since deep learning has outperformed traditional machine learning approaches in nonmedical routine imaging research. This paper proposed the enhancement of CT images that are corrupted by a high level of Gaussian noise. The performance parameters were measured, and accuracy curves were plotted. It was found that autoencoders performed
at their best for denoising the CT images. The sensitivity and specificity obtained were both above 95%.
References
1. De Bruijne M (2016) Machine learning approaches in medical image analysis: from detection to diagnosis. Med Image Anal 33:94–97
2. Wernick MN, Yang Y, Brankov JG, Yourganov G, Strother SC (2010) Machine learning in medical imaging. IEEE Signal Process Mag 27:25–38
3. Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S et al (2014) Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 5(4006)
4. Palani U, Vasanthi D, Rabiya Begam S (2020) Enhancement of medical image fusion using image processing. J Innov Image Process (JIIP) 2(4):165–174
5. Sudha V, Ganeshbabu TR (2020) A convolutional neural network classifier VGG-19 architecture for lesion detection and grading in diabetic retinopathy based on deep learning. Comput Mater Continua 66(1):828–842
6. Sudha V, Ganesh Babu TR, Vikram N, Raja R (2021) Comparison of detection and classification of hard exudates using artificial neural system vs. SVM radial basis function in diabetic retinopathy. Mol Cell Biomech 18(3):139–145
7. Archana P, Divyabharathi P, Camry Joshya Y, Sudha V (2021) Artificial neural network model for predicting fraudulent attacks. J Phys Conf Ser 1979(1):012016
8. Ahmed AS, El-Behaidy WH, Youssif AA (2021) Medical image denoising system based on stacked convolutional autoencoder for enhancing 2-dimensional gel electrophoresis noise reduction. Biomed Sign Process Control 69:102–842
9. Li Y et al (2021) Research on image processing algorithm based on HOG feature. J Phys Conf Ser 1757(1). IOP Publishing
10. Faridi MS et al (2021) A comparative analysis using different machine learning: an efficient approach for measuring accuracy of face recognition. Int J Mach Learn Comput 11–2
11. Wu L, Liu S (2021) Comparative analysis and application of LBP image recognition algorithms. Int J Commun Syst 34(2)
12. Doi K (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imag Graph 31:198–211
13. Park SH, Han K (2018) Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286:800–809
14. Wang S, Summers RM (2012) Machine learning and radiology. Med Image Anal 16:933–951
15. Wernick MN, Yang Y, Brankov JG, Yourganov G, Strother SC (2010) Machine learning in medical imaging. IEEE Sign Process 27:25–38
Hand Gesture Recognition Using 3D CNN and Computer Interfacing Hammad Mansoor, Nidhi Kalra, Piyush Goyal, Muskan Bansal, and Namit Wadhwa
Abstract Since the 1970s, the field of gesture recognition and its applications has been at the centre of considerable research in human–computer interaction. Researchers have been able to construct strong models that can recognize gestures in real time thanks to recent advances in deep learning and computer vision, but they face hurdles when it comes to classifying gestures in variable lighting conditions. In this paper, we train a 3D convolutional neural network to recognize dynamic hand gestures in real time. Our focus is on ensuring that gesture recognition systems can perform well under varying light conditions. We use a huge training set consisting of numerous clips of people performing specific gestures in varying lighting conditions to train the model. We were able to attain an accuracy of 76.40% on the training set and 66.56% on the validation set with minimal pre-processing applied to the data set. The trained model was able to successfully recognize hand gestures recorded from a Webcam in real time. We were then able to use the model’s predictions to control video playback on the VLC media player such as increasing/decreasing volume and pausing video. These experimental results show the effectiveness and efficiency of the proposed framework to recognize gestures in both bright and dim lighting conditions. Keywords Deep learning · Convolutional neural networks · PyTorch · 3D CNN
H. Mansoor · N. Kalra (B) · P. Goyal · M. Bansal · N. Wadhwa Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_51
1 Introduction
In computer science and language technology (study of how devices can analyse, modify and produce human texts and speech), gesture recognition is aimed at understanding and interpreting human gesture using mathematical algorithms. These gestures usually originate from the hands and face but can actually be any and all forms of bodily motion. Since an individual’s face and hands can produce a multitude of simple and complex gestures, the research on gesture recognition is mostly
focussed on creating systems that can identify them, either visually or through the use of electromagnetic waves. Gesture recognition is a way for computers to understand humans and can help bridge the communication gap between humans and machines. It enables an individual to interact with a computer system by simply pointing at it, making conventional input devices such as keyboards and mice redundant. Traditional ways of interacting with computer systems include using physical peripherals such as keyboards and mice or screens with touch-based interfaces. These require the user to be close to the device and physically interact with it. This may not be possible at all times and can create barriers in communication between man and machine. For example, it is very taxing for a person with a disability to move around and physically interact with the devices around them, and gesture recognition can help them use these devices just by making simple combinations of hand movements. Traditional interfacing equipment such as mice and keyboards also contributes to plastic waste, which is hazardous to the environment; gesture-based interfacing can eliminate the need for such equipment and thereby reduce the production of plastic waste. Hand gesture recognition and its various advantages over traditional, physical human–computer interaction equipment are well known, and developments in this field have been going on since the 1970s. Initially, most gesture recognition systems made use of a combination of sensors attached to a glove worn on the hand [1]. The sensors on the glove were of many types: flex sensors, which output a change in resistance with respect to the bending of the hand; accelerometers, which measure the acceleration and direction of hand movements; contact sensors; etc.
Such models were robust but required an individual to wear heavy gloves, which caused inconvenience and restricted the number and complexity of gestures that could be performed. Around the same time, research into vision-based gesture recognition using inputs from cameras started to take flight. But due to computational restrictions, such models were hard to train and deploy onto consumer products, slowing the progress of such systems. As computing resources became cheaper, smaller and more easily accessible, computer vision research in the recognition of hand gestures began to grow. Earlier models made use of skin colour segmentation to localize the position of hands in an image and separate them from the background, making them easier to detect. These segmentation methods worked well but required a lot of data pre-processing, which increased computational overheads [2]. After performing an extensive review of research published in the domain, we identified the following problems that exist in present gesture recognition systems:
1. Some of these systems require the installation of multiple hardware-based sensors, which increases the cost of the product and the inconvenience caused to the user.
2. Most systems require extensive data pre-processing in the form of segmentation, localization, capturing depth and intensity data, etc., which can increase computational complexity and overheads.
3. Systems that employ convolutional neural networks for training their models do so for static image data, which does not cover dynamic gestures. These also require significant pre-processing of the input data, which is an overhead.
To address the above issues, we aim to create a gesture recognition system that is trained using a data set of video clips, each of which corresponds to a specific gesture that we hope to integrate into our system. The proposed system would include a 3D convolutional neural network, which would be used to process the video clips and extract features from them that ultimately help classify the gesture. It would be complemented by a recurrent neural network that helps us integrate the aspect of time and process a set of frames as a single unit. The proposed system would accurately classify hand gestures that are performed by the user and captured using a Webcam.
2 Related Work
Hand gesture recognition has been an active field of research, and various real-world systems and products have been developed which make use of it. Some examples include:
1. Microsoft’s Kinect for Xbox 360 captures body and hand motions in real time, freeing gamers from keyboards and joysticks. Kinect also supports multiple players within a small room setting. Today, Kinect is part of Microsoft’s cloud-based service, Azure.
2. American firm Leap Motion makes a sensor that detects hand and finger motions as input. Besides using it to control your PC, it also allows for hand tracking in virtual reality.
3. BMW 7 Series of cars has gesture recognition that allows drivers to turn up or turn down the volume, accept or reject a phone call, and change the angle of the multi-camera view. There is even a customizable two-finger gesture that you can programme to whatever you want.
Our approach makes use of image processing and machine learning techniques to recognize gestures. These techniques can also be quite useful when working with text-based data where a model is able to categorize handwritten characters [3] and font styles [4]. Furthermore, research can also be focussed on the optimization of existing systems. For example, Mohanraj et al. [5] proposed the use of capsule networks in conjunction with existing neural networks to identify latent structures in large volumes of textual data. Similarly, Sungheetha et al. [6] demonstrated that capsule networks were able to outperform traditional methods of image classification as they used a greater wealth of information which resulted in better decision making for any given problem. Amrita et al. [7] used image processing and recognition techniques to categorize sign language and then used the results to generate corresponding text and speech. Tesfamikael et al. [8] used image processing to segment and identify the eye pupil position. The position of the pupil was used to control a motor using a
PID controller which drove a wheelchair. Milan et al. [9] explored various CNN-based image classification techniques to automate the item identification and billing process at supermarkets. Biswas et al. [10] proposed the use of a Microsoft Kinect® camera to capture depth and intensity data which could be used to separate the hand from the background and enable the system to classify dynamic hand gestures. This required the use of expensive camera modules that were not easily available in all consumer products. AlexNet [11], introduced in 2012, was the first visual image recognition model based on convolutional neural networks that achieved a significantly lower error percentage than its competitors, owing to its efficient usage of GPUs. This implementation showed that CNNs could be trained to classify high-resolution images. This opened the possibility of using convolutional neural networks to classify static gestures in real time. These networks could automatically extract features from an image without extensive prior pre-processing and optimize themselves based on the error/loss suffered during predictions. This made training and implementing such models significantly easier. AlexNet was designed to classify high-resolution images but was not suitable for problems which required action recognition. Panwar et al. [12] proposed the use of shape-based features such as orientation, location of fingers relative to each other, and raised/folded fingers for hand gesture recognition. This approach did not consider factors such as colour or texture of skin, as these are highly dependent on the light conditions. They built a system that could recognize a set of 45 gestures in real time, tested it on about 450 images and accurately identified about 94% of them. The proposed system was successful in recognizing static gestures but was not trained to make sense of dynamic hand gestures that included rotation and translation of the hands. Devineau et al.
[13] proposed the use of convolutional neural networks to classify gestures based on skeletal data. The drawback of their system was that it only worked on complete sequences of hand gestures and would fail if you were to perform a gesture halfway through. This was because their model did not make use of short time window sequences and instead considered an entire sequence as one input. Ji et al. [14] and Gupta et al. [15] showed that 3D convolutional neural nets outperform comparable methods of gesture recognition. Zhu et al. [16] proposed multimodal gesture recognition using 3D convolution and convolutional LSTM. They showed that learning spatial and temporal features simultaneously generates a more robust classifier for video data.
3 Experimental Set-up
The gesture recognition system is a software-based module that uses a neural network to recognize gestures performed in real time and binds them to keyboard shortcuts which can then be used to control any given application. Our aim is to map our predicted output from the model to hotkey bindings for VLC media player, allowing
us to use gestures to control functions like Pause, Play, Skip Forward 10 s, etc., while playing a video or audio file [17]. In order to train the system to recognize gestures, we make use of a 3D convolutional neural network architecture. The architecture is implemented using the “torch.nn” module within PyTorch and consists of four convolution layers followed by two fully connected layers. Table 1 describes the architecture of the proposed model. Each convolution layer mentioned above performs the following transformations on the data:
1. Conv3D: Performs a convolution operation on a 3D tensor using a kernel size of (3 × 3 × 3) and a stride of 1. At the edges, the image is padded with 1 pixel in order to ensure no loss of information. The output of this layer is a set of feature maps that only includes features that are significant or distinct, helping the network to learn from them.
2. BatchNorm3D: Normalizes the output feature map to keep training stable.
3. ELU: The exponential linear unit is an activation function that is able to converge cost to zero much faster when compared to other activation functions such as Leaky ReLU and Sigmoid.
4. MaxPool3D: Performs a pooling operation on the feature map, compressing the input by replacing each kernel window with the maximum value inside it. We use a pool size of (2 × 2 × 2) to ensure that the compressed data does not lose information rapidly.
The output from the convolution layers is then flattened and forwarded to the fully connected layers. The fully connected layers then output nine values which correspond to the nine gesture classes we considered. The class with the largest value is taken as the predicted label. This is compared to the actual label to confirm whether the prediction was accurate or not.

Table 1 Architecture of the 3D convolution neural network

| Layer | Input dimensions | Output dimensions |
|---|---|---|
| Convolution layer 1 | 3 | 64 |
| Convolution layer 2 | 64 | 128 |
| Convolution layer 3 | 128 | 256 |
| Convolution layer 4 | 256 | 256 |
| Fully connected layer 1 | 12,800 | 512 |
| Fully connected layer 2 | 512 | 9 |
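As a rough check on how the layer sizes in Table 1 fit together, the following sketch traces the (frames, height, width) shape of a clip through four Conv3D + MaxPool3D blocks using the standard output-size formulas. The 20-frame, 84 × 84 input is an assumption based on the queue length and crop size used in the real-time pipeline; the function names and the alternative first pooling size are hypothetical and are not taken from the authors' implementation.

```python
def conv3d_shape(shape, kernel=3, stride=1, padding=1):
    """Output (frames, height, width) of a 3D convolution, using the standard size formula."""
    return tuple((d + 2 * padding - kernel) // stride + 1 for d in shape)


def maxpool3d_shape(shape, pool=(2, 2, 2)):
    """Output shape of non-overlapping max pooling (stride equal to the pool size, floor mode)."""
    return tuple(d // p for d, p in zip(shape, pool))


def trace(clip_shape, channels=(64, 128, 256, 256), pools=None):
    """Return the final (frames, height, width) shape and the flattened feature count."""
    pools = pools or [(2, 2, 2)] * len(channels)
    shape = clip_shape
    for pool in pools:
        shape = maxpool3d_shape(conv3d_shape(shape), pool)
    frames, height, width = shape
    return shape, channels[-1] * frames * height * width


# Assumed input: 20 frames of 84 x 84 pixels, pooled by 2 along every axis in each block.
print(trace((20, 84, 84)))

# Hypothetical variant in which the first block pools only spatially.
print(trace((20, 84, 84), pools=[(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]))
```

Under these assumptions, pooling every axis by 2 in all four blocks leaves 256 × 1 × 5 × 5 = 6400 features, whereas the 12,800 inputs of fully connected layer 1 would correspond to two remaining temporal steps, as in the second call. The text does not state which configuration or clip length was used, so the variant shown is only one possibility.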
732
H. Mansoor et al.
Fig. 1 Set of video frames from the data set showing a woman rotating her hand anti-clockwise [17]
4 Data Acquisition
We chose the 20BN-Jester data set (https://20bn.com/datasets/jester) to train our model. It is a set of 148,092 video clips saved in directories numbered from 1 to 148,092 in a parent directory named 20bn-jester-v1. Each directory corresponds to a clip and consists of JPEG images of variable width and a height of 100 px. Each clip corresponds to one of 27 pre-defined gestures performed by a multitude of individuals in front of a laptop camera or Webcam. Figure 1 shows an example of a woman performing a hand gesture. For our use, we extracted nine gestures from the data set and used them to train our neural network. These gestures were chosen as they are common and relatively easy to remember. They are as follows:
a. Doing other things
b. No gesture
c. Stop Sign
d. Swiping Down
e. Swiping Left
f. Swiping Right
g. Swiping Up
h. Turning Hand Clockwise
i. Turning Hand Counter-Clockwise
We used 37,500 clips to train our model, 4688 clips to validate it and then tested the trained model on 4687 clips.
5 Proposed System
Figure 2 describes the framework of the proposed system in detail. We use a subset of the Jester data set to train our model to recognize nine gestures. Once the model has been trained and achieves a good accuracy, we use the trained weights to predict gestures in real time from the input of a Webcam. Once the system has been initialized, it starts a video stream and begins to record video. The frames are placed in a queue of 20 frames, and each frame is converted into a PIL image. Each PIL image is then transformed using the following transforms from the torchvision.transforms module:
• CenterCrop(84): crops the image at the centre to a height of 84 pixels (in torchvision, a single integer argument yields a square 84 × 84 crop). Most gestures are performed in this region of the recorded image, and cropping also reduces the number of pixels that need to be analysed, which lowers the computational overhead.
Hand Gesture Recognition Using 3D CNN and Computer Interfacing
Fig. 2 Proposed framework showing the process flow from gesture acquisition to recognition and execution of associated function
• ToTensor(): converts the PIL image into a tensor (in torchvision, a channels × height × width tensor with values scaled to [0, 1]).
• Normalize(): rescales pixel values to a normalized range. Each value is calculated as image = (image - mean)/std. This reduces skew and variation in the data, which in turn helps the model learn faster.
If the prediction for this set of 20 frames has a confidence greater than a threshold value (0.7), 10 more frames are added to the queue to stabilize the prediction and make it more accurate. The predicted output is an integer class index (0 to 8 for the nine classes), which is then converted to a string value such as “Stop Sign” or “Swiping Right” using the gesture dictionary. This output is mapped to a single key press or a combination of key presses in the mapping .ini file. The .ini extension is used for configuration files on platforms such as Windows and macOS. We configure the system to perform a specific key press for each gesture as given in the file:
[MAPPING]
Stop Sign = press spacebar (pauses the video playback)
Swiping Up = hotkey, ctrl, up (increases the volume)
Swiping Down = hotkey, ctrl, down (lowers the volume)
Turning Hand Clockwise = press “P”
Turning Hand Counterclockwise = press “N”
Swiping Left = hotkey, alt, left (rewinds playback 10 s)
Swiping Right = hotkey, alt, right (forwards playback 10 s)
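The windowing rule described above (classify 20 frames, then collect 10 more before confirming a confident prediction) can be sketched as follows. This is one possible reading of the text, not the authors' implementation: the `model` callable, the softmax confidence, the order of the labels in `GESTURES`, and clearing the buffer after a confirmed gesture are all assumptions.

```python
import math
from collections import deque

WINDOW, EXTRA, THRESHOLD = 20, 10, 0.7

# Assumed label order; the source does not give the index of each class.
GESTURES = ["Doing other things", "No gesture", "Stop Sign", "Swiping Down",
            "Swiping Left", "Swiping Right", "Swiping Up",
            "Turning Hand Clockwise", "Turning Hand Counter-Clockwise"]


def top_class(scores):
    """Return (index, softmax probability) of the highest raw score."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]


def run_stream(frames, model):
    """Yield gesture names; `model` maps a list of frames to nine raw scores."""
    window = deque(maxlen=WINDOW + EXTRA)
    pending = 0  # extra frames still to collect before confirming
    for frame in frames:
        window.append(frame)
        if pending:
            pending -= 1
            if pending == 0:
                best, confidence = top_class(model(list(window)))
                if confidence > THRESHOLD:
                    yield GESTURES[best]
                window.clear()
            continue
        if len(window) >= WINDOW:
            _, confidence = top_class(model(list(window)[-WINDOW:]))
            if confidence > THRESHOLD:
                pending = EXTRA  # confident guess: extend the window by 10 frames


def fake_model(frames):
    """Stand-in classifier that always favours class 2 (hypothetical)."""
    scores = [0.0] * 9
    scores[2] = 10.0
    return scores


print(list(run_stream(range(30), fake_model)))
```

In a real pipeline `model` would wrap the trained 3D CNN and the frame preprocessing; here it is a stub so that the control flow can be followed on its own.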
So, for example, when a person swipes right, a hotkey (alt + right) is triggered which actuates a response in the system. These actuations are used to manipulate a video stream on the VLC media player using a multitude of hotkeys [18] described in the VideoLAN documentation wiki.
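A mapping file of this kind can be read with Python's standard configparser module. The sketch below is a hedged illustration: the helper names and the injected `send` callback are hypothetical, and the explanatory parentheses from the listing are left out of the file text. A key-automation library such as PyAutoGUI could be plugged in as `send`, but the source does not state which library was used.

```python
import configparser

MAPPING_INI = """
[MAPPING]
Stop Sign = press spacebar
Swiping Up = hotkey, ctrl, up
Swiping Down = hotkey, ctrl, down
Turning Hand Clockwise = press "P"
Turning Hand Counterclockwise = press "N"
Swiping Left = hotkey, alt, left
Swiping Right = hotkey, alt, right
"""


def parse_action(value):
    """Split 'hotkey, alt, right' into ('hotkey', ('alt', 'right'))."""
    tokens = [t.strip('"') for t in value.replace(",", " ").split()]
    return tokens[0], tuple(tokens[1:])


def load_mapping(text):
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep gesture names case-sensitive
    parser.read_string(text)
    return {gesture: parse_action(value)
            for gesture, value in parser["MAPPING"].items()}


def dispatch(gesture, mapping, send):
    """Call send(kind, keys) for a mapped gesture; ignore unmapped ones."""
    if gesture in mapping:
        kind, keys = mapping[gesture]
        send(kind, keys)
        return True
    return False


mapping = load_mapping(MAPPING_INI)
sent = []
dispatch("Swiping Right", mapping, lambda kind, keys: sent.append((kind, keys)))
print(sent)
```

Passing the key-sending function in as a parameter keeps the parsing logic testable without touching the keyboard; gestures such as "No gesture" simply have no entry and are skipped.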
6 Results and Inferences
Our proposed model architecture performs well on the data set. While training the model, we see a gradual decline in the loss metric, as shown in Fig. 3. This suggests that training the model on the entirety of the Jester data set could yield promising results. The model achieved an accuracy of 76.40% on the training set with a loss of 0.69, and 66.56% on the validation set with a loss of 0.97. The precision of the system is 85%. Table 2 shows a comparison between two architectures trained on the Jester data set. The model integrated well with the application. The Webcam, once initialized, was able to capture the gesture being performed and predict its class accurately for all nine gestures. The gestures triggered the correct functions on the VLC media player, which allowed us to take control of the application. We could pause, play, skip forward 10 s and even increase and decrease the volume without making any physical contact with the host system.

Fig. 3 Loss metric reducing over time as the model is trained. X axis: time (hours) and Y axis: loss (units)

Table 2 Comparative analysis for models trained on Jester data set
Model                                    Accuracy (%)
3D CNN (proposed architecture)           76.40
3D ResNet (architecture based on [19])   68.13
6.1 Environmental, Economic and Societal Benefits
1. The gesture classifier eliminates the need for physical peripherals, which can significantly reduce the plastic waste generated by the disposal of electronic items, making it an environment-friendly alternative.
2. The cost of purchasing computing machines would also be reduced once such a system becomes widely available and easy to use, since the added cost of purchasing peripherals would be eliminated. All of their functionality would be ported to software-based logic, which is cheaper to manufacture, scale and distribute.
3. This would help make systems more accessible and easier to use, since a few simple gestures would translate to complex routines on the computer.
7 Conclusion
This work aimed to implement a hand gesture recognition model and use it to classify gestures and trigger a corresponding function on the host system. After extensive research and testing, we implemented a model that achieves an accuracy of 76.40% on the training set extracted from the 20BN-Jester data set. The model is capable of recognizing nine gesture classes, of which seven are mapped to functions that allow us to control a media player (VLC). We believe that such a system would allow for interactions with computing devices without the need for physical contact through touch screens or peripherals such as keyboards and mice. If adopted, such a system would eliminate the portion of plastic waste generated by discarded peripherals and help protect the environment. These systems would make interactions with machines more convenient for the general public and even allow disabled individuals to interact with them through simple hand gestures that are easy to learn. Our hand gesture recognition model is scalable, which allows it to be used on many platforms such as desktop computers, laptops and TVs, opening up exciting opportunities for extensive use in the consumer industry.
References
1. Shubham J, Pratik S, Avinash B, Parag H (2016) Review on hand gesture recognition using sensor glove. Int J Adv Res Comp Commun Eng (IJARCCE) 5(11):563–565
2. Pradhan A, Ghose MK, Pradhan M, Qazi S, Moors T, E. EL-Arab IM, El-Din HS, Mohamed HA, Syed U, Memon A (2012) A hand gesture recognition using feature extraction. Int J Curr Eng Technol 2(4):323–327
3. Hamdan YB (2021) Construction of statistical SVM based recognition model for handwritten character recognition. J Inf Technol 3(02):92–107
4. Vijayakumar T, Vinothkanna R (2020) Capsule network on font style classification. J Artif Intell 2(2):64–76
5. Manoharan JS (2021) Capsule network algorithm for performance optimization of text classification. J Soft Comp Paradigm (JSCP) 3(1):1–9
6. Sungheetha A, Rajesh S (2020) A novel CapsNet based image reconstruction and regression analysis. J Innov Image Process (JIIP) 2(3):156–164
7. Thakur A, Budhathoki P, Upreti S, Shrestha S, Shakya S (2020) Real time sign language recognition and speech generation. J Innov Image Process 2(2):65–76
8. Tesfamikael HH, Fray A, Mengsteab I, Semere A, Amanuel Z (2021) Simulation of eye tracking control based electric wheelchair construction by image segmentation algorithm. J Innov Image Process (JIIP) 3(01):21–35
9. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Process (JIIP) 3(02):100–117
10. Biswas KK, Basu SK (2011) Gesture recognition using Microsoft Kinect®. In: The Proceedings of the 5th international conference on automation, robotics and applications, IEEE, pp 100–103
11. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
12. Panwar M (2012) Hand gesture recognition based on shape parameters. In: 2012 international conference on computing, communication and applications, IEEE, pp 1–6
13. Devineau G, Moutarde F, Xi W, Yang J (2018) Deep learning for hand gesture recognition on skeletal data. In: Proceedings of 13th IEEE international conference on automatic face & gesture recognition (FG 2018), IEEE, pp 106–113
14. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
15. Gupta SD, Kundu S, Pandey R, Ghosh R, Bag R, Mallik A (2012) Hand gesture recognition and classification by discriminant and principal component analysis using machine learning techniques. Int J Adv Res Art Intel 1(9):46–51
16. Zhu G, Zhang L, Shen P, Song J (2017) Multimodal gesture recognition using 3-D convolution and convolutional LSTM. IEEE Access 5:4517–4524
17. Materzynska J, Berger G, Bax I, Memisevic R (2019) The jester dataset: a large-scale video dataset of human gestures. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
18. “QtHotKeys,” Internet: https://wiki.videolan.org/index.php?title=QtHotkeysaction=history. Accessed 10 Feb 2019; 16 Dec 2020
19. Hara K, Kataoka H, Satoh Y (2017) Learning spatio-temporal features with 3d residual networks for action recognition. In: Proceedings of the IEEE international conference on computer vision workshops, pp 3154–3160
Analysis of Prediction Accuracies for Memory Based and Model-Based Collaborative Filtering Models
C. K. Raghavendra and K. C. Srikantaiah
Abstract Due to the extensive growth of e-commerce, recommender systems are widely used by many sites to provide better services to their customers by helping them find items of interest. However, there is still scope for improvement in finding the best technique to implement recommender systems, and many techniques have been developed for this purpose. In both academic research and commercial applications, collaborative filtering algorithms are very popular and widely used. Neighborhood-based, or memory based, methods are the classic approach to collaborative filtering, while model-based methods, notably matrix factorization techniques, are more modern approaches. In this work we perform a comparative analysis of memory based and model-based approaches with different similarity measures using three variations of the MovieLens dataset: 100 k, 1 M and ML small. The results indicate that, among memory based approaches, the item-based approach with adjusted cosine similarity performs best, and that among model-based approaches SVD performs better than the other algorithms.
Keywords Recommender system · Content-based · Collaborative filtering · Similarity measure · Memory based · Model-based
1 Introduction
In our daily lives, we frequently rely on recommendations from others about a variety of subjects, things, or products. Rather than relying on the opinions of a small group of people, recommender systems allow us to rely on the opinions of very large groups of people. These systems use people’s preferences to determine which personalized recommendations each user should receive.

C. K. Raghavendra (B) · K. C. Srikantaiah
Department of Computer Science and Engineering, SJB Institute of Technology, Bangalore, Visvesvaraya Technological University, Belagavi, Karnataka, India
e-mail: [email protected]
C. K. Raghavendra
Department of Computer Science and Engineering, B N M Institute of Technology, Bangalore, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_52
Items and users are the key entities in recommender systems, and predictions and recommendations are made from information about these two entities. This information can relate to a single entity, i.e., either users or items but not both, or it can relate to both entities. The former is known as content information and the latter as collaborative information. Collaborative filtering (CF) is a commonly used technique that bases its predictions and recommendations on other users’ ratings or past behavior. This strategy is based on the premise that other users’ data can be selected and aggregated in such a way that the active user gets a better prediction. CF methods make the intuitive assumption that if users agree on the quality or relevance of some items, they will very likely agree on the quality or relevance of others. If a group of people enjoys the same items as Ram, Ram is likely to enjoy the items that they enjoy and that he has not yet seen. In this work, we focus on user-based, item-based and model-based CF algorithms, which compute predicted ratings for a target user. Several measures are used for computing similarity. We experimented with the commonly used similarity measures [1], i.e., the cosine, Pearson and Jaccard coefficients and adjusted cosine; in addition, experiments were carried out to test the performance of the model-based approaches. We provide a clear comparison between them on three variations of the same dataset, using the same evaluation metrics. The remainder of the paper is structured as follows: Sect. 2 highlights the literature review, Sect. 3 gives the methodology and its components, Sect. 4 discusses the implementation results, and finally a conclusion and some future enhancements are given in Sect. 5.
2 Literature Review
Several researchers have experimented with different types of recommendation systems across domains such as movies, products, music and jokes. In this section, we highlight research work carried out on various similarity measures and different types of collaborative filtering algorithms. Raghavendra et al. [1] gave a detailed survey of recommendation systems, their types and research issues across various domains. Melville and Sindhwani [2] categorized recommendation systems into three major types, i.e., content-based, collaborative filtering and hybrid methods. The authors in [3] further classified CF techniques into model-based and memory based. Model-based techniques make predictions from a learned model, whereas memory based techniques use the rating data of users and items directly. Machine learning-derived similarity measures play a key role in recommender systems. Each similarity metric is linked to vector space approaches; however, similarity can be defined in a variety of ways. The measures can be divided into two categories: distance and degree measurement. Various similarity computation algorithms exist for calculating user similarity. Because each similarity measure uses a different formula, they produce different results; [4, 5] explain some similarity computation algorithms. The authors of [6] created a website that employs a variety of recommendation strategies on the MovieLens dataset. Every technique has a method for estimating a user’s rating for a new item based on data from previous users. They computed the prediction error using the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). An examination of the Amazon recommendation system [7] shows that it does not apply the standard collaborative filtering process mentioned above. Amazon does not use user-based or cluster models because of their O(mn) complexity, with m users and n items. Wu et al. [8] created a movie recommendation system that takes past user ratings of movies into consideration in order to provide recommendations to the user. They developed this method by combining the CF algorithm with the Apache framework and compared the efficiency of user-based and item-based recommender systems. The authors in [9] developed a model for determining similarity based on users’ preferences for item attributes. The method takes into account user preferences for item attributes, the number of items that are rated together, and the total number of items rated. This approach develops stronger links between items and users in order to identify user preferences more effectively and make them more relevant to the given application. The authors report that the method is superior to existing methods and gives better accuracy in providing quality recommendations. The major goal of Hassanieh et al. [10] was to develop a user-based CF algorithm that could predict ratings for a specific user based on the ratings of other, similar users. Further experiments on the same dataset were done to evaluate the performance of the most commonly used similarity metrics, resulting in a clear comparison of these using the same dataset and assessment methods.
3 Methodology
The methodology of the proposed model is shown in Fig. 1. It mainly consists of three components:
1. MovieLens dataset
2. Collaborative Filtering Models
3. Evaluation Metrics.
3.1 MovieLens Dataset [14]
In this research, we used three data sets related to movies: MovieLens 100 K, MovieLens 1 M and MovieLens-latest-small.
• MovieLens 100 K: contains 100,000 ratings on a scale of 1–5 given to 1682 films by 943 individuals. It also includes basic demographic information for users, such as gender, age, ZIP code and occupation, as well as movie information such as title, date of release, URL, and genre. The data was collected through the MovieLens website between September 19th, 1997, and April 22nd, 1998. Each user has rated at least 20 movies.
• MovieLens 1 M: contains 1,000,209 ratings on a scale of 1–5 given to about 3900 movies (movie IDs run up to 3952) by 6040 users who joined MovieLens in 2000. Like the MovieLens 100 K data set, MovieLens 1 M was collected from the same website and also contains users’ demographic information and movie information. All selected users had rated at least 20 movies.
• MovieLens-latest-small: contains 100,234 ratings on a scale of 1–5 given by 718 users to 8915 movies. The data was created between March 26, 1996 and August 05, 2015. All selected users had rated at least 20 movies.

Fig. 1 Proposed methodology
3.2 Overview of Collaborative Approaches
The recommender systems literature contains several algorithms that recommend items based on content information, collaborative information, or both. Because our work focuses on collaborative filtering, here we discuss the algorithms we used that are based on collaborative information.
Memory based Approaches: Memory based algorithms are the simplest and most popular collaborative filtering algorithms. As the name suggests, these algorithms use the stored preferences of similar neighbors to make predictions and recommendations [11, 12]. In neighborhood-based algorithms, we first compute the similarities between entities to find the neighbors most similar to the given entity. Once the neighbors are found, we use their preferences to predict the ratings or scores of the items the given user has not yet rated. We then rank these unknown items in descending order of predicted rating or score and recommend the highest-ranked ones. Neighborhood-based algorithms are therefore composed of three steps:
• Similarity Computation
• Rating Prediction
• Top-K Recommendation.
The key entities in recommender systems are users and items. Based on these two entities, two neighborhood-based collaborative algorithms have been proposed: user-based and item-based CF.
User-based CF: The intuition behind user-based CF is that users like the items that are liked by users with similar taste. Therefore, feedback given by similar users is used to predict the given user’s preferences for unknown items, and items are recommended based on these predicted preferences. The three steps followed in the user-based collaborative filtering algorithm are:
• User Similarity: In this step, we find the similarities between the given user and all other users. To find the similarity of two users, we consider the items that have been consumed by both. Several similarity measures are available for this task, and they differ for different kinds of feedback. The most popular among them are the Pearson and cosine similarity measures.
• Rating Prediction: Once the similarities are computed, the k most similar users are extracted and the preference for every unknown item of the given user is found by aggregating the choices of these similar users. The preference can be a rating in the case of numerical data sets or a score in the case of positive-only feedback data sets.
• Top-K Recommendation: Finally, all unknown items are ranked in descending order of predicted rating or score and the top N items in the ranked list are recommended.
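The three user-based steps above can be sketched on a toy example. The ratings, function names and the mean-centred weighted average used for prediction are illustrative assumptions (one common variant of the technique), not the exact procedure used in the experiments.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); the values are made up for illustration.
RATINGS = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ru, rv):
    """Step 1: Pearson correlation over the items both users rated."""
    common = sorted(set(ru) & set(rv))
    if len(common) < 2:
        return 0.0
    mu, mv = mean(ru[i] for i in common), mean(rv[i] for i in common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item, k=2):
    """Step 2: mean-centred weighted average over the k most similar raters."""
    sims = [(pearson(ratings[user], r), v)
            for v, r in ratings.items() if v != user and item in r]
    sims = sorted(sims, key=lambda t: abs(t[0]), reverse=True)[:k]
    norm = sum(abs(s) for s, _ in sims)
    base = mean(ratings[user].values())
    if norm == 0:
        return base
    offset = sum(s * (ratings[v][item] - mean(ratings[v].values()))
                 for s, v in sims) / norm
    return min(5.0, max(1.0, base + offset))  # clip to the 1-5 rating scale


def recommend(ratings, user, n=1):
    """Step 3: rank the user's unrated items by predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    return sorted(unseen, key=lambda i: predict(ratings, user, i),
                  reverse=True)[:n]


print(recommend(RATINGS, "u1"))
```

Here k bounds the number of neighbours used for a prediction, while n is the length of the recommendation list.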
Item-based CF: The general idea underlying item-based CF is to predict a user’s preference for an item from the feedback that the same user has given to similar items. The three steps in item-based CF are:
• Item Similarity: In this step, we find the similarities between all items available in the system. To find the similarity between two items, we consider the users who consumed both of them. Several similarity measures have been proposed for this task, and they differ for different kinds of feedback. The most popular similarity measures used in item-based collaborative filtering are Pearson, cosine and adjusted cosine.
• Rating Prediction: Once the similarities are computed, the k items most similar to the given unknown item are extracted from the set of items consumed by the given user, and the preference for the unknown item is found by aggregating the user’s preferences for these similar items. Here too, the preference can be a rating in the case of numerical data sets or a score in the case of positive-only feedback data sets.
• Top-K Recommendation: Finally, all unknown items are ranked in descending order of predicted rating or score and the top N items in the ranked list are recommended.
Model-based Algorithms: Neighborhood-based algorithms perform well when the data set is dense. Their accuracy tends to decrease as the amount of information available in the data set decreases, because similarity computations are not very accurate under sparsity. Furthermore, neighborhood-based algorithms do not scale well as the number of users and items in the data set grows. Most real-world data sets used in recommender systems are very sparse and have a large number of users and items, so neighborhood-based algorithms are neither scalable nor fast on these data sets and do not produce accurate results. Model-based algorithms were proposed to deal with these problems. Model-based algorithms first build a model from the information available in the data set. This involves capturing hidden information in the data set and using the built model to make predictions and recommendations. Thus, when the dataset is sparse with a large number of items and users, model-based algorithms are scalable, fast and accurate compared to memory based algorithms. The accuracy of these algorithms, on the other hand, is determined by how well the model fits the real data. In this section, we discuss the most popular model-based algorithm, namely, Matrix Factorization (MF): It is a low-dimensional factor model [13] that uses a small number of latent factors to represent both users and items. The premise is that a user’s preference for an item is influenced by a small number of factors. For example, if two users give the same rating to the same movie, say 4 on a scale of 1–5, it is possible that both users appreciate the director of the film or that it is an action film, which is their favorite genre.
So, if we can identify these hidden characteristics, we can anticipate how a certain user will rate a particular item. Formally, we define matrix factorization as follows: Given a user-item rating matrix $Y \in \mathbb{R}^{m \times n}$ with $m$ users and $n$ items, assume that $d$ is the number of latent factors and $r$ is the number of possible ratings. Our task is to find two matrices, a user latent feature matrix $W \in \mathbb{R}^{m \times d}$ and an item latent feature matrix $V \in \mathbb{R}^{n \times d}$, such that their product is approximately equal to $Y$:

\[ W V^{T} = \hat{Y} \approx Y \tag{1} \]

Each row of $W$ contains weights that represent the user’s preference for each latent factor. Similarly, each row of $V$ shows the extent to which each latent factor characterizes the film. To obtain the prediction of the rating that user $i$ would give to item $j$, we take the dot product of the two corresponding row vectors:

\[ \hat{y}_{ij} = w_i v_j^{T} = \sum_{a=1}^{d} w_{ia} v_{ja} \tag{2} \]
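As a rough illustration of Eqs. (1) and (2), the sketch below learns W and V from a handful of observed ratings. The training procedure (stochastic gradient descent with L2 regularization), the hyperparameters and the toy data are assumptions for illustration; the text above only defines the factorization itself.

```python
import random


def predict(W, V, i, j):
    """Eq. (2): dot product of row i of W and row j of V."""
    return sum(w * v for w, v in zip(W[i], V[j]))


def squared_error(W, V, observed):
    """Sum of squared errors over the observed (user, item, rating) triples."""
    return sum((y - predict(W, V, i, j)) ** 2 for i, j, y in observed)


def factorize(observed, m, n, d=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit W (m x d) and V (n x d) by SGD on the observed ratings only."""
    rng = random.Random(seed)
    W = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(m)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(n)]
    for _ in range(epochs):
        for i, j, y in observed:
            err = y - predict(W, V, i, j)
            for a in range(d):
                w, v = W[i][a], V[j][a]
                W[i][a] += lr * (err * v - reg * w)
                V[j][a] += lr * (err * w - reg * v)
    return W, V


# (user index, item index, rating) triples; values are made up.
OBSERVED = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 5), (2, 1, 5), (2, 2, 2)]
W0, V0 = factorize(OBSERVED, m=3, n=3, epochs=0)  # untrained starting point
W, V = factorize(OBSERVED, m=3, n=3)
print(squared_error(W0, V0, OBSERVED), squared_error(W, V, OBSERVED))
```

After training, `predict(W, V, i, j)` can also be evaluated for user-item pairs that were never observed, which is how the model fills in missing ratings.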
3.3 Evaluation Metrics
Several evaluation measures have been used in the literature to measure the performance of recommender systems from different perspectives. Prediction accuracy metrics are used to determine how close a prediction is to the actual preference, and researchers use a variety of such metrics to assess the predictive accuracy of their algorithms. We selected the MAE and RMSE [15–17] evaluation metrics for our experiments.
MAE: This metric assesses an algorithm’s accuracy by comparing the predicted rating with the actual rating for each user-item pair in the test set. MAE is calculated by summing the absolute differences over these pairs and dividing by the total number of rating prediction pairs.

\[ \mathrm{MAE} = \frac{\sum_{i=1}^{n} \lvert p_i - r_i \rvert}{n} \tag{3} \]

RMSE: This accuracy metric differs from MAE in a few ways. After calculating the difference between the predicted and actual rating, the difference is squared. RMSE is calculated by summing the squared differences, dividing by the total number of rating prediction pairs, and taking the square root of the result.

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - r_i)^2}{n}} \tag{4} \]

where $p_i$ is the predicted rating for the $i$-th pair, $r_i$ is the actual rating and $n$ is the number of rating pairs.
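Equations (3) and (4) translate directly into code. The following is a minimal sketch with made-up ratings; the function names are our own.

```python
from math import sqrt


def mae(predicted, actual):
    """Eq. (3): mean absolute difference between predictions and ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Eq. (4): square root of the mean squared difference."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4, 3, 5]  # illustrative predictions
actual = [5, 3, 3]     # illustrative true ratings
print(mae(predicted, actual), rmse(predicted, actual))
```

Because the differences are squared before averaging, RMSE penalizes large individual errors more heavily than MAE, which is why it is never smaller than MAE on the same data.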
4 Experiments and Results
4.1 Item-Based Versus User-Based
In Table 1, we compare the performance of the user-based algorithm with different similarity measures. Pearson outperforms the Cosine and Jaccard coefficients on all data sets. The results show that the user-based algorithm does not perform equally well with Pearson and Cosine, and that Pearson is preferable to Cosine for the user-based algorithm.

Table 1 User-based CF with different similarity measures
Dataset     Performance measure   Pearson   Cosine   Jaccard
ML-100 k    RMSE                  0.961     0.99     0.987
            MAE                   0.77      0.794    0.80
ML-latest   RMSE                  0.93      0.948    0.96
            MAE                   0.73      0.74     0.76
ML-1 M      RMSE                  0.887     0.894    0.899
            MAE                   0.692     0.699    0.70

In Table 2, we compare the performance of the item-based algorithm with different similarity measures. Adjusted Cosine outperforms the other coefficients on all data sets. Although Adjusted Cosine outperforms the Pearson and Cosine similarity measures on all data sets, the performance difference between Pearson and Adjusted Cosine is very small and is not statistically significant. For the MovieLens 100 K data set, both Pearson and Adjusted Cosine outperform Cosine, and this difference is significant. For the MovieLens 1 M and MovieLens-latest-small data sets, the performance difference between all three similarity measures is very small and is not statistically significant.

Table 2 Item-based CF with different similarity measures
Dataset     Performance measure   Pearson   Cosine   Adjusted cosine
ML-100 k    RMSE                  0.934     0.95     0.89
            MAE                   0.734     0.75     0.696
ML-latest   RMSE                  0.897     0.90     0.864
            MAE                   0.704     0.691    0.664
ML-1 M      RMSE                  0.855     0.852    0.81
            MAE                   0.667     0.663    0.623

From Tables 1 and 2, we can see that Pearson is the best similarity measure for the user-based algorithm and Adjusted Cosine is the best similarity measure for the item-based algorithm. In Table 3, we compare the performance of the user-based algorithm with Pearson and the item-based algorithm with Adjusted Cosine. The item-based algorithm gives significantly better performance than the user-based algorithm and is therefore preferable.

Table 3 User-based versus item-based
Dataset     Performance measure   User-based   Item-based
ML-100 k    RMSE                  0.961        0.89
            MAE                   0.77         0.696
ML-latest   RMSE                  0.93         0.864
            MAE                   0.73         0.66
ML-1 M      RMSE                  0.887        0.81
            MAE                   0.692        0.623
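For reference, the adjusted cosine measure used by the item-based algorithm subtracts each user's mean rating before comparing two item columns, which compensates for users who rate systematically high or low. The sketch below uses made-up ratings and a hypothetical function name; it follows the standard definition of the measure and is not taken from the experimental code.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); the values are made up for illustration.
RATINGS = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def adjusted_cosine(ratings, item_i, item_j):
    """Cosine of two item vectors after removing each user's mean rating."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            user_mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - user_mean
            dj = user_ratings[item_j] - user_mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    den = sqrt(den_i) * sqrt(den_j)
    return num / den if den else 0.0


print(adjusted_cosine(RATINGS, "A", "B"))
```

In this toy data every user who rates item A above their own average rates item B below it, so the similarity comes out negative.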
4.2 Model-Based
Model-based methods are better suited to sparse data, and they are mainly based on matrix factorization. In this research we experimented with three commonly used and simple approaches: KNN basic, singular value decomposition (SVD) and non-negative matrix factorization (NMF). KNN basic gave better results with the number of neighbors set to 10. Table 4 compares the results of the three algorithms; SVD performs better than the other two models (Fig. 2), and its accuracy could be improved further by tuning its hyperparameters with GridSearchCV.

Table 4 Comparison of model-based approaches
Dataset     Performance measure   KNN     NMF     SVD
ML-100 k    RMSE                  0.94    0.922   0.873
            MAE                   0.73    0.70    0.67
ML-latest   RMSE                  0.93    0.917   0.87
            MAE                   0.728   0.707   0.668
ML-1 M      RMSE                  0.90    0.929   0.869
            MAE                   0.683   0.709   0.664
Fig. 2 RMSE of three algorithms
5 Conclusion
CF algorithms use users’ historical data, such as ratings and purchase details, for items such as news, books, products, movies and music. Various algorithms, based on different approaches, are available for suggesting the best items. In CF systems, establishing similarity between users and items is a critical step in generating recommendations. In this work, CF algorithms were evaluated experimentally and a detailed analysis was carried out. We compared memory based and model-based approaches for implementing movie recommendation systems using the MovieLens 100 k, 1 M and ML-latest-small datasets. Among memory based approaches, user-based and item-based algorithms were tested with commonly used similarity measures; the item-based approach with adjusted cosine similarity outperforms the user-based approach. Among model-based approaches, the SVD, KNN and NMF methods were implemented; SVD outperforms the other approaches in all cases. As a future enhancement, this comparison will be performed on a larger dataset, MovieLens 20 M, to obtain a more detailed analysis. The current work can be extended with other similarity measures available in the RS literature and with other types of metrics, which would give a better understanding of these measures and allow them to be ranked from best to worst. Collaborative filtering and latent factor methods capture user-related features well and are more powerful for large datasets. A possible enhancement is to combine CF models and matrix factorization methods into a hybrid model, which may improve the accuracy of the predicted ratings. Such a hybrid model makes use of key features from every model that is part of the hybridization and can create more appropriate recommendations for users.
References 1. CK R, KC S, KR V (2018) Personalized recommendation systems (PRES): a comprehensive study and research issues. Int J Mod Educ Comp Sci 10:11–21 2. Melville P, Sindhwani V (2017) Recommender systems. Encyclop Mach Learn Data Min 1056–1066 3. He J, Chu WW (2010) A social network-based recommender system (SNRS). Data Min Soc Netw Data 47–74 4. Jalili M (2017) A survey of collaborative filtering recommender algorithms and their evaluation metrics. Int J Syst Model Simul 2:14 5. CK R, Srikantaiah KC (2021) Similarity based collaborative filtering model for movie recommendation systems. In: 2021 5th international conference on intelligent computing and control systems (ICICCS) 6. Ala A (2013) Recommender system using collaborative filtering algorithm. Tech Libr 155 7. Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7:76–80 8. Wu CSM, Garg D, Bhandary U (2018) Movie recommendation system using collaborative filtering. In: 2018 IEEE 9th international conference on software engineering and service science (ICSESS)
Analysis of Prediction Accuracies for Memory Based and Model …
9. He X, Jin X (2019) Collaborative filtering recommendation algorithm considering users’ preferences for item attributes. In: 2019 international conference on big data and computational intelligence (ICBDCI) 10. Hassanieh LA, Jaoudeh CA, Abdo JB, Demerjian J (2018) Similarity measures for collaborative filtering recommender systems. In: 2018 IEEE Middle East and North Africa communications conference (MENACOMM) 11. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) Grouplens. In: Proceedings of the 1994 ACM conference on computer supported cooperative work—CSCW ‘94 12. Sarwar B, Karypis G, Konstan J, Reidl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the tenth international conference on World Wide Web—WWW ‘01 13. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42:30–37 14. Harper FM, Konstan JA (2016) The Movielens datasets. ACM Trans Inter Intell Syst 5:1–19 15. Hamdan YB (2020) Faultless decision making for false information I online: a systematic approach. J Soft Comput Paradigm (JSCP) 2(04):226–235 16. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman filter approach based on intensity parameter. J Innov Image Process (JIIP) 3(01):7–20 17. Pandian AP (2019) Artificial intelligence application in smart warehousing environ ment for automated logistics. J Artif Intell 1(2):63–72
Emotion Recognition Using Speech Based Tess and Crema Algorithm P. Chitra, A. Indumathi, B. Rajasekaran, and M. Muni Babu
Abstract Speech is central to human communication. While interacting with others, we can understand their emotions and feelings. Connecting humans with machines in the same way, however, remains difficult. In this paper, we attempt to make a machine comprehend human emotions while interacting with people. The approach is based on common scenarios. Extracting feelings and emotions from the human voice is a new and challenging process. The main problem is to identify the exact emotion from a speech dataset; for this purpose, various cues are extracted from the speech and an appropriate decision is made with properly arranged models. An important difficulty is obtaining suitable data for emotion extraction, which is a major area of artificial intelligence. To overcome this, we implemented convolutional neural network (CNN) models on the TESS, CREMA-D, and RAVDESS datasets, adding speckle noise to the datasets for emotion detection. Keywords CNN (convolution neural network) · Emotion · Extraction · Noise · Datasets · Classification
P. Chitra (B) Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram, Dindigul, Tamil Nadu, India e-mail: [email protected] A. Indumathi Department of Computer Applications, Kongunadu Arts and Science College, Coimbatore, India B. Rajasekaran Department of Electronics and Communication Engineering, Vinayaka Mission’s Kirupananda Variyar Engineering College, Salem, Vinayaka Mission’s Research Foundation (Deemed to be University), Salem, India M. M. Babu Department of CSE, IIIT R. K. Valley, Rajiv Gandhi University of Knowledge Technology (RGUKT), Idupulapaya, Andha Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_53
1 Introduction Nowadays, we interact with machines in many applications. Human communication takes many forms, such as facial expressions, hand signs, and speech. In our daily automated life, common modalities such as images and voice are used to interact with machines, which makes life easier. Speech is the most skilled and engaging way for people to relate to one another. Emotion is naturally built into human beings. For finding and analyzing people's emotions, multimodal emotion detection is an advancing trend in intelligent systems. People express emotions differently when they speak: no two people express them in the same way, and the expression may vary when the same person is tested under different circumstances. In speech emotion recognition, reducing noise in the sample speech dataset is a complex task. For speech recognition, features such as Mel frequency cepstral coefficients, sound quality, and prosodic and acoustic features are mostly used. Multiple processes are carried out to train the machine to behave like a human and to bring human-machine interaction closer, to the point where it is hard for a person to tell whether they are talking with a machine or a human [1]. Like speech recognition, extracting emotion from audio is a hard task [2]. In earlier emotion detection methods [3–5], the emotion features are extracted directly from the audio signals without considering the diversity of the information [6], which leads to low efficiency in the emotion recognition model. In many organizations, an emotion detection system is helpful for identifying false SOS messages, for replacing customer care call centers, and in marketing [7–10]. Emotion detection also helps to assess the mental state of a person in different situations [11].
Different people exhibit their emotions in different ways, so to normalize them we have to adjust the pitch and energy values, speech rate, loudness, and tone [12]. It is important to extract various features, such as acoustic and prosodic features, from the speech signal to form an effective emotion recognition system [13]. Diverse datasets are used to cover different types of emotions and feelings [14]. In this paper, we mainly use the CREMA-D, TESS, and RAVDESS [15–19] datasets for multimodal emotion detection. Using these datasets, we can analyze happiness, sadness, fear, anger, neutrality, and disgust. With a convolutional neural network architecture, we classify and process voice samples from the datasets for emotion recognition, using TensorFlow as the backend. A convolutional neural network consists of three kinds of layers: the input layer, the hidden layers, and the output layer. In a feed-forward neural network, the middle layers are called hidden layers because their inputs and outputs are concealed by the activation function and other parameters. The ConvNet model works in a way similar to neurons in the human brain. The convolutional neural network (CNN) reduces the error rate by 5 to 10%, which is superior to a DNN. With a CNN LSTM network, the effectiveness is lower (Figs. 1 and 2).
Fig. 1 Count of emotions to set the fear, sad, happy, and angry
Fig. 2 Waveplot for audio with fear emotion
2 Proposed Model Here, we use a convolutional neural network (CNN), an artificial neural network in which the connectivity pattern among the nodes is inspired by the animal visual cortex. We use three datasets: CREMA-D, TESS, and RAVDESS. Before describing the proposed model, we characterize the features as follows:
1) Mel Frequency Cepstral Coefficient (MFCC): It describes the audio spectrum after converting regular frequency to Mel-scaled frequency.
2) Zero Crossing Rate: It records the number of times the audio signal within a frame changes from positive to negative and vice versa.
3) Root Mean Square (RMS): It is used to calculate the root mean square value of each frame of data.
4) Mel Scale Spectrogram (MSS): It is used to convert the spectrogram frequency into Mel scale frequency.
5) Chroma STFT: It is used to create a chromagram from the wave spectrogram.
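Two of the features listed above, the zero crossing rate and the RMS value, are simple enough to compute by hand. The following is a minimal sketch in plain Python, not the authors' Librosa-based code; the frame length, hop length, sampling rate, and the synthetic sine signal are all assumptions chosen for illustration.

```python
import math

def split_frames(signal, frame_length, hop_length):
    """Split a list of samples into overlapping frames (no padding)."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root mean square value of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Hypothetical test signal: one second of a 100 Hz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

frames = split_frames(signal, frame_length=2048, hop_length=512)
zcr_values = [zero_crossing_rate(f) for f in frames]
rms_values = [rms(f) for f in frames]
```

For a pure sine the RMS value is close to 0.707 of the amplitude, and the zero crossing rate is close to twice the frequency divided by the sampling rate, which gives a quick sanity check on both functions.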
Initially, the test audio is given as input along with noise added to the audio. The Librosa library is used to analyze the audio files and extract data from them. We then visualize the emotions in a bar graph, in which the count of each emotion in the dataset is shown as an integer. Next, we plot the waveplot and spectrogram for every emotion in the dataset, add some noise to the data, and plot the waveplot and spectrogram again to compare the data with and without noise. After that, we extract the features from the dataset, combine them, and split them into training and test sets. The values are scaled with a standard scaler, and the data are passed to the CNN model and its layers for training. We use ReLU as the activation function and apply softmax at the end to normalize the output. To get the best result, we reduce the learning rate on plateau with the following settings: (factor = 0.4, monitor = 'loss', min_lr = 0.0000001, patience = 2, verbose = 1). Finally, we plot the confusion matrix together with the classification and accuracy report.
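The learning rate settings quoted above can be illustrated with a small simulation. This is a simplified sketch of the reduce-on-plateau idea, not the Keras callback itself: the initial learning rate of 0.001 and the loss sequence are assumptions, while factor, patience, and min_lr are taken from the text.

```python
def reduce_lr_on_plateau(losses, initial_lr, factor=0.4, patience=2, min_lr=1e-7):
    """Return the learning rate in effect after each epoch's loss is observed.

    The rate is multiplied by `factor` whenever the loss has not improved
    for `patience` consecutive epochs, and never drops below `min_lr`.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Hypothetical training losses with two plateaus.
losses = [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]
schedule = reduce_lr_on_plateau(losses, initial_lr=0.001)
```

With these assumed losses, the rate is cut once after the first plateau and again after the second, which is the behaviour the patience = 2 setting is meant to produce.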
3 Convolution Method of Implementation We use a convolutional neural network in this paper, in which five kinds of layers are assembled in one model: convolution layers, max pooling layers, three dropout layers, a flatten layer, and two fully connected (dense) layers. Convolution and max pooling layers are the core of this model. The upper part of the network combines the convolution and max pooling layers; the lower part combines the flatten layer and the fully connected layers. To improve the accuracy of the model, we augment the data. Adding noise makes the network less sensitive to the exact features of the training data, because noise changes the frequencies and makes low-pitch frequencies harder to find. Time stretching changes the duration of the audio by a fixed rate. Shifting the data to the left or right by a random amount helps distribute the data points across the interval k, and changing the pitch of the data introduces random pitch variation. One-hot encoding is used to classify multiple classes. The data are then split into training and test sets and standardized with a standard scaler (zero mean and unit variance per feature), and the input dimension of the model is adjusted accordingly. Finally, we built a model with a kernel size of 3 containing 5 convolution layers and 5 max pooling layers with the rectified linear unit (ReLU) activation function, a dense layer with 256 units, and dropout with a rate of 0.3 (Fig. 3).
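The label encoding and scaling steps mentioned above can be sketched without any library. This is an illustrative stand-in for the encoder and standard scaler used in the paper; the function names, labels, and feature values are hypothetical.

```python
import math

def one_hot(labels):
    """Map each label to a 0/1 vector with a single 1 at its class index."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = [[1 if index[label] == j else 0 for j in range(len(classes))]
               for label in labels]
    return classes, vectors

def fit_standardizer(rows):
    """Compute per-feature mean and (population) standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds

def standardize(rows, means, stds):
    """Rescale each feature to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical labels and two-feature training rows.
classes, encoded = one_hot(["happy", "angry", "fear", "angry"])
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_standardizer(train)
train_scaled = standardize(train, means, stds)
test_scaled = standardize([[2.0, 20.0]], means, stds)
```

The statistics are fitted on the training rows only and then reused for the test rows, so no information from the test set leaks into the scaling.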
Fig. 3 Dataflow diagram to implement the training condition
In the model described here, the convolutional neural network (CNN) is composed of 5 convolution layers, 5 max pooling layers, 2 dense layers, 3 dropout layers, and 1 flatten layer. Every layer has its own functionality, and ReLU is used as the activation function. The first layer, conv1d_1, has 256 filters; its output is passed to the first max pooling layer, max_pooling1d_1, with a pool size of 3, which gives the reduced shape (None, 53, 256). This is the input to the second convolution layer, conv1d_2, which halves the number of filters from 256 to 128 and produces the shape (None, 51, 128). That output is passed to max_pooling1d_2, whose pool size of 3 reduces the length from 51 to 17, giving (None, 17, 128). It then passes through the first dropout layer, dropout_1, with a rate of 0.25, and on to the third convolution layer, conv1d_3, which reduces 128 filters to 64 and gives the shape (None, 15, 64). The third max pooling layer, max_pooling1d_3, with a pool size of 3, reduces the length from 15 to 5, giving (None, 5, 64). This is fed to the second dropout layer, dropout_2, with a rate of 0.25, and then to the fourth convolution layer, conv1d_4, which reduces the shape to (None, 3, 64). The next max pooling layer, max_pooling1d_4, affects neither the shape nor the number of filters. Its output is fed to the fifth convolution layer, conv1d_5, which changes the shape to (None, 1, 32), and then to the last max pooling layer, max_pooling1d_5, which does not change the shape.
The output is then flattened into a one-dimensional vector per sample and fed to the fully connected layer dense_1, whose output shape is (None, 256). It then passes through dropout_3, which does not change the shape because only units are dropped, and finally reaches the last dense layer, dense_2, whose output shape is (None, 8).
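The layer shapes described above can be checked with a few lines of arithmetic. The sketch below assumes an input of 162 features per sample, a kernel size of 3 with no padding for every convolution, and a pool size of 1 for the last two pooling layers (which the text says leave the shape unchanged); none of these three values is stated explicitly in the paper, but together they reproduce the reported shapes. Dropout layers are omitted because they do not change the shape.

```python
def conv1d_length(length, kernel_size=3):
    """Output length of a 1-D convolution with stride 1 and no padding."""
    return length - kernel_size + 1

def maxpool1d_length(length, pool_size):
    """Output length of 1-D max pooling with stride equal to the pool size."""
    return (length - pool_size) // pool_size + 1

INPUT_LENGTH = 162  # assumed number of features per sample

# (layer name, kind, filters for conv layers or pool size for pooling layers)
LAYERS = [
    ("conv1d_1", "conv", 256),
    ("max_pooling1d_1", "pool", 3),
    ("conv1d_2", "conv", 128),
    ("max_pooling1d_2", "pool", 3),
    ("conv1d_3", "conv", 64),
    ("max_pooling1d_3", "pool", 3),
    ("conv1d_4", "conv", 64),
    ("max_pooling1d_4", "pool", 1),  # assumed: shape-preserving
    ("conv1d_5", "conv", 32),
    ("max_pooling1d_5", "pool", 1),  # assumed: shape-preserving
]

def trace_shapes(input_length, layers):
    """Return {layer name: (length, channels)} after each layer."""
    length, channels = input_length, 1
    shapes = {}
    for name, kind, value in layers:
        if kind == "conv":
            length, channels = conv1d_length(length), value
        else:
            length = maxpool1d_length(length, value)
        shapes[name] = (length, channels)
    return shapes

shapes = trace_shapes(INPUT_LENGTH, LAYERS)
flattened_size = shapes["max_pooling1d_5"][0] * shapes["max_pooling1d_5"][1]
```

Under these assumptions the flatten layer hands 32 values per sample to dense_1.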
Table 1 Accuracy result for datasets Tess, Crema-D, and RAVDESS
S. No.   Dataset name         Accuracy (%)
1        RAVDESS condition    95
2        Tess condition       96
3        Crema-D condition    85
4 Results and Discussion We used three datasets, RAVDESS, Crema-D, and Tess, to implement the condition. Table 1 shows the accuracy for the datasets, and Table 2 shows the outcomes for the three datasets: the emotion classes and their precision, recall, F1-score, and support values.
i. RAVDESS: The following plots show the result obtained with the RAVDESS dataset. We plotted the training and testing accuracy and loss. From this, we obtain an accuracy of 96%; the confusion matrix of the dataset is also plotted.
ii. Tess: The following plots show the result obtained with the Tess dataset. We plotted the training and testing accuracy and loss. From this, we obtain an accuracy of 99%; the confusion matrix of the dataset is also plotted.
iii. Crema-D: The following plots show the result obtained with the Crema-D dataset. We plotted the training and testing accuracy and loss. From this, we obtain an accuracy of 84%; the confusion matrix of the dataset is also plotted (Figs. 4, 5, 6, 7, 8, 9 and 10).
5 Conclusion and Future Work With a convolutional neural network (CNN), we are able to identify emotions even in the presence of noise. In future work, we will address these problems using natural language processing techniques and deep learning methods. With models of this type, machines can connect with humans more easily and understand their feelings and emotions effectively.
Table 2 Dataset outcomes
                  Precision   Recall   F1-score   Support
RAVDESS_dataset
Angry             0.94        0.96     0.95       141
Calm              0.96        1.00     0.98       129
Disgust           0.99        0.94     0.96       145
Fear              0.98        0.98     0.98       161
Happy             0.94        0.99     0.96       148
Neutral           0.99        0.98     0.98       83
Sad               0.99        0.93     0.96       142
Surprise          0.95        0.95     0.95       131
Accuracy                               0.96       1080
Macro avg         0.97        0.97     0.97       1080
Weighted avg      0.97        0.96     0.96       1080
Tess_Dataset
Angry             0.99        1.00     0.99       282
Disgust           0.98        0.99     0.98       302
Fear              1.00        1.00     1.00       295
Happy             0.99        0.97     0.98       313
Neutral           1.00        1.00     1.00       314
Sad               1.00        1.00     1.00       301
Surprise          0.99        0.98     0.98       293
Accuracy                               0.99       2100
Macro avg         0.99        0.99     0.99       2100
Weighted avg      0.99        0.99     0.99       2100
CREMA-D_dataset
Angry             0.96        0.93     0.94       774
Disgust           0.77        0.84     0.80       726
Fear              0.88        0.76     0.82       748
Happy             0.87        0.86     0.86       812
Neutral           0.77        0.82     0.79       616
Sad               0.78        0.81     0.80       790
Accuracy                               0.84       4466
Macro avg         0.84        0.84     0.84       4466
Weighted avg      0.84        0.84     0.84       4466
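The precision, recall, F1-score, and support columns of Table 2 can all be derived from a confusion matrix. The sketch below shows the arithmetic on a tiny two-class matrix; the matrix values and the function name are hypothetical and are not taken from the paper's results.

```python
def per_class_metrics(confusion, labels):
    """Compute precision, recall, F1 and support for each class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    report = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # column sum
        support = sum(confusion[k])                         # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / support if support else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support}
    return report

def accuracy(confusion):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Hypothetical two-class confusion matrix (rows: true, columns: predicted).
labels = ["angry", "sad"]
confusion = [[8, 2],
             [1, 9]]
report = per_class_metrics(confusion, labels)
overall_accuracy = accuracy(confusion)
```

The macro average in Table 2 is the unweighted mean of the per-class values, while the weighted average weights each class by its support.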
Fig. 4 RAVDESS_dataset test loss graph
Fig. 5 Tess_dataset test loss graph
Fig. 6 CREMA-D_dataset test loss graph
Fig. 7 RAVDESS_dataset test accuracy graph
Fig. 8 Tess_dataset test accuracy graph
Fig. 9 CREMA-D_dataset test accuracy graph
Fig. 10 Confusion matrix
References
1. Triantafyllopoulos A, Liu S, Schuller BW (2021) Deep speaker conditioning for speech emotion recognition. IEEE Int Conf Multi Expo (ICME) 2021:1–6. https://doi.org/10.1109/ICME51207.2021.9428217
2. Zamil AAA, Hasan S, Jannatul Baki SM, Adam JM, Zaman I (2019) Emotion detection from speech signals using voting mechanism on classified frames. In: 2019 international conference on robotics, electrical and signal processing techniques (ICREST), 2019, pp 281–285. https://doi.org/10.1109/ICREST.2019.8644168
3. Ghaleb E, Popa M, Asteriadis S (2019) Multimodal and temporal perception of audio-visual cues for emotion recognition. In: 2019 8th international conference on affective computing and intelligent interaction (ACII), 2019, pp 552–558. https://doi.org/10.1109/ACII.2019.8925444
4. Zhang B, Quan C, Ren F (2016) Study on CNN in the recognition of emotion in audio and images. In: 2016 IEEE/ACIS 15th international conference on computer and information science (ICIS), 2016, pp 1–5. https://doi.org/10.1109/ICIS.2016.7550778
5. Dolka HAXVM and Juliet S (2021) Speech emotion recognition using ANN on MFCC features. In: 2021 3rd international conference on signal processing and communication (ICPSC), 2021, pp 431–435. https://doi.org/10.1109/ICSPC51351.2021.9451810
6. Huang Z, Dong M, Mao Q, Zhan Y (2014) Speech emotion recognition using CNN. In: Proceedings of the 22nd ACM international conference on multimedia. https://doi.org/10.1145/2647868.2654984
7. Badsha F, Islam R, Department of mathematics and statistics, Bangladesh University of Business and Technology (BUBT), Dhaka, Bangladesh. Mathematics Discipline, Science, Engineering and Technology School, Khulna University, Khulna, Bangladesh. https://doi.org/10.4236/ajcm.2020.104028
8. Khosla S (2018) EmotionX-AR: CNN-DCNN autoencoder based emotion classifier. Proc Sixth Int Works Natural Lang Process Soc Media. https://doi.org/10.18653/v1/w18-3507
9. Basu S, Chakraborty J, Bag A, Aftabuddin M (2017) A review on emotion recognition using speech. Int Conf Invent Commun Comput Technol (ICICCT) 2017:109–114. https://doi.org/10.1109/ICICCT.2017.7975169
10. Lukose S, Upadhya SS (2017) Music player based on emotion recognition of voice signals. In: 2017 international conference on intelligent computing, instrumentation and control technologies (ICICICT), 2017, pp 1751–1754. https://doi.org/10.1109/ICICICT1.2017.8342835
11. Zhang J, Liu Z, Liu P, Wu B (2021) Dual-waveform emotion recognition model for conversations. IEEE Int Conf Multi Expo (ICME) 2021:1–6. https://doi.org/10.1109/ICME51207.2021.9428327
12. Sarah KD, Morningstar M, Dirks MA, Qualter P (2020) Ability emotional intelligence: what about recognition of emotion in voices? Person Indiv Diff 160:109938. ISSN 0191–8869. https://doi.org/10.1016/j.paid.2020.109938
13. Muljono MRP, Harjoko A, Supriyanto C (2019) Speech emotion recognition of indonesian movie audio tracks based on MFCC and SVM. In: 2019 international conference on contemporary computing and informatics (IC3I), 2019, pp 22–25. https://doi.org/10.1109/IC3I46837.2019.9055509
14. Mittal T, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) M3ER: multiplicative multimodal emotion recognition using facial, textual, and speech cues. Proc AAAI Conf Artif Intell 34(02):1359–1367. https://doi.org/10.1609/aaai.v34i02.5492
15. Lynn MM, Su C, Maw KK (2018) Efficient feature extraction for emotion recognition system. In: 2018 4th international conference for convergence in technology (I2CT), 2018, pp 1–6. https://doi.org/10.1109/I2CT42659.2018.9058313
16. Zaman SR, Sadekeen D, Alfaz MA, Shahriyar R (2021) One source to detect them all: gender, age, and emotion detection from voice. In: 2021 IEEE 45th annual computers, software, and applications conference (COMPSAC), 2021, pp 338–343. https://doi.org/10.1109/COMPSAC51774.2021.00055
17. Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1D and 2D CNN LSTM networks. Biomed Signal Process Control 47:312–323. https://doi.org/10.1016/j.bspc.2018.08.035
18. Tariq Z, Shah SK, Lee Y (2019) Speech emotion detection using IoT based deep learning for health care. IEEE Int Conf Big Data (Big Data) 2019:4191–4196. https://doi.org/10.1109/BigData47090.2019.9005638
19. Jain U, Nathani K, Ruban N, Joseph Raj AN, Zhuang Z, Mahesh VGV (2018) Cubic SVM classifier based feature extraction and emotion detection from speech signals. In: 2018 international conference on sensor networks and signal processing (SNSP), 2018, pp 386–391. https://doi.org/10.1109/SNSP.2018.00081
Wireless Data Transferring of Soldier Health Monitoring and Position Tracking System Using Arduino K. SuriyaKrishnaan, Gali Mahendra, D. Sankar, and K. S. Yamuna
Abstract Hostile warfare is a critical component of any nation’s security. The army, navy, and air force are the primary pillars of national security. Soldiers play the important and vital role. Protection from harm is always a predominant concern for a person serving in military. The department of armament in a country has the responsibility to satisfy the country’s safety expected guarantee. Soldiers engaged in missions or special operations will benefit from this proposed technology. This technique allows the soldiers to be tracked via global positioning systems (GPSs). Mobile computing, medical sensors, and communication are the essential modules used in this technology. The troops are connected to sophisticated sensors in this system. During total mobility, the connection is accomplished via a personal server. A wireless connection connects this personal server to the base station’s server. Each soldier also carries a GSM module, which allows him to communicate with the base station in the need of any crisis. It becomes extremely difficult for the army base station to keep track of the position and health of all soldiers. Therefore, this idea is to monitor continuously the soldier’s health while also identifying his situation during the battle. Keywords Arduino · Global positioning systems (GPSs) tracking · Global system for mobiles (GSM) modem · Mhealth · Nations security
K. SuriyaKrishnaan · G. Mahendra (B) · D. Sankar · K. S. Yamuna Sona College of Technology, Salem, Tamil Nadu, India e-mail: [email protected] D. Sankar e-mail: [email protected] K. S. Yamuna e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. Suma et al. (eds.), Inventive Systems and Control, Lecture Notes in Networks and Systems 436, https://doi.org/10.1007/978-981-19-1012-8_54
1 Introduction Soldiers are almost always at high risk of injury. They fight in the most challenging terrains, such as slopes, mountains, plains, and forests. Soldiers play an important role in guarding the frontiers of their country, and their main priority is to keep the country safe. They fight to protect the nation and are ready to lay down their lives for it; hence, every citizen owes a debt to the military. This work therefore proposes a system that has the potential to be very useful for improving soldiers' health and providing medical support in combat. The system mainly focuses on monitoring soldiers' health parameters, such as heartbeat and body temperature. If a soldier is shot or injured, the heart rate gradually increases or decreases. The system provides physicians, through the server website, with accurate and timely information when the heart rate is critical. If the heart rate rises above the critical level or falls below the critical threshold, the GSM modem sends a message to the server. The GPS module provides the soldier's current position, which helps pinpoint the exact location so that medical assistance can be delivered as soon as feasible. If a soldier is injured, a message is sent to healthcare personnel in the area or to the base station through the GSM modem. The goal of this project is to develop reliable health status indicators. The technique employs non-invasive sensors to measure heart rate and body temperature. Signal acquisition circuits filter and amplify the signals to create the required output. The circuit's components are low-power and inexpensive. The acquired data are provided to a microcontroller in real time through an ADC.
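The alert rule described above (send a message with the position when the heart rate leaves a safe band) can be sketched as follows. This is an illustration in Python rather than the Arduino firmware; the threshold values, field names, message format, and sample coordinates are all hypothetical, since the paper does not state them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits in beats per minute; the paper gives no numeric thresholds.
HEART_RATE_LOW_BPM = 50
HEART_RATE_HIGH_BPM = 120

@dataclass
class Reading:
    soldier_id: str
    heart_rate_bpm: int
    temperature_c: float
    latitude: float
    longitude: float

def build_alert(reading: Reading) -> Optional[str]:
    """Return the text to hand to the GSM modem, or None if the heart rate is within limits."""
    if reading.heart_rate_bpm > HEART_RATE_HIGH_BPM:
        condition = "HIGH"
    elif reading.heart_rate_bpm < HEART_RATE_LOW_BPM:
        condition = "LOW"
    else:
        return None
    return (f"ALERT {reading.soldier_id}: heart rate {condition} "
            f"({reading.heart_rate_bpm} bpm), temp {reading.temperature_c:.1f} C, "
            f"position {reading.latitude:.5f},{reading.longitude:.5f}")

normal = Reading("S-01", 78, 36.9, 11.66430, 78.14600)
critical = Reading("S-02", 142, 37.8, 11.66430, 78.14600)
normal_alert = build_alert(normal)
critical_alert = build_alert(critical)
```

Keeping the decision separate from the transmission makes the rule easy to test without a modem attached.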
2 Literature Survey Jasvinder Singh et al. proposed a troop locating and health signal system based on the global positioning system (GPS) and the Internet of things (IoT) in 2019. Soldiers can connect with one another from any location, which makes it easier for them to communicate during an emergency. A simple circuit, together with low-power peripherals and an ARM CPU, reduces the module's overall power consumption [1]. The peripherals used are tiny and lightweight, allowing soldiers to carry them safely and securely. Soldiers' whereabouts may be traced via GPS from anywhere in the world, and the health system checks their vital health metrics, ensuring their safety and security. Niket Patil devised a health monitoring and tracking system in 2018. The study proposed a soldier health monitoring and tracking system based on the Internet of things. The module was mounted on the soldier's body to determine their
health status and provide their location via GPS [2]. IoT was used to send the information to the base station. A low-cost circuit was used in the module to protect valuable military lives on the battlefield. In 2018, Akshay Gondalic and colleagues created an IoT-based health monitoring system for military personnel based on machine learning. Using GPS, temperature sensors, heart rate sensors, and other sensors, the base station can track the whereabouts of soldiers while also monitoring their medical state. ZigBee technology was used to wirelessly transfer data from the sensors and GPS to the soldiers [4]. In addition, LoRaWAN network technology was proposed for use between the leader and the base station in conflict zones where cellular network coverage is either missing or data transfer is unfeasible. The collected data were analyzed, and the authors suggested uploading the data to the cloud for further analysis and prediction. William Walker A. L. et al. proposed a mobile health monitoring system in 2018. The authors discussed a variety of wearable, portable, low-weight, and small-size biosensors that have been created to monitor military health. The BSN comprised sensors such as heart rate, temperature, and gas sensors that may be worn by soldiers to monitor their health in real time [5]. The research proposed a method for developing a real-time soldier health monitoring system comprising linked BSNs. Afef Mdhaffar et al. proposed a study on IoT-based health monitoring through LoRaWAN in 2017, in which gathered biosensor data were supplied to an analysis module using the LoRaWAN network architecture for low-cost, low-power, and secure communication [3]. Heart rate, temperature, and glucose levels were monitored in remote areas where cellular network connectivity is non-existent.
The typical area covered by LoRaWAN was around 33 km when the LoRaWAN gateway was located outdoors at a 12 m altitude. The monitoring module was found to consume ten times less power than existing long-range cellular alternatives like GPRS/3G/4G.
3 Proposed Methodology This project has two sections: a transmitter section and a receiver section. The transmitter section consists of an ATmega328 microcontroller, a heartbeat sensor, an LM35 temperature sensor, a SIM800L GSM module, a GPS modem, an LCD display, and a danger switch. The receiver section consists of a registered SIM card and a mobile phone or LCD display. The transmitter section is placed in the soldier unit. There, the heartbeat sensor and the LM35 temperature sensor are interfaced with the ATmega328 microcontroller, which is in turn connected to the GSM and GPS modems. The heartbeat and temperature sensors collect information, which is sent to the receiver section through GSM. The receiver section is placed in the base station, where the registered SIM card and the LCD or mobile display are used to view the information received from the sender unit (Figs. 1 and 2).
Fig. 1 Block diagram of the proposed methodology (soldier unit blocks: heartbeat sensor, temperature sensor, danger switch, ATmega328 microcontroller, GSM modem, GPS modem, LCD display)
Fig. 2 Arduino
4 Hardware Parameters 4.1 Arduino The Arduino Uno is the central controller of this system; it is present in the transmitter section and also in the receiver section. The Arduino Uno is based on the ATmega328. It includes fourteen digital input/output pins alongside six analog inputs, a 16 MHz quartz crystal, a reset button, a USB port, and a power jack. The Arduino Uno operates on an external supply of 6 to 20 V. If an RF module is used with this board, it operates at 7–12 V. The Arduino Uno is equipped with thirty male I/O headers in a DIP-30 configuration. The board has sets of digital and analog input/output (I/O) pins that can be connected to various expansion boards (shields) and other circuits. It is powered through the USB cable or an external 9 V battery and accepts voltages between seven and twenty volts. It is similar to the Arduino Nano and the Leonardo. The hardware reference design is available on the Arduino website and is licensed under a Creative Commons Attribution Share-Alike 2.5 license. Layout and production files for several versions of the hardware are also available. The controller continuously monitors the signals from the sensors for any abnormality, such as an increased or weakened heartbeat or a rise or fall in temperature, together with the location information obtained from the GPS modem. It sends the message to a central location with the help of the GSM modem. The microcontroller is interfaced with the GSM and GPS modems; its specifications are listed in Table 1.
Table 1 Arduino description
Microcontroller      ATmega328P
Input voltage        6–20 V
Digital pins         14
PWM digital pins     6
Analog input pins    8
Clock speed          16 MHz
4.2 5 V Regulator The purpose of a voltage regulator is to keep the voltage of a circuit close to the set point. Because a power source often supplies unregulated current, it may damage the circuit's components. Voltage regulators are among the most widely used electronic components. The regulator converts the input voltage to 5 V, which powers the Arduino Uno board. The 5 V generated here is accessed through the 5 V pin of the power expansion rail. One ceramic capacitor and two electrolytic capacitors are present (Fig. 3).
4.3 Heartbeat Sensor The heartbeat sensor makes it simple to examine how the heart works. The sensor monitors the flow of blood through the finger. The volume of blood in the finger fluctuates because the heart pumps blood through the blood vessels within the finger.
Fig. 3 Pulse sensor
The sensor shines a beam of light through the finger and detects the amount of light that reaches the LDR. The circuit amplifies the signal from the LDR, which is then filtered and passed to the ADC. The device used in this project is the Pulse Sensor SEN-11574. Pulse data is very useful for assessing an individual's health. The Pulse Sensor is a plug-and-play heart-rate sensor for Arduino. It combines a simple optical heart-rate sensor with amplification and noise-cancellation circuitry, which makes it quick and easy to obtain reliable pulse readings. It draws only 4 mA at 5 V. To use it, the sensor is clipped to the earlobe or fingertip. Its specifications are listed in Table 2, and its block diagram is shown in Fig. 4.
The system comprises an infrared LED, a phototransistor sensor, high- and low-pass filters, an amplifier, a comparator, and an output LED. An oscilloscope is used to display the signal. The IR LED emits infrared light into the person's finger. Before the light reaches the phototransistor, blood pressure oscillations within the finger modulate its intensity. The detector converts the changing light intensity into a proportional voltage with two components: a large DC offset corresponding to the average light intensity, and a small varying signal caused by the changing blood pressure. The voltage signal is passed through a high-pass filter to remove the DC component and is then amplified. Before the signal is displayed on the oscilloscope, low-pass filtering removes any high-frequency noise. Finally, a voltage comparator compares the signal to the reference voltage, and if the signal is lower than the specified threshold, an output LED lights up, indicating a heartbeat. The pulses generated by this process are fed to the microcontroller's counter. The counter counts the number of beats for 5 s and multiplies the count by 12, because the result must be expressed in beats per minute (bpm); the product is shown as the heart rate. Because detecting heartbeats means detecting changes in the light passing through the finger, this method requires a very accurate technique for capturing light fluctuations. The light source must therefore be efficient enough for light to pass through the finger well, which makes detection more straightforward. Hence an IR LED, also known as an IR transmitter, is employed (Figs. 5 and 6).
Table 2 Pulse sensor description
Width                   16 mm
Height                  3.1 mm
Supply voltage (min)    3.3 V
Supply voltage (max)    5 V
Fig. 4 Heartbeat sensor block diagram (blocks: IR LED, photosensitive sensor, low-pass filter, low-pass filter, comparator with reference voltage, display pulses)
Fig. 5 Heartbeat pulse
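The counting rule described above (beats counted over a 5 s window, multiplied by 12) can be sketched as follows. This is a minimal illustration, not the authors' firmware: the function name and the generalisation to an arbitrary window length are assumptions made here.

```python
def beats_per_minute(beat_count: int, window_seconds: float = 5.0) -> float:
    """Scale a beat count taken over a short window to beats per minute.

    With the 5 s window described in the text, the scale factor is 60 / 5 = 12.
    The function name and the configurable window are illustrative assumptions.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if beat_count < 0:
        raise ValueError("beat_count cannot be negative")
    return beat_count * (60.0 / window_seconds)


if __name__ == "__main__":
    # 6 beats counted in 5 s correspond to 6 * 12 = 72 bpm
    print(beats_per_minute(6))
```

A short window keeps the display responsive, but it limits the resolution to steps of 12 bpm; a longer window would give finer steps at the cost of slower updates.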
Fig. 6 LM35 sensor
4.4 Temperature Sensor
Body temperature is detected with an LM35 temperature sensor. The LM series are precision integrated-circuit temperature sensors, and the LM35 provides an output voltage that is linearly proportional to temperature. The system alerts the base station if a large temperature variation is detected. The LM35, connected to pin 6 (VIN), provides a continuous temperature measurement in analog form. The analog value is converted to digital form when a low-to-high pulse is applied to the PIC18F microcontroller, which has a built-in ADC. When the conversion completes, an interrupt is sent to the microcontroller. The microcontroller then runs an interrupt service routine (ISR), a software procedure invoked by hardware in response to an interrupt, which examines the interrupt and decides how to handle it. The ISR reads the converted value from the ADC by applying a high-to-low pulse to pin 2 (RD). The output voltage increases by 10 mV for every 1 °C rise in temperature. The sensor's specifications are given in Table 3.
Table 3 LM35 sensor description
Pins                 3
Range                −55 to 150 °C
Accuracy             0.5 °C
Operating voltage    4–30 V
Table 4 GPS description
Frequency    1575.42 MHz
Low power    60 mW/40 mW
Chip rate    1.023 MHz
Protocol     NMEA 0183 and binary protocol
Because the LM35 is calibrated directly in degrees Celsius, the user does not need to subtract a large constant voltage from its output to obtain convenient centigrade scaling. This is an advantage of the LM35 over linear temperature sensors that are calibrated in Kelvin. Trimming and calibration at the wafer level ensure low cost. The LM35's low output impedance, linear output, and precise inherent calibration make it easy to interface to a readout. The LM35's function in this design is to monitor body temperature. To determine the soldier's health status, the base station must know the soldier's body temperature and heart rate. Therefore, the low-cost LM35 temperature sensor, which does not require signal conditioning, is used. The output voltage of the LM35 does not need to be amplified, since it is higher than that of a thermocouple. If the temperature rises beyond the permitted level, the GSM module can warn the base station immediately, without waiting for the heart rate to rise above the normal range.
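The 10 mV per °C scaling of the LM35 makes the conversion from an ADC reading to a temperature a short calculation. The sketch below assumes a 10-bit ADC with a 5 V reference (typical of an Arduino Uno, but not stated in the text); the function name and default values are illustrative only.

```python
def lm35_celsius(adc_reading: int, vref: float = 5.0, adc_bits: int = 10) -> float:
    """Convert a raw ADC count from an LM35 to degrees Celsius.

    Assumptions (not from the source): 10-bit ADC, 5 V reference, and the
    LM35 output wired directly to the ADC input without amplification.
    """
    steps = 1 << adc_bits  # 1024 steps for a 10-bit converter
    if not 0 <= adc_reading < steps:
        raise ValueError("adc_reading is outside the ADC range")
    millivolts = adc_reading * vref * 1000.0 / steps
    return millivolts / 10.0  # LM35 scale factor: 10 mV per degree Celsius


if __name__ == "__main__":
    # A count of 75 is about 366 mV, i.e. roughly 36.6 degrees Celsius
    print(round(lm35_celsius(75), 1))
```

With these assumed settings one ADC step is about 4.9 mV, so the temperature resolution is roughly 0.5 °C.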
4.5 GPS Modem
The position of the soldier is tracked with the GPS modem. The GPS modem receives satellite signals, computes the position, and provides it to the controller as serial data. The GPS unit contains a GPS module with a GPS receiver antenna. The module operates according to its design, and the antenna receives the data from the GPS satellites, which the module outputs in NMEA format. This data is transferred to the microcontroller, where it is decoded into the required format and forwarded. The GPS module continuously transmits serial data (RS232 protocol) in the form of sentences that follow the NMEA standard. The latitude, longitude, time, date, and speed of the receiver are contained in the GPRMC sentence. In this design, these values are extracted from the GPRMC sentence and shown on a liquid crystal display. The GPS specifications are given in Table 4 (Fig. 7).
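Extracting the values from a GPRMC sentence amounts to splitting the comma-separated fields and converting the NMEA degrees-and-minutes format to decimal degrees. The following is a minimal sketch under stated assumptions: the sample sentence is a generic illustrative one (not data from this work), the names `Fix` and `parse_gprmc` are hypothetical, and checksum verification is deliberately omitted.

```python
from typing import NamedTuple, Optional


class Fix(NamedTuple):
    utc_time: str       # hhmmss
    latitude: float     # decimal degrees, negative for south
    longitude: float    # decimal degrees, negative for west
    speed_knots: float
    date: str           # ddmmyy


def _to_decimal_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA (d)ddmm.mmmm plus a hemisphere letter to decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gprmc(sentence: str) -> Optional[Fix]:
    """Return a Fix for a valid $GPRMC sentence, or None otherwise.

    The trailing '*hh' checksum is stripped but not verified in this sketch.
    """
    body = sentence.strip().split("*", 1)[0]
    fields = body.split(",")
    if len(fields) < 10 or fields[0] != "$GPRMC":
        return None
    if fields[2] != "A":  # 'A' marks a valid fix, 'V' a receiver warning
        return None
    return Fix(
        utc_time=fields[1],
        latitude=_to_decimal_degrees(fields[3], fields[4]),
        longitude=_to_decimal_degrees(fields[5], fields[6]),
        speed_knots=float(fields[7]),
        date=fields[9],
    )


if __name__ == "__main__":
    sample = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A"
    print(parse_gprmc(sample))
```

On a microcontroller the same field-splitting logic would run on the bytes read from the serial port, one sentence per line.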
Fig. 7 GPS block diagram (flow: start, select GPS modem (select gsm = 1, select gps = 0), set baud rate 4800 bps, check GPRMC, signal from satellite, Arduino, GSM modem, base station)
• Start.
• Choose the GPS modem.
• Set the baud rate to 4800 bits per second.
• Check whether a response is received from the GPRMC sentence.
• Check whether the satellite signal has been received.
• Pass the satellite signal to the Arduino.
• Send the position (latitude and longitude) to the base station using the GSM modem.
4.6 GSM Modem
The GSM SIM800L used in this design is a miniature cellular module that supports GPRS transmission, sending and receiving SMS, and making and receiving voice calls. Its low cost, small footprint, and quad-band frequency support make this module a good solution for any design that needs long-range connectivity. The GSM modem is used to send the soldier's health information, such as heart rate and body temperature, to a remote location. It works like a mobile phone and needs a SIM card to operate. The advantage of a GSM modem over a mobile phone is the presence of a serial port, which connects directly to the microcontroller for issuing attention (AT) commands and sending SMS. The GSM specifications are given in Table 5 (Fig. 8).
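Sending an SMS through a GSM modem in text mode uses a short sequence of standard AT commands: `AT` to check the link, `AT+CMGF=1` to select text mode, `AT+CMGS` with the destination number, and then the message body terminated by Ctrl-Z. The sketch below only builds the byte strings, so it runs without hardware. The message layout, the function names, and the placeholder phone number are assumptions for illustration, not the format used by the authors.

```python
from typing import List

CTRL_Z = b"\x1a"  # terminates the SMS body in text mode


def format_alert(bpm: float, temp_c: float, lat: float, lon: float) -> str:
    """Build a compact alert text (hypothetical layout, not from the source)."""
    return f"HR={bpm:.0f}bpm T={temp_c:.1f}C LAT={lat:.5f} LON={lon:.5f}"


def build_sms_commands(number: str, text: str) -> List[bytes]:
    """Return the AT command sequence for one text-mode SMS.

    In a real system each entry is written to the modem's serial port and
    the modem's reply ("OK", or the ">" prompt after AT+CMGS) is awaited
    before the next entry is sent.
    """
    return [
        b"AT\r",                                  # check that the modem responds
        b"AT+CMGF=1\r",                           # select SMS text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # destination number
        text.encode("ascii") + CTRL_Z,            # message body, then Ctrl-Z
    ]


if __name__ == "__main__":
    message = format_alert(72, 36.6, 48.1173, 11.51667)
    # "+10000000000" is a placeholder number
    for command in build_sms_commands("+10000000000", message):
        print(command)
```

Keeping the command construction separate from the serial I/O makes the sequence easy to check on a desktop before it is ported to the microcontroller.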
Table 5 GSM description
Fig. 8 GSM block diagram
Frequency
900 MHz/1800 MHz
Input volt
5–12 V
Current