Smart Innovation, Systems and Technologies 302
P. Karuppusamy Fausto Pedro García Márquez Tu N. Nguyen Editors
Ubiquitous Intelligent Systems Proceedings of Second ICUIS 2022
Smart Innovation, Systems and Technologies Volume 302
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
P. Karuppusamy · Fausto Pedro García Márquez · Tu N. Nguyen Editors
Ubiquitous Intelligent Systems Proceedings of Second ICUIS 2022
Editors P. Karuppusamy Department of EEE Shree Venkateshwara Hi-Tech Engineering College Gobichettipalayam, India
Fausto Pedro García Márquez Castilla-La Mancha University Ciudad Real, Spain
Tu N. Nguyen Department of Computer Science Kennesaw State University Marietta, GA, USA
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-19-2540-5 ISBN 978-981-19-2541-2 (eBook) https://doi.org/10.1007/978-981-19-2541-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honored to dedicate the proceedings of 2nd ICUIS 2022 to all the participants, organizers, and editors of this conference proceedings.
Preface
This volume presents the proceedings of the Second International Conference on Ubiquitous Computing and Intelligent Information Systems (ICUIS 2022), held on March 10–11, 2022, at Shree Venkateshwara Hi-Tech Engineering College, Tamil Nadu, India. ICUIS 2022 is a forum for researchers and academicians to exchange state-of-the-art knowledge about computing and communication technologies, covering a variety of topics including ubiquitous communication models, predictive big data technologies, artificial intelligence-driven intelligent systems, and more. The conference was a good opportunity for overseas participants to present and discuss their research amidst the global pandemic. The conference comprised several technical sessions organized by topic within computing and communication systems. The main aim of these sessions is to disseminate state-of-the-art research results and findings and to discuss them with the session chairs, who have professional expertise in the respective fields. In this second edition, a total of 324 papers were submitted by authors from all over the world, of which about 57 papers were selected for presentation at the conference. We were honored and delighted to have prominent guests as keynote speakers and session chairs. The conference opened with a distinguished keynote by Dr. R. Kanthavel, Professor, Department of Computer Engineering, King Khalid University, Abha, Kingdom of Saudi Arabia, and we thank him for conducting the session in an engaging and informative way.
We would like to thank all the participants of ICUIS 2022 for their innovative research contributions. Many thanks go as well to the reviewers for their technical support throughout the conference. Our special thanks go to the faculty members and our institution, Shree Venkateshwara Hi-Tech Engineering College, for their impeccable assistance in the overall organization of the conference. We hope that these ICUIS 2022 proceedings will give readers an interesting and enjoyable experience.

Guest Editors—ICUIS 2022

P. Karuppusamy
Shree Venkateshwara Hi-Tech Engineering College
Gobichettipalayam, India

Fausto Pedro García Márquez
Full Professor at Castilla-La Mancha University
Ciudad Real, Spain

Tu N. Nguyen
Professor, Director of Network Science
Department of Computer Science
Kennesaw State University
Marietta, GA, USA
Contents

Development of a Linear-Scaling Consensus Mechanism of the Distributed Data Ledger Technology . . . 1
Gennady Shvachych, Ivan Pobochii, Hanna Sashchuk, Oleksandr Dzhus, Olena Khylko, and Volodymyr Busygin

A Six-Point Based Approach for Enhanced Broadcasting Using Selective Forwarding Mechanism in Mobile Ad Hoc Networks . . . 15
D. Prabhu, S. Bose, T. Anitha, and G. Logeswari

Crop Price Prediction Using Machine Learning Naive Bayes Algorithms . . . 27
R. Vikram, R. Divij, N. Hishore, G. Naveen, and D. Rudhramoorthy

The Performance Evaluation of Adaptive Energy Conservation Scheme Using IoT . . . 35
Syed Imran Patel, Imran Khan, Karim Ishtiaque Ahmed, V. Raja Kumar, T. Anuradha, and Rajendra Bahadur Singh

Towards the Prominent Use of Internet of Things (IoT) in Universities . . . 45
Abdool Qaiyum Mohabuth

Depression Analysis of Real Time Tweets During Covid Pandemic . . . 55
G. B. Gour, Vandana S. Savantanavar, Yashoda, Vijaylaxmi Gadyal, and Sushma Basavaraddi

Diabetic Retinopathy Detection Using Deep Learning Models . . . 75
S. Kanakaprabha, D. Radha, and S. Santhanalakshmi

Study of Regional Language Translator Using Natural Language Processing . . . 91
P. Santhi, J. Aarthi, S. Bhavatharini, N. Guna Nandhini, and R. Snegha

Fabric Defect Detection Using Deep Learning Techniques . . . 101
K. Gopalakrishnan and P. T. Vanathi

Analysis of Research Paper Titles Containing Covid-19 Keyword Using Various Visualization Techniques . . . 115
Mangesh Bedekar and Sharmishta Desai

Survey on Handwritten Characters Recognition in Deep Learning . . . 123
M. Malini and K. S. Hemanth

A Survey on Wild Creatures Alert System to Protect Agriculture Lands Domestic Creatures and People . . . 135
K. Makanyadevi, M. Aarthi, P. Kavyadharsini, S. Keerthika, and M. Sabitha

A Study on Surveillance System Using Deep Learning Methods . . . 147
V. Vinothina, Augustine George, G. Prathap, and Jasmine Beulah

IRHA: An Intelligent RSSI Based Home Automation System . . . 163
Samsil Arefin Mozumder and A. S. M. Sharifuzzaman Sagar

A Review Paper on Machine Learning Techniques and Its Applications in Health Care Sector . . . 177
Priya Gautam and Pooja Dehraj

Enhanced Mask-RCNN for Ship Detection and Segmentation . . . 199
Anusree Mondal Rakhi, Arya P. Dhorajiya, and P. Saranya

Data Scientist Job Change Prediction Using Machine Learning Classification Techniques . . . 211
Sameer A. Kyalkond, V. Manikanta Sanjay, H. Manoj Athreya, Sudhanva Suresh Aithal, Vishal Rajashekar, and B. H. Kushal

Error Correction Scheme with Decimal Matrix Code for SRAM Emulation TCAMs . . . 221
Sangetha Balne and T. Gowri

Knowledge Discovery in Web Usage Patterns Using Pageviews and Data Mining Association Rule . . . 233
G. Vijaiprabhu, B. Arivazhagan, and N. Shunmuganathan

Video Anomaly Detection Using Optimization Based Deep Learning . . . 249
Baliram Sambhaji Gayal and Sandip Raosaheb Patil

A Fusional Cubic-Sine Map Model for Secure Medical Image Transmission . . . 265
Sujarani Rajendran, Manivannan Doraipandian, Kannan Krithivasan, Palanivel Srinivasan, and Ramya Sabapathi

Innovative Technologies Developed for Autonomous Marine Vehicles by ENDURUNS Project . . . 279
Pedro José Bernalte Sánchez, Fausto Pedro García Márquez, Mayorkinos Papaelias, Simone Marini, Shashank Govindaraj, and Lilian Durand

Machine Learning Approaches to Predict Breast Cancer: Bangladesh Perspective . . . 291
Taminul Islam, Arindom Kundu, Nazmul Islam Khan, Choyon Chandra Bonik, Flora Akter, and Md Jihadul Islam

A Comparative Review on Image Analysis with Machine Learning for Extended Reality (XR) Applications . . . 307
P. Vijayakumar and E. Dilliraj

SWOT Analysis of Behavioural Recognition Through Variable Modalities . . . 329
Abhilasha Sharma, Aakash Garg, Akshat Thapliyal, and Abhishek Rajput

E-Mixup and Siamese Networks for Musical Key Estimation . . . 343
Pranshav Gajjar, Pooja Shah, and Harshil Sanghvi

Microarray Data Classification Using Feature Selection and Regularized Methods with Sampling Methods . . . 351
Saddi Jyothi, Y. Sowmya Reddy, and K. Lavanya

Visual Place Recognition Using Region of Interest Extraction with Deep Learning Based Approach . . . 359
P. Sasikumar and S. Sathiamoorthy

Electronic Mobility Aid for Detection of Roadside Tree Trunks and Street-Light Poles . . . 373
Shripad Bhatlawande, Aditya Joshi, Riya Joshi, Kasturi Joshi, Swati Shilaskar, and Jyoti Madake

Real Time Video Image Edge Detection System . . . 389
A. Geetha Devi, B. Surya Prasada Rao, Sd. Abdul Rahaman, and V. Sri Sai Akhileswar

Research Paper to Design and Develop an Algorithm for Optimization Chatbot . . . 399
Bedre Nagaraj and Kiran B. Malagi

Analysis of MRI Images to Discover Brain Tumor Detection Using CNN and VGG-16 . . . 415
Aravind Vasudevan and N. Preethi

Missing Data Recovery Using Tensor Completion-Based Models for IoT-Based Air Quality Monitoring System . . . 423
Govind P. Gupta and Hrishikesh Khandare

Stock Market Prediction Through a Chatbot: A Human-Centered AI Approach . . . 435
Anoushka Halder, Aayush Saxena, and S. Priya

Air Writing Recognition Using Mediapipe and Opencv . . . 447
R. Nitin Kumar, Makkena Vaishnavi, K. R. Gayatri, Venigalla Prashanthi, and M. Supriya

Blockchain Based Email Communication with SHA-256 Algorithm . . . 455
L. Sherin Beevi, R. Vijayalakshmi, P. Ilampiray, and K. Hema Priya

Sentimental Analysis on Amazon Reviews Using Machine Learning . . . 467
Rajashekhargouda C. Patil and N. S. Chandrashekar

Speed Breaker Identification Using Deep Learning Convolutional Neural Network . . . 479
B. Manikandan, R. Athilingam, M. Arivalagan, C. Nandhini, T. Tamilselvi, and R. Preethicaa

Intelligent System for Diagnosis of Pulmonary Tuberculosis Using XGBoosting Method . . . 493
Sıraj Sebhatu, Pooja, and Parmd Nand

Web Based Voice Assistant for Railways Using Deep Learning Approach . . . 513
Prasad Vadamodula, R. Cristin, and T. Daniya

Segmentation and Classification Approach to Improve Breast Cancer Screening . . . 527
Simone Singh, Sudaksh Puri, and Anupama Bhan

Study of Impact and Reflected Waves in Computer Echolocation . . . 543
Oleksandr Khoshaba, Viktor Grechaninov, Tetiana Molodetska, Anatoliy Lopushanskyi, and Kostiantyn Zavertailo

Enneaontology: A Proposed Enneagram Ontology . . . 559
Esraa Abdelhamid, Sally Ismail, and Mostafa Aref

IoT Based Signal Patrolling for Precision Vehicle Control . . . 569
K. Sridhar and R. Srinivasan

Land Use/Cover Novel Dataset Based on Deep Learning: Case Study of Fayoum, Egypt . . . 579
Rehab Mahmoud, Haytham Al Feel, and Rasha M. Badry

Exploring the Effect of Word Embeddings and Bag-of-Words for Vietnamese Sentiment Analysis . . . 595
Duc-Hong Pham

A Portable System for Automated Measurement of Striped Catfish Length Using Computer Vision . . . 607
Le Hong Phong, Nguyen Phuc Truong, Luong Vinh Quoc Danh, Vo Hoai Nam, Nguyen Thanh Tung, and Tu Thanh Dung

IoT Based Automated Monitoring System for the Measurement of Soil Quality . . . 619
Pratoy Kumar Proshad, Anish Bajla, Adib Hossin Srijon, Rituparna Talukder, and Md. Sadekur Rahman

Pattern Recognition on Railway Points with Machine Learning: A Real Case Study . . . 631
Alba Muñoz del Río, Isaac Segovia Ramirez, and Fausto Pedro García Márquez

Sustainability in Development of Grant Applications . . . 643
Sylvia Encheva

New Category of Equivalence Classes of Intuitionistic Fuzzy Delta-Algebras with Their Applications . . . 651
Azeez Lafta Jaber and Shuker Mahmood Khalil

Deep Neural Networks for Stock Market Price Predictions in VUCA Environments . . . 665
Dennis Murekachiro

Building a Traffic Flow Management System Based on Neural Networks . . . 675
Varlamova Lyudmila Petrovna and Nabiev Timur Erikovich

The Impact of Ruang Cerita Application on the Neuro Depression and Anxiety of Private University Students . . . 689
Satria Devona Algista, Fakhri Dhiya ‘Ulhaq, Bella Natalia, Ignasius Christopher, Ford Lumban Gaol, Tokuro Matsuo, and Chew Fong Peng

JobSeek Mobile Application: Helps Reduce Unemployment on the Agribusiness Sectors During New Normal . . . 703
Tsabit Danendra Fatah, William Widjaya, Hendik Darmawan, Muhammad Haekal Rachman, Ford Lumban Gaol, Tokuro Matsuo, and Natalia Filimonova

The Effect of Learning Using Videos on Online Learning of Private University During the Covid-19 Pandemic . . . 719
Angelia Cristine Jiantono, Alif Fauqi Raihandhika, Hadi Handoyo, Ilwa Maulida Anam, Ford Lumban Gaol, Tokuro Matsuo, and Fonny Hutagalung

Implementation of Artificial Intelligence and Robotics that Replace Employees in Indonesia . . . 733
Hendy Wijaya, Hubert Kevin, Jaya Hikmat, S. Brian Vincent, Ford Lumban Gaol, Tokuro Matsuo, and Fonny Hutagalung

Author Index . . . 743
About the Editors
Dr. P. Karuppusamy is working as Professor and Head of the Department of Electrical and Electronics Engineering at Shree Venkateshwara Hi-Tech Engineering College, Erode. He completed his doctorate at Anna University, Chennai, in 2017, and his postgraduate degree in Power Electronics and Drives at the Government College of Technology, Coimbatore, India, in 2007. He has more than ten years of teaching experience and has published more than 40 papers in national and international journals and conferences. He has served as a conference chair at IEEE international conferences and as a guest editor for reputed journals. His research areas include the modeling of PV arrays and adaptive neuro-fuzzy models for grid-connected photovoltaic systems with multilevel inverters.

Dr. Fausto Pedro García Márquez works at UCLM, Spain, as a Full Professor (accredited as Full Professor since 2013); he is an Honorary Senior Research Fellow at Birmingham University, UK, and a Lecturer at the Postgraduate European Institute, and he was a Senior Manager at Accenture (2013–2014). He obtained his European Ph.D. with maximum distinction. He has been distinguished with the Advancement Prize for Management Science and Engineering Management Nominated Prize (2018). He has published more than 150 papers (65% ISI, 30% JCR, and 92% international), some recognized as "Best Paper 2014" in Renewable Energy, "excellent" at ICMSEM, and most downloaded in the International Journal of Automation and Computing and IMechE Part F: Journal of Rail and Rapid Transit. He is the author and editor of 25 books (Elsevier, Springer, Pearson, McGraw-Hill, Intech, IGI, Marcombo, AlfaOmega, among others) and holds five patents. He is an editor of five international journals and a committee member of more than 40 international conferences. He has been principal investigator in four European projects, five national projects, and more than 150 projects for universities, companies, and other organizations. His main interests are maintenance management, renewable energy, transport, advanced analytics, and data science.
Tu N. Nguyen is an Assistant Professor and Director of the Intelligent Systems Laboratory (ISL) in the Department of Computer Science at Kennesaw State University, Georgia, USA. His research and teaching hinge on developing fundamental mathematical tools and principles to design and develop smart, secure, and self-organizing systems, with applications to network systems, cyber-physical systems, and cybersecurity. Dr. Nguyen has published one book and more than 60 publications in leading academic journals and conferences.
Development of a Linear-Scaling Consensus Mechanism of the Distributed Data Ledger Technology Gennady Shvachych, Ivan Pobochii, Hanna Sashchuk, Oleksandr Dzhus, Olena Khylko, and Volodymyr Busygin
Abstract The paper proposes and explores a new blockchain system that operates on a linearly scalable consensus mechanism. Validators are assigned to shards through stake-based voting, with scalable randomness generated by a VDF (Verifiable Delay Function) and a VRF (Verifiable Random Function). The paper analyzes available consensus mechanisms, sharding, and distributed randomness generation. The system is energy-efficient, fully scalable, and secure, with fast consensus. In contrast to available methods, the improved approach shards not only network communication and transaction verification but also the state of the blockchain. The staking threshold is low enough for small validators to participate in the network and receive rewards. The proposed sharding process runs securely thanks to a distributed randomness generation (DRG) process that is unpredictable, unbiased, and verifiable. The network is reconfigured periodically to resist slowly adaptive Byzantine malicious validators. Contrary to other sharding blockchains that require Proof-of-Work to select validators, the proposed consensus is based on Proof-of-Stake and is therefore energy-efficient. Consensus is achieved by a BFT algorithm that is linearly scalable and faster than PBFT.
1 Introduction A distinctive feature of the innovative distributed ledger (blockchain) technology, implemented in the form of mathematical algorithms and software, is that contracts can be concluded without the participation of intermediaries such as banks and lawyers, allowing transactions to be made directly between parties.
G. Shvachych · I. Pobochii Ukrainian State University of Science and Technology, Dnipro, Ukraine H. Sashchuk · O. Dzhus · O. Khylko Taras Shevchenko National University of Kyiv, Kyiv, Ukraine V. Busygin (B) VUZF University (Higher School of Insurance and Finance), Sofia, Bulgaria e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_1
Literature analysis shows that the practical application of blockchain technology lacks in-depth subject coverage; blockchain is mainly seen as a general-purpose technology. Paper [1] provides specific examples of companies using blockchain. Moreover, it highlights that blockchain publications are usually predictive, extensively emphasizing the potential of the technology, while there is not yet much discussion of how blockchain can improve enterprise efficiency. In the publications under review, the main focus is on what could happen if blockchain were massively implemented in enterprises. The paper also highlights the lack of research that details the implications of blockchain applications for entrepreneurs and describes their entrepreneurial aspects. Similar views are shared by other researchers [2, 3]. On the other hand, the literature review shows that the competitiveness of blockchain technology is reflected in the choice of technology. Approaches to applying blockchain technology follow one of two central schemes: "technology first, then a problem" or "problem first, then a technology." Studies have shown that enterprises with extensive implementations of blockchain tools tend to operate by the latter scheme: the problem is considered first, and then blockchain is justified as its solution. Researchers note that this is the most effective approach [4]. Analysis of methods for implementing blockchain technology based on the capabilities of the existing Ethereum and Bitcoin blockchains has revealed certain drawbacks for their use in many areas, including the digital economy, and some blockchain methods require improvement. For instance, the innovative Bitcoin blockchain was meant to become a peer-to-peer payment system allowing funds to be transferred without intermediaries such as payment systems or banks.
However, Bitcoin suffers from limited bandwidth of around seven transactions per second, which makes it rather expensive as a payment system. Soon after, a new blockchain infrastructure, Ethereum [5], allowed developers to build different types of blockchain applications via smart contracts. Nevertheless, Ethereum, at 15 transactions per second, cannot support high-performance applications such as games or decentralized exchanges and did not solve the scalability problem. Given the performance limitations of Ethereum and Bitcoin, several blockchain projects offered different solutions to boost transaction throughput. Some blockchains proposed replacing the Proof-of-Work consensus with Proof-of-Stake. Various blockchains, e.g., EOS, apply Delegated-Proof-of-Stake (DPoS) consensus, where block producers are elected by vote rather than through an algorithmic process. Several chains like IOTA [24] replaced the blockchain data structure with a Directed Acyclic Graph (DAG), breaking the sequential processing of interconnected transactions. Nevertheless, those solutions cannot significantly increase performance [6] without sacrificing other essential aspects such as decentralization and security [7, 8]. It is obvious that the most valuable link in blockchain technology is its consensus algorithms, because they provide its reliability. This research aims at the further development of the consensus mechanism of distributed ledger technology.
2 Analysis of Recent Research and Publications The consensus protocol is a crucial component of a blockchain that determines its level of security and the speed with which validators reach consensus on the next block. Proof-of-Work (PoW) was the first blockchain consensus protocol, introduced by Bitcoin. In PoW, a miner who solves a cryptographic puzzle may propose the next block and receive token rewards; security rests on honest nodes controlling over 50% of the hashing power. The consensus rule here is that the longest chain is the only correct one; PoW consensus is therefore chain-based. Such a consensus has a main drawback: if someone has at least 1% more capacity than the rest of the network (i.e., 51% or more, a kind of "controlling stake" of generating capacity), they can single-handedly control all operations in the system, create blocks, and confirm or block transactions. Note that a hash in such a protocol is a string of 64 hexadecimal characters, and the difficulty regulates the number of zeros required at its beginning. For instance, consider the hash 0000045c5e2b3911eb937d9d8c574f09. The main proof-of-work process is mining: iterating over a numeric value (a nonce) until the block header hash meets the required form. Once all the necessary conditions are met, the miner publishes the block with all the required attributes, including the found value. Knowing these attributes, full nodes automatically check that, with this initial data and this found value, the header hash indeed looks exactly as required. After confirmation, the miner switches to generating a new block, and the author of the newly created block receives a reward in their own Bitcoin wallet [9]. In the Proof-of-Stake approach, nodes also hash data in search of a specific result.
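The proof-of-work mining loop described above, iterating over a nonce until the block-header hash starts with the required number of zeros, can be sketched as follows. This is a simplified illustration with a toy header string and a small difficulty, not Bitcoin's real header format or difficulty encoding:

```python
import hashlib

def mine(header: str, difficulty: int) -> tuple[int, str]:
    """Iterate over nonce values until the header hash has
    `difficulty` leading hex zeros (simplified proof-of-work)."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest  # publish block with the found value
        nonce += 1

nonce, digest = mine("block-header-data", difficulty=4)
print(nonce, digest)  # digest begins with "0000"
```

Verification is the asymmetry that makes the scheme work: any full node can recompute a single hash from the published attributes and nonce to check the block, while finding the nonce required many attempts.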
However, in Proof-of-Stake the difficulty is distributed proportionally to the node's balance, according to the number of coins (tokens) in the user's account. Thus, a node with a larger balance has a better chance of generating the next block. Unlike Proof-of-Work, the algorithm consumes much less power. Another kind of consensus protocol is PBFT (Practical Byzantine Fault Tolerance) (Fig. 1), named after the Byzantine Generals Problem [10]: several Byzantine generals surrounding a city with their armies must agree on whether to attack or retreat, and if the generals fail to agree on the decision, the operation ends in disaster. In PBFT there is one "leader" node and the other nodes are "validators." Each PBFT consensus round includes two main stages: the prepare stage and the commit stage. During the prepare stage, the leader broadcasts its proposal to all validators, who then relay their votes on the proposal to all the others. This re-relaying is essential because every validator must count the votes of every other validator. The prepare stage ends when at least 2f + 1 consistent votes are observed, where f is the number of malicious validators and the total number of validators, including the leader, is 3f + 1. The commit stage involves a similar counting process, with consensus reached when 2f + 1 consistent votes are observed. Because votes are relayed among all validators, PBFT has O(N²) communication complexity, which is not scalable for a blockchain system with a huge number of nodes.
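The 2f + 1 quorum rule can be made concrete with a small vote-counting helper. This is an illustrative sketch only; real PBFT also checks signatures, sequence numbers, and view numbers, and the function and variable names below are hypothetical:

```python
def quorum_size(f: int) -> int:
    """Minimum matching votes needed to tolerate f malicious
    validators in a network of 3f + 1 nodes."""
    return 2 * f + 1

def phase_complete(votes: dict[str, str], proposal: str, f: int) -> bool:
    """A prepare or commit phase completes once at least 2f + 1
    validators have voted for the same proposal."""
    matching = sum(1 for v in votes.values() if v == proposal)
    return matching >= quorum_size(f)

# With 3f + 1 = 4 validators, f = 1 fault is tolerated: 3 matching votes suffice.
votes = {"v1": "block-42", "v2": "block-42", "v3": "block-42", "v4": "block-old"}
print(phase_complete(votes, "block-42", f=1))  # True
```

Each validator runs this count over the votes relayed to it by every other validator, which is exactly why the all-to-all relaying yields O(N²) messages per phase.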
Fig. 1 Graphical interpretation of the PBFT consensus protocol. Source Authors’ elaboration
So, Fig. 1 shows that the PBFT protocol is a type of Byzantine state machine replication system that requires all nodes to maintain a common state and perform the same actions. To do this, three types of basic agreement must be met: the agreement itself, a review agreement (expiration period), and a view-change agreement. Clearly, the PBFT protocol mechanism requires no mining or huge computations, so it reaches consensus in a short time. Currently, the PBFT protocol is being flexibly improved and combined, e.g., with the PoW or PoS algorithms, forming new hybrid consensus mechanisms. Let us consider one of them. In a hybrid of the PBFT and PoW consensus algorithms, the former generates fast chains while the latter generates slow ones, so transaction confirmation and mining are separated. After packaging a transaction, the PBFT committee confirms it by creating a fastBlock. Once the transaction is approved, the slow chain packs the fastBlock from the fast chain into a snailBlock, which the miners validate to extend the chain. With this hybrid consensus algorithm, transaction throughput (TPS) is greatly improved, and the use of PoW mining implements the idea of decentralization. The PBFT committee changes every two days, and all candidate members become miners after successfully mining PoW, which ensures the principles of honesty and fairness. Although the PBFT algorithm was originally designed to serve both private and public networks, it continues to be improved and flexibly modified. Such a protocol shows that public networks will play an important role in consensus mechanisms in the future. In this regard, this paper presents one of the options for its improvement. Research objectives.
Based on the literature review and the analysis of current problems in blockchain technology development, the objectives are to develop a fully scalable, provably secure, and energy-efficient blockchain, and to explore the functionality and features of a next-generation sharding-based blockchain system that solves several problems of existing blockchains.
Development of a Linear-Scaling Consensus …
5
3 Main Research Material Statement

3.1 Research and Analysis of the Scalability, Security, and Decentralization Mechanisms of Blockchain Technology

The solution that provides scalability, security, and decentralization simultaneously is sharding, which builds groups of validators that process transactions in parallel. Thus, the total transaction throughput grows linearly with the number of participants. The Zilliqa blockchain became the first public blockchain to offer a sharding-based solution to the scalability problem. However, it fails to meet two fundamental requirements. First, it does not shard the data storage (sharded state), which prevents ordinary local computers from participating in the network and thereby limits decentralization. Second, its PoW-based consensus algorithm consumes massive computational resources. In many scenarios users cannot obtain powerful computing hardware, and all mining-based consensus algorithms suffer from low transaction speed. The scalability problem of blockchain systems restricts the technology's application in various areas of the digital economy. Some developers propose a parallel distributed structure combining a distributed cloud storage system and a decentralized blockchain system to solve scalability, large-scale storage, and data exchange. The proposed method demonstrates a new hybrid consensus protocol for large-scale public blockchains based on joint optimization. Prevention of the Sybil attack is a crucial security factor in public blockchains. A Sybil attack is a peer-to-peer attack in which the victim is connected only to nodes controlled by the attacker. In peer-to-peer networks, where no host is trusted, each request is replicated to multiple recipients so that no single host must be trusted entirely. Meanwhile, network users can have multiple identities that are physically associated with different nodes.
Those identities can share resources or hold multiple copies of them. The latter creates a backup that independently checks the integrity of data taken from the network. The downside is that, at some point, several nodes that are supposed to represent different recipients of a particular request may be controlled by the same user. Moreover, suppose that user becomes an intruder. In that case, they gain all the capabilities of a man-in-the-middle for the session and unjustifiably earn the complete trust of the session initiator. The more identities an attacker controls, the more likely the next user session will be compromised. Figure 2 presents the blockchain splitting into two chains when malicious nodes create blocks that do not correspond to the consensus. For an attacker, it is vital that a new identity be cheap to create [11]. Bitcoin and Ethereum therefore demand that miners solve a cryptographic puzzle before proposing a block. Likewise, sharding blockchains such as Zilliqa [12] and QuarkChain [13] apply PoW to prevent Sybil attacks. Sharding is defined as dividing
Fig. 2 Malicious nodes of the blockchain dividing it into two chains. Source Authors’ elaboration
and storing a single logical data set across multiple databases; sharding is sometimes called horizontal data partitioning. In a blockchain, sharding divides the chain into separate segments (shards). A single shard contains a unique set of smart contracts and account balances. Each shard is assigned its own nodes that verify its transactions and operations, in contrast to the scheme where every node verifies every transaction across the entire network. Dividing the blockchain into more manageable segments can increase transaction capacity and solve the scalability problem that most modern blockchains face (Fig. 3). The blockchain in Fig. 3 has two shards; both fork precisely when a transaction enters block A of shard #1 and block X of shard #2. Each shard must discard one chain and accept the other to resolve its fork. If shard #1 accepts the chain A, B, and so on, and shard #2 accepts the chain W, X, and so on, the transaction is confirmed. If both shards discard those chains, the transaction is rejected and can be resent. If shard #1 accepts chain A, B, etc., while shard #2 discards chain W, X, etc., one part of the transaction is confirmed (A, B, etc.) while the other is not (W, X, etc.). Various sharding solutions have been offered in both industry and academia. Zilliqa was the first public sharding-based blockchain in industry, claiming a throughput of 2800 transactions per second. Zilliqa prevents Sybil attacks by applying PoW as an identity registration process. The Zilliqa network uses a separate directory service committee and network sharding, each counting hundreds of nodes. Transactions are assigned to and processed in different shards; the blocks accepted by all shards are accumulated and combined by the directory service committee.
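To illustrate the idea of assigning accounts to shards, here is a minimal sketch of hash-based shard assignment; it is an assumed illustration, not the actual scheme of Zilliqa or of the paper:

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_of(address: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map an account address to a shard.

    Every node computes the same mapping, so a transaction is
    routed to the shard that owns the sender's account state.
    """
    digest = hashlib.sha256(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard ends up holding a disjoint subset of accounts.
for acct in ["0xalice", "0xbob", "0xcarol", "0xdave"]:
    print(acct, "-> shard", shard_of(acct))
```
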
In academia, publications such as OmniLedger [14] and RapidChain [15, 16] offer solutions in which each shard contains a subset of the blockchain state. OmniLedger uses RandHound [10], a multiparty computation scheme, to generate a secure random number for allocating nodes to shards. OmniLedger assumes an adaptive adversary model, in which attackers can corrupt more and more nodes over time. Under this security model, a single shard could eventually be corrupted. OmniLedger prevents shard corruption by reassigning all nodes at a specified interval called an epoch (stage). RapidChain builds on OmniLedger and suggests a constrained rule for swapping nodes without interrupting operation [17].
Fig. 3 The process of a shard formation in the Zilliqa blockchain. Source Authors' elaboration
3.2 Study of the Distribution of Blockchain Nodes in a Shard

To date, various approaches have been proposed for distributing nodes among shards: distribution based on randomness, on location, and on centralized control. Randomness-based sharding has proved the most reliable of these approaches. It uses a mutually agreed random number for assigning each node. Such a random number should have the following properties:

1. Unpredictable: nobody should be able to predict the random number in advance.
2. Unbiased: the generation of the random number must not be biasable by any participant.
3. Verifiable: any observer must be able to check the validity of the generated random number.
4. Scalable: the randomness-generation algorithm must scale to a large number of participants.
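A simple (and deliberately naive) way to approximate the first three properties is a commit-reveal scheme, sketched below. This is not the VRF/VDF construction used in the paper, only an illustration of distributed randomness, and it is still biasable by a participant who withholds their reveal:

```python
import hashlib
import secrets
from functools import reduce

def commit(value: bytes) -> bytes:
    """Publish a hash commitment before anyone reveals."""
    return hashlib.sha256(value).digest()

def combined_randomness(reveals: list[bytes], commitments: list[bytes]) -> bytes:
    """XOR the revealed values after checking each against its commitment."""
    for value, c in zip(reveals, commitments):
        assert commit(value) == c, "reveal does not match commitment"
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), reveals)

# Three participants each commit to a secret, then reveal.
secrets_list = [secrets.token_bytes(32) for _ in range(3)]
commitments = [commit(s) for s in secrets_list]
rand = combined_randomness(secrets_list, commitments)
print(rand.hex())  # unpredictable as long as one participant's secret is random
```
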
The OmniLedger blockchain uses the RandHound protocol, a leader-driven distributed randomness generation scheme that combines Byzantine agreement with PVSS (Publicly Verifiable Secret Sharing). RandHound partitions the member nodes into fixed-size groups. It achieves the first three properties described above but is too slow to qualify as scalable. RapidChain takes a more straightforward approach, letting each member perform Verifiable Secret Sharing (VSS) and applying
the combination of the secret shares as the resulting randomness. However, since malicious nodes can send inconsistent shares to different nodes, this protocol is not secure. Furthermore, RapidChain does not show how nodes reach consensus on conflicting versions of the randomness. Ethereum 2.0 proposes a verifiable delay function (VDF) to prevent an attacker from gaining an advantage by learning the actual random number early. A VDF is a cryptographic primitive that takes a minimum, adjustable time to compute but whose result can be verified immediately.
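The delay idea can be illustrated with a toy delay function based on iterated hashing. This is an assumption for illustration only: unlike a real VDF, verification below simply replays the computation, whereas true VDFs (such as those considered for Ethereum 2.0) admit fast verification:

```python
import hashlib

def delay_function(seed: bytes, iterations: int) -> bytes:
    """Sequentially hash `iterations` times; the chain is inherently serial,
    so the output cannot be produced faster than computing every step."""
    out = seed
    for _ in range(iterations):
        out = hashlib.sha256(out).digest()
    return out

def verify(seed: bytes, iterations: int, claimed: bytes) -> bool:
    # Toy verification: recompute (a real VDF verifies far faster than it computes).
    return delay_function(seed, iterations) == claimed

out = delay_function(b"epoch-randomness-seed", 100_000)
assert verify(b"epoch-randomness-seed", 100_000, out)
```
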
3.3 Principal Features of the New, Developed Blockchain System

The paper investigates the functionality and features of a next-generation sharding-based blockchain system that solves several blockchain problems in order to create a fully scalable, provably secure, and energy-efficient blockchain. The developed approach aims at improving available methods, and there are fundamental differences between the proposed approach and the available ones. The paper presents and investigates a distributed ledger system that runs on a linearly scalable consensus mechanism. This selection method confirms a shard by voting shares and generates scalable randomness with VRF (Verifiable Random Function) and VDF (Verifiable Delay Function) constructions. The system is based on an analysis of available consensus mechanisms, sharding, and distributed randomness generation. The proposed approach allows the development of a blockchain with the following advantages: full scalability, security, energy efficiency, and fast consensus. Due to its scalability and energy efficiency, the proposed method is suitable for creating a blockchain for the digital economy.
3.4 Development of a New Scalable Blockchain Consensus Protocol

As an improvement of the PBFT protocol, this paper proposes a consensus mechanism that is linearly scalable in communication complexity. Instead of asking every validator to broadcast its vote, the leader collects the validators' votes into a single O(1)-sized multi-signature and then relays it. Thus, instead of receiving O(N) signatures, each validator receives only one multi-signature, cutting the communication complexity from O(N²) to O(N). The O(1) multi-signature idea is an improvement of the BFT method of the ByzCoin blockchain [18], which uses the Schnorr signature scheme to aggregate multiple signatures and builds a multicast tree among the validators to speed up message delivery. However, Schnorr multi-signatures demand a secret commitment round, resulting in two round trips for a single multi-signature.
The proposed method upgrades the available one by using the BLS (Boneh-Lynn-Shacham) multi-signature, which needs only one round trip. Hence, the developed method is at least 50% faster than the ByzCoin BFT method. Figure 4 depicts the network communication of the developed method during one round of consensus. The developed consensus procedure covers the following steps:

1. The leader builds a new block and passes the block header to each validator; at the same time, the leader broadcasts the block's contents with erasure coding. This is the "Announce" stage (Fig. 4).
2. The validators check the validity of the block header, sign it with a BLS signature, and relay the signature back to the leader (Fig. 4, Prepare stage).
3. The leader waits for at least 2f + 1 valid signatures from validators and merges them into a BLS multi-signature. The leader then broadcasts the aggregated multi-signature together with a bitmap of the validators who signed. With step 3, the PBFT Prepare stage is complete.
4. The validators verify that the multi-signature includes at least 2f + 1 signers, verify the transactions in the block content received from the leader in step 1, sign the message from step 3, and return it to the leader.
5. The leader waits for at least 2f + 1 valid signatures from step 4, combines them into a BLS multi-signature with a bitmap logging everyone who signed, creates a new block containing all the multi-signatures and bitmaps, and broadcasts the new block to every validator. With step 5, the PBFT Commit stage is complete (Fig. 4, Commit stage).
Fig. 4 Network communication of the developed method during one round of consensus. Source Authors’ elaboration
Proof-of-Stake is used to select the consensus validators. The proposed protocol differs from the available PBFT in that a validator holding many voting shares has correspondingly more votes, instead of a single vote (signature). Accordingly, instead of waiting for at least 2f + 1 signatures from validators, the leader waits for signatures from validators holding at least 2f + 1 voting shares. Note that the traditional procedure of downloading the blockchain history and rebuilding the current state is too sluggish (it takes several days to fully synchronize the history of the Ethereum blockchain). The current state, however, is much smaller than the entire blockchain history, so loading the current state at the start of an epoch is feasible compared with loading the entire history. To optimize state synchronization, it is proposed to keep the blockchain state as small as possible. Ethereum has many empty accounts that waste state space on the blockchain. Empty accounts cannot be deleted because of possible replay attacks, in which old transactions are re-sent to a deleted account. The problem can be solved by preventing replay attacks differently: a transaction designates the hash of a recent block and is valid only up to a specific number of blocks after the block with that hash. Hence, old accounts can be deleted, significantly speeding up analysis of the current blockchain state. Thus, new validators joining a shard first load the current shard state in order to validate transactions quickly. A new node must perform an appropriate check to ensure that the loaded state is valid. Instead of downloading the entire blockchain history and replaying every transaction to reconstruct the current state, the new node downloads only the block headers and verifies their signatures, establishing a cryptographic chain of trust from the current state back to the initial block.
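The described replay-protection rule can be sketched as follows; for simplicity, the referenced block hash is reduced to a block height, and all names and the window size are illustrative:

```python
VALIDITY_WINDOW = 16  # transaction valid for this many blocks after the referenced one

def tx_is_current(tx_ref_height: int, chain_height: int,
                  window: int = VALIDITY_WINDOW) -> bool:
    """A transaction references a recent block (by hash); it is valid only
    while the chain is within `window` blocks of that reference, so an old
    transaction cannot be replayed against a deleted (empty) account."""
    return tx_ref_height <= chain_height <= tx_ref_height + window

print(tx_is_current(100, 110))  # True: within 16 blocks of the referenced block
print(tx_is_current(100, 130))  # False: too old, cannot be replayed
```
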
Although verifying a single signature is not computationally hard, verifying all signatures starting from the genesis block would still take a long time. To mitigate the problem, the first block of each epoch is proposed to incorporate an additional hash pointer to the first block of the previous epoch. Hence, a new node can skip across epochs, following the trail of hash pointers back to the genesis block, which substantially speeds up verification of the current state of the blockchain. We now note some features of the proposed approach. This paper presents and investigates a new blockchain system that operates on a linearly scalable consensus mechanism. This selection method confirms a shard by voting shares and generates scalable randomness with the VRF and VDF functions. The new system is based on an analysis of available consensus mechanisms, sharding, and distributed randomness generation. The analysis of the shortcomings of available blockchain systems showed that the proposed method shards network communication, transaction validation, and the blockchain state itself. The proposed consensus mechanism shows that the participation threshold is low enough for small validators to join the network and earn rewards. The sharding process covered in this paper is secure through the use of distributed randomness generation (DRG), which is unpredictable, unbiased, and verifiable. The network is reshuffled continuously to prevent slowly adaptive Byzantine adversaries. Unlike other sharding-based blockchains, which require a PoW-type transaction validation and confirmation model to select validators, the
proposed consensus is based on applying a PoS model and is therefore more energy-efficient. Consensus is reached by a linearly scalable BFT algorithm that improves on PBFT. By introducing these protocol and network innovations, a scalable and secure new blockchain system is obtained.
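The epoch hash-pointer shortcut described in this section can be sketched as follows (the data layout is an assumption for illustration; a real implementation would also verify each header's signatures at every hop):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BlockHeader:
    height: int
    prev: Optional["BlockHeader"]               # pointer to the previous block
    prev_epoch_first: Optional["BlockHeader"]   # extra pointer: first block of previous epoch

def verify_back_to_genesis(header: BlockHeader) -> int:
    """Walk back to genesis, preferring epoch pointers; returns hops taken."""
    hops = 0
    while header.prev is not None:
        header = header.prev_epoch_first or header.prev  # epoch jump when available
        hops += 1
    return hops

# Build a toy chain: 3 epochs of 100 blocks each.
genesis = BlockHeader(0, None, None)
chain, epoch_first = [genesis], genesis
for h in range(1, 300):
    is_epoch_start = h % 100 == 0
    blk = BlockHeader(h, chain[-1], epoch_first if is_epoch_start else None)
    if is_epoch_start:
        epoch_first = blk
    chain.append(blk)
print(verify_back_to_genesis(chain[-1]))  # 101 hops instead of 299
```
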
4 Conclusions

The paper proposes and explores a new blockchain system that operates on a linearly scalable consensus mechanism. This selection method confirms a shard by voting shares and generates scalable randomness using the VDF (Verifiable Delay Function) and the VRF (Verifiable Random Function). The new system builds on an analysis of available consensus mechanisms, sharding, and distributed randomness generation. The proposed approach allows the development of a blockchain with the following advantages: full scalability, security, energy efficiency, and fast consensus. The analysis of the shortcomings of available blockchain systems showed that the proposed method shards network communication, transaction validation, and the blockchain state. The proposed consensus mechanism shows that the participation threshold is sufficiently low for small validators to participate in the network and receive rewards. The sharding process covered in this paper is safe due to distributed randomness generation (DRG), which is unpredictable, unbiased, and verifiable. The network is reshuffled constantly to prevent slowly adaptive Byzantine malicious validators. Unlike other sharding-based blockchains that require a PoW-type transaction verification and confirmation model to select validators, the proposed consensus is rooted in the PoS model and is therefore more energy-efficient. Hence, consensus is achieved via a linearly scalable BFT algorithm that improves on PBFT. A scalable and secure new blockchain system is obtained by introducing innovations at the protocol and network levels. The methods for creating the blockchain improve on available mechanisms and have practical value for various sectors of the digital economy. Separately, we note some promising areas for the practical implementation of the research. Firstly, applying the considered technologies in supply chains is of undoubted interest.
Logistics at its present stage of development poses some of the biggest problems for the current generation of companies. The industry is looking for new technologies to improve existing processes, reduce costs, and increase transparency in the supply chain, and blockchain technology offers a solution to most of these problems. Secondly, we highlight the proposed approach for implementation in the banking sector. This technology can completely transform the structure of banks, which may soon become radically different from what we are used to today. Avoiding the mediation of third parties in various transactions can make a huge layer of banking services unnecessary.
Thirdly, considerable successful experience has been accumulated to date in applying blockchain solutions to ensure the integrity and authenticity of documents and control information. In this regard, this direction is promising for the implementation of the proposed research. The research demonstrates that one of the main problems of the studied technologies lies in the features of the modeling process, both computational and mathematical [19–22]. For instance, servicing these technologies and solving their security problems requires powerful, high-performance computing equipment. On the other hand, the development of new blockchain systems, e.g., those based on a linearly scalable consensus mechanism, can be accomplished only with an up-to-date and sophisticated mathematical apparatus. The authors regard those problems as the prospects for further research on this topic.
References

1. M. Bjørnstad, J. Harkestad, S. Krogh, What are Blockchain Applications? Use Cases and Industries Utilizing Blockchain Technology. NTNU (2017). https://ntnuopen.ntnu.no/ntnu-xmlui/bitstream/handle/11250/2472245/17527_FULLTEXT.pdf?sequence=1&isAllowed=y
2. S. Alvarez, L. Busenitz, The entrepreneurship of resource-based theory. J. Manag. 27(6), 755–775 (2001)
3. S. Davidson, Economics of Blockchain (2016). SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2744751
4. G. Khachatryan, Blockstack: A New Internet for Decentralized Apps. Medium (2018). https://grigorkh.medium.com/blockstack-a-new-internet-for-decentralized-apps-1f21cb9179b9
5. W. Mougayar, V. Buterin, The Business Blockchain: Promise, Practice, and Application of the Next Internet Technology, 1st edn. (Wiley, 2016)
6. J.T. Kruthik, K. Ramakrishnan, R. Sunitha, B. Prasad Honnavalli, Security model for Internet of Things based on Blockchain, in Innovative Data Communication Technologies and Application (Springer, Singapore, 2021), pp. 543–557
7. G. Danezis, S. Meiklejohn, Centrally banked cryptocurrencies, in 23rd Annual Network and Distributed System Security Symposium, NDSS, 21–24 (2016)
8. R. Ivanov, V. Busygin, Some aspects of innovative blockchain technology application, in Proceedings of the XII International Scientific and Practical Conference "Modern Problems of Modeling of Socio-Economic Systems", Kharkiv, Ukraine (2020)
9. B. Awerbuch, C. Scheideler, Towards a scalable and robust DHT, in Eighteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '06 (2006), pp. 318–327
10. Wikipedia contributors, Byzantine fault. Wikipedia (2021). Retrieved August 8, 2021, from https://en.wikipedia.org/wiki/Byzantine_fault
11. J.R. Douceur, The Sybil attack, in 1st International Workshop on Peer-to-Peer Systems (IPTPS 02) (2002)
12. The Zilliqa Team, The Zilliqa technical whitepaper (n.d.). https://docs.zilliqa.com/whitepaper.pdf
13. Cross Shard Transaction, QuarkChain/pyquarkchain Wiki. GitHub. Retrieved August 8, 2021, from https://github.com/QuarkChain/pyquarkchain/wiki/Cross-Shard-Transaction
14. E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, E. Syta, B. Ford, OmniLedger: a secure, scale-out, decentralized ledger via sharding. IEEE Symp. Secur. Privacy (SP), 583–598 (2018)
15. M. Zamani, M. Movahedi, M. Raykova, RapidChain: scaling blockchain via full sharding. Conf. Comput. Commun. Secur., 931–948 (2018)
16. D. Sivaganesan, Performance estimation of sustainable smart farming with blockchain technology. IRO J. Sustain. Wireless Syst. 3(2), 97–106 (2021)
17. E. Syta, P. Jovanovic, E. Kokoris-Kogias, N. Gailly, L. Gasser, I. Khoffi, M.J. Fischer, B. Ford, Scalable bias-resistant distributed randomness, in 38th IEEE Symposium on Security and Privacy (2017)
18. G. Shvachych, I. Pobochy, E. Kholod, E. Ivaschenko, V. Busygin, Multiprocessor computing systems in the problem of global optimization, in Structural Transformations and Problems of Information Economy Formation: Monograph (Ascona Publishing, New York, USA, 2018), pp. 281–291
19. G. Shvachych, V. Busygin, K. Tetyana, B. Moroz, F. Evhen, K. Olena, Designing features of parallel computational algorithms for solving of applied problems on parallel computing systems of cluster type. Inventive Comput. Technol., 191–200 (2019). https://doi.org/10.1007/978-3-030-33846-6_21
20. G. Shvachych, B. Moroz, I. Pobocii, D. Kozenkov, V. Busygin, Automated control parameters systems of technological process based on multiprocessor computing systems. Adv. Intell. Syst. Comput., 666–688 (2019). https://doi.org/10.1007/978-3-030-17798-0_53
21. G. Shvachych, N. Vozna, O. Ivashchenko, O. Bilyi, D. Moroz, Efficient algorithms for parallelizing tridiagonal systems of equations. Syst. Technol. 5(136), 110–119 (2021). https://doi.org/10.34185/1562-9945-5-136-2021-11
22. S. Smys, H. Wang, Security enhancement in smart vehicle using blockchain-based architectural framework. J. Artif. Intell. 3(02), 90–100 (2021)
A Six-Point Based Approach for Enhanced Broadcasting Using Selective Forwarding Mechanism in Mobile Ad Hoc Networks D. Prabhu, S. Bose, T. Anitha, and G. Logeswari
Abstract Broadcasting is a procedure that enables a network of computers to transmit data packets from one source to multiple destinations. In wireless ad hoc networks, broadcasting can be complex because radio signals may overlap geographically. As a result, basic flooding is often quite expensive and results in significant redundancy. This paper proposes a new approach to broadcasting called Enhanced Broadcasting with Selective Forwarding (EBSF) that extends the distance adaptive broadcasting protocol by introducing a six-point approach for finding a modified threshold. In our approach, we introduce six strategic points computed from a centre point, a radius, and sine and cosine values, in such a way that the strategic points are separated by 60° as seen from the centre of the circle. This work decreases the number of packets that each node transmits during broadcasting. The proposed technique improves performance considerably by selecting only an ideal set of transmission nodes, thus reducing redundant broadcasts while ensuring that the data received is equivalent to the data originated.
1 Introduction

Broadcasting is an efficient technique for effective communication within a network [1, 2]. As mobile ad hoc networks (MANETs) offer more flexibility, such procedures are found there more often (e.g., sending an alarm, finding a path to a specific host, or paging a specific host). This paper outlines a technique for forwarding messages in ad hoc networks that reduces redundant packet transmission. We propose a unique six-point approach to determine the minimum set of forwarding nodes by enhancing the distance-based approach [3]. We formulate a new system in which the amount of transmitted data is considerably decreased while the information is still retrieved by the destination without any considerable data loss.

D. Prabhu · S. Bose · T. Anitha (B) · G. Logeswari
Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_2
The distance-based approach uses a threshold value as its parameter; 0.6 is the initial decision parameter chosen in our study. This threshold value is later changed to boost transmission efficiency. In the distance adaptive broadcast method, the number of packets transmitted varies with the threshold: increasing the threshold value considerably decreases the number of packets sent, and decreasing it increases the number of packets transmitted [4]. The threshold is an open-interval value in the range 0 to 1. If the threshold value is 0, all packets are sent to the destination, and when the threshold is 1, none of the packets are sent [4]. Thus, the threshold value can be stepped up (increased) or stepped down (decreased), and the resulting variation in performance is shown in [4]. A new strategy for calculating the radius is developed. From our experimental simulation work, we observe that a threshold value of 0.4*R is a good choice for ensuring a high delivery ratio, and this value is used in EBSF. In our work, six strategic points are picked, all of them 60° apart around the circle's centre. This process is repeated for every node in the ad hoc network that receives a data packet. A new data structure called a message cache is used to store the source address of the node, together with options such as the previous location, which helps to optimize the protocol. A simulation using GloMoSim 2.03 [5] was used to test this work. The following is an overview of the paper: Sect. 2 reviews previous techniques for broadcasting in ad hoc networks, Sect. 3 discusses the proposed EBSF architecture, Sect.
4 explains the implementation and experimental results of the proposed work, and Sect. 5 presents the conclusion and future work.
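The distance-threshold rebroadcast decision described above can be sketched as follows (node coordinates, the range value, and function names are illustrative assumptions):

```python
import math

def should_rebroadcast(node: tuple, sender: tuple, radius: float,
                       k: float = 0.4) -> bool:
    """Rebroadcast only if the receiver is farther than k*R from the sender,
    so nearby nodes (whose coverage largely overlaps) stay silent."""
    dist = math.dist(node, sender)
    return dist > k * radius

R = 250.0  # transmission range in metres (illustrative)
print(should_rebroadcast((120.0, 0.0), (0.0, 0.0), R))  # True: 120 > 0.4 * 250
print(should_rebroadcast((60.0, 0.0), (0.0, 0.0), R))   # False: 60 <= 100
```
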
2 Related Work

In an ad hoc network, broadcasting is a necessary feature, and flooding is the simplest method for broadcasting packets [3]. Flooding causes the broadcast storm problem, which results in severe duplication, contention, and collision, so broadcasting by flooding is usually very expensive. Several techniques for MANET broadcasting are explored in [6], including probabilistic, counter-based, distance-based, location-based, and cluster-based schemes. In [7], geometric location information is used to obtain smaller neighborhood subsets (local cover sets) for retransmission, assuming that each node is aware of its neighbors' positions. In [8], a protocol called BMW (Broadcast Medium Window), in which packets are transmitted round-robin to each neighbor, is explained. Broadcasting is accomplished in [9] using a distributed and local pruning approach that selects a small fraction of nodes to build a forwarding set, using locally available information such as k-hop neighborhood information. In [5], the efficiency of a localized power adaptive broadcasting system is compared to that of Broadcast Incremental Power (BIP), a well-known broadcasting scheme. To improve flooding, [10]
employed on-demand passive clustering methods. The study [11] examines reliable routing based on selective forwarding, as well as the probability of selective forwarding depending on node degree and link loss. Recent literature on Distance Adaptive (DAD) broadcasting in ad hoc networks [12] shows that signal strength can be exploited to boost broadcasting efficiency; the authors propose that a set of nodes retransmit based on their relative distance from the previous broadcaster. In [13], an optimized flooding protocol (OFP), based on a modification of the covering problem from geometry, is addressed in order to significantly reduce excessive transmissions while still covering the entire region. Power adaptive broadcasting is discussed in [14], in which each node tries to adjust its transmit power based on local information (up to two hops). The works [7, 12–14] are based on geometric approaches, i.e., distance-based and location-based broadcasting methods; Wu et al. [11] is based on a probabilistic scheme; and [5, 14] are based on power adaptive broadcasting. Our work uses distance-based and location-based schemes to find an optimal set of nodes for rebroadcast. Susila Sakthy and Bose [15] propose a new transmission head method, investigating the cluster head's (CH) activities, the transmission head's (TH) role, and the TH selection in various scenarios. Our six-point based approach is simpler and gives a larger performance improvement in the simulation environment. For grid networks, a method is proposed in [16] for eliminating redundant packet transmission in the normal flooding used in broadcasting, together with approaches for selecting only a small number of retransmission nodes. This mechanism lowers the transmission of redundant packets, allowing packets to be delivered with the smallest possible number of transmissions.
A novel method called Enhanced Broadcast using Selective Forwarding, which utilizes a distance-based strategy to identify the smallest set of forwarding nodes among all nodes in a grid network, is offered. A cross-layer intrusion detection system is presented in [17] that uses information accessible across multiple tiers of the protocol stack to locate hostile nodes and several sorts of DoS attacks in order to increase detection accuracy; to detect anomalies in MANETs efficiently, that solution uses dynamic anomaly detection together with a self-healing paradigm to remediate attacks that occur during the transaction [17]. In the architecture of Sundan and Arputharaj [18], multiple paths, discovered using an efficient evolutionary algorithm, are used to enable resilient streaming in the event of a link failure. Dynamic encoding methods are employed at the server to adjust to network conditions based on network feedback; additionally, hand-offs are planned ahead of time, and mobile agents with buffered data are transferred to the anticipated base station. Under a range of network situations, the architecture provides a stable multimedia streaming service. The optimal decided trust inference (ODTI) model advances trust computation by determining the trustworthiness of each node; in each zone, the most trusted node is selected as the relay for conveying information, forming a reliable route. The proposed S2ALBR scheme is evaluated using the Network Simulator (NS2)
18
D. Prabhu et al.
instrument under various testing scenarios [19]. The performance of evolutionary algorithms [20] such as the Genetic Algorithm, Antlion Optimization, Ant Colony Optimization, and Particle Swarm Optimization is compared, along with the routes obtained using the fuzzy Petri net model, in order to determine the best route for wireless sensor networks. Global and local stabilities of the epidemic equilibrium and disease-free patches are simulated and assessed after deriving a basic reproductive value. The suggested model's simulations revealed a unique characteristic: a link between the reproductive value, the charging rate, and stability. The outcomes of the simulations are compared to theoretically calculated global and local stability characteristics [21]. The goal of the study in [22] is to design a multi-objective network reconstruction [23] based on community structure utilizing ES, increasing reconstruction performance in order to improve network building; a community-based approach is the methodology used to improve performance.
2.1 Covering Problem

The underlying problem can be summarized as follows: find the least number of circles of radius R needed to completely cover a 2-dimensional plane [2]. No arrangement of circles can cover the plane more economically than the arrangement derived from the hexagonal lattice, as has been shown before. At the beginning, the whole plane is tiled with regular hexagons, each of side R, and rings of hexagons then form around the centre to encircle it.
2.2 Modified Covering Problem

Find the smallest number of circles of radius R required to cover a 2-dimensional area, subject to the constraint that each circle's centre lies on the circumference of at least one other circle [13]. If a mobile node's range is R, the constraint means that the centre of each circle must lie within another circle, which is what allows a mobile ad hoc node to receive and then retransmit the message. As in the covering problem, the entire region is initially tiled with regular hexagons, each of side R, but here each vertex is used as the centre of a circle of radius R. OFP's strategy can be summarized as follows: S is the source node, and it submits the route request. S is at the centre of a circle. Six additional circles, each with radius equal to the diameter of the first, are drawn after the first; naturally, six more circles follow, and so on. The authors provided a straightforward method for identifying the node closest to a specified point, allowing the message to be retransmitted. Our solution for the modified covering problem is a six-point based approach: the radius R is given, and if S is the specified centre point, six strategic points are designated on the two-dimensional plane using cosine and sine values at radius R. Six more circles of radius R are formed, each with one of the six strategy points as the
A Six-Point Based Approach for Enhanced Broadcasting …
19
centre. This procedure is repeated until the entire two-dimensional plane has been covered.
3 System Architecture of EBSF

The architecture of the system that uses enhanced broadcast using selective forwarding is shown in Fig. 1.
3.1 Distance Based Approach

When a new message arrives at a node, its source address and sequence number are checked against the message cache: the message is dropped if it is already in the cache and considered for rebroadcast otherwise. The message cache has functions for initializing, adding, and removing messages.

Fig. 1 System architecture of EBSF

Each node assigns a sequence number to its packets. If the packet arrives from a previously transmitting node at a distance dm smaller than a threshold, the packet is discarded; specifically, the packet is deleted if the distance is less than threshold * NODES_RADIUS. The threshold is set to 0.6 in this case. The number of packets transmitted decreases as the threshold value is raised, and vice versa; the threshold can therefore be raised or lowered to study the variation in performance. If the threshold is set to zero, all packets are transmitted and no enhancement is seen.
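The drop rule above can be sketched in a few lines. The function and variable names here are illustrative assumptions, not the authors' actual code, and the radius value is a stand-in (the real value is computed at run time from the radio layer):

```python
NODES_RADIUS = 250.0  # assumed radio range; computed from the radio layer in practice
THRESHOLD = 0.6       # the value used in this work

def should_rebroadcast(dm, already_cached):
    """Distance-based forwarding check (illustrative sketch).

    dm: distance to the node that previously transmitted the packet.
    already_cached: True if this (source, sequence number) pair was seen before.
    """
    if already_cached:
        return False                       # duplicate packet: drop it
    return dm >= THRESHOLD * NODES_RADIUS  # drop when too close to the sender
```

Raising THRESHOLD suppresses more rebroadcasts, and at zero every packet is forwarded, matching the observation above that no enhancement is then seen.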
3.2 Six-Point Based Approach

In this method, six key points, each spaced 60 degrees apart around the circle's centre, are picked as given in Eqs. (1) and (2). The procedure is repeated for each node that receives a packet. When a node receives a packet, it first checks whether the packet is already in the message cache. If not, it identifies the nearest point P among the six chosen points, which are separated by 60 degrees around the circle's centre. The six strategic points are determined using the following calculation:

XI = R * Cos((I − 1) * 60)    (1)

YI = R * Sin((I − 1) * 60)    (2)

where X, Y are co-ordinates, R is the node radius, and I = 1, 2, 3, 4, 5, 6. The delay is defined by the distance l from P; the node must wait until this delay is over before retransmission is completed. The broadcast system starts broadcasting when the broadcast determiner decides on retransmission. This work entails the implementation of the new EBSF protocol in the network layer of the protocol stack. The protocol is included in each and every network node: when a packet hits the network layer, EBSF handles it. The initialization function RoutingEBSFInit creates the EBSF structure and allocates memory to it, and initializes the stats (statistics reported to evaluate the improvements), the node's sequence table, and the message cache. The message cache's top value is initialized to 0. A variable NODES_RADIUS is defined for calculating the radius. Besides several functions defined in the radio layer, a routing function computes the radius and stores the value in a file called "Radius.out".
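Equations (1) and (2) can be sketched directly, assuming the angles are in degrees and the points are offset from a centre (cx, cy); the function name is illustrative:

```python
import math

def six_points(cx, cy, R):
    """Strategic points of Eqs. (1) and (2): six points spaced 60 degrees
    apart on a circle of radius R around the centre (cx, cy)."""
    return [(cx + R * math.cos(math.radians((i - 1) * 60)),
             cy + R * math.sin(math.radians((i - 1) * 60)))
            for i in range(1, 7)]

pts = six_points(0.0, 0.0, 250.0)
# Each strategic point lies exactly one radius away from the centre.
assert all(abs(math.hypot(x, y) - 250.0) < 1e-9 for x, y in pts)
```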
A routing function determines the routing action to be taken and handles the packet depending on whether it originates from UDP or MAC. When the packet originates from UDP, the node is the source node and the data is transmitted. When data from the MAC layer is received, it is processed and a decision is made on whether to forward the packet to UDP or drop it. Statistics are printed with the quantity of data transmitted, the quantity of data created, and the quantity of data received.
4 Implementation and Results

A novel protocol is presented for the network layer of the GloMoSim simulator [24], and the six-point technique is employed. GloMoSim simulates networks with thousands of nodes linked by heterogeneous communications capabilities, including multicast, asymmetric communications via direct satellite broadcasts, multihop wireless communications via ad hoc networking, and traditional Internet protocols. To set up a GloMoSim scenario, GloMoSim must first be correctly installed and tested. After a successful installation and test, a simulation is started by running the following command in the BIN subdirectory:

./glomosim inputfile > bell.trace

The simulation configuration settings are stored in the input file (examples of such files are CONFIG.IN and APP.CONF). At the end of the simulation, a file named GLOMO.STAT is created that contains all of the generated statistics [25]. The simulation environment for the scenario is shown in Table 1.
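For orientation, a minimal CONFIG.IN fragment might look as follows. The parameter names mirror the stock configuration file distributed with GloMoSim (see the tutorial cited as [24]); the values are illustrative stand-ins for one scenario from Table 1, not the authors' exact file:

```
SIMULATION-TIME     15M
TERRAIN-DIMENSIONS  (2000, 2000)
NUMBER-OF-NODES     50
NODE-PLACEMENT      UNIFORM
MOBILITY            RANDOM-WAYPOINT
APP-CONFIG-FILE     APP.CONF
```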
4.1 Evaluation Metrics

Packet Loss. Packet loss is defined as the ratio of packets that never made it to their destination to the number of packets initiated by the source. Mathematically it can be shown as Eq. (3):

Packet Loss = (nSentPackets − nReceivedPackets) / nSentPackets    (3)

where nReceivedPackets is the number of received packets and nSentPackets is the number of sent packets.

Throughput. Throughput refers to how much data can be handled in a given time frame; it covers the amount of data transferred and the rate at which data is successfully delivered. It can be measured in bits per second or data units per second. Mathematically it
can be shown as Eq. (4):

Throughput = Total amount of data transferred / Total time in seconds    (4)
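Both metrics are straightforward to compute; a small sketch (function names are illustrative, not from the simulator):

```python
def packet_loss(n_sent, n_received):
    """Eq. (3): fraction of packets that never reached their destination."""
    return (n_sent - n_received) / n_sent

def throughput(total_data_bits, total_seconds):
    """Eq. (4): successfully delivered data per unit time (bits per second)."""
    return total_data_bits / total_seconds

assert packet_loss(1000, 900) == 0.1
assert throughput(8_000_000, 10) == 800_000.0
```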
The detailed algorithm for Enhanced Broadcasting by Selective Forwarding (EBSF) is described below. L1 and L2 are two location parameters in the header of every broadcast packet. Whenever a node transmits a broadcast packet, it sets L1 to the location of the node from which it received the bundle (packet) and L2 to its own location. Every node M keeps track of the distance dm between itself and the closest node from which the packet was just sent. The source node N sends the packet with both L1 and L2 set to the same place. The steps of the algorithm are as follows:

1. The radius is determined using the radio layer's radio range function.
2. The packet is treated according to where it arrives from:
• If the packet arrives from UDP, the node is considered to be the source node and hence the data is sent.
• If the packet arrives from the MAC layer, the packet is either sent to UDP or dropped.
3. After receiving a broadcast packet, node M initially places it in the checking array. Duplication is checked with the following procedure: if the packet is not a duplicate, i.e., not already available in the message cache, the packet is inserted into the message cache and the following steps are carried out.
a. The count of data received by the node is increased by one at each stage.
b. The location of the previous node which transmitted the data is fetched.
c. The distance between the current node and the previous node is calculated.
d. The six strategy points are determined using the sine and cosine formulas and the radius, with the minimum distance set to the value of the diameter (2 * radius). Node M locates the closest point P among the six points, which are separated by 60 degrees around the circle's centre. It calculates its distance l from P, delays retransmission of the packet by d = l, and returns the original top value of the checking array.
(i) If this old top value is less than the current top value, the array is traversed from the point after the old top value up to the current top value; if the packet has already been received, it is not transmitted.
(ii) If it is more, the array is traversed fully using the condition in (i), and the packet is not retransmitted.
4. The distance is checked against threshold * NODES_RADIUS:
(i) If the distance value is less than threshold * NODES_RADIUS, the packet is discarded (the threshold is set to 0.6).
(ii) The packet is discarded if the don't-transmit flag is set to 1.
(iii) Otherwise, the node's location and address are updated as the previous location and address.
(iv) The packet is then broadcast (i.e., sent to the MAC layer) and the data-transmitted count is incremented by one.
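The receive path described above can be sketched as a simplified model; all names here are illustrative assumptions, since the real implementation lives inside GloMoSim's network layer:

```python
import math

THRESHOLD = 0.6

def handle_broadcast(node_xy, prev_xy, radius, cache, packet_id):
    """Sketch of the EBSF receive path.

    Returns (forward, delay): whether to schedule a rebroadcast and the
    waiting delay d = l, the distance to the nearest strategic point.
    """
    if packet_id in cache:          # duplicate packet: drop
        return False, 0.0
    cache.add(packet_id)

    dm = math.dist(node_xy, prev_xy)
    if dm < THRESHOLD * radius:     # too close to the previous transmitter
        return False, 0.0

    # Six strategic points around the previous transmitter (Eqs. 1-2).
    points = [(prev_xy[0] + radius * math.cos(math.radians(i * 60)),
               prev_xy[1] + radius * math.sin(math.radians(i * 60)))
              for i in range(6)]
    l = min(math.dist(node_xy, p) for p in points)
    return True, l                  # wait d = l, then rebroadcast
```

A node near a strategic point gets a small delay and rebroadcasts first, which silences its slower neighbours and thins out the flood.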
The graphical representation of the performance of the six-point based approach is shown in Fig. 2. Table 2 illustrates the performance comparison of the six-point based and distance based approaches. As the entries show, 50 nodes transmit a total of 980 packets, 100 nodes transmit a total of 1980 packets, 150 nodes transmit a total of 2980 packets, and 200 nodes transmit a total of 4950 packets. Table 2 shows that the total number of packet transmissions increased (distance based
Fig. 2 Graphical representation of distance based and six-point approaches
Table 1 Parameter evaluation

Parameters       Values
OS               Fedora 13
Simulator        GloMoSim
Application      Telnet
No. of nodes     50, 100, 150, 200
Mobility model   Random waypoint
Table 2 Comparison of distance and six-point based approaches

Number of nodes | Total number of packets received | Total number of packets transmitted (distance based approach) | Total number of packets transmitted (six-point based approach)
50              | 980                              | 576                                                           | 445
100             | 1980                             | 1150                                                          | 972
150             | 2980                             | 1750                                                          | 1350
200             | 4950                             | 2270                                                          | 1868
approach) but gradually decreased (six-point approach). The proposed six-point approach thus clearly minimizes redundant transmissions (Table 2).
5 Conclusion and Future Work

A new protocol with a six-point approach is proposed for GloMoSim's network layer. It is shown that broadcasting can be significantly improved by choosing only an optimal set of transmission nodes, thereby minimizing redundant transmissions while simultaneously guaranteeing that the data received is equivalent to the data originated. The protocol built in this work reduces data transmissions only in a simulator tool; with further enhancements it can be used in real-time applications. As a result, this protocol may be included in a framework that fine-tunes broadcast settings based on the actual application. Schemes may also be established for discarding packets based on the number of nodes within the transmission range.
References

1. K. Zaheer, M. Husain, M. Othman, Network coding-based broadcasting schemes for cognitive radio networks, in Mobile Communications and Wireless Networks (2018), pp. 65–114
2. E. Royer, C.K. Toh, A review of current routing protocols for ad hoc mobile wireless networks, in IEEE Personal Communications, vol. 6 (1999), pp. 46–55
3. N. Karthikeyan, V. Palanisam, K. Duraiswamy, A review of broadcasting methods for mobile ad hoc network. Int. J. Adv. Comput. Eng. (2009)
4. A. Waheb, J. Al-Areeqi, M. Ismail, R. Nordin, Energy and mobility conscious multipath routing scheme for route stability and load balancing in MANETs. Simul. Modelling Pract. Theory 77, 45–271
5. K. Karenos, A. Khan, S. Krishnamurthy, M. Faloutsos, X. Chen, Local versus global power adaptive broadcasting in ad hoc networks, in Proceedings of the IEEE Conference on Wireless Communications and Networking (2005), pp. 2069–2075
6. Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, J.-P. Sheu, The broadcast storm problem in a mobile ad hoc network, in ACM Wireless Networks, vol. 8 (2002), pp. 153–167
7. M.-T. Sun, T.-H. Lai, Computing optimal local cover set for broadcast in ad hoc networks, in Proceedings of the IEEE International Conference on Communications (2002), pp. 3291–3295
8. K. Tang, M. Gerla, MAC reliable broadcast in ad hoc networks, in Proceedings of the IEEE Conference on Military Communications (2001), pp. 1008–1013
9. J. Wu, F. Dai, Broadcasting in ad hoc networks based on self-pruning, in Proceedings of the 22nd IEEE Annual Joint Conference of the Computer and Communication Societies (2003), pp. 2240–2250
10. T.J. Kwon, K. Vijay, T. Russel Hsing, M. Gerla, M. Barton, Efficient flooding with passive clustering—an overhead-free selective forward mechanism for ad hoc/sensor networks, in Proceedings of the IEEE, vol. 91 (2003), pp. 1210–1220
11. J. Wu, L. Chen, P. Yan, J. Zhou, H. Jiang, A new reliable routing method based on probabilistic forwarding in wireless sensor network, in Proceedings of the Fifth IEEE International Conference on Computer and Information Technology (CIT'05) (2005), pp. 524–529
12. X. Chen, M. Faloutsos, V. Srikanth, Distance Adaptive (DAD) broadcasting for ad hoc network. IEEE Trans. Parallel Distributed Syst. 15, 908–920 (2004)
13. K. Vamsi, A. Durresi, R. Jain, Optimized flooding protocol for ad hoc networks. IEEE Trans. Parallel Distributed Syst. 15, 1027–1040 (2004)
14. X. Chen, M. Faloutsos, V. Srikanth, Power adaptive broadcasting with local information in ad hoc networks, in Proceedings of the 11th IEEE International Conference on Network Protocols (2003), pp. 168–178
15. S. Susila Sakthy, S. Bose, Optimising residual energy transmission head with SNR value in multiple clusters. Wireless Pers. Commun. 108, 107–120 (2019)
16. D. Bein, K. Ajo, B.A. Sathyanarayanan, Efficient broadcasting in MANETs by selective forwarding, in Scalable Computing: Practice and Experience, vol. 11 (2010), pp. 43–52
17. R. Kershner, The number of circles covering a set. Am. J. Math. 61, 665–671 (1939)
18. B. Sundan, K. Arputharaj, Adaptive multipath multimedia streaming architecture for mobile networks with proactive buffering using mobile proxies. J. Comput. Info. Technol. 15 (2007)
19. M. Swetha, S. Pushpa, M. Thungamani, T. Manjunath, D.S. Sakkari, Strong secure anonymous location based routing (S2ALBR) method for MANET. Turkish J. Comput. Math. Educ. 12, 4349–4356 (2021)
20. H. Wang, S. Smys, Soft computing strategies for optimized route selection in wireless sensor network. J. Soft Comput. Paradigm (JSCP) 2(01), 1–12 (2020)
21. S.R. Mugunthan, Rechargeable sensor network fault modeling and stability analysis. J. Soft Comput. Paradigm (JSCP) 3(01), 47–54 (2021)
22. V. Suma, Community based network reconstruction for an evolutionary algorithm framework. J. Artif. Intell. 3, 53–61 (2021)
23. M.S. Deshmukh, N. Kakarwal, R. Deshmukh, An adaptive neighbour knowledge-based hybrid broadcasting for emergency communications, in International Conference on Computer Networks and Inventive Communication Technologies (Springer, Cham, 2019), pp. 86–97
24. J. Nuevo, A Comprehensible GloMoSim Tutorial (2004)
25. K. Jaiswal, Om Prakash, Simulation of MANET using GloMoSim network simulator. Int. J. Comput. Sci. Inf. Technol. 5, 4975–4980 (2014)
Crop Price Prediction Using Machine Learning Naive Bayes Algorithms

R. Vikram, R. Divij, N. Hishore, G. Naveen, and D. Rudhramoorthy
Abstract In our nation, agriculture is the crucial mainstay of economic development. A large portion of families depend on cultivation, and most of the land is used for cultivation to meet the needs of the region's population. Agrarian practices must be modernized to meet these growing demands. Our study aims to handle the problem of crop price prediction more effectively to ensure farmers' profit, applying various machine learning techniques to different data in order to discover better patterns. We use agricultural products in our day-to-day life, and agricultural product prices vary from one month to another; owing to weather conditions, crop prices also vary from one season to another. Any major price fluctuation in the agricultural field affects the Gross Domestic Product (GDP). In this paper we predict crop price fluctuations from one season to the next, which is very useful for farmers: these predictions help the farmer properly analyse the crop price for the coming season.
1 Introduction

Cultivation is the foundation of every economy. From ancient times, agriculture has been considered the standard and predominant occupation practiced in any region. There are various methods of extending and improving the crop yield and the quality of the harvests [1]. Data mining is likewise significant for predicting the crop price: data mining is the most common technique for investigating information from different viewpoints and summarizing it into useful knowledge. Crop price estimation is a fundamental farming problem [2]. Every farmer always tries to know how much value he will get from his investment. Previously, price estimation was carried out by analysing a farmer's previous experience with a particular crop [3, 4]. Precise data about the history of crop yield is something fundamental for R. Vikram (B) · R. Divij · N. Hishore · G. Naveen · D. Rudhramoorthy Department of Computer Science and Engineering, M.Kumarasamy College of Engineering, Karur 639113, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_3
27
28
R. Vikram et al.
making decisions related to price estimation for the crops. Accordingly, this paper proposes a scheme to predict the price of the crop. Data analytics is the most widely used procedure for dissecting data records to make decisions about the information they contain, typically with the aid of specialized systems and software. Earlier, crop price forecasting was performed by drawing on the farmer's experience with a particular field and crop. However, as conditions change rapidly, farmers are compelled to grow an ever-increasing number of crops. This being the current circumstance, a large number of them do not have sufficient knowledge about the difficulties that may arise and are not fully aware of the benefits they get while growing them [5]. The proposed system applies AI and forecasting algorithms such as Logistic Regression, Decision Trees, XGBoost, Neural Nets, and Clustering to recognize the patterns in the data and then iterate on them. This in turn helps with predicting the true price of the crop [6]. Price Forecast System: in this work, we predict crop prices by using different machine learning techniques and algorithms [7]. The price of the crop is determined by examining the data in our dataset, which is given as one of the inputs to the algorithm. The input values for the parameters are taken from the user and fed to the algorithm [8].
2 Related Work

One study determines the demand for a crop by analysing its price at the right moment. The researchers gathered the data connected to their work from the Khorat Plateau markets, using market prices from more than 20 nearby trading areas. The price is predicted using well-known forecasting algorithms such as the autoregressive integrated moving average, partial least squares, and artificial neural networks. The survey compares the performance of these four price-analysis algorithms [9]; with them, the authors analysed local market prices well, covering price prediction for carrot, beans, potato, and cabbage. According to the results, the PLS and ANN algorithms gave better outcomes than the others for both short-term and long-term estimation [10]. Another study focuses on picking a suitable crop for a region chosen by the user, thereby assisting the farmer in making better and smarter choices. It also ranks crops based on their suitability for that area, so the farmers get to know the compatibility of the chosen crop and the selected region [11]. Other authors covered several different crops, such as tomato, potato, cucumber, banana, strawberry, apple, brinjal, and avocado, with data from earlier years gathered individually; the forecast is made by examining the dataset using various machine learning techniques [12].
Crop Price Prediction Using Machine Learning …
29
3 Proposed Work

Naive Bayes is a machine learning classification technique based on Bayes' theorem. The features used for price prediction are assumed to be independent and unrelated to one another. The algorithm has four parameters:

(1) The number of parameters (m): the forecast depends on the various parameters given as inputs to the algorithm through our front end. These parameters influence the crop price as mentioned in the previous sections.
(2) The probability of potential outcomes (p): there can be any number of potential outcomes, which depend on the number of records for each crop in the training data.
(3) Training data collected: the prior years' data, provided in an organized format, is given as input to the algorithm, from which the evidence is extracted based on the latest parameter values of the crop whose price is to be predicted.
(4) Latest evidence: the record values of the most recent year, which provide the parameter values for which the possible price is to be predicted.

The algorithm works based on the formula

Prob = (numc + m * prob) / (num + m)
where num is the number of prior records whose parameter value is closest to the new evidence's parameter value, numc is the number of those records whose outcome equals the candidate outcome value, and Prob is the probability of a crop having a specific price (Fig. 1). The value with the greatest likelihood is picked as the most reliable prediction. K-nearest neighbour is a supervised machine learning classification algorithm which, in our case, is used for predicting the profit of the crop. This algorithm essentially looks for similarities in the entire dataset; the result is predicted from the most similar, or nearest, values. This algorithm is chosen over others due to its higher convergence speed and simplicity.
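The estimate above is an m-estimate of probability and can be written directly; the function and variable names here are illustrative, not taken from the authors' implementation:

```python
def m_estimate(numc, num, m, prior):
    """The probability estimate used above: (numc + m * prior) / (num + m).

    num   -- prior records whose parameter value matches the new evidence
    numc  -- of those, records whose outcome equals the candidate outcome
    m     -- equivalent sample size; prior -- prior probability of the outcome
    """
    return (numc + m * prior) / (num + m)

# With no matching history (num == numc == 0) the estimate falls back
# to the prior instead of an undefined 0/0.
assert m_estimate(0, 0, 2, 0.5) == 0.5
```

The candidate price whose estimate is largest is then reported as the prediction.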
3.1 Profit Prediction System

More than 11,000 Indian farmers died by suicide in 2016, according to the National Crime Records Bureau. While the high rate of self-inflicted
Fig. 1 System design
fatalities could be attributed to various reasons, financial distress and the inability to sell produce because of broad swings in the country's market prices are among them. In India, the government has set minimum support prices for crops, but does not attempt to explicitly enforce these prices upon buyers. The profit is predicted after shortlisting the similar characteristics found in the collected data values; the estimation takes in qualities such as yield, value, cultivation costs, and actual seed cost.
4 Results

4.1 Naive Bayes Algorithm

Naive Bayes classifiers assume strong, or naive, independence between the attributes of the data points. Popular applications of Naive Bayes classifiers include spam filters, text analysis, and medical diagnosis. They allow building accurate models with very good performance given their simplicity (Fig. 2).

Fig. 2 Equation expansions
P(A|R) = P(R|A)P(A) / P(R)    (1)
We will introduce the fundamental ideas of the Naive Bayes algorithm by concentrating on an example. Consider two company employees, Anbu and Balu, working at the same workstation:

• Anbu comes to the workstation 3 days per week
• Balu comes to the workstation 1 day per week

We are at the workplace and see somebody pass us very fast, so we do not know which employee crossed: Anbu or Balu. Given the information we have until now, and assuming that between them they work 4 days per week, the prior probabilities of the person seen being Anbu or Balu are:

Prob (Anbu) = 3/4 = 0.75. Prob (Balu) = 1/4 = 0.25.

When we saw the person crossing by, we noticed that he was wearing a red coat.

• Anbu wears a red coat 2 times each week
• Balu wears a red coat 3 times each week

So, for each week's worth of work, which has 5 days, we can infer the following:

• The probability of Anbu wearing a red coat is → Prob (Red|Anbu) = 2/5 = 0.4
• The probability of Balu wearing a red coat is → Prob (Red|Balu) = 3/5 = 0.6

Initially we knew the priors Prob (Anbu) and Prob (Balu), and afterwards we inferred the likelihoods Prob (Red|Anbu) and Prob (Red|Balu). The posterior probabilities are: Prob (Anbu|Red) =
0.3 / (0.3 + 0.15) = 0.67    (2)

Prob (Balu|Red) = 0.15 / (0.3 + 0.15) = 0.33    (3)
Above, we illustrated the fundamental ideas of the Naive Bayes algorithm with a worked example.
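The arithmetic of this example can be checked in a few lines; the numbers are exactly those used above:

```python
# Priors from attendance: Anbu works 3 of the 4 shared days, Balu 1 of 4.
p_anbu, p_balu = 3 / 4, 1 / 4
# Likelihoods: red-coat days out of a 5-day work week.
p_red_given_anbu, p_red_given_balu = 2 / 5, 3 / 5

joint_anbu = p_anbu * p_red_given_anbu   # P(Red|Anbu) * P(Anbu) = 0.30
joint_balu = p_balu * p_red_given_balu   # P(Red|Balu) * P(Balu) = 0.15
evidence = joint_anbu + joint_balu       # P(Red), the denominator of Eq. (1)

posterior_anbu = joint_anbu / evidence   # Eq. (2): about 0.67
posterior_balu = joint_balu / evidence   # Eq. (3): about 0.33
```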
Fig. 3 Crop price prediction analysis using Naive Bayes algorithm
4.2 Result Performance

In this model prediction, we took rice as the example for price prediction. The current market price of rice is Rs. 1685/quintal (month of January). The dataset was taken from the government site data.gov.in. Brief forecast (last six months): minimum predicted crop price Rs. 1587/quintal (month of June), maximum predicted crop price Rs. 1783/quintal (month of August). As per the government data, the price of rice is Rs. 1546/quintal (month of December), while the predicted price of rice is Rs. 1532.56/quintal. We achieved approximately 90% accuracy (Fig. 3).
4.3 User Interface Website

The landing page interface contains multiple tabs with the names of all of the individual crops, so it can be used intuitively by everybody. It has been kept very simple to make it easier to understand. After clicking on a tab, the user is redirected to the separate page for that crop, which provides distinct insights about it. The subsequent page has different sections describing the rate variations in price over the coming year.
4.4 Supervised Naive Bayes Algorithm

The steps followed by the Naive Bayes algorithm to solve problems like the previous one are:

• Convert the dataset into a frequency table.
• Make a likelihood table by finding the probabilities of the events occurring.
• The class with the higher posterior probability is the result of the prediction.

4.4.1 Strengths and Weaknesses of the Naive Bayes Algorithm
• A simple and fast method for predicting classes, in both binary and multiclass classification problems.
• The algorithm performs well compared to other classification models, even with little training data.

4.4.2 The Primary Drawbacks of Utilizing This Strategy
• Naive Bayes assumes that all predictors (or features) are independent, which rarely holds in real life. This limits the applicability of the algorithm in real-world use cases.
4.5 Future Enhancement

A future enhancement of our application is to deploy it in every region (by utilizing a GPS module) as far as possible, by extracting the dataset for that region, which further increases the accuracy and probability of a correct prediction. Adding discussion portals to make the application easier to use can be another extension. In this manner, ensuring that the information is accessible with the greatest ease, the more accurate prediction improves the probability of profit for the farmers.
5 Conclusion

Our investigation targets predicting both the price and profit of a given crop prior to planting. Our web application works on efficient machine learning techniques and advanced technologies, with a generally simple interface for the customers. The training datasets so procured give enough insights for predicting the appropriate price in the markets. Effectively, the price of
the crops is predicted with nearly 95% accuracy. This framework supports farmers in reducing their challenges and keeps them from attempting suicide.
References

1. Y.-H. Peng, C.S. Hsu, P.-C. Huang, Developing Crop Price Forecasting Service Using Open Data from Taiwan Markets (2021)
2. M.T. Shakoor, K. Rahman, S.N. Rayta, A. Chakrabarty, Agricultural Production Output Prediction Using Supervised Machine Learning Techniques (2021)
3. S. Veenadhari, B. Misra, C.D. Singh, Machine learning approach for forecasting crop yield based on climatic parameters, in International Conference on Computer Communication and Informatics (ICCCI-2014), 03–05, 2021, Coimbatore, India
4. R. Dhanapal, A. Ajan Raj, Balavinayagapragathish, J. Balaji, Crop Price Prediction Using Supervised Machine Learning Algorithms (2021)
5. I. Ghutake, R. Verma, R. Chaudhari, V. Amarsinh, An Intelligent Crop Price Prediction Using Suitable Machine Learning Algorithm (2021)
6. G.S. Sajja, S.S. Jha, H. Mhamdi, M. Naved, S. Ray, K. Phasinam, An Investigation on Crop Yield Prediction Using Machine Learning (2020)
7. Y. Masare, S. Mahale, M. Kele, A. Upadhyay, B.R. Nanwalkar, The System for Maximize the Yielding Rate of Crops using Machine Learning Algorithm (2021)
8. M.S. Gastli, L. Nassar, F. Karray, Satellite Images and Deep Learning Tools for Crop Yield Prediction and Price Forecasting (2021)
9. G. Hegde, V.R. Hulipalled, J.B. Simha, A Study on Agriculture Commodities Price Prediction and Forecasting (2020)
10. A. Chaudhari, M. Beldar, R. Dichwalkar, S. Dholay, Crop Recommendation and its Optimal Pricing using ShopBot (2020)
11. L. Nassar, I.E. Okwuchi, M. Saad, F. Karray, K. Ponnambalam, P. Agrawal, Prediction of strawberry yield and farm price utilizing deep learning, in 2020 International Joint Conference on Neural Networks (IJCNN) (2020)
12. L. Shu, B. Hsiao, Y. Liou, Y. Tsai, On the Selection of Features for the Prediction Model of Cultivation of Sweet Potatoes at Early Growth Stage (2020)
The Performance Evaluation of Adaptive Energy Conservation Scheme Using IoT Syed Imran Patel, Imran Khan, Karim Ishtiaque Ahmed, V. Raja Kumar, T. Anuradha, and Rajendra Bahadur Singh
Abstract This article proposes an integrated method for energy conservation in Internet of Things (IoT) devices that takes into consideration both node-specific and mobile edge computing benefits. Three operating modes are defined based on end-node characteristics and application needs. This approach allows end nodes to autonomously pick and alter operating modes according to the current conditions, making smart home deployment simple and feasible. The smart house concept allows occupants to oversee and control their energy consumption. Smart homes require forecasting systems due to the scheduling of energy usage. The approach can reduce the working time of end nodes by adjusting their sampling frequencies. The integrated solution seeks to reduce IoT device energy usage and thereby extend battery life. The simulation results illustrate the proposed integrated strategy's effectiveness and efficiency in terms of energy usage.
S. I. Patel · I. Khan · K. I. Ahmed
Bahrain Training Institute, Higher Education Council, Ministry of Education, Isa Town, Bahrain
V. Raja Kumar (B)
Department of EEE, GITAM Deemed to be University, Visakhapatnam, India
e-mail: [email protected]
T. Anuradha
Department of Electrical and Electronics Engineering, KCG College of Technology, Chennai, India
R. B. Singh
Research and Faculty Associate, Department of ECE, Gautam Buddha University, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_4

1 Introduction

The Internet of Things (IoT) is a network of billions of networked devices that can transfer data without human intervention. It connects physical devices to the Internet so they may be managed remotely [1]. This includes people, cattle with biochip transponders, cars with sensors that inform the driver when the tyre pressure is low, and any other natural or man-made object having an IP address. The Internet of Things aims to make people's lives better by allowing things to communicate with one another and act accordingly without explicit instructions. The Internet of Things tracks physical objects in real time, improving our daily lives. IoT is a large system combining several technologies [2], so an IoT system faces various obstacles at every tier. Most low-level IoT devices are energy-constrained and battery-powered, and batteries are tough to replace in distant regions. These devices need efficient energy management since they sense and share data often [3]. An adaptive Energy Management System (AEMS) for smart homes is the major goal of this project. A study of smart houses in Singapore redesigned a Home Energy Management System: data-collecting devices were installed in smart houses to analyse the home environment using existing Internet of Things (IoT) technology [4].
2 Background

To avoid issues caused by improper garbage management, such as health and environmental concerns, a number of works propose strategies to support household trash management by simplifying waste separation. Proposals have been made for smart garbage management systems that can recognise waste items before they are separated from other trash [5]. As a result of this practice, the effort of separating trash from hazardous materials like paint and batteries is reduced. Other work presents a hygienic electronic rubbish sorting method. By reducing human contact and pollution at the point of generation, the proposed strategy aims to address the problem of improper waste segregation and handling.
Fig. 1 Existing energy management system
Fig. 2 Existing IoT-based smart home technology
In the paper, IoT-enabled smart meters were discussed. Figure 1 shows an adaptive compression strategy described in the same paper to improve the communication infrastructure under consideration [6]. The proposed method reduces the amount of data that consumers send to utilities while also automating the process of energy management. Another study investigated a machine learning-based smart home energy system using big data and the Internet of Things (IoT). To save electricity, the home automation system was linked to IoT gadgets. Users' behaviour was studied using a machine learning system and then linked to the amount of energy they consumed. A detailed energy efficiency proposal was also possible thanks to the system's built-in monitoring (Fig. 2). A case study that focused on the unique characteristics of each residence served as an effective means of validating the technique [7]. Networking with and balancing other residents in the complex is another approach. The importance of BIM (Building Information Modeling) technology was examined in a review study, which presented recent research on IoT-based building management systems. Data sharing in BIM-related applications has led to challenges with interoperability and renovation projects [8].
3 Adaptive Energy Management System

HEM systems are considered in the proposed demand-side management algorithm. The HEM system is depicted in Fig. 3. Home appliances are monitored and controlled
Fig. 3 Proposed adaptive energy management system
by this system. Using an IEC-62052 smart meter, a household's overall energy use may be calculated. Communication and decision-making take place in the proposed system's central control system (CCS). Along with an Ethernet (IEEE 802.3) module, the CCS includes a microcontroller, a display, a keypad, and ZigBee (IEEE 802.15.4) communication modules. The proposed method is executed in real time by the microcontroller [9]. The microcontroller receives data from smart meters through ZigBee and transmits it via the Internet/Ethernet to the utility companies (the utility can establish the day-ahead electricity pricing based on power usage statistics from all users). A keypad module and a display unit are used to collect data from users [10]. This section covers the types of household appliances that can be connected to the end-device ZigBee network, their PAN IDs, power ratings in kW, and the time periods required to perform each appliance's task. The microcontroller decides whether the appliances should be turned on or off based on the utility and consumer inputs. This wireless network (IEEE 802.11) is commonplace today. Using Internet-based servers, sensors and other equipment can be controlled and monitored from a remote place. A residential home's energy management system is designed and implemented with the help of servers and sensors [11]. The IoT energy management system's most important components are:

• Controller
• Smart Devices
• Wireless Connectivity

Fig. 4 Proposed smart controller based on IoT

Figure 4 depicts the overall layout of the system's control structure. Computers, smartphones, and a wireless router are all part of the system. The Internet is utilised as a remote control from a computer or a smart phone. Modules for switching, RF, and environmental monitoring are also available, as are data loggers and other related accessories. With the help of the smart central controller, a variety of loads can be connected to the energy supply [12]. This device is responsible for setting up a wireless sensor and actuator network (WSAN) using radio frequency (RF; 433 MHz) in a home. Appliances can be controlled and communicated with using these modules. A smart phone or computer can be used by the homeowner to monitor and control equipment in the home remotely, such as turning appliances on or off.
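The day-ahead-pricing decision described above can be sketched as a simple scheduler. This is an illustrative sketch only: the appliance names, tariff values and the rule of running deferrable loads in the cheapest hours are assumptions for the example, not details taken from the paper.

```python
# Hypothetical sketch of the CCS scheduling decision: defer flexible
# appliances to the cheapest hours of the day-ahead price curve.
# Appliance list, ratings, and tariff values are illustrative assumptions.

def schedule_appliances(day_ahead_prices, appliances):
    """day_ahead_prices: 24 hourly prices from the utility.
    appliances: list of (name, power_kw, hours_needed, deferrable)."""
    # Rank the hours of the day from cheapest to most expensive.
    cheap_hours = sorted(range(24), key=lambda h: day_ahead_prices[h])
    schedule = {}
    for name, power_kw, hours_needed, deferrable in appliances:
        if deferrable:
            # Run the appliance in its cheapest available hours.
            schedule[name] = sorted(cheap_hours[:hours_needed])
        else:
            # Non-deferrable loads run from hour 0 onwards.
            schedule[name] = list(range(hours_needed))
    return schedule

prices = [0.10] * 6 + [0.25] * 12 + [0.15] * 6   # sample day-ahead tariff
loads = [("washer", 0.5, 2, True), ("fridge", 0.2, 24, False)]
plan = schedule_appliances(prices, loads)
```

With this tariff, the deferrable washer is placed in the two cheapest early-morning hours while the fridge runs continuously, mirroring how the microcontroller would act on utility and consumer inputs.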
4 Results

The system states change as a result of the processed data being added, and new methods for energy consumption control are implemented. Control data is sent from the AEMS node to the IoT-based thermostats and lighting control units, allowing them to adjust the set-point temperatures and light intensities, among other things. Figure 5 depicts the system's schematics. As well as providing a foundation for the development of a wireless sensor network that includes wearable nodes and other energy meters and monitoring sensors (e.g. light, heat, electricity, and so on), these customised SHM nodes have also been utilised for general structural health monitoring. Vibration, displacement, humidity, and temperature can all be measured by AEMS nodes. For real-time indoor localisation and data transfer between wearable nodes and the AEMS node, a UWB radio-frequency module was also added. Data from AEMS nodes can be transmitted via the Internet using Wi-Fi modules as part of an IoT device. On-board quad-band GSM/GPRS may be used to transmit and receive data in situations where Wi-Fi is unavailable.

Fig. 5 Block diagram of the adaptive power management system process

Figure 6 shows, for a network of 100 nodes, the relationship between the amount of energy consumed and the simulation time. When compared to EMS and SHM, AEMS is clearly the most energy efficient. The network's lifespan is extended as a result of the reduction in setup message overheads and energy usage.

Fig. 6 Graph comparing the proposed and existing methods

Figure 7 illustrates the total number of nodes still alive at the end of the simulation duration in a network of 100 nodes (at the time of the experiment).

Fig. 7 Number of live sensors versus total number of nodes

Fig. 8 Energy consumption at different modes of the proposed method

As an example, consider Fig. 8, which depicts a graphical representation of the energy consumption of the suggested approach at each node in the system.
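The autonomous mode switching claimed in the abstract can be illustrated with a toy battery-drain model. The mode names, sampling rates, thresholds and energy costs below are invented for illustration; they are not the paper's actual parameters.

```python
# Toy model of autonomous operating-mode selection for an end node:
# the node drops to a lower sampling frequency as its battery depletes.
# All numbers (rates, thresholds, capacity) are illustrative assumptions.

SAMPLES_PER_HOUR = {"full": 60, "reduced": 12, "sleepy": 2}
ENERGY_PER_SAMPLE = 1.0  # abstract energy units per sample

def pick_mode(battery_fraction):
    """Node autonomously selects a mode based on remaining battery."""
    if battery_fraction > 0.6:
        return "full"
    if battery_fraction > 0.3:
        return "reduced"
    return "sleepy"

def simulate(hours, capacity=1000.0):
    """Hour-by-hour battery drain with adaptive mode switching."""
    energy = capacity
    history = []
    for _ in range(hours):
        mode = pick_mode(energy / capacity)
        energy = max(energy - SAMPLES_PER_HOUR[mode] * ENERGY_PER_SAMPLE, 0.0)
        history.append((mode, energy))
    return history

trace = simulate(48)
```

In this toy run the node starts in full mode, steps down to reduced and then sleepy mode as the battery drains, and still has charge left after 48 hours, which is the qualitative behaviour the adaptive scheme targets.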
5 Conclusion

The rapid expansion and widespread deployment of IoT-based wireless devices has resulted in a tremendous amount of energy being wasted. In light of this, it is imperative that researchers look into new methods and approaches for extending the life of battery-powered devices. A new taxonomy of energy-saving strategies recently proposed for traditional WSNs and IoT-based systems has been introduced in this work. We began by looking at some of the most influential studies on energy conservation methods from a classification perspective. In addition, although the suggested system encourages battery-free adaptive management, a battery unit might be included to store energy during times of abundance and deliver it when needed. The adaptive model could incorporate a forecast system for the generated supply, which could lead to better energy allocation and demand processing.
References

1. C. Tipantuña, X. Hesselbach, IoT-enabled proposal for adaptive self-powered renewable energy management in home systems. IEEE Access 9, 64808–64827 (2021). https://doi.org/10.1109/ACCESS.2021.3073638
2. H.M. Hassan, R. Priyadarshini, Optimization techniques for IoT using adaptive clustering, in 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (2021), pp. 766–771. https://doi.org/10.1109/ICCCIS51004.2021.9397128
3. C. Tipantuña, X. Hesselbach, NFV/SDN enabled architecture for efficient adaptive management of renewable and non-renewable energy. IEEE Open J. Commun. Soc. 1, 357–380 (2020)
4. C. Tipantuña, X. Hesselbach, NFV-enabled efficient renewable and non-renewable energy management: requirements and algorithms. Future Internet 12(10), 171 (2020)
5. S. Kazmi, N. Javaid, M.J. Mughal, M. Akbar, S.H. Ahmed, N. Alrajeh, Towards optimization of metaheuristic algorithms for IoT enabled smart homes targeting balanced demand and supply of energy. IEEE Access 7, 24267–24281 (2019)
6. V. Reddy, M. Rabbani, M.T. Arif, A.M. Oo, IoT for energy efficiency and demand management, in Proceedings of 29th Australasian Universities Power Engineering Conference (AUPEC) (2019), pp. 1–6
7. G. Loubet, A. Takacs, D. Dragomirescu, Implementation of a battery-free wireless sensor for cyber-physical systems dedicated to structural health monitoring applications. IEEE Access 7, 24679–24690 (2019)
8. K. Wang, H. Li, S. Maharjan, Y. Zhang, S. Guo, Green energy scheduling for demand side management in the smart grid. IEEE Trans. Green Commun. Netw. 2(2), 596–611 (2018)
9. R. Basmadjian, J.F. Botero, G. Giuliani, X. Hesselbach, S. Klingert, H. De Meer, Making data centers fit for demand response: introducing GreenSDA and GreenSLA contracts. IEEE Trans. Smart Grid 9(4), 3453–3464 (2018)
10. B. Hussain, Q.U. Hasan, N. Javaid, M. Guizani, A. Almogren, A. Alamri, An innovative heuristic algorithm for IoT-enabled smart homes for developing countries. IEEE Access 6, 15550–15575 (2018)
11. N. Javaid, I. Ullah, M. Akbar, Z. Iqbal, F.A. Khan, N. Alrajeh, M.S. Alabed, An intelligent load management system with renewable energy integration for smart homes. IEEE Access 5, 13587–13600 (2017)
12. A.R. Al-Ali, I.A. Zualkernan, M. Rashid, R. Gupta, M. Alikarar, A smart home energy management system using IoT and big data analytics approach. IEEE Trans. Consum. Electron. 63(4), 426–434 (2017)
Towards the Prominent Use of Internet of Things (IoT) in Universities Abdool Qaiyum Mohabuth
Abstract The emergence of IoT has recently been accelerated by innovations in on-board equipment, low-power networks, embedded systems software and edge computing. With IoT, small devices equipped with sensors are networked using low-power radios, and different protocol stacks are integrated into a multi-purpose network (6LoWPAN), leveraging an almost infinite pool of unique network addresses using IPv6. Recent development of IoT technologies has drawn the attention of Universities to integrate the concept of IoT into all their service areas, including learning. They are prepared to invest massively in IoT in their data centres with the objective of providing a totally new and modern concept of teaching and learning, research and service to the community. The purpose of this study is to assess the level of knowledge students possess of IoT technologies and whether IoT systems have an influence on students' learning. The study targeted students in IT fields because they are more exposed to the IoT infrastructure. The findings reveal that IT students have IoT knowledge which needs to be enhanced in terms of further use of big data technologies and data analytics skills. The outcome as regards the integration of IoT systems in learning is found to be very convincing for Universities.
A. Q. Mohabuth (B)
University of Mauritius, Reduit, Mauritius
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_5

1 Introduction

The Internet is gradually being transformed into a hypernetwork that incorporates a multitude of connections between artifacts (physical, documentary), actors (biological, algorithmic), scripts and concepts (linked data, metadata, ontologies, folksonomy). It is being named the Internet of Things (IoT), connecting billions of people with billions of objects. Developments in Machine-to-Machine (M2M) technologies for remote machine control, and the emergence in the year 2000 of IP (Internet Protocol) on cellular mobile networks, have accelerated the evolution of M2M towards IoT [1]. IoT can be regarded as a global network infrastructure composed of many interconnected devices based on sensorial (WSN, NFC, Zigbee, RF), communication, networking and information-processing technologies. The IoT paradigm has its own concepts and features; it also shares many concepts with other fields of IT. IoT has grown significantly in recent years. The democratisation of the Internet as a means of communication today allows the use of objects connected to the network. IoT development is one of the challenges of tomorrow for the evolution of many core sectors of society, including education and health. It has a major role to play in Universities. Indeed, its use in higher education will enable, among other things, the centralisation of data, better management of resources and better organisation of the learning journey for students. The evolution of technology has led to the opening of education to new methods based on collaborative processes. The level of connectivity offered by the Internet of Things (IoT) enriches the learning process for students from around the world. The application of IoT in higher education is about building an ecosystem where students and academics can gain an understanding of the new learning environment, which can trigger a change in learning. Intelligent education brought forward through IoT emphasises the ideology of better education that meets pedagogical needs as a methodological problem and intelligent learning as a technological problem, and develops educational objectives to cultivate intelligent learners as outcomes. Universities have already understood the importance of IoT and are moving towards its integration within the academic community. Many of them are investing in improving their data centre infrastructure, where data are located and stored in centralised or globally distributed databases, with cloud computing technologies facilitating the implementation of IoT systems. This is followed by the creation of various services in Universities with the use of chips, sensors and networks for facilitating academic and administrative duties.
2 Literature Review

IoT describes the connection of devices or objects through the Internet to deliver information according to a specific function assigned to them. Basically, an IoT system is composed of an on-board system that collects data directly at the information source, a network protocol that enables receiving and sending data, and a facility that stores and processes the information. IoT describes scenarios where network connectivity and computing capability extend to devices, objects and sensors that generate, send/receive and exchange data without human intervention [2]. The evolution of transmission networks, decentralisation of communication protocols, cloud computing and edge computing has resulted in the emergence of IoT systems [3]. IoT also refers to the collection, analysis and sharing of data among a large number of objects with sensors and actuators interacting with programs and platforms, as illustrated in Fig. 1. It uses the IPv6 protocol for the internetworking of smart physical objects configured to collect and exchange data. It has created a revolution in IT where sensors embedded in machines collect data and smartly connected objects use, manage and monitor data [4].

Fig. 1 Transformation into IoT system [6]

IoT is seen to attract the attention of Universities, Industries and Societies for its remarkable technologies and power to reach far-reaching ends [5]. With the revolution caused by IoT, students expect a stimulating and easy-to-manage educational experience, with interactive, modern and fun teaching techniques to overcome the problem of traditional boring classes that are based on a single instructional model. For students, IoT helps in communicating with friends who may be local or remote [7], sharing course and project data, remotely accessing learning materials [8], and discussing learning resources online [9]. IoT is also used as a supporting tool for students, providing them with adapted learning materials through the integration of topics based on learners' knowledge, difficulty level, peer-to-peer interaction, location, date and time [10]. The smart learning process allows learners to complete their courses on their mobile devices (tablets, phones, computers, laptops, connected objects). Students have full access to their learning resources in real time through their mobile devices. IoT provides space for learning, sharing and acquiring knowledge and skills that value the different forms of individual and collective intelligence. This facilitates students' adaptation to the educational, technological, social and cultural changes brought about by globalisation and the openness to the world offered by the Internet. IoT values innovation and the search for creativity, with teamwork bringing the different actors to learn together and share good practices and experiences. IoT technologies for teaching include electronic whiteboards, microcontroller development boards, iPads, laptops and tablets, smart phones, RFID (Radio Frequency Identification) enabled student ID cards, e-books, e-learning
platforms, augmented reality, virtual reality and biometric attendance tracking facilities [11, 12]. Lecture rooms equipped with interactive electronic whiteboards allow a consistent and enhanced learning experience, making it easy to add, edit and share content with students while bringing in online content on the fly to support lecture discussions. Students are encouraged to bring their own devices (mobile phones, tablets, laptops, etc.) to lectures, allowing them to perform their course work and assignments on their own devices. Students may even use microcontroller development boards such as Arduino, Raspberry Pi and STM32 Nucleo as small IoT platforms [13]. From the academic perspective, IoT plays a significant role beyond the provision of learning resources to students. It allows tracking of students [14, 15]. It may allow various types of monitoring in the learning environment, making it more conducive to learning. For example, one lecture room may communicate with the next: if noise sensors detect that the noise level in the other lecture room exceeds a certain level, a warning message could be displayed on the projected screen in the noisy room [16]. IoT systems may even help in monitoring students' psychological health. Tags may be used for labs and IT equipment and tracked by RFID technology [17]. Administrative tasks such as registration for courses may be handled by IoT systems [18]. IoT systems differ from conventional network applications as they are implemented with sensors capable of sensing data in real time [19]. It is also crucial to look at the energy efficiency of the sensors used with IoT systems [20] to ensure convenient use of IoT applications in Universities' data centres.
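The lecture-room noise-monitoring rule mentioned above can be sketched in a few lines. The threshold value, room names and readings here are illustrative assumptions, not details from the cited work.

```python
# Sketch of the noise-monitoring rule: if a lecture room's noise level
# exceeds a threshold, a warning is generated for that room's display.
# Threshold, room names, and readings are illustrative assumptions.

NOISE_THRESHOLD_DB = 70.0

def check_rooms(readings):
    """readings: dict mapping room name -> measured noise level in dB.
    Returns warning messages to display, keyed by the noisy room."""
    warnings = {}
    for room, level in readings.items():
        if level > NOISE_THRESHOLD_DB:
            warnings[room] = f"Noise level {level:.0f} dB exceeds the limit"
    return warnings

alerts = check_rooms({"LR1": 62.0, "LR2": 78.5})
```

Only the room above the threshold receives a warning, which mirrors the room-to-room signalling idea described in the text.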
3 Research Methodology

A survey was conducted to find out the level of knowledge IT students have of IoT systems and what influence IoT has on their learning. The sample was stratified over students studying Software Engineering and Information Systems at undergraduate level, with a target of some 150 students. The questionnaire consisted of four sections. The first section aimed at extracting the knowledge students possessed about IoT technologies; ten commonly known terminologies were considered. The second section aimed at finding out the knowledge students have of the development of IoT systems. The third section consisted of ten items related to the practices involved in learning through IoT technologies, and the last section covered the demographic profile of the students. A Likert scale with ratings of 1 to 5 was used in the first three sections for convenient statistical computation of the data collected.
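A per-respondent score from such Likert items can be computed as a simple mean, which is the kind of statistical computation the rating scale makes convenient. The responses below are invented for illustration; they are not survey data.

```python
# Hypothetical computation of a per-respondent index from ten Likert
# items rated 1-5. The responses are invented for illustration.

def likert_index(responses):
    """Mean of the Likert ratings, rounded to one decimal place."""
    assert all(1 <= r <= 5 for r in responses), "ratings must be 1-5"
    return round(sum(responses) / len(responses), 1)

idx = likert_index([4, 5, 3, 4, 4, 2, 5, 3, 4, 4])
```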
4 Results and Discussions

Seventy-four students successfully completed the questionnaires, and cleaning of the data was performed. A total of 45.9% male and 54.1% female students responded to the survey. The level of awareness of the current terminologies used in IoT systems is illustrated in Table 1. These terminologies form part of the knowledge base for IoT systems and have been addressed in the literature review section. As expected, students are seen to have the highest knowledge about Cloud Computing. This is quite obvious, as this technology has been in place for quite some time and students are fully aware of the delivery of hosted services over the Internet. They have been using the cloud infrastructure involving hardware and software components, and they have accessed resources managed by their University data centre. Most of them have knowledge about how cloud computing works, which involves the accessibility of cloud applications to remote physical servers and databases through the enablement of client devices. Knowledge about sensors comes in second position (mean = 3.99), as students have often come across various types of sensors such as temperature, humidity, light and pressure sensors. They are aware of the functionalities of sensors, which respond to a physical stimulus such as light, heat, sound or motion and transmit impulses for measuring or operating purposes. Students have studied various types of network protocols (mean = 3.96) and are aware of the need for rules for information exchange in IoT. Students, being in IT, also have knowledge about the use of embedded software in IoT devices to control specific functions (mean = 3.95). RFID is a core component of IoT systems, and weak knowledge about it would mean a low level of proficiency. Indeed, it is seen that students are aware of its importance in IoT systems (mean = 3.85).
This demonstrates that they are aware that RFID can uniquely identify objects by making use of wireless communication that incorporates electromagnetic or electrostatic coupling in the radio frequency portion of the electromagnetic spectrum. Students are also seen to appreciate the use of smart objects (mean = 3.15) in IoT systems.

Table 1 Distributive statistics about IoT knowledge of students

                                         Mean   SD     Skewness   Kurtosis
Smart objects                            3.15   0.950  −1.240     −1.635
Sensors                                  3.99   0.202  −1.495     23.079
Network protocols                        3.96   0.275  −3.133      8.034
Cloud Computing                          4.74   0.642  −2.260      3.430
Server Virtualisation                    3.08   0.361   4.682     21.871
Big Data                                 2.47   0.815   1.261     −0.269
6LoWPAN                                  1.80   0.951   0.422     −1.789
Advanced Encryption Standard (AES)       2.97   0.232  −8.602     74.000
Embedded Software                        3.95   0.874   0.107     −1.698
Radio Frequency Identification (RFID)    3.85   0.395  −1.304      1.827
However, when it comes to more specific IoT components, students are seen to have poor knowledge. It is observed that students are not very familiar with the terms Big Data, 6LoWPAN and AES (mean < 3). This indicates that students have not had sufficient experience of working with extremely large data sets. They have little exposure to computational analysis for revealing trends, patterns and associations, especially those related to people's behaviour and interactions. Similarly, the security aspects of IoT systems are not of much concern to the students: they do not hold solid knowledge about the application of AES in securing IoT systems. Students are also poorly aware of the integration of low-power devices with limited processing capabilities into IoT systems (mean = 1.80). The data are seen to be negatively skewed, with a concentration to the left for most of the terminologies on which students have concrete knowledge. As regards the knowledge needed to implement IoT systems, a classification in terms of male and female students has been worked out, as shown in Table 2.

Table 2 Distributive statistics about knowledge for implementing IoT systems

Gender  Items                                              Mean   SD     Skewness   Kurtosis
Male    Experience in any of the programming
        languages (Java, .NET, Python, PHP, Go, C++)       3.71   0.759   0.633     −1.402
        Experience in web or mobile development            3.09   0.866  −1.070      1.071
        Knowledge of big data technologies for
        collecting, storing, processing data               2.35   0.849   2.073      4.429
        Data analytics skills                              2.59   0.925   1.190      0.092
        Knowledge in the field of machine learning & AI    3.26   0.666   2.267      3.507
Female  Experience in any of the programming
        languages (Java, .NET, Python, PHP, Go, C++)       3.05   0.316   6.325     40.000
        Experience in web or mobile development            2.78   0.660   0.854      1.422
        Knowledge of big data technologies for
        collecting, storing, processing data               2.42   1.083   1.604      1.827
        Data analytics skills                              2.25   0.346   2.286      4.126
        Knowledge in the field of machine learning & AI    3.48   0.599   0.101     −0.327

It is seen that there are items where male students have better knowledge than female students and vice versa. Hence, we cannot differentiate IoT knowledge in terms of gender. However, we can note that both male and female students are at ease with the required programming languages for IoT development (mean > 3). Similarly, both are quite well versed in Machine Learning and AI concepts. Where the knowledge is poor, it is seen to be the same for both male and female students: experience of big data technologies and data analytics skills (mean < 3). The findings reveal that students in the IT field are exposed to IoT technologies and are knowledgeable about IoT concepts. The basic terminologies forming part of the IoT concept are not new to them, but they still need enhanced knowledge and skills in some of the more pertinent constituents of IoT, notably 6LoWPAN, Big Data and AES. Similarly, for implementing
Table 3 Test of normality

                           Kolmogorov–Smirnov          Shapiro–Wilk
                           Statistic   df   Sig        Statistic   df   Sig
Influence Learning Index   0.243       74   0.000      0.940       74   0.002

Table 4 Chi-square test

              InfLearningIndex
Chi-Square    63.054a
df            10
Asymp. Sig    0.000

a Zero cells have expected frequencies less than 5. The minimum expected cell frequency is 6.7
IoT technologies, students are seen to have the right background in terms of programming languages, mobile development platforms, as well as AI and Machine Learning features. However, more opportunities must be given to them to grasp the concepts of big data technologies and to improve their analytical skills. Whether the IoT systems put forward by the University positively influenced learning was tested by considering the following hypotheses. Ten factors were identified: (1) communication interaction through smart classrooms and e-learning applications; (2) accessibility of cloud services; (3) facility to track students; (4) use of IoT platforms for assignment work; (5) submission of course work through IoT platforms; (6) making online presentations; (7) creating videos; (8) clarifying difficult concepts by simulating reality; (9) automating the learning process by tracking the level of achievement; (10) receiving feedback for learning from the IoT platforms.

H0: Use of IoT systems does not have any influence on students' learning.
H1: Use of IoT systems positively influences students' learning.

A learning influence index was created based on the above independent variables. The data were seen not to be normally distributed under both tests (p = 0.000 and p = 0.002), as shown in Table 3, which led to the use of a non-parametric test. A one-sample chi-square test was used to test the hypothesis, as shown in Table 4. The null hypothesis H0 is rejected as p = 0.000 < 0.05. This indicates that the use of IoT systems has a significant effect on learning.
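The one-sample chi-square computation underlying Table 4 can be sketched in plain Python. The observed category counts below are invented for illustration: they sum to the paper's 74 respondents over 11 index categories, giving df = 10 and a uniform expected cell frequency of about 6.7 as in the table note, but the resulting statistic is not the paper's 63.054.

```python
# Sketch of a one-sample chi-square goodness-of-fit test against a
# uniform expected distribution, as used for the influence index.
# The observed counts are invented; only the structure (74 respondents,
# 11 categories, df = 10, expected cell frequency ~6.7) matches Table 4.

def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for a uniform fit."""
    n = sum(observed)
    k = len(observed)
    expected = n / k                        # equal expected count per cell
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1                      # statistic, degrees of freedom

counts = [2, 3, 5, 6, 7, 9, 11, 10, 9, 7, 5]    # illustrative, not real data
stat, df = chi_square_statistic(counts)
# With the real survey data the paper reports chi-square = 63.054, df = 10,
# and p = 0.000 < 0.05, hence H0 is rejected.
```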
5 Conclusion The study made an assessment of the level of knowledge IT students possessed on IoT technologies and the influence of these on their learning. There is no doubt
that Universities should continue to promote the use of IoT technologies. The above study confirmed that IoT systems have a positive impact on learning. The more Universities invest in IoT infrastructure, the better the exposure for students. Providing IoT technologies will act as an incentive for students to use, learn, experience, acquire knowledge, develop skills and implement IoT systems. The study can be replicated with non-IT students to investigate their willingness to accept IoT technologies.
Towards the Prominent Use of Internet of Things …
Depression Analysis of Real Time Tweets During Covid Pandemic G. B. Gour, Vandana S. Savantanavar, Yashoda, Vijaylaxmi Gadyal, and Sushma Basavaraddi
Abstract The assessment of depression and suicidal tendencies among people due to covid-19 has been little explored. This paper presents a real-time framework for assessing depression during the covid pandemic. The approach offers a better alternative for reducing suicidal tendency during covid time through retweeting and other real-time interventions. Hence, the main objective of the present work is to develop a real-time framework to analyse sentiment and depression in people due to covid. The experimental investigation is carried out on tweets streamed in real time from Twitter, adopting lexicon and machine learning (ML) approaches. Logistic regression, K-nearest neighbour (KNN), Naive Bayes and decision tree models are trained and tested on 1000 tweets to ascertain the accuracy of the sentiment distribution. Comparatively, the decision tree (98.75%) and Naive Bayes (80.33%) show better accuracy, with visualisation of the data through word clouds to draw inferences from the sentiments.
1 Introduction Human civilization has developed to its present extent because of our ability to communicate and our social nature. We are able to express our feelings and thoughts in the form of sentiments and emotions. This binds relations in the context of society and helps survival. In the present scenario, the covid pandemic has shut down such normal human activities, with a significant impact on socio-economic aspects. The lockdown situations have also affected the mental and other health of human beings across the world. The depression and fear caused by this pandemic have driven people to social platforms to express their feelings. Social media platforms such as Twitter, Facebook and Instagram have been widely used by the public across the globe during covid time. These platforms allow people to express their sentiments and emotions in real time. With the piling up of such a large amount of real-time data on Twitter, G. B. Gour (B) · V. S. Savantanavar · Yashoda · V. Gadyal · S. Basavaraddi Department of ECE, BLDEAs V. P. Dr. P. G. Halakatti College of Engineering and Technology, Vijayapura, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_6
G. B. Gour et al.
it has been difficult for machines to understand and draw inferences from such unstructured data. Natural language processing (NLP) and ML methods are required to explore the underlying aspects. In its simplest form, the present model converts unstructured tweets into a structured form and then analyses sentiment and emotion with the aid of NLP and ML. Section 2 presents the motivation for the present work, Sect. 3 deals with the literature survey along with the main findings, Sect. 4 describes the methodology adopted with implementation, Sect. 5 discusses the results and Sect. 6 concludes the work with future scope.
2 Motivation Covid has become a seemingly never-ending pandemic. Because of its highly contagious nature, corona has created havoc around the world. The spread was so fast that doctors, the infected public and engineers did not get sufficient time to secure themselves to possible safety levels. The trauma and the lockdowns have often cut off normal human communication. In such an enigmatic face-off, social network platforms set new dimensions for expressing feelings and thoughts. Sentiment analysis with the help of NLP and ML extracts the hidden emotions of the public from posts and tweets [1, 2]. Figure 1 shows the increased number of internet users across the globe. Since 53.4% of internet users are in Asia, the present work is restricted to India and its neighbouring countries. The motivation for the present work includes the health infrastructure facilities available in India, the low ratio of doctors to cases owing to the huge population, the lack of internet facilities, and higher illiteracy at the rural level. Blind belief also hindered effective engagement of the public with government policies taken to combat the pandemic.
Fig. 1 Global Internet users in the world by March 31, 2021 (Source https://internetworldstats.com/stats.htm)
Another issue to be addressed at this stage is the spreading of false information about covid. Many people lost their lives with just a positive test report or an exaggerated false picture of covid. During the first and second covid waves, many people domestically lost their lives due to the shortage of hospital beds and the lack of other health facilities such as ventilators and medicines. Moreover, many faced depression and tended towards suicide in such a havoc-ridden pandemic due to the loss of jobs and other socio-economic conditions. Owing to such issues, the framework has been experimented with the help of lexicon and ML approaches using real-time streaming of tweets from Twitter. NLP plays the main role in the present context in extracting sentiments and emotions; it is a significant part of artificial intelligence (AI).
3 Related Work This section reviews recent research on sentiment analysis with NLP and other techniques using Twitter, Kaggle and other data. Twitter is one of the most widely used social platforms, whose users include students, professionals, business people and common persons [3]. Such posted tweets, being in real time, carry useful information rather than just a set of words [4]. This could be valuable in identifying people suffering from depression, anxiety and emotional issues, and is quite useful in mitigating the bad effects of corona outbreaks by continuously monitoring their spread [5]. The literature is discussed for both lexicon- and machine-learning-based modelling. As the present work considers the Indian subcontinent with neighbouring countries, the survey emphasises geographically based covid studies. Two papers carried out Twitter-based sentiment analysis of the Indian subcontinent [6, 7]. Over 50,000 (offline) tweets were included to understand people's reaction to Government decisions using a Naive Bayes classifier in [6], whereas 410,643 India-related tweets during lockdown were used to characterise public emotions during the covid pandemic by adopting lexicon-based modelling in [7]. In another geographically based study [8], over 12,000 UK tweets were analysed to measure public opinion in forming Government policies; this adopted ensemble learning models, imposing complexity due to dependency on the base models. Four language-based models were proposed for the detection of sentiments in Nepali covid-19 tweets. The work was carried out with 4035 sentences (offline Twitter data). The Nepali sentences were translated to English, the sentences were pre-processed with NLTK, and lemmatization was applied. A TF-IDF transformer was then used to convert the training data into vectors for classification using ML methods.
The ML methods showed the following performance: NB (77.5%), SVM (56.9%), LSTM (79%). The study suggests that the algorithm may be evaluated against a larger dataset for future improvements, which might result in increased accuracy [9]. Frameworks for topic extraction and sentiment analysis were proposed by Yin et al. [10] and Avasthi et al. [11] to understand the mental status of the public using millions of tweets. These studies revealed insightful findings in understanding the dynamics
of public responses during the covid pandemic. The studies adopted sentence-based sentiment analysis; however, they lack dynamic tracing of public sentiments due to the limited time period for the collection of tweets and the absence of real-time analysis. There were, however, studies conducted globally with wide time periods for the (offline) tweets using ML-based approaches [12–15]. K-means hierarchical clustering was proposed in [14] to understand people's moods, attitudes, sentiments and topics during the pandemic. A Kaggle-based public dataset was used to study public sentiment during covid time using Logistic Regression, Naive Bayes and Random Forest in [15]; here, SVM was found to perform comparatively well. In view of the depression analysis of the present study, the authors reviewed papers related to suicidal tendencies during the covid pandemic. An unsupervised signature-based sentiment analysis was carried out on 3000 offline tweets to analyse suicidal tendencies among people during covid-19. The required tweets were extracted using hashtags like #COVID-19, #Coronavirus, #Pandemic and #Corona2020. The meaning of a sentence or a tweet was obtained using a sentiment analyser, which identified the different entities mentioned in that sentence using the Spacy module of Python. A rule- and signature-based emotion analyser called the Vader Lexicon (NLTK module of Python) was used for the emotion analysis [2]. As ML-based approaches are more proficient than lexicon-based ones, deep-learning-based sentiment analysis was carried out in [16, 17]. Sentiment analysis with inherent feature extraction using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) was proposed in [16]; these algorithms can detect the patterns of stop words automatically. The models were trained with 50,000 English movie reviews from the IMDB dataset (Kaggle).
Comparatively, the LSTM converts the reviews into a set of integers, and the LSTM test classification was produced by a sigmoid layer. The performance shown by the CNN was 87.72% and that of the LSTM was 88.02% [16]. As far as real-time work is concerned, around 600,000 tweets were streamed and a corpus was built to perform sentiment analysis using a deep learning method in [17]. Gradient scaling of the cases was adopted to understand the trend and future prediction of covid data. GloVe vectors from the Keras module were combined with batchwise segregation of trending tweets in the first CNN layer. The tuned model has a word-embedding dimension of 400 with over 500 hidden layers. The fully connected output layers predict with an accuracy of 96.67% over hundreds of iterations. Such work can be further enhanced by considering a large tweet corpus to classify the outcome at implicit and explicit aspect levels. Even though deep learning provides better inference automatically, the authors believe that a large and diversified corpus built from relevant keywords improves the performance of such a system. Twitter for disaster management was studied to measure emergency response, impact and recovery during covid-19; the work focused on the classification of tweets from news channels using ML methods like SVM, KNN and logistic regression [18]. Multi-label text classification using a capsule network algorithm was discussed in [19, 20]. The capsules are normally limited by routing algorithms, and hence the CNN outperforms them comparatively on single- and multi-labelled documents. Deep neural network (DNN) based sentiment analysis was surveyed in
[21, 22], which provides fast training of large datasets with inherent feature extraction and pre-processing, and improved accuracy. It can also avoid the trade-off between bias and variance in the data and decreases the error rate. The performance and dimensionality of the output can be improved by cascading capsule layers based on the categories, whereas the lexicon-based studies were performed with large datasets. Graphical analysis of real-time tweets was carried out in [23] to assess public emotion during the covid pandemic. The emotion scores were calculated using the NRC sentiment dictionary, which includes basic emotions as well as positive and negative sentiments. The public's emotion score levels were derived from corona tweets and visualised using graphical analysis. Another framework for sentiment analysis was proposed in [24] using 16,138 tweets. The tweets were collected three times per day over three successive months to understand the impact of covid on the public. The tweets were extracted using the keywords "coronavirus", "coronavirusec", "coronavirusoutbreak", "COVID", "COVID19" and "COVID-19". The IBM Watson Tone Analyzer was used for emotion analysis; this was adopted to label the dataset, and the polarity of tweets was detected with manual verification by TextBlob. Such lexicon-based modelling and visualisation methods were used to assess the dynamics of public emotion during the pandemic. This study detected correlations between the infection and mortality rates and the emotional characteristics of the Twitter users. Lexicon-based sentiment analysis was performed on tweets globally in [3, 7, 25–27]. The research in [25], conducted using 1,305,000 tweets from 11 infectious countries, helped to understand the mindset of the people during the covid-19 pandemic.
Tweets with hashtags like #CORONAVIRUS, #CORONA, #StayHomeStaySafe, #StayHome, #Covid_19, #CovidPandemic, #covid19, #CoronaVirus, #Lockdown, #Qurantine, #CoronavirusOutbreak and #COVID were considered in this research within a stipulated time. Here, NLP and ML techniques were utilised to provide weighted feeling scores for sentences and topics. The Syuzhet lexicon was then used for emotion analysis. This study was limited by the lack of trending hashtags and a biased selection of tweets. Real-time tweet analysis was performed in [26] with the objective of finding the correlation between the rise in the outbreak and public sentiments and emotions. This study was limited by the bias in the selection of tweets from an English corpus and the perception of sentiments in non-English countries. The research in [27] developed a model to determine the dynamics of users' sentiment for the top-k trending COVID-19 subtopics; it also identified the most active users based on their level of engagement with those trending subjects. The study was conducted over 100 million tweets collected from 10,000 users in a predefined time. Twitter-Latent Dirichlet Allocation was used for topic modelling to resolve the restricted character of tweets. Sentiment detection was proposed with the Valence Aware Dictionary and Sentiment Reasoner (VADER), a lexicon- and rule-based sentiment analysis tool that correctly harmonises with the expressed sentiments and also detects the intensity of sentiment in sentence-level text. Tweets from 12 countries (over 50,000 tweets) were studied with the objective
to understand how the citizens of different countries were dealing with the situation. The tweets were collected, pre-processed, and then used for text mining and sentiment analysis. NRC emotion lexicon analysis was performed with sentiment scores in terms of positive and negative polarities, and a word cloud was also used. The Syuzhet package was used for analysing eight types of emotion. NRC does not account for the emotions of sarcasm and irony [28].
3.1 Main Findings with Objectives The major observations of the survey are:
• Much of the work was carried out with offline Twitter data and restricted time for data collection.
• The majority of the work took global data into account.
• Much of the study ranged from sentiment analysis to the impact of covid on public health and socio-economic aspects.
• Little work was carried out to depict mental health and suicidal tendencies among the public due to covid.
• As far as ML-based methods are concerned, there is a need for studies involving versatile data such as audio and video along with text.
The main findings/objectives are: owing to the above survey, the present work focuses on tweets from domestic and neighbouring countries. Much work is needed using NLP and deep learning techniques to explore the likelihood of depression and suicidal inclination among the public due to covid. Hence, the following objectives are framed for the present work:
• To stream live data (stop words and hashtags) from Twitter and make use of it in further sentiment analysis.
• To apply lexicon and machine learning models for the analysis of sentiments and emotions.
4 Methodology As corona cases are on the rise once again in the year 2022, doctors and experts are racing to develop the best point-of-care diagnostics to manage the spread of this pandemic. The pandemic itself may create anxiety and depression, and circulating false information over the internet makes people more anxious and distressed. The author in [7] addressed well the suicidal implications of covid. Many people lost their jobs, which resulted in socio-economic disparity. This in turn led to the development of depression and suicidal sentiments among people. The work in [1] clearly expressed the concern of false
Fig. 2 Complete flowgraph of the present work
information in social media and its impact on public health. Along similar lines, the authors here relate personal experiences observed during covid in India. The misleading social information included claims that covid is airborne with fast, severe breathing difficulty; people were afraid to touch the doors of their own houses out of a mistaken belief about spitting over gates; and religious disparity was created locally as people of one religion were infected in larger numbers. Other claims were that it attacks senior citizens easily and is not curable. The actual figures were different from, and often unnoticed amid, such social media claims. These things created an alarming situation among people, leading to mental disturbances. It becomes important to protect such people from suicidal acts and help them psychologically to come out of such depression. To analyse their behaviour, a framework for sentiment analysis is proposed using real-time streaming of 1000 tweets from five countries. The methodology adopted includes both lexicon- and ML-based approaches. The complete flowgraph, shown in Fig. 2, consists of four major blocks: data collection and extraction, data cleaning and preprocessing, emotion and sentiment analysis, and data visualisation with a machine learning approach.
4.1 Data Collection and Extraction In the first step, tweets related to the current pandemic are collected using the Twitter search API. Tweepy is one such popular Python module that gives access to many Twitter APIs. Figure 3 shows the list of Python libraries and modules used for the present framework.
Fig. 3 Python libraries used
import tweepy
from textblob import TextBlob
from nrclex import NRCLex
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns
The related tweets are obtained by choosing trending keywords. The IDs used for scraping tweets relevant to the covid pandemic are @COVIDNewsByMIB, @coronaextrausa, @PMBhutan, @alliance4nep, and @nhsrcofficial.
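The collection step can be sketched with Tweepy. This is only an illustration, not the authors' exact code: the `build_query` rule, the `fetch_tweets` helper and the four-key credential tuple are our assumptions, and the call assumes Tweepy 4.x (`API.search_tweets`).

```python
def build_query(handles, extra_terms=("covid", "corona")):
    """Assemble a Twitter search query from account handles and keywords.

    Pure string work, so it runs without network access.
    """
    froms = " OR ".join("from:" + h.lstrip("@") for h in handles)
    terms = " OR ".join(extra_terms)
    return f"({froms}) ({terms}) -filter:retweets"


def fetch_tweets(keys, query, n=1000):
    """Pull up to n matching tweets via Tweepy (needs network + credentials)."""
    import tweepy  # imported lazily; optional dependency

    auth = tweepy.OAuth1UserHandler(*keys)  # consumer key/secret, token, token secret
    api = tweepy.API(auth, wait_on_rate_limit=True)
    cursor = tweepy.Cursor(api.search_tweets, q=query,
                           lang="en", tweet_mode="extended")
    return [status.full_text for status in cursor.items(n)]


print(build_query(["@COVIDNewsByMIB", "@PMBhutan"]))
# (from:COVIDNewsByMIB OR from:PMBhutan) (covid OR corona) -filter:retweets
```

The fetched texts would then feed the cleaning stage of Fig. 2.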
4.2 Data Cleaning and Filtering Data filtering is the process of bringing text data into a standard format, which includes cleaning and organising the data. A corpus, i.e. a collection of texts, is used for cleaning. Cleaning is the process of removing excess and unnecessary parts of the data; common steps are removing punctuation, removing numbers and lower-casing the text, which make the text simpler for a computer to read. Regular expressions, provided by a Python library for searching patterns in text data, are used to clean the data. It is important to filter out such things before the data is acted on by the sentiment analyser. The retrieved tweets are unstructured in nature, containing unnecessary symbols and characters like '@', '#' and ':', as shown in Fig. 4. The steps adopted in this phase are removal of special characters and symbols, removal of tagged user names and web links, and removal of duplicate records. The tweets are converted to structured tweets containing only pure text by removing punctuation and stop words, as shown in Fig. 4. Figure 5 shows the cleaning and filtering performed on the scraped data using Python code.
4.3 Data Pre-processing Data pre-processing is then performed using NLP. NLP is a part of artificial intelligence that includes areas such as sentiment analysis, text classification, machine translation and speech recognition. The pre-processing involves tokenization, stemming and lemmatization, which are applied to the structured data.
Fig. 4 a Unstructured tweets. b Structured tweets
Tokenization-It is the act of breaking up a sequence of strings into pieces such as words, keywords, phrases, symbols and other elements called tokens. Tokens can be individual words, phrases or even whole sentences. In the process of tokenization, some characters like punctuation marks are discarded.
Fig. 5 Python code for cleaning data
def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)   # Removing @mentions
    text = re.sub(r'#', '', text)               # Removing '#' hash tag
    text = re.sub(r'RT[\s]+', '', text)         # Removing RT
    text = re.sub(r'https?:\/\/\S+', '', text)  # Removing hyperlinks
    return text

# Clean the tweets
df['Tweets'] = df['Tweets'].apply(cleanTxt)
# Show the cleaned tweets
df
Stemming-It is a technique used to extract the base form of words by removing affixes from them, just like cutting down the branches of a tree to its stem. For example, the stem of the words eating, eats, eaten is eat. In this way, stemming reduces the size of the index and increases retrieval accuracy. Lemmatization-In linguistics, this is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. Lemmatization considers the context and converts the word to its meaningful base form, called the lemma. In the present work, TextBlob is used, which has a variety of such supporting functions. As per block 3 in the flow graph, further sentiment analysis is carried out by TextBlob. Here, TF-IDF, which means Term Frequency-Inverse Document Frequency, is a scoring measure widely used in information retrieval (IR) and summarisation. TF-IDF is intended to reflect how relevant a term is in a given document; it weighs a term's frequency (TF) against its inverse document frequency (IDF). Each word or term that occurs in the text has its respective TF and IDF score [29].
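The TF-IDF weighting just described can be reproduced in a few lines. This is a minimal sketch using raw term frequency and log(N/df); the toy documents are invented, and production transformers (e.g. sklearn's) add smoothing and normalisation on top of this idea.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a list of pre-tokenized documents.

    TF is the term count normalised by document length; IDF is log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

docs = [["covid", "lockdown", "fear"],
        ["covid", "vaccine", "hope"],
        ["lockdown", "jobs", "fear"]]
scores = tf_idf(docs)
# "covid" appears in 2 of 3 docs, so its IDF is log(3/2); "vaccine",
# appearing in only 1 doc, gets the larger IDF log(3).
```

Terms common to many tweets (e.g. "covid" itself) are thus down-weighted relative to terms that distinguish a tweet.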
4.4 Emotion and Sentiment Analysis Sentiment analysis: The main idea of the sentiment analysis is to understand the reaction and response of people to the pandemic and the various ways people share their feelings on social media networks. TextBlob is a Python library used to perform basic NLP operations on textual data. Its input is a corpus, i.e. a collection of text documents, and it is built on NLTK [28, 30, 31]. TextBlob gives the polarity and subjectivity of a tweet to find its sentiment. Polarity is assigned between [−1, 1]. The score enables us to understand how positive or negative the text is, while the subjectivity
score indicates how opinionated the person who wrote it is. The polarity varies from −1 to 1: if a text is positive, the polarity is above 0; if it is negative, the polarity is below 0. Subjectivity, based on personal opinions, emotions and reviews, lies between 0 and 1: if the subjectivity is below 0.5, the text is factual or more objective; if it is above 0.5, the text is more opinionated. TextBlob averages the individual polarities of the words extracted from the text to find the sentiment; TextBlob sentiment analysis is a knowledge-based technique for finding polarity. Based on such subjectivity content, further emotion analysis can be performed. Emotion analysis: The abundant information shared on the social network either appears to contain a known fact (0) or an opinion (1); here the subjectivity is either 0 or 1. This kind of detection reveals whether the shared data is subjective or not, i.e. based on facts or not, and in turn depicts the actual emotions and thoughts of people during covid time. NRC Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy and disgust). A lexicon is a set of words that measures the emotional affect of a text, and lexicon-based methods are a popular unsupervised approach to sentiment classification. They are more suited to domain-specific applications, e.g. "the wait for a table is long" (restaurant review) versus "the battery life is long" (mobile review); here "battery" is positive and "wait" is negative. They also fail on sarcastic sentences like "can the battery life be longer than 2 days?". NRC Lexicon is one of the popular lexicons for unsupervised emotion detection.
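The polarity and subjectivity thresholds above can be written down directly. The `label_sentiment` helper is our own sketch of the rule described in the text; TextBlob itself would supply the two scores via `TextBlob(text).sentiment`.

```python
def label_sentiment(polarity, subjectivity):
    """Map polarity in [-1, 1] and subjectivity in [0, 1] to labels
    using the thresholds described in the text."""
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "opinionated" if subjectivity > 0.5 else "factual"
    return tone, stance

print(label_sentiment(0.6, 0.9))   # ('positive', 'opinionated')
print(label_sentiment(-0.2, 0.1))  # ('negative', 'factual')
```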
The emotion detection is done using NRC Lexicon, and the emotions of a tweet may be happy, sad, calm, angry, trust, fear, anticipation, disgust and joy. With the help of the emotions of tweets, visualisation of the data is done using a word cloud [28, 30, 31]. Visualization and machine learning based approach: Here, basic machine learning models such as logistic regression, K-nearest neighbour (KNN), Naive Bayes and decision tree are adopted to classify the types of sentiments. These learning models are described below [9, 32–34]. Logistic regression: This is a popularly used supervised classification algorithm, basically a binary classifier. The linear score β0 + β1x is passed through the sigmoid (logistic) function. The binary classifier formula is given by (1):

P(yi = 1 | X) = 1 / (1 + e^(−(β0 + β1x)))    (1)
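Equation (1) is easy to check numerically. This is a sketch only: in practice β0 and β1 would be fitted to the labelled tweets, whereas here they are placeholder values.

```python
import math

def logistic(x, b0=0.0, b1=1.0):
    """P(y = 1 | x) for the logistic model of Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# At b0 + b1*x = 0 the model is maximally uncertain: P = 0.5
print(logistic(0.0))              # 0.5
# Large positive scores saturate toward 1, negative toward 0
print(round(logistic(6.0), 3))    # 0.998
```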
where P(yi = 1 | X) is the probability of the ith observation's target value yi belonging to class 1, β0 and β1 are the parameters to be learned, and e represents Euler's number. K-nearest neighbour: This is one of the supervised learning algorithms; it predicts the correct class by finding the distance between the test data and the training
points and selecting the K points nearest to the test data. The working of KNN is summarised as follows:
• Choosing K, the number of neighbours
• Estimating the Euclidean distance to the K neighbours
• Encircling the K nearest neighbours according to the estimated Euclidean distance
• Counting the class labels among the K neighbours and allocating the new data point to the class with the maximum neighbour count.
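The steps above can be sketched as a plain-Python classifier. The two-dimensional feature vectors here are hypothetical sentiment scores, not the paper's actual features.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Follows the steps
    above: compute Euclidean distances, keep the k nearest, count labels.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.9, 0.1), "positive"), ((0.8, 0.2), "positive"),
         ((0.1, 0.9), "negative"), ((0.2, 0.8), "negative"),
         ((0.15, 0.85), "negative")]
print(knn_predict(train, (0.85, 0.15)))  # positive
```

Note that no model is built in advance: every distance is computed at query time, which is why KNN is called a lazy learner.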
K depicts the number of nearest neighbours used when computing the distance between the test point and the labelled training points. Because these distances must be recomputed for every query, prediction is expensive, and KNN is therefore called a lazy learning algorithm. Naive Bayesian: Machine learning algorithms benefit from Bayesian methods for extracting vital information from small data sets and for handling missing data. Naive Bayes is a probabilistic machine learning algorithm based on the Bayes theorem with the naive assumption of independence between every pair of features; this assumption is what makes it "naive". The classifier works as follows: if T is the training set of tweets, each with a class label, then each tweet is represented by a feature vector X. The classifier assigns X to the class with the maximum posteriori probability conditioned on X; that is, X is predicted to belong to class Ci only if P(X|Ci)P(Ci) is maximal over all classes. The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem, given by Eq. (2),

P(Ci | X) = P(X | Ci) P(Ci) / P(X)    (2)
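Eq. (2) can be exercised numerically. The priors and likelihoods below are invented for illustration; they are not taken from the paper's tweet data:

```python
# Classify a tweet containing the word "sad" between two classes,
# neg(ative) and pos(itive), using Bayes' theorem as in Eq. (2).
p_c = {"neg": 0.4, "pos": 0.6}              # priors P(Ci), illustrative
p_x_given_c = {"neg": 0.30, "pos": 0.05}    # likelihoods P(X | Ci), illustrative

# P(X) by total probability, then the posterior P(Ci | X) = P(X|Ci)P(Ci)/P(X)
p_x = sum(p_x_given_c[c] * p_c[c] for c in p_c)
posterior = {c: p_x_given_c[c] * p_c[c] / p_x for c in p_c}

# Maximum posteriori hypothesis: the class with the largest posterior
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # → neg 0.8
```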
Decision tree: This model refers to the classification and regression tree (CART) algorithm, of which sklearn provides an optimized version. The DT performs a recursive partitioning of the explanatory variables governed by the labels. The procedure begins by splitting on the most informative variable at the root, then recursively splits each sub-sample (node) into smaller nodes, with branches determined by a threshold. At the end of the evaluation, a node becomes a leaf. Finally, the results can be visualised along with the word cloud.
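A minimal sketch of such a classifier with sklearn's DecisionTreeClassifier follows; the corpus, labels, and choice of CountVectorizer are illustrative assumptions, not the paper's setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Tiny invented corpus standing in for labelled tweets
tweets = ["vaccine gives hope", "hope and relief at last",
          "fear of rising cases", "so much fear and anger"]
labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features feeding a CART decision tree
clf = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
clf.fit(tweets, labels)
print(clf.predict(["cases bring fear"])[0])  # → negative
```

The fitted tree splits on the most informative word counts first, which is the recursive behaviour described above.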
5 Results and Discussion

Initially the elementary results are depicted, followed by the respective discussion. Sentiment analysis using TextBlob: The sentiment analysis is obtained by using TextBlob considering all the tweets, as shown in Fig. 6, where the sentiment distribution is depicted country-wise. More than 60 positive tweets are found for each of India, the USA, Bhutan (highest, ~120), and Pakistan. While all countries show a considerable number of negative tweets, Pakistan has the most; this requires adequate attention to deal with depression-related behaviour.
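TextBlob assigns each tweet a polarity in [−1, 1]. A minimal lexicon-based stand-in (with an invented mini-lexicon, far smaller than TextBlob's real one) illustrates how a polarity score yields the positive/negative labels discussed here:

```python
# Hypothetical mini-lexicon of word polarities; TextBlob's real lexicon is
# far larger and also models subjectivity.
LEXICON = {"happy": 0.8, "hope": 0.6, "sad": -0.7, "fear": -0.6, "death": -0.9}

def polarity(tweet: str) -> float:
    """Average the polarities of known lexicon words; 0.0 if none match."""
    scores = [LEXICON[w] for w in tweet.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(tweet: str) -> str:
    p = polarity(tweet)
    return "positive" if p > 0 else "negative" if p < 0 else "neutral"

print(label("vaccines bring hope"))       # → positive
print(label("fear of death everywhere"))  # → negative
```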
Depression Analysis of Real Time Tweets During Covid Pandemic
Fig. 6 Sentiment analysis—countrywise
Emotion analysis using NRC lexicon: Figure 7 shows the emotion analysis taking all countries in this work. The author believes that emotions such as fear/surprise due to the pandemic, anger, anticipation, and sadness account comparatively for the depression. These emotions are on the rise in countries like India and the USA. It is clear from Fig. 7 that the more depressed countries are Pakistan (with 30 tweets for fear, nearly 25 for anger, 30 for anticipation, and 15 for sadness) and India (with tweet counts of 15 for fear, 20 for anger, nearly 25 for anticipation, 15 for surprise, and 30 for sadness). ML performance: Of the 1000 tweets, 70% are kept for training and 30% for testing of the ML models. Label encoding refers to converting the text labels to numeric form so that they can be read by the machine and the machine learning algorithms can analyse the data. Table 1 shows the
Fig. 7 Emotion analysis-country wise
Table 1 Model comparison with performance of MLs

Article          Algorithm              Accuracy (%)
[8]              Stacking model         83.50
                 Voting model           83.30
                 Bagging model          83.20
[16]             CNN                    87.72
                 LSTM                   88.02
[17]             Deep neural networks   90.67
[9]              SVM                    56.90
                 Naive Bayes            77.50
                 LSTM                   79.00
Present paper    K-nearest neighbor     68.88
                 Decision tree          98.75
                 Naive Bayes            80.33
model comparison along with the performance of the MLs used in the present work. The DT algorithm classifies the sentiments with 98.75% accuracy, which proves its recursive ability in recognising the most informative variables. The Naive Bayesian algorithm is successful in sentiment classification at 80.33% owing to its probabilistic approach. The KNN proved to be 68.88% accurate in differentiating the sentiments. Visualization using word cloud: Word clouds for each of the countries involved in the current work are shown in Fig. 8. Discussion of the present framework taking into account the lexical and ML approaches along with the word cloud: Taking India as a special case, the corresponding word cloud (Fig. 8) depicts words such as "We4Vaccine", "IndiaFightsCorona", "Vaccine doses", and "United2FightCorona". All these words speak to the need for proper and timely utilization of vaccines and for vaccine availability to all people, overcoming present political disputes and blind beliefs. The phrase "positivity rate" indicates the alarming situation in India; people were fearful about the pandemic and its positivity rate. This also signifies the socio-economic imbalance due to loss of jobs and food availability. The subjectivity analysis of tweets from the Indian sub-continent (Fig. 9a) clearly correlates with the emotions for India shown in Fig. 7. The 15 tweets attributed to fear show that people panicked over the side effects of vaccination; the 20 tweets related to anger show that people were worried about queues for beds, medicines, and vaccines. The word "sadness" shows the miserable state of the public waiting for special transportation, ventilators, and oxygen. Another word, "disgust", shows that even after getting such facilities, people lost their relatives and friends and felt depression for a long period. From the sentiment analysis graph (Fig. 6), nearly 25 tweets are
Fig. 8 a Word cloud—India. b Word cloud—Bhutan. c Word cloud—Nepal. d Word cloud—Pakistan. e Word cloud—USA
shown as negative instantaneously. This also correlates with the above discussion of the depression of Indians. Similarly, in the case of Pakistan, the word cloud contains words like "positivity ratio", "coronavirus cases", "cases tests", "deaths", "Daily update", and "doses". These words point to the rapid rise in covid cases and the increased mortality rate. These results are in line with the subjectivity analysis (Fig. 9b), which correlates with
Fig. 9 a Subjectivity versus Polarity (India). b Subjectivity versus Polarity (Pakistan)
the emotion survey. Here "fear" and "anticipation" have around 30 tweets each; most people feared for their lives due to the increasing death cases and positivity rate. People also expected the government to control the rate of increasing cases and to provide vaccination for every citizen to avoid risk to their lives. It is clear that 25 tweets express "anger" because of disgust about the government facilities. The 15 tweets of sadness state that people were suffering from depression, on average, in line with their anticipation and anger. This discussion agrees with the around 10 negative tweets depicted in Fig. 6. Inference regarding depression and suicidal tendencies: From the discussion of the present framework it is clear that, with real-time streaming of twitter data, people with similar depression levels can be identified. With the information of their respective sentiments and emotions, their present thought process can be understood and used to notify the concerned agencies or governmental organisations about suicidal tendencies. The final deployment of the model can be done with real-time Twitter data for the analysis of depression and suicidal tendency, as shown in Fig. 10. This can further be used to notify the concerned agencies in cases of suicidal tendencies, or personally help a person to self-monitor their depression status.
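The deployment idea of monitoring a stream of tweets and flagging sustained negative sentiment can be sketched as follows. The scoring function, word list, and threshold are all illustrative assumptions, not the paper's deployed system:

```python
# Flag a user whose recent tweets have a strongly negative average polarity.
# NEGATIVE_WORDS and the threshold are hypothetical, for illustration only.
NEGATIVE_WORDS = {"sad", "fear", "death", "alone", "hopeless"}

def polarity(tweet: str) -> float:
    """Crude polarity: fraction of negative words, negated."""
    words = tweet.lower().split()
    hits = sum(w in NEGATIVE_WORDS for w in words)
    return -hits / len(words) if words else 0.0

def flag_user(tweets, threshold=-0.2) -> bool:
    """True when the average polarity of the stream falls below threshold."""
    avg = sum(polarity(t) for t in tweets) / len(tweets)
    return avg <= threshold

stream = ["feeling hopeless and alone", "fear of death", "so sad today"]
print(flag_user(stream))  # → True
```

In a real deployment the incoming list would come from the Twitter streaming API, and a flagged result would trigger the notification step described above.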
6 Conclusion and Scope

The paper has presented a simple real-time framework using twitter data. Initially, the twitter data was scraped and converted to a structured format; sentiment and emotion analysis was then performed using TextBlob/Lexicon and ML based approaches. This helps to understand the depression levels in the Indian sub-continent. The results discussed in the previous section using the word cloud and the sentiment and emotion analysis together give new insights for dealing with depression and suicidal tendencies. The decision tree (98.75%) and Naive Bayes (80.33%)
Fig. 10 Final deployment of the model
have shown better performance in classifying the sentiment types. There is a strong need to augment the twitter data with real-time speech and other media (image/video) to explore the emotions in more detail and to aid in properly addressing depression issues during covid, as highlighted at the end of Sect. 5. In this way the paper has put effort in a new direction to achieve different insights. The application of deep learning methods with the inclusion of multimedia data would yield more insights into the dynamics of depression levels and suicidal probabilities among people.
References

1. H. Kaur, S.U. Ahsaan, B. Alankar, V. Chang, A proposed sentiment analysis deep learning algorithm for analyzing COVID-19 tweets. Information Systems Frontiers 20, 1–3
2. S. Sparsh, S. Surbhi, Analyzing the depression and suicidal tendencies of people affected by COVID-19's lockdown using sentiment analysis on social networking websites. J. Stat. Manage. Syst. 24(1), 115–133 (2021). https://doi.org/10.1080/09720510.2020.1833453
3. K. Unsworth, A. Townes, Transparency, participation, cooperation: a case study evaluating Twitter as a social media interaction tool in the US open government initiative, in Proceedings of the 13th Annual International Conference on Digital Government Research (2012), pp. 90–96
4. C.L. Hanson, S.H. Burton, C. Giraud-Carrier, J.H. West, M.D. Barnes, B. Hansen, Tweaking and tweeting: exploring Twitter for nonmedical use of a psychostimulant drug (Adderall) among college students. J. Med. Internet Res. 15, e62 (2013)
5. D. Quercia, J. Ellis, L. Capra, J. Crowcroft, Tracking "gross community happiness" from tweets, in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (2012), pp. 965–968
6. R. Khan, P. Shrivastava, A. Kapoor, A. Tiwari, A. Mittal, Social media analysis with AI: sentiment analysis techniques for the analysis of Twitter COVID-19 data. J. Crit. Rev. 7(9), 2761–2774 (2020)
7. S. Das, A. Dutta, Characterizing public emotions and sentiments in COVID-19 environment: a case study of India. J. Human Behav. Soc. Environ. 31(1–4), 154–67 (2021)
8. M.M. Rahman, M.N. Islam, Exploring the performance of ensemble machine learning classifiers for sentiment analysis of COVID-19 tweets, in Sentimental Analysis and Deep Learning. Advances in Intelligent Systems and Computing, vol. 1408, ed. by S. Shakya, V.E. Balas, S. Kamolphiwong, K.L. Du (Springer, Singapore, 2022). https://doi.org/10.1007/978-981-16-5157-1_30
9. M. Tripathi, Sentiment analysis of Nepali COVID19 tweets using NB, SVM and LSTM. J. Artif. Intell. 3(03), 151–168 (2021)
10. H. Yin, S. Yang, J. Li, Detecting topic and sentiment dynamics due to COVID-19 pandemic using social media, in Advanced Data Mining and Applications, ADMA 2020. Lecture Notes in Computer Science, vol. 12447, ed. by X. Yang, C.D. Wang, M.S. Islam, Z. Zhang (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-65390-3_46
11. S. Avasthi, R. Chauhan, D.P. Acharjya, Information extraction and sentiment analysis to gain insight into the COVID-19 crisis, in International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol. 1387, ed. by A. Khanna, D. Gupta, S. Bhattacharyya, A.E. Hassanien, S. Anand, A. Jaiswal (Springer, Singapore, 2022). https://doi.org/10.1007/978-981-16-2594-7_28
12. M. Uvaneshwari, E. Gupta, M. Goyal, N. Suman, M. Geetha, Polarity detection across the globe using sentiment analysis on COVID-19-related tweets, in International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol. 1394, ed. by A. Khanna, D. Gupta, S. Bhattacharyya, A.E. Hassanien, S. Anand, A. Jaiswal (Springer, Singapore, 2022). https://doi.org/10.1007/978-981-16-3071-2_46
13. G. Saha, S. Roy, P. Maji, Sentiment analysis of twitter data related to COVID-19, in Impact of AI and Data Science in Response to Coronavirus Pandemic. Algorithms for Intelligent Systems, ed. by S. Mishra, P.K. Mallick, H.K. Tripathy, G.S. Chae, B.S.P. Mishra (Springer, Singapore, 2021). https://doi.org/10.1007/978-981-16-2786-6_9
14. N. Kaushik, M.K. Bhatia, Twitter sentiment analysis using K-means and hierarchical clustering on COVID pandemic, in International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol. 1387, ed. by A. Khanna, D. Gupta, S. Bhattacharyya, A.E. Hassanien, S. Anand, A. Jaiswal (Springer, Singapore, 2022). https://doi.org/10.1007/978-981-16-2594-7_61
15. Ahmad, M.H.I. Hapez, N.L. Adam, Z. Ibrahim, Performance analysis of machine learning techniques for sentiment analysis, in Advances in Visual Informatics. IVIC 2021. Lecture Notes in Computer Science, vol. 13051, ed. by H. Badioze Zaman et al. (Springer, Cham, 2021). https://doi.org/10.1007/978-3-030-90235-3_18
16. U.D. Gandhi, P.M. Kumar, G.C. Babu, G. Karthick, Sentiment analysis on twitter data by using convolutional neural network (CNN) and long short term memory (LSTM). Wireless Personal Commun. 17, 1–0 (2021)
17. S. Das, A.K. Kolya, Predicting the pandemic: sentiment evaluation and predictive analysis from large-scale tweets on Covid-19 by deep convolutional neural network. Evol. Intell. 30, 1–22
18. A. Gopnarayan, S. Deshpande, Tweets analysis for disaster management: preparedness, emergency response, impact, and recovery, in Innovative Data Communication Technologies and Application. ICIDCA 2019. Lecture Notes on Data Engineering and Communications Technologies, vol. 46, ed. by J. Raj, A. Bashar, S. Ramson (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-38040-3_87
19. J.S. Manoharan, Capsule network algorithm for performance optimization of text classification. J. Soft Comput. Parad. (JSCP) 3(01), 1–9
20. A. Sungheetha, R. Sharma, Transcapsule model for sentiment classification. J. Artif. Intell. 2(03), 163–169 (2020)
21. A.P. Pandian, Performance evaluation and comparison using deep learning techniques in sentiment analysis. J. Soft Comput. Parad. (JSCP) 3(02), 123–134 (2021)
22. A. Bashar, Survey on evolving deep learning neural network architectures. J. Artif. Intell. 1(02), 73–82 (2019)
23. A. Kalaivani, R. Vijayalakshmi, An automatic emotion analysis of real time corona tweets, in Advanced Informatics for Computing Research. ICAICR 2020. Communications in Computer and Information Science, vol. 1393, ed. by A.K. Luhach, D.S. Jat, K.H. Bin Ghazali, X.Z. Gao, P. Lingras (Springer, Singapore, 2021). https://doi.org/10.1007/978-981-16-3660-8_34
24. S. Kaur, P. Kaul, P.M. Zadeh, Monitoring the dynamics of emotions during COVID-19 using Twitter data. Proced. Comput. Sci. 1(177), 423–430 (2020)
25. M.A. Kausar, A. Soosaimanickam, M. Nasar, Public sentiment analysis on twitter data during COVID-19 outbreak. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 12(2) (2021). https://doi.org/10.14569/IJACSA.2021.0120252
26. R.J. Medford, S.N. Saleh, A. Sumarsono, T.M. Perl, C.U. Lehmann, An "Infodemic": leveraging high-volume twitter data to understand early public sentiment for the coronavirus disease 2019 outbreak. Open Forum Infect. Dis. 7(7), ofaa258 (2020). https://doi.org/10.1093/ofid/ofaa258. PMID: 33117854; PMCID: PMC7337776
27. M.S. Ahmed, T.T. Aurpa, M.M. Anwar, Detecting sentiment dynamics and clusters of Twitter users for trending topics in COVID-19 pandemic. PLoS ONE 16(8), e0253300 (2021)
28. A.D. Dubey, Twitter sentiment analysis during COVID-19 outbreak. Available at SSRN: https://ssrn.com/abstract=3572023 (April 9, 2020) or https://doi.org/10.2139/ssrn.3572023
29. S. Qaiser, R. Ali, Text mining: use of TF-IDF to examine the relevance of words to documents. Int. J. Comput. Appl. 181 (2018). https://doi.org/10.5120/ijca2018917395
30. A. Sadia, F. Khan, F. Bashir, An overview of lexicon-based approach for sentiment analysis, in 2018 3rd International Electrical Engineering Conference at IEP Centre (Karachi, Pakistan, 2018)
31. S.G. Bird, E. Loper, NLTK: the natural language toolkit, in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (2004), pp. 1–4
32. A.H. Alamoodi, B.B. Zaidan, A.A. Zaidan, O.S. Albahri, K.I. Mohammed, R.Q. Malik, E.M. Almahdi, M.A. Chyad, Z. Tareq, A.S. Albahri, H. Hameed, Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: a systematic review. Exp. Syst. Appl. 1(167), 114155
33. N.V. Babu, E. Kanaga, Sentiment analysis in social media data for depression detection using artificial intelligence: a review. SN Comput. Sci. 3(1), 1–20
34. J. Han, M. Kamber, Data Mining: Concepts and Techniques (Elsevier, 2006). ISBN 1558609016
Diabetic Retinopathy Detection Using Deep Learning Models S. Kanakaprabha, D. Radha, and S. Santhanalakshmi
Abstract Diabetic Retinopathy is a complication of diabetes that affects the eyes. According to WHO information for 2020/21, more than one billion people experience visual impairment or blindness, and nearly one billion people live with diabetes. Computer-aided diagnosis (CAD) tools have the potential to support the ophthalmologist in diagnosing sight-threatening diseases such as cataract, glaucoma, and diabetic retinopathy. The purpose of the proposed work is to detect diabetic retinopathy from clinical imaging, using deep learning to classify full-scale Diabetic Retinopathy in retinal fundus images of patients with diabetes. A comparative analysis is done with various deep learning models such as CNN, MobileNetv2, ResNet50, Inceptionv2, VGG-16, VGG-19, and DenseNet, and the best model is proposed, which is used to make predictions and attain accuracy with a smaller number of images. Automatic detection with higher accuracy will make screening for retinal diseases cost effective and efficient and can prevent eye disorders at an earlier stage.
S. Kanakaprabha
Department of Computer Science and Engineering, Rathinam Technical Campus, Anna University, Coimbatore, India
e-mail: [email protected]

D. Radha (B) · S. Santhanalakshmi
Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
e-mail: [email protected]

S. Santhanalakshmi
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_7

1 Introduction

As per 2020/21 statistics, Diabetic Retinopathy (DR) is a significant complication of diabetes that mainly affects adults [1]. New approaches for transforming diabetic eye screening around the world offer sizeable cost savings for the National Health Service (NHS). The number of humans living with diabetes in the world is over
460 million and is likely to rise to over 800 million within the next 35 years. Diabetes affects the eye by damaging the blood vessels within the retina, a condition referred to as diabetic retinopathy [2]. Blindness is a vision impairment that cannot be fully corrected with medicinal drugs, surgical operations, or glasses; one of the causes of blindness is Diabetic Retinopathy [3]. In the early stages of Diabetic Retinopathy, the sufferer shows little sign of vision problems or other symptoms. However, the disease continues to grow and damage the blood vessels in the retina, causing fluids such as blood, proteins, and lipids to start leaking out; these leakages are called exudates. These exudates provide important diagnostic information at the initial level. Early signs of Diabetic Retinopathy are small bulges known as microaneurysms that can cause swelling in the blood vessels [4]. Alongside cataracts on the retina and Diabetic Macular Edema, glaucoma is also part of the signs and symptoms associated with Diabetic Retinopathy. The challenge with glaucoma is that it develops silently in the early stages, resulting in a gradual decline in vision. Consequently, the diagnosis and classification of Diabetic Retinopathy, and in particular glaucoma, are extremely serious tasks [5]. As of 13 April 2020, it is estimated that 1.5 million deaths were caused by diabetes in 2019. Another 2.2 million deaths resulted from high blood glucose in 2012. Diabetes is a major factor in blindness, kidney failure, lower limb amputation, stroke, etc. Approximately 500 million human beings have diabetes worldwide, and most of those affected live in low- and middle-income nations. Both the number of cases and the prevalence of diabetes have been steadily increasing over the past few years [6]. 1.6 million deaths are directly attributed to diabetes each year.
There is a globally agreed target to halt the rise in diabetes and obesity by 2025. Two major problems exist in automated grading, especially with CNNs. The first is achieving the desired sensitivity (patients correctly diagnosed with DR) and specificity (patients correctly diagnosed as having no DR). This is especially difficult in the five-class problem of No DR, mild DR, moderate DR, severe DR, and proliferative DR. In addition, overfitting is a major problem in neural networks: a skewed dataset makes the network fit the most prominent class in the database, and large data sets are often very skewed. In this database there is a smaller number of images, from which it must be ensured that the network is still able to learn the characteristics of these images. This work gives an overview of a CNN approach based on an in-depth study of the problem of classifying DR from fundus imagery. This is a medical imaging task accompanied by an increasing number of diagnoses and, as previously mentioned, one that has been subjected to numerous studies in the past. To our knowledge, this is the first work that discusses the classification of the five categories of DR using the CNN method. A few new methods are introduced to adapt CNN to this large database. A trained CNN has the potential benefit of being able to scan hundreds of images per minute, allowing it to be used in real time whenever a new image is acquired; it can make a swift diagnosis and respond to the patient quickly. CNN can be taught to recognise Diabetic Retinopathy characteristics in fundus images. CNN has the potential to be
a valuable resource for DR doctors in the future, as data sets improve and real-time scenarios become available.
2 Related Work

Computer vision techniques can ease many of the diagnostic tasks for eyes. In the classification of medical images, the convolutional neural network (CNN), a basic deep learning model in computer vision, has shown incredible achievements in prediction and diagnosis [7]. One major application, eye tracking, helps to discover human feelings as part of image processing; many simulators track the eye movements of trainees in flight, driving, and operator-room settings, and analyses show that employees in IT, BPO, accounting, banking, and front office roles are more stressed than others [8]. Jyotsna C and others offer a system that tracks the eye for numerous purposes using a web camera and open-source computer vision, and also describe a low-cost eye-gaze system that can capture the real-time scenario, detect the person's eyes in the basic frame, and extract various features of the eye [9]. Figure 1 shows a normal retina and a diabetic retina [10]; computer vision techniques can be applied to such images to find the difference. Diabetes can be classified into two types, namely Type 1 and Type 2, and both may cause Diabetic Retinopathy. Diabetic Retinopathy (DR) is defined by damage to the retinal blood vessels of the light-sensitive tissue at the back of the eye, which results in blindness and visual loss. DR is also the most important cause of Diabetic Macular Edema (DME), in which the region of the retina known as the macula swells [11]. If the complication is identified only at a progressive stage, the chance of an efficient remedy and improvement is lower; therefore, early symptoms play an important role in DR detection. Deep learning is a technology producing active results in this health domain [12]. Computer-aided technology with a deep learning method detects Diabetic Retinopathy by categorizing retinal fundus images into the different DR categories [13].
The generated proliferative images have no effect on the other category photos and enhance the segmentation results obtained by the model, in addition
Fig. 1 Normal retina and diabetic retina [10]
to what was trained without augmented generation [14]. Manually testing each patient consumes time, as there are very few ophthalmologists in India, and it is necessary to find DR in its early stage to prevent vision loss; therefore, a system should be automated that can help ophthalmologists in finding DR [15]. Localization of lesions and visual acuity are important to help physicians understand the seriousness of the situation and plan the proper procedure for treating a patient [16]. The development of Deep Learning (DL), especially in the medical domain, is more accurate and has significant potential, as it learns feature representations automatically. Convolutional Neural Networks (CNNs) are a deep learning method widely used to analyse medical images [17]. Karki et al. suggest a way to grade the severity of DR using deep learning: EfficientNet models trained on Diabetic Retinopathy grades such as no DR, mild, moderate, severe, or proliferative DR are combined into an ensemble. On the APTOS test database, the best model achieved a quadratic kappa value of 0.924377 [18]. The goal of a Convolution Neural Network model with detailed structure, design, and implementation using fundus images of the ocular retina is to create a sound Diabetic Retinopathy diagnosis and to divide the cases into five groups or classes; the CNN's performance is evaluated in terms of precision, sensitivity, and specificity [19]. Nunes et al. note that infrastructure and clinical procedures may hinder the evaluation of Diabetic Retinopathy screening programs and introduce EyeFundusScopeNEO, a screening-oriented ophthalmology program. The information system and portable fundus camera support opportunistic and systematic screening in primary care with portable, non-invasive cameras; they avoid exposing the pupil to dilating drugs administered by specialists in ophthalmology, and cost a fraction of existing tabletop fundus cameras.
Preliminary studies demonstrate the system's ability to increase access to screening programs, and full clinical trials are still under development [20]. Mathias et al. use transfer learning with a CNN model pre-trained on ImageNet and retrained on a retina database to develop multiple models that effectively differentiate Diabetic Retinopathy levels based on severity. Feature extraction is also performed using appropriate image processing libraries. Such an easily accessible system helps diagnose retinopathy, especially in rural areas where tools for diagnosing such diseases are less available [21]. The multi-task model of Majumder et al. consists of one classification model and one regression model, each with its own loss function. After training the regression model and the classification model, the features extracted by these two models are fed into a multi-layer perceptron network to classify the five DR grades. The improved multi-task model is applied to Kaggle's two main databases, APTOS and EyePACS [22]. Manual analysis of the said images is cumbersome and inaccurate; therefore, various computer vision techniques are used to predict the occurrence of DR and its phases automatically. However, these methods are computationally heavy, cannot produce very accurate features, and fail to differentiate the different categories of DR effectively. This paper focuses on classifying the different categories of DR with very few learnable parameters to accelerate training and model integration [23]. Shelar et al. use machine learning to diagnose Diabetic Retinopathy in retina fundus pictures and classify them into different stages of disease development, such as Normal, Moderate, and Proliferative Diabetic Retinopathy (PDR), with a focus on
Two Divisions, which will aid physicians in treating patients. Transferred neural networks were used to categorise the photos into different stages of eye illness [24]. The proposed method aims to distinguish, from a crowd of people having discomfort, those with no DR from those with infected DR, and, for patients suffering from Diabetic Retinopathy symptoms, to identify the stage of DR. Because most diabetic human beings are affected by DR, which may result in a loss of vision, patients need to be classified in a short time with higher accuracy. The work compares various CNN architectures such as ResNet50, VGG16, VGG19, MobileNet, Inceptionv3, and DenseNet for DR image classification.
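A hedged sketch of such a comparison with tf.keras follows. This is not the paper's exact training setup: a fresh 5-way softmax head (the five DR classes) is attached to each candidate backbone, and weights=None keeps the sketch offline, whereas the actual work would fine-tune pre-trained weights:

```python
import tensorflow as tf

# Attach a 5-class head to each backbone and compare model sizes.
def build(backbone_fn, name):
    base = backbone_fn(include_top=False, weights=None,
                       input_shape=(224, 224, 3), pooling="avg")
    head = tf.keras.layers.Dense(5, activation="softmax")(base.output)
    return tf.keras.Model(base.input, head, name=name)

for fn, name in [(tf.keras.applications.MobileNetV2, "MobileNetV2"),
                 (tf.keras.applications.ResNet50, "ResNet50"),
                 (tf.keras.applications.VGG16, "VGG16")]:
    model = build(fn, name)
    print(name, model.count_params())
```

Running the loop makes the size differences concrete: MobileNetV2 is an order of magnitude smaller than ResNet50, which matters when the model must run on modest hardware.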
3 Proposed Work

The flowchart for pre-processing and classification of images is shown in Fig. 2. In most cases, an image classification model is built as a Convolution neural net model. The human and machine perceptions of images are completely different: an image holds pixel values ranging from 0 to 255, and a machine classifies it from such attributes or patterns. One of the most important aspects of every software project is the dataset. The dataset used here is Preprocessed Diabetic Retinopathy_Messidor_EyePac. Classes are balanced by under-sampling the majority class, and the images have already been cropped, resized to 300 × 300, and pre-processed. This helps to accelerate training and avoids wasting computation on pre-processing. The dataset is divided into five classes 0, 1, 2, 3, and 4, where 0 indicates no DR, classes 1 to 3 indicate NPDR, and class 4 indicates PDR. Classes 1, 2, 3, and 4 indicate the level of severity, with a minimum of 1 and a maximum of 4. This database is created by medical physicians. Diabetic Retinopathy mainly affects people living with diabetes, showing symptoms in the eye
Fig. 2 Pre-processing and classification of image
disease. It develops when blood sugar damages the small, narrow vessels in the retina, causing other symptoms such as blurred vision and loss of vision. This progressive disease can cause irreversible vision loss, so regular eye examinations are important; the doctor can then make an early diagnosis of the condition and slow its progression. The pancreas secretes the hormone insulin, which aids cell renewal by allowing glucose to enter the cells. However, in the case of diabetes, the body does not produce enough insulin or does not utilise it efficiently, and glucose builds up in the bloodstream as a result. Consistently high blood sugar levels can harm various organs of the body, including the eyes.
3.1 Pre-processing Images

Every dataset can be partitioned into two parts, for training and testing. The accuracy of the classification depends on the quality of the model input; therefore, the training images are pre-processed before being used as inputs of the model, and the same pre-processing is applied to all input images. All the images are represented as RGB images. Initially, the green channel of the image is extracted: owing to its higher contrast compared with the red and blue channels, Diabetic Retinopathy appears more pronounced in the green channel. The DR dataset is split for training and testing as follows: (a) 1440 images belonging to 5 classes for training, and (b) 360 images belonging to 5 classes for testing.
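The green-channel step can be sketched with NumPy. A synthetic array stands in for a fundus image, and the R, G, B channel order is an assumption:

```python
import numpy as np

# A random 300x300 RGB array stands in for a pre-processed fundus image.
rgb = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)

# Keep only the green channel (index 1, assuming R, G, B order), where
# DR lesions show the most contrast according to the text.
green = rgb[:, :, 1]
print(green.shape)  # → (300, 300)
```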
3.2 Feature Extraction of the Images

Feature extraction reduces the amount of information by extracting detailed symptoms; the task is to obtain standard characteristics. After detecting the image, a lot of important information is gathered from it to support classification, as given in Fig. 2. This is the process of identifying and describing the global and local properties of objects in the image, integrating the key features obtained from the given input data. The size of the data ultimately comes down while the important information is preserved; when the raw input is too large to process, it can be replaced with a smaller set of features.
3.3 Convolution Neural Network Model A Convolutional Neural Network (CNN) is a specialisation of the classical feedforward network and is widely used in the field of medical image analysis [25].
Diabetic Retinopathy Detection Using Deep Learning Models
81
Fig. 3 Architecture of CNN
Every image is processed through several layer types: Convolution layers, Max-Pooling layers and Fully Connected layers; the most commonly used activations are ReLU and SoftMax. Figure 3 shows the CNN model with 8 layers: the input image in the first layer, 2 layers each of Convolution and Max-Pooling, and finally two Fully Connected layers, which classify the image as a No DR or an infected DR image. The convolution layer has 64 channels with a 3 × 3 kernel; the default kernel size is (3, 3), and the input image size is (224, 224, 3). The remaining layers are Convolution plus Max-Pooling with pool size (2, 2). If the given image is identified as infected, it is further classified as Mild, Moderate or Severe non-proliferative retinopathy, or proliferative retinopathy.
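The layer dimensions implied by Fig. 3 can be traced with a short helper; stride 1 and 'valid' padding are assumptions here, since the text only specifies the kernel and pool sizes:

```python
def conv2d_out(h, w, k=3, stride=1, pad=0):
    """Output height/width of a convolution ('valid' padding by default)."""
    return ((h - k + 2 * pad) // stride + 1, (w - k + 2 * pad) // stride + 1)

def maxpool_out(h, w, k=2, stride=2):
    """Output height/width of a max-pooling layer."""
    return ((h - k) // stride + 1, (w - k) // stride + 1)

# Trace a (224, 224, 3) input through the Conv/Pool stack described above.
h, w = 224, 224
h, w = conv2d_out(h, w)   # Conv 3x3, 64 channels
h, w = maxpool_out(h, w)  # MaxPool 2x2
h, w = conv2d_out(h, w)   # Conv 3x3
h, w = maxpool_out(h, w)  # MaxPool 2x2
# The resulting feature map is flattened and passed to the two fully
# connected layers; the final layer has 5 units, one per DR class.
```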
3.4 MobileNet Network The MobileNet model is designed for use in mobile applications, and it was TensorFlow's first mobile computer vision model. MobileNet uses depthwise separable convolutions, which significantly reduces the number of parameters compared with a network of regular convolutions of the same depth, resulting in very lightweight deep neural networks. A depthwise separable convolution is performed as two different operations: a depthwise operation and a pointwise operation. Figure 4 shows the convolution layer and the depthwise operation of MobileNet; in the depthwise step the filters are applied per channel as a 3 × 3 matrix, as shown in the figure. A further advantage of MobileNet is that it is a small, low-cost, simple model whose parameters can be tuned to the constraints of various applications. Such networks can be built for classification, identification, and segmentation purposes.
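The parameter saving from depthwise separable convolutions is easy to verify by counting weights; the channel sizes below are illustrative, not taken from the paper:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution: one k x k depthwise
    filter per input channel, then a 1 x 1 pointwise convolution that
    mixes the channels."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
```

For a 3 × 3 kernel with 64 input and 128 output channels, the separable form needs roughly an eighth of the weights of the standard convolution.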
82
S. Kanakaprabha et al.
Fig. 4 Convolution layer and depthwise operation of Mobile net [26]
3.5 Inception v3 Network Inception v3 focuses primarily on reducing the computing power consumed, through changes relative to the earlier Inception architectures. Compared with VGG Net, the Inception Networks (GoogLeNet/Inception v1) proved very efficient, both in the number of parameters generated by the network and in resource cost (memory and others). If any changes are to be made to an Inception network, care must be taken that the computational benefits are not lost; adapting the design to a variety of applications is therefore a problem, because of the uncertainty about the resulting network's efficiency. Inception v3 models offer several methods for optimizing the network while relaxing the constraints on adapting the model. These methods include factorized convolutions, regularization, dimensionality reduction, and parallelized computations.
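The factorized-convolution idea can be illustrated by comparing parameter counts; the channel count below is arbitrary:

```python
# Parameter count of a 5x5 convolution vs. its factorisation into two
# stacked 3x3 convolutions with the same receptive field.
c_in = c_out = 192
five_by_five = 5 * 5 * c_in * c_out
two_three_by_three = 2 * (3 * 3 * c_in * c_out)
saving = 1 - two_three_by_three / five_by_five  # fraction of weights removed
```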
3.6 VGG-19 Network VGG-19 is a Convolutional Neural Network, in particular a deep neural network, in which the data is processed as a 2-D array; this is useful when working with images, because the network can handle photos that have been translated, rotated, scaled, or otherwise modified. The input images of this VGG-based ConvNet are 224 × 224 RGB images. A pre-processing layer takes the RGB image, with pixel values ranging from 0 to 255, and subtracts the mean image value computed over the whole ImageNet training set. After pre-processing, the images are passed through a stack of convolution layers.
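The mean-subtraction step can be sketched as follows; the per-channel mean values are the commonly quoted ImageNet means, which the paper does not state explicitly:

```python
import numpy as np

# Per-channel ImageNet mean pixel (R, G, B) used by the original VGG setup
IMAGENET_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def vgg_preprocess(image):
    """Subtract the training-set mean pixel from a 224 x 224 x 3 RGB image."""
    return image.astype(np.float32) - IMAGENET_MEAN

x = vgg_preprocess(np.zeros((224, 224, 3), dtype=np.uint8))
```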
3.7 ResNet50 Network ResNet50 is a special kind of neural network introduced in 2015 by Kaiming He et al. that improved accuracy and performance. The intuition behind it is that adding more layers lets the network gradually learn more complex features: for example, the first layer may recognize edges, the second layer textures, and a third layer can learn to detect objects. However, it was shown that there is a threshold beyond which simply deepening a CNN model stops helping. ResNet50 addresses the vanishing gradient problem of deep neural networks with shortcut (skip) connections that give the gradient an alternative path. These connections also help because the model can learn identity mappings, which ensures that a higher layer will perform at least as well as the layers below it, and never worse.
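The identity-shortcut idea can be shown with a toy fully connected residual block; real ResNet blocks use convolutions and batch normalisation, so this is only a sketch of the principle:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the shortcut lets the gradient bypass F entirely."""
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)   # identity shortcut added before the activation

# If both weight matrices are zero, F(x) = 0 and the block reduces to the
# identity on non-negative inputs: the layer cannot do worse than a pass-through.
x = np.array([1.0, 2.0, 3.0])
w = np.zeros((3, 3))
y = residual_block(x, w, w)
```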
3.8 VGG16 Network VGG16 is a CNN model proposed by K. Simonyan et al. from the University of Oxford in the article "Very Deep Convolutional Networks for Large-Scale Image Recognition". The most distinctive thing about VGG16 is that, instead of a large number of hyperparameters, it concentrates on convolution layers with 3 × 3 filters and stride 1, and MaxPooling layers with 2 × 2 filters and stride 2. At the end it has 2 fully connected layers, followed by a SoftMax output.
3.9 DenseNet Network DenseNet is a convolutional neural network that uses dense connectivity between layers: within a Dense block, every layer is directly connected to every other layer (with matching feature map sizes). Each layer receives the feature maps of all preceding layers as input, and its own feature maps are used as input by all subsequent layers. The advantages of DenseNet are that it eases the vanishing gradient problem, strengthens feature propagation, encourages feature reuse, and significantly reduces the number of parameters. Each layer has direct access to the gradients of the loss function as well as to the original input signal, which provides a form of implicit deep supervision and helps in training deep network architectures. In addition, it turns out that the dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.
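The dense connectivity pattern can be sketched with toy layers; the growth rate and the layer stand-ins are assumptions for illustration only:

```python
import numpy as np

def dense_block(x, layers):
    """Each layer receives the channel-wise concatenation of the input and
    every earlier layer's output; the block returns all feature maps joined."""
    features = [x]
    for layer in layers:
        features.append(layer(np.concatenate(features, axis=-1)))
    return np.concatenate(features, axis=-1)

GROWTH = 4  # feature maps each layer adds (the "growth rate")

def toy_layer(inp):
    # Stand-in for a BN-ReLU-Conv unit that emits GROWTH new feature maps.
    return np.ones(inp.shape[:-1] + (GROWTH,), dtype=inp.dtype)

# 16 input channels + 3 layers x 4 new channels = 28 output channels
out = dense_block(np.zeros((8, 8, 16)), [toy_layer] * 3)
```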
3.10 Classification of Images The Convolutional Neural Network model and the other models go through a deep learning process to categorise each image as a No DR or an infected DR image. Figure 5 shows a No DR image, which is very clear and has no abnormalities. Figure 6 shows a Mild NPDR image. This is the first stage of Diabetic Retinopathy, characterised by tiny patches of swelling in the retina's blood vessels; the swollen sites are micro-aneurysms. At this stage a small amount of fluid may leak into the retina near the macula, the central region of the retina. Figure 7 shows a Moderate NPDR image: the swollen tiny blood vessels in the retina begin to obstruct blood flow, and blood and other fluids build up in the macula as a result.
Fig. 5 No DR image
Fig. 6 Mild NPDR image
Fig. 7 Moderate NPDR image
Figure 8 shows a Severe NPDR image. A high number of blood vessels in the retina get clogged, significantly reducing blood flow to the area; at this point the body receives signals to grow new blood vessels in the retina. Figure 9 shows a Proliferative DR image. The condition has progressed to the point that new blood vessels are forming in the retina. Because these new blood vessels are always fragile, fluid leakage is a possibility, which results in a variety of vision issues, including blurring, a narrowed field of vision, and blindness. Table 1 lists the various DR stages and their symptoms.
Fig. 8 Severe NPDR image
Fig. 9 Proliferative DR image
Table 1 DR stages and symptoms

DR stages        Symptoms
No DR            There are no abnormalities
Mild NPDR        Small areas of micro-aneurysms: balloon-like swelling in the blood vessels of the retina
Moderate NPDR    Inflammation and deformity of blood vessels
Severe NPDR      Many blood vessels are blocked, which causes secretion of abnormal growth factor
PDR              Growth factors drive the growth of new blood vessels on the surface of the retina; the new vessels are fragile and can leak or bleed, and the resulting scar tissue can cause retinal detachment
4 Result and Analysis The Diabetic Retinopathy images in the database are categorised as No DR, Mild NPDR, Moderate NPDR, Severe NPDR and PDR images. Seven algorithms, namely CNN, ResNet50, VGG16, VGG19, Inception v3, MobileNet and DenseNet, are used to classify these 5 classes, and their classification performances are compared. The main purpose is to identify whether a person has No DR or infected DR. The precision for the Proliferative DR stage is 0.80 for CNN, 0.84 for VGG16 and 0.91 for MobileNetV2; the precision of Inception v3 and ResNet50 is 0.20. For the Proliferative DR stage, the F1 score and recall of the CNN model are 0.61 and 0.69 respectively, and those of VGG16 are 0.59 and 0.67 respectively. The F1 score of MobileNetV2 is 0.23, whereas its recall is 0.13. The F1 score and recall of Inception v3 and ResNet50 are the same, i.e., 1.00 and 0.34, for the Proliferative DR stage.
4.1 Confusion Matrix The confusion matrix is a tabular way of recording the correct and incorrect predictions of a model: it shows how many of the model's estimates are accurate and how many are erroneous. The entries of the confusion matrix are commonly called True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). Accuracy is an important metric for every model, but it can give a misleading picture if the dataset used for classification is skewed or unbalanced. The heatmap visualises the distribution of predictions over each DR class. The classification report computes the micro average and the macro average by taking the score obtained for each class and averaging; the weighted average additionally weights each class's score by its support. Figure 10a shows the confusion matrix for the CNN model. The X-axis indicates the predicted label and the Y-axis the true label; the value range of the matrix is 0–50. Values of 0–4, shown in blue, indicate that the severity of the disease is predicted very rarely; the lavender cell indicates 17% Mild Non-Proliferative DR; the pink cells indicate 25–26% Moderate Non-Proliferative DR and 34% Mild to Moderate Non-Proliferative DR; the Severe Non-Proliferative cell shows a very high value of 58%; the orange cells show 42% for No DR, while Proliferative DR is 47%. In the same way, Fig. 10b, c show the confusion matrices for the MobileNet and VGG-16 models, with a scaling range of 0–80%. Figure 11a shows the confusion matrices for the VGG-19 and DenseNet models, with scaling ranging from 0 to 80%, and Fig. 11b shows the confusion matrix for ResNet50
Fig. 10 Confusion matrices for the CNN, MobileNetV2 and VGG-16 models
Fig. 11 Confusion matrices for the VGG-19 and DenseNet models, and for the ResNet50 and Inception v3 models
Table 2 Comparison of different deep learning models

Models        Training accuracy (%)    Testing accuracy (%)
CNN           98                       63
MobileNetV2   78                       53
ResNet50      71                       48
InceptionV2   68                       45
VGG16         50                       47
VGG19         43                       25
DenseNet      68                       44
and Inception v3 models, with scaling ranging from 0 to 80%. In the ResNet50 and Inception v3 models, true positives are identified correctly. Table 2 compares the training and testing accuracy of the different deep learning models, namely CNN, MobileNetV2, ResNet50, InceptionV2, VGG16, VGG19 and DenseNet. The training accuracy of CNN is 98%, but its testing accuracy is much lower. MobileNet reaches 78% accuracy on training images but only 53% on testing images. The training accuracies of ResNet50, VGG16, InceptionV2 and DenseNet lie between 50% and 71%, while their testing accuracies are considerably lower. For all the models, the large gap between training and testing accuracy indicates overfitting.
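A confusion matrix and the accuracy derived from it can be computed as follows; the label vectors are illustrative, not the paper's data:

```python
import numpy as np

CLASSES = ["No DR", "Mild NPDR", "Moderate NPDR", "Severe NPDR", "PDR"]

def confusion_matrix(y_true, y_pred, n_classes=len(CLASSES)):
    """cm[i, j] counts samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 1, 2, 3, 4, 0]
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()  # diagonal entries are the correct predictions
```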
5 Conclusions The proposed work aims to automate the detection of eye disorders such as Diabetic Retinopathy using deep learning models. Ophthalmic image analysis is one of the best tools for detecting Diabetic Retinopathy. Several stages of retinopathy are detected so that corrective diagnostic measures can be taken; the stages are No DR, Mild NPDR, Moderate NPDR, Severe NPDR, and Proliferative DR. Detection performance is compared across different models: CNN, ResNet50, Inception v3, MobileNetV2, VGG-16 and VGG-19. CNN and MobileNetV2 are the better models for separating No DR from infected DR, and in this analysis CNN is the best model for detecting No DR, Mild NPDR, Moderate NPDR, Severe NPDR, and Proliferative DR. The accuracy achieved by CNN is 98% in training and 73% in testing. Convincing predictions, close to 100%, are seen with the Convolutional Neural Network model across all classifications, and around 90% with all the other models. The accuracy can be improved by having more images for training, and the number of features can be reduced for better efficiency.
References
1. A vitamin A analogy may help treat diabetic retinopathy (elsevier.com)
2. New approach for treating people affected by diabetic eye disease (medicalxpress.com)
3. R. Patel, A. Chaware, Transfer learning with fine-tuned MobileNetV2 for diabetic retinopathy, in 2020 International Conference for Emerging Technology (INCET) (2020), pp. 1–4. https://doi.org/10.1109/INCET49848.2020.9154014
4. P.M. Ebin, P. Ranjana, An approach using transfer learning to disclose diabetic retinopathy in early stage, in 2020 International Conference on Futuristic Technologies in Control Systems and Renewable Energy (ICFCR) (2020), pp. 1–4. https://doi.org/10.1109/ICFCR50903.2020.9249988
5. F. Alzami, Abdussalam, R.A. Megantara, A.Z. Fanani, Purwanto, Diabetic retinopathy grade classification based on fractal analysis and random forest, in 2019 International Seminar on Application for Technology of Information and Communication (iSemantic) (2019), pp. 272–276. https://doi.org/10.1109/ISEMANTIC.2019.8884217
6. Diabetes (who.int)
7. G.-M. Lin, M.-J. Chen, C.-H. Yeh, et al., Transforming retinal photographs to entropy images in deep learning to improve automated detection for diabetic retinopathy. J. Ophthalmol. 2018, Article ID 2159702, 6 pages (2018)
8. D. Venugopal, J. Amudha, C. Jyotsna, Developing an application using eye tracker, in 2016 IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT) (2016)
9. C. Jyotsna, M. SaiMounica, M. Manvita, J. Amudha, Low-cost eye gaze tracker using web camera, in 3rd International Conference on Computing Methodologies and Communication (ICCMC 2019) (IEEE, Surya Engineering College, Erode, 2020), pp. 79–85
10. https://bit.ly/3rh2wWy
11. I.A. Trisha, Intensity based optic disk detection for automatic diabetic retinopathy, in 2020 International Conference for Emerging Technology (INCET) (2020), pp. 1–5. https://doi.org/10.1109/INCET49848.2020.9154021
12. G. Kalyani, B. Janakiramaiah, A. Karuna, et al., Diabetic retinopathy detection and classification using capsule networks. Complex Intell. Syst. (2021). https://doi.org/10.1007/s40747-021-00318-9
13. Y.S. Boral, S.S. Thorat, Classification of diabetic retinopathy based on hybrid neural network, in 2021 5th International Conference on Computing Methodologies and Communication (ICCMC) (2021), pp. 1354–1358. https://doi.org/10.1109/ICCMC51019.2021.9418224
14. R. Balasubramanian, V. Sowmya, E.A. Gopalakrishnan, V.K. Menon, V.V. Sajith Variyar, K.P. Soman, Analysis of adversarial based augmentation for diabetic retinopathy disease grading, in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2020), pp. 1–5. https://doi.org/10.1109/ICCCNT49239.2020.9225684
15. N. Chidambaram, D. Vijayan, Detection of exudates in diabetic retinopathy, in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2018), pp. 660–664. https://doi.org/10.1109/ICACCI.2018.8554923
16. S. Praveena, R. Lavanya, Superpixel based segmentation for multilesion detection in diabetic retinopathy, in 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (2019), pp. 314–319. https://doi.org/10.1109/ICOEI.2019.8862636
17. S. Valarmathi, R. Vijayabhanu, A survey on diabetic retinopathy disease detection and classification using deep learning techniques, in 2021 Seventh International Conference on Bio Signals, Images, and Instrumentation (ICBSII) (2021), pp. 1–4. https://doi.org/10.1109/ICBSII51839.2021.9445163
18. S.S. Karki, P. Kulkarni, Diabetic retinopathy classification using a combination of EfficientNets, in 2021 International Conference on Emerging Smart Computing and Informatics (ESCI) (2021), pp. 68–72. https://doi.org/10.1109/ESCI50559.2021.9397035
19. H. Seetah, N. Singh, P. Meel, T. Dhudi, A convolutional neural network approach to diabetic retinopathy detection and its automated classification, in 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS) (2021), pp. 1000–1006. https://doi.org/10.1109/ICACCS51430.2021.9441943
20. F. Nunes et al., A mobile tele-ophthalmology system for planned and opportunistic screening of diabetic retinopathy in primary care. IEEE Access 9, 83740–83750 (2021). https://doi.org/10.1109/ACCESS.2021.3085404
21. J. Mathias, S. Gadkari, M. Payapilly, A. Pansare, Categorization of diabetic retinopathy and identification of characteristics to assist effective diagnosis, in 2021 International Conference on Emerging Smart Computing and Informatics (ESCI) (2021), pp. 801–806. https://doi.org/10.1109/ESCI50559.2021.9396908
22. S. Majumder, N. Kehtarnavaz, Multitasking deep learning model for detection of five stages of diabetic retinopathy. IEEE Access 9, 123220–123230 (2021). https://doi.org/10.1109/ACCESS.2021.3109240
23. Z. Khan et al., Diabetic retinopathy detection using VGG-NIN a deep learning architecture. IEEE Access 9, 61408–61416 (2021). https://doi.org/10.1109/ACCESS.2021.3074422
24. M. Shelar, S. Gaitonde, A. Senthilkumar, M. Mundra, A. Sarang, Detection of diabetic retinopathy and its classification from the fundus images, in 2021 International Conference on Computer Communication and Informatics (ICCCI) (2021), pp. 1–6. https://doi.org/10.1109/ICCCI50826.2021.9402347
25. K. S., D. Radha, Analysis of COVID-19 and pneumonia detection in chest X-ray images using deep learning, in 2021 International Conference on Communication, Control and Information Sciences (ICCISc) (2021), pp. 1–6. https://doi.org/10.1109/ICCISc52257.2021.9484888
26. MobileNet Architecture (opengenus.org)
Study of Regional Language Translator Using Natural Language Processing P. Santhi, J. Aarthi, S. Bhavatharini, N. Guna Nandhini, and R. Snegha
Abstract There are diverse languages all over the world, and within countries too. India is a diverse country whose people follow various cultures and speak various languages. Even so, the amendments, bills and drafts made under the Constitution are scripted only in Hindi and English, while most people in India know only their mother tongue, i.e., their regional language. Consequently, when a draft is passed, the majority of people will not clearly understand its gist, its uses and its impacts; they need a human translator to convey the pros and cons of the draft. To let people easily understand the essence of the drafts or bills passed by the government, an automatic regional language translator is therefore required. This paper reviews various methods for translating one language into another. In the proposed method, the Natural Language Processing (NLP) toolkit is used to process the text and translate it into another language. The project thus enables people in various regions to understand the basic ideas of the amendments, drafts and bills passed by the Government in summarized form.
1 Introduction In recent years, many web or offline applications have been used to translate one language into another using different ML and DL algorithms. A language-to-language translator greatly helps people in regional areas who are unable to understand other languages. For example, the Central Government of India publishes its reports and circulars in English and Hindi, and people in other states have trouble understanding these languages. In such scenarios, a regional language translator plays an important role in translating the script into the regional languages [1–3]. P. Santhi (B) · J. Aarthi · S. Bhavatharini · N. Guna Nandhini · R. Snegha Department of Computer Science and Engineering, M. Kumarasamy College of Engineering, Thalavapalayam, Karur, Tamil Nadu 639113, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_8
In today’s globalized society, language translation is an essential element, which connects people from all over the world and allows them to share their own knowledge and opinion. Having a translator is vital for communication at all levels, from the United Nations General Assembly to regular travel [4–6]. Human translators are now accessible in practically every language in the world, providing translation services in a variety of fields. The process of converting a phrase or text into another term while maintaining the same meaning is known as translation. Translation has become a prominent subject of study due to the intricacies of grammar and the ease with which context meanings may be lost. Although wordfor-word translation is useful for uncomplicated communication, it is not optimal for talks that contain critical information, includes medicine, law, commerce, education, and religion. The contextual placement of a term is more successful in this industry than direct translation. Translators must be skilled and well-trained in order to convey the desired message accurately and in the appropriate tone. Despite the fact that English is widely spoken over the world, culture and regional languages continue to be important. Translation services are available to bridge communication barriers. Aside from speech translation, there are a slew of additional options [7–10]. These are some of them: Document translation, Website localization And Sign Language translation. NLP toolkit has ability to process the spoken or written type information by suing the Artificial Intelligence (AI). In recent years, we work with various kind of text data need to process and convert into another language for our applications. There are much more research done in field of NLP in order to improve the accuracy of the text data processing [11–13]. Figure 1 contains the components of the NLP. 
In this project, we introduce an automated NLP process to translate scripts in one language into other regional languages [14–16].
2 Related Works In one work, the Boyer Moore Hybrid (BMH) and Knuth Morris Pratt (KMP) methods are combined to develop a knowledge management system model for the employees of a company. The text data is processed using the BMH and KMP algorithms in machine learning; the method helps to identify cases of incomplete information and to auto-complete the field using the combination of BMH and KMP [17–20]. Similarly, in another work, a knowledge management system is implemented using a combination of Fuzzy and KMP algorithms; this method matches string literals to find which strings are similar [21–23]. Another paper presented search visualization with the KMP algorithm; it helps research scholars understand and implement the KMP algorithm in their own models in future work. Some machine learning algorithms perform well in regional language translation, but the required processing speed and accuracy are increasing day by day [24–26]. Hence, we need a new model or algorithm that offers higher processing speed together with high accuracy.
Fig. 1 Graphical representation of NLP components
Many strategies have been used to carry out natural language processing tasks for the Sinhala language. A rule-based method was tested on simple Sinhala natural language challenges, and rule-based techniques were similarly used for Sinhala MT [27–29]. One paper used the Knuth-Morris-Pratt algorithm to translate the Palembang language into the Bahasa Indonesia language: the input sentence was broken down into words, which act as pattern strings, and these were then preprocessed and translated using the algorithm [30]. Due to the complexity of the language, statistical models for Sinhala were developed and tested; statistical models were utilized for Sinhala natural language problems and were subsequently improved, gaining traction for Sinhala [31–33]. Deep learning does not need feature selection and is capable of producing more accurate results than statistical approaches and rule-based models [34–36], and it has been used for MT. A deep learning model, however, requires a bigger dataset in order to learn, and this requirement is what allows the deep learning model to generalize [37–39].
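The KMP matching used in the works above, including the Palembang–Bahasa Indonesia translator [30], can be sketched in a few lines; this is the textbook algorithm, not code from any of the cited systems:

```python
def kmp_search(text, pattern):
    """Return the start indices of every occurrence of `pattern` in `text`
    using the Knuth-Morris-Pratt algorithm, O(len(text) + len(pattern))."""
    if not pattern:
        return []
    # Failure function: fail[i] = length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing `fail` to avoid re-examining matched characters.
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches
```

In a word-level translator the same routine can be run over token lists instead of characters to locate dictionary phrases.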
Deep learning fails to produce superior outcomes in languages with a reduced digital data presence; as a result, MT in Sinhala has only a minimal chance of producing comparable outcomes [40–43]. Another work uses machine learning and deep learning models for sentiment analysis over text, in their case tweets. Analyzing this paper taught us ways of analyzing the sentiment of text, which is helpful in our work for analyzing the text of drafts and converting them into regional languages [44–46]. In the papers analyzed above, we found that machine learning and deep learning algorithms are used for text classification, sentiment analysis over text, and some feature extraction tasks. With the help of the algorithms learnt from these papers, we found that the NLTK library would help in translating the languages efficiently, and this is followed in the present work.
3 Existing Method In this section, we present the existing methods for translating scripts from one language into another. Some free software tools are available online to translate scripts, and they use machine learning algorithms. In one work, neural networks are used for accurate classification of text, image and sound through automated feature extraction; the images are stored in frames, and the frame changes according to the input text or speech sound [47–49]. In another work, various deep learning algorithms have been applied for emotion recognition, and the eXnet library has been used to improve accuracy; however, the memory and computation costs are not very efficient and are considered drawbacks [50–52]. Currently, all bills, amendments and drafts are passed only in the Hindi and English languages. Most people in India know only their regional language, and it is difficult for them to understand the pros and cons of a draft. They need a human translator who knows both the language in which the draft is written and the regional language. Such mediators are very few and cannot be reached by all the people of a region, and it also takes time for a mediator to translate an entire script into the regional language manually. Thus, an automatic system or model is needed to translate efficiently as well as quickly (Fig. 2).
Fig. 2 Work flow of existing regional language translator
4 Objective of the Work The main objective of the project is to build a regional language translator based on the NLP toolkit and machine learning algorithms that achieves high accuracy and high operational speed, and to translate the scripts of one language into various regional languages automatically and effectively using NLP libraries.
5 Proposed Method In Python, the NLP toolkit is a package used to process natural language and understand the information in it. This project aims to build a model that summarizes the script in one language and translates it into another, using the algorithms fed into the model, so as to produce the concise meaning of the script in a particular regional language. Figure 3 represents the basic workflow of the proposed method. A model is developed using NLP libraries such as Python's NLTK to summarize the script; the summarization can be done by extraction or abstraction. Through summarization, a long script is converted into a shorter script with no change in its actual meaning: the summarized script provides the essential gist of the draft in a minimal number of sentences. After summarization, the summarized script is translated into the specified regional language using the Indic NLP Library, which translates each word of the script into the corresponding word in the regional language without changing the meaning of the context.
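A minimal sketch of the frequency-based extractive summarization step is shown below; the stopword list, regex tokenizer and scoring rule are illustrative assumptions, whereas the actual system uses NLTK and the Indic NLP Library:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "it", "is"}

def summarize(text, n_sentences=1):
    """Score each sentence by the corpus frequency of its non-stopwords
    and keep the top-scoring ones, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

draft = ("The tax bill lowers tax rates. "
         "Farmers welcomed the tax bill. "
         "It rained today.")
summary = summarize(draft, n_sentences=1)
```

The summary would then be handed to the word-level translation stage.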
Fig. 3 Workflow of the proposed method
6 System Configuration The proposed method is implemented in Python and runs in a Jupyter notebook in the Anaconda environment. SQLite is used for data storage. Python has very rich libraries for performing NLP on text data and analysing it; matplotlib is used for visualization. The languages taken into consideration for the input data are Hindi and English. The input scripts are preprocessed and summarized with the help of NLTK, and then translated into three regional languages: Tamil, Telugu and Malayalam (Fig. 4). To boost the accuracy of the model, more data is added, multiple algorithms are applied, and the algorithms are tuned. Accuracy, precision, recall and F1 score metrics are used to evaluate NLP systems. The proposed system translates drafts and amendments into the regional languages better than Google Translator, which follows rule-based translation of the whole phrase, whereas the proposed work translates each word in the script, summarizes the text, and produces the output in the expected manner with no change in the meaning of the context (Fig. 5).
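The evaluation metrics mentioned above can be computed per class from confusion-matrix counts; the counts below are hypothetical, for illustration only:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true-positive, false-positive and
    false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts for one class of translated output
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
```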
Fig. 4 Block diagram of the proposed method
Fig. 5 Flow diagram for Natural language translator
7 Conclusion A regional language translator is a highly needed tool in a country like India, because India has many different cultures and languages. All scripts are published and circulated by the Central Government of India in Hindi and English, and states that do not speak Hindi or English rely on human translators to translate them. In the proposed method, we introduce an NLP-based system that translates the scripts automatically with the help of AI. The proposed method performs with higher accuracy and operational speed compared with existing methods. The accuracy of the model is tested using precision, recall and F1 score, and the obtained scores for these parameters are equal to 1.
References

1. D. Anggreani, D.P.I. Putri, A.N. Handayani, H. Azis, Knuth Morris Pratt algorithm in Enrekang–Indonesian language translator, in IEEE 2020 4th International Conference on Vocational Education and Training (ICOVET) (2020), pp. 144–148
2. E. Strubell, A. Ganesh, A. McCallum, Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243 (2019)
3. J. Li, X. Chen, E. Hovy, D. Jurafsky, Visualizing and understanding neural models in NLP. arXiv preprint arXiv:1506.01066 (2015)
4. S. Kushwaha, S. Bahl, A.K. Bagha, K.S. Parmar, M. Javaid, A. Haleem, R.P. Singh, Significant applications of machine learning for COVID-19 pandemic. J. Ind. Integ. Manage. 5(04), 453–479
5. A. Dogan, D. Birant, Machine learning and data mining in manufacturing. Expert Syst. Appl. 166, 114060 (2020)
6. R. Apriyadi, Knuth Morris Pratt–Boyer Moore hybrid algorithm for knowledge management system model on competence employee in petrochemical company (2019), pp. 201–206
7. E. Ermatita, D. Budianta, Fuzzy Knuth Moris Pratt algorithm for knowledge management system model on knowledge heavy metal content in oil plants (2017), pp. 188–192
8. R. Rahim, I. Zulkarnain, H. Jaya, A review: search visualization with Knuth Morris Pratt algorithm. IOP Conf. Ser. Mater. Sci. Eng. 237(1) (2017)
9. Y.I. Samantha Thelijjagoda, T. Ikeda, Japanese–Sinhalese machine translation system Jaw/Sinhalese. J. Nat. Sci. Foundat. Sri Lanka 35, 2 (2007)
10. R. Weerasinghe, A statistical machine translation approach to Sinhala–Tamil language translation. Towards ICT Enabled Soc. 136 (2003)
11. S. Sripirakas, A. Weerasinghe, D.L. Herath, Statistical machine translation systems for Sinhala–Tamil, in 2010 International Conference on Advances in ICT for Emerging Regions (ICTer) (2010), pp. 62–68
12. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015)
13. M. Murugesan, S. Thilagamani, Efficient anomaly detection in surveillance videos based on multilayer perception recurrent neural network. J. Microproc. Microsys. 79 (2020)
14. S. Thilagamani, C. Nandhakumar, Implementing green revolution for organic plant forming using KNN-classification technique. Int. J. Adv. Sci. Technol. 29(7S), 1707–1712 (2020)
15. J. Weston, S. Chopra, A. Bordes, Memory networks. arXiv preprint arXiv:1410.3916 (2014)
16. M. Hu, Y. Peng, X. Qiu, Mnemonic reader: machine comprehension with iterative aligning and multi-hop answer pointing (2017)
17. B. Peng, Z. Lu, H. Li, K.-F. Wong, Towards neural network-based reasoning. arXiv preprint arXiv:1508.05508 (2015)
18. K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
19. D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
20. A. Nugaliyadde, K.W. Wong, F. Sohel, H. Xie, Language modeling through long term memory network. arXiv preprint arXiv:1904.08936
21. S. Boukoros, A. Nugaliyadde, A. Marnerides, C. Vassilakis, P. Koutsakis, K.W. Wong, Modeling server workloads for campus email traffic using recurrent neural networks, in International Conference on Neural Information Processing (2017), pp. 57–66
22. S. Thilagamani, N. Shanti, Gaussian and Gabor filter approach for object segmentation. J. Comput. Inf. Sci. Eng. 14(2), 021006 (2014)
23. K. Senevirathne, N. Attanayake, A. Dhananjanie, W. Weragoda, A. Nugaliyadde, S. Thelijjagoda, Conditional random fields based named entity recognition for Sinhala, in 2015 IEEE 10th International Conference on Industrial and Information Systems (ICIIS) (2015), pp. 302–307
24. E. Cambria, B. White, Jumping NLP curves: a review of natural language processing research [review article]. IEEE Comput. Intell. Mag. 9, 48–57 (2014)
25. C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing (MIT Press, 1999)
26. J. Jayakody, T. Gamlath, W. Lasantha, K. Premachandra, A. Nugaliyadde, Y. Mallawarachchi, "Mahoshadha", the Sinhala tagged corpus based question answering system, in Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems, vol. 1 (2016), pp. 313–322
27. P. Perumal, S. Suba, An analysis of a secure communication for healthcare system using wearable devices based on elliptic curve cryptography. J. World Rev. Sci. Technol. Sustain. Devel. 18(1), 51–58 (2022)
28. P. Antony, Machine translation approaches and survey for Indian languages. Int. J. Comput. Linguist. Chinese Lang. Proc. 18(1) (2013)
29. B. Hettige, A. Karunananda, Developing lexicon databases for English to Sinhala machine translation, in 2007 International Conference on Industrial and Information Systems (2007), pp. 215–220
30. H.L.H.S. Warnars, J. Aurellia, K. Saputra, Translation learning tool for local language to Bahasa Indonesia using Knuth-Morris-Pratt algorithm. TEM J. 10(1), 55–62 (2021)
31. P. Pandiaraja, S. Sharmila, Optimal routing path for heterogenous vehicular adhoc network. Int. J. Adv. Sci. Technol. 29(7), 1762–1771 (2020)
32. K. Deepa, S. Thilagamani, Segmentation techniques for overlapped latent fingerprint matching. Int. J. Innovat. Technol. Explor. Eng. 8(12), 1849–1852 (2019)
33. H.W. Herwanto, A.N. Handayani, K.L. Chandrika, A.P. Wibawa, Zoning feature extraction for handwritten Javanese character recognition, in ICEEIE 2019—International Conference on Electrical, Electronics and Information Engineering: Emerging Innovation Technology for Sustainable Future (2019), pp. 264–268
34. H. Azis, R.D. Mallongi, D. Lantara, Y. Salim, Comparison of Floyd–Warshall algorithm and greedy algorithm in determining the shortest route, in Proceedings of 2nd East Indonesia Conference on Computer and Information Technology: Internet of Things for Industry, EIConCIT 2018 (2018), pp. 294–298
35. S.A. Sini, Enhanced pattern matching algorithms for searching Arabic text based on multithreading technology (2019), pp. 0–6
36. P. Pandiaraja, K. Aravinthan, N.R. Lakshmi, K.S. Kaaviya, K. Madumithra, Efficient cloud storage using data partition and time based access control with secure AES encryption technique. Int. J. Adv. Sci. Technol. 29(7), 1698–1706 (2020)
37. R. Apriyadi, Knuth Morris Pratt–Boyer Moore hybrid algorithm for knowledge management system model on competence employee in petrochemical company (2019), pp. 201–206
38. E. Ermatita, D. Budianta, Fuzzy Knuth Moris Pratt algorithm for knowledge management system model on knowledge heavy metal content in oil plants (2017), pp. 188–192
39. R. Akhtar, Parallelization of KMP string matching algorithm on different SIMD architectures: multi-core and GPGPUs. Int. J. Comput. Appl. 49, 0975–8887 (2012)
40. P. Rajesh Kanna, P. Santhi, Unified deep learning approach for efficient intrusion detection system using integrated spatial–temporal features. Knowl.-Based Syst. 226 (2021)
41. G. Duan, Sh. Weichang, C. Jiao, Q. Lin, The implementation of KMP algorithm based on MPI + OpenMP, in 9th International Conference on Fuzzy Systems and Knowledge Discovery, vol. 10 (2012)
42. P. Santhi, G. Mahalakshmi, Classification of magnetic resonance images using eight directions gray level co-occurrence matrix (8DGLCM) based feature extraction. Int. J. Eng. Adv. Technol. 8(4), 839–846 (2019)
43. M. Nazli, Effective and efficient parallelization of string matching algorithms using GPGPU accelerators (2020)
44. P. Rajesh Kanna, P. Santhi, Hybrid intrusion detection using MapReduce based black widow optimized convolutional long short-term memory neural networks. Expert Syst. Appl. 194, 15 (2022)
45. X. Lu, The analysis of KMP algorithm and its optimization. J. Phys.: Conf. Ser. 1345(4), 042005 (2019)
46. D. Budianta, Fuzzy Knuth Moris Pratt algorithm for knowledge management system model on knowledge heavy metal content in oil plants, in 2017 International Conference on Electrical Engineering and Computer Science (ICECOS) (IEEE, 2017), pp. 188–192
47. K. Deepa, M. Kokila, A. Nandhini, A. Pavethra, M. Umadevi, Rainfall prediction using CNN. Int. J. Adv. Sci. Technol. 29(7 Special Issue), 1623–1627 (2020)
48. A. Cinti, F.M. Bianchi, A. Martino, A. Rizzi, A novel algorithm for online inexact string matching and its FPGA implementation. Cogn. Comput. 12(2), 369–387 (2020)
49. V.K.P. Kalubandi, M. Varalakshmi, Accelerated spam filtering with enhanced KMP algorithm on GPU, in 2017 National Conference on Parallel Computing Technologies (PARCOMPTECH) (IEEE, 2020), pp. 1–7
50. K. Kottursamy, A review on finding efficient approach to detect customer emotion analysis using deep learning analysis. J. Trends Comput. Sci. Smart Technol. 3, 95–113 (2021). https://doi.org/10.36548/jtcsst.2021.2.003
51. M. Tripathi, Sentiment analysis of Nepali COVID19 tweets using NB, SVM and LSTM. J. Artif. Intell. Capsule Netw. 3, 151–168 (2021). https://doi.org/10.36548/jaicn.2021.3.001
52. A. Bashar, Survey on evolving deep learning neural network architectures. J. Artif. Intell. Capsule Netw. 73–82 (2019). https://doi.org/10.36548/jaicn.2019.2.003
Fabric Defect Detection Using Deep Learning Techniques K. Gopalakrishnan and P. T. Vanathi
1 Introduction

The fabric industry is one of the major contributors to the economy of our nation. With a huge population in our country, there will always be a never-ending demand for clothes and other fabric items. Defects that occur during the fabric manufacturing process lead to reduced profit and wasted time. Some of the defects in cloth are horizontal lines, shade variation, dirt/stains, uneven dyeing/printing/dye marks, drop stitches, misprinting, off printing or absence of printing, holes, etc. Hence, a fabric industry needs a system to detect the defective cloth that arrives from the fabric weaving machines.

Manual defect detection, such as handpicking, has been done by human intervention. This process seems efficient since it involves manpower, but minor flaws always go unnoticed owing to human error. Therefore, with the inception of computers, many researchers apply machine learning techniques in their research. The most used techniques are SVM (Support Vector Machines), neural networks, the GLCM feature extraction approach, statistical and spectral approaches, etc. Machine learning appears to perform considerably better than the manual techniques, but it has certain drawbacks, such as lower accuracy rates and more time consumption. Deep learning is a more recently introduced technique: machine learning is a subset of Artificial Intelligence (AI), and deep learning is in turn a subset of machine learning. It uses neural networks to learn, unsupervised, from unstructured or unlabelled data. Deep learning has evolved in lockstep with the digital era, which has produced a deluge of data in all formats and from all corners of the globe. A CNN is a neural network formed on the principle of the human brain's neural systems. The performance of a CNN mainly depends on the regularization of weight, bias

K. Gopalakrishnan (B) ECE Department, Mepco Schlenk Engineering College, Sivakasi, India e-mail: [email protected] P. T.
Vanathi ECE Department, PSG College of Technology, Coimbatore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_9
and learning rate during back propagation, which is done by optimizers. Parameters such as weights, bias, and learning rate used in the neural network are set by the optimizers. Recently, many researchers have proposed various techniques to detect defective cloth. Hanbay et al. [1] clearly explained image acquisition systems and the merits and demerits of various defect-finding systems. Learning-based local textural distributions for automatic fabric defect detection in the wavelet and contourlet domain, based on non-motif defect detection, was proposed by Allili et al. [2]. Images were decomposed using multiresolution transforms and features were then extracted from the sub-bands. The number of decomposition levels is limited by the sub-sampling process in the multiresolution transform; to overcome this problem, a Monte Carlo sampling process was used. A Bayes classifier was used for training and testing. Pixel-wise fabric defect detection by CNNs without labelled training data [3] was proposed by Wang et al. Better results were obtained for textured fabrics like plain fabrics, twill fabrics, etc., but the method required more parameters. Li et al. [4] proposed a new method for yarn-dyed fabric detection. The Random Drift Particle Swarm Optimization (RDPSO) algorithm was used to determine the parameters automatically, and the number of parameters was reduced. The approach performed better with small textures but not with large ones. Recently, deep learning techniques have emerged that are better than machine learning techniques because there is no need for separate feature extraction. The key advantage of this technology is that it learns from its own input, and detection with high precision is feasible. Deep learning based detection of defects in fabric was proposed by Jing [5].
Defect classification in fabric, location of defects in fabric, and defect segmentation in fabric are the categories of the above work. Ouyang proposed fabric defect segmentation using a CNN with an embedded active layer [6]. The speed of defect detection for knitted fabrics was increased by a YOLO model [7], in which a combination of traditional image processing and deep learning was proposed. A sequential detection method for defects in patterned fabric was proposed by Wang et al. [8]. Automatic flaw detection is affected by the picture complexity and diversity of patterned fabric; to avoid this, the image was segmented according to the adaptive periodic distance of the fabric's repeated pattern. As a result, the defective image blocks were quickly identified. Then, to detect the defective part of the image, blocks of extracted images were compared to blocks of reference images, and distance measurement and threshold segmentation were used to calculate the defect position. Compared with the other methodologies, the method had minimal processing complexity, excellent detection accuracy, and broad applicability. Furthermore, a new automatic fault location method based on adaptive image segmentation was developed. The size of the defect-free image was taken as the sliding window, and the test image was traversed using this sliding window. The correlation of repeating pattern blocks was measured using LBP [8]. For rock surface images, this delivered a decent pattern recognition and segmentation result. The irregularity and periodicity of the patterned cloth differs
from that of the rock surface, and the organizational difference between the testing picture blocks was comprehensible. After adaptive picture segmentation, the pixel-level image registration method must adjust the relative position between the defect-free image block and the test image block, correspondingly. This preserved and achieved the registration and recognition of edge parts that are not of the same size. The organizational similarity between the defect-free block and the testing repeating-pattern blocks decided whether the image was flawed or not, setting a criterion for structural similarity to improve detection accuracy. In practice, the defect-free image block was found directly from the original image. The feature matching algorithm combined dictionary construction and feature matching, and this was used to determine the fault position. Combining GLCM and LBP, texture elements of the fabric picture were retrieved. Accurate detection of general defects in yarn-dyed fabric, like holes, knots, etc., achieved by a combined process of image processing and deep learning, was proposed in [7]. Even though the above techniques provide good results, they are supervised learning methods. Niu et al. [9] proposed creating fraudulent defect samples by employing a fake defect picture generator. Some defect-free fabric samples were required to make a faux defect sample. An encoder and decoder were also included in a lightweight form, and a pyramid convolution module was included in the encoder portion to adjust to varied sizes of fabric flaws. A grey-scale image was given as input to the model, and the defect segmentation result was its output. Testing of the network was done using real data. Jing et al. [10] proposed utilising a CNN to detect fabric defects automatically. A combination of low-level and high-level elements was the important requirement for the above model.
The picture details such as lines and dots were considered low-level features. To detect more important faults in the image, high-level features were layered on top of the low-level features. The defect detection model was a lightweight network that combined low-level and high-level characteristics. The network was divided into three sections: (1) a pyramid pooling module, (2) a lightweight low-level feature extraction module, and (3) a feature up-sampling decoder module. A multiscale encoder was introduced to prevent scaling issues between defect-free and defect fabric sizes. Merging local-area context statistics with global context statistics played an important role. Liwen Song, Ruizhi Li, and Shiqiang Chen suggested fabric defect detection based on the membership degree of regions [11]. The enhanced fabric defect identification approach was based on each fabric region's membership degree (TPA). The detection approach more efficiently detected fabric faults while simultaneously suppressing noise and background textures [12].
1.1 Types of Fabric Defects

The various types of defects in fabrics are horizontal lines, shade variation, dirt or stains, non-uniform dyeing (or) printing (or) dye marks, discontinuity in stitches, unprinted areas, dust marks, barre, neps (or) knots, abrasion marks,
Fig. 1 Architecture of the CNN
splicing, holes, defective selvage, snags, thick places (or) thin places, bowing and skewing, lines due to needles, rough picks, rough ends, damaged picks, cracked ends, absent ends (or) end out, etc. The cause of the above defects may be a problem in the machine or some kind of labor negligence.
1.2 Convolutional Neural Network (CNN)

A convolutional neural network (CNN, or ConvNet) is a type of deep neural network used for image recognition in deep learning. Based on the shared-weight architecture of the convolution kernels that scan the hidden layers and their translation-invariance properties, they are also known as Shift Invariant or Space Invariant Artificial Neural Networks (SIANN). Figure 1 depicts the architecture of the CNN. The CNN's major tiers are the convolutional layer, the pooling layer, and the fully connected layer.
1.3 Working of CNN

Forward propagation and backward propagation are the two most significant processes. In forward propagation, the input picture passes from the input layer to the output layer via the hidden layers; a hidden layer is one that sits between the input and output layers. Because the image travels forward from the input layer to the output layer, the procedure is known as forward propagation. During this procedure, the CNN generates the output, and the loss is also computed. The disparity between the predicted output and the expected output is known as the loss.

z[1] = W[1] · a[0] + b[1],  a[1] = g(z[1])   (1)
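Equation (1) amounts to a matrix–vector product followed by an element-wise activation. A minimal sketch of one such forward step, with the sigmoid assumed as the activation g:

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(W, a_prev, b):
    """One forward-propagation step of Eq. (1):
    z[l] = W[l] . a[l-1] + b[l], then a[l] = g(z[l])."""
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i
         for row, b_i in zip(W, b)]
    return [sigmoid(z_i) for z_i in z]
```

Stacking such steps layer after layer is exactly the forward pass; back propagation then adjusts W and b from the loss.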
An FCNN (Fully Connected Neural Network) has more parameters per layer of the network, whereas a CNN uses fewer parameters than an FCNN because of its convolutional layers. Parameter sharing in the convolutional layers also speeds up the training process.

INPUT SIZE: n × n × n_c
FILTER SIZE: f × f × n_c
PADDING: p
STRIDE OF THE FILTER: s
OUTPUT: [(n + 2p − f)/s + 1] × [(n + 2p − f)/s + 1] × (number of filters)

where n_c is the number of channels in the image.
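As a sanity check, the standard output-size formula ⌊(n + 2p − f)/s⌋ + 1, together with the per-layer parameter count (f · f · n_c + 1) × filters, reproduces the layer dimensions used in the next section (512 → 510 → 255 → 253 → 126) and the 65,038,497 trainable parameters reported in Table 1. A minimal sketch, assuming an RGB (three-channel) input and no padding:

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial size after an n x n input meets an f x f filter:
    floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def conv_params(f, in_channels, filters):
    """Trainable parameters: f*f*in_channels weights per filter plus one bias each."""
    return (f * f * in_channels + 1) * filters

n1 = conv_output_size(512, 3)        # first 3x3 convolution
p1 = conv_output_size(n1, 2, s=2)    # 2x2 max pooling
n2 = conv_output_size(p1, 3)         # second convolution
p2 = conv_output_size(n2, 2, s=2)    # 2x2 max pooling

dense1 = p2 * p2 * 32 * 128 + 128    # flattened feature map into a 128-unit dense layer
total = conv_params(3, 3, 32) + conv_params(3, 32, 32) + dense1 + (128 + 1)
```

Note that the same formula with f = 2 and s = 2 also gives the max-pooling output sizes.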
2 Proposed Method

Every CNN is constructed around the application that it works with. The block diagram of the proposed approach is shown in Fig. 2. In this proposed work, the performance of the network has been investigated for various learning rates and epochs. Here, the convolution layer is taken as the input layer, where input images of dimension 512 × 512 are given. The kernel size is 3 × 3 with 32 filters and stride = 2. The parameters used for the first convolutional layer are a 3 × 3 kernel, stride 2, and 32 filters, and the obtained parameter count is 3 × 3 × 3 × 32 + 32 = 896. A pooling layer is often placed between two convolutional layers; it minimizes the number of parameters and computations by down-sampling the representation. Either maximum or average pooling can be used, and maximum pooling is usually employed because it is more efficient. After passing through the first convolution layer, the image of dimension 510 × 510 is given as input to a max pooling layer of size (2, 2), and the output of this layer is a reduced image of dimension 255 × 255. The output of the first max pooling layer is fed into the second convolution layer, and the resulting image size after convolution is 253 × 253. Because of the stride value of 2, the image size was reduced to 256 × 256 after the first
Fig. 2 Block diagram of the proposed work
convolution process. The kernel size for the second convolution layer is 3 × 3, with stride = 1 and 32 filters; the parameter count for the second convolutional layer is 3 × 3 × 32 × 32 + 32 = 9248. Following this, more features from the input dataset are collected. The image generated by the second convolution layer is sent into the second max pooling layer to further reduce the dimension of the input, and as a consequence an image of size 126 × 126 pixels is obtained. Since the fully connected layer cannot receive input in the form of a matrix, the image must be fed as a one-dimensional array of data. To do this, the 126 × 126 image created by the second max pooling layer is flattened using the flatten layer; the resulting array is 1 × 508,032. This can now be fed as input into the following fully connected layer. In fully connected layers, every neuron in one layer is linked to every neuron in the next layer. To categorize the images, the flattened vector passes through a fully connected layer. Each neuron in the dense layer receives input from all neurons in the previous layer, making it a deeply connected neural network layer; the dense layer is the most often employed layer in such models. In the background, the dense layer calculates a matrix–vector multiplication. The values in the matrix are parameters that can be trained and updated during back propagation. The dense layer produces an 'm'-dimensional output vector, and as a result is largely employed to adjust the dimensions of the vector; other operations, like rotation, scaling, and translation of the vector, are also done by this layer. The dense layer in this study generates a 1 × 128 dimensional vector as its output. Finally, the dense layer's output is sent into the sigmoid function, so the value appearing in the output layer always lies between 0 and 1; this is due to the sigmoid activation function.
If the class's output is "0", the fabric is considered defective; if the class's output is "1", the fabric is considered non-defective.
2.1 Sigmoid Function

The sigmoid function is a type of logistic function that is commonly symbolised by σ(x) or sig(x). It is given by:

σ(x) = 1 / (1 + exp(−x))   (2)
The sigmoid function is also known as a squashing function: its domain is the set of all real numbers, and its range is (0, 1). As a result, whether the function's input is a very large negative number, a very large positive number, or any number between −∞ and +∞, the output is always between 0 and 1. In neural networks, the sigmoid function is utilised as an activation function. An activation function is used to pass a weighted sum of inputs, and
Fig. 3 A sigmoid unit in a neural network
the output is used as an input to the following layer. Figure 3 shows the sigmoid unit in a neural network.
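Equation (2) can be implemented in a numerically stable way; the sketch below branches on the sign of x so that the exponential never overflows:

```python
import math

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x)), computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)           # for x < 0, exp(x) underflows safely toward 0
    return e / (1.0 + e)
```

Both branches are algebraically equal to Eq. (2); the split only avoids computing exp of a large positive argument.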
3 Datasets

A new fabric dataset called "ZJU-Leaper" was used. The photos fall into five fabric groups. Group 1 comprises White Plain, Thick Stripe, and Thin Stripe images; Group 2 comprises Dot Pattern, Houndstooth, Gingham, and Knot Pattern images; Group 3 contains Twill Plaid, Blue Plaid, Brown Plaid, Gray Plaid, and Red Plaid images; Group 4 contains Floral Print 1, Floral Print 2, and Floral Print 3; and Group 5 contains Pattern 1, Pattern 2, Pattern 3, and Pattern 4 images. The dataset contains 98,777 fabric images, collected from 19 different fabric categories, with detailed annotations. Out of these, 1200 defective and 1200 non-defective images were selected randomly. The images were split into 24 batches, resulting in 28,800 images for each dataset. From this dataset, 50% of the images from each dataset are considered for training, and the remaining 50% are considered for testing. Figure 4 depicts the sample dataset used to train the Defect and Non-Defect classes.
Fig. 4 Non defect fabric ımages and defective fabric ımages
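The random 50/50 selection described above can be sketched as follows (the function name and fixed seed are illustrative, not from the paper):

```python
import random

def split_dataset(images, train_frac=0.5, seed=42):
    """Shuffle a list of image identifiers and split it into train/test parts."""
    rng = random.Random(seed)
    shuffled = images[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Using a seeded `random.Random` instance keeps the split reproducible across runs, which matters when comparing optimizers on identical data.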
4 Experimental Results and Discussion

An optimizer is a function or algorithm that alters the characteristics of a neural network, such as its weights and learning rate. As a result, it aids in the reduction of overall loss while also improving precision. In this proposed work, the performance of various optimizers on fabric images has been evaluated and compared. The network performance has been evaluated for the optimizers ADAM, ADAMAX, ADADELTA, ADAGRAD, SGD, NADAM, FTRL, and RMSProp. For performance measurement, the parameters Sensitivity (Se), Specificity (Sp), Precision (Pr), Negative Predictive Value (NPV), False Positive Rate (FPR), False Discovery Rate (FDR), False Negative Rate (FNR), Accuracy (Acc), F1 Score, and Matthews Correlation Coefficient (MCC) have been used. The network parameters for the proposed work are shown in Table 1. The algorithm has been developed and executed in MATLAB version 2009. The accuracy and loss factor for the various optimizers have been evaluated with learning rates from 0.001 to 0.0001 and varying numbers of epochs. The optimizers reach an accuracy of 97.40% at a learning rate of 0.0001 with 100 epochs, and the results are shown in Fig. 5. The comparison results of CNN with the various optimizers are shown in Table 2. The experimental results show that the ADAM (Adaptive Moment Estimation) optimizer provides better accuracy than the other optimizers. ADAM works with first- and second-order momentum: the intuition behind ADAM is that it is not required to roll too fast merely to jump over the minimum; instead, it slows down a little to allow for a more attentive search. In addition to an exponentially decaying mean of previous squared gradients, as in AdaDelta, ADAM maintains an exponentially decaying mean of previous gradients m(t). This optimizer provides better results when the dataset has a large number of images.
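The ADAM update just described, with decaying means of past gradients (m) and past squared gradients (v), can be sketched in a few lines. This is an illustrative pure-Python version using the conventional default hyperparameters (b1, b2, eps), not the paper's MATLAB implementation:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: bias-corrected first/second moment estimates,
    then a parameter step scaled by lr / sqrt(v_hat)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta^2 (gradient 2*theta) starting from theta = 1.0
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 3001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

The division by sqrt(v_hat) is what gives ADAM its per-parameter adaptive step size, slowing down where gradients have been large.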
Table 3 shows the performance output of CNN with ADAM optimizers for ZJU-Leaper

Table 1 Parameters for the network

Layer (type)                     Output shape           Parameter #
conv2d_4 (Conv2D)                (None, 510, 510, 32)   896
max_pooling2d_4 (MaxPooling2D)   (None, 255, 255, 32)   0
conv2d_5 (Conv2D)                (None, 253, 253, 32)   9248
max_pooling2d_5 (MaxPooling2D)   (None, 126, 126, 32)   0
flatten_2 (Flatten)              (None, 508,032)        0
dense_4 (Dense)                  (None, 128)            65,028,224
dense_5 (Dense)                  (None, 1)              129

Total Parameters: 65,038,497; Trainable Parameters: 65,038,497; Non-Trainable Parameters: 0
Fig. 5 Output for various optimizers (panels: ADAM, ADADELTA, SGD, and FTRL optimizers)
datasets. Figure 6 shows the confusion matrix of CNN with the ADAM optimizer for the ZJU-Leaper dataset results. The formulas used for determining the evaluation parameters are given in Eqs. 3–11:

Sensitivity (or) True positive rate = TP / (TP + FN)   (3)

Specificity (or) True negative rate = TN / (TN + FP)   (4)
Fig. 5 (continued) (panels: ADAMAX, ADAGRAD, NADAM, and RMSProp optimizers)
Precision (or) Positive predictive value = TP / (TP + FP)   (5)

Negative predictive value = TN / (TN + FN)   (6)

False positive rate = FP / (FP + TN)   (7)

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (8)
Fabric Defect Detection Using Deep Learning Techniques Table 2 Comparison results for various optimizers in terms of accuracy and loss
Table 3 Performance results for ADAM optimizers
Optimizers
111
Accuracy in %
Loss factor
FTRL
50.00
0.6800
ADAGRAD
72.32
0.6100
ADADELTA
78.20
0.6100
SGD
78.78
0.6815
ADAMAX
94.07
0.2030
NADAM
95.15
0.4100
RMSPROP
95.68
0.3010
ADAM
98.16
0.1017
Parameters
Value
Sensitivity (Se)
98.16
Specificity (Sp)
97.21
Precision (Pr)
97.20
Negative Predictive Value (NPV)
97.60
False Positive Rate (FPR)
2.79
False Discovery Rate (FDR)
2.80
False Negative Rate (FNR)
2.41
Accuracy (Acc)
97.40
F1 score
97.39
Matthews Correlation Coefficient (MCC)
94.80
Fig. 6 Confusion matrix
False Discovery Rate = FP / (FP + TP)   (9)

F1 Score = 2TP / (2TP + FP + FN)   (10)

Matthews Correlation Coefficient = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))   (11)
where TP—True Positive, FN—False Negative, TN—True Negative, FP—False Positive.
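Equations 3–11 can be collected into one small helper; the sketch below computes every metric from the four confusion-matrix counts (the dictionary keys follow the abbreviations used in Table 3):

```python
import math

def metrics(tp, tn, fp, fn):
    """Evaluation parameters of Eqs. 3-11 from confusion-matrix counts."""
    return {
        "Se":  tp / (tp + fn),                 # sensitivity / true positive rate
        "Sp":  tn / (tn + fp),                 # specificity / true negative rate
        "Pr":  tp / (tp + fp),                 # precision
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FDR": fp / (fp + tp),
        "FNR": fn / (fn + tp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "F1":  2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }
```

Note the complementary pairs built into the definitions: FPR = 1 − Sp, FDR = 1 − Pr, and FNR = 1 − Se.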
5 Conclusion

The suggested fabric defect detection method is capable of dealing with a wide range of fabric types. The original image is not used directly as input; instead, the original images are separated into many patches along the fabric surface's inherent period, which serve as operation objects for deep CNN training. On the ZJU-Leaper dataset, the proposed method achieves an average accuracy of 98.16%, allowing reliable detection of common defects in yarn-dyed fabric such as ringing, thin-bar, scrape, bulges, cracked-end, dye, and slums. The experimental findings show that, compared to classic shallow learning approaches, the suggested method can effectively learn defect features by adaptively altering the parameters. Furthermore, the proposed technique can increase efficiency by reducing the measuring time and obtaining more precise defect images than current fabric defect detection systems.
References

1. K. Hanbay, M.F. Talu, Ö.F. Özgüven, Fabric defect detection systems and methods—a systematic literature review. Optik—Int. J. Light Electron Opt. (2016)
2. M.S. Allili, Wavelet-based texture retrieval using a mixture of generalized Gaussian distributions, in Proceedings of IEEE International Conference on Pattern Recognition (2010), pp. 3143–3146
3. M.S. Allili, N. Baaziz, Contourlet-based texture retrieval using a mixture of generalized Gaussian distributions, in Computer Analysis of Images and Patterns (Lecture Notes in Computer Science), vol. 6855 (Springer)
4. B.Z.C. Tang, A method for defect detection of yarn-dyed fabric based on frequency domain filtering and similarity measurement. Autex Res. J. 19(3) (2018)
5. Z. Wang, J. Jing, Pixel-wise fabric defect detection by CNNs without labeled training data. IEEE Access PP(99), 1:1 (2020)
6. W. Ouyang, B. Xu, J. Hou, X. Yuan, Fabric defect detection using activation layer embedded convolutional neural network. IEEE Access PP(99) (2019)
7. J. Jing, D. Zhuo, H. Zhang, Y. Liang, M. Zheng, Fabric defect detection using the improved YOLOv3 model. J. Eng. Fib. Fabr. 15(1) (2020)
8. W. Wang, N. Deng, Sequential detection of image defects for patterned fabrics. IEEE Access 8, 174751–174762 (2020)
9. G. Zhang, K. Cui, T.-Y. Hung, S. Lu, Defect-GAN: high-fidelity defect synthesis for automated defect inspection. arXiv:2103.15158v1 [cs.CV] (2021)
10. J.-F. Jing, H. Ma, H.-H. Zhang, Automatic fabric defect detection using a deep convolutional neural network. Colorat. Tech. 14 (2019)
11. K.B. Franklin, R. Kumar, C. Nayak, A hollow core Bragg fiber with multilayered random defect for refractive index sensing, in Inventive Communication and Computational Technologies (Springer, Singapore, 2020), pp. 381–389
12. J.F. Jing, H. Ma, H.H. Zhang, Automatic fabric defect detection using a deep convolutional neural network. Rev. Prog. Colorat. Related Top. 135(3) (2019)
13. M. Jawahar, L. Jani Anbarasi, S. Graceline Jasmine, M. Narendra, R. Venba, V. Karthik, A machine learning-based multi-feature extraction method for leather defect classification, vol. 173 (Springer Science and Business Media Deutschland GmbH, LNNS, 2021), pp. 189–202
Fabric Defect Detection Using Deep Learning Techniques
113
14. S. Zhao, L. Yin, J. Zhang, J. Wang, R. Zhong Real-time fabric defect detection based on multi-scale convolutional neural network, IET Inst. Eng. Tech. 2(4), 189–196 (2020) 15. C. Li, J. Li, Y. Li, L. He, X. Fu, J. Chen, Fabric defect detection in textile manufacturing: a survey of the state of the art in security and communication networks. Volume 2021 |Article ID 9948808 (2021)
Analysis of Research Paper Titles Containing Covid-19 Keyword Using Various Visualization Techniques Mangesh Bedekar and Sharmishta Desai
Abstract It has been around two years since the outbreak of the coronavirus disease, labeled Covid-19, and there has been an explosion of literature published by research scholars on work related to Covid-19. Covid-19 as a keyword is mentioned in the titles of most of these papers. This paper analyses the number of papers, and the titles of papers, that include Covid-19 in the title. The various combinations of other words, such as prefixes, suffixes, and N-gram combinations with the keyword Covid-19 in these titles, were also analysed. The research publication repositories analysed were IEEE Xplore, ACM Digital Library, Semantic Scholar, Google Scholar, Cornell University, etc. The title analysis was restricted to computer science/computer engineering related papers. As the name Covid-19 was assigned to the coronavirus outbreak in 2020, the timeline of title analysis was restricted from 2019 till December 2021. The term Covid-19 is also one of the most searched terms in most of these research repositories, as is evident from the search suggestions they offer. Considering the usefulness of the Bag of Words and N-gram algorithms in analytics and data visualization, a methodology based on the Bag of Words algorithm is proposed and implemented to perform prefix- and suffix-word analysis. This methodology correctly identifies the different prefix and suffix words used by various researchers to demonstrate the significance of their titles. A methodology based on N-gram analysis is found effective for identifying the topics on which most researchers have worked. Word clouds are generated to demonstrate the different buzzwords used by researchers in their paper titles. These techniques are useful for visualizing large volumes of data.
M. Bedekar (B) · S. Desai School of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Kothrud, Pune, Maharashtra 411038, India e-mail: [email protected] S. Desai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_10
1 Introduction Covid-19 has dramatically affected everyone's life worldwide. The economic and social disruption caused by the pandemic is shocking: tens of millions of people are at risk of falling into extreme poverty, and millions of enterprises face an existential threat. Almost every sector has been affected by the Covid pandemic. At the same time, the work done by researchers in the domain of Covid-19 and related domains has increased tremendously. This paper analyses the number of papers, and the titles of papers, that include Covid-19 in the title. The various combinations of other words, such as prefixes, suffixes, and N-gram combinations with the keyword Covid-19 in these titles, were also analysed. As the name Covid-19 was assigned to the coronavirus outbreak in 2020, the timeline of title analysis was restricted from 2019 till date. This paper analyses the research done under the computer domain during the pandemic period. To analyse the research titles effectively, a combination of the Bag of Words and N-gram algorithms is used. The Bag of Words algorithm extracts keywords from text, which can then be used for further analysis. The N-gram algorithm finds the frequency of sequences of words in a text. Titles containing "Covid", "Covid-19", or "Corona" as a keyword were collected from major digital libraries such as ACM, IEEE, Semantic Scholar, Google Scholar, etc. Analysis is done to identify the keywords most often preferred as suffixes and prefixes of the keyword "Covid" or "Corona". N-gram analysis of the titles identifies the topics on which most researchers have worked. Word clouds are generated to demonstrate the different buzzwords used by researchers in their paper titles to highlight the significance of their research.
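The two text-analysis steps described above can be sketched in plain Python; the sample titles and the stop-word list below are illustrative assumptions, not data from the study.

```python
from collections import Counter

# Illustrative titles; the study scraped real titles from digital libraries.
titles = [
    "Sentiment analysis of tweets during the Covid-19 pandemic",
    "Fake news detection related to the Covid-19 outbreak",
    "Impact of the Covid-19 pandemic on online education",
]

STOPWORDS = {"of", "the", "on", "to", "during", "a", "an", "related"}

def bag_of_words(texts):
    """Bag of Words: keyword frequencies, ignoring order and stop words."""
    return Counter(w for t in texts for w in t.lower().split()
                   if w not in STOPWORDS)

def ngrams(texts, n):
    """Frequency of every contiguous n-word sequence in the titles."""
    counts = Counter()
    for t in texts:
        tokens = t.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

print(bag_of_words(titles).most_common(3))  # keyword frequencies
print(ngrams(titles, 2).most_common(3))     # bigrams keep word order
```

In the paper, frequency tables of this kind feed the prefix/suffix tables, word clouds, and tree maps described in the later sections.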
1.1 Statistics of Papers from Each Research Repository It can be observed from Table 1 that research related to Corona started in earnest at the end of the year 2019, when the nomenclature for the Corona disease, "Covid-19", was finalized. It can also be observed that there are more papers in the Google Scholar and Mendeley databases and on Semantic Scholar, in decreasing order. Google Scholar has the highest number of papers, as it contains papers from all domains (including medical and technical). The year 2019–20 is the year in which most research on Corona was published. Data were collected till December 2021.

Table 1 Statistics of number of papers from each repository

Publisher                        2018–19   2019–20   2020–21
IEEE Computer Society of India       125       585       420
ACM                                   58       597       300
Cornell University                     0         1      1190
Semantic Scholar                      36      8930      9560
Google Scholar                      1880    27,600    81,700
Mendeley                             517   185,873    26,553

Fig. 1 Paper publication across multiple repositories
1.2 Comparative Analysis of Number of Papers Across These Repositories A comparative analysis of the year-wise papers published under different publishers is shown in Fig. 1. It shows that papers appeared in the databases of the considered repositories from the year 2018–19 onwards.
2 Literature Review For visualizing large volumes of data and for critical decision making, different techniques are useful, such as Bag of Words, N-gram analysis, and word clouds (or tag clouds). Algorithms like Bag of Words and N-gram analysis are used by authors in the literature for text mining and analysis. In [9], the authors classified human actions using a visual N-gram algorithm; a bag of visual words is also used to improve the system. The proposed method removes the problem of out-of-vocabulary conversion of human actions to words, and a graph-based N-gram method is followed to improve the algorithm. A word-based attention model built on pairs of words is designed using the N-gram model in [10]. In [11], the authors proposed a method based on naïve Bayes and Support Vector Machines for text classification, where the N-gram method is used
to capture the order of words, and a bag-of-n-words representation is used to represent the text document. The problem of high-dimensional vector spaces is addressed in [12], where a constant feature space based on a standard alphabet is proposed. It avoids the use of a document vocabulary space and NLTK tools, which helps to reduce the dimension of the vector space. In [13], a method is proposed to organize and reuse safety instructions using an ontology; this method can be correlated with the method used here for research paper title analysis. Arabic text classification using the N-gram method is illustrated in [14], where frequency statistics are calculated with the proposed method. In [15], the authors proposed a methodology for image classification using the Bag of Words algorithm; pooling and coding methods are proposed and evaluated for image feature extraction. An image classification method using a graph-based Bag of Words algorithm is proposed in [16]. Image representation using the Bag of Words algorithm is proposed by Li et al. [17], where the authors used Inverse Document Frequency (IDF) to optimize the Bag of Words algorithm. Word clouds have been studied and extended by many authors in the literature. The authors of [17] created a word cloud application for smartphones named "Pediacloud" for visualizing links between text and images. The capabilities of tag clouds or word clouds [18] are elaborated in detail in [19].
3 Word Clouds/Tag Clouds Visualization of the Titles Word clouds/tag clouds are useful for representing textual data so that it can be visualized better. They are useful for drawing attention to data and keywords when the data is large, and they aid decision making and critical observation [19]. By writing scrapers in Python using the Beautiful Soup library, paper titles related to Covid were successfully extracted. The extracted titles are stored in a file database. Pre-processing of the titles is done to remove punctuation marks, stopwords, etc. Keywords are extracted by applying the Bag of Words algorithm, and combinations of keywords are extracted further using the N-gram algorithm. Python scripts were written to extract 2-, 3-, or 4-grams from the extracted titles. This process is demonstrated in Fig. 2. A visual representation of the different keywords used by researchers in their titles is shown by drawing word clouds [18], as given in Fig. 3a, b.
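The pre-processing step described above can be sketched as follows; the sample titles and stop-word list are illustrative assumptions, and the final rendering (e.g. with the third-party wordcloud package) is shown only as a comment.

```python
import string
from collections import Counter

STOPWORDS = {"the", "of", "a", "an", "on", "and", "in", "for", "to", "during"}

def preprocess(title):
    """Lowercase, strip punctuation, and drop stop words from one title.
    Note that '-' is punctuation too, so "Covid-19" becomes "covid19"."""
    cleaned = title.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOPWORDS]

# Illustrative titles standing in for the scraped file database.
titles = [
    "Impact of Covid-19 on Supply Chains: A Review",
    "Detecting Fake News During the Covid-19 Pandemic",
]

frequencies = Counter(w for t in titles for w in preprocess(t))
print(frequencies.most_common(3))
# The frequencies can then be rendered, e.g. with the third-party
# wordcloud package:
#   WordCloud().generate_from_frequencies(frequencies).to_file("cloud.png")
```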
4 Prefix and Suffix Word Analysis and Visualization The Bag of Words algorithm is used for text processing and for extracting features from documents. For analysing the titles of research papers mentioning Covid-19, suffix- and prefix-word analysis is done to understand authors' tendencies in using keywords
Fig. 2 Visualization of research titles using word/tag cloud
Fig. 3 a Word cloud for Cornell University repository; b Word cloud for Google Scholar repository
before and after "Covid". For this analysis, a methodology based on the Bag of Words algorithm is proposed, as depicted in Fig. 4.

Fig. 4 Methodology followed for prefix and suffix analysis

Further analysis of the prefix and suffix words frequently used with "Covid-19" in different research titles is shown in Table 2.

Table 2 Frequently used suffix and prefix with "Covid" keyword

Most frequently used suffix   Most frequently used prefix
Pandemic                      Impact
Epidemic                      Context
Outbreak                      Review
Pneumonia                     Novel
Treatment                     Infection
Response                      Research
Effects                       Fake news
Model                         Prevention

It can be observed that most researchers have used the words "pandemic" or "outbreak" to show the significance of their titles. Some have done research on the spread of fake news with respect to Covid.
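A minimal sketch of the prefix/suffix counting step is given below; the titles are illustrative, and for simplicity it counts only the immediately adjacent words (the study may well have skipped stop words first).

```python
from collections import Counter

KEYWORDS = {"covid", "covid-19", "covid19", "corona"}

def prefix_suffix_counts(titles):
    """Count the word immediately before (prefix) and after (suffix)
    each occurrence of a Covid keyword in the titles."""
    prefixes, suffixes = Counter(), Counter()
    for title in titles:
        tokens = title.lower().replace(":", "").split()
        for i, tok in enumerate(tokens):
            if tok in KEYWORDS:
                if i > 0:
                    prefixes[tokens[i - 1]] += 1
                if i + 1 < len(tokens):
                    suffixes[tokens[i + 1]] += 1
    return prefixes, suffixes

titles = [
    "Impact of Covid-19 pandemic on education",
    "Modelling the Covid-19 outbreak in India",
]
prefixes, suffixes = prefix_suffix_counts(titles)
print(prefixes.most_common(), suffixes.most_common())
```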
5 Tree Map Visualization of the Titles Tree maps visualize data using rectangles of different sizes and colours; the size of each rectangle signifies its contribution to the whole dataset. The results show the aspects of Corona on which the maximum research work has been done. To draw the tree map, N-gram analysis is applied to extract the combinations of various keywords used in the research titles. The tree map drawn from the Google Scholar database is shown in Fig. 5.
Fig. 5 Tree map for N gram visualization on Google Scholar
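The proportional-area computation behind a tree map can be sketched as below; the bigram counts are illustrative, and an actual plot would use a layout library (e.g. the third-party squarify package, mentioned only in a comment).

```python
def treemap_areas(counts, width, height):
    """Assign each label a rectangle area proportional to its count
    within a canvas of the given total size."""
    total = sum(counts.values())
    canvas = width * height
    return {label: canvas * c / total for label, c in counts.items()}

# Illustrative bigram frequencies from N-gram analysis of titles.
bigrams = {"covid-19 pandemic": 50, "covid-19 outbreak": 30, "fake news": 20}
areas = treemap_areas(bigrams, width=100, height=100)
print(areas)
# A layout library such as squarify can turn these proportional areas
# into the nested rectangles shown in Fig. 5.
```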
6 Comments and Discussion N-gram analysis has shown that the phrase "Covid-19 pandemic" is used by most researchers across the different publication repositories. Computer-science researchers have used words like fake news detection, sentiment analysis, or classification in their titles. These publications may be objectively considered and critically reviewed as to whether they are truly contributing to the body of knowledge or are just following the hype of using popular keywords in paper titles and thus piggybacking on the keyword Covid-19.
7 Conclusion Along with the major social and economic disruption, an analysis was done to find how Covid has affected research domains and the titles of research papers communicated by researchers. In the year 2019–20, the maximum number of papers with Covid-19 in the title appeared under all major publishers. N-gram analysis was done as well to check which words are frequently preferred by researchers in their titles. Observations from the prefix and suffix analysis suggest that more than 70% of researchers have used "pandemic" or "outbreak" alongside the Covid keyword in their research paper titles. Visual representation is possible using word clouds, tag clouds, or tree maps; for using these visualization techniques correctly, the Bag of Words and N-gram algorithms can be used for the required keyword extraction.
References
1. https://scholar.google.com/
2. https://www.semanticscholar.org/
3. https://www.library.cornell.edu/arxiv
4. https://dl.acm.org/
5. https://ieeexplore.ieee.org/Xplore/home.jsp
6. https://www.mendeley.com/
7. https://www.wordclouds.com/
8. https://colab.research.google.com/
9. R. Hernández-García, J. Ramos-Cózar, N. Guil, E. García-Reyes, H. Sahli, Improving bag-of-visual-words model using visual n-grams for human action classification. Expert Syst. Appl. 92, 182–191 (2018)
10. I. Lopez-Gazpio, M. Maritxalar, M. Lapata, E. Agirre, Word n-gram attention models for sentence similarity and inference. Expert Syst. Appl. 132, 1–11 (2019)
11. B. Li, Z. Zhao, T. Liu, P. Wang, X. Du, Weighted neural bag-of-n-grams model: new baselines for text classification, in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (Osaka, Japan, 2016), pp. 1591–1600
12. F. Elghannam, Text representation and classification based on bi-gram alphabet. J. King Saud Univ. Comput. Inf. Sci. 33(2), 235–242 (2021)
13. S. Zhang, F. Boukamp, J. Teizer, Ontology-based semantic modeling of construction safety knowledge: towards automated safety planning for Job Hazard Analysis (JHA). Autom. Constr. 52, 29–41 (2015)
14. L. Khreisat, Arabic text classification using N-gram frequency statistics: a comparative study, in Proceedings of the 2006 International Conference on Data Mining (2006), pp. 78–82
15. C. Wang, K. Huang, How to use bag-of-words model better for image classification. Image Vis. Comput. 38, 65–74 (2015)
16. F.B. Silva, R. de O. Werneck, S. Goldenstein, S. Tabbone, R. da S. Torres, Graph-based bag-of-words for classification. Pattern Recogn. 74, 266–285 (2018)
17. Q. Li, H. Zhang, J. Guo, B. Bhanu, L. An, Improving bag-of-words scheme for scene categorization. J. China Univ. Posts Telecommun. 19(Supplement 2), 166–171 (2012)
18. B. Tessem, S. Bjørnestad, W. Chen, L. Nyre, Word cloud visualisation of locative information. J. Locat. Based Serv. 9(4) (2015)
19. J.R.C. Nurse, I. Agrafiotis, M. Goldsmith, S. Creese, K. Lamberts, Tag clouds with a twist: using tag clouds coloured by information's trustworthiness to support situational awareness. J. Trust Manage. 2 (2015). https://doi.org/10.1186/s40493-015-0021-5
Survey on Handwritten Characters Recognition in Deep Learning M. Malini and K. S. Hemanth
Abstract Handwritten documents are used in every part of life. A document is proof of communication and needs to be preserved digitally. When a handwritten document is scanned into digital format, the system should be able to recognize and store each character in the document. This pattern recognition task needs to be performed effectively with deep learning, where the model adapts multiple neural networks to learn from and test on an enormous number of observations. This paper surveys the various techniques used to recognize handwritten characters in the South Indian regional languages, presents the datasets used in training and testing such models, and reports statistics on the number of works done over two consecutive years in the four regional languages of South India.
1 Introduction The South Indian states Andhra Pradesh, Karnataka, Tamil Nadu, and Kerala use the languages Telugu, Kannada, Tamil, and Malayalam, respectively, in their day-to-day life. The state governments use their respective languages for governance activities. Government sectors such as hospitals, courts, and schools use the regional language to communicate and to store proof of communication digitally. The identification of handwritten characters is still in its infancy. The present survey emphasizes the identification and feature extraction of handwritten characters in various languages. This is a most challenging task in pattern recognition, as the text is handwritten and these languages have similarly shaped consonants and combinations of consonants, such as consonant–consonant (conjunct consonants) and consonant–vowel combinations, which lead to complexity in classifying the extracted characters. To extract and classify handwritten characters, the deep learning arena is chosen. In deep learning, the patterns are
M. Malini (B) · K. S. Hemanth Reva University, Bengaluru, India e-mail: [email protected] K. S. Hemanth e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_11
recognized the way the human brain identifies each pattern. There are several architectural models, such as the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Deep Neural Network (DNN). A CNN converts segmented characters into a form in which the prediction of each character becomes approximately perfect. An RNN can be used to identify words, as it can remember previous characters. A DNN analyzes the given data after training and decides the prominent features on its own.
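As a rough illustration of the feature-extraction step a CNN applies to a scanned character, the following sketch implements one convolution plus max-pooling pass in plain Python; the 4×4 "image" and the edge-detecting kernel are illustrative assumptions, not part of any surveyed system.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what deep learning libraries call
    convolution) of a grayscale image with a small kernel."""
    ih, iw, kh, kw = len(image), len(image[0]), len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(iw - kw + 1)]
            for r in range(ih - kh + 1)]

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling over size x size windows."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# A 4x4 "image" of a vertical stroke and a vertical-edge kernel.
image = [[0, 1, 1, 0]] * 4
kernel = [[1, -1],
          [1, -1]]
feature_map = conv2d(image, kernel)  # 3x3 map; responds at the stroke edges
pooled = max_pool(feature_map)       # 2x2 pooling shrinks it further
print(feature_map, pooled)
```

Real recognizers stack many such layers (with learned kernels) before a classifier, as described in the literature survey below.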
2 Preface on South Regional Languages This section introduces the languages of the South Indian states. The South Indian states, listed alphabetically, are Andhra Pradesh, Karnataka, Kerala, Tamil Nadu, and Telangana.
2.1 Kannada Language The Kannada language is mostly spoken in Karnataka, and the Kannada script consists of 49 letters: 15 vowels and 34 consonants (Fig. 1). Consonants take modified shapes when combined with vowels. Fig. 1 Kannada vowels and consonants. Source http://www.somyatrans.com/kannada-translation-agency.html
Fig. 2 Tamil alphabets. Source Karthigaiselvi and Kathirvalavakumar [1, p. 45]
2.2 Tamil Language The Tamil language is spoken in Tamil Nadu in South India. The Tamil script has 12 vowels, 18 consonants, and one special letter, as shown in Fig. 2. Vowels and consonants are combined to form 216 consonant–vowel compound characters; in total, the Tamil script has 247 characters.
2.3 Telugu Language Telugu is spoken in Indian states viz., Telangana and Andhra Pradesh. Script consists of 16 vowels, 37 consonants (Figs. 3 and 4).
Fig. 3 Telugu vowels. Source Rama and Henge [2, p. 26]
Fig. 4 Telugu consonants. Source Rama and Henge [2, p. 26]
2.4 Malayalam Language The language is usually spoken in the Kerala state of India. The Malayalam script consists of 16 vowels and 37 consonants (Figs. 5 and 6).
3 Literature Survey A CNN is used to reduce the complexity of parameters and update the weights while retaining the learning capability of the model. It can be used to learn handwritten characters better, and it is also used in applications such as image classification, object detection, Natural Language Processing, and so on (Fig. 7). Fig. 5 Malayalam vowels. Source http://malayalamstudent.blogspot.com/2007/
Fig. 6 Malayalam consonants. Source http://malayalamstudent.blogspot.com/2007/
Fig. 7 A CNN sequence to identify handwritten digits. Source Towards Data Science [3]
A CNN deals with a huge amount of information, so several features can be extracted at the same time. The handwritten image is fed as input to the neural network; the hidden layers then perform feature extraction using various functional calculations according to their defined operations. Convolutional layers and pooling layers, with padding, are used repeatedly so that the features of the image are extracted. At the end, the model identifies the letter in the image. Using a CNN architecture, Sharma and Kaushik [4] proposed a framework for identifying Bengali characters using Deep Neural Networks and a particle swarm optimization technique that could perform better. Cross-entropy could be used to drastically reduce the error rate and loss function while training the model in
CNN; this was implemented by Rao et al. [5] to extract features, but the time taken to train the model is high. If models developed on a GPU (Graphics Processing Unit) are trained, the entire set of operations can be performed quickly: a GPU is a processor with dedicated memory for operations that use floating point for graphics, which decreases the number of CPU (Central Processing Unit) cycles. There is a model named AksharaNet, where Siddiqua [6] describes GPU training with a dataset named KSIC (Kannada Scene Individual Character). The model could be trained effectively compared with a CPU model, but when the data were tested, the accuracy reduced due to the GPU being uninstalled; the authors also claim that overlapping of letters might be another reason for the reduced accuracy on the handwritten language. To train on a dataset quickly, another method can be implemented that uses a pre-trained model. Chauhan et al. [7] discuss a deep learning architecture that makes use of transfer learning, and also describe an image-augmentation strategy included in the architecture for speedy recognition of handwritten characters. The architecture, named HCR-Net (Handwritten Character Recognition Network), was trained on different datasets in different languages, but the experiment remains challenging when the handwritten scripts vary in style and font. The different CNN-based methods include a supervised layer in a deep CNN architecture, Artificial Neural Networks, Deep Belief Networks, Gabor features in CNNs, DenseNet, Deep Forward Neural Networks, the Adam method, RMSProp (Root Mean Square Propagation), Divide and Merge Mapping, and Optimal Path Finder, which were among the prominent methods discussed by Rajyagor and Rakhlia in [8]. A CNND (Convolutional Neural Network with Digital Logic) architecture was introduced in [9], which had the additional feature of digital logic. Rajalakshmi et al.
[9] also provided details of various CNN-based methods with which better accuracy in identifying handwritten characters is possible. Siddiqua et al. [10] discuss training on the dataset using stochastic gradient descent with momentum and a transfer-learning neural network; the algorithm is also evaluated with mini-batches. Several CNN models were used, such as AlexNet, GoogLeNet, Inception V3, ResNet-50 (Residual Neural Network), SqueezeNet, and VGG-19 (Visual Geometry Group-19), based on the input image size, scaled as required. Shaffi and Hajamohideen [11] also discuss training a CNN model for handwritten Tamil characters with a huge dataset for the Tamil script named uTHCD (unconstrained Tamil Handwritten Character Dataset). An extensive study was made of existing datasets and the newly developed one, and the new dataset performed better during both training and testing. When a CNN architecture extracts features, it passes them to max-pooling and then further reduces them to scaled features; during this process, many prominent features might be lost. To reduce the loss, the features are transformed into vectors. These vectors provide predicted values as well as the required parameters, from which the orientation is found. In [12], Ramesh et al. describe Capsule Networks, which overcome these limitations of CNNs: a group of neurons in a particular network is termed a capsule, and the activation of certain neurons activates a capsule that provides an output vector preserving the orientation of the entities. This helps to
extract letters and words as well. Nayak and Chandwani [13] study handwritten recognition using a CNN and an RNN with a CTC (Connectionist Temporal Classification) layer in TensorFlow, training the model on word images; the model designed provides a line-recognition system that allows better training options. Ingle et al. [14] use an online system for training a model specific to a huge variety of handwritten characters in the Latin script; they describe a data-generation pipeline integrated with a complete recognition system, that is, an OCR model, so as to yield the best result. Sudana and Gunaya [15] propose a method to identify handwritten characters in English by extracting whole sentences rather than each character at a time, which is useful when a particular class of writer needs to be identified. The model discussed by Priya and Haneef [16] was trained on different words; including scattering wavelet filters in the CNN architecture enhances the capacity of the model to extract features from handwritten Malayalam input patterns. The experiment used a multi-class classifier to classify the extracted data, which provided good results. An Android application was developed by Mor et al. [17] to detect handwritten English alphanumeric characters using a CNN. The app is a GUI built with TensorFlow and Keras; the architecture uses a transfer-learning approach, and the model was trained on the EMNIST dataset, with the Adamax optimizer producing better results. When the user writes on the screen, the application extracts the features and identifies the characters. To improve accuracy in training, dropout could be used to avoid overfitting. In [18], Kavitha and Srimathi use a CNN model to recognize Tamil handwritten scripts, training the model on the dataset developed by HP Labs India. Xavier initialization of the weights was used.
Further, the ReLU (Rectified Linear Unit) activation and a Softmax classifier were used in the experiment. Hybrid models are an option to increase the accuracy for a language. In [19], Iyswarya et al. used hybrid models that included a CNN with the Adam (Adaptive Moment Estimation) optimizer and a CNN with the Lion optimizer; the RMSProp and AdaGrad optimizers were also explicitly used to measure accuracy. The Adam optimizer performed well even when the data had noise, and RMSProp was able to adapt the learning rate based on its running history of past gradients.
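The optimizer behaviour mentioned above can be made concrete with a minimal single-parameter sketch of the standard RMSProp and Adam update rules; the toy objective and hyperparameters are illustrative, not taken from any surveyed experiment.

```python
import math

def rmsprop_step(theta, grad, state, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp: scale the step by a running average of squared gradients."""
    state["v"] = beta * state["v"] + (1 - beta) * grad ** 2
    return theta - lr * grad / (math.sqrt(state["v"]) + eps)

def adam_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected running averages of gradients and their squares."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimising f(theta) = theta^2, whose gradient is 2 * theta.
theta, state = 1.0, {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(200):
    theta = adam_step(theta, 2 * theta, state)
print(theta)  # moves from 1.0 towards the minimum at 0
```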
4 Datasets Available for South Indian Languages 4.1 Kannada Datasets KHTD [20]: It has 204 documents written by 51 individuals in Kannada, with a total of 4298 text lines and 26,115 words. Chars74K dataset [21]: It has 657 Kannada classes, each comprising 25 handwritten character samples. The dataset also contains English characters in 62 classes, storing 7705 characters from natural images, 3410 handwritten characters, and 62,992 synthesized characters.
Kannada-MNIST [22]: It contains Kannada handwritten digit images, along with an additional set of about 10K images called the 'dig-MNIST' dataset.
4.2 Tamil Datasets For Tamil handwritten characters [23], the Isolated Handwritten Tamil Characters datasets (hpl-tamil-iso-char, hpl-tamil-iso-char-train, and hpl-tamil-iso-char-test) are readily available to download.
4.3 Telugu Datasets The handwritten Telugu Vowel dataset [24] can be downloaded from the given link and used for training and testing. The Lipi Toolkit [25] is used to download the isolated handwritten Telugu character datasets hpl-telugu-iso-char, hpl-telugu-iso-char-train, and hpl-telugu-iso-char-test. IEEE Dataport [26] contains a dataset of Telugu handwritten characters, including vowels, consonants, consonant–consonant combinations, and consonant–vowel combinations.
4.4 Malayalam Datasets Amrita_MalCharDb [27, 28] contains a dataset of Malayalam handwritten characters with 85 Malayalam character classes representing vowels, consonants, half-consonants, vowel modifiers, consonant modifiers, and conjunct characters.
5 Statistics on the Number of Works Done on Each Regional Language The total number of research works on each South Indian language was found based on papers published in IEEE, Elsevier, and Springer journals in two years, namely 2021 and 2020 (Tables 1 and 2; Charts 1 and 2).
Table 1 In 2021, the total number of works by respective languages was roughly approximated

Languages    IEEE   Elsevier   Springer   Total
Kannada         9          3          2      14
Tamil           5          2          9      16
Telugu         10          2         10      22
Malayalam       7          1          9      17
Total          31          8         30
Table 2 In 2020, the total number of works in respective languages

Languages    IEEE   Elsevier   Springer   Total
Kannada         9          2          9      20
Tamil           9          3          4      16
Telugu         11          3         14      28
Malayalam      10          4          5      19
Total          39         12         32
Chart 1 The chart informs that the most work in 2021 was done on the Telugu language (bar chart of the total number of journal papers per publisher, IEEE/Elsevier/Springer, for Kannada, Tamil, Telugu, and Malayalam)

Chart 2 The chart reports that, comparatively, the most work in the year 2020 was done on the Telugu language (bar chart of the total number of journal papers per publisher, IEEE/Elsevier/Springer, for Kannada, Tamil, Telugu, and Malayalam)
6 Conclusion This survey provided considerable detail about the techniques used for extracting features of handwritten South Indian characters using deep learning. It also provided links to the South Indian regional-language datasets, so that the benchmark datasets can be downloaded to train and test models. The approximate number of research works done on each South Indian regional language was studied carefully over two years across three publishers. Accordingly, Telugu handwritten character recognition takes the leading place over the other language areas.
References
1. M. Karthigaiselvi, T. Kathirvalavakumar, Structural run based feature vector to classify printed Tamil characters using neural network. Int. J. Eng. Res. Appl. 7(7), Part-1, 44–63 (July 2017). ISSN: 2248-9622
2. B. Rama, S.K. Henge, OCR-the 3 layered approach for decision making state and identification of Telugu hand written and printed consonants and conjunct consonants by using advanced fuzzy logic controller. Int. J. Artif. Intell. Appl. (IJAIA) 7(3), 23–35 (May 2016). https://doi.org/10.5121/ijaia.2016.7303
3. Towards Data Science. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
4. R. Sharma, B. Kaushik, Offline recognition of handwritten Indic scripts: a state-of-the-art survey and future perspectives. Comput. Sci. Rev. (2020). https://doi.org/10.1016/j.cosrev.2020.100302
5. A.S. Rao, S. Sandhya, S. Nayak, C. Nayak, Exploring deep learning techniques for Kannada handwritten character recognition: a boon for digitization. Int. J. Adv. Sci. Technol. (IJAST) 29(5), 11078–11093 (2020). ISSN: 2005-4238
6. S. Siddiqua, N. Chikkaguddaiah, S.S. Manvi, M. Aradhya, AksharaNet: a GPU accelerated modified depth-wise separable convolution for Kannada text classification. Rev. d'Intell. Artif. 35(2), 145–152 (April 2021). https://doi.org/10.18280/ria.350206
7. V.K. Chauhan, S. Singh, A. Sharma, HCR-Net: a deep learning based script independent handwritten character recognition network (2021). arXiv:2108.06663v1 [cs.CV], 15 Aug 2021
8. B. Rajyagor, R. Rakhlia, Handwritten character recognition using deep learning. Int. J. Recent Technol. Eng. (2020). https://doi.org/10.35940/ijrte.F8608.038620
9. M. Rajalakshmi, P. Saranya, P. Shanmugavadivu, Pattern recognition—recognition of handwritten document using convolutional neural networks (IEEE, 2019)
10. S. Siddiqua, C. Naveena, S.S. Manvi, Recognition of Kannada characters in scene images using neural networks, in Fifth International Conference on Image Information Processing (ICIIP) (2019)
11. N. Shaffi, F. Hajamohideen, uTHCD: a new benchmarking for Tamil handwritten OCR. IEEE Access 9, 101469–101493 (2021)
12. G. Ramesh, J. Manoj Balaji, G.N. Sharma, H.N. Champa, Recognition of off-line Kannada handwritten characters by deep learning using capsule network. Int. J. Eng. Adv. Technol. (IJEAT) 8(6) (August 2019). ISSN: 2249-8958
13. P. Nayak, S. Chandwani, Improved offline optical handwritten character recognition: a comprehensive review using TensorFlow. Int. J. Eng. Res. Technol. (IJERT) 10(11) (November 2021). ISSN: 2278-0181
Survey on Handwritten Characters Recognition in Deep Learning
14. R. Reeve Ingle, Y. Fujii, T. Deselaers, J. Baccash, A.C. Popat, A scalable handwritten text recognition system, in 2019 International Conference on Document Analysis and Recognition (2019). 2379-2140/19/$31.00 ©2019 IEEE. https://doi.org/10.1109/ICDAR.2019.00013 15. O. Sudana, W. Gunaya, Handwriting identification using deep convolutional neural network method. J. Telkomnika Telecommun. Comput. Electron. Control 18(4), August 2020, 1934~1941 (2018). ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018. https://doi.org/10.12928/TELKOMNIKA.v18i4.14864 16. P. Priya, A.M. Haneef, Malayalam handwritten character recognition. Int. Res. J. Eng. Technol. (IRJET) 07(07), 4307–4313 (2020, July). e-ISSN: 2395-0056, p-ISSN: 2395-0072 17. S.S. Mor, S. Solanki, S. Gupta, S. Dhingra, M. Jain, R. Saxena, Handwritten text recognition: with deep learning and android. Int. J. Eng. Adv. Technol. (IJEAT) 8(3S) (2019, February). ISSN: 2249-8958 18. B.R. Kavitha, C. Srimathi, Benchmarking on offline handwritten Tamil character recognition using convolutional neural networks. J. King Saud Univ. Comput. Inf. Sci. (2019). https://doi. org/10.1016/j.jksuci.2019.06.004 19. R. Iyswarya, S. Deepak, P Jagathratchagan, J. Kailash, Handwritten Tamil character recognition using convolution neural network by Adam optimizer. Int. J. Adv. Res. Sci. Commun. Technol. (IJARSCT) 6(1) (2021, June). ISSN (Online) 2581-9429. https://doi.org/10.48175/IJARSCT1356 20. A. Alaei, P. Nagabhushan, U. Pal, A benchmark Kannada handwritten document dataset and its segmentation, in 2011 International Conference on Document Analysis and Recognition (2011), pp. 141–145. 1520-5363/11 $26.00 © 2011 IEEE. https://doi.org/10.1109/ICDAR.201 1.37 21. The Chars74K Dataset. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ 22. Kaggle. https://www.kaggle.com/c/Kannada-MNIST 23. Lipi Toolkit. http://lipitk.sourceforge.net/datasets/tamilchardata.htm 24. Kaggle. 
https://www.kaggle.com/syamkakarla/telugu-6-vowel-dataset 25. Lipi Toolkit. http://lipitk.sourceforge.net/datasets/teluguchardata.htm 26. IEEE Dataport. https://ieee-dataport.org/open-access/telugu-handwritten-character-dataset 27. TC-11 Reading Systems. http://tc11.cvc.uab.es/datasets/Amrita_MalCharDb_1 28. Kaggle. https://www.kaggle.com/ajayjames/malayalam-handwritten-character-dataset
A Survey on Wild Creatures Alert System to Protect Agriculture Lands, Domestic Creatures and People K. Makanyadevi, M. Aarthi, P. Kavyadharsini, S. Keerthika, and M. Sabitha
Abstract A machine-learning-based wild-animal alert system is used to protect property, borders, and runways from wild animals, supported by big-data study of dangerous wild animals. Monitoring the activity of wild animals helps secure farmlands. Traditional ways of spotting animals in paddy fields and homesteads rely on the naked eye to notice animal movement, but it is not feasible for people to watch for animal movement continuously throughout the day. Hence the need for reliable detection of the animals that intrude heavily into rice fields and people's agricultural lands. The methods used for detecting the animals include segmentation and object-detection techniques. Animal intrusion hampers the growth of yields and causes farm failure, and repeated raids by wild animals have even driven farmers to abandon their land.
1 Introduction Farming is a major source of food for a great number of people in several parts of the world. Unfortunately, farmers still depend on old methods that have evolved only gradually over the years, and the growth of crops is declining [1–3]. In addition, a variety of factors, including disturbance by wild animals, lowers crop yield. Over the past few years, wild animals have become a particular problem for farmers all around the world: animals such as elephants, bison, and panthers cause major damage to yield by moving over the fields and trampling the crops. This leads to monetary losses for the farmer [4–7]. Farmers with large areas of agricultural land find it extremely tedious to irrigate K. Makanyadevi (B) · M. Aarthi · P. Kavyadharsini · S. Keerthika · M. Sabitha Department of Computer Science and Engineering, M. Kumarasamy College of Engineering, Karur, Tamil Nadu 639113, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_12
their land manually [8–10]. Crop damage results in reduced crop yield. Because of the expansion of farmland into former wildlife habitats, crop raiding has become one of the main sources of human–wildlife conflict. Population growth drives deforestation, which ends in scarcity of food, shelter, and water in forest areas [11–13]. Consequently, wild animals increasingly enter inhabited areas and damage human property; wild animals come into urban areas in search of food [14–16]. Agriculture is the main occupation of farmers and also plays a major role in the economy, but wild-animal interference on farmland leads to heavy losses in the field. Occasionally, animals kill people as well. The safety of both people and animals is important, but fields cannot be guarded around the clock; hence such systems are used to protect the farmers' land and the people on it. Surveillance plays a major role in countless settings, be it homes, hospitals, schools, public places, farms, and other locations [17, 18]. It helps to cover a particular area, to detect theft, and to provide proof for authorization [19–21]. In the case of farms or agricultural lands, surveillance is vital to help people monitor intruders attempting to gain access. Most existing approaches aim only at surveillance of human intruders, yet we tend to forget that the main enemies of farmers are the animals that destroy the crops [22–24]. This can leave the farmer with a very poor yield and cause financial loss.
This problem is so pronounced that farmers sometimes choose to leave their fields barren because of frequent animal attacks. The proposed framework helps keep wild animals away from the farms while also providing surveillance functionality [25–27].
2 Literature Survey
Table 1 Comparison of literature survey

| S.No. | Title of the paper | Authors | Methods | Merits | Findings |
| 1 | Implementation of wildlife animals detection and protection system using Raspberry Pi [1] | Sheela S., Shivaram K., Chaitra U. | Use of Raspberry Pi | Low-cost monitoring of wildlife intrusion using IoT devices | Core idea of the wildlife detection system |
| 2 | IoT in agricultural crop protection and power generation [2] | Anjana M., Sowmya M. S., Charan Kumar A., Monisha R., Sahana R. H. | Integrating greenhouse technology and IoT devices | Suitable for large-scale growers | Protection of crops through notification |
| 3 | Protection of crops from animals using intelligent surveillance system [3] | Mriganka Gogoi, Savio Raj Philip | Segmentation and object detection | Object subtraction method and alarm method | YOLO and SIFT algorithms |
| 4 | The smart crop protection system [20] | Mohit Korche, Sarthak Tokse, Shubham Shirbhate, Vaibhav Thakre, S. P. Jolhe | Use of PIC microcontrollers | Use of buzzer and laser diode | Restrictions of buzzer sounds |
| 5 | Corn farmland monitoring using wireless sensor network [6] | T. Gayathri, S. Ragul, S. Sudharshanan | Use of sensor, microcontroller and GSM | Extraction of images from video | Idea of integration of GSM |
| 6 | Development of IoT based smart security and monitoring devices for agriculture [9] | Tanmay Baranwal, Nitika, Pushpendra Kumar Pateriya | High-speed data processing and informing the user | Notification with real-time information | High-speed data processing and analysis |
| 7 | Protection of crops from wild animals using intelligent surveillance system [10] | Pooja G., Mohammad Umair Bagali | Use of RFID technology for motion detection | Real-time motion detection | Continuous monitoring of a particular area |
| 8 | Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning [22] | Mohammad Sadegh Norouzzadeh, Anh Nguyen, Margaret Kosmala | Use of deep convolutional neural networks | Collects the images of animals without any cost | Concept of relying on hand-designed object features |
| 9 | Photographic paper texture classification using model deviation of local visual descriptors [5] | David Picard, Ngoc-Son Vu, Inbar Fijalkow | Use of VLAT + POEM combination on datasets | Gradient computation and orientation | Understanding of classification of images |
| 10 | A computer vision framework for detecting and preventing human–elephant collisions [13] | Isha Dua, Pushkar Shukla, Ankush Mittal | Data processing and extraction of motion areas | Frame detection and morphological operation | Frame detection and binary pattern |
| 11 | A smart farmland using raspberry pi crop prevention and animal intrusion detection system [16] | Bindu D., Dilip Kumar M. D., et al. | Use of RFID injector, fog machine | Real-time motion detection | Usage of image detection |
| 12 | Solar fencing unit and alarm for animal entry prevention [11] | Krishnamurthy B., Divya M. | Usage of electric devices such as motors, batteries and sensors to draw power from solar panels for the micro-chips | The smart alarm system monitors the field and generates the alarm | Integrating notification with efficient usage |
3 Suggested Approach The proposed framework matches detected objects against the predefined object classes of the YOLO algorithm: the camera captures an image and transfers it to the server [28, 29]. Once the image of the intruding animal has been captured and processed, the copy on the server is deleted. The intelligent CCTV system detects the animal through the YOLO framework and sends notifications by mail [30, 31]. It also activates the buzzer automatically so that farmers can respond. In this project, we use the YOLO model files for animal detection and recognition [32–34]. Table 1 summarizes the related work from the literature survey that did not use the YOLO algorithm. In the proposed framework we recognize animals in real time using machine learning with OpenCV; after acquisition, the image is pre-processed and compressed [2, 35]. Images are used to train the model: feature extraction is performed on each image to obtain the desired pattern, followed by feature fusion and dimensionality reduction to compress the image for reliable real-time performance [36] (Fig. 1).
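The capture–detect–notify flow just described can be sketched in Python. All names below (should_alert, handle_frame, the animal list, the callback wiring) are illustrative assumptions, not the authors' implementation:

```python
# Sketch of the alert pipeline: decide whether a frame's detections should
# trigger the mail notification and the buzzer. Names are illustrative only.
DANGEROUS_ANIMALS = {"elephant", "bison", "panther", "boar"}

def should_alert(detections, min_confidence=0.5):
    """detections: list of (label, confidence) pairs from the detector.
    Returns the set of dangerous animals seen with enough confidence."""
    return {label for label, conf in detections
            if label in DANGEROUS_ANIMALS and conf >= min_confidence}

def handle_frame(detections, notify, buzzer):
    """notify/buzzer are callbacks (e.g. an SMTP mail sender, a GPIO pin)."""
    found = should_alert(detections)
    if found:
        notify("Intrusion detected: " + ", ".join(sorted(found)))
        buzzer()
    return bool(found)
```

In a deployment, notify would wrap the mail call and buzzer a hardware pin toggle; here they are plain callbacks so the decision logic stays self-contained.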
3.1 YOLO Algorithm YOLO stands for "You Only Look Once". It is an algorithm that detects multiple objects in one image, in real time. YOLO frames object detection as a regression problem and provides the class probabilities of the detected objects. Accordingly, it needs only a single forward propagation through the neural network to detect objects, which means that prediction over the entire image is completed in a single pass [37]. A CNN is used to predict the class probabilities and bounding boxes simultaneously. The YOLO algorithm has several variants; the most common are YOLO and YOLOv3 [38, 39].
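As a concrete illustration of the single-pass formulation, the size of YOLO's prediction vector follows directly from its grid design. The values below (S = 7, B = 2, C = 20) are those of the original YOLO configuration for PASCAL VOC, used here only as an example:

```python
def yolo_output_size(S, B, C):
    """YOLO predicts, for each of the S*S grid cells, B boxes with
    (x, y, w, h, confidence) plus C class probabilities."""
    return S * S * (B * 5 + C)

# Original YOLO configuration: 7x7 grid, 2 boxes per cell, 20 VOC classes.
print(yolo_output_size(7, 2, 20))  # 7*7*30 = 1470
```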
Fig. 1 Proposed system diagram
Fig. 2 Module diagram for processing Images using ML
3.1.1 Selection Criteria for the YOLO Algorithm
• Speed: the algorithm improves detection speed, which in turn allows objects to be predicted in real time. • High accuracy: the algorithm is a predictive technique that produces few background errors, giving accurate results. • Learning capability: the algorithm has excellent learning capabilities that enable it to learn object representations and apply them to object detection.

3.1.2 Steps of the YOLO Algorithm
The YOLO algorithm relies on the following three techniques: • Residual blocks • Bounding box regression • Intersection over Union (IOU)

Residual Blocks The image is divided into an S × S grid. In Fig. 2, the input image is shown divided into several grid cells [40]. The grid cells are of equal dimensions, and each cell detects the objects that appear within it: if an object's center falls within a particular cell, that cell is responsible for detecting the object.
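The "responsible cell" rule can be made concrete with a short helper; the function below is an illustrative sketch assuming object centers are given in normalized [0, 1) image coordinates:

```python
def responsible_cell(cx, cy, S=7):
    """Map a normalized object center (cx, cy) to the (row, col) of the
    S x S grid cell responsible for detecting it."""
    if not (0 <= cx < 1 and 0 <= cy < 1):
        raise ValueError("center must lie in [0, 1)")
    return int(cy * S), int(cx * S)

# An object centered at (0.5, 0.5) lands in the middle cell of a 7x7 grid.
print(responsible_cell(0.5, 0.5))  # (3, 3)
```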
Bounding Box Regression A bounding box is an outline that highlights an object in an image [41]; in the figures the bounding box is marked with a yellow outline. YOLO uses bounding-box regression to predict the height, width, center and class of each object. The predicted boxes represent all possible placements of the object within the image.
Bounding-box regression is a well-known technique that is mainly used for adjusting or predicting localization boxes in real-time object detection [42].
Intersection over Union (IOU) • Intersection over Union (IOU) is a concept in object detection that measures how much two boxes overlap. YOLO uses IOU to produce an output box that tightly surrounds each object. • Each predicted bounding box is compared against the real (ground-truth) box; this mechanism removes the bounding boxes that do not match the main box [43].
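IOU itself is just the area of overlap divided by the area of union. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1) +
             (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

# Two 10x10 boxes offset by 5 in x: overlap 50, union 150 -> IOU = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Discarding predicted boxes whose IOU with a better-scoring box is too high is exactly the filtering step described above.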
3.2 Implementation of YOLO with OpenCV • There are various implementations of the YOLO algorithm; perhaps the most popular of them is Darknet. As in Fig. 1, we use OpenCV to implement YOLO because it is really simple. To get started, install OpenCV on your PC (e.g. with pip install opencv-python) from the command prompt. • To use YOLO via OpenCV, we need three files: 'yolov3.weights', 'yolov3.cfg', and 'yolov3.names', which contains all the names of the labels on which this model has been trained. • First, we load the model using the function 'cv2.dnn.readNet()'. We download and use tools that are freely available on GitHub, and follow the usage instructions from the same GitHub page. • We manually label each image with the locations of the animals in it. A tool such as labelImg makes this work much easier, but it still takes time because the labelling must be done manually. • Before moving forward, change the settings for YOLO, and make sure all your images are in a single folder that contains only the images we want to use.
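The steps above can be sketched as follows, assuming the 'yolov3.weights', 'yolov3.cfg' and 'yolov3.names' files sit next to the script; this is an illustrative sketch of the OpenCV dnn workflow, not the authors' exact code:

```python
def decode_box(cx, cy, w, h, img_w, img_h):
    """Convert YOLO's normalized center-format box to pixel corners."""
    bw, bh = w * img_w, h * img_h
    x1 = int(cx * img_w - bw / 2)
    y1 = int(cy * img_h - bh / 2)
    return x1, y1, int(x1 + bw), int(y1 + bh)

def detect_animals(image_path, conf_threshold=0.5):
    import cv2  # imported here so the decoder above stays dependency-free
    net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
    labels = open("yolov3.names").read().splitlines()
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # 416x416 is the standard YOLOv3 input size; 1/255 rescales the pixels.
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    detections = []
    for output in net.forward(net.getUnconnectedOutLayersNames()):
        for row in output:  # row = [cx, cy, w, h, objectness, class scores...]
            scores = row[5:]
            class_id = scores.argmax()
            if scores[class_id] > conf_threshold:
                box = decode_box(*row[:4], w, h)
                detections.append((labels[class_id], float(scores[class_id]), box))
    return detections
```

decode_box is kept dependency-free so the box arithmetic can be checked on its own; cv2 is loaded only when detection actually runs.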
3.3 YOLO Architecture Framework Design and Algorithm Function A YOLO network comprises three fundamental components. • First, the algorithm, also referred to as the prediction vector. • The second is the network. • Third is the loss function.
4 Results Comparison 4.1 SSD For smaller objects, the performance of SSD is much worse than that of Faster R-CNN. The main reason for this shortcoming is that in SSD, the high-resolution layers are the ones detecting minute objects, and these layers are less effective for classification because they carry only low-level features, such as colors or corners, which reduces the overall accuracy of SSD. A further limitation follows from the complexity of SSD's data augmentation: SSD requires a large amount of data for training purposes, which, depending on the application, can be very expensive and time-consuming.
4.2 Faster R-CNN The accuracy of this algorithm comes at a cost in time: it is considerably slower than algorithms like YOLO. Even with the improvements in R-CNN and Fast R-CNN, these methods always require multiple passes.
4.3 YOLO YOLOv3 is one of the best revisions made to the object detection system since Darknet-53 was introduced. This update was well received by critics and other experts. Although YOLOv3 is still considered empirical, detailed analysis has revealed shortcomings, and there is no optimal solution for the loss function. It was then tuned toward the optimal form of a similar model, and used and tested for functional improvements. The latest version of the software is best placed to analyze previous errors. Compared to other versions of YOLO, version 3 is stable and recommended. It is also a CNN-based algorithm.
5 Conclusion and Future Works Through this review paper, a system that protects agricultural crops, domestic animals, and people from wild animals has been successfully planned and tested. It has been developed by combining features of all the processors and packages used and tested [44]. The role of every module has been carefully considered so that each feeds into the unit [45]. The problem of crop destruction by animals troubles society today; it needs close attention and a useful
objective. Through this review paper we convey a good working synthesis, because it concentrates on tackling that problem. The goal is to protect the land from wild animals and intruders and to prevent the destruction of crops, which is of real significance to agricultural lands [46]. Farmers can secure their plantations and lands, sparing themselves critical monetary losses [47]. Farmers who felt uneasy that insufficient measures protected their fields can now feel that their properties are secure. In the future, we envisage an IoT application that delivers pictures and video to farmers on any device; farmers will receive a notification whenever animals are likely to be on the land, along with additional information on temperature and humidity [48].
References 1. S. Sheela, K.R. Shivaram, U. Chaitra, P. Kshama, K.G. Sneha, K.S. Supriya, Low-cost alert systemfor monitoring the wildlife from entering the human populated areas using IOT devices. Int. J. Innov. Res. Sci. Eng. Technol. 5(10) (2016) 2. M. Anjana, M.S. Sowmya, A. Charan Kumar, R. Monisha, R.H. Sahana, Review on IoT in agricultural crop protection and power generation. Int. Res. J. Eng. Technol. (IRJET) 06(11) (2019) 3. M. Gogoi, S.R. Philip, Protection of crops from animals using intelligent surveillance. J. Appl. Fundam. Sci. (2016) 4. S. Santhiya, Y. Dhamodharan, N.E. Kavi Priya, C.S. Santhosh, A sensible farmland mistreatment raspberry pi crop hindrance and animal intrusion detection system. Int. Anal. J. Eng. Technol. (IRJET) 05(03) (2018) 5. D. Picard, N.-S. Vu, I. Fijalkow, Photographic paper texture classification using model deviation of local visual descriptors, in IEEE International Conference on Image Processing, October 2014 6. T. Gayathri, S. Ragul, S. Sudharshanan, Corn farmland monitoring using wireless sensor network. Int. Res. J. Eng. Technol. (IRJET) 02(08) (2015) 7. M. Chandra, M. Reddy, K.R.K. Kodi, B.A.M. Pulla, Smart crop protection system from living objects and fire using Arduino. Sci. Technol. Dev. IX(IX), 261–265 (2020) 8. S. Thilagamani, C. Nandhakumar,Implementing green revolution for organic plant forming using KNN- classification technique. Int. J. Adv. Sci. Technol. 29(7S), 1707–1712 (2020) 9. T. Baranwal, Development of IOT based smart security and monitoring devices for agriculture. Department of Computer Science Lovely Professional University, Phagwara, Punjab (IEEE, 2016) 10. G. Pooja,M.U. Bagali, A smart farmland using Raspberry Pi crop vandalisation prevention and intrusion detection system 1(5) (2016) 11. B. Krishnamurthy,Int. J. Latest Eng. Anal. Appl. (IJLERA) 02(05) (2017) 12. S. Thilagamani, N. Shanti, Gaussian and Gabor filter approach for object segmentation. J. Comput. Inf. Sci. Eng. 
14(2), 021006 (2014) 13. I. Dua, P. Shukla, A. Mittal, A vision-based human–elephant collision detection system, in IEEE International Conference on Image Processing, 25 February 2016, pp. 225–229 14. A.A. Altahir, V.S. Asirvadam, N.H.B. Hamid, P. Sebastian,Modeling camera coverage using imagery techniques for surveillance applications, in 4th IEEE International Conference on Control System, Computing and Engineering (2014) 15. P. Perumal, S. Suba, An analysis of a secure communication for healthcare system using wearable devices based on elliptic curve cryptography. J. World Rev. Sci. Technol. Sustain. Dev. 18(1), 51–58 (2022)
16. D. Bindu,Int. J. Eng. Basic Sci. Manag. Soc. Stud. 1(1) (2017) 17. K.S. Bhise, Int. J. Sci. Eng. Res. 7(2) (2016) 18. P. Pandiaraja, S. Sharmila, Optimal routing path for heterogenous vehicular adhoc network. Int. J. Adv. Sci. Technol. 29(7), 1762–1771 (2020) 19. V. Deshpande, Int. J. Sci. Res. (IJSR) 2319–7064. Index Copernicus Value (2013): 6.14 | Impact Factor (2014) 20. M. Korche, S. Tokse, S. Shirbhate, V. Thakre, S.P. Jolhe, Smart crop protection system. Int. J. Latest Eng. Sci. (IJLES) 04(04 July to August) (2021) 21. M. Murugesan, S. Thilagamani, Efficient anomaly detection in surveillance videos based on multi-layer perception recurrent neural network. J. Microproces. Microsyst. 79(November) (2020) 22. M.S. Norouzzadeh, A. Nguyen, M. Kosmala, A. Swanson, M.S. Palmer, C. Packer, J. Clune, Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Natl. Acad. Sci. (2018) 23. P. Pandiaraja, K. Aravinthan, N.R. Lakshmi, K.S. Kaaviya, K. Madumithra, Efficient cloud storage using data partition and time based access control with secure AES encryption technique. Int. J. Adv. Sci. Technol. 29(7), 1698–1706 (2020) 24. S.J. Sugumar, R. Jayaparvathy, An early warning system for elephant intrusion along the forest border areas. Curr. Sci. 104, 1515–1526 (2013) 25. P. Rajesh Kanna, P. Santhi, Unified deep learning approach for efficient intrusion detection system using integrated spatial–temporal features. Knowl. Based Syst. 226 (2021) 26. A.V. Deshpande,Design and implementation of an intelligent security system for farm protection from wild animals. Int. J. Sci. Res. 10(2), 300–350 (2016) 27. P. Santhi, G. Mahalakshmi, Classification of magnetic resonance images using eight directions gray level co-occurrence matrix (8dglcm) based feature extraction. Int. J. Eng. Adv. Technol. 8(4), 839–846 (2019) 28. R. Logeswaran, P. Aarthi, M. Dineshkumar, G. Lakshitha, R. 
Vikram, Portable charger for handheld devices using radio frequency. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(6), 837–839 (2019) 29. P.J. Pramod, S.V. Srikanth, N. Vivek, M.U. Patil, C.B.N. Sarat, Intelligent intrusiondetection system (In2DS) using wireless sensor networks, in Proceedings of the 2009 IEEE International Conference on Networking, Sensing and Control, Okayama, Japan, 26–29 March 2009 30. K. Deepa, S. Thilagamani,Segmentation techniques for overlapped latent fingerprint matching. Int. J. Innov. Technol. Explor. Eng. 8(12), 1849–1852 (2019) 31. P.S. Dhake, S.S. Borde, Embedded surveillance system using PIR sensor. Int. J. Adv. Technol. Eng. Sci. 02(03) (2014) 32. S.G. Nikhade, Wireless sensor network system using Raspberry Pi and Zigbee for environmental monitoring applications, in 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, T.N., India, 6–8 May 2015, pp. 376–381 33. D. Pradeep, C. Sundar, QAOC: noval query analysis and ontology-based clustering for data management in Hadoop 108, 849–860 (2020) 34. A. Narayanamoorthya, P. Alli, R. Suresh, How profitable is cultivation of rainfed crops? Some insights from cost of cultivation studies. Agric. Econ. Res. Rev. 27(2), pp. 233–241 (2014) 35. N. Andavarapu, V.K. Vatsavayi, Wild animal recognition in agriculture farms using W-COHOG for agro-security. Int. J. Comput. Intell. Res. (9) (2017) 36. P. Rekha, T. Saranya, P. Preethi, L. Saraswathi, G. Shobana, Smart AGRO using ARDUINO and GSM. Int. J. Emerg. Technol. Eng. Res. (IJETER) 5(3) (2017) 37. M.J. Wilber, W.J. Scheirer, P. Leitner, B. Heflin, L. Zott, D. Reinke, D.K. Delaney, T.E. Boult, Animal recognition in the Mojave desert: vision tools for field biologists, in 2013 IEEE Workshop on Applications of Computer Vision (WACV), 15–17 January 2013, pp. 206–213 38. S. 
Sivagama Sundari, S. Janani, Home surveillance system based on MCU and GSM. Int. J. Commun. Eng. 06(6) (2014)
39. B. Bhanu, R. Rao, J.V.N. Ramesh, M.A. Hussain, Agriculture field monitoring and analysis using wireless sensor networks for improving crop production, in Eleventh International Conference on Wireless and Optical Communications Networks (WOCN) (2014) 40. M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision (Thomson Learning, part of the Thompson Corporation, 2008). ISBN: 10: 0-495-24438-4; J. Clerk Maxwell, A Treatise on Electricity and Magnetism, vol. 2, 3rd edn. (Clarendon, Oxford, 1892), pp. 68–73 41. B. Ganesharajah, S. Mahesan, U.A.J. Pinidiyaarachchi, Robust invariant descriptors for visual object recognition, in 6th IEEE International Conference on Industrial and Information Systems (ICES), 16–19 August 2011, pp. 158–163 42. M.P. Huijser, T.D. Holland, A.V. Kociolek, A.M. Barkdoll, J.D. Schwalm, Animal-vehicle crash mitigation using advanced technology, phase II: system effectiveness and system acceptance. Western Transportation Institute, College of Engineering, Montana State University (2009) 43. G. Mahalingam, C. Kambhamettu, Faceverification with aging using AdaBoost and local binary patterns, in ICVGIP, pp. 101–108 (2010) 44. V. Balasubramaniam, IoT based biotelemetry for smart health care monitoring system. J. Inf. Technol. Digit. World 2(3), 183–190 (2020) 45. P.J. Patil, R.V. Zalke, K.R. Tumasare, B.A. Shiwankar, S.R. Singh, S. Sakhare, IoT protocol for accident spotting with medical facility. J. Artif. Intell. 3(02), 140–150 (2021) 46. I.J. Jacob, P. Ebby Darney, Design of deep learning algorithm for IoT application by image based recognition. J. ISMAC 3(03), 276–290 (2021) 47. A. Sungheetha, R. Sharma, Real time monitoring and fire detection using internet of things and cloud based drones. J. Soft Comput. Paradigm (JSCP) 2(03), 168–174 (2020) 48. A. Srilakshmi,K. Geetha, D. 
Harini, MAIC: a proficient agricultural monitoring and alerting system using IoT in cloud platform, in Inventive Communication and Computational Technologies (Springer, Singapore, 2020), pp. 805–818 49. S.K. Tetarave, A.K. Shrivatsava, A complete safety for wildlife using mobile agents and sensor clouds in WSN. IJCSI Int. J. Comput. Sci. 9(6), No. 3 (2012)
A Study on Surveillance System Using Deep Learning Methods V. Vinothina, Augustine George, G. Prathap, and Jasmine Beulah
Abstract Video surveillance data is one of the varieties of big data, as it produces the huge volumes of data required for further analysis. Nowadays, the number of disruptive and aggressive activities occurring has increased dramatically. Hence, securing individuals in public places such as shopping malls, banks, and public transportation has become significant. These public places are being equipped with CCTV (closed-circuit television) cameras to monitor people's activity. Monitoring also needs a human's constant focus to analyze the captured scenes and to react immediately to any suspicious activity such as theft, sabotage, or bullying. But keeping constant watch on surveillance camera videos and identifying the unusual activities in them is a challenging task, as it needs time and manpower. Hence this paper analyzes various state-of-the-art video analytics methods to detect aggression and unusual signs.
1 Introduction Establishing a security regime and developing a security culture is a subtle balancing act as security in public places is ever-changing and needs to be kept under continuous review, with susceptible areas recognized and corrective actions implemented to address them. CCTV is used generally in public places mainly for security and monitoring purposes. The surveillance system can be manual or automatic [1]. Though it provides many potential applications as in Fig. 1, it involves a hugely labor-intensive V. Vinothina (B) · A. George · G. Prathap · J. Beulah Kristu Jayanti College Autonomous, Bengaluru, India e-mail: [email protected] A. George e-mail: [email protected] G. Prathap e-mail: [email protected] J. Beulah e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_13
[Figure 1 diagrams the applications of video surveillance: monitoring parking lots, traffic, public places and hospital patients; monitoring employees, animals and tourism; vandalism deterrence and outdoor perimeter monitoring; preventing theft on construction sites; preventing loss of merchandise; remote monitoring against theft, break-ins and dishonest employees; facility protection against trespassers; and monitoring daily business operations.]

Fig. 1 Applications of video surveillance system [4]
job in the case of a manual surveillance system [2], and cost, in the case of a semi- or fully automatic system, to set up the environment in terms of storage. The large amount of video data generated is stored in its raw format to avoid the loss of any significant information. This necessitates large investments in storage and video-management infrastructure, and with the growth of high-definition video, the demand for storage infrastructure keeps rising. Only a few studies have tried to provide better storage for video analytics; an efficient system for storing and retrieving video data is proposed in [3]. This paper mainly focuses on deep learning methods used for identifying human actions and abnormal activities. An anomalous or suspicious activity is simply an unusual activity of an object. With the help of CCTV, events such as thefts, assaults, muggings, and people collapsing on the street from medical emergencies [5] can be captured, but processing the live footage with a special algorithm remains a significant research challenge. Moreover, ensuring a high level of security in public spaces through a surveillance system has been a challenging task in recent years [6], so there is a need for an intelligent surveillance system. As per the TFE report [7], users of video surveillance face many challenges in implementing and maintaining their intelligent video surveillance systems. Hence, a study is required that reviews the state of the art in video surveillance for monitoring public places and detecting suspicious activity of people. This secures infrastructure, business, technology and, most importantly, people, and builds confidence. The main objectives of this study are, therefore, as follows:
A Study on Surveillance System Using Deep Learning Methods
• Recognize the proposed video surveillance systems.
• Analyze the techniques applied in video surveillance systems and their limitations.
• Present recommendations for the development of surveillance system research based on the analysis.

This paper is organized as follows: Sect. 2 describes the related work, Sect. 3 analyzes the various approaches proposed for video analytics, Sect. 4 suggests future work based on the analysis in Sect. 3, and Sect. 5 concludes and describes the future direction of the paper.
2 Review of Literature

Various researchers have reviewed the developments and stages of surveillance video systems. Sreenu et al. [8] presented a survey on video analysis for violence detection in crowds using deep learning methods, focusing mainly on object identification and action analysis. Various papers published in ScienceDirect, IEEE Xplore, and ACM on surveillance video analysis using deep learning are considered for review. Applications of surveillance systems, deep learning models, datasets, and algorithms are discussed. The steps involved in FSCB [9] for crowd behavior analysis in real-time video and crowd analysis in independent scenes [10] of video are explained, and the results obtained from the analysis of deep learning methods for crowd analysis are also discussed. Ko [11] surveyed behavior analysis in video surveillance for homeland security applications. The author discussed the developments and general strategies of video surveillance, extraction of motion information, behavior analysis and understanding, person identification, anomaly detection, and automated camera switching and data fusion for multi-camera surveillance systems. Beddiar et al. [12] summarized the evolution of vision-based Human Activity Recognition (HAR) for automatically identifying and analyzing human activities. This survey mainly classifies HAR approaches by (1) the feature extraction process, (2) recognition states, (3) sources of the input data, and (4) supervised, unsupervised, and semi-supervised machine learning methods, along with HAR applications, image/video input, and single-point/multi-point acquisition of data. The authors also discussed the datasets most used for human activity recognition at the atomic action, behavior, and interaction and group activity levels, and described in detail the various performance metrics used for evaluating HAR systems. Shidik et al. [13] presented a detailed study of video surveillance systems in terms of research trends, methods, frameworks, the public datasets used by researchers, and network infrastructure for handling multimedia data. Various challenges and opportunities related to research in video surveillance systems were also discussed. A systematic review of image processing methods in video surveillance for ATM crime detection was presented by Sikandar et al. [14]. Studies based on image preprocessing
V. Vinothina et al.
methods for different circumstances, moving object detection, face detection, facial component detection, object shape and appearance detection, and trained or untrained classifier systems were reviewed in detail. The image acquisition techniques, the samples used (either video or images), and the parameters used for evaluation were described clearly. This detailed study focused on abnormality detection systems such as covered-face detection, abnormal user activity detection, and illegal object detection. A review of various video summarization methods and a comparative study were performed by Senthil Murugan et al. [15]. The comparison is based on the limitations, datasets used, advantages, and disadvantages of each method; object detection, classification, and tracking algorithms are also presented.
3 Proposed Approaches for Video Analytics

This section discusses different methodologies proposed for video analytics to detect suspicious or anomalous activity in surveillance data. These methodologies make the surveillance system smarter and help avoid mishaps in the future. An anomaly detection system detects suspicious activity by classifying the normal and abnormal activities of objects; based on the object's behavior, events are classified as normal or anomalous. To perform this classification, every recognition system carries out the following steps, depicted in Fig. 2:

• pre-processing—identifying keyframes, noise removal, extracting the foreground
• feature extraction—identifying unique properties of an object from the frames
• object tracking—object trajectory
• behavior understanding—detecting object patterns/behavior.
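The four stages above can be sketched end to end. The following is a minimal illustrative pipeline, not any specific system from the literature: background differencing for preprocessing, hand-crafted blob features, centroid-based tracking, and a toy speed rule for behavior classification.

```python
import numpy as np

def preprocess(frame, background, diff_threshold=30):
    """Foreground extraction by background differencing and thresholding."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    return (diff > diff_threshold).astype(np.uint8)   # binary foreground mask

def extract_features(mask):
    """Simple hand-crafted features: foreground area and centroid."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return {"area": len(xs), "centroid": (xs.mean(), ys.mean())}

def track(prev, curr):
    """Object displacement between consecutive frames (centroid distance)."""
    (x0, y0), (x1, y1) = prev["centroid"], curr["centroid"]
    return np.hypot(x1 - x0, y1 - y0)

def classify(displacement, speed_threshold=5.0):
    """Flag unusually fast motion as suspicious (toy behavior rule)."""
    return "suspicious" if displacement > speed_threshold else "normal"
```

In a real system each stage is far more elaborate (e.g., deep features instead of blob statistics), but the data flow between the four stages is the same.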
3.1 Video Preprocessing

The raw video needs to be preprocessed to analyze the scenes or events occurring in it. Frames are extracted from the video as a preprocessing step, and there are different techniques for extracting them. For video analytics, video is divided into
Fig. 2 General steps in identifying suspicious activity: preprocessing (identifying key frames, noise removal, extracting the foreground) → feature extraction → object tracking → identifying and reporting suspicious activity
Table 1 Conventional methods and image features used in key frame extraction procedure [18]

Frame extraction methods | Features
Ontologies | Textual labels
Neuro nets | SIFT
Visual attention | Texture
Genetic algorithms | Shape
Statistics | Motion
Histogram difference | Color
Clustering | SBD
Dynamic weighting of feature similarity | Dupl. removal
Particle swarm optimization | Length/width of parts in facial images [19]
Fuzzy evolutionary immune network | Environment such as room corners, walls, etc. [20]
Spectral clustering and auxiliary graphs | –
Frame categorization | –
Histogram intersection and weighting | –
Wavelet transform | –
Discrete cosine transform | –
Support vector machine classifier | –
Supervised machine learning, correlogram | –
frames, an audio track, and textual parts. Each unit has an important role in analyzing the video; here we mainly discuss frame extraction methods for video analytics. Keyframe extraction is a significant step in video processing, as it helps in summarizing video content and reducing useless information [16], and supports video classification, action recognition, and event detection [17]. Mashtalir and Mikhnova [18] discussed the approaches proposed before 2013 and gave more detailed information on key frame extraction techniques. Table 1 lists the features and methods used for key frame extraction proposed up to 2013. It was found that mainly color features together with Shot Boundary Detection (SBD) are used in the preprocessing phase [18]. The following are some of the challenges observed in conventional methods [18]:
• finding a method that works well with videos containing complex backgrounds and overlapping objects;
• only a limited set of features can be used for video processing, otherwise the method consumes too much time;
• users' feedback is often required to extract key frames.
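As noted above, color histograms combined with shot boundary detection are the classic preprocessing recipe. A minimal sketch of histogram-difference SBD follows; the bin count and threshold are illustrative choices, not values taken from [18].

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of one frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def shot_boundaries(frames, threshold=0.5):
    """Mark a shot boundary where consecutive histograms differ strongly
    (L1 distance); the first frame of each shot can serve as a key frame."""
    boundaries = [0]
    prev = gray_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        curr = gray_histogram(frame)
        if np.abs(curr - prev).sum() > threshold:
            boundaries.append(i)
        prev = curr
    return boundaries
```

For color video, the same idea is applied per channel (or on a joint color histogram); the deep learning methods discussed next replace the hand-tuned threshold with learned frame representations.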
Using key frames, one can recognize some actions, and this may be sufficient to achieve the preferred results. To overcome the challenges of conventional methods, various deep learning-based approaches have been proposed to identify the key frames that help in identifying actions. Tables 2 and 3 illustrate the deep learning methods used in video analytics for recognizing human actions and activities.
3.2 Feature Extraction

A feature is a characteristic or attribute of a data set used in analysis or decision making. Feature extraction is the process of deriving new features from the dataset using the original variables. Various methods have been proposed for feature extraction from text, images, and video. Principal Component Analysis, Linear Discriminant Analysis, Singular Value Decomposition, Non-negative Matrix Factorization, and CUR matrix decomposition [31] are the most widely used methods for feature extraction. This paper mainly focuses on methods employed for object detection and identifying human actions.
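To illustrate one of the listed techniques, PCA can be implemented in a few lines via SVD of the centered data matrix; this is a generic sketch, not tied to any surveyed system.

```python
import numpy as np

def pca(X, k):
    """Project the n samples (rows of X) onto the top-k principal
    components of the centered data, computed via SVD."""
    Xc = X - X.mean(axis=0)
    # rows of Vt are principal directions, ordered by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T          # n x k reduced feature matrix
```

Applied to flattened video frames (or per-frame descriptors), this yields compact features whose variance decreases monotonically across components.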
3.3 Deep Learning Methods

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are the deep learning networks most widely used for analyzing single images and image sequences, respectively. In a surveillance system, these networks can be employed to understand temporal and spatial events. Recently, they have been applied to detecting objects, human actions, and crowd violence, as well as traffic monitoring, traffic prediction, and crime detection, to name a few. The following section describes the research carried out on surveillance systems using deep learning methods. Yan et al. [17] proposed automatic key frame extraction using deep learning, as it is a prerequisite for video understanding tasks. The proposed work involves two steps: (1) automatic frame-level video labelling and (2) designing a two-stream CNN for key frames. Appearance and motion features are extracted using CNNs and then fused to enhance the frame representation capability. Linear Discriminant Analysis is then used to find a projection that minimizes the distance between videos of the same action and maximizes the distance between videos of different actions. The results aligned with the ground truth and characterized the video content well. To recognize single and multiple actions, Zhou et al. [21] proposed an algorithm using a keyframe estimator based on Regularized Logistic Regression (RLR). RLR assigns to each frame a probability of being key action evidence, and the frame with the highest probability is selected as the keyframe. The temporal relationship between the key frames and the label of the current sequence is then predicted using a Hidden Markov Model (HMM). The proposed method outperformed the methods considered for evaluation in terms of accuracy and speed.
Table 2 Deep learning methods in recognizing human actions and object identification

Author | Objective | Method used | Tool | Dataset | Actions considered
Yan et al. [17] | Framework to annotate keyframes in human action videos | Self-supervised LDA and CNN | Caffe, NVIDIA GPU | UCF101, VSUMM | Walking, jogging, running, boxing, handwaving and hand clapping
Zhou and Nagahashi [21] | Real-time action recognition based on keyframes | Regularized logistic regression (RLR), hidden Markov model (HMM) | Python, i7 CPU, 32 GB memory; evaluation parameters: frame-level and video-level accuracy and recognition speed | KTH, UCF sports and s-KTH | Walking, jogging, running, boxing, handwaving and hand clapping; driving, golf swing, kicking, lifting, riding the horse, running, skateboarding, swing-bench, swing-side
Chen et al. [22] | Edge computing based distributed intelligent video surveillance | CNN, LSTM | Intel Core i5-6400 Quad-Core CPU, Intel Xeon Nehalem EX Six-Core CPU | Seven days of videos from the intersection of 30 streets | Traffic monitoring and traffic flow prediction
Kavalionak et al. [2] | Face recognition system | Distributed protocol | PEERSIM | Yale database B | Face recognition
Tong et al. [23] | Vehicle trajectories system | Graph convolution and self-training | – | Millions of vehicle snapshots | Vehicle movements
de Bem et al. [24] | Deep generative modelling system | Variational auto-encoder generative adversarial network (VAEGAN) | Caffe, NVIDIA Titan X GPU | Human3.6M dataset, ChictopiaPlus dataset, DeepFashion dataset | Human body analysis
Zhang et al. [25] | Pedestrian search system | Supervised graph Laplacian objective function (SGLOF) for CNN models | – | i-LIDS, 3DPeS, VIPeR, PRID, CUHK01, CUHK03, Shinpuhkan | Pedestrian re-identification in video surveillance system
Zhang et al. [26] | An infrared video-based surveillance system | CNN-based faster R-CNN | HP Pavilion 550 with Intel i7-6700 processor and Nvidia Quadro K4200 GPU | Ground truth | Pedestrian, ground vehicles
Wang et al. [27] | Deep object detection in surveillance video | MPGP framework, region-based networks | – | CUHK, XJTU, and AVSS data sets | Pedestrian, vehicle
Table 3 Deep learning methods in recognizing abnormal activities

Author | Objective | Method used | Tool | Dataset | Actions considered
Singh et al. [28] | Automatic threat recognition system using neural networks | CNN, RNN | – | UCF crime dataset | Abuse, arrest, arson, assault, road accidents, burglary, explosion, shooting, fighting, shoplifting, robbery, stealing, vandalism, and normal videos
Sabri and Li [29] | System to surveil a home/building to detect suspicious activity | CNN (foreground object detection, saliency algorithm) | PC system and Raspberry Pi | Images of humans and three categories of weapons (knife, small gun, large gun) | Recognition of the presence of guns
Singh et al. [30] | An autonomous solution to detect suspicious activity | Background subtraction, HOG, SVM | Raspberry Pi | Violent flows benchmark and crowd dataset | Pedestrian and crowd violence detection
Maha Vishnu et al. [33] | Intelligent traffic and accident detection system | Hybrid SVM, DNN, multinomial logistic regression | MATLAB | Real-world traffic videos | High traffic, ambulance arrival, accident detection
Karthikeswaran et al. [34] | A suspicious activity detection system | Adaptive linear activity classification (ALAC) | MATLAB and vb.net | – | Abnormal behavior
Bouachir et al. [35] | Real-time detection of suicide attempts | Body joint positions and SVM | – | Own sequences of hanging videos with a suicide attempt and unsuspected actions | Suicide by hanging attempt
Sharma and Sungheetha [35] | Detection of abnormal incidents | Hybrid CNN and SVM | – | Multi-sensor dataset named PETS | Abnormal activity
To save human workforce and time in judging whether activities captured through CCTV are anomalous or suspicious, Singh et al. [28] proposed a deep learning model using a CNN and an RNN. Transfer learning is applied to reduce computational resources and training time by retraining the Inception V3 model on the considered dataset. The output of the Inception model is passed to the CNN, which produces a vector of high-level feature maps; the vector is then passed to the RNN for the final classification (refer to [28] for the CNN and RNN layer architecture). The constraints in implementing the proposed deep learning model, such as the video feed and processing power, are discussed. The model produced better accuracy with reduced overfitting. A multi-layer edge computing architecture is proposed by Chen et al. [22] to build a Distributed Intelligent Video Surveillance (DIVS) system with flexible and scalable training capabilities. It covers three video analysis tasks: (1) vehicle classification using a CNN, (2) traffic flow prediction using an LSTM, and (3) dynamic video data migration based on comparing the time for one edge node versus all edge nodes to complete a training iteration. Factors specific to the distributed DL model, such as parallel training, model synchronization, and workload balancing, are considered, and the model is evaluated with respect to the scale of edge nodes and video analysis tasks. A distributed camera-aided protocol has been proposed in [2] to support face recognition in video surveillance. This work attempts to reduce the workload of a server by exploiting the computational capabilities of surveillance devices, known as Smart Sensing Units (SSUs), for video analysis tasks such as identifying a person. If an SSU cannot complete the video analysis task within the time interval, the video is forwarded to the server. The proposed protocol is evaluated using 1NN and KNN classification with LBP on the Yale database.
The results show that high bandwidth is consumed by surveillance devices used in hotspots, and that partially placing the video analysis task on the local resources of the surveillance devices reduced the main server's workload by 50%. To detect suspicious activity, mainly the presence of humans with a gun, a CNN-based low-cost multiunit surveillance system is proposed by Sabri and Li [29]. A foreground object detection algorithm and saliency algorithms are employed with a CNN to recognize the features. The proposed R-CNN-based model is evaluated on both PC and Raspberry Pi systems, achieving high speed and accuracy in recognizing objects. Thanks to the saliency algorithms, the system can detect suspicious activity even at low recognition accuracy.
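The local-versus-forward decision described for SSUs in [2] can be sketched as a deadline check. The function names and the 0.5 s deadline below are hypothetical, not taken from the paper; the protocol itself also covers device coordination that this sketch omits.

```python
import time

ANALYSIS_DEADLINE = 0.5   # seconds an SSU may spend locally (assumed value)

def handle_clip(clip, analyze_locally, forward_to_server):
    """Run analysis on the smart sensing unit; if analysis fails or does
    not finish within the deadline, forward the raw clip to the server."""
    start = time.monotonic()
    try:
        result = analyze_locally(clip)
        if time.monotonic() - start <= ANALYSIS_DEADLINE:
            return ("local", result)
    except Exception:
        pass  # local analysis failed; fall through to the server path
    return ("forwarded", forward_to_server(clip))
```

This after-the-fact elapsed-time check is the simplest possible policy; a production SSU would preempt the local task rather than let it overrun.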
An autonomous system for crowd violence and pedestrian detection in restricted areas is designed to relieve humans of continuously monitoring the surveillance system and to help law enforcement authorities guard the public places of a city [30]. The activity detection module and communication module of the system are responsible for detecting violent behavior in the crowd and communicating the information from the surveillance site to the police authorities without delay. The system is evaluated by comparing implementation results on a Raspberry Pi and a PC; for crowd violence detection, it obtained good results in terms of accuracy and sensitivity. Physics-inspired methods such as fluid dynamics, energy, entropy, and force models have also been employed in crowd video surveillance and analysis systems [32]. A system for dense and sparse traffic situations, ambulance arrival, and accident detection is proposed in [33]. A hybrid median filter is used during preprocessing of the video to remove noise, and a hybrid SVM is applied for vehicle tracking. The gradient orientation features of the vehicles are identified using a Histogram of Flow Gradients, and accident detection is implemented using multinomial logistic regression. The arrival of an ambulance is identified from the siren sound with a Deep Neural Network (DNN), and the proposed systems produced good results in terms of accuracy, precision, recall, sensitivity, and detection time. Adaptive Linear Activity Classification (ALAC) and an IoT framework are used to detect abnormal human behavior in public places [34]. This system involves background modeling using the Gaussian mixture method, foreground detection using a thresholding method, and activity analysis, and is implemented using the ALAC technique in vb.net.
Sensitivity, specificity, accuracy, time complexity, and computation time are used to evaluate the performance of the proposed methods. Bouachir et al. [35] proposed a vision-based intelligent video surveillance system to automatically detect suicide by hanging. The pose and motion features of humans are computed from the spatial configuration of body joints, and a binary classifier is used to separate negative and non-negative frames. The system involves pose representation and analysis, pose and motion features, scaling parameter calculation, and feature selection and learning. Frame-based and sequence-based detection results with accuracy and false alarm parameters were considered for evaluation; LDA, L-SVM, NB, and RBF-SVM classifiers were compared, and RBF-SVM produced the best results. VeTrac, a vehicle trajectory system, is proposed to track vehicle movements to understand urban mobility and benefit other applications [23]. A graph convolution process and semi-supervised learning are used to categorize observations from different cameras and to construct the vehicle trajectories, respectively. The pedestrian detection system of [25] consists of pedestrian detection, multi-person tracking, and person re-identification; Deformable Part Models (DPM), local-to-global trajectory models, and person trajectories were adopted to implement it, and F-score criteria were used to evaluate it on different data sets. To overcome the challenge of detecting extremely low-resolution targets, Zhang et al. [26] proposed an Automatic Target Detection/Recognition (ATD/R) system by training
a CNN-based faster R-CNN using long-wave infrared imagery datasets. This system is applicable to civilian and military applications, and various weather conditions were considered for testing. A framework called Moving-object Proposals Generation and Prediction (MPGP) has been proposed in [27] to reduce the search space and generate accurate proposals at reduced computational cost.
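Several of the systems above rely on background modeling followed by foreground thresholding (for example, [34] uses a Gaussian mixture model). A simpler exponential running-average background, shown here as a generic stand-in rather than the method of any cited paper, illustrates the idea:

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Exponential running-average background model: slowly absorb
    scene changes into the background estimate."""
    return (1 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=25):
    """Threshold the absolute difference to obtain the foreground mask."""
    return np.abs(frame - background) > threshold
```

A Gaussian mixture model extends this by keeping several weighted Gaussians per pixel, which handles multimodal backgrounds (waving trees, flickering lights) that a single running average cannot.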
4 Suggested Future Work

This section presents the research gaps observed during the literature survey, to guide further research in terms of strategic challenges, techniques, and methodologies, as follows:
4.1 Real-Time Analytics

The challenging part of an anomaly detection system is executing the model in real time [28, 30]. Systems are mainly needed for detecting abnormal crowd density, theft, a person falling, multiple people running, etc., in real time. In [29], an online learning mode is adopted to retrain the system with new images by deploying more compute resources.
4.2 Environments

Developing a video surveillance system for constrained environments, such as difficult lighting and weather, is still a challenging task. In addition, noisy images, object overlapping, background clutter, occlusions, and camera positioning play a significant role in analyzing video data [13, 25]. It is hard to detect and track motion in dynamic scenes with changing illumination, weather, and shadows [25].
4.3 Infrastructure Design and Algorithms

As video surveillance data is a kind of big data, an interactive and intelligent infrastructure design is needed to process and analyze the video. Many researchers have proposed distributed AI and deep learning algorithms for distributed cluster computing and cloud computing [36–38]. Due to communication overhead, high latency, and packet loss [39] in centralized, cluster-based, and cloud-based video surveillance systems, edge computing is used as an alternative computing paradigm [22, 40].
However, one downside of doing analytics at the edge is increased camera cost, as more processing power is required [40]. Moreover, when video data is stored at the edge for processing, it is more vulnerable to hackers; hence, the safety and security of data must be ensured at both the device and processing levels. A few researchers have proposed hybrid video surveillance systems that combine recording and/or analytics at the edge with a degree of further processing at the center. Enterprises working in remote locations could use such hybrid systems, whose success depends on the kind of analytics required and on whether the camera sends video signals or just metadata. To increase the safety of people, a system is needed that discovers potential threats, in addition to the existing ones, in advance and alerts the authorities [28].
4.4 Framework

Big-data video analytics, whether online or offline, is expensive in terms of storage, computation, and transmission [41]. Most video analytics systems use traditional client–server frameworks, which fall short in managing complex application scenarios and are rarely used in distributed computing environments. Commonly used frameworks such as Kyungroul, iSurveillance, Smart Surveillance, RISE, and EDCAR still face detection errors [13]. A distributed video analytics framework for intelligent surveillance systems has been proposed in [41]; a distributed video processing library is introduced to ensure scalability, effectiveness, and fault tolerance. Still, many of the proposed frameworks require refinement, with a focus on developing domain-specific video analytics algorithms and next-generation distributed deep learning algorithms. The creation of video indexing in the data layer needs to be considered in future investigations. As video analytics deals with big data, a cloud-based system is mandatory for processing; hence, challenging factors such as resource utilization, security, and privacy in the cloud require further attention for the smooth processing of video big data.
4.5 Other Challenges

An efficient method is still needed for detecting dynamic scenes with changing illumination, weather, and shadows [13]. Improving the performance of segmentation techniques and accurately tracking many people in a crowded environment with poor lighting and noisy images remain challenging problems in video surveillance. Researchers can further focus on setting up automatic traffic signal monitoring, accident detection, and the identification of traffic signal violations and criminal activities with the help of high-end distributed processing systems [33]. In addition, the
implementation of video surveillance systems in unconstrained environments causes variations in viewpoints, resolutions, lighting conditions, background clutter, poses, and occlusions, which need to be considered while designing and developing video analytics methods.
5 Conclusion

The objective of this literature review is to identify and analyze the development of research interest, methods, datasets, and actions considered in video surveillance systems. The methods widely adopted for intelligent surveillance are based on detection, tracking, and behavior understanding; most are based on deep learning techniques, complemented by vision and image processing algorithms. In addition, a few studies focus on improving and integrating infrastructure and system designs, and edge computing infrastructure is adopted in some studies to improve resource usage and reduce latency. Still, many limitations and open challenges remain to be explored in the field of video surveillance. Understanding action behaviors in crowded environments, poor lighting, bad weather conditions, and noisy images is still challenging and remains a direction for future research in video surveillance.
References
1. S. Chaudhary, M.A. Khan, C. Bhatnagar, Multiple anomalous activity detection in videos. Procedia Comput. Sci. 125, 336–345 (2018). International Conference on Smart Computing and Communications, ICSCC 2017. ScienceDirect
2. H. Kavalionak, C. Gennaro, G. Amato et al., Distributed video surveillance using smart cameras. J. Grid Comput. 17, 59–77 (2019). https://doi.org/10.1007/s10723-018-9467-x
3. B. Haynes, M. Daum, D. He, A. Mazumdar, M. Balazinska, A. Cheung, L. Ceze, VSS: a storage system for video analytics, in Proceedings of the 2021 International Conference on Management of Data, Association for Computing Machinery, New York, NY, USA (2021), pp. 685–696. https://doi.org/10.1145/3448016.3459242
4. Video Surveillance Applications. https://www.videosurveillance.com/apps/
5. Using AI to Detect Suspicious Activity in CCTV Footage. https://dzone.com/articles/using-ai-to-detect-suspicious-activity-in-cctv-foo
6. W. Aitfares, A. Kobbane, A. Kriouile, Suspicious behavior detection of people by monitoring camera, in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS) (2016)
7. Challenges of Video Surveillance. https://www.tfeconnect.com/the-challenges-of-video-surveillance-and-how-to-overcome-them/
8. G. Sreenu, M.A. Saleem Durai, Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J. Big Data 6(48), 1–27 (2019). https://doi.org/10.1186/s40537-019-0212-5
9. A. Pennisi, D.D. Bloisi, L. Iocchi, Online real-time crowd behavior detection in video sequences. Comput. Vis. Image Underst. 144, 166–176 (2016). https://doi.org/10.1016/j.cviu.2015.09.010
10. X. Wang, C.-C. Loy, Deep learning for scene-independent crowd analysis. Group Crowd Behav. Comput. Vis. 209–252 (2017). https://doi.org/10.1016/b978-0-12-809276-7.00012-6
11. T. Ko, A survey on behavior analysis in video surveillance for homeland security applications, in 2008 37th IEEE Applied Imagery Pattern Recognition Workshop (2008), pp. 1–8. https://doi.org/10.1109/AIPR.2008.4906450
12. D.R. Beddiar, B. Nini, M. Sabokrou et al., Vision-based human activity recognition: a survey. Multimedia Tools Appl. 79, 30509–30555 (2020). https://doi.org/10.1007/s11042-020-09004-3
13. G.F. Shidik, E. Noersasongko, A. Nugraha, P.N. Andono, J. Jumanto, E.J. Kusuma, A systematic review of intelligence video surveillance: trends, techniques, frameworks, and datasets. IEEE Access 7, 170457–170473 (2019). https://doi.org/10.1109/ACCESS.2019.2955387
14. T. Sikandar, K.H. Ghazali, M.F. Rabbi, ATM crime detection using image processing integrated video surveillance: a systematic review. Multimedia Syst. 25, 229–251 (2019). https://doi.org/10.1007/s00530-018-0599-4
15. A. Senthil Murugan, K. Suganya Devi, A. Sivaranjani et al., A study on various methods used for video summarization and moving object detection for video surveillance applications. Multimedia Tools Appl. 77, 23273–23290 (2018). https://doi.org/10.1007/s11042-018-5671-8
16. M.M. Khin, Z.M. Win, P.P. Wai, K.T. Min, Key frame extraction techniques, in Big Data Analysis and Deep Learning Applications, ICBDL 2018. Advances in Intelligent Systems and Computing, vol. 744, ed. by T. Zin, J.W. Lin (Springer, Singapore, 2019). https://doi.org/10.1007/978-981-13-0869-7_38
17. X. Yan, S.Z. Gilani, M. Feng, L. Zhang, H. Qin, A. Mian, Self-supervised learning to detect key frames in videos. Sensors 20(23), 6941 (2020). https://doi.org/10.3390/s20236941
18. S. Mashtalir, O. Mikhnova, Key frame extraction from video: framework and advances. Int. J. Comput. Vis. Image Process. 4(2), 68–79 (2014).
https://doi.org/10.4018/ijcvip.2014040105
19. B.A. Athira Lekshmi, J.A. Linsely, M.P.F. Queen, P. Babu Aurtherson, Feature extraction and image classification using particle swarm optimization by evolving rotation-invariant image descriptors, in 2018 International Conference on Emerging Trends and Innovations in Engineering and Technological Research (ICETIETR) (2018), pp. 1–5. https://doi.org/10.1109/ICETIETR.2018.8529083
20. R. Vazquez-Martin, P. Nunez, A. Bandera, F. Sandoval, Spectral clustering for feature-based metric maps partitioning in a hybrid mapping framework, in 2009 IEEE International Conference on Robotics and Automation, Kobe International Conference Center, Kobe, Japan, 12–17 May 2009
21. L. Zhou, H. Nagahashi, Real-time action recognition based on key frame detection, in Proceedings of the 9th International Conference on Machine Learning and Computing, ICMLC 2017 (2017). https://doi.org/10.1145/3055635.3056569
22. J. Chen, K. Li, Q. Deng, K. Li, P.S. Yu, Distributed deep learning model for intelligent video surveillance systems with edge computing. IEEE Trans. Ind. Inf. (2019). https://doi.org/10.1109/tii.2019.2909473
23. P. Tong, M. Li, M. Li, J. Huang, X. Hua, Large-scale vehicle trajectory reconstruction with camera sensing network, in Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom'21), Association for Computing Machinery, New York, NY, USA (2021), pp. 188–200. https://doi.org/10.1145/3447993.3448617
24. R. de Bem, A. Ghosh, T. Ajanthan et al., DGPose: deep generative models for human body analysis. Int. J. Comput. Vis. 128, 1537–1563 (2020). https://doi.org/10.1007/s11263-020-01306-1
25. S. Zhang, D. Cheng, Y. Gong, D. Shi, X. Qiu, Y. Xia, Y. Zhang, Pedestrian search in surveillance videos by learning discriminative deep features. Neurocomputing 283, 120–128 (2018). https://doi.org/10.1016/j.neucom.2017.12.042
26. H. Zhang, C. Luo, Q. Wang, M. Kitchin, A. Parmley, J.
Monge-Alvarez, P. Casaseca-de-la-Higuera, A novel infrared video surveillance system using deep learning based techniques. Multimedia Tools Appl. 77(20), 26657–26676 (2018). https://doi.org/10.1007/s11042-018-5883-y
162
V. Vinothina et al.
27. H. Wang, P. Wang, X. Qian, MPNET: an end-to-end deep neural network for object detection in surveillance video. IEEE Access 6, 30296–30308 (2018) 28. V. Singh, S. Singh, P. Gupta, Real-time anomaly recognition through CCTV using neural networks. Procedia Comput. Sci. 173, 254–263 (2020). ISSN 1877-0509. https://doi.org/10. 1016/j.procs.2020.06.030 29. Z.S. Sabri, Z. Li, Low-cost intelligent surveillance system based on fast CNN. Peer J. Comput. Sci. 7, e402 (2021). https://doi.org/10.7717/peerj-cs.402 30. D.K. Singh, S. Paroothi, M.K. Rusia, Mohd.A. Ansari, Human crowd detection for city wide surveillance. Procedia Comput. Sci. 171, 350–359 (2020). ISSN 1877-0509. https://doi.org/ 10.1016/j.procs.2020.04.036 31. M. Sivasathya, S. Mary Joans, Image feature extraction using non linear principle component analysis. Procedia Eng. 38, 911–917 (2012). ISSN 1877-7058. https://doi.org/10.1016/j.pro eng.2012.06.114 32. X. Zhang, Q. Yu, H. Yu, Physics ınspired methods for crowd video surveillance and analysis: a survey. IEEE Access 6, 66816–66830 (2018). https://doi.org/10.1109/ACCESS.2018.2878733 33. V.C. Maha Vishnu, M. Rajalakshmi, R. Nedunchezhian, Intelligent traffic video surveillance and accident detection system with dynamic traffic signal control. Cluster Comput. 21, 135–147 (2018). https://doi.org/10.1007/s10586-017-0974-5 34. D. Karthikeswaran, N. Sengottaiyan, S. Anbukaruppusamy, Video surveillance system against anti-terrorism by using adaptive linear activity classification (ALAC) technique. J. Med. Syst. 43, 256 (2019). https://doi.org/10.1007/s10916-019-1394-2(2019) 35. W. Bouachir, R. Gouiaa, B. Li, R. Noumeir, Intelligent video surveillance for real-time detection of suicide attempts. Pattern Recognit. Lett. 110, 1–7 (2018). ISSN 0167-8655. https://doi.org/ 10.1016/j.patrec.2018.03.018 36. Z. Zhao, K.M. Barijough, A. Gerstlauer, Deepthings: distributed adaptive deep learning inference on resource-constrained IoT edge clusters. IEEE Trans. Comput. 
Aided Design Integr. Circ. Syst. 37(1), 2348–2359 (2018) 37. J. Chen, K. Li, K. Bilal, X. Zhou, K. Li, P.S. Yu, A bi-layered parallel training architecture for large-scale convolutional neural networks. IEEE Trans. Parallel Distrib. Syst. 1–1 (2018) 38. M. Langer, A. Hall, Z. He, W. Rahayu, MPCA SGD: a method for distributed training of deep learning models on spark. IEEE Trans. Parallel Distrib. Syst. 29(11), 2540–2556 (2018) 39. H. Kavalionak, C. Gennaro, G. Amato: Distributed video surveillance using smart cameras. J. Grid Comput. 1–19 (2018) 40. S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, Q. Li, Lavea: latency-aware video analytics on an edge computing platform, in Proceedings of the Second ACM/IEEE Symposium on Edge Computing (ACM, 2017), p. 15 41. A. Uddin, A. Alam, T. Anh, Md.S. Islam, SIAT: a distributed video analytics framework for ıntelligent video surveillance. Symmetry 11, 911 (2019). https://doi.org/10.3390/sym11070911 42. R. Sharma, A. Sungheetha, An efficient dimension reduction based fusion of CNN and SVM model for detection of abnormal incident in video surveillance. J. Soft Comput. Paradigm (JSCP) 3(02), 55–69 (2021)
IRHA: An Intelligent RSSI Based Home Automation System Samsil Arefin Mozumder and A. S. M. Sharifuzzaman Sagar
Abstract Human existence is becoming more sophisticated and better in many areas due to remarkable advances in the field of automation. Automated systems are favored over manual ones in the current environment. Home automation is becoming more popular in this scenario, as people are drawn to the concept of a home environment that can automatically satisfy users' requirements. The key challenges in an intelligent home are intelligent decision making, location-aware service, and compatibility for all users of different ages and physical conditions. Existing solutions address just one or two of these challenges, but smart home automation that is robust, intelligent, location-aware, and predictive is needed to satisfy users' demands. This paper presents a location-aware intelligent Received Signal Strength Indicator (RSSI) based home automation system (IRHA) that uses Wi-Fi signals to detect the user's location and control the appliances automatically. The fingerprinting method is used to map the Wi-Fi signals for different rooms, and a machine learning method, the Decision Tree, is used to classify the signals by room. The machine learning model is then implemented on the ESP32 microcontroller board to classify the rooms based on the real-time Wi-Fi signal, and the result is sent to the main control board through the ESP-NOW communication protocol to control the appliances automatically. The proposed method has achieved 97% accuracy in classifying the users' location.
1 Introduction

The home automation system enables users to operate various types of devices, makes managing home appliances more straightforward, and saves energy. Automation systems for the home and building are becoming popular [1].

S. A. Mozumder, East Delta University, Chattogram, Bangladesh
A. S. M. Sharifuzzaman Sagar (B), Sejong University, Seoul, South Korea, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_14

On the
other side, they improve comfort, mainly when everyone is preoccupied with work. "Home automation systems" placed in homes boost comfort in addition to providing centralized management of airflow, heating, air conditioning, and lighting [2]. In the last ten years, academics have introduced a slew of home automation systems. Wireless home automation systems have made use of a variety of technologies, each with its own set of benefits and drawbacks. As an example, Bluetooth-based automation can be quickly and easily installed and de-installed, but it is limited to short distances. GSM and ZigBee are additional extensively used wireless technologies. A local service provider's phone plan is required to use GSM for long-distance communication. For battery-powered devices in wireless real-time applications, Zigbee [3–8] is a wireless mesh network protocol developed for low-cost, low-energy usage. Data rates, transmission, and network dependability are all constrained, and upkeep is prohibitively expensive. Wi-Fi technology is used in [6, 8–11]. Wi-Fi technology has advantages over ZigBee and Z-Wave in terms of cost, ease of use, and connection. With Wi-Fi-equipped smart gadgets, the cost is usually low. Do-it-yourself Wi-Fi equipment is also easier to come by, resulting in an affordable option. Second, because Wi-Fi is now required and installed in the majority of homes, buying appliances that are Wi-Fi enabled is easier. As Wi-Fi is noted for its simplicity, a user only has to connect a limited number of devices for a home automation system. As a result, it seems that home automation systems using Wi-Fi are a better fit. On the other hand, the elderly constitute a significant and rising part of the global population. Statistics reveal that the proportion of persons aged 65 and over is steadily increasing due to a variety of factors, including decreased birth rates and women's fertility.
The percentage of the population aged 65 and above in the United States climbed from 12.4% in 2000 to 13.3% in 2011, and it is anticipated to rise to 21% by 2040 [12]. According to a United Nations study [13], life expectancy was 65 years in 1950, 78 years in 2010, and is expected to climb to 83 in 2045. Moreover, it was reported that 35% of people aged 65 and above had a handicap [12]. Some of them need help to achieve critical personal requirements. Adopting home automation systems that control essential appliances automatically can minimize the cost of in-home personal assistance. To ensure that home automation systems are suitable for all users, critical requirements such as cost-effectiveness, location-aware service, automatic appliance control, and wireless connectivity must be fulfilled. To this end, no research is available in which a Wi-Fi fingerprinting-based home automation method has been implemented. This paper presents practical solutions to fulfill these requirements through the following contributions:

1. The proposed system is cost-effective because IP-based lights and bulbs were not used; the appliances are instead controlled using a relay module.
2. The proposed system uses Wi-Fi signals to detect the user's location through a machine learning approach.
3. The proposed system can automatically control appliances without any human interference.
4. The proposed system has Wi-Fi connectivity to send the control signal to the main control unit to control the appliances.
5. The proposed approach delivers a user-friendly prototype of a home automation system.
The rest of the paper is organized as follows: Sect. 2 discusses related work on home automation systems, Sect. 3 describes the methods used in the proposed system, Sect. 4 presents the implementation of the proposed system in a home, Sect. 5 discusses the results of the implementation, and finally the conclusion is drawn in Sect. 6.
2 Related Work

Sosa and Joglekar [14] introduced home automation using various sensor readings, fuzzy logic, and IoT. The authors used an IR sensor and an LDR to detect human presence and daylight in a room, and they also used a temperature and humidity sensor to control the fan. Sanket et al. [15] proposed smart home automation using ultrasonic sensors and Google speech recognition. They used ultrasonic sensors to detect hand movement with respect to the sensors, used Google speech to activate the fan and lights, and achieved 75% accuracy in controlling home appliances. Reddy et al. [16] introduced a home automation system using If This Then That (IFTTT). They used the ESP32 as a Wi-Fi-enabled microcontroller module and developed an app to provide commands to control lights and fans with real-time monitoring features. Other authors used Artificial Bee Colony Optimization, a metric routing protocol, and adaptive multi-objective optimization to increase the throughput and security of wireless networks [17–19]. Table 1 shows the research gaps highlighted in earlier studies in the domains of home automation systems, smart homes, smart buildings, and smart surroundings. Location-aware service needs precise location information to enable services. Previous methods are based on instructions provided by the users to obtain home automation services. However, the elderly, the handicapped, and children may find it difficult to control the appliances through an app or manually.
3 Materials and Methods

The proposed system is built around an ESP32 microcontroller board, which is small and has Wi-Fi communication capabilities. The system takes RSSI values and detects the user's location with the help of the Wi-Fi fingerprinting technique. The proposed system is described in detail in this section.
Table 1 Previous studies and research gap in the field of home automation systems

Research | Communication | Controller | Application
[20] | Bluetooth | PIC | Control indoor appliances
[21] | Bluetooth | Arduino | Manage appliances both indoors and outdoors at a short distance
[22] | Bluetooth, GSM | PIC | Operate both indoor and outdoor devices
[3] | ZigBee, Ethernet | Arduino | Operate indoor devices
[4] | X10, Serial, EIB, ZigBee, Bluetooth | ARM MCU | Indoor home automation service
[5] | Wi-Fi, ZigBee | Raspberry PI | Controlling humidity, temperature
[6] | ZigBee | Laptop | Control of home appliances is discussed, but no implementation was provided
[7] | Wi-Fi | Raspberry PI board | A/C system implementation
3.1 System Architecture and Design

The proposed home automation system can be divided into data acquisition, prediction based on the data, and automatic appliance control. Figure 1 shows the basic working architecture of the Intelligent RSSI-based home automation (IRHA) system. The proposed system consists of a data acquisition unit and a control unit. A detailed description of these two units is given below:

1. Data Acquisition Unit: The ESP32 development board is used in the data acquisition unit to receive the Wi-Fi signal strength (RSSI) from the Wi-Fi routers, and the RSSI signals are fed into the machine learning classifier to detect the user's location. The decision tree machine learning model is trained on the RSSI signals and detects the user's location from them. The control data is sent to the main control unit through the ESP-NOW communication protocol to control the appliances.
2. Control Unit: The control unit consists of an ESP32 development board, relays, and a power connection. The main task of the control unit is to perform user location detection based on the RSSI data received from the data acquisition unit. The Wi-Fi fingerprinting method is implemented with the help of the decision tree classification algorithm. When the control unit detects the user's location, it sends the control signal to the relays. The relay modules are connected to appliances such as bulbs and fans, and when the user is in a specific room, the relay turns on that room's appliances.
Fig. 1 The basic architecture of the IRHA system
3.2 Wi-Fi Fingerprinting

Wi-Fi fingerprinting develops a probability distribution of RSSI values for a particular place and builds a radio map of a particular region based on RSSI data from multiple gateways [23]. The closest match is then identified, and a projected location is generated by comparing current RSSI readings to the fingerprint. In principle, this technique's implementation may be separated into two distinct phases:

1. Offline phase: Fingerprints of received signal strengths are gathered from several accessible gateways for distinct reference locations whose positions have previously been determined, and are then saved in a system.
2. Online phase: A pattern-matching technique is used to compare a user's fingerprint to those in the database while the user is online. The user's location is then determined to be the one that corresponds to the nearest fingerprint in the source database.
As discussed above, the online phase uses a pattern-matching technique to determine the user's location by comparison against the database. Artificial intelligence and pattern recognition algorithms are the most widely used methods for the Wi-Fi fingerprinting online phase. Artificial intelligence methods include machine learning and deep learning algorithms. However, deep learning algorithms need large storage and high computation power, which are not available in small microcontroller chips. Thus, machine learning algorithms are the best available methods to implement Wi-Fi fingerprinting on energy- and computation-constrained modules. In this paper, we have used the decision tree classifier algorithm, trained on the RSSI fingerprinting data of the different locations, to detect the user's location in real time. The machine learning-enabled Wi-Fi fingerprinting algorithm is implemented in the control unit of the proposed system to perform real-time user location detection and control the appliances.
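The two phases can be illustrated with a minimal sketch. The room names, RSSI values, and the plain Euclidean nearest-fingerprint match below are illustrative stand-ins for the paper's radio map and pattern-matching step, not the actual implementation.

```python
import math

# Offline phase: radio map of mean RSSI readings (dBm) from three access
# points, recorded at known reference locations. Values are hypothetical.
RADIO_MAP = {
    "room1": (-45, -70, -80),
    "room2": (-70, -48, -75),
    "room3": (-78, -72, -50),
}

def locate(rssi_sample):
    """Online phase: return the room whose stored fingerprint is closest
    (Euclidean distance) to the live RSSI sample."""
    return min(
        RADIO_MAP,
        key=lambda room: math.dist(RADIO_MAP[room], rssi_sample),
    )

print(locate((-47, -69, -79)))  # a reading close to room1's fingerprint
```

Any distance metric or classifier can replace the Euclidean match in the online phase; the paper uses a decision tree for exactly this step.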
3.3 Decision Tree Classification

A decision tree is a prediction model that is often used to illustrate a classification strategy. Classification trees are a popular exploration tool in many industries, including economics, trade, medicine, and engineering [24–27]. The decision tree is often used in data analysis because of its efficiency and clarity. Decision trees are visually depicted as hierarchical structures, making them simpler to understand than alternative strategies. The structure primarily consists of a root node and a series of conditional forks that lead to additional nodes until we reach a leaf node, which holds the route's ultimate conclusion. Due to the simplicity of its depiction, the decision tree is a self-explanatory structure. Each internal node evaluates one attribute, whereas each branch represents the attribute's value. Finally, each leaf assigns a classification to the scenario.

Figure 2 shows an example of a basic decision tree "user location" classification model used in our proposed Wi-Fi fingerprinting method. It simply decides whether the user is in a particular room based on the different rooms' RSSI signatures. RSSI values of different rooms are collected from the nearby Wi-Fi access points using the ESP32 Wi-Fi module and then saved in the database. We have used the scikit-learn machine learning Python library to train our dataset for location detection. The dataset was split into 70-30 training and testing ratios, respectively. The trained model is then ported to the microcontroller using the EloquentArduino library.

Fig. 2 Decision tree example for IRHA system

The decision tree model of our proposed method recursively partitions the feature space so that samples with the same labels are grouped together. Given training vectors $x_i \in R^n$, $i = 1, \ldots, l$, and a label vector $y \in R^l$, the proposed model's partition subsets can be defined as follows:

$$Q_m^{left}(\theta) = \{(x, y) \mid x_j \le t_m\} \tag{1}$$

$$Q_m^{right}(\theta) = Q_m \setminus Q_m^{left}(\theta) \tag{2}$$

where node $m$ is represented by $Q_m$ with $N_m$ samples. For each candidate split $\theta = (j, t_m)$ consisting of a feature $j$ and a threshold $t_m$, partition the data into the $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$ subsets. The quality of a candidate split of node $m$ is then computed using a loss function $H(\cdot)$; the parameter selection equation is below:

$$G(Q_m, \theta) = \frac{N_m^{left}}{N_m} H\left(Q_m^{left}(\theta)\right) + \frac{N_m^{right}}{N_m} H\left(Q_m^{right}(\theta)\right) \tag{3}$$

where $\theta^* = \operatorname{argmin}_\theta G(Q_m, \theta)$. Three loss functions, Gini, entropy, and misclassification, can train the decision tree model for the Wi-Fi fingerprinting method. We have used the Gini method as the loss function for the fingerprinting method. The relevant loss function equation is below:

$$H(Q_m) = \sum_k p_{mk}(1 - p_{mk}) \tag{4}$$

where $p_{mk} = \frac{1}{N_m} \sum_{y \in Q_m} I(y = k)$.
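Equations (1)–(4) can be instantiated in a short pure-Python sketch: for each candidate threshold t_m on a single feature, partition the samples, compute the Gini impurity H of each subset, and keep the split that minimizes the weighted sum G. The RSSI values and labels below are hypothetical; the actual model was trained with scikit-learn.

```python
def gini(labels):
    """H(Q_m) = sum_k p_mk (1 - p_mk), Eq. (4)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n)
               for c in (labels.count(k) for k in set(labels)))

def best_split(xs, ys):
    """Choose the threshold t minimizing G(Q_m, theta), Eqs. (1)-(3),
    for a single feature; returns (threshold, G)."""
    best = None
    n = len(xs)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]   # Q_m^left
        right = [y for x, y in zip(xs, ys) if x > t]   # Q_m^right
        g = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or g < best[1]:
            best = (t, g)
    return best

# Hypothetical RSSI readings (dBm) from one access point, labelled by room.
rssi = [-45, -47, -44, -72, -70, -74]
room = ["room1"] * 3 + ["room2"] * 3
print(best_split(rssi, room))  # -> (-70, 0.0): a perfect split at -70 dBm
```

A full tree simply applies this split selection recursively over all features until each leaf is pure or a depth limit is reached.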
3.4 ESP-NOW Protocol

ESP-NOW is an Espressif-defined protocol for connectionless Wi-Fi communication. In ESP-NOW, the data packet is contained in a vendor-specific action frame and sent from one Wi-Fi device to another without the need for a connection. The CTR with CBC-MAC Protocol (CCMP) is used to secure the action frame in ESP-NOW. CCMP uses a method for generating a message authentication code from a block cipher: the message is encrypted in CBC mode using a block cipher algorithm, resulting in a chain of blocks where each block depends on the correct encryption of the preceding block. This dependency assures that any modification of any plaintext bits changes the final encrypted block in a way that cannot be anticipated or reversed without the block cipher's key. It is much more secure than the Wired Equivalent Privacy (WEP) protocol and the Temporal Key Integrity Protocol (TKIP) of Wi-Fi Protected Access (WPA). Moreover, ESP-NOW uses minimal CPU and flash memory, can be used along with Wi-Fi and Bluetooth LE, and is compatible with different ESP families. ESP-NOW has a versatile data transmission model that includes unicast, broadcast, one-to-many, and many-to-many device connections. ESP-NOW can also be used as a stand-alone supplemental module to assist network configuration, diagnostics, and firmware updates. Based on the data flow, ESP-NOW defines two roles, initiator and responder; the same device may perform both roles concurrently. In an IoT infrastructure, switches, sensors, LCD panels, and other real-time applications often serve as the initiator, while lights, sockets, and other appliances serve as the responder. The ESP-NOW protocol provides a bit rate of 1 Mbps by default. We have used the ESP-NOW protocol for our proposed system as it is lightweight and consumes little storage.
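The chaining property described above can be illustrated with a toy sketch. Real CCMP uses AES; the SHA-256-based stand-in below is not AES and serves only to show that each block of the MAC chain depends on the previous one, so any change to the message changes the final tag.

```python
import hashlib

BLOCK = 16  # 128-bit blocks, as in AES/CCMP

def toy_cipher(key, block):
    """Stand-in for a block cipher (NOT AES; illustration only)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_mac(key, message):
    # Pad to whole blocks, then chain: each state depends on the previous
    # one, so a flipped bit anywhere propagates to the final tag.
    message += b"\x00" * (-len(message) % BLOCK)
    state = bytes(BLOCK)
    for i in range(0, len(message), BLOCK):
        block = message[i:i + BLOCK]
        state = toy_cipher(key, bytes(a ^ b for a, b in zip(state, block)))
    return state  # the MAC tag

key = b"shared-secret"
tag = cbc_mac(key, b"room2 -> turn on fan and light")
tampered = cbc_mac(key, b"room3 -> turn on fan and light")
print(tag != tampered)  # True: a changed message yields a different tag
```

Without the key, a receiver-side verifier cannot be fooled into accepting a modified control message, which is the property ESP-NOW relies on.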
4 Implementation

The implementation of the proposed system was done in a three-bedroom home, where users in different locations wore wristbands. A detailed description of the implementation, along with the control method, is given in this section.
4.1 Hardware Components

The hardware components mainly consist of the wristband and the control system. Figure 3 shows the hardware components used in the proposed system. Both parts have an ESP32 and a power source in common. In the case of the wristband, a tiny battery has been used to power it. The control system uses an SMPS to power the whole unit. It has 3 lights and 3 fans, which are controlled by relay modules. The ESP32 provides
Fig. 3 The hardware components of the proposed home automation system
the control signal for the relay modules. A detailed description of the hardware used is presented below:

ESP32 development board: The ESP32 development board is the successor of the ESP8266 kit. The ESP32 has several additional features compared to its predecessor: it has a dual-core processor and wireless connectivity through Wi-Fi and Bluetooth.

Relay module: Microcontroller output signals cannot drive real-world loads, so a relay module has been introduced to control the appliances. Output signals from the microcontroller unit are first sent to a photocoupler, which provides the base voltage for the relay's driver transistor. A 6-channel relay module has been used in our proposed system.
4.2 Control System Method

Algorithm 1 shows the control method of the proposed home automation system. The system takes RSSI signals as input and controls home appliances by predicting the user's location. The acquired RSSI value is first saved into a variable and then fed into the Wi-Fi fingerprinting technique deployed in the data acquisition unit. As discussed before, the Wi-Fi fingerprinting technique's online phase is responsible for matching the RSSI fingerprint against the database. The decision tree classifier is responsible for detecting the user's location based on the previously acquired dataset. The classifier model first detects the user's location and then sends a specific signal to the control unit to control home appliances such as lights and fans. Note that the control signals are different for every room, as can be seen in Algorithm 1. The data acquisition unit uses the ESP-NOW communication protocol to send the control signal to the control unit. The control unit receives the data and uses the relay module to control appliances based on the received signal.
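The control flow described above can be sketched as follows. The room-to-signal mapping, the classify callback, and the send callback are hypothetical stand-ins: on the device, the prediction would come from the ported decision tree model and the signal would be sent over ESP-NOW.

```python
# Hypothetical per-room control codes: each room maps to a distinct signal
# telling the control unit which relays (light, fan) to switch on.
ROOM_SIGNAL = {"room1": 0x01, "room2": 0x02, "room3": 0x03}

def control_step(rssi_sample, classify, send):
    """One iteration of the control loop: classify the live RSSI sample,
    look up that room's control signal, and send it to the control unit."""
    room = classify(rssi_sample)   # decision tree prediction on the device
    signal = ROOM_SIGNAL[room]     # room-specific control signal
    send(signal)                   # e.g. transmitted over ESP-NOW
    return room, signal

# Stand-in classifier and transport, for demonstration only:
sent = []
room, sig = control_step((-47, -69, -79),
                         classify=lambda s: "room1",
                         send=sent.append)
print(room, hex(sig), sent)  # room1 0x1 [1]
```

On the real hardware this step runs in a loop, re-sampling RSSI and re-sending only when the predicted room changes.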
5 Results and Discussion

Decision tree classification has been used in the proposed system to classify the RSSI values and determine the user's location. Data acquisition and pattern recognition are among the most important factors for a Wi-Fi-based fingerprinting technique. Data acquisition was performed by taking the RSSI value for every room using the ESP32 development board. The acquired data were then used to train several classifier models to determine the best one. We trained our dataset with common, widely available classifiers: SVM, Decision Tree, Random Forest, and Naïve Bayes. We found that the Decision Tree outperforms the other algorithms in terms of accuracy, which
is 97% (Fig. 4). Therefore, we used the Decision Tree classifier for our proposed home automation system. Figure 5 shows the trained Decision Tree's confusion matrix, which indicates the correctly and incorrectly predicted classes. The confusion matrix depicts occurrences of the true label horizontally and instances of the predicted label vertically. There were a total of 19 expected "room1" instances, and the proposed model predicted all of them correctly. For room 2, the model predicted 2 labels incorrectly, while it predicted all classes correctly for room 3. We also present the Decision Tree classifier's ROC curve for every class in Fig. 6. In the ROC curve, the true positive rate of the proposed model is displayed on the y-axis, while the false positive rate is represented on the x-axis. Micro-average and macro-average ROC curves are calculated for the proposed model. We have also calculated the ROC and AUC for every class of our proposed model.

Fig. 4 Accuracy comparison of different machine learning methods trained with the same dataset

Fig. 5 Confusion matrix of the proposed model

Fig. 6 ROC curve for all classes of the proposed model
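A confusion matrix like the one in Fig. 5 can be computed with a few lines of code. The label lists below are hypothetical and merely echo the kind of result reported above (rows are true labels, columns are predicted labels).

```python
from collections import Counter

def confusion_matrix(true, pred, classes):
    """Rows: true labels; columns: predicted labels."""
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in classes] for t in classes]

classes = ["room1", "room2", "room3"]
# Hypothetical labels, not the paper's actual test set:
true = ["room1"] * 4 + ["room2"] * 4 + ["room3"] * 4
pred = ["room1"] * 4 + ["room2"] * 3 + ["room1"] + ["room3"] * 4
for row in confusion_matrix(true, pred, classes):
    print(row)
# [4, 0, 0]
# [1, 3, 0]
# [0, 0, 4]
```

Accuracy is the trace of this matrix divided by the number of samples; off-diagonal entries, like the single room2-predicted-as-room1 error here, mirror the two room-2 misclassifications reported for the actual model.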
6 Conclusion

The whole design plan and working technique of a home automation system are described in detail in this research study. The purpose of the paper is to seek ideas on how to improve home automation and make it more user-friendly, particularly for the elderly and disabled. The suggested system has two mechanisms: data acquisition and control. The system collects Wi-Fi RSSI values to predict the user's position on the data acquisition side, and on the control side the system automatically controls appliances. A machine learning method, the decision tree classifier, is used to predict the user's location. The proposed method also uses the ESP-NOW protocol to enable a secure connection between the data acquisition and control units. It offers users simplicity, flexibility, and dependability, as well as a low-cost solution. However, IoT-based solutions such as controlling home appliances through an app are not presented in this paper, and the accuracy of the Wi-Fi fingerprinting method can be improved.
References

1. A. Alheraish, Design and implementation of home automation system. IEEE Trans. Consum. Electron. 50(4), 1087–1092 (2004). https://doi.org/10.1109/tce.2004.1362503
2. H. Jiang, Z. Han, P. Scucces, S. Robidoux, Y. Sun, Voice-activated environmental control system for persons with disabilities, in Proceedings of the IEEE 26th Annual Northeast Bioengineering Conference (Cat. No.00CH37114) (n.d.). https://doi.org/10.1109/nebc.2000.842432
3. K. Baraka, M. Ghobril, S. Malek, R. Kanj, A. Kayssi, Low cost arduino/android-based energy-efficient home automation system with smart task scheduling, in 2013 Fifth International Conference on Computational Intelligence, Communication Systems and Networks (2013). https://doi.org/10.1109/cicsyn.2013.47
4. M. Zamora-Izquierdo, J. Santa, A. Gomez-Skarmeta, An integral and networked home automation solution for indoor ambient intelligence. IEEE Pervasive Comput. 9(4), 66–77 (2010). https://doi.org/10.1109/mprv.2010.20
5. I. Froiz-Míguez, T. Fernández-Caramés, P. Fraga-Lamas, L. Castedo, Design, implementation and practical evaluation of an IoT home automation system for fog computing applications based on MQTT and ZigBee-WiFi sensor nodes. Sensors 18(8), 2660 (2018). https://doi.org/10.3390/s18082660
6. Z. Li, M. Song, L. Gao, Design of smart home system based on Zigbee. Appl. Mech. Mater. 635–637, 1086–1089 (2014). https://doi.org/10.4028/www.scientific.net/amm.635-637.1086
7. Structuring and design of home automation system using IOT 4(5), 200–206 (2018). https://doi.org/10.23883/ijrter.2018.4368.gjcfn
8. M. Helo, A. Shaker, L. Abdul-Rahaim, Design and implementation a cloud computing system for smart home automation. Webology 18(SI05), 879–893 (2021). https://doi.org/10.14704/web/v18si05/web18269
9. D. Choudhury, Real time and low cost smart home automation system using internet of things environment. Int. J. Comput. Sci. Eng. 7(4), 225–229 (2019). https://doi.org/10.26438/ijcse/v7i4.225229
10. B. Davidovic, A. Labus, A smart home system based on sensor technology. Facta Univ. Ser. Electron. Energ. 29(3), 451–460 (2016). https://doi.org/10.2298/fuee1603451d
11. W. Jabbar, T. Kian, R. Ramli, S. Zubir, N. Zamrizaman, M. Balfaqih et al., Design and fabrication of smart home with internet of things enabled automation system. IEEE Access 7, 144059–144074 (2019). https://doi.org/10.1109/access.2019.2942846
12. Census.gov (2022). Retrieved 12 Jan 2022, from https://www.census.gov/prod/2002pubs/censr-4.pdf
13. Un.org (2022). Retrieved 12 Jan 2022, from https://www.un.org/en/development/desa/population/publications/pdf/ageing/WorldPopulationAgeing2013.pdf
14. J. Sosa, J. Joglekar, Smart home automation using fuzzy logic and internet of things technologies, in International Conference on Inventive Computation Technologies (Springer, Cham, 2019), pp. 174–182
15. S. Sanket, J. Thakor, P. Kapoor, K. Pande, S.V. Shrivastav, R. Maheswari, Relative hand movement and voice control based home automation and PC, in International Conference on Inventive Computation Technologies (Springer, Cham, 2019), pp. 232–239
16. M.R. Reddy, P. Sai Siddartha Reddy, S.A.S. Harsha, D. Vishnu Vashista, Voice controlled home automation using Blynk, IFTTT with live feedback, in Inventive Communication and Computational Technologies (Springer, Singapore, 2020), pp. 619–634
17. S. Shakya, L.N. Pulchowk, Intelligent and adaptive multi-objective optimization in WANET using bio inspired algorithms. J. Soft Comput. Paradigm (JSCP) 2(01), 13–23 (2020)
18. S. Smys, C. Vijesh Joe, Metric routing protocol for detecting untrustworthy nodes for packet transmission. J. Inf. Technol. 3(02), 67–76 (2021)
19. I.J. Jacob, P. Ebby Darney, Artificial bee colony optimization algorithm for enhancing routing in wireless networks. J. Artif. Intell. 3(01), 62–71 (2021)
20. P. Munihanumaiah, H. Sarojadevi, Design and development of network-based consumer applications on Android, in 2014 International Conference on Computing for Sustainable Global Development (INDIACom) (2014). https://doi.org/10.1109/indiacom.2014.6828089
21. M. Asadullah, K. Ullah, Smart home automation system using bluetooth technology, in 2017 International Conference on Innovations in Electrical Engineering and Computational Technologies (ICIEECT) (2017). https://doi.org/10.1109/icieect.2017.7916544
22. Smart home automation using Arduino integrated with bluetooth and GSM. Int. J. Innov. Technol. Explor. Eng. 8(11S), 1140–1143 (2019). https://doi.org/10.35940/ijitee.k1230.09811s19
23. M. Brunato, C.K. Kallo, Transparent location fingerprinting for wireless services (2002)
24. H. Oh, W. Seo, Development of a decision tree analysis model that predicts recovery from acute brain injury. Jpn. J. Nurs. Sci. 10(1), 89–97 (2012). https://doi.org/10.1111/j.1742-7924.2012.00215.x
25. G. Zhou, L. Wang, Co-location decision tree for enhancing decision-making of pavement maintenance and rehabilitation. Transp. Res. Part C: Emerg. Technol. 21(1), 287–305 (2012). https://doi.org/10.1016/j.trc.2011.10.007
26. S. Sohn, J. Kim, Decision tree-based technology credit scoring for start-up firms: Korean case. Expert Syst. Appl. 39(4), 4007–4012 (2012). https://doi.org/10.1016/j.eswa.2011.09.075
27. J. Cho, P. Kurup, Decision tree approach for classification and dimensionality reduction of electronic nose data. Sens. Actuators B: Chem. 160(1), 542–548 (2011). https://doi.org/10.1016/j.snb.2011.08.027
A Review Paper on Machine Learning Techniques and Its Applications in Health Care Sector Priya Gautam and Pooja Dehraj
Abstract Healthcare data analysis has seen a significant rise in recent years and is thus becoming a prominent research field. Healthcare data types include sensor data, clinical data, and omics data. Clinical data are generally collected while a patient is being treated, whereas multiple wearable and wireless sensor devices are used for collecting sensor data. Omics data include transcriptome, genome, and proteome data types, all of which are complicated and high-dimensional. Since the data types vary, manually dealing with the raw data becomes a challenging task. This is where machine learning emerges as a significant tool for analyzing the data. To precisely forecast healthcare data results, machine learning utilizes varied statistical techniques and advanced algorithms. For analysis, ML utilizes different approaches such as reinforcement learning, supervised learning, and unsupervised learning. This paper discusses the many forms of machine learning algorithms that can be used to analyse and survey enormous volumes of healthcare data.
1 Introduction

Healthcare can be described as a system that serves people's medical needs and demands. Significant efforts have been made by health companies, IT companies, physicians, patients, and vendors to maintain and restore healthcare records. Healthcare analytics handles a wide variety of diseases, including cancer, diabetes, stroke, etc. Cancer is a deadly disease and can be of different types, including lung cancer, breast cancer, colon cancer, etc. Every year approximately 12% of cases are reported, out of which 10% of cases die. Similarly, out of 11% of reported breast cancer cases, 9% die, and the same holds for the other types of cancer [1].
P. Gautam (B) · P. Dehraj Noida Institute of Engineering and Technology, Greater Noida, India e-mail: [email protected] P. Dehraj e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_15
177
According to a report released by the ICMR and the Bengaluru-based National Center for Disease Informatics and Research, the number of cancer patients in India was estimated at 13.8 lakh for the year 2020 and may increase to 15.7 lakh by 2025. To tackle the cancer problem, it is important to create reliable, high-quality data. Healthcare research with machine learning has seen exponential growth over the years. Since medical data ranges from omics data to clinical data, it is very difficult for humans to perform manual analysis and reach correct decisions. Machine learning is therefore a boon for the healthcare sector: it not only helps in better understanding the data but also helps in improving decision making.
2 Machine Learning Techniques

Machine learning is a subset of data analysis that automates the development of analytical models. It is similar to the autonomic computing approach, in which a self-adaptive system is designed and developed; that approach likewise automates the development of analysis and response algorithms [2–6]. Machine learning focuses on creating computer programs that can access data and use it to learn for themselves. It refers to a machine's ability to employ statistical approaches and complex algorithms to predict and operate on data, replacing rule-based systems and making the overall system more powerful. Data is the major component of machine learning and acts as the foundation for any model. Prediction accuracy is directly proportional to data accuracy, i.e., the more relevant the data, the better the prediction accuracy. Once the data is finalized, the next step is to select an algorithm suited to the problem. Besides healthcare, ML is used in a range of industries, including finance, retail, and so on.

Machine Learning Algorithm Types: These are broadly classified into 3 categories:

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Various types of machine learning algorithms and their applications are shown in Fig. 1 [7].
2.1 Supervised Learning

Supervised learning automates the process of supplying both correct input data and correct output data to machine learning models. In supervised machine learning, the system is trained using labelled training data and then predicts the output using that data.
Fig. 1 Machine learning algorithm types (supervised learning — classification: image classification, fraud detection, spam filtering, diagnostics; regression: risk assessment, score prediction; unsupervised learning — clustering: face recognition, image recognition, text mining, big data visualization; association: city planning, biology, targeted marketing; reinforcement learning: gaming, robot navigation, finance sector, manufacturing, inventory management)
In labelled data, some of the input data has already been tagged with the desired outcome. This supervised learning method aims to find a mapping function that connects the input variable (x) to the output variable (y). In today's environment, supervised learning can be used to detect fraud, assess risk, filter spam, categorise photos, and so on. Fig. 2 shows how supervised machine learning works [8].
Fig. 2 Work of supervised machine learning
Fig. 3 Types of supervised machine learning: classification and regression

Fig. 4 Work of unsupervised machine learning
2.1.1 Supervised Learning Algorithm Types
Classification

Classification algorithms are used when the output variable is categorical, such as "Male" or "Female", "Yes" or "No", "True" or "False", etc. After the inputs are divided into two or more classes, the learner must develop a model that assigns unseen inputs to one or more of these classes (multi-label categorization). Usually, this is done under the guidance of a specialist. Spam filtering is an example of classification, with email (or other) messages as inputs and "spam" and "not spam" as the classes (Fig. 3).
Regression

In this type of machine learning the output variable is a real value. Regression is used to forecast continuous variables such as market trends and weather, among other things. It is a supervised learning problem, but the outputs are continuous rather than discrete — for example, forecasting stock prices from previous data. An example of classification and regression on two independent datasets is shown in Fig. 4.
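As a minimal illustration of regression (a toy sketch written for this survey, not code from any of the cited works), an ordinary least-squares line can be fitted to one-dimensional data in a few lines of Python:

```python
# Ordinary least-squares fit of y = a*x + b on invented toy data.

def fit_line(xs, ys):
    """Return slope a and intercept b minimising the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Perfectly linear toy data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
```

On this perfectly linear toy data the fit recovers slope 2 and intercept 1 exactly; on real healthcare data the residuals would of course be nonzero.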
2.1.2 Supervised Learning Algorithms [9]
Logistic Regression (LR)

In this model we describe the relationship between two or more variables and, based on this relationship, predict outcomes that follow the pattern. In the field of machine learning, LR is a prominent mathematical modelling approach for epidemiologic datasets. It starts by computing the logistic function; the coefficients of the logistic regression model are then learned, and predictions are created using the fitted model [5]. This generalised linear model has two parts: a link function and a linear portion. The classification score is calculated via the linear component, and the link function communicates the calculation's result [6]. LR is a supervised machine learning algorithm, so it needs a hypothesis and a cost function, and cost function optimization is critical [10].
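The logistic function and a per-sample gradient-descent update on the cross-entropy cost can be sketched as follows (an illustrative toy example with invented data, not drawn from the surveyed papers):

```python
import math

def sigmoid(z):
    """The logistic (link) function."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Learn weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # gradient of the cross-entropy cost for one sample
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Toy binary data: the label is 1 when x is large
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
```

With separable toy data like this, the learned decision boundary settles between the two classes.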
K-Nearest Neighbour (KNN)

This algorithm is used for both classification and regression problems. KNN is a well-known supervised classification technique that is utilized in a variety of applications, including pattern recognition and intrusion detection. KNN is a straightforward and simple algorithm. Even though KNN is highly accurate, it is computationally and memory expensive, since both testing and training data must be stored [3]. A forecast for a new instance is produced by first finding the most comparable examples and then summarising the output variable based on those similar cases: in regression the mean value can be used, while in classification the mode value can be used. A distance measure is used to discover similar cases. The training dataset should be made up of vectors in a multidimensional feature space, each with a class label [11].
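The whole procedure — Euclidean distance, nearest neighbours, majority vote — fits in a few lines (a toy sketch with invented feature values, not code from the cited works):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.
    `train` is a list of ((features...), label) pairs."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Invented 2-D feature vectors for two tumour classes
train = [((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"),
         ((0.9, 1.1), "benign"), ((5.0, 5.0), "malignant"),
         ((5.2, 4.8), "malignant")]
label = knn_predict(train, (1.1, 1.0), k=3)
```

Note that prediction requires keeping the entire training set in memory, which is exactly the cost mentioned above.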
Naive Bayes (NB)

This approach rests on the Bayes theorem and assumes that each pair of features is independent. Naive Bayes can be used to categorise noisy cases in the data and build a robust prediction model. The NB technique is a classification approach for binary and multiclass settings. The NB classifiers are a set of classification methods based on the Bayes theorem, and they all follow the same principle: each pair of features to be classified must be independent of the others [11]. NB works in a similar fashion to SVM but employs statistical methods: when a new input arrives, the probabilistic value for that input is calculated and the input is labelled with the class that has the highest probability value [5].
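The "highest probability wins" rule can be sketched for categorical features as follows (a toy example with an invented symptom table, not from the surveyed papers; Laplace smoothing is added so unseen values never zero out a class):

```python
from collections import Counter, defaultdict

def train_nb(rows):
    """rows: list of (feature_dict, label). Collect class priors and
    per-class feature-value counts."""
    priors = Counter(label for _, label in rows)
    counts = defaultdict(Counter)           # counts[(label, feat)][value]
    for feats, label in rows:
        for feat, value in feats.items():
            counts[(label, feat)][value] += 1
    return priors, counts, len(rows)

def predict_nb(model, feats):
    """Pick the class with the highest P(class) * prod P(value | class)."""
    priors, counts, n = model
    best_label, best_score = None, -1.0
    for label, prior_count in priors.items():
        score = prior_count / n
        for feat, value in feats.items():
            # add-one (Laplace) smoothing
            score *= (counts[(label, feat)][value] + 1) / (prior_count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [({"fever": "yes", "cough": "yes"}, "flu"),
        ({"fever": "yes", "cough": "no"}, "flu"),
        ({"fever": "no", "cough": "no"}, "healthy"),
        ({"fever": "no", "cough": "yes"}, "healthy")]
model = train_nb(rows)
pred = predict_nb(model, {"fever": "yes", "cough": "yes"})
```

The per-feature products embody the independence assumption described above.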
Decision Trees (DT)

This algorithm solves regression and classification problems by repeatedly splitting the data on a particular variable. The data is represented in a tree structure: internal nodes hold the splits and the leaves represent the final decisions. DT is a supervised algorithm that examines decisions, their anticipated consequences, and their outcomes using a tree-like model. Each branch represents a conclusion, and each node is a query; the leaf nodes carry the class labels. When a sample data point reaches a leaf node, the label of the matching node is assigned to the sample. When the problem is simple and the dataset is small, this strategy works well. Even though the method is simple to learn, it has downsides when used with unbalanced datasets, such as overfitting and biased results. On the other hand, a decision tree can map both linear and nonlinear relationships [3].
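The core of tree induction is choosing the split that makes the child nodes purest. A common purity measure is Gini impurity; here is a single-feature split search on invented data (a sketch for this survey, not any cited implementation):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 = pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Find the threshold on one feature that minimises the weighted
    Gini impurity of the two child nodes."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        n = len(ys)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Invented feature (e.g. tumour size) with a clean split at 2.0
xs = [1.0, 1.5, 2.0, 6.0, 7.0, 8.0]
ys = ["benign"] * 3 + ["malignant"] * 3
t, score = best_split(xs, ys)
```

A full tree repeats this search recursively on each child until the nodes are pure or a depth limit is reached.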
Support Vector Machine (SVM)

SVM is a classifier that separates the data set into two classes with the support of a hyperplane. It is a supervised machine learning technique mostly used to solve classification issues; it can, however, be utilised to solve regression issues. The data items are first plotted as dots in an n-dimensional space, with each feature value corresponding to a separate coordinate. The hyperplane that divides the data points into two classes is then discovered, and the method maximises the minimum distance between the chosen hyperplane and the instances near the border [11]. SVM contains basis functions that use nonlinear relationships to map points into new dimensions, which puts it ahead of other algorithms [4]. SVM is also known as a non-probabilistic binary classifier, since it separates data points into two classes. When compared to many other algorithms, SVM is more accurate; it is, nonetheless, better suited to scenarios with small datasets, because the training process becomes more complicated and time consuming as the dataset grows larger. It cannot perform well when data contains noise. SVM uses a subset of training points to improve classification efficiency. Both linear and nonlinear problems can be solved with SVM, but nonlinear SVM outperforms linear SVM [3].
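A linear SVM can be trained by sub-gradient descent on the hinge loss with L2 regularisation. The sketch below (invented 2-D data and parameters, written for this survey; real SVM solvers use far more refined optimisation) shows the margin-driven update:

```python
def train_linear_svm(data, lr=0.01, lam=0.01, epochs=2000):
    """Sub-gradient descent on the regularised hinge loss.
    data: list of ((x1, x2), y) with y in {-1, +1}."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:   # point inside the margin: push it out
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:            # correctly classified: only shrink w
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

data = [((1.0, 1.0), -1), ((1.5, 0.5), -1),
        ((4.0, 4.0), +1), ((4.5, 3.5), +1)]
w, b = train_linear_svm(data)

def classify(point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1
```

Only points inside the margin trigger a correcting update, which is why the final hyperplane is determined by a subset of the training points — the support vectors.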
Random Forest Algorithm

It is utilised to address both classification and regression problems. In this algorithm the input data enters at the top and is divided into smaller sets as it traverses the tree. RFA [12] is a popular machine learning technique that can perform both classification and regression. It is a supervised learning algorithm that uses recursion as its foundation: a group of decision trees is built, and the bagging approach is utilized for training [13]. The Random Forest Algorithm (RFA) is noise-insensitive and can be used on unbalanced datasets. Overfitting is likewise not a significant issue in RFA [3].
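The bagging idea can be demonstrated with an ensemble of one-level trees (decision stumps), each trained on a bootstrap sample — a deliberately simplified stand-in for full decision trees, on invented data (a sketch for this survey, not a cited implementation):

```python
import random
from collections import Counter

def train_stump(sample):
    """Train a one-level tree on (x, label) pairs: pick the threshold
    with the fewest training errors."""
    best = None
    for t in sorted(set(x for x, _ in sample)):
        left = Counter(l for x, l in sample if x <= t)
        right = Counter(l for x, l in sample if x > t)
        l_lab = left.most_common(1)[0][0] if left else None
        r_lab = right.most_common(1)[0][0] if right else None
        errors = sum(1 for x, l in sample
                     if l != (l_lab if x <= t else r_lab))
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    return lambda x: l_lab if x <= t else r_lab

def train_forest(data, n_trees=15, seed=0):
    """Bagging: each stump sees a bootstrap sample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_trees)]

def forest_predict(forest, x):
    """Majority vote over all trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

data = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
        (6.0, "high"), (7.0, "high"), (8.0, "high")]
forest = train_forest(data)
```

A real random forest additionally samples a random subset of features at every split, which this single-feature sketch cannot show.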
Classification and Regression Trees

Classification and Regression Trees (CART) is a predictive model that forecasts the output value based on the tree's input values. The CART model is depicted as a binary tree, with each node representing a single input variable and a split point on that variable. Predictions are made using the output of the leaf nodes [11].
ANN (Artificial Neural Network)

The Artificial Neural Network is a well-known supervised machine learning approach for picture categorization problems. Artificial neurons are the basic notion of an ANN; it works in the same way as a biological neural network. The nodes in each layer of an ANN are connected to the nodes in the adjacent layers, and a deeper neural network can be formed by increasing the number of hidden layers [14]. Three varieties of functions can be found in neural networks. For a given set of inputs, the error function determines how good or bad the result was. The search function identifies adjustments that might minimise the error function. The update function determines, based on the search function, how the revisions are made. It is an iterative method for improving the algorithm's performance [4].
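The forward pass of a one-hidden-layer network can be written out directly. The weights below are hand-set (an illustration constructed for this survey, not learned by the error/search/update loop described above) so that the network computes XOR, a function no single-layer network can represent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights):
    """One hidden layer, sigmoid activations throughout.
    weights = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = weights
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)

# Hand-set weights: hidden unit 0 ~ OR(x1, x2), hidden unit 1 ~ AND(x1, x2),
# output ~ OR AND NOT AND, i.e. XOR.
weights = (
    [[10.0, 10.0], [10.0, 10.0]],   # W1
    [-5.0, -15.0],                  # b1: OR gate, AND gate
    [10.0, -10.0],                  # W2: +OR, -AND
    -5.0,                           # b2
)
xor = lambda a, b: round(forward([a, b], weights))
```

Training would replace the hand-set weights by iteratively applying the error, search, and update functions (i.e. backpropagation).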
2.2 Unsupervised Learning

In supervised machine learning, models are trained on labelled data under the supervision of training data. In many circumstances, however, we do not have labelled data and must deduce the underlying pattern from the available data. Unsupervised machine learning is used to solve this type of challenge. Its purpose is to group data based on similarities, reveal the dataset's underlying structure, and show it in a compressed format. Fig. 5 shows how unsupervised machine learning works.
Fig. 5 Work of unsupervised machine learning (raw input data, with unknown output and no training set, passes through the algorithm for interpretation and processing)
Fig. 6 Unsupervised learning types: clustering and association

2.2.1 Types of Unsupervised Learning Algorithm
Clustering

Clustering is the most important type of unsupervised learning algorithm. It is mainly used for finding a pattern or structure in a collection of uncategorized data (Fig. 6).
Association

Association rules can be used to create associations between data objects in a huge database. The goal of this unsupervised method is to find interesting correlations between variables in massive databases. For example, people who buy a new house are more inclined to buy new furniture.
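Association rules are scored by support (how often the items co-occur) and confidence (how often the consequent follows the antecedent). A minimal sketch on the house/furniture example (invented transactions, written for this survey):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

transactions = [
    {"house", "furniture"},
    {"house", "furniture", "paint"},
    {"house", "paint"},
    {"furniture"},
]
conf = confidence(transactions, {"house"}, {"furniture"})
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-chosen thresholds rather than scoring one rule at a time.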
2.2.2 Algorithms of Unsupervised Learning
Partition Clustering (PC)

In partition clustering, objects are partitioned, and their dissimilarity can cause them to change clusters. It is useful in bioinformatics, for example, for a tiny dataset of gene expression. The disadvantage is that the number of clusters must be entered manually. Nevertheless, this method is widely employed in bioinformatics. Partition clustering techniques include COOLCAT, fuzzy k-means, CLARANS, and CLARA [15].
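The best-known partition method is k-means (Lloyd's algorithm); the sketch below (invented 2-D points, written for this survey) shows the assign/update loop and the drawback noted above — k must be supplied:

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate between assigning points to their
    nearest centroid and recomputing centroids as cluster means."""
    centroids = list(points[:k])        # simple deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
```

Fuzzy k-means, mentioned above, replaces the hard assignment with per-cluster membership weights.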
Graph-Based Clustering

In interactomes, graph-based clustering is frequently utilised to make sophisticated predictions and to sequence networks. This approach is often slow and sensitive to user-defined parameters. Graph-based clustering techniques include SPC (Superparamagnetic clustering), MCL (Markov cluster algorithm), MCODE (Molecular complex detection), and RNSC (Restricted neighbourhood search clustering) [15].
Hierarchical Clustering

In this approach we find the two clusters that are closest together and merge the two that are most comparable, repeating these steps until all of the clusters have been merged. In hierarchical clustering, objects are organised into a tree of nodes, which are referred to as clusters. The two kinds of nodes are parent and child nodes: each node can have one or more child nodes, and each node can have one or more parents. Clusters may be traversed at multiple degrees of granularity, making this approach useful in bioinformatics. The shortcomings are that it is typically slow, that errors made when merging clusters cannot be undone even though they affect the outcome, and that when big clusters are joined, crucial local cluster structure may be lost. This strategy is used to display gene interconnections based on gene similarity, but it can also be used to display protein sequence family connections. Hierarchical clustering algorithms include Chameleon, ROCK, LIMBO and spectrum [15].
Density Based Clustering (DENCLUE)

In density-based clustering, clusters are subspaces with a high density of items, divided by low-density subspaces. It is a bioinformatics technique for identifying the densest subspaces in interactome networks, which are primarily clique-based. The advantages of this strategy are its speed and its ability to locate clusters of various shapes, and some of these algorithms do not require the number of clusters as input. OPTICS (Ordering points to uncover the clustering structure), CLIQUE (Clustering in quest), DENCLUE (Density-based clustering), and CACTUS (Clustering categorical data using summaries) are all density-based clustering techniques [15].
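The density-based idea — grow clusters from points whose neighbourhoods are dense, and call everything else noise — can be sketched with a minimal DBSCAN-style implementation (DBSCAN is not in the list above; it is used here only because it is the simplest member of this family to write down, on invented points):

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    """Grow clusters from core points (points with at least min_pts
    neighbours within eps, counting themselves); anything not
    reachable from a core point is labelled noise (-1)."""
    def neighbours(p):
        return [q for q in points if math.dist(p, q) <= eps]

    labels = {}
    cluster_id = 0
    for p in points:
        if p in labels:
            continue
        if len(neighbours(p)) < min_pts:
            labels[p] = -1              # provisionally noise
            continue
        labels[p] = cluster_id          # p starts a new cluster
        queue = neighbours(p)
        while queue:
            q = queue.pop()
            if labels.get(q, -1) >= 0:
                continue                # already in a cluster
            labels[q] = cluster_id      # border point (may upgrade noise)
            q_nbrs = neighbours(q)
            if len(q_nbrs) >= min_pts:  # q is itself a core point
                queue.extend(q_nbrs)
        cluster_id += 1
    return labels

# Two dense blobs and one isolated outlier
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5),
          (10.0, 10.0)]
labels = dbscan(points)
```

Note that the number of clusters is discovered from the data, and arbitrary cluster shapes can be found — the two advantages named above.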
Model-Based Clustering

In model-based clustering, objects are fitted to a model, usually a statistical distribution. The model can be specified by a parameter, and it can even be altered during the process. In bioinformatics, background knowledge is incorporated, and gene expressions, interactomes, and sequences are all investigated using this method. A disadvantage of this approach is that it takes a long time to process large datasets, and the statistics will be erroneous if the user's assumptions when defining the models are incorrect. SVM-based clustering, COBWEB, and AutoClass are examples of model-based clustering algorithms [15].
2.3 Reinforcement Machine Learning

Reinforcement learning differs in a few respects from both supervised and unsupervised learning. In reinforcement learning, the current input's state determines the output, while the previous input's outcome determines the future input. In this type of machine learning, we take the concept of giving rewards for every positive result and make that the base of our algorithm. A reinforcement machine learning algorithm develops a system that improves its performance using the environment's feedback. Reinforcement machine learning is an iterative process that develops learning by interacting with the environment without any human interference. Fig. 7 shows how reinforcement machine learning works.
2.3.1 Types of Reinforcement Learning

See Fig. 8.
Fig. 7 Work of reinforcement machine learning (raw input data → algorithm selection → the agent, given a state and a reward, takes the best action)
Fig. 8 Reinforcement learning types: positive and negative
Positive Reinforcement

Positive reinforcement happens when an event that occurs as a result of a certain behaviour boosts the strength and frequency of that behaviour. To look at it another way, it has a positive impact on behaviour. Some characteristics of positive reinforcement learning are as follows:

• Change can be maintained over a lengthy period of time.
• It enhances performance.
• Excessive reinforcement may lead to an oversupply of states, reducing the output quality.

Negative Reinforcement

Negative reinforcement is described as reinforcing a behaviour by preventing or avoiding a negative situation. Some characteristics of negative reinforcement learning are as follows:

• Behaviour is enhanced.
• It enforces a minimum standard of performance.
• It provides only enough to cover the most basic needs.

The following are the differences between supervised, unsupervised, and reinforcement machine learning:

Criteria | Supervised learning | Unsupervised learning | Reinforcement learning
Definition | The machine learns from labelled data | The machine is trained on unlabelled data without any supervision | An agent interacts with its surroundings by taking actions, learning, and collecting rewards
Data types | Labelled data | Un-labelled data | No predefined data
Types of problem | Classification and regression | Association and clustering | Reward-based
Training | External supervision | No supervision | No supervision
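The reward-driven loop of Fig. 7 can be made concrete with tabular Q-learning on a toy "corridor" environment (the environment and all parameters here are invented for illustration; this is a sketch written for this survey, not a cited implementation):

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning: states 0..n-1 in a corridor, actions
    'left'/'right', reward +1 only for reaching the rightmost state."""
    rng = random.Random(seed)
    actions = ("left", "right")
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def pick(s):
        # epsilon-greedy action selection with random tie-breaking
        if rng.random() < epsilon or q[(s, "left")] == q[(s, "right")]:
            return rng.choice(actions)
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = pick(s)
            s2 = max(0, s - 1) if a == "left" else s + 1
            reward = 1.0 if s2 == n_states - 1 else 0.0
            best_next = max(q[(s2, act)] for act in actions)
            # the Q-learning update rule
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learning()
policy = [max(("left", "right"), key=lambda a: q[(s, a)])
          for s in range(4)]
```

No labelled examples are ever provided: the agent learns the "always move right" policy purely from the environment's reward signal.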
3 Related Surveys

As the field of healthcare develops, researchers are focusing on the types of data that may be used to predict outcomes. Ajay et al., for example, focused on clinical and genetic data and employed machine learning algorithms to analyse them; other data kinds, however, such as sensor and omics data, remain to be worked on. Our survey's main goal is to cover all forms of healthcare data and the machine learning techniques used to examine them.
4 Healthcare Data Analysis

The healthcare sector deals with a huge amount of patients' health information, so manually processing this information is next to impossible. Machine learning enters the picture at this point: it develops patterns from huge data and predicts patients' future outcomes using algorithms [11].

Healthcare Data Types: In the healthcare sector, the following kinds of data can be used:

• Clinical data
• Sensor data
• Omics data.

Various types of healthcare data and their applications are shown in Fig. 9 [2].
Fig. 9 Types of healthcare data (clinical data: electronic health records, administrative data, claims data, patient/disease registries, health surveys, clinical trials data; sensor data; omics data: genomic, transcriptomic, proteomic)
4.1 Clinical Data

Most health and medical research relies heavily on clinical data. Clinical data is information gathered during a patient's continuous therapy or as part of a formal clinical study programme. It includes laboratory tests, radiological pictures, allergies, and other information obtained during a patient's continuous treatment, as well as data from the Electronic Health Record (EHR). The following authors put clinical data to use. Tahmassebi et al. [16] proposed developing ML algorithms for early prediction of pathological complete response (PCR) to neoadjuvant chemotherapy and survival results in breast cancer patients using multiparametric magnetic resonance imaging (mpMRI) data. Linear support vector machine, linear discriminant analysis, logistic regression, random forests, stochastic gradient descent, adaptive boosting, and Extreme Gradient Boosting (XGBoost) were used to rank the characteristics predictive of PCR. Clinical data is separated into six categories:

• Electronic health records (EHR): The purpose of electronic health records is the integration of a patient's medical chart into digital documents.
• Administrative data: The ordinary management of health-care programmes generates administrative health data. Administrative health databases, built by provincial governments as a result of universal medical care insurance, are designed to collect and preserve this type of data.
• Claims data: Also known as administrative data, these are electronic data on a bigger scale. Claims data collect information on millions of doctor interactions, such as appointments, bills, insurance, and other patient-provider communication.
• Patient/Disease registries: A special database which contains information about patients diagnosed with specific types of disease.
• Health surveys: Critical survey tools to measure population health status, health behaviour, and risk factors.
• Clinical trials data: Data collected in the course of formal clinical study programmes (Fig. 9).
4.2 Sensor Data

Sensor data consists of time-series signals, which are an ordered succession of pairs produced by sensors. These data elements, which can be simple numerical or categorical values or more sophisticated data, are processed by computing equipment. The following authors are involved in sensor data research. Using data streams obtained from wearable sensors, Lonini et al. [17] presented a machine learning algorithm to identify Parkinson's disease.
4.3 Omics Data

Omics data is a large collection of complex, high-dimensional data that comprises genomic, transcriptomic, and proteomic data. Various strategies, including machine learning algorithms, are necessary to handle this type of data. There are three forms of omics data:

• Genomic data: Genomic data is a collection of gene expression, copy number variation, sequence number, and DNA data in bioinformatics. The authors listed below are responsible for studies on genomic data. Njage et al. [18] presented machine learning algorithms for improving hazard categorization in microbial risk assessment. Due to the enormous complexity of genomic data, the authors defined ML-based predictive risk modelling for risk assessment. Datasets relating to DNA isolation and sequencing were gathered, and important features were extracted using feature extraction.
• Transcriptomic data: Transcriptomic data is a collection of many mRNA transcript data within a biological sample. Different datasets are created by analyzing and extracting these samples. The following authors apply transcriptome data in their work. Bobak et al. [19] established a framework for identifying gene signatures for tuberculosis diagnosis by combining several gene expression datasets; by combining four datasets, 1164 patients were sampled. The results were analysed using machine learning methods such as random forest, support vector machine with polynomial kernel, and partial least squares discriminant analysis.
• Proteomic data: Proteomic data is a collection of proteins expressed in a cell, tissue, or organism; it represents the cell's real functioning molecules. The following authors apply proteomic data in their work. Deep learning techniques were proposed by Liang et al. [20] for the study of FLT3-ITD in acute leukaemia patients. 191 patients with protein data were sampled, with 231 individuals having serum levels. Deep learning with layered autoencoders and dimensionality reduction were used to reduce the proteins from 291 to 20.
5 Disease Prediction and Detection Using Machine Learning

Many machine learning methods have been applied to forecast or diagnose a disease in its early stages, making treatment easier and increasing the patient's chances of recovery. Different sorts of diseases have been recognized as a result of these approaches, although accuracy varies based on the algorithm used, the set of features, the training data set, and so on. In this section we look at a few diseases as examples, the importance of recognising diseases as early as feasible, the ML algorithms used to detect each condition, and the features utilised to make a prediction. In the article's discussion part, there will be a rigorous comparison of the machine learning approaches that have been employed, as well as tips on how to improve them further.
5.1 Cancer

The human body contains the appropriate number of cells of each type. The onset of cancer is marked by sudden alterations in cell arrangement. The regulation and division of cells are determined by the signals generated by cells. When these signals are disturbed, cells multiply abnormally, resulting in a mass called a tumour. Because it is non-invasive and non-ionizing, thermography is considered more reliable these days; this emerging technology has produced efficient and advantageous results, making it superior to other technologies. Cancer cells can be recognised in thermographic pictures using feature extraction and machine learning methods. To extract features from pictures, SIFT (Scale-invariant feature transform) and SURF (Speeded-up robust features) approaches can be utilised. In order to produce better interpretations, the features can be further filtered using principal component analysis (PCA) [3].
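The PCA filtering step can be sketched for the two-feature case, where the first principal component of the covariance matrix has a closed form (toy code with invented feature values, written for this survey, not from the cited works):

```python
import math

def pca_first_component(points):
    """First principal component of 2-D data: the eigenvector of the
    covariance matrix with the largest eigenvalue (closed form 2x2)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
    # corresponding eigenvector (sxy, lam - sxx), normalised
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Invented extracted features spread along the line y = x,
# so the first component should point near (1/sqrt(2), 1/sqrt(2))
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
v = pca_first_component(points)
```

Projecting the extracted SIFT/SURF features onto the top few such components is what reduces their dimensionality before classification.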
5.1.1 Breast Cancer
Breast cancer is a form of cancer that affects both women and men, but primarily affects women, and is one of the most common causes of death in women. However, early diagnosis of malignant cells by procedures such as MRI, mammography, ultrasound, and biopsy can help to lower this risk. To diagnose breast cancer, the tumour is classified as benign or malignant. It is worth noting that malignant tumours are more dangerous than benign tumours; physicians, however, have a difficult time distinguishing between them. Machine learning algorithms are useful here because they can learn and improve from experience without being expressly coded [11]. In recent years, a slew of machine learning algorithms for breast cancer diagnosis and classification have been created. The three stages of their process that may be analysed are pre-processing, feature extraction, and classification. The feature extraction step is important because it helps determine whether a tumour is benign or malignant. Image properties such as smoothness, coarseness, depth, and regularity are extracted using segmentation [21]. In order to extract useful information from images, they are frequently transformed to binary. However, it was determined that doing so caused some important aspects of the image to disappear, removing critical information, which prompted the decision to keep the photographs in grey-scale format. Discrete wavelet transformations can be used to convert images between the spatial and frequency domains. The wavelet decomposition is made up of four matrices: the approximation coefficient matrix, the horizontal detailed coefficient matrix, the vertical detailed coefficient matrix, and the diagonal detailed coefficient matrix. These are the values that are used in the machine learning algorithms [10].
5.1.2 Lung Cancer
Lung cancer is a disease that affects the lungs. People who suffer from lung and a history of chest difficulties are more likely to be Lung cancer has been discovered. Lung cancer is known to be increased by tobacco, smoking, and air pollution. Lung cancer begins in the lungs and eventually spreads to other organs. Lung cancer symptoms do not present until the disease has advanced, which makes it more deadly [5]. Tomography with a computer readings less obnoxious than X-ray and MRI data. Grayscale conversion, binarization, segmentation and noise reduction are all most important techniques for getting the images in the appropriate format with the least amount of noise and distortion. The average of RGB is utilized to convert to grey scale, and a median filter is employed to reduce noise. Segmentation cleans up the images by removing extraneous features and locating objects and boundaries. Features including area, perimeter, and eccentricity are taken into account during the feature extraction step [22]. Humans have a tough time identifying small-cell lung cancer (SCLC) since it looks almost comparable to a healthy lung. Machine learning techniques, such as Deep learning based on CNN algorithms, In this case, it could be utilised to detect SCLC. Deep learning techniques typically necessitate big training datasets, which is a problem. To get around the problem, you can employ the entropy degradation technique (EDM). High-resolution lung CT scans are required for both training and testing. EDM converts vectorized histograms to scores using a shallow neural network, which are subsequently converted to probability using a logistic function. In this technique, SCLC is a detection of treated as a problem of binomial with two parts of groups: a healthy individual or a patient of lung cancer. As a result, both of these sorts of data are originally provided. While this approach is reasonably accurate, it is far from perfect, and there is much room for improvement. 
It is suggested that detection could be improved even further by using a larger training set and a more extensive network. Because CNNs are utilized in many CT imaging applications [23], combining them with image processing yields better detection.
5.2 Diabetes
Diabetes is a chronic illness that must be detected early in order to receive proper treatment. Diabetes is caused by a spike in blood sugar levels, and it
A Review Paper on Machine Learning Techniques and Its …
makes life difficult for diabetics for a variety of reasons. Diabetes is classified into three types: type 1, type 2, and gestational diabetes. Discriminant analysis (DA) is a technique for finding an input's class label based on a set of equations derived from the data. The two main goals of DA are to develop a meaningful procedure for classifying test samples and to interpret the prediction in order to better understand the relationship between the characteristics. When classifying a patient, criteria such as number of pregnancies, weight, glucose level, blood pressure, skinfold thickness, serum insulin, and the diabetes pedigree function (DPF) are all factors to consider [6]. Machine learning techniques such as Gaussian Naive Bayes (GNB), LR, KNN, CART, RFA, and SVM, together with indicators from electronic medical records (EMRs) such as serum-glucose1 and serum-glucose2 levels, BMI, age, race, gender, and creatinine level, were used to predict type 2 diabetes [24]. Various machine learning approaches were applied from time to time to try to increase the accuracy of the predictions. Neural networks were used in one technique, in which a feed-forward neural network was trained with backpropagation. When compared to other machine learning techniques, predictions made using neural networks were found to have better accuracy [23]. Deep neural networks (DNNs) have also been employed for diabetes prediction, with the DNNs trained using fivefold and tenfold cross-validation. It is worth mentioning that both of the aforementioned neural-network-based diabetes prediction algorithms have shown near-97% accuracy [25].
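Gaussian Naive Bayes, one of the classifiers listed above, is simple enough to sketch directly. This is a minimal numpy version under the usual per-class Gaussian assumption; the two "glucose, BMI"-style features and their values are synthetic and purely illustrative:

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class feature means/variances,
    prediction by maximum log-posterior."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # sum over features of log N(x; mu, var), plus the log prior
        ll = -0.5 * (np.log(2 * np.pi * self.var[:, None, :])
                     + (X[None] - self.mu[:, None, :]) ** 2
                     / self.var[:, None, :]).sum(-1)
        return self.classes[np.argmax(ll + self.logprior[:, None], axis=0)]

# toy data shaped like [glucose, BMI]: the diabetic class has higher values
rng = np.random.default_rng(1)
X0 = rng.normal([95, 24], 5, size=(50, 2))   # non-diabetic
X1 = rng.normal([160, 33], 5, size=(50, 2))  # diabetic
X, y = np.vstack([X0, X1]), np.array([0] * 50 + [1] * 50)
model = GaussianNB().fit(X, y)
acc = (model.predict(X) == y).mean()
```

On real EMR data the features would of course be the indicators named in the text rather than synthetic draws.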
5.3 Heart Diseases
Heart disease is a serious condition caused by a blockage in the arteries of the heart. Chronic heart disease is caused by an accumulation of plaque in the coronary arteries, which advances slowly and may eventually result in a heart attack. Several risk factors for major heart illnesses have been identified: high blood pressure, dyslipidaemia, smoking, lack of physical activity, and ageing all affect glucose metabolism. Heart disease is characterised by shortness of breath, physical weakness, swollen feet, and weariness, among other symptoms [14]. Precision medicine has completed tasks in diagnostics and therapies across several subfields of cardiology, including interventional cardiology; examples include tailored therapy options for correcting heart arrhythmias, studies of the impact of gender on the outcome of cardiovascular illnesses, and other genetic research. Clinical decision support systems (CDSSs) and patient monitoring tools are available in today's healthcare informatics. Machine learning has advanced to the point where complicated issues that could previously only be handled by humans may now be solved by machines. Using this
P. Gautam and P. Dehraj
precision medicine approach, CDSSs could be used to make difficult clinical decisions, identify novel phenotypes, and provide personalised treatment options for individuals. In cardiology, blood tests are among the most commonly used precision medicine research procedures. AGES is a precision medicine test that reduces ischemic heart disease risk by merging a set of probabilities with blood tests. Precision medicine is primarily concerned with genetics, and numerous research studies are currently underway to determine the genetic causes of disease. Precision cardiology includes gathering information on cardiac genetics, cardiac oncology, and ischemic heart disease. For diagnostic and therapeutic purposes, technologies such as blood tests, genetic tests, imaging tests, or a combination of these could be used in cardiology precision medicine. Because many cardiovascular disorders have genetic foundations, therapies based on precision medicine are thought to be more effective, particularly under these circumstances. CNN, NLP, RNN, SVM, and LSTM are some machine learning methodologies that can be used to produce exact CDSSs employing deep learning [26]. A machine learning (ML) pipeline for diagnosing cardiologic illnesses includes pre-processing, feature selection, cross-validation, machine learning classifiers, and classifier performance evaluation. Missing-value removal, standard scaling, and min-max scaling are among the pre-processing techniques available. In machine learning, feature selection is critical, since irrelevant features can impact the algorithm's classification results. Performed prior to classification, feature selection saves execution time and improves classification accuracy. There are a variety of feature selection algorithms available.
The most prevalent feature selection algorithms are Relief, LASSO (Least Absolute Shrinkage and Selection Operator), and mRMR (Minimal Redundancy Maximal Relevance) [27].
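Of the three, Relief is the easiest to show in a few lines: it rewards features that separate a sample from its nearest neighbor of the other class ("miss") more than from its nearest neighbor of the same class ("hit"). The sketch below follows the basic Relief formulation (LASSO and mRMR work quite differently); the two-feature example data are invented:

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Basic Relief feature weighting. Higher weight = more relevant."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = X.shape
    span = X.max(0) - X.min(0) + 1e-12          # per-feature normalizer
    w = np.zeros(d)
    idx = rng.choice(n, n_iter or n, replace=False)
    for i in idx:
        diff = np.abs(X - X[i]) / span           # normalized feature diffs
        dist = diff.sum(1)
        dist[i] = np.inf                         # exclude the sample itself
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        w += diff[miss] - diff[hit]
    return w / len(idx)

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
relevant = y + rng.normal(0, 0.1, 200)           # tracks the class label
noise = rng.normal(0, 1, 200)                    # pure noise
X = np.column_stack([relevant, noise])
w = relief(X, y, rng=rng)                        # w[0] should dominate w[1]
```

The returned weights can then be thresholded or ranked to keep only the most relevant clinical features before classification.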
5.4 Chronic Kidney Disease
Chronic kidney disease (CKD) is a kidney disease that impairs kidney function over time and eventually leads to renal failure. Clinical data, blood tests, imaging scans, and biopsy can all be used to diagnose CKD. However, biopsy has several drawbacks: it is inconvenient, time-consuming, expensive, and even dangerous. This is where machine intelligence can assist in overcoming these limitations. SVM has been a common classifier in many illness predictions using machine learning. However, when it comes to CKD, there is not much study that employs SVM to classify patients. The main machine learning classifiers utilized in this domain have been ANN, DT, and LR. When comparing ANN to DT and LR for CKD diagnosis, the results showed that ANN performed significantly better [23].
5.5 Dermatological Diseases
Dermatological diseases are complicated and varied, and few individuals are familiar with them. Early detection is usually preferable because it can prevent significant consequences. Dermatological ailments such as eczema, herpes, melanoma, and psoriasis are only a few of the diseases that should be diagnosed early to save lives. The first steps in one technique for diagnosing dermatological problems were data gathering and data augmentation using photographs. Phase 2 is critical, since this is where the model is built and trained. In the final phase, augmentation techniques such as the synthetic minority oversampling technique (SMOTE) and computer vision techniques such as grayscaling, blurring, increasing contrast, changing the colour channel, sharpening, reducing noise, and smoothing are used to convert the image to an array and break down the features using the trained model. When there are more samples in the database, it is preferable for training the model. However, the SVM must first be trained by providing it with the final convolutional layer's trained features as the training data, which the SVM transforms to vectors and stores [23].
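The SMOTE step mentioned above is easy to illustrate: synthetic minority samples are created by interpolating between a minority-class sample and one of its k nearest minority-class neighbors. A minimal numpy sketch, with invented "melanoma feature vector" data:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: synthesize n_new minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbors."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = neighbors[i, rng.integers(k)]
        lam = rng.random()                       # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(3)
minority = rng.normal(0, 1, size=(20, 4))        # e.g. 20 rare-class feature vectors
synthetic = smote(minority, n_new=40, rng=rng)   # 40 new synthetic samples
```

Because each new point lies on a segment between two real minority samples, it stays within the per-feature range of the minority class, which is what makes SMOTE safer than naive duplication.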
6 Conclusion
Data in the healthcare industry vary significantly, and to analyze these data types and improve predictions, various machine learning methods are applied. These predictions can be used and studied in the future to improve sensitivity, accuracy, specificity, and precision. To analyse this variety of data and improve prediction, various machine learning (ML) algorithms, such as supervised, unsupervised, and reinforcement algorithms, are utilised, and performance criteria such as accuracy, sensitivity, specificity, precision, F1 score, and area under the curve are used to evaluate them. According to the survey's findings, several ML algorithms and feature extraction methodologies for analysing various forms of data in healthcare have been proposed by various authors, for example for cancer patient survival prediction.
References
1. A. Dhillon, A. Singh, Machine learning in healthcare data analysis: a survey. J. Biol. Today's World 8(6) (2019)
2. I. Ibrahim, A. Abdulazeez, The role of machine learning algorithms for diagnosing diseases. J. Appl. Sci. Technol. Trends 2(01), 10–19 (2021)
3. V. Mishra, Y. Singh, S. Kumar Rath, Breast cancer detection from thermograms using feature extraction and machine learning techniques, in Proceedings of the IEEE 5th International Conference for Convergence in Technology, Bombay, India, March 2019
4. P. Dehraj, A. Sharma, Complexity assessment for autonomic systems by using neuro-fuzzy approach, in Software Engineering (Springer, Singapore, 2019), pp. 541–549
5. P. Radhika, R. Nair, G. Veena, A comparative study of lung cancer detection using machine learning algorithms, in Proceedings of the IEEE International Conference on Electrical, Computer and Communication Technologies, Coimbatore, India, November 2019
6. A. Al-Zebari, A. Sengur, Performance comparison of machine learning techniques on diabetes disease detection, in Proceedings of the 1st International Informatics and Software Engineering Conference, Ankara, Turkey, November 2019
7. W. Hurst, A. Boddy, M. Merabti, N. Shone, Patient privacy violation detection in healthcare critical infrastructures: an investigation using density-based benchmarking. Future Internet 12(6), 100–105 (2020)
8. B. Mahesh, Machine learning algorithms—a review. Int. J. Sci. Res. (IJSR) 9 (2020)
9. I.H. Sarker, Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2(3), 1–21 (2021)
10. P. Dehraj, A. Sharma, Autonomic provisioning in software development life cycle process, in Proceedings of the International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur, India (2019)
11. A. Bharat, N. Pooja, R.A. Reddy, Using machine learning algorithms for breast cancer risk prediction and diagnosis, in Proceedings of the 3rd International Conference on Circuits, Control, Communication and Computing, Bangalore, India, July 2018
12. M.S. Yarabarla, L.K. Ravi, A. Sivasangari, Breast cancer prediction via machine learning, in Proceedings of the 3rd International Conference on Trends in Electronics and Informatics, Tirunelveli, India, April 2019
13. S. Sharma, A. Aggarwal, T. Choudhury, Breast cancer detection using machine learning algorithms, in Proceedings of the International Conference on Computational Techniques, Electronics and Mechanical Systems, Belgaum, India, June 2018
14. M.R. Ahmed, S.M. Hasan Mahmud, M.A. Hossin, H. Jahan, S.R. Haider Noori, A cloud based four-tier architecture for early detection of heart disease with machine learning algorithms, in Proceedings of the IEEE 4th International Conference on Computer and Communications, Chengdu, China, April 2018
15. P. Dehraj, A. Sharma, A new software development paradigm for intelligent information systems. Int. J. Intell. Inf. Database Syst. 13(2–4), 356–375 (2020)
16. A. Tahmassebi, G.J. Wengert, T.H. Helbich, Z. Bago-Horvath, S. Alaei, R. Bartsch et al., Impact of machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neoadjuvant chemotherapy and survival outcomes in breast cancer patients. Invest. Radiol. (2019)
17. L. Lonini, A. Dai, N. Shawen, T. Simuni, C. Poon, L. Shimanovich et al., Wearable sensors for Parkinson's disease: which data are worth collecting for training symptom detection models. npj Digit. Med. (2018)
18. P.M. Njage, P. Leekitcharoenphon, T. Hald, Improving hazard characterization in microbial risk assessment using next generation sequencing data and machine learning: predicting clinical outcomes in shigatoxigenic Escherichia coli. Int. J. Food Microbiol. (2019)
19. C.A. Bobak, A.J. Titus, J.E. Hill, Comparison of common machine learning models for classification of tuberculosis using transcriptional biomarkers from integrated datasets. Appl. Soft Comput. (2019)
20. C.A. Liang, L. Chen, A. Wahed, A.N. Nguyen, Proteomics analysis of FLT3-ITD mutation in acute myeloid leukemia using deep learning neural network. Ann. Clin. Lab Sci. (2019)
21. H. Dhahri, E. Al Maghayreh, A. Mahmood, W. Elkilani, M. Faisal Nagi, Automated breast cancer diagnosis based on machine learning algorithms. J. Healthcare Eng. 2019, Article ID 4253641 (2019)
22. P. Dehraj, A. Sharma, A review on architecture and models for autonomic software systems. J. Supercomput. 77(1), 388–417 (2021)
23. S.M.D.A.C. Jayatilake, G.U. Ganegoda, Involvement of machine learning tools in healthcare decision making. J. Healthcare Eng. 2021 (2021)
24. I. Ibrahim, A. Abdulazeez, The role of machine learning algorithms for diagnosing diseases. J. Appl. Sci. Technol. Trends 2(01) (2021)
25. S. Ayon, M. Islam, Diabetes prediction: a deep learning approach. Int. J. Inf. Eng. Electron. Bus. 7(6), 21–27 (2019)
26. S. Niazi, H.A. Khattak, Z. Ameer, M. Afzal, W.A. Khan, Cardiovascular care in the era of machine learning enabled personalized medicine, in Proceedings of the International Conference on Information Networking, Barcelona, Spain, April 2020
27. A.U. Haq, J.P. Li, M.H. Memon, S. Nazir, R. Sun, A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inf. Syst. 2018, Article ID 3860146 (2018)
Enhanced Mask-RCNN for Ship Detection and Segmentation Anusree Mondal Rakhi, Arya P. Dhorajiya, and P. Saranya
Abstract Ship detection and segmentation using satellite remote sensing imagery have become a hot issue in the scientific community. This sector helps control maritime violence, illegal fishing, and cargo transportation. Most available ship detection methods perform object detection but not semantic segmentation. Besides, previous papers had various flaws, such as the inability to detect small ships, greater false positives, and severe noise in the given Synthetic Aperture Radar (SAR) images, which affects low-level feature learning in shallow layers and makes object detection more difficult. The intricacies of SAR images also significantly reduce the benefits of Convolutional Neural Networks (CNNs). A few models cannot distinguish ships from ship-like objects. Furthermore, some existing models performed somewhat worse than other successful state-of-the-art frameworks, and the cost of computing resources is higher for some models. This research provides a ship detection approach based on an improved Mask Region-Based Convolutional Neural Network (Mask RCNN). The proposed approach can detect and segment ships at the pixel level. Mask RCNN comprises two parts, for object detection and segmentation. In the segmentation part, more convolutional layers are added and hyperparameters are changed to improve the overall output. Because of these changes, the proposed model works more accurately than existing models. Using this method on the Airbus Ship Detection dataset on Kaggle, we achieved an accuracy of 82.9% with the proposed model.
1 Introduction Satellite imagery helps gather information over our target areas [1]. But it is better to use optical images over SAR images because of their various advantages [2]. Ship A. M. Rakhi · A. P. Dhorajiya · P. Saranya (B) Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_16
A. M. Rakhi et al.
detection using distant satellite imagery for marine security, including traffic surveillance, protection against illicit fishing, oil discharge management, and monitoring of sea pollution, is a recent practical and efficient application [3]. Additionally, it is useful for economic and military purposes and vessel life-cycle analysis [4]. Generally, there are many ship detection challenges that need to be solved. Detection and segmentation accuracy for smaller ships was low in previous state-of-the-art models due to the loss of low-level features when the original images were processed through their feature extraction networks [5]. Besides, a core part of ship detection is semantic segmentation, which is the process of taking an image and labeling each pixel with certain classes [6], and this part is skipped by many existing models. It is tricky to find and identify tiny objects, and different ship types are very hard to distinguish. SAR ship detection has various drawbacks: it is prone to strong noise interference; the images are very vulnerable to strong winds; smaller objects are very difficult to pinpoint and identify; and different types of ships are hard to distinguish [7]. Due to the drawbacks seen in the previous papers on ship detection, an enhanced Mask RCNN model has been proposed. The purpose of the proposed model is to detect ships with good accuracy and fewer false positives. The original Mask RCNN result contained a huge number of false positives; the idea sprang from this observation during our study. In the proposed model, more convolutional layers were added and some hyperparameters were changed to minimize the number of false positives and improve the overall performance. The resulting enhanced Mask RCNN model for ship detection and segmentation reduces false positives, and its accuracy rate is also impressive.
Furthermore, the proposed technique can detect smaller ships in a complex image while also providing a higher-quality segmentation mask for each instance, as shown in Fig. 1.

Fig. 1 Detection of ships from a complex background [8]
Enhanced Mask-RCNN for Ship Detection and Segmentation
A two-stage approach has been proposed to detect and segment ships from the input images. In the first stage, a binary classifier within the Region Proposal Network (RPN) of the object detector indicates whether there is any ship in the given image. If there is a ship, the pipeline enters the second stage to perform mask segmentation using Fully Convolutional Networks. The paper is structured as follows: Some related work on detecting and segmenting ships is reviewed in Sect. 2. Section 3 describes the proposed implementation model, including the various stages of detecting and segmenting ships, along with a short description of the loss function. Section 4 describes the implementation of the proposed model and its performance comparison with past models, and also contains information about the dataset. In Sect. 5, the results and future study directions are discussed.
2 Related Work
The use of satellite remote sensing photos to detect ships is crucial in controlling maritime violence, illegal fishing, and cargo transportation, and also supports traffic supervision. Zhang et al. [9] proposed a method for spotting ships that involves finding the object's position in the images and determining its label by discriminating between the particular objects and their given background. A supervised learning strategy is created to efficiently train each suggested task-guided network (TGN), and a differentiable weighting method (DWM) further balances learning across diverse tasks. Intense speckle noise in SAR images impacts low-level feature learning in deep layers, making object detection more challenging; the complexities of SAR images also greatly diminish the benefits of CNNs. The model beats the multistage detector by 1.9% after being trained on two datasets: the High-Resolution SAR Images dataset and the Large-Scale SAR Ship Detection dataset. In the field of SAR image detection, Li et al. [10] suggested a model. The SAR ship detection model was trained using a deep network, ResNet, and achieved a quicker training speed. The model was trained on a SAR ship dataset created from Sentinel-1, RADARSAT-2, and Gaofen-3 data, which included 2900 photos and 7524 ships in varied situations, and had an average precision of 94.7%. However, this model was unable to distinguish between objects that appeared to be ships; as a result, the model's accuracy rate was hampered. Mao et al. [11] created a SAR imaging model that functions as a low-cost ship detection network. A unique U-Net-based bounding box regression network and a score map regression network running in parallel are recommended for detecting ships in SAR photos.
In the polar coordinate system, the ship's bounding boxes are expressed in a 4-tuple format, which may be plotted and converged without problem during the model's training phase. The score map shows the probability that the current position is the ship's center. With great certainty, all pixels having a forecasted score over a specific threshold can be
selected as "excellent" ship locations. It performed admirably in terms of detection and real-time capability. However, this model fared somewhat worse than other state-of-the-art frameworks in terms of detection. Nie et al. [12] used Mask RCNN, a combination of Faster RCNN and FCN, to create a ship detection and segmentation model. This model was trained using the Airbus Ship Detection Challenge dataset. Using this approach, detection and segmentation mAPs improved from 70.6% and 62.0% to 76.1% and 65.8%, respectively. This model, however, is unable to detect small ships accurately due to the loss of low-level features inside its framework. Yu et al. [13] suggested a new approach for detecting ships, whose algorithm consists of two steps: first, an AdaBoost classifier paired with Haar-like features quickly extracts candidate area slices; then a periphery-cropped network (PCNet) verifies ships based on their characteristics. When trained on the NWPU VHR-10 dataset, this model's precision and recall rates were 91% and 87.7%, respectively. However, this algorithm still has a lot of scope for improving overall performance; improving the accuracy of the ship candidate extraction procedure should be the top priority. PCNet has a very good accuracy rate for ship verification, although the accuracy of the ship detection mechanism might be improved. A second goal is to bring down the cost of computing resources. The overall challenges in the existing papers are detecting smaller ships, high false positives, and SAR photos with a lot of speckle noise, which impacts low-level feature learning in shallow layers and makes object detection more challenging. The complexities of SAR images also greatly diminish the benefits of CNNs.
A few models are also unable to distinguish between objects that appear to be ships. Moreover, some existing models performed somewhat worse than other current state-of-the-art frameworks, and the cost of computing resources is higher for some models. To overcome these challenges, a modified model based on Mask RCNN is proposed to detect as well as segment ships.
3 Methodology
Detecting and segmenting ships using satellite imagery plays a vital role in controlling maritime violence. Most of the existing papers related to ship detection and segmentation perform only object detection; ship segmentation down to the pixel level has not been performed at all. Mask RCNN is a hybrid of two powerful models, Faster RCNN and FCN, that can recognize objects as well as segment them semantically. It addresses the problem of instance segmentation, as illustrated in Fig. 2. In the first stage, Faster RCNN does the job of object detection, and at the end of this process the bounding box regressor draws a bounding box around the detected object. Then, in the final stage, the mask is generated with the help of the mask generation head. So, in this paper, we present an enhanced Mask
Fig. 2 Flow of instance segmentation
RCNN model with more convolutional layers in the mask segmentation head, while simultaneously avoiding overfitting of feature maps, and with changed hyperparameters such as Detection_min_confidence, the image resolution settings, and the loss weights. Training the model with more epochs actually improved the overall performance and generated higher-quality masks than the original Mask RCNN. Even in this model, after segmentation of the input image, the generated mask does not carry any background pixels, which is again helpful for classifying the ships or for further analysis.
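The hyperparameter changes described above can be expressed as a configuration override, written here in the style of the Matterport Mask R-CNN `Config` class. The attribute names follow that implementation and the concrete values (0.95, 1.2) are illustrative assumptions, not the paper's exact settings:

```python
# Hyperparameter overrides for the enhanced model, in the style of the
# Matterport Mask R-CNN `Config` class. Values marked "assumed" are
# illustrative, not taken from the paper.
class ShipConfig:
    NAME = "airbus_ship"
    NUM_CLASSES = 1 + 1                  # background + ship
    DETECTION_MIN_CONFIDENCE = 0.95      # raised to suppress false positives (assumed)
    IMAGE_MIN_DIM = 768                  # dataset images are 768 x 768
    IMAGE_MAX_DIM = 768
    # re-weight the multi-task loss toward mask quality (weights assumed)
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.0,
        "rpn_bbox_loss": 1.0,
        "mrcnn_class_loss": 1.0,
        "mrcnn_bbox_loss": 1.0,
        "mrcnn_mask_loss": 1.2,
    }

cfg = ShipConfig()
```

In the Matterport framework such a class would subclass `mrcnn.config.Config` and be passed to the model constructor; it is shown standalone here so the overrides are easy to read.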
3.1 Object Detection
The flow diagram of the proposed model is explained in Fig. 3. It takes the raw picture as its first input, and this raw image serves as the input for the backbone layer. The proposed model uses ResNet50 as the backbone. This generates feature maps as output and sends them to the RPN and ROIAlign. With the help of the RPN, we find the areas in the input image where an object can possibly be found. Once we get the area where the object is present, the RPN labels it as foreground (where the object is present) and the rest of the area as background (where the object is not present). The foreground class for every image is forwarded to the next stage of the proposed algorithm, which is ROIAlign. We used ROIAlign instead of ROI Pooling in this model because loss and misalignment problems are associated with ROI Pooling. ROIAlign preserves spatial pixel-to-pixel alignment for each region of interest, with no information loss, as there is no quantization; this again enhances the final result of the proposed model. Also, the RPN makes all the anchor boxes (a set of pre-defined bounding boxes) the same size. The proposals then enter the binary classifier. If there is an object, the process moves to the regression step and generates a bounding box around the object detected by the classifier. Here ends the object detection process; its final output is a bounding box around the region of interest. Figure 4 shows the architecture diagram of the proposed model, which gives a detailed view of how the features are extracted and recombined to create a feature map, and how classification, bounding box regression, and mask segmentation are performed simultaneously.
Fig. 3 Flow diagram of proposed model
3.2 Semantic Segmentation
After completing object detection, the proposed model moves on to generating a mask, which is the output of the semantic segmentation process. Semantic segmentation is the process of taking an image and assigning a class to each pixel. In Fig. 4, there are two classes: the ship (foreground) and the background. The pixels inside the created outline are assigned to the ship class, and the pixels outside it are assigned to the background class. The key to semantic segmentation is that the computer segments the images automatically. But, in order to do so, we must first collect a number of images containing ships, label where the ships and backgrounds are located, and then use that information to create
Fig. 4 Architecture diagram of proposed Mask RCNN
a convolutional neural network. In the proposed model, more convolutional layers are used compared to the original model. While avoiding overfitting and improving the overall performance, these layers enable the model to generate a mask with better per-pixel segmentation. We train that network, and when we add a new image of a ship against a background, the model is able to classify the pixels into two categories: one is the ship, and the other is the background (Fig. 5).
3.3 ROI Align
ROI Align is a function used for extracting a small feature map for each region of interest in segmentation and detection tasks. It facilitates the alignment of retrieved features with the input (Fig. 6). It calculates the precise values inside the provided input features at four sample positions in each bin using bilinear interpolation, and the results are then aggregated (using the maximum or the average of the values).
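The sampling just described can be sketched in a few lines of numpy: bilinearly interpolate the feature map at four regularly spaced points inside a bin and average them, with no coordinate quantization. The quarter-point sample layout is the common convention, assumed here:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate feature map `feat` (H x W) at a continuous
    point (y, x), as ROIAlign does at each sample position."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) +
            feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) +
            feat[y1, x1] * dy * dx)

def roi_align_bin(feat, y_lo, x_lo, y_hi, x_hi):
    """Average four bilinear samples (at the bin's quarter points) inside
    one bin; unlike ROI Pooling, the coordinates are never rounded."""
    h, w = y_hi - y_lo, x_hi - x_lo
    samples = [bilinear(feat, y_lo + h * fy, x_lo + w * fx)
               for fy in (0.25, 0.75) for fx in (0.25, 0.75)]
    return float(np.mean(samples))

feat = np.arange(16, dtype=float).reshape(4, 4)   # linear ramp feature map
v = roi_align_bin(feat, 0.5, 0.5, 2.5, 2.5)       # -> 7.5, the ramp's value
                                                  #    at the bin center
```

Because no rounding happens at any step, the sampled value for a linear feature map is exact, which is precisely the misalignment that ROI Pooling's quantization would introduce.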
Fig. 5 a Image with derived bounding box, b mask
Fig. 6 ROI align. Source Mask-RCNN [6]
3.4 Loss Function
L_total = L_cls + L_mask + L_bbox is the multi-task loss function for each region of interest during training. L_mask is defined only on the k-th mask for a region of interest associated with the ground-truth class k. We employed the average binary cross-entropy loss, as shown in Fig. 7, to obtain L_mask's value.
Fig. 7 Binary cross-entropy loss function. Source Cross-entropy [14]
where y_ij represents the label of cell (i, j) in the ground-truth mask for the provided region of size m × m, and ŷ_ij^k is the value predicted by the model for the same cell in the mask learned for ground-truth class k, so that L_mask = −(1/m²) Σ_(i,j) [y_ij log ŷ_ij^k + (1 − y_ij) log(1 − ŷ_ij^k)]. This is not the same as using FCNs for semantic segmentation, which employ a per-pixel softmax and a multinomial cross-entropy loss. With the FCN loss, the masks across classes compete internally, which increases overall loss values, whereas in Mask RCNN they do not. This loss function provides good segmentation results in the mask prediction stage.
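The average binary cross-entropy in Fig. 7 is straightforward to compute; a minimal numpy version, with a toy 2 × 2 mask:

```python
import numpy as np

def mask_loss(y_true, y_pred_k):
    """Average binary cross-entropy over an m x m mask: y_true is the
    ground-truth mask, y_pred_k the predicted per-pixel probabilities
    for the ground-truth class k (other classes do not contribute)."""
    eps = 1e-7
    p = np.clip(y_pred_k, eps, 1 - eps)      # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([[1.0, 0.0], [0.0, 1.0]])       # toy 2 x 2 ground-truth mask
good = np.array([[0.9, 0.1], [0.1, 0.9]])    # confident, correct prediction
bad = np.array([[0.1, 0.9], [0.9, 0.1]])     # confident, wrong prediction
lo_good, lo_bad = mask_loss(y, good), mask_loss(y, bad)
```

Here `lo_good` equals −log 0.9 ≈ 0.105 while `lo_bad` equals −log 0.1 ≈ 2.30, so better per-pixel mask predictions directly lower L_mask.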
4 Results and Discussion
4.1 Dataset Used
The dataset used is from the Airbus Ship Detection Challenge on Kaggle [8], which has 193,000 images for training and 15,600 images for testing, with a 768 × 768 resolution for each image; an image may contain one or more ships. A separate file containing run-length-encoded data was provided to convert the annotations into masks for training the model. The overall positive outcome of prediction on the test dataset supports the robustness and improved performance over the previously existing state-of-the-art models.
4.2 Training the Model and Performance Evaluation
The model is trained with three different learning rates of 0.006, 0.003, and 0.0015 to enhance the model's performance while lowering false positives. The model was initially trained for 2 epochs with a learning rate of 0.006, resulting in a massive reduction of loss values but a heavy penalty in false positives. It was then trained for a further 10 epochs with a learning rate of 0.003, while applying custom augmentation of high-quality ship images, which reduced the overall loss values. Finally, it was trained for 12 more epochs with a learning rate of 0.0015, which produced plateauing loss values but a reduced number of false positives. All the above experiments were performed on an Nvidia Tesla P100 GPU. As the ship detection dataset was taken from Airbus's Ship Detection Challenge at Kaggle [8], we also compared several pre-trained models on similar datasets and found a surprising outcome. Table 1 shows the quantitative evaluation of our model's performance, which achieved an accuracy (private score) of 0.82956. The private score can be used as an accuracy metric for our approach, as the final outcome is calculated by comparing our output RLE .csv file with the output file of the competition, which is a private dataset owned by Airbus and hasn't been
A. M. Rakhi et al.

Table 1 Performance evaluation of the proposed model

CNN architecture          Private score   Public score
Enhanced MaskRCNN model   0.82956         0.69006
Fig. 8 a Training loss versus validation loss, b training mask loss versus validation mask loss
revealed. Approximately 93% of the test data is used to compute the private score, while the public score is determined from the remaining roughly 7% of the test images, as defined by Airbus's competition administrators (Fig. 8). After training the enhanced model for 22 epochs, it is observed in the first graph of Fig. 8, which plots total training loss versus validation loss, that the best value of 1.332292 was reached around the 19th epoch. The second graph plots training mask loss versus validation mask loss, where the lowest loss of 0.359136 occurred at the 21st epoch. The epoch with the lowest loss value is selected as the final model, which is then used to generate predictions and annotations for the testing dataset, finally submitting the results as a CSV file in run-length-encoded format, as required by the competition rules (Fig. 9, Table 2). Compared with other popular tested models, our proposed model outperformed them all, achieving a private score of 0.82956 and a public score of 0.69006. The Kaggle competition winner achieved 0.85448, which supports our findings and the model's improved detection of even small ships.
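The submission step above requires re-encoding each predicted mask as a run-length string. A minimal sketch of that inverse operation (an illustrative helper, not the authors' code) is:

```python
import numpy as np

def rle_encode(mask):
    """Encode a binary mask as "start length" pairs (1-based,
    column-major), the format required for the submission CSV."""
    pixels = mask.T.flatten()  # column-major scan of the image
    padded = np.concatenate([[0], pixels, [0]])
    # Indices where the value changes mark run starts and ends.
    runs = np.where(padded[1:] != padded[:-1])[0] + 1
    runs[1::2] -= runs[0::2]   # convert end positions to lengths
    return " ".join(str(x) for x in runs)

mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:3, 0] = 1               # a 3-pixel run down the first column
print(rle_encode(mask))        # "1 3"
```

One such string per detected ship, paired with the image id, forms a row of the submission CSV.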
Enhanced Mask-RCNN for Ship Detection and Segmentation
Fig. 9 a Input images, b output images with predicted mask and score
Table 2 Performance differentiation of the proposed model with the existing models

Method           Private score   Public score
SSS-Net [15]     0.80            0.642
PspNet [16]      0.82136         0.68975
LinkNet [16]     0.79362         0.62758
Proposed model   0.82956         0.69006
5 Conclusion

This was our first project developing CNN models, and we can confidently say that we learned a great deal implementing the proposed Mask R-CNN model. After training our model, we achieved an accuracy (private score) of 0.82956 on submitting the
submission .csv file to the competition, a score considerably better than we had expected. We saw a greater reduction of false positives in the final result, and the per-pixel segmentation of the generated masks is better than that of the original Mask R-CNN. The proposed model also detects smaller ships more efficiently than other pre-trained models. In the future, we will try to enhance the model's accuracy and reduce its computational time and cost by implementing different types of object detection models.
References

1. K. Kamirul, W. Hasbi, P.R. Hakim, A.H. Syafrudin, Automatic Ship Recognition Chain on Satellite Multispectral Imagery (2020), 221918–22193
2. Y. Lei, X. Leng, K. Ji, Marine Ship Target Detection in SAR Image Based on Google Earth Engine (2021), 8574–8577
3. K. Zhao, Y. Zhou, X. Chen, A Dense Connection Based SAR Ship Detection Network (2020), 669–673
4. J. Cao, Y. You, Y. Ning, W. Zhou, Change Detection Network of Nearshore Ships for Multitemporal Optical Remote Sensing Images (2020), 2531–2534
5. U. Kanjir et al., Vessel Detection and Classification from Spaceborne Optical Images: A Literature Survey (2018), 1–26
6. K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN (2017), 2980–2988
7. Z. Hong et al., Multi-scale Ship Detection from SAR and Optical Imagery Via a More Accurate YOLOv3 (2021), 6083–6101
8. Kaggle. Airbus Ship Detection Challenge [Internet]. Available at: https://www.kaggle.com/c/airbus-ship-detection/data
9. X. Zhang et al., Multitask Learning for Ship Detection from Synthetic Aperture Radar Images (2021), 8048–8062
10. Y. Li, Z. Ding, C. Zhang, Y. Wang, J. Chen, SAR Ship Detection Based on Resnet and Transfer Learning (2019), 1188–1191
11. Y. Mao, Y. Yang, Z. Ma, M. Li, H. Su, J. Zhang, Efficient Low-Cost Ship Detection for SAR Imagery Based on Simplified U-Net (2020), 69742–69753
12. X. Nie, M. Duan, H. Ding, B. Hu, E.K. Wong, Attention Mask R-CNN for Ship Detection and Segmentation from Remote Sensing Images (2020), 9325–9334
13. Y. Yu, H. Ai, X. He, S. Yu, X. Zhong, M. Lu, Ship Detection in Optical Satellite Images Using Haar-like Features and Periphery-Cropped Neural Networks (2018), 71122–71131
14. Y. Ma, L. Qing, Q. Zhi-bai, Automated Image Segmentation Using Improved PCNN Model Based on Cross-entropy (2004), pp. 743–746
15. Z. Huang, S. Sun, R. Li, Fast Single-Shot Ship Instance Segmentation Based on Polar Template Mask in Remote Sensing Images (2020), 1236–1239
16. D. Hordiiuk, I. Oliinyk, V. Hnatushenko, K. Maksymov, Semantic Segmentation for Ships Detection from Satellite Imagery (2019), 454–457
Data Scientist Job Change Prediction Using Machine Learning Classification Techniques Sameer A. Kyalkond , V. Manikanta Sanjay , H. Manoj Athreya , Sudhanva Suresh Aithal , Vishal Rajashekar , and B. H. Kushal
Abstract In this ever-expanding world, every industry is highly competitive. Recently, Artificial Intelligence (AI) technologies have been influencing every facet of the data science domain. This technology-driven, competitive world creates many difficulties for people working in the data science sector, fostering the notion of abandoning the data scientist profession. Expectations often do not match reality: there is no clear benchmarking in salary pay-outs, difficulty in mapping a data scientist's role to business goals, and a lack of upskilling for data science professionals. These are the causes of job change among individuals, and as competition increases there is always a substitute, so job security also falls in this competitive market. Different machine learning methods are used to forecast job change for the employees in a firm. HR can observe this and give additional rewards to the employee, since the employee is an important asset to the company and its development. A synergy of deep learning approaches, machine learning, and ensemble methods is utilized. This research work compares various classification approaches, namely logistic regression, random forest, KNN, and support vector machine, in order to demonstrate which method performs best and to offer an insight to the general community.
Supported by organization x. S. A. Kyalkond · V. Manikanta Sanjay (B) · H. Manoj Athreya · S. S. Aithal · B. H. Kushal JSS Academy of Technical Education, Bengaluru, India e-mail: [email protected] V. Rajashekar BNM Institute of Technology, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_17
S. A. Kyalkond et al.
1 Introduction

The data scientist profession has always caught the interest of many graduates and individuals. A good salary does not guarantee that a data scientist is satisfied with his or her employer, or wishes to work for the same firm for the rest of his or her career. When it comes to their careers, most data scientists decide to change employment regardless of how much money they make or what their role is. If you have been a data scientist for a long time but are not accomplishing anything beyond what was assigned to you at the start, it is time to consider making a change. Years working on the same tasks would hone your existing abilities and expertise, but you would have a hard time learning more about the domain. In terms of money, there may be other areas or lines of work that pay more than your current position. The field is ever-growing, and new technologies keep emerging, creating endless possibilities for people looking for a job change. The job change problem is real in a world where many other options are available, and far more possibilities can be created than the ones that already exist.
2 Related Work

When we look at successful data products, we often find well-designed user interfaces with intelligent behaviour and, most significantly, a meaningful output that, at the very least, the consumers perceive as addressing a relevant issue. If a data scientist spends all of their time learning how to create and operate machine learning algorithms, they can only be a tiny (but crucial) component of a team that leads a project to a valued product. This implies that data science teams operating in isolation will have a hard time producing value [1]. This is primarily owing to the increasing buzz surrounding data science and artificial intelligence (AI). Before reaching a decision, data science efforts frequently entail a lot of experiments, trial-and-error techniques, and repetitions of the same process; reaching the desired outcome can take months. Having discussed the causes and circumstances of shifting employment, let us view the procedure for predicting it. First, we require datasets comprising data fields associated with the problem, such as city code, gender, relevant experience, the candidate's university, the employee's education level, experience, company size, the employee's last job, and employee ID [2]. Multiple Imputation by Chained Equations (MICE) is a multiple-imputation approach that is frequently superior to single-imputation methods such as mean imputation. After missing values are imputed, the data may be separated into train and validation (test) parts, and the model built using the training dataset. Before we begin, bear in mind that the data is very unbalanced, hence we must first balance
it. As calculated in [3], KNN gives 81% accuracy, followed by XGBoost with 90%, which is significantly greater than the basic classification approaches; random forest has an accuracy of 75%, and in the same range an ANN reaches 77%. Finally, LGBM achieves 91% accuracy.
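The imputation-then-balancing steps above can be sketched in NumPy. Simple mean imputation stands in here for MICE (which would instead model each feature from the others over chained rounds, e.g. via scikit-learn's IterativeImputer), and random oversampling stands in for balancing; the helper names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_impute(X):
    """Baseline single imputation: replace NaNs with column means.
    MICE would iterate, regressing each feature on the others."""
    X = X.copy()
    col_mean = np.nanmean(X, axis=0)
    idx = np.where(np.isnan(X))
    X[idx] = np.take(col_mean, idx[1])
    return X

def oversample(X, y):
    """Balance a binary target by resampling the minority class."""
    counts = np.bincount(y)
    minority = int(np.argmin(counts))
    extra = int(counts.max() - counts.min())
    pick = rng.choice(np.where(y == minority)[0], size=extra)
    return np.vstack([X, X[pick]]), np.concatenate([y, y[pick]])

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 0, 1])
Xb, yb = oversample(mean_impute(X), y)
print(np.bincount(yb))  # classes are now balanced
```

Only after the classes are balanced is the train/validation split performed, so the model does not simply learn the majority class.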
3 Methodology

3.1 Data Preparation

Collecting a dataset for this kind of problem statement is quite difficult, as the data concerns employees and no organization authorizes the release of such data; hence, the HR Analytics dataset released by IBM has been used. It contains 19,000 entries and around 14 characteristics, among them experience, education level, company size, and company type. The dataset preview is shown in Fig. 1, together with the feature list in Fig. 2. The feature list comprises the actual features present in the dataset that have been used in the model.
3.2 Data Preprocessing

After evaluating each attribute of the provided dataset, we determined that there are 14 categorical features and 1 numerical feature. Categorical features are attributes whose values are picked from a defined set of alternatives. The present dataset contains many gaps: a null test, which reveals how many null values each feature holds, identified numerous missing values for some of the features, as can be seen in Fig. 3. To fix this, the authors conducted label encoding to complete the
Fig. 1 Dataset overview
Fig. 2 Features list
values that were lacking, making the dataset suitable so that optimum output may be obtained. After that, a crucial step is to split the dataset into training and testing data, so that the model can be trained properly on the training data and then tested to obtain its accuracy.
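A minimal sketch of the label-encoding and train/test-split steps, with hypothetical helper names (in practice scikit-learn's LabelEncoder and train_test_split serve the same purpose):

```python
import numpy as np

def label_encode(column, missing="Unknown"):
    """Fill missing entries, then map categorical values to codes."""
    filled = [missing if v is None else v for v in column]
    codes = {v: i for i, v in enumerate(sorted(set(filled)))}
    return np.array([codes[v] for v in filled]), codes

def train_test_split(X, y, test_frac=0.2, seed=0):
    """Shuffle indices and slice off a held-out test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], X[te], y[tr], y[te]

enc, mapping = label_encode(["Masters", None, "PhD", "Masters"])
print(enc, mapping)  # codes assigned alphabetically
```

Each of the 14 categorical columns is encoded this way before the split, so the classifiers receive purely numeric input.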
3.3 Feature Visualisation

Since the data is rather large, many characteristics need to be worked on, and correct visualization aids in attaining proper results at the conclusion of model training and testing. With so many features, it is hard to determine which feature should be given more weight; for that one has to deal with feature importance, which is essential, as it shows which independent variables are given importance and chosen for a split, and it also gives a better understanding of the model and the data, as shown in Fig. 4. After the key characteristics are extracted, a correlation between those features is computed to understand how they behave with one another; thus the authors computed the correlation between the significant features (Fig. 5).
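As an illustration of the feature-importance and correlation analysis, the sketch below uses the absolute correlation of each feature with the target as a simple importance proxy (the paper's pipeline would use a model-based importance such as a random forest's feature_importances_), plus a correlation matrix as in Fig. 5. The data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))                                # three candidate features
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)   # target driven by feature 0

# Simple importance proxy: |correlation| of each feature with the target.
importance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(3)]
print(np.argmax(importance))  # feature 0 dominates

# Correlation matrix between the features (cf. Fig. 5).
corr = np.corrcoef(X, rowvar=False)
print(corr.round(2))
```

Features with negligible importance can then be dropped before training, shrinking the model without hurting accuracy.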
Fig. 3 Null values in the features
Fig. 4 Feature importance
Fig. 5 Correlation matrix
3.4 Model Architecture

The models used by the authors are different machine learning classification models. Starting with logistic regression: it has just two possible outcomes, for example true or false, 1 or 0, and it employs one or more independent variables to predict a result. The k-nearest neighbour technique determines a prediction from the closest points in a given dataset; it typically offers good results, although it takes some time. The random forest method is a supervised learning algorithm that takes samples from the data and predicts the optimal answer by constructing an ensemble of decision trees; it is more precise than a single decision tree, but slower in prediction and more difficult to construct. In the support vector machine technique, the data points are separated into segments by a margin around a separating hyperplane, with the closest points acting as support vectors. Finally, the accuracies of all the classification models used are computed to obtain the difference between the performance of each. The formulae followed by the authors in building the models are shown below.
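The comparison workflow can be sketched with a small harness on synthetic data; here a hand-rolled k-NN and a nearest-centroid baseline stand in for the four scikit-learn classifiers the authors used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: two well-separated classes.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(200)
Xtr, Xte = X[idx[:150]], X[idx[150:]]
ytr, yte = y[idx[:150]], y[idx[150:]]

def knn_predict(Xtr, ytr, Xte, k=5):
    """k-nearest-neighbour majority vote."""
    out = []
    for x in Xte:
        nearest = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        out.append(np.bincount(ytr[nearest]).argmax())
    return np.array(out)

def centroid_predict(Xtr, ytr, Xte):
    """Nearest-centroid baseline (one mean per class)."""
    c = np.array([Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)])
    return np.array([np.argmin(np.linalg.norm(c - x, axis=1)) for x in Xte])

for name, pred in [("kNN", knn_predict(Xtr, ytr, Xte)),
                   ("centroid", centroid_predict(Xtr, ytr, Xte))]:
    print(name, "accuracy:", (pred == yte).mean())
```

The same loop — fit, predict, score, compare — is what produces the accuracy table in the next section, only with the real HR dataset and the four real classifiers.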
4 Experimental Results

After executing all the models, with the data preprocessed appropriately (a significant step for obtaining decent results), the findings the authors obtained are: support vector machine with 86% accuracy, KNN with 87%, logistic regression with 88%, and lastly, the best of all the approaches utilized, random forest with 90% accuracy.

Model name               Accuracy (%)
Support vector machine   86.2
Logistic regression      88.1
KNN                      86.9
Random forest            89.5
5 Conclusion

The approach utilized led the authors to remarkable outcomes in terms of precision and to reach the aim they intended. After acquiring the accuracies of KNN, support vector machine, random forest, and logistic regression, the authors concluded that random forest surpassed all the other classification techniques. Many businesses may utilize this algorithm to predict the attrition of data scientists in their firm, and it can also be used in other sectors.
Future work

The problem statement is not tied to one industry: many organizations operate in diverse sectors and employ people from all industries, which offers a large future scope for applying the same model to multiple job profiles. The dataset may also be gathered from different firms, and it would be worthwhile to test the model on it. Trying different ensemble models would be a fine practice, testing all the available models and merging them using the stacking approach.
References

1. J. Brooks, Why So Many Data Scientists Are Leaving Jobs
2. A. Jain, 5 Key Reasons Why Data Scientists are Quitting Their Job
3. P. Dandale, HR Analytics—Job Change of Data Scientists
4. S. Yadav, A. Jain, D. Singh, Early prediction of employee attrition using data mining techniques, in 2018 IEEE 8th International Advance Computing Conference (IACC), IEEE (2018)
5. S. Kakad, et al., Employee attrition prediction system. Int. J. Innov. Sci. Eng. Technol. 7(9), 7 (2020)
6. G. Marvin, M. Jackson, M.G.R. Alam, A Machine Learning approach for employee retention prediction, in 2021 IEEE Region 10 Symposium (TENSYMP), pp. 1–8 (2021). https://doi.org/10.1109/TENSYMP52854.2021.9550921
7. P.K. Jain, M. Jain, R. Pamula, Explaining and predicting employees' attrition: a machine learning approach. SN Appl. Sci. 2, 757 (2020). https://doi.org/10.1007/s42452-020-2519-4
8. A.C. Patro, S.A. Zaidi, A. Dixit, M. Dixit, A novel approach to improve employee retention using Machine Learning, in 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), pp. 680–684 (2021). https://doi.org/10.1109/CSNT51715.2021.9509601
9. A. Mhatre, A. Mahalingam, M. Narayanan, A. Nair, S. Jaju, Predicting employee attrition along with identifying high risk employees using Big Data and Machine Learning, in 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), pp. 269–276 (2021). https://doi.org/10.1109/ICACCCN51052.2020.9362933
10. G. Seni, J. Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Morgan & Claypool (2010)
11. S.M. Alhashmi, Towards understanding employee attrition using a Decision Tree approach, in 2019 International Conference on Digitization (ICD), IEEE (2019)
12. G. Martínez-Muñoz, D. Hernández-Lobato, A. Suárez, An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 245–259 (2009). https://doi.org/10.1109/TPAMI.2008.78
13. M. Pratt, M. Boudhane, S. Cakula, Employee attrition estimation using random forest algorithm. Baltic J. Modern Comput. 9(1), 49–66 (2021)
14. V. Vijay Anand, R. Saravanasudhan, R. Vijesh, Employee attrition—a pragmatic study with reference to BPO industry, in IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), pp. 769–775 (2012)
15. S. Modi, M.H. Bohara, Facial emotion recognition using Convolution Neural Network, in 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), IEEE (2021)
16. R.S. Shankar, J. Rajanikanth, V.V. Sivaramaraju, K.V.S.S.R. Murthy, Prediction of employee attrition using datamining, in 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN), pp. 1–8 (2018). https://doi.org/10.1109/ICSCAN.2018.8541242
17. R. Joseph, et al., Employee attrition using Machine Learning and depression analysis, in 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), IEEE (2021)
18. V. Shah, S. Modi, Comparative analysis of psychometric prediction system, in Smart Technologies, Communication and Robotics (STCR), pp. 1–5 (2021). https://doi.org/10.1109/STCR51658.2021.9588950
19. A. Qutub, et al., Prediction of employee attrition using Machine Learning and ensemble methods. Int. J. Mach. Learn. Comput. 11 (2021)
20. A. Patel, et al., Employee attrition predictive model using machine learning. Int. Res. J. Eng. Technol. (IRJET) 7(5) (2020)
21. A. Bashar, Survey on evolving deep learning neural network architectures. J. Artif. Intell. 1(02), 73–82 (2019)
22. S.J. Manoharan, Study of variants of Extreme Learning Machine (ELM) brands and its performance measure on classification algorithm. J. Soft Comput. Paradigm (JSCP) 3(02), 83–95 (2021)
23. S.R. Mugunthan, T. Vijayakumar, Design of improved version of sigmoidal function with biases for classification task in ELM domain. J. Soft Comput. Paradigm (JSCP) 3(02), 70–82 (2021)
Error Correction Scheme with Decimal Matrix Code for SRAM Emulation TCAMs Sangetha Balne and T. Gowri
Abstract The physical address space in Static Random-Access Memory (SRAM)-emulated Ternary Content Addressable Memories (TCAMs) is costly, making the emulation memory-inefficient. The most important role played by TCAMs is in network routing: a TCAM is a hardware device that increases the speed of packet forwarding. However, single-bit or multiple-bit errors may have an impact. In the existing scheme, the memory is safeguarded using a three-dimensional parity rule-based search method, and its contents are safely searched by emulation without damaging the stored bits. In the proposed scheme, a method based on a decimal algorithm, known as Decimal Matrix Code (DMC), is used to protect the memories. A modified DMC is developed in this research to improve memory dependability. The suggested approach generates check bits using Hamming codes as well as the Decimal Matrix Code. With fewer bits, a greater number of faults can be detected and corrected by the suggested approach, and it requires less memory utilization, as shown in the table.
1 Introduction

Content Addressable Memory (CAM) is a specialized device providing high-speed data search for applications including ATM [1], communication networks, LAN bridges/switches, databases, lookup tables, and tag directories. A CAM [2] is a semiconductor memory used in network routers and switches to perform fast table lookups. Plain SRAM is not used for high-speed search operations because an SRAM search is slow [3]; a CAM, being an associative memory, performs the same operations quickly and precisely. It is also important in network devices, image processing, and other areas. In a parallel search operation, a Ternary Content Addressable Memory (TCAM) [4–6] compares the data
S. Balne and T. Gowri
to be searched with the stored input information and provides the address of the matching data as its output; it also employs a don't-care term (x) in place of '1' or '0'. A TCAM consumes a lot of energy to run. Bits are stored in a TCAM as follows: the information cells store the binary values 0 and 1, whereas the mask cells store the don't-care states. Field Programmable Gate Arrays (FPGAs) are a versatile platform for implementing systems [7]. They provide an immense amount of logic and memory elements that can be customized to perform a specific task. As a result, they are attractive for networking applications. However, they do not include CAM or TCAM blocks, because FPGAs [8] are used in a variety of applications other than networking. This is not an issue for binary CAMs, since they can be efficiently emulated with cuckoo hashing and RAM at minimal cost. TCAMs can also be emulated using logic and memory resources, but the overheads are significantly higher, making emulation uncompetitive with ASIC implementations. A variety of approaches have been presented in the literature to emulate TCAMs in FPGAs [9, 10]. Some of them use FPGA flip-flops and circuitry to implement the TCAM memory cells [11]. Because of the limited scalability of this technique in terms of TCAM size, designs based on exploiting the FPGA's embedded SRAM memory are preferred and are implemented by FPGA vendors [12]. When SRAM memories are used to implement a TCAM, each TCAM cell requires a large number of bits. For example, it has been shown that each single TCAM bit requires more than 55 bits of Xilinx FPGA Block RAM (BRAM). In the case of distributed RAM, each TCAM bit requires 6 bits. This indicates that a significant number of memory bits are consumed, which raises the likelihood of soft errors [13–15].
ECCs (Error Correction Codes) can be employed to safeguard them; however, as previously stated, they add significant memory overhead. Protection can be implemented for TCAMs that are emulated with logic elements and flip-flops by using triple modular redundancy, which triples the flip-flops and adds voting logic to correct errors, resulting in a substantial resource overhead. A novel memory architecture called a resource-efficient SRAM-based TCAM (REST) [16] emulates TCAM functionality using optimal resources. In this brief, the distinctiveness of the data contained in the memories used to emulate TCAMs is exploited to create an efficient error-correcting mechanism. When the memories are protected with a parity bit to detect single-bit errors, the recommended approach will correct most single-bit errors. Thus, the strategy is attractive for improving the dependability of FPGA-based TCAM implementations without incurring significant resource overheads. A modified DMC approach based on the decimal algorithm and Hamming codes is proposed in this work. It converts the input data into symbols and arranges them in a matrix. Hamming codes [17] are used to calculate the horizontal redundant bits, XOR operations are used to calculate the vertical redundant bits, and a decimal algorithm is used to compute the syndrome bits. The proposed design uses an encoder-reuse technique, which eliminates the requirement for a separate decoding circuit, thereby reducing the space overhead
of additional circuits. To detect and repair errors, the suggested technique uses the decimal algorithm. This brief is organized as follows: Sect. 1 gives the introduction; Sect. 2 examines error correction and detection in SRAM-emulated TCAM; the proposed scheme is presented in Sect. 3; results are discussed in Sect. 4; finally, the conclusion is summarized in Sect. 5.
2 Conventional Method

Error Correction Scheme Using SRAM-Emulated TCAM: To detect single-bit errors, the approach to safeguard the memories used to emulate the TCAM includes a parity bit for each word. When an error is found, the inherent redundancy of the memory contents is exploited to attempt to repair it. The parity-protection implementation is presented in Fig. 1 [18], where 'p' denotes the parity bit. Whenever there is a mismatch between the stored parity and the recomputed parity bit, an error signal is issued in addition to the match signal. This is a typical parity check that can detect all single-bit errors. Detecting the error on each access is critical to avoid incorrect results in search operations. Suppose that a single-bit error occurred on a specific word and has been identified by the parity check. We can then examine the contents of the memory to see if we can rectify the problem. A first attempt could be to
Fig. 1 Parity protected error detection using Two SRAMs emulated TCAM
read all of the words in the memory and count how often each rule appears in each position. Let us refer to that number as the rule's weight in that memory. For example, in Fig. 2 [18], r1 has a weight of 1, r2 has a weight of 2, and r3 has a weight of 4. For an 8-position memory, the weight of an error-free rule can only be 0, 1, 2, 4, or 8, which can assist us in identifying the incorrect bits. Let us look at several examples of single-bit faults in Fig. 2 to learn more about the error-correcting procedure. For example, e3 changes the weight of r3 on the leftmost memory from 4 to 3 when it is activated. Because 3 is not a valid value, after detecting the parity mistake we would locate the erroneous bit in r3 and repair it. This method works for rules with
Fig. 2 Single-bit errors on a parity-protected TCAM with six-bit keys and four replicated rules employing two SRAMs
Fig. 3 Proposed block diagram of SRAM emulated TCAM
a weight greater than two, that is, rules with two or more "x" bits on the key bits corresponding to that memory. However, for rules with a lower weight, simply checking the weight may not be enough. Consider a rule with a weight of two. If an error flips a zero to a one, the weight becomes three and the error will be fixed. An error flipping a one to a zero is less likely, because only two positions hold a one. In the case of a weight-one rule, an error that sets one more bit to one would result in an apparent weight of two. This is clearly visible when looking at e4: in that situation, the one-valued r2 positions correspond to key values 000 and 011, which do not correspond to valid rules. In general, only errors at positions whose key values are at distance one from the original value will escape this check. An error in a weight-one rule that clears the single one to zero, on the other hand, can be rectified by checking whether the rule has zero weight on the other memory.
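The parity check and the weight rule can be sketched as follows (illustrative helper names; the key fact is that an error-free rule's weight in an 8-position memory must be zero or a power of two):

```python
def parity(bits):
    """Even parity bit stored alongside each SRAM word."""
    return sum(bits) % 2

def rule_weight(column):
    """Number of ones a rule contributes to one memory."""
    return sum(column)

def weight_is_valid(w):
    """Valid weights for an 8-position memory: 0, 1, 2, 4, 8."""
    return w == 0 or (w & (w - 1)) == 0  # zero or a power of two

# A single-bit upset turns a weight-4 rule into weight 3, which is
# impossible for a correct rule, so the flipped bit can be located.
column = [1, 1, 1, 1, 0, 0, 0, 0]  # weight 4
corrupted = column.copy()
corrupted[0] = 0                   # upset like e3 in Fig. 2
print(weight_is_valid(rule_weight(column)),
      weight_is_valid(rule_weight(corrupted)))  # True False
```

Once the parity bit flags a word, scanning each rule's weight for an invalid value pinpoints which rule, and hence which bit, was upset.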
3 Proposed System

The outputs of the Hamming block are provided as input to the SRAM-emulated TCAM, where the error-correction process completes and the final decoded output is obtained, as shown in Fig. 3.

Hamming Encoder Block: The information bits occupy cells D0 through D7. This 8-bit word (N = 8) is broken down into two symbols (k = 2), each with four bits (m = 4). Cells P1 to P3 are the horizontal check bits, and cells V0 to V3 are the vertical check bits, as represented in Fig. 4.

• The horizontal redundant bits are calculated over all the symbols in a single row. The Hamming formulae for the horizontal check bits are:
• P1 = D3 ⊕ D1 ⊕ D0
• P2 = D3 ⊕ D2 ⊕ D0
• P3 = D3 ⊕ D2 ⊕ D1, and so on.
• The vertical check bits are calculated as:
• V0 = D0 ⊕ D4
• V1 = D1 ⊕ D5, and so on.

The following is an example of SRAM emulating a TCAM. To reduce power usage, the bits are partitioned, as in a TCAM. In this study, the bit partition is 8 bits and 4 rules are developed to search for the key, as shown in Tables 1 and 2,

Fig. 4 Logical representation of 8-bit modified-DMC
D3  D2  D1  D0
D7  D6  D5  D4
V3  V2  V1  V0
P3  P2  P1
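A sketch of the check-bit computation follows; V2 and V3 are assumed to follow the same XOR pattern as V0 and V1, which the text gives only as "and so on":

```python
def check_bits(d):
    """Compute modified-DMC check bits for 8 data bits d[0..7].

    Horizontal (Hamming) bits P1..P3 cover the first symbol;
    vertical bits V0..V3 XOR corresponding bits of the two symbols.
    """
    P1 = d[3] ^ d[1] ^ d[0]
    P2 = d[3] ^ d[2] ^ d[0]
    P3 = d[3] ^ d[2] ^ d[1]
    V = [d[i] ^ d[i + 4] for i in range(4)]  # V0..V3
    return (P1, P2, P3), V

d = [1, 0, 1, 1, 0, 1, 0, 0]  # D0..D7
print(check_bits(d))
```

On a read, recomputing these bits and comparing them against the stored ones yields the syndrome that locates the erroneous symbol.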
Table 1 Bit partition and the rules developed for R1 = 0000.xxxx and R2 = 000x.0011

Address  R1(0000)  R1(xxxx)    Address  R2(000x)  R2(0011)
0000     1         1           0000     1         0
0001     0         1           0001     1         0
0010     0         1           0010     0         0
0011     0         1           0011     0         1
0100     0         1           0100     0         0
0101     0         1           0101     0         0
0110     0         1           0110     0         0
0111     0         1           0111     0         0
1000     0         1           1000     0         0
1001     0         1           1001     0         0
1010     0         1           1010     0         0
1011     0         1           1011     0         0
1100     0         1           1100     0         0
1101     0         1           1101     0         0
1110     0         1           1110     0         0
1111     0         1           1111     0         0
Table 2 Bit partition and the rules developed for R3 = 01xx.1010 and R4 = 1xxx.1000

Address  R3(01xx)  R3(1010)    Address  R4(1xxx)  R4(1000)
0000     0         0           0000     0         0
0001     0         0           0001     0         0
0010     0         0           0010     0         0
0011     0         0           0011     0         0
0100     1         0           0100     0         0
0101     1         0           0101     0         0
0110     1         0           0110     0         0
0111     1         0           0111     0         0
1000     0         0           1000     1         1
1001     0         0           1001     1         0
1010     0         1           1010     1         0
1011     0         0           1011     1         0
1100     0         0           1100     1         0
1101     0         0           1101     1         0
1110     0         0           1110     1         0
1111     0         0           1111     1         0
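The contents of Tables 1 and 2, and the AND-based lookup, can be reproduced with a short sketch: each memory word holds one bit per rule, and the two words read for a key are ANDed (helper names are our own):

```python
def sram_word(rules, addr_bits):
    """One memory word: bit i is 1 iff rule i's half matches addr."""
    return [int(all(r in ("x", b) for r, b in zip(rule, addr_bits)))
            for rule in rules]

# Rule halves from Tables 1 and 2 (R1..R4).
left_halves  = ["0000", "000x", "01xx", "1xxx"]  # upper 4 key bits
right_halves = ["xxxx", "0011", "1010", "1000"]  # lower 4 key bits

def lookup(key):
    """Emulated TCAM search: read one word per SRAM, AND them."""
    lw = sram_word(left_halves, key[:4])
    rw = sram_word(right_halves, key[4:])
    return [a & b for a, b in zip(lw, rw)]

print(lookup("00000011"))  # rules r1 and r2 match -> [1, 1, 0, 0]
```

For the key 0000 0011, the left SRAM at address 0000 returns 1100 and the right SRAM at address 0011 also returns 1100, so ANDing yields a match only for r1 and r2, as the example below describes.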
respectively. These 8 bits are partitioned into two blocks of 4 bits each. The address positions of the memories are determined by these partitioned bits as a power of 2; i.e., for 8 bits:
Positions = 2ˆ4 = 16 positions in two SRAMs. • • • •
Rules for these two memories are four. Let us consider an example of 6 bits as 00001111. It is partitioned in two blocks 0000 and 1111. The positions in this are calculated as 2ˆ4 = 16 positions. Rules considered are 4 as we are using 4-bit SRAM. r1, r2, r3, r4. • R1 = 0000. xxxx. When there are no ‘x’ bits ‘1’ is placed in the memory. Since 3x bits all ones are placed in other memory. In the same way we do for R2, R3 and R4.
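The filling and lookup just described can be sketched in Python. This is an illustrative model of the two-SRAM emulation, not the hardware implementation; the rule encodings are those of Tables 1 and 2.

```python
def build_sram(patterns):
    """One 16-entry SRAM word per 4-bit address: bit i of the word is 1
    when the address matches the 4-bit ternary pattern of rule i."""
    def matches(addr, pat):
        return all(p == 'x' or p == a for a, p in zip(addr, pat))
    return [[1 if matches(format(a, '04b'), p) else 0 for p in patterns]
            for a in range(16)]

# Ternary rules from Tables 1 and 2, split into left/right 4-bit partitions.
rules = ['0000.xxxx', '000x.0011', '01xx.1010', '1xxx.1000']
left  = build_sram([r.split('.')[0] for r in rules])
right = build_sram([r.split('.')[1] for r in rules])

def lookup(key8):
    """Emulated TCAM search: address both SRAMs with the two key halves
    and AND the per-rule match vectors that are read out."""
    lo, hi = int(key8[:4], 2), int(key8[4:], 2)
    return [a & b for a, b in zip(left[lo], right[hi])]

print(lookup('00000011'))  # rules r1 and r2 match -> [1, 1, 0, 0]
```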
• For the key 0000.0011, the first position of the leftmost memory, address 0000, is accessed and the word 1100 is read, while on the other memory the fourth position, address 0011, is accessed and the word 1100 is read as well. After ANDing the two words, only rules r1 and r2 match.
• To identify single-bit faults, the suggested technique to protect the memory used to emulate the TCAM uses a per-word parity bit. When an error is found, the inherent redundancy of the memory contents is exploited to attempt a repair. Figure 2 portrays the parity-protection implementation, where p signifies the parity bit. When there is a mismatch between the stored parity and the recomputed parity, an error signal is issued in addition to the match signal. This is a standard parity check that detects all single-bit errors.
• A good starting point is to read the complete word at every position of the memory and count how many times each rule appears, i.e., the number of ones in that rule's column. Let us call that number the rule's weight in that memory. For an 8-position memory, the weight of an error-free rule can only be 0, 1, 2, 4 or 8, which helps us identify the incorrect bit.
• First, we compute the fraction of single-bit error patterns that can be fixed for each weight in a 2^b-position memory:
  – Weight zero: all patterns can be fixed.
  – Weight one: a fraction 1 − b/2^b of the patterns can be fixed; the exceptions are those that set to one a bit at an address at distance one from the stored one.
  – Weight two: all patterns can be fixed except the two that clear a stored one to zero, corresponding to a fraction 1 − 2/2^b.
  – Weight four or more: any pattern can be fixed.

Let us look at several examples of single-bit faults in Fig. 5 to understand the error correction procedure. For example, e3 changes the weight of r3 in the leftmost memory from 4 to 3 when it is activated. Because 3 is not a valid weight, we would find

Fig. 5 Single-bit faults on a parity-protected TCAM with 8-bit keys and four rules emulated with two SRAMs

the erroneous bit in r3 and repair it after detecting the parity error. This method works for rules with a weight greater than two, that is, rules with two or more 'x' bits on the key bits corresponding to that memory.

Algorithm of the Proposed System: The pseudocode of Algorithm 1 gives a brief idea of how the program is written. The cycle begins when a parity fault is found while reading a word from a block memory. To fix the issue, the first stage reads all the words in the block and determines the column weights by adding the ones found in each column. In the second stage, the column weights are checked case by case to try to locate the erroneous column; if one is found, the erroneous bit of that column in the word that had the parity fault is corrected. This second part of the algorithm starts by determining whether any column has an illegal weight. As stated previously, only the following column weights are legal: 0, 1 and 2^i for i = 1, 2, …, b. Thus, if a column has a weight of three, it is the erroneous one; if the erroneous bit is found, it is fixed and the procedure finishes. If not, we examine the columns with zero weight. Those should correspond to TCAM entries that are not used, and such entries should have zero weight in all the other memory blocks as well. Hence, if such a column has nonzero weight in another block, the error has been identified and resolved. If all the columns with zero weight also have zero weight in the other blocks, we proceed to the columns of weight one. For those, we likewise check whether they have zero weight in another block; if so, the error occurred in that column, and we fix it. If not, we move on to the last stage, which checks the columns of weight two. The addresses of the two positions that contain a one are XORed; if the result has more than one bit set, there was an error in that column, and it is fixed. If none of this happens, we have hit one of the few errors that cannot be corrected. The whole procedure can easily be implemented in a soft processor, available in many FPGA-based networking systems, to handle the control functions.
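The illegal-weight check at the heart of this procedure can be sketched as follows. This is a simplified illustration covering only the invalid-weight case (e.g., a column of weight 3); the zero-, one- and two-weight stages described above are omitted.

```python
def find_erroneous_column(words, b=4):
    """Given the 2^b words of one SRAM block (each a list of per-rule
    bits), return the index of a rule column whose weight is illegal,
    or None. Legal column weights are 0, 1 and 2^i for i = 1..b."""
    legal = {0, 1} | {2 ** i for i in range(1, b + 1)}
    n_cols = len(words[0])
    for col in range(n_cols):
        weight = sum(w[col] for w in words)
        if weight not in legal:
            return col
    return None

# A 16-word block where rule column 2 should have weight 4 (pattern
# 01xx), but a single-bit upset cleared one of its ones, leaving
# the illegal weight 3.
block = [[0] * 4 for _ in range(16)]
block[0][0] = 1                      # r1: weight 1 (legal)
for a in (4, 5, 6):                  # r3: weight 3 after the upset
    block[a][2] = 1
print(find_erroneous_column(block))  # -> 2
```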
4 Results

The proposed design is evaluated in Xilinx ISE 14.7 on a Spartan-3E device and compared with the existing design on the same device. The simulation and synthesis results on the Spartan-3E for the existing and proposed designs are shown below.

Simulation (Existing): Figure 6 shows the simulation results of the existing design, parity-protected error detection and correction with two SRAMs. The design summary in Table 3 shows the area occupied by the existing design, and the timing summary below shows its delay.
Fig. 6 Parity protected error detection and correction with two SRAM’s
Table 3 Device utilization summary (estimated values for existing design)

| Logic utilization     | Used | Available | Utilization (%) |
| Number of slices      | 0    | 4656      | 0               |
| Number of bonded IOBs | 15   | 232       | 6               |
Simulation (Proposed): Figure 7 shows the simulation of the proposed design for error detection and correction with the modified decimal coded matrix. The design summary in Table 4 shows the area occupied by the proposed design, and the timing summary below shows its delay.
Fig. 7 The simulation of the proposed design for error detection and correction with modified decimal coded matrix
Table 4 Device utilization summary (estimated values for proposed design)

| Logic utilization      | Used | Available | Utilization (%) |
| Number of slices       | 1    | 4656      | 0               |
| Number of 4 input LUTs | 1    | 9312      | 0               |
| Number of bonded IOBs  | 10   | 232       | 4               |
| Number of GCLKs        | 1    | 24        | 4               |
The above graphs, tables and design summaries show that the proposed design requires less memory utilization than the existing design.
5 Conclusion

A technique for protecting static RAMs used to emulate ternary content-addressable memory on field-programmable gate arrays has been presented. The methodology depends on the observation that not all values are reachable in certain SRAMs, implying that the memory contents have some inherent redundancy. A modified DMC was proposed in this research to safeguard memory from radiation-induced errors. The proposed method detects and corrects large MCUs using a mix of Hamming coding and a decimal algorithm. In addition, the encoder-reuse technique minimizes the area overhead of the additional circuits. The results suggest that the proposed method is more reliable than existing methods in terms of data protection.
References

1. C.L. Chen, M.Y. Hsiao, Error-correcting codes for semiconductor memory applications: A state-of-the-art review. IBM J. Res. Develop. 28(2), 124–134 (1984)
2. K. Pagiamtzis, A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. IEEE J. Solid-State Circuits 41(3), 712–727 (2006)
3. F. Yu, R.H. Katz, T.V. Lakshman, Efficient multimatch packet classification and lookup with TCAM. IEEE Micro 25(1), 50–59 (2005)
4. P. Bosshart et al., Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN, in Proceedings of ACM SIGCOMM (2013), pp. 99–110
5. I. Syafalni, T. Sasao, X. Wen, A method to detect bit flips in a soft-error resilient TCAM. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 37(6), 1185–1196 (2018)
6. S. Pontarelli, M. Ottavi, A. Evans, S. Wen, Error detection in ternary CAMs using Bloom filters, in Proceedings of Design, Automation & Test in Europe Conference & Exhibition (2013), pp. 1474–1479
7. M. Irfan, Z. Ullah, G-AETCAM: Gate-based area-efficient ternary content-addressable memory on FPGA. IEEE Access 5, 20785–20790 (2017)
8. W. Jiang, Scalable ternary content addressable memory implementation using FPGAs, in Proceedings of ACM ANCS, San Jose, CA, USA (2013), pp. 71–82
9. Z. Ullah, M.K. Jaiswal, R.C.C. Cheung, E-TCAM: An efficient SRAM-based architecture for TCAM. Circuits Syst. Signal Process. 33(10), 3123–3144 (2014)
10. A. Ahmed, K. Park, S. Baeg, Resource-efficient SRAM-based ternary content addressable memory. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 25(4), 1583–1587 (2017)
11. I. Ullah, Z. Ullah, J.-A. Lee, Efficient TCAM design based on multipumping-enabled multiported SRAM on FPGA. IEEE Access 6, 19940–19947 (2018)
12. Ternary Content Addressable Memory (TCAM) Search IP for SDNet: Smart CORE IP Product Guide, PG190 (v1.0), Xilinx, San Jose, CA, USA, Nov (2017)
13. J.R. Dinesh, C. Ganesh Babu, V.R. Balaji, K. Priyadharsini, S.P. Karth, Performance investigation of various SRAM cells for IoT based wearable biomedical devices, in Inventive Communication and Computational Technologies (2021), pp. 573–588
14. V. Gherman, M. Cartron, Soft-error protection of TCAMs based on ECCs and asymmetric SRAM cells. Electron. Lett. 50(24), 1823–1824 (2014)
15. R.C. Baumann, Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans. Device Mater. Reliab. 5(3), 301–316 (2005)
16. A. Ahmed, K. Park, S. Baeg, Resource-efficient SRAM-based ternary content addressable memory. IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 1–5 (2016)
17. A.D. Houghton, The Engineer's Error Coding Handbook (Chapman and Hall, London, U.K., 1997)
18. P. Reviriego, S. Pontarelli, A. Ullah, Error Detection and Correction in SRAM Emulated TCAMs (2018)
Knowledge Discovery in Web Usage Patterns Using Pageviews and Data Mining Association Rule G. Vijaiprabhu, B. Arivazhagan, and N. Shunmuganathan
Abstract Web browsing and navigation behavior of web users is assessed with web usage patterns. The browsing details stored in weblogs consist of the fields IP address, session, URL, user agent, status code and bytes transferred. The three categories of web mining are usage mining, content mining and structure mining, which are used, respectively, to analyze usage patterns, to search content, and to locate the link structure and navigation pattern of the web for the user. The primary data sources for web analysis are server logs, access logs and application server logs, and the two most used data abstractions are the pageview and the session. This work creates different types of pageview abstraction and analyzes the web patterns to find navigation behavior using the data mining FP-Growth algorithm, which generates itemsets and rules from the patterns users accessed. Frequent sets are generated, and association rules are then formed with minimum support and confidence. Varying user minimum-support values are applied to the dataset to analyze the number of frequent itemsets and association rules for each support value. The proposed analysis creates a multi-view web analysis with varying pageviews and improves on the existing FP-Growth algorithm by producing more frequent sets and better rules. This leads to personalizing the content of the site to user needs. The research work is implemented, and the results are visualized, in the RapidMiner tool.
1 Introduction

Web mining [1] is an emerging research area owing to the vast number of World Wide Web (WWW) services in recent years. Details of user browsing are stored in weblogs, and it is difficult to identify the pieces of information relevant to a user because the stored information is large. Recent research addresses this problem by applying data mining techniques to web data. The techniques include classification, clustering and association mining, each a distinct research area. Among them, mining frequent patterns [2] in transaction databases is becoming popular with data mining techniques. These patterns are identified via frequent itemsets, which are generated with support and frequency measures by the Apriori and FP-Growth algorithms. Frequent itemsets are the subsets of frequently occurring collections of items; they are then used to generate association rules. The main aim is to analyze the hyperlink structure, page content and usage data.

G. Vijaiprabhu (B) · B. Arivazhagan · N. Shunmuganathan
Department of Computer Science, Erode Arts and Science College (Autonomous), Erode, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_19
1.1 Web Mining Categories

Web mining can be broadly divided into three [3] distinct subcategories, namely content mining, usage mining and structure mining.

Usage Mining: Usage mining analyzes patterns and discovers new sequences [4] of transactions from click streams and other logs of user interactions. It applies data mining techniques to server logs or tracking histories for content discovery. The steps in web usage mining are preprocessing, knowledge discovery and pattern analysis. It turns web activity records into an important resource for making web services more user friendly. The need for analysis of usage patterns grows continuously as the complexity of the interaction between users and web applications increases.

Content Mining: Content mining is the process of extracting useful information from the content [5] of web files. The content of a webpage includes the facts and figures embedded in it. Text mining is the most important area of content mining; it extracts useful knowledge from webpage contents. Generally, web pages can be clustered automatically according to their topics. Patterns can also be discovered in web pages to extract useful data, such as customer product reviews for analyzing customer sentiment.

Structure Mining: Structure mining concentrates on the hyperlink structure of website applications and web pages. The documents on the web are linked in some way; the existence of these hyperlinks reflects the design of the website, and the design should be user friendly. Structure mining aims to locate and model the link structure of the web. Finding the linking structure in web documents is the biggest challenge of web structure mining.
1.2 Weblog

Log files list the actions that have occurred [6] between the server and the user. These files reside on the web server. The server delivers the web pages the user requests and stores all the files necessary to display them. The browser requests a page from the web server, and using HTTP the server delivers the needed page. The browser then converts the files and presents them in a viewable format. The server can also allow multiple clients to view the same page simultaneously.

Fields in Weblog

The weblog file resides on web servers, which record each activity of users who access the server through a browser; on web proxy servers, intermediate servers between the client and the server, where each piece of browsed information is stored in a separate log file; and in client browsers, where special software is loaded in the browser window for this purpose. Content refers to the format of the request and reply between the server and the browser. It maintains different types of information, the basic fields being:

• User Name: the identification of the user, mostly the IP address of the computer, which is assigned by the Internet Service Provider and is a unique address of each computer. The user can be identified by name only on websites that require a username and password; otherwise only the IP address is stored as the user ID.
• Visiting URL: identifies and stores the path used by the user to reach the website. A page may be reached directly or through another link; all the details the user viewed are stored in the path.
• Time Stamp: the session (time) the user spent visiting the web page. A session may be fixed at 30 min, and if the user continues the visit on the same webpage, it may be counted as the next session of the same user.
• User Agent: the name of the browser from which the user sends the request to the server, describing the type and version of the browser being used.
• Request Type: the type of information being transferred, generally 'GET' for a request and 'POST' for a reply.
• Status Code: the success or failure of the webpage transfer, coded in number series.
File Format

The file types comprise [7] the Transfer Log, Agent Log, Error Log and Referrer Log. The most common storage method is the Common Log Format (CLF), as in the Agent Log file format.

Example. A log file entry produced in CLF looks as follows:

127.0.0.1 - xxxx [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

where 127.0.0.1 is the IP address, xxxx is the user ID, and [10/Oct/2000:13:55:36 -0700] is the session login time. "GET /apache_pb.gif HTTP/1.0" gives three
types of details: GET implies the user's request, apache_pb.gif is the requested information, and HTTP/1.0 is the protocol used by the client. 200 represents the status code, in which the 200 series indicates a successful response, 300 redirection, 400 a client error and 500 a server error; 2326 is the total number of bytes transferred in that session.
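A CLF entry like the one above can be parsed with a regular expression. A minimal Python sketch, assuming the standard space-delimited CLF fields (host, identity, user, timestamp, request, status, bytes):

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d{3}) (\d+|-)')

line = ('127.0.0.1 - xxxx [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')

m = CLF.match(line)
host, ident, user, ts, request, status, size = m.groups()
method, url, proto = request.split()
print(host, user, ts)       # 127.0.0.1 xxxx 10/Oct/2000:13:55:36 -0700
print(method, url, status)  # GET /apache_pb.gif 200
```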
2 Related Works

Gupta et al. [8] compared the two frequent-itemset algorithms, Apriori and FP-Growth, for the analysis of web data on the World Wide Web. The work finds the most relevant information accessed by the user from the browsing history, applying data mining techniques to find relevant patterns, and concentrates on web personalization, system improvement, security, design and learning environments. It first analyzes the FP-Growth algorithm in four modules, namely a preprocessing module, generation of the FP-tree, creation of association rules and a result-set generator, and then analyzes the Apriori algorithm, which generates candidate sets with multiple scans. Comparing the features of both algorithms, FP-Growth produces no candidate sets, so it does not require high memory utilization, and it needs only two passes to scan the data, whereas Apriori needs multiple passes; the FP-tree thus takes less processing time.

Dharmarajan et al. [9] developed an algorithm to classify user navigation behavior by creating frequent itemsets with the Apriori and FP-Growth algorithms to enhance web design and web personalization. The RapidMiner tool is used to assess the performance of both algorithms. Since the number of itemsets grows with the number of instances, the work analyzes processing time against the number of instances for both algorithms. Apriori and FP-Growth generate frequent itemsets, from which the best association rules are formed with a support count. In preprocessing, user identification is performed to obtain unique users in each session, and a log parser, which supports all types of log files, converts the unstructured data into structured form. The Apriori algorithm produces candidate sets, which require more memory space and time, while FP-Growth produces itemsets directly. On the weblog dataset, FP-Growth was found to perform better than Apriori, with less processing time.

Kamalakannan et al. [10] present an analysis of association rule mining with Apriori and FP-Growth to discover the successive searches of clients engaged in e-shopping. E-commerce generates a huge amount of transactional data, and data mining may expose trends and determine patterns from these data that lead to a higher success rate. Apriori suffers from severe drawbacks, such as extensive I/O scans of the database and the high computational cost of generating frequent itemsets, which make it impractical for extremely large databases; other, tree-based algorithms such as FP-Growth depend heavily on memory size. Apriori uses candidate sets, while FP-Growth uses a tree structure to analyze frequent sets. Both algorithms are compared on a sample e-commerce transaction dataset, from which it is concluded that FP-Growth is slightly better than Apriori.
Manasa et al. [11] integrate the two frequent-set generators, Apriori and FP-Growth, to analyze the structure of a website. Apriori uses breadth-first search with a hash tree to generate patterns; its limitation is that it produces a large number of candidate sets, which require additional memory space and time, making the scan of the candidate sets tedious. FP-tree, in contrast, uses a support count without candidate sets to produce frequent items. The proposed IAFP (Integration of Apriori and FP-Growth) method combines both algorithms into a hybrid frequent-set generator to overcome the limitations of each algorithm alone. The work is implemented in Eclipse; the hybrid approach is able to handle large datasets and produces association rules more efficiently than either algorithm alone.

Gupta et al. [12] proposed an improved FP-tree algorithm for generating itemsets with frequent access patterns from the access paths of users in weblog data. The improvement is a backward tree traversal that takes less processing time than the existing algorithms. Weblog data in Common Log Format is taken for the analysis. Generally, the FP-tree avoids candidate generation to obtain frequent sets; its nodes hold frequent items, with higher-frequency items placed before lower-frequency items, and a recursive method analyzes the frequent patterns. The proposed method maintains a frequent-pattern tree, sorted by page count, to store information about patterns, leading to less processing time. A sample dataset is analyzed with support as the metric, noting the processing time in seconds for each support value from 0.1 to 0.8. The results show that the proposed improved FP-tree algorithm takes less processing time than existing ones, and the time decreases as the support value increases.

Serin et al. [13] put forward clustering-based association rules to mine frequent browsing patterns during surfing, which aids in improving advertising and web portal management. This work aims to bridge the knowledge-mining gap in the area of navigation behavioral patterns. Preprocessing was done with data cleaning and user and session identification. A fuzzy algorithm was then used to cluster users with similar surfing patterns, allowing a user to belong to two or more clusters. The Apriori algorithm was applied to the clustered data to discover interesting characteristic relationships even in large databases. To capture user behavior more accurately, frequent itemsets and rules were formed; these rules describe the user's browsing behavior and predict the next visited page.

Kaur et al. [14] discussed data mining techniques for revealing user patterns in web data among students, with the goal of determining mining approaches for web research. The algorithms are surveyed and studied to find the best one for extracting patterns of student interest from the sites students frequently visit. The analysis found that Apriori is easy to implement but consumes a large amount of memory due to its large candidate sets; k-means clustering is faster than both and easy to implement, but choosing the number of clusters k is difficult; and FP-Growth scans the database only twice, is faster than Apriori, and consumes less memory. This work helps extract interesting patterns from web information to facilitate web-based analysis.
3 Methodology

Proposed Work: Association Rule Mining Using FP-Growth with a User Pageview Matrix (AR-FPGUPM)

The proposed work generates association rules (AR) from user pageviews (UPM) instead of generating them from the raw weblog dataset. It first creates a pageview matrix from the weblog dataset and then applies the Frequent Pattern (FP)-Growth algorithm to generate itemsets and then rules. Finally, the existing AR-FPG (association rule mining using FP-Growth on the weblog dataset) and the proposed AR-FPGUPM (FP-Growth on the pageview matrix) are compared in the result and discussion section.
3.1 Pageviews

Pageviews are semantically meaningful entities to which mining tasks can be applied, each corresponding to one communication. Conceptually, a user visit is written as

v = ⟨(pv_1, w(pv_1)), (pv_2, w(pv_2)), …, (pv_n, w(pv_n))⟩   (1)

Generally, preprocessing results in a set of pageviews PV = {p_1, p_2, …, p_n} and a set of user visits V = {v_1, v_2, …, v_m}, where each v_i in V is a subset of PV. In Eq. (1), each pv_i = p_j for some j in {1, 2, …, n}, and w(pv_i) is the weight associated with pageview pv_i in the visit v. Each pageview can also be written as a z-dimensional feature vector, where z is the total number of features extracted from the site into a global dictionary. In Eq. (2), fw_p(f_i) is the weight of the ith feature in pageview p:

pageview p = (fw_p(f_1), fw_p(f_2), …, fw_p(f_z))   (2)

Types of Pageview Matrix

All user communications can be viewed as an a × b user-pageview matrix, denoted UPM.

User Pageview Matrix

An example of a user-pageview matrix is depicted in Table 1, in which the weight of each pageview is the number of times that a particular user spent on the URL; it may also be a composite or aggregate value. A pageview can represent a collection or sequence of pages rather than a single page. The matrix is composed of users (User 1, User 2, …) and their pages (A.html, B.html, …).
Table 1 User pageview matrix

|        | A.html | B.html | C.html | D.html | E.html |
| User 1 | 1      | 0      | 1      | 0      | 1      |
| User 2 | 1      | 1      | 0      | 0      | 1      |
| User 3 | 0      | 1      | 1      | 1      | 0      |
| User 4 | 1      | 0      | 1      | 1      | 1      |
| User 5 | 1      | 1      | 0      | 0      | 1      |
| User 6 | 1      | 0      | 1      | 1      | 1      |
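A matrix like Table 1 can be produced from raw (user, page) log records with a pivot or cross-tabulation. The sketch below uses pandas as a stand-in for RapidMiner's pivot operator; the toy records are hypothetical, and the counts are binarized to 0/1 as in Table 1.

```python
import pandas as pd

# Hypothetical (user, page) records extracted from a preprocessed weblog.
log = pd.DataFrame({
    'user': ['User 1', 'User 1', 'User 1', 'User 2', 'User 2', 'User 3'],
    'page': ['A.html', 'C.html', 'A.html', 'A.html', 'B.html', 'B.html'],
})

# Pivot: count visits per (user, page), then binarize as in Table 1.
upm = pd.crosstab(log['user'], log['page']).clip(upper=1)
print(upm)
```

Keeping the raw counts (dropping the `clip`) gives the weighted form of the matrix described in the text.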
Term Pageview Matrix

An example of a term-pageview matrix is depicted in Table 2, which shows the document terms of the corresponding site against its pages. In Table 2, web, data mining, business, …, are the search terms, and A.html, B.html, …, are the web pages retrieved for each search term; hence this is known as the term-pageview matrix.

Table 2 Term pageview matrix

|              | A.html | B.html | C.html | D.html | E.html |
| Web          | 0      | 0      | 1      | 1      | 1      |
| Data mining  | 0      | 1      | 1      | 1      | 0      |
| Business     | 1      | 1      | 0      | 0      | 0      |
| Intelligence | 1      | 1      | 0      | 0      | 1      |
| Marketing    | 1      | 1      | 0      | 0      | 1      |
| Ecommerce    | 0      | 1      | 1      | 0      | 0      |
| Search       | 1      | 0      | 1      | 0      | 0      |
| Information  | 1      | 0      | 1      | 1      | 1      |
| Retrieval    | 1      | 0      | 1      | 1      | 1      |

Content-Enhanced Transaction Matrix

It is derived by multiplying the user-pageview matrix with the transpose of the term-pageview matrix, as depicted in Table 3. The resulting matrix reveals, for example, that users 4 and 6 are more interested in the Web, while user 3 is more interested in mining. Various data mining tasks can now be performed on this matrix: clustering it may reveal segments of users with common interests, and association rules may be generated with a certain degree of confidence. It therefore provides a better understanding of the underlying relationships among communications.
Table 3 Content-enhanced transaction matrix

|        | Web | Mining | Business | Intelligence | Marketing |
| User 1 | 2   | 1      | 1        | 2            | 2         |
| User 2 | 1   | 1      | 2        | 3            | 3         |
| User 3 | 2   | 3      | 1        | 1            | 1         |
| User 4 | 3   | 2      | 1        | 2            | 2         |
| User 5 | 1   | 1      | 2        | 3            | 3         |
| User 6 | 3   | 2      | 1        | 2            | 2         |
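The multiplication behind Table 3 can be checked directly. A sketch with NumPy, using the values of Tables 1 and 2 (restricted to the first five terms shown in Table 3):

```python
import numpy as np

# User-pageview matrix (Table 1): rows = users 1..6, cols = pages A..E.
upm = np.array([[1, 0, 1, 0, 1],
                [1, 1, 0, 0, 1],
                [0, 1, 1, 1, 0],
                [1, 0, 1, 1, 1],
                [1, 1, 0, 0, 1],
                [1, 0, 1, 1, 1]])

# Term-pageview matrix (Table 2), first five terms: web, data mining,
# business, intelligence, marketing; cols = pages A..E.
tpm = np.array([[0, 0, 1, 1, 1],
                [0, 1, 1, 1, 0],
                [1, 1, 0, 0, 0],
                [1, 1, 0, 0, 1],
                [1, 1, 0, 0, 1]])

# Content-enhanced transaction matrix (Table 3): users x terms.
cet = upm @ tpm.T
print(cet[0])  # User 1 row -> [2 1 1 2 2]
```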
3.2 Frequent Pattern-Growth Algorithm (FP-Growth)

FP-Growth finds the most frequent and relevant patterns [15] in large sets of communications. It uses a prefix-tree structure, known as the FP-tree, to store compressed and crucial information about frequent patterns. The working principle is as follows:

1. The input database is compressed into an FP-tree that represents the frequent sets.
2. The compressed database is transformed into a set of conditional databases, each associated with one frequent pattern.
3. Each conditional database is mined individually.

FP-Tree

The FP-tree holds the complete information about frequent patterns. The structure has the following:

• The root node is always labeled null; the tree is accompanied by a frequent-item header table, and every other node is an item-prefix node.
• Each item-prefix node has three fields:
  – Item name: the item represented by the node.
  – Count: incremented for each transaction that passes through the node.
  – Node link: points to the next node with the same item name, or to null if there is none.
• Each header-table entry has two fields:
  – Item name: representing the node itself.
  – Head of node link: points to the first node with that item in the tree.

Figure 1 shows an FP-tree for a small dataset. In the transaction/communication database, the items of each frequent pattern are listed with a transaction ID (TID). The header table lists the number of transactions for each item, and based on these two tables the tree is created with a root node (always pointed to null) and its children.
Fig. 1 FP-Tree
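A minimal FP-tree construction along the lines of the description above can be sketched as follows. This is a simplified illustration, not the full algorithm: conditional-tree construction and mining are omitted.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_support):
    """Build an FP-tree: keep frequent items, sort each transaction by
    descending global count, and insert along shared prefixes."""
    counts = Counter(i for t in transactions for i in set(t))
    freq = {i for i, c in counts.items() if c >= min_support}
    root = Node(None, None)            # root is always labeled null
    header = {}                        # item -> list of nodes (node links)
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-counts[i], i))
        node = root
        for i in items:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header.setdefault(i, []).append(node.children[i])
            node = node.children[i]
            node.count += 1            # one increment per transaction
    return root, header

root, header = build_fp_tree([['A', 'B'], ['B', 'C'], ['A', 'B', 'C']], 2)
print({i: sum(n.count for n in header[i]) for i in header})
# total count per item across the tree: {'B': 3, 'A': 2, 'C': 2}
```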
Conditional FP-Tree

A conditional FP-tree is a sub-database consisting of prefix paths. The lowest node is used to construct it, and only the itemsets that meet the threshold support are considered and included.
3.3 Association Rule Mining with Frequent Pattern Growth (FP-Growth)

Association rule mining is a rule-based method [16] to discover interesting patterns, capable of dealing with small to large databases using an interestingness measure such as support. The rules are formed on the basis of frequent sets, which here are generated with the FP-Growth algorithm. The analysis has been extended to applications such as web mining, sequence mining and bioinformatics. The rules should satisfy a user-specified confidence value. The process comprises two steps:

1. A minimum support value is applied to obtain all frequent itemsets. Support is the frequency of pages across all users and other items:

Support(C, D) = (Page visits that contain both C and D) / (Total number of page visits)   (3)

2. A minimum confidence value is applied on the frequent itemsets to form association rules. Confidence is the likelihood of occurrence of the consequent given the antecedent:
Confidence(C, D) = (Page visits that contain both C and D) / (Page visits that contain C)   (4)
The association rule is defined as follows. Let D = {d_1, d_2, …, d_n} be a set of n items in a database and N = {n_1, n_2, …, n_m} be a set of transactions with unique IDs, each transaction being a subset of the items in D. Then a rule (A, C) is formed as

A => C, where A, C ⊆ D   (5)

In Eq. (5), A is called the 'antecedent' (or premise) and C the 'consequent' (or conclusion); both are subsets of the itemset D.

Algorithm AR-FPGUPM
Step 1: Create the User Pageview Matrix (UPM) as in Table 1.
Step 2: Frequent itemset generation.
  Step 2.1: Scan the dataset and find the support of each item.
  Step 2.2: Build an FP-tree data structure with a root node pointed to null.
  Step 2.3: Scan the database again, keeping the highest count on top, i.e., sort the frequent items in decreasing order of count.
  Step 2.4: Two identical itemsets in different branches share a common prefix to the root, and the count of an itemset is incremented each time the item is included.
  Step 2.5: Frequent itemsets are extracted from the FP-tree.
Step 3: Frequent itemsets are mined from the FP-tree by traversing its paths in the conditional pattern base (the lowest node is examined first).
Step 4: Construct a conditional FP-tree with the itemsets that meet the threshold support value.
Step 5: Frequent patterns are then generated from the conditional FP-tree.
Step 6: Association rules are formed from the frequent itemsets using confidence.
Step 7: The interestingness measures support and confidence are used to rank the rules.
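Support (Eq. 3), confidence (Eq. 4) and rule formation (Steps 6 and 7) can be illustrated with a brute-force miner over the user visits of Table 1 (pages A to E per user). The exhaustive enumeration here is only for clarity; FP-Growth reaches the same frequent itemsets without enumerating all candidates.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Frequent itemsets by Eq. (3) support, then rules by Eq. (4)
    confidence. Brute-force candidate enumeration, for illustration."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    support = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            s = sum(set(cand) <= set(t) for t in transactions) / n
            if s >= min_support:
                support[cand] = s
    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for ante in combinations(itemset, k):
                conf = s / support[ante]   # subsets are frequent too
                if conf >= min_confidence:
                    cons = tuple(i for i in itemset if i not in ante)
                    rules.append((ante, cons, conf))
    return support, rules

visits = [['A', 'C', 'E'], ['A', 'B', 'E'], ['B', 'C', 'D'],
          ['A', 'C', 'D', 'E'], ['A', 'B', 'E'], ['A', 'C', 'D', 'E']]
support, rules = mine_rules(visits, min_support=0.5, min_confidence=0.8)
print(support[('A', 'E')])   # A and E co-occur in 5 of 6 visits
```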
Advantages of AR-FPGUPM
• The data table generated by the UPM summarizes the whole communication within the webpage, so the FP-tree is created in a better way than with the actual dataset.
• More frequent itemsets are generated using the UPM than with the actual dataset.
• More association rules are formed, as the frequent itemsets have a significant role in producing rules.
Knowledge Discovery in Web Usage Patterns …

Table 4 Dataset description

Attribute name   Description
User ID          IP address
Login time       Time and date of session
URL              Website address
Status code      Response type between server and the browser
4 Result and Discussion

4.1 Dataset Description

The dataset is taken from the Kaggle repository and is a real dataset recorded for a university. It consists of three months of records with the attributes User ID (IP), Login Time, URL, and Status Code, which are explained in Table 4.
4.2 Pageview Matrix

The dataset taken for analysis consists of the attributes IP, Time, URL, and Status. The User Pageview Matrix (UPM) is created as shown in Fig. 2, in which a UPM with the single attribute IP is built, and in Fig. 3, a UPM with the two attributes IP and Status is built. The numeric entries depict the weights of the pages (URLs), representing the number of
Fig. 2 User pageview matrix (UPM) with IP (user) attribute
Fig. 3 User pageview matrix (UPM) with IP (user) and status attributes
Fig. 4 Association rules
views of each URL by each user. This matrix forms consolidated entries of the whole set of page views. Both matrices are created using the pivot table in the RapidMiner tool with the aggregate and grouping operators.
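The pivot aggregation can be sketched in plain Python; the log records below are hypothetical, and nested dictionaries stand in for RapidMiner's aggregate and grouping operators:

```python
from collections import defaultdict

# Toy weblog records (IP, URL); the real dataset also carries
# Login Time and Status Code attributes.
log = [
    ("10.0.0.1", "/home"), ("10.0.0.1", "/home"), ("10.0.0.1", "/news"),
    ("10.0.0.2", "/home"), ("10.0.0.2", "/sports"),
]

# Pivot: rows = users (IP), columns = pages (URL), cell = visit count.
upm = defaultdict(lambda: defaultdict(int))
for ip, url in log:
    upm[ip][url] += 1
```

Each cell of `upm` is the weight of a page for a user, exactly the kind of consolidated entry the UPM feeds into FP-Growth.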
4.3 Association Rules

In Fig. 4, the entry to the left of the arrow is the premise and the entry to the right is the conclusion. Each rule is assigned a confidence value.
4.4 Performance Analysis

The results are assessed in terms of the number of frequent itemsets and association rules at each minimum support threshold value for the existing association rule mining using Frequent Pattern-Growth (AR-FPG) and the proposed association rule mining using Frequent Pattern-Growth with User Pageview Matrix (AR-FPGUPM). Frequent itemsets are generated based on the user-specified minimum support count; these itemsets are the basis for generating association rules. Table 5 lists the number of itemsets and association rules formed for support values from 0.1 to 1.0 with a confidence value of 0.5 for the existing AR-FPG and the proposed AR-FPGUPM. From Table 5, it is noted that the existing AR-FPG with the actual weblog data generates fewer itemsets and association rules. Also, no rules are formed for minsupport values 0.9 and 1.0, as there are not enough itemsets to create rules: the actual weblog entries do not provide enough knowledge to produce frequent itemsets. The existing method generates the same number of itemsets and rules for the minsupport values 0.4, 0.5, and 0.6. The proposed AR-FPGUPM with the pageview matrix, however, generates more itemsets and rules, with greater variation, and rules are formed for all minsupport values. This is because of the additional information in the User Pageview Matrix, which gives clear consolidated information and the weight of each user's page visits along with the URL and Status of the page. In Fig. 5, for minsupport values from 0.1 to 1.0, the existing AR-FPG produces fewer itemsets than the proposed AR-FPGUPM. This shows the aggregation of
Table 5 Number of frequent itemsets and association rules for variant minsupport values with confidence value 0.5

Min support value | Frequent itemsets (AR-FPG) | Frequent itemsets (AR-FPGUPM) | Association rules (AR-FPG) | Association rules (AR-FPGUPM)
0.1 | 56 | 96 | 41 | 127
0.2 | 22 | 45 | 10 | 62
0.3 | 18 | 34 | 8 | 44
0.4 | 14 | 31 | 6 | 44
0.5 | 14 | 28 | 6 | 37
0.6 | 14 | 25 | 6 | 24
0.7 | 10 | 18 | 3 | 17
0.8 | 7 | 13 | 2 | 11
0.9 | 4 | 9 | No rule | 5
1.0 | 4 | 7 | No rule | 3
[Figure: Number of frequent itemsets for variant minsupport values; x-axis: minsupport value, y-axis: number of itemsets]
Fig. 5 Number of frequent itemsets for AR-FPG and AR-FPGUPM with variant minsupport values
details created by the proposed method with the UPM gives better results in producing frequent sets. In the proposed work, association rules are formed from the frequent sets, not from the actual dataset. In Fig. 6, no association rules are formed for minsupport values 0.9 and 1.0 by the existing method AR-FPG, but the proposed method AR-FPGUPM produces association rules, as its frequent itemsets are greater in number.
[Figure: Number of association rules for variant minsupport values; x-axis: minsupport value, y-axis: number of association rules]
Fig. 6 Number of association rules for AR-FPG and AR-FPGUPM with variant minsupport values
5 Conclusion

Web usage is analyzed to obtain the browsing history of the user. The associations between links reflect the interests of the user, and this analysis helps in designing web pages tailored to the user. The existing work AR-FPG is analyzed using the data mining techniques of association rule mining and the FP-Growth algorithm with a weblog dataset. The proposed AR-FPGUPM work transforms the actual weblog dataset into a User Pageview Matrix (UPM) in which the data are aggregated. This matrix gives compact details, and with it more frequent itemsets and better association rules are formed. The analysis is carried out for variant minsupport values in the RapidMiner tool, and it is concluded that for certain minsupport values the existing method could not produce rules, as the frequent itemsets are too few. Hence, it is shown that the proposed AR-FPGUPM forms more association rules than the existing method. In future, this work can be extended to other types of weblog datasets; a Term Pageview matrix and a Content-enhanced transaction matrix can be created and applied with FP-Growth.

Acknowledgements We would like to express our special thanks to our management as well as our principal for providing us the excellent opportunity, platform, and support to do this research work.
References

1. G.R. Bharamagoudar, S.G. Totad, P.V.G.D. Prasad Reddy, Literature survey on web mining. IOSR J. Comput. Eng. 5(4), 31–36 (2012). https://doi.org/10.9790/0661-0543136
2. C.-H. Chee, J. Jaafar, I.A. Aziz, M.H. Hasan, W. Yeoh, Algorithms for frequent itemset mining: a literature review (2018). https://doi.org/10.1007/s10462-018-9629-z
3. K. Dharmarajan, D. Dorairangaswamy, Current literature review—web mining. Elysium J. Eng. Res. Manage. 1(1), 38–42 (2014). https://doi.org/10.1109/ICACA.2016.7887945
4. A.D. Kasliwal, G.S. Katkar, Web usage mining for predicting user access behaviour. Int. J. Comput. Sci. Inf. Technol. 6(1), 201–204 (2015). http://ijcsit.com/docs/Volume%206/vol6issue01/ijcsit2015060145.pdf
5. S. Asadianfam, M. Mohammadi, Identify navigational patterns of web users. Int. J. Comput. Aided Technol. (IJCAx) 1 (2014). http://airccse.org/journal/ijcax/papers/1114ijcax01.pdf
6. M.J.H. Mughal, Data mining: web data mining techniques, tools, and algorithms: an overview. Int. J. Adv. Comput. Sci. Appl. 9(6) (2018). https://doi.org/10.14569/IJACSA.2018.090630
7. P. Ristoski, C. Bizer, H. Paulheim, Mining the web of linked data with RapidMiner. Web Semant. Sci. Serv. Agents World Wide Web 35, 142–151 (2015). https://doi.org/10.1016/j.websem.2015.06.004
8. A. Gupta, M. Atawnia, R. Wadhwa, S. Mahar, V. Rohilla, Comparative analysis of web usage mining. Int. J. Adv. Res. Comput. Commun. Eng. 6(4) (2017). https://doi.org/10.17148/IJARCCE.2017.6461
9. K. Dharmarajan, M.A. Dorairangaswamy, Analysis of FP-growth and Apriori algorithms on pattern discovery from weblog data, in IEEE International Conference on Advances in Computer Applications (ICACA) (2016). https://doi.org/10.1109/ICACA.2016.7887945
10. R. Kamalakannan, G. Preethi, A survey of an enhanced algorithm to discover the frequent itemset for association rule mining in e-commerce. Int. J. Creative Res. Thoughts (IJCRT) 8(11) (2020). https://ijcrt.org/papers/IJCRT2011049.pdf
11. G. Manasa, K. Varsha, IAFP: integration of Apriori and FP-growth techniques to personalize data in web mining. Int. J. Sci. Res. Publ. 5(7) (2015). http://www.ijsrp.org/research-paper0715/ijsrp-p4379.pdf
12. P. Gupta, S. Mishra, Improved FP tree algorithm with customized web log preprocessing. Int. J. Comput. Sci. Technol. 3 (2011). http://www.ijcst.com/vol23/1/prateek.pdf
13. J. Serin, R. Lawrance, Clustering based association rule mining to discover user behavioural pattern in web log mining. Int. J. Pure Appl. Math. 119(17) (2018). https://acadpubl.eu/hub/2018-119-17/2/159.pdf
14. A. Kaur, R. Maini, Analysis of web usage mining techniques to predict the user behavior from web server log files. Int. J. Adv. Res. Comput. Sci. 8(5) (2017). http://www.ijarcs.info/index.php/Ijarcs/article/view/3655
15. M. Dimitrijevic, T. Krunic, Association rules for improving website effectiveness: case analysis. Online J. Appl. Knowl. Manage. 1(2) (2013). http://www.iiakm.org/ojakm/articles/2013/volume1_2/OJAKM_Volume1_2pp56-63.pdf
16. S. Aggarwal, V. Singal, A survey on frequent pattern mining algorithms. Int. J. Eng. Res. Technol. 3(4), 2606–2608 (2014). https://www.ijert.org/research/a-survey-on-frequent-pattern-mining-algorithms-IJERTV3IS042211.pdf
Video Anomaly Detection Using Optimization Based Deep Learning Baliram Sambhaji Gayal and Sandip Raosaheb Patil
Abstract Background: Advances in growing technologies enable innovative techniques to ensure the privacy and security of individuals. Manual detection of anomalies through monitoring is time-consuming and often inefficient; hence, automatic identification of anomalous events is necessary to cope with modern technology. Purpose: To enhance security in public places as well as in dwelling areas, surveillance cameras are employed to detect anomalous events. Methods: As a contribution, this research develops an anomaly detection model based on a deep neural network classifier, which effectively classifies abnormal events in surveillance videos and is optimized using the grey wolf optimization algorithm. The extraction of features using the Histogram of Optical flow Orientation and Magnitude (HOFM) feature descriptor further improves the performance of the classifier. Results: The experimental results obtained at the frame and pixel levels show an accuracy rate of 92.76 and 92.13%, an Area under the Curve (AUC) rate of 91.76 and 92%, and an equal error rate (EER) of 7.24 and 9.37%, which is more efficient compared with existing state-of-the-art methods. Conclusion: The proposed method achieved enhanced accuracy and a minimal error rate compared with state-of-the-art techniques and hence can be utilized for the detection of anomalies in video.
B. S. Gayal (B) Department of Electronics and Telecommunication Engineering, JSPM's Rajarshi Shahu College of Engineering, Tathawade, Pune 411033, India. e-mail: [email protected]
S. R. Patil Department of Electronics and Telecommunication Engineering, Bharati Vidyapeeth's College of Engineering for Women, Dhankawadi, Pune 411043, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_20

1 Introduction

Detection of an anomaly in image, signal, and video processing is a challenging task owing to the demand for anomaly detection in real-world monitoring systems [1–3]. The aggressive growth in the number of surveillance videos
for personal and common resources over the last few years shows that surveillance systems are now easily reachable and inexpensive [4]. In healthcare applications, digitization and anomaly detection are necessary for efficient diagnosis [5]. For that reason, the development of computer-assisted tools is highly required for automated inspection of the provided data [6]. To ensure security, safety, and efficiency in the face of numerous criminal attacks, Closed-Circuit Television cameras are available in large numbers in common areas, recording every second [7, 8]. Compared to other computer-vision problems, abnormalities occur far less frequently than normal incidents, which creates a sharp imbalance between the negative and positive samples [9]. A long waiting time [10] is required for detection based on a manual process. The detection of an anomaly mainly depends on deviations from a learned description of the normal class [11–13]. Consequently, new phenomena that belong to the ordinary class may differ significantly from the learned description and can be misdiagnosed as anomalies [3]. Recently in image processing, the classification and detection of various anomalous events has attracted considerable attention. In particular, automatically detecting abnormal incidents in complicated environments or abnormal behaviors in crowded scenes is a most challenging task [14], since the interpretation of conflicting incidents captured in videos depends not only on the background situation but also on human judgement [6, 7]. In computer vision, machine learning methods such as convolutional neural networks (CNN), deep CNN [15], Support Vector Machines (SVM), and other techniques play a major role in detecting anomalous behavior in video [16].
Among these deep learning methods, the autoencoder structure based on the convolutional neural network has been extensively utilized for analyzing video data [9]. Recently, for the problem of detecting anomalies, the Multiple Instance Learning process has been applied to weakly labeled videos: a video is divided into several fragments, each represented by a number of consecutive frames [3]. In computer vision and image processing tasks, deep learning-based methods have become very successful in the detection of anomalies [6]. The major contributions of this paper are as follows:

• The Avenue and Shanghai Tech campus datasets are provided as input for the detection of anomalies in surveillance video. Both daytime and nightfall activities are covered in the videos, which vary in visual appearance.
• Features are extracted from the input surveillance frames using the Histogram of Optical flow Orientation and Magnitude (HOFM) feature descriptor to improve classification performance.
• The performance of the proposed grey wolf-based deep CNN is shown to be better than existing anomaly detection methods in terms of accuracy, equal error rate (EER), area under the curve (AUC), and receiver operating characteristic (ROC) analysis.
The remainder of the paper is arranged as follows: Sect. 2 reviews various existing techniques for anomaly detection. Section 3 presents the proposed method using the optimized classifier. Section 4 presents the proposed optimization. Section 5 reports the results obtained for the proposed method, and finally, Sect. 6 concludes the paper with its achievements.
2 Literature Review

Thomaz et al. [4] introduced the mcRoSuRe algorithm, which is based on a moving camera, for the detection of anomalies. Detection performance is enhanced for cluttered surroundings captured in video; however, in some cases the true positive detection is misleading compared with existing methods, and detecting small objects is very difficult. Zaheer et al. [3] utilized a weakly supervised self-reasoning method for anomaly detection. The clusters of the fully connected layers are greatly improved by the approach, which enhances the ability to separate the anomalous portion, but the area under the curve (AUC) obtained at the frame level on the University of Central Florida crime (UCF-crime) dataset is low compared with conventional approaches. Elvan Duman and Osman Ayhan Erdem [6] introduced a convolutional autoencoder technique for determining normal behavior in videos. Anomalies in videos can be detected in an unsupervised fashion with the help of the Convolutional Autoencoder (Convolutional AE) and Convolutional Long Short-Term Memory (Convolutional LSTM). A better AUC is obtained, whereas the accuracy and robustness on one of the datasets are low owing to its size. The technique developed by Savath Saypadith and Takao Onoye [7] is based on a multi-scale U-Net structure for anomaly identification in video, which depends on a Generative Adversarial Network. Fewer parameters are involved in detection, though the AUC analysis shows that in some cases the existing methods remain superior; moreover, the convolution operator removes some useful features during training that are stored in the multi-scale U-Net. Li et al.
[9] presented a spatio-temporal framework for detecting anomalies in video sequences. The combination of U-Net and Convolutional LSTM shows several advantages in representing spatial information and in its capabilities for data modeling and image arrangement, but the equal error rate and AUC are better for the conventional methods on the utilized datasets. The anomaly detection designed by Guo and Shui [17] utilized the K-Nearest Neighbour (KNN) algorithm. Here, weight-based features are extracted for detecting the anomaly, but no optimization is used to enhance the accuracy. Similarly, the anomaly detection developed by Naseer et al. [18] using a deep learning approach failed to use an optimization strategy, which would further enhance detection accuracy. Wang et al. [19] modeled optimization-based deep learning for anomaly detection and obtained better performance. Thus, this
research developed anomaly detection with the proposed HOFM technique along with the optimized deep learning approach.
2.1 Problem Statement

Anomaly detection is utilized for detecting irregular or unusual activities in video and is widely used in applications such as fraud detection. Several methods have been developed for anomaly detection; still, accurate detection remains a challenging task. Inaccurate detection, computational complexity, and several other factors degrade the performance of such systems. Hence, there is a need for automatic anomaly detection with higher accuracy. This research introduces a novel anomaly detection method that enhances detection accuracy with reduced computational complexity.
3 Proposed Anomaly Detection Using the Optimization-Based Deep CNN Classifier

The proposed deep CNN classifier for the detection of anomalies is shown in Fig. 1. The input surveillance video dataset is considered for anomaly detection, focusing on anomalous events occurring in public places. Initially, object tracking is performed using the optical flow descriptor, and detection is executed using the grey wolf optimization-based deep convolutional neural network (GWO-based deep CNN) classifier. The features extracted using the Histogram of Optical flow Orientation and Magnitude (HOFM) descriptor are given as input to the GWO-based deep CNN classifier, whose parameters are optimized using the GWO algorithm. The output of the classifier differentiates normal from abnormal activities in the video surveillance. Finally, the anomalous events are closely tracked for localization.
3.1 Keyframe Selection and Object Detection

Surveillance videos contain a huge amount of information, which makes anomaly detection difficult. By selecting useful keyframes, the complexity of detection is reduced, and the moving object is detected by analyzing its varying position.
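A minimal sketch of such keyframe selection, assuming frames are flat lists of pixel intensities and using the mean absolute frame difference as the selection criterion (both the data and the threshold are illustrative, not the paper's method):

```python
def select_keyframes(frames, threshold):
    """Keep a frame when it differs enough from the last kept keyframe."""
    keyframes = [0]                      # always keep the first frame
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        # mean absolute difference against the last keyframe
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            keyframes.append(i)
            last = frame
    return keyframes

# Hypothetical 4-pixel "frames": a static scene, then an abrupt change.
frames = [[0, 0, 0, 0], [1, 0, 0, 0], [9, 9, 9, 9], [9, 9, 9, 8]]
kept = select_keyframes(frames, threshold=2.0)
```

Here only frames 0 and 2 survive, so downstream tracking and classification operate on a fraction of the original video.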
Fig. 1 Proposed GWO-based deep CNN classifier
3.2 Object Tracking and Feature Presentation Using Optical Flow Descriptor

The tracked object from the keyframes is further processed for feature extraction using the feature descriptor HOFM, which calculates the vertical and horizontal axis gradients. The HOFM feature descriptor isolates the motion patterns from the captured videos in the training phase; in each region, the stored patterns are compared with the incoming patterns in the testing phase. Conventional algorithms show the feature extraction process and the dimension-reduction model; the proposed model helps to evaluate the clustered images and tends to diminish the dimensions of the input data.

Object tracking: Optical flow is an effective method for tracking the motion of objects, based on the velocity between varying frames. The optical flow detection model helps to detect whether the object is in regular motion. The objective combines the regularity and data attachment terms, which depend on the optical flow gradient and constraint function, and is expressed as,

l(n) = (Kp u + Kq v + KJ)² + I²(|u|² + |v|²)   (1)
where I regulates the weight relative to the constraints in the optical flow; the parameter I is squared so that the units remain at the grey level. The terms Kp, Kq, KJ are partial derivatives, and u, v follow local averages under a Gaussian distribution. At time t, HOFM is utilized to obtain the histogram, described as kϑ,c = {kϑ,1, kϑ,2, …, kϑ,c} for each block ϑ present in the frame. A binary mask is necessary for the computation of optical flow, obtained by subtracting the image frames ab and ab+c. HOFM uses the magnitude and orientation for the feature vector, constructing the matrix Cυ×τ, where τ and υ denote the number of orientation ranges and the magnitude ranges of HOOF. At each time instant t, the matrix C is described as,

C(e, x) = 1 if ε = mod(H, H) and σ = mod(κ, τ), and 0 otherwise   (2)
where e and x represent the orientation and magnitude. The output achieved after the HOFM process is given as,

F = {F1, F2, …, FG, …, FL}   (3)

where L and FG denote the total number of events and the Gth event.
3.3 Video Anomaly Detection Using the GWO-Based Deep CNN Classifier

The deep CNN can handle complicated analysis, processing enormous amounts of data through its multiple layers. The videos and images carry various patterns, which are identified by the GWO-based deep CNN classifier. The major focus of the deep CNN in this research is to detect anomalies of the tracked object.
3.3.1 Deep CNN Structure

For the detection of an anomaly in a captured sequence, the extracted features are classified by the deep convolutional neural network. The proposed optimized deep CNN is also well suited to many other applications. The input to the optimized classifier is the extracted feature set of Eq. (3). In the primary convolution layer, the edges of the images are shaped by the provided samples, and new filters are applied. After the primary convolution, the dimensionality of the extracted features is reduced by the pooling layer, and significant features are selected for training. Depending on the complexity
Fig. 2 Deep CNN architecture
of the input data, the number of pooling layers and convolutional layers involved varies. In the terminal connection, the fully connected and output layers are present (Fig. 2).
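The convolution → pooling → fully connected pipeline can be illustrated with a toy 1-D sketch (hypothetical kernel and input row; a real deep CNN uses many learned 2-D filters):

```python
def conv1d(x, kernel):
    """Valid-mode 1-D convolution (no padding)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(x):
    """Non-linearity applied after convolution."""
    return [max(0.0, v) for v in x]

def max_pool(x, size=2):
    """Down-sample by keeping the maximum of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

frame_row = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0]   # one row of a toy frame
edge_kernel = [-1.0, 0.0, 1.0]               # crude edge detector
features = max_pool(relu(conv1d(frame_row, edge_kernel)))
```

The pooled activations would then feed the fully connected layers whose weights the GWO tunes.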
4 Grey Wolf Optimization for Deriving the Classifier Parameters

The grey wolf optimization algorithm is a recently developed optimization technique. It emulates the natural hunting behavior of grey wolves. The communal ranking and the chasing, surrounding, and seizing of the prey are detailed mathematically in this section.
4.1 Communal Ranking

For designing the communal ranking of grey wolves, the characters involved are gamma (γ), lambda (λ), and theta (θ). Among these, gamma is supreme and yields the fittest solution. The solutions next to the prime best solution are considered the second (lambda) and third (theta) best solutions, respectively. The hunting process is guided by gamma, lambda, and theta.
4.2 Surrounding the Prey

During the process of hunting, the wolves initially enclose the prey; this behavior is mathematically expressed as,

R(T + 1) = Rs(T) − P · U;   U = |V · Rs(T) − R(T)|   (4)
where T is the present iteration, P and V are the coefficient vectors, R is the position of the grey wolf, and Rs is the position of the prey surrounded by the predator. The coefficient vectors are estimated as,

P = 2m · w1 − m;   V = 2 · w2   (5)
4.3 Chasing Phase

The wolves' intelligent behavior allows them to identify the position of the prey and surround it. The chase is normally initiated by gamma, with lambda and theta participating in the process. The best positions relative to gamma, lambda, and theta are,

Uγ = |V1 · Rγ − R|;   Uλ = |V2 · Rλ − R|;   Uθ = |V3 · Rθ − R|   (6)

R1 = Rγ − P1 · Uγ;   R2 = Rλ − P2 · Uλ;   R3 = Rθ − P3 · Uθ   (7)

The renovating (position-update) stage is expressed as,

R(T + 1) = (R1 + R2 + R3)/3   (8)
The top-ranked wolves are utilized to analyze the location of the prey, whereas the lower-ranked wolves renovate their positions and surround the prey accordingly.
4.4 Seizing the Prey

When the prey is in a steady state without moving, the wolf stops chasing after catching the prey. In the mathematical model, the value of m is reduced from 2 to 0, and P decreases with m. Depending on the random value of P in the range [−1, 1], the predator renovates its position between its present location and the location of the prey.
4.5 Exploring Phase

To enlarge the search globally, the random values of P are varied. When |P| > 1, the grey wolves are directed to move away from the enclosed prey and initiate a wider search; when |P| < 1, the predator moves towards the targeted prey. Termination takes place once the best suitable solution has been found.
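The phases above can be sketched as a minimal grey wolf optimizer for a toy sphere objective; in the paper the fitness would instead be the classifier's error and the positions its parameters. Variable names m, p, v, and u mirror Eqs. (4)–(8), and the three leader copies stand in for gamma, lambda, and theta:

```python
import random

def gwo(fitness, dim, n_wolves=20, iters=100, lo=-5.0, hi=5.0):
    wolves = [[random.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_wolves)]
    for t in range(iters):
        wolves.sort(key=fitness)
        leaders = [w[:] for w in wolves[:3]]      # gamma, lambda, theta
        m = 2.0 - 2.0 * t / iters                 # m decreases from 2 to 0
        for w in wolves:
            for d in range(dim):
                x = 0.0
                for leader in leaders:
                    p = 2 * m * random.random() - m   # coefficient P, Eq. (5)
                    v = 2 * random.random()           # coefficient V, Eq. (5)
                    u = abs(v * leader[d] - w[d])     # distance U, Eq. (6)
                    x += leader[d] - p * u            # R1/R2/R3, Eq. (7)
                w[d] = min(hi, max(lo, x / 3.0))      # averaged update, Eq. (8)
    return min(wolves, key=fitness)

random.seed(0)
best = gwo(lambda w: sum(x * x for x in w), dim=3)
```

As m shrinks, |p| falls below 1 and the pack collapses onto the leaders (exploitation); early on, |p| > 1 pushes wolves away from the prey (exploration).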
5 Results and Discussion

The proposed grey wolf-based deep CNN classifier for anomaly detection is evaluated in this section.
5.1 Preliminary Setup

The developed grey wolf-based deep CNN classifier for anomaly detection is implemented using MATLAB 2020a on Windows 10 with 8 GB RAM.

Avenue dataset: The Avenue dataset [20] comprises 30,652 frames captured on CUHK Campus Avenue, of which 15,324 are used for testing and 15,328 for training.

Shanghai dataset: The Shanghai dataset [21] comprises 317,398 frames, with 17,090 irregular, 300,308 regular, 42,883 testing, and 274,515 training frames.
5.2 Preliminary Results

This section presents the analysis of the input videos from the Avenue and Shanghai datasets. The object is identified, and the suitable frames from the captured video are stored. Figure 3 shows the original frame and the frame recovered using HOFM for the Avenue and Shanghai datasets, respectively.
5.3 Comparability Methods

The methods employed for comparison in anomaly detection are k-nearest neighbor (KNN) [17], Neural Network (NN) [18], Convolutional Neural Network (CNN) [19], and Deep CNN, set against the proposed grey wolf-based deep CNN.
Fig. 3 Original and recovered frames using HOFM for the Avenue and Shanghai datasets
5.4 Performance Metrics

The measures used for the performance analysis of the conventional methods KNN, NN, CNN, and Deep CNN against the proposed grey wolf-based deep CNN are accuracy, equal error rate (EER), area under the curve (AUC), and receiver operating characteristic (ROC).
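Given a discrete ROC curve as (FPR, TPR) points, the EER and AUC metrics can be computed as in this sketch (the sample points are illustrative, not the paper's results):

```python
def equal_error_rate(roc_points):
    """EER: the operating point where FPR equals the miss rate (1 - TPR).
    With discrete points we take the one minimizing |FPR - (1 - TPR)|."""
    fpr, tpr = min(roc_points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2

def auc(roc_points):
    """Trapezoidal area under the ROC curve (points sorted by FPR)."""
    pts = sorted(roc_points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical ROC samples for one detector.
roc = [(0.0, 0.0), (0.05, 0.60), (0.10, 0.90), (0.30, 0.95), (1.0, 1.0)]
eer_value = equal_error_rate(roc)
auc_value = auc(roc)
```

A lower EER and a higher AUC both indicate a better detector, which is the sense in which the tables below compare the methods.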
5.5 Analysis Based on Comparability Methods

(i) Analysis based on the Avenue dataset

Figures 4 and 5 show the comparative analysis on the Avenue dataset at the frame level and pixel level. The frame-level accuracies obtained for KNN, NN, CNN, Deep CNN, and the proposed GWO-based deep CNN are 81.48%, 84.38%, 87.76%, 91.23%, and 92.76% at 80 training percentage, as represented in Fig. 4a. The frame-level EER rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 18.52%, 15.62%, 12.24%, 8.77%, and 7.24%, presented in Fig. 4b for a training percentage of 80. The frame-level AUC rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 79.74%, 82.82%, 86.39%, 90.04%, and 91.76% for training percentage 80, respectively, presented in Fig. 4c. The pixel-level accuracies obtained for KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 81.00%, 83.86%, 87.20%, 90.63%, and 92.13% at 80 training percentage, as depicted in Fig. 5a. Figure 5b presents the EER evaluation of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN: 19.00%, 16.14%, 12.80%, 9.37%, and 7.87% for training percentage
Fig. 4 Frame-level analysis using the avenue dataset, a accuracy, b AUC, c EER
80. The pixel-level AUC rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 79.08%, 82.39%, 86.18%, 90.06%, and 92.00% for training percentage 80, respectively, as illustrated in Fig. 5c.

(ii) Analysis based on the Shanghai dataset

Figures 6 and 7 show the comparative analysis on the Shanghai dataset at the frame level and pixel level. The frame-level accuracies obtained for KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 82.31%, 85.19%, 88.55%, 92.00%, and 93.51% at 80 training percentage using the Shanghai dataset, as depicted in Fig. 6a. The frame-level EER rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 17.69%, 14.81%, 11.45%, 8.00%, and 6.49% for training percentage 80, as depicted in Fig. 6b. The frame-level AUC rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 80.85%, 83.92%, 87.48%, 91.13%, and 92.84% for training percentage 80, respectively, as shown in Fig. 6c.
Fig. 5 Pixel level analysis using the avenue dataset, a accuracy, b AUC, c EER
The pixel-level accuracies obtained for KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 82.42%, 85.27%, 88.59%, 92.01%, and 93.49% at 80 training percentage, as shown in Fig. 7a. The pixel-level EER rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 17.58%, 14.73%, 11.41%, 7.99%, and 6.51% for training percentage 80, as shown in Fig. 7b. The pixel-level AUC rates of KNN, NN, CNN, Deep CNN, and the proposed grey wolf-based deep CNN are 80.23%, 83.54%, 87.33%, 91.21%, and 93.15% for training percentage 80, respectively, as shown in Fig. 7c.
5.6 ROC Analysis for the Detection of Anomaly

The ROC analysis is measured using TPR and FPR values. For the Avenue dataset at the frame level, the TPR values are 0.8369, 0.8569, 0.8769, 0.8969, and 0.9169 when the FPR is 0.04, as shown in Fig. 8a. The ROC analysis of the Avenue dataset at the pixel level is shown in
Fig. 6 Frame level analysis using the Shanghai dataset, a accuracy, b AUC, c EER
Fig. 8b. For the Shanghai dataset at the frame level, the TPR values are 0.9393, 0.9443, 0.9493, 0.9543, and 0.9593 when the FPR is 0.06, as shown in Fig. 8c. The ROC analysis of the Shanghai dataset at the pixel level is shown in Fig. 8d.
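The AUC and EER figures reported above are both derived from (FPR, TPR) points of this kind. A minimal sketch of how the two metrics can be computed from a ROC curve, using hypothetical ROC points rather than the paper's data:

```python
import numpy as np

def auc_and_eer(fpr, tpr):
    """Trapezoidal AUC and an equal-error-rate estimate from ROC points
    sorted by increasing FPR."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    # Trapezoid rule: sum of segment widths times average heights
    auc = float(((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2).sum())
    fnr = 1.0 - tpr                    # miss rate
    i = np.argmin(np.abs(fpr - fnr))   # ROC point closest to FPR == FNR
    eer = (fpr[i] + fnr[i]) / 2.0
    return auc, eer

# Hypothetical ROC points (illustrative only, not taken from the paper)
fpr = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]
tpr = [0.0, 0.60, 0.75, 0.88, 0.97, 1.0]
auc, eer = auc_and_eer(fpr, tpr)
```

A stronger detector pushes the AUC toward 1 and the EER toward 0, which is the trend the comparisons above report.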
5.7 Comparative Discussion

The proposed anomaly detection technique outperformed the other state-of-the-art techniques in terms of accuracy, AUC, and equal error rate. The proposed HOFM feature descriptor model helps to evaluate the clustered images and tends to reduce the dimensionality of the input data. In addition, object tracking based on the optical flow detection model helps to detect anomalies in object motion, which reduces the computational complexity. Besides, the optimized classification process further enhances the accuracy. The conventional methods of anomaly
Fig. 7 Pixel level analysis using the Shanghai dataset, a accuracy, b AUC, c EER
detection, such as KNN and deep neural networks, failed to incorporate an optimization strategy, which would have enhanced their detection accuracy. The optimized deep learning method comes closest to the proposed method because of its optimized learning strategy. However, the inclusion of the proposed HOFM feature descriptor further enhances the detection accuracy of the proposed method.
6 Conclusion

The anomaly detection model based on a deep neural network is optimized through grey wolf optimization. The effectiveness of this paper relies on the proposed HOFM feature descriptor, which further improves the performance of the classifier. The efficiency of the deep neural network classifier is proved by employing the Avenue and Shanghai datasets, measuring the frame-level and pixel-level values of accuracy, EER, and AUC. Accuracy rates of 92.76% and 92.13%, AUC rates of 91.76% and 92%, and EER rates of 7.24% and 9.37% are
Fig. 8 ROC analysis: a Avenue dataset frame level, b Avenue dataset pixel level, c Shanghai dataset frame level, d Shanghai dataset pixel level
obtained, which shows that the efficiency is improved and the error rate is greatly reduced.
A Fusional Cubic-Sine Map Model for Secure Medical Image Transmission Sujarani Rajendran, Manivannan Doraipandian, Kannan Krithivasan, Palanivel Srinivasan, and Ramya Sabapathi
Abstract Medical images are among the most significant attributes for diagnosing disease in medical systems. With today's modernization of the digital environment, medical images can be hacked during transmission over insecure networks. Considering patients' privacy and security, their medical images have to be transferred in a secure manner. This work proposes a new medical image cipher architecture based on a fusional chaotic map. At first, a fusional Cubic-Sine Map (CSM) is proposed to generate pseudorandom numbers; then confusion and diffusion of the image are executed based on the chaotic series produced by the CSM. Experimental results and security analysis indicate that the developed chaotic map generates a sufficiently random series and that the proposed cipher model has the ability to resist statistical, exhaustive, crop, and noise attacks.
1 Introduction

In recent digital technology, images play a dynamic role in all fields, such as industry, social networks, the military, and wireless sensor applications; in the medical field especially, the role of images is crucial for diagnosing diseases. In most of the
S. Rajendran (B) Department of Computer Science and Engineering, Srinivasa Ramanujan Centre, SASTRA Deemed University, Kumbakonam 612001, India e-mail: [email protected]
M. Doraipandian · K. Krithivasan · P. Srinivasan · R. Sabapathi School of Computing, SASTRA Deemed University, Thanjavur 613401, India e-mail: [email protected]
K. Krithivasan e-mail: [email protected]
P. Srinivasan e-mail: [email protected]
R. Sabapathi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_21
e-healthcare applications, patient details, including images such as CT, MRI, and X-ray images, are transmitted over a network [1]. To safeguard these images from piracy during transmission, security functions are required. Different techniques such as information hiding, watermarking, and cryptography have been used to protect such images; among these, cryptography [2] plays the primary role in securing images, because the image is transmitted in unreadable form [3]. Even though traditional crypto-algorithms such as AES, DES, and IDEA have been utilized for securing images, some researchers have shown that these systems are not efficient for protecting images, owing to inherent image properties such as high redundancy, strong correlation among pixels, and huge pixel values [4]. Medical images in particular are more sensitive than normal digital images; mostly, these medical images are stored in DICOM or PEC form, in which patient details are hidden in the scanned image [5]. DICOM images are represented by 16 bits per pixel to improve image quality, so traditional approaches require a huge amount of time to encrypt these pixels. Watermarking also plays a vital role in providing security for images [6, 7], but in the medical field patient details are already hidden in the medical images; if watermarking or steganography is then adopted, a higher-dimensional stego-image is required, because the stego-image should be double the size of the embedded image. Based on these limitations, it can be concluded that watermarking and steganography are not optimal for medical image security. To overcome this problem, different approaches have been developed by many researchers; among these, the chaotic system has been identified as a suitable field for providing efficient security for images because of its in-built attributes such as ergodicity, determinism, and sensitivity to initial conditions [8].
Different researchers have developed various chaos-based image cryptosystems [9–11]. The basis of a chaotic system is the chaotic map utilized to generate pseudorandom numbers. Two variants of chaotic maps, lower- and higher-dimensional, are employed for key generation in image cryptosystems. Image ciphers developed utilizing lower-dimensional chaotic maps are discussed in the following. Yuan et al. [12] developed an asymmetric color image cipher by deploying a one-dimensional logistic map for scrambling the image and the Singular Value Decomposition (SVD) concept for diffusing the confused image; SVD is used to decompose the matrix into a number of blocks, and scrambling is applied to each block utilizing the chaotic key series generated by the logistic map. Hua et al. [13] developed a secure image sharing scheme that utilized a piecewise chaotic map for compressing coding, while the Chinese remainder theorem and a logistic map were employed for encryption, providing dual-layer security for the images. Cao et al. [14] developed a medical image cipher based on edge maps: first, image edges are identified by the edge map and bit-plane decomposition is applied; finally, each decomposed block is encrypted utilizing the key series produced by a one-dimensional double sine map. An efficient medical image cipher was developed by Fu et al. [15] by adopting an Arnold cat map for permutation of pixels and a one-dimensional logistic map for diffusion, executing an XOR operation between the confused pixel values and the chaotic series generated by the logistic map. Even though different image cryptosystems have been developed based on one-dimensional chaotic maps,
some researchers have identified limitations of one-dimensional chaotic maps, such as a short period of randomness and predictable chaotic behaviour, which make it possible for intruders to identify the keys. To overcome this drawback, fusional chaotic maps have been developed by many researchers to generate key sequences with longer random periods. Teh et al. [16] proposed hybrid Tent-Logistic and Tent-Sine (TLTS and TSTS) chaotic maps by combining three one-dimensional chaotic maps, the logistic, sine, and tent maps, and proved that the random behaviour of the hybrid chaotic series is higher than that of a single series. Amina et al. [17] advanced new fusional Logistic-Tent (LT) and Logistic-Sine (LS) maps for generating highly random chaotic series that provide more security than the existing single one-dimensional chaotic series. Attaullah et al. [18] improved the randomness of the Chebyshev map's chaotic series by improving the mathematical form of the map; the authors constructed an S-box using the chaotic series generated by the improved Chebyshev map, and this S-box is utilized for encrypting the image. Although various research works have developed fusional chaotic maps by combining different one-dimensional maps, most of the developed fusional maps lack key size and range, so the requirement is a fusional map with a larger key size and range. Based on the above motivation, in this work a novel fusional one-dimensional Cubic-Sine map is proposed by combining the cubic and sine maps. As a result, the developed fusional map has more random behaviour than the existing ones. A novel circular bit shifting is proposed in the diffusion stage, which depends entirely on the chaotic key values generated by the fusional Cubic-Sine map. This enables the cryptosystem to efficiently protect medical images from unauthorized access.
The proposed architecture combines confusion, which is employed to scramble pixel positions, and diffusion, which is used to alter pixel values. This article is organized as follows: Sect. 2 describes the proposed fusional chaotic map; Sect. 3 proves the randomness of the chaotic sequence and compares it with existing maps; Sect. 4 discusses the architecture of the proposed encryption model; Sect. 5 presents the security analysis; and the article ends with the conclusion in Sect. 6.
2 Materials and Methods

2.1 Cubic Map

The cubic map is a one-dimensional chaotic map with good random behavior [19], which uses three arithmetic operations to generate its chaotic series. Its mathematical form is given in (1):

x_{n+1} = α x_n^3 + (1 − α) x_n    (1)
α is the system parameter, which should lie within the range 3.2 ≤ α ≤ 4 for the series to be generated in random form.
2.2 Sine Map

The sine map has efficient dynamic properties compared to other one-dimensional chaotic maps [20]. Different image ciphers have been developed utilizing the sine map's chaotic series, which is expressed mathematically in (2):

x_{n+1} = (β/4) sin(π x_n)    (2)
β lies in the range 0 ≤ β ≤ 4; within this range the sine map produces a random chaotic sequence. If β exceeds this range, the map generates a periodic series.
2.3 Proposed Cubic-Sine Map

A novel fusional Cubic-Sine Map (CSM) is proposed to increase the randomness and the range of the chaotic series so as to withstand statistical attack. The proposed CSM is defined in (3):

x_{n+1} = α sin(π x_n) + β x_n    (3)
Here α, β > 1, which increases the number of possible seed keys. The range of the chaotic series depends on the value of α: if α is 4, the chaotic range is between −4 and 4; if α is 10, then −10 ≤ x_n ≤ 10. Hence the chaotic sequence depends entirely on the initial values of the system parameters, which helps the cryptosystem resist statistical attack.
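A minimal sketch of iterating Eq. (3) to produce a key stream follows. Note one assumption: as written, the linear term β x_n can grow without bound for β > 1, so this sketch folds each value back into (−α, α) with fmod, since the paper does not show its bounding step.

```python
import math

def csm_series(alpha, beta, x0, n):
    """Iterate Eq. (3): x_{k+1} = alpha*sin(pi*x_k) + beta*x_k.
    The fmod fold into (-alpha, alpha) is a sketch assumption,
    not shown in the paper."""
    series, x = [], x0
    for _ in range(n):
        x = math.fmod(alpha * math.sin(math.pi * x) + beta * x, alpha)
        series.append(x)
    return series

# The seed values used later in the paper's experiments
keys = csm_series(alpha=4.6758493827, beta=4.238776234, x0=0.2398764523, n=1000)
```

With the fold applied, every generated value stays within −α < x_n < α, consistent with the range property described above.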
3 Chaotic Behavior Analysis of the Proposed Cubic-Sine Map

The randomness of a chaotic series can be assessed by three general components: trajectories, the bifurcation diagram, and the Lyapunov Exponent (LE) of the chaotic series [21]. First, a trajectory comparison was executed between the existing maps (cubic and sine) and the proposed CSM; the results, shown in Fig. 1, indicate that the proposed CSM has better randomness than the existing maps. The main characteristic of a chaotic system is that it should be completely dependent on the initial seed values of the system parameters. To examine this dependency, the bifurcation diagram is portrayed in Fig. 2a, obtained by changing the α value from 4.5 to 5 with a step of 0.01. Figure 2 depicts that the CSM is
Fig. 1 Trajectories of a cubic map b sine map c cubic sine map
Fig. 2 a Bifurcation diagram b Lyapunov exponent of CSM
completely dependent on initial values. The third component is the LE: a positive LE value specifies that the key series is highly sensitive to tiny changes in the seed values of the system parameters. Figure 2b depicts the LE of the x series with respect to the α value. The positive LE demonstrates the efficiency, randomness, and sensitivity of the proposed CSM. Hence, it is shown that the proposed map has high randomness and strong sensitivity to the initial parameters.
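The LE of a one-dimensional map can be estimated numerically as the orbit average of ln|f′(x)|; for Eq. (3), f′(x) = απ cos(πx) + β. A rough sketch of such an estimate (the fmod fold that bounds the orbit is an assumption of this sketch, not the paper's stated procedure):

```python
import math

def lyapunov_csm(alpha, beta, x0, n=5000, burn=500):
    """Estimate the Lyapunov exponent of the CSM as the mean of
    ln|f'(x_k)| along the orbit, discarding `burn` transient steps.
    The fmod fold is a sketch assumption."""
    x, acc, count = x0, 0.0, 0
    for k in range(n):
        deriv = alpha * math.pi * math.cos(math.pi * x) + beta
        x = math.fmod(alpha * math.sin(math.pi * x) + beta * x, alpha)
        if k >= burn:
            acc += math.log(max(abs(deriv), 1e-12))  # guard against log(0)
            count += 1
    return acc / count

le = lyapunov_csm(4.6758493827, 4.238776234, 0.2398764523)
```

A positive estimate from this procedure is consistent with the sensitivity claim above.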
4 Proposed Medical Image Cipher

The developed medical image cipher architecture is a combination of permutation and diffusion phases. The chaotic series of the proposed CSM is utilized to permute the pixel positions, and a bit-level XOR operation is executed between the scrambled pixels and the chaotic series to obtain the final encrypted image. The step-by-step procedure of the developed medical image cryptosystem is described as follows, and the block structure of the proposed model is depicted in Fig. 3.

Step 1: A medical image (MI) of size M × N, here 256 × 256, is taken as input. Most medical images are in DICOM format with 16-bit representation; for illustration purposes, the image is taken here as 8 bits.
Fig. 3 Architecture of the proposed CSM based medical image cipher
Step 2: The proposed CSM is executed iteratively for (M × N)/2 times to generate the chaotic key series, represented as X = {x_1, x_2, x_3, ..., x_{M×N}}. Then the generated X series is sorted in ascending order while retaining the old index values of the unsorted X series. This is achieved by (4):

[sortx, nindex] = sort(X)    (4)
Step 3: Each pixel position of the image is shuffled using the old index (oindex) of the unsorted series and the new index (nindex) of the sorted chaotic series. Pseudocode 1 of the scrambling process is as follows.

Pseudocode 1: Image Confusion
For p = 1: N
  For q = 1: M
    Temp = MI(p, q)
    MI(p, q) = MI(oindex(p), nindex(q))
    MI(oindex(p), nindex(q)) = Temp
  End
End
CMI = MI
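Steps 2–3 amount to building row and column permutations from the sort order (argsort) of the chaotic series. A NumPy sketch, illustrative rather than the authors' implementation (the fmod fold in the CSM iteration and the use of a second series for columns are assumptions):

```python
import numpy as np

def chaotic_perm(length, alpha, beta, x0):
    """Permutation from the sort order of a CSM series, as in Eq. (4)."""
    xs, x = [], x0
    for _ in range(length):
        x = float(np.fmod(alpha * np.sin(np.pi * x) + beta * x, alpha))
        xs.append(x)
    return np.argsort(xs)  # the 'nindex' of the sorted series

def confuse(img, alpha=4.6758493827, beta=4.238776234, x0=0.2398764523):
    """Shuffle pixel positions by permuting rows and columns."""
    M, N = img.shape
    rows = chaotic_perm(M, alpha, beta, x0)
    cols = chaotic_perm(N, alpha, beta, x0 / 2)  # second series (assumption)
    return img[np.ix_(rows, cols)], (rows, cols)

def deconfuse(cimg, perms):
    """Invert the confusion by scattering pixels back to their places."""
    rows, cols = perms
    out = np.empty_like(cimg)
    out[np.ix_(rows, cols)] = cimg
    return out
```

The confusion only rearranges pixels, so the pixel histogram is unchanged; altering the values is the job of the diffusion phase that follows.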
Step 4: The confused image (CMI) is divided into 8 × 8 non-overlapping blocks. Inside each block, pixel values are converted into 8-bit binary form. The binary forms of the pixels inside each block are merged column-wise, and circular shifting is applied to each merged column. The reverse process of splitting and binary-to-decimal conversion is then executed to obtain the diffused image.
The pseudocode for the creation of the key image and for the diffusion process is as follows.

Pseudocode 2: Key Image Creation
k = 1
For row = 1: M
  For col = 1: N
    Key(row, col) = mod(X(k) * 10000, 256)
    k = k + 1
  End
End
Pseudocode 3: Image Diffusion
Block = 8
Imgblock = Image-to-8x8-Blocks(CMI)
No. of Blocks (NOB) = M / 8
For i = 1: NOB
  For j = 1: NOB
    BinaryBlocks(i, j) = dec2bin(Imgblock(i, j))
  End
End
// Vertically concatenate each column and circularly shift the columns inside each block
Fimageblock = circshift(vertcat(BinaryBlocks))
Eimage = block-to-image(Fimageblock)
EncryptedImage (EI) = Eimage ⊕ Key
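Pseudocode 2's key image and the final XOR of Pseudocode 3 can be sketched as follows. Two assumptions are made: the block-wise column shift is simplified to a per-byte 3-bit circular roll, and the CSM series is folded with fmod and taken in absolute value before the mod-256 mapping, since the paper does not show those details.

```python
import numpy as np

def csm_stream(alpha, beta, x0, n):
    """CSM series of Eq. (3), folded into (-alpha, alpha) (sketch assumption)."""
    out, x = np.empty(n), x0
    for i in range(n):
        x = float(np.fmod(alpha * np.sin(np.pi * x) + beta * x, alpha))
        out[i] = x
    return out

def key_image(xs, M, N):
    """Pseudocode 2: Key(row, col) = mod(x * 10000, 256).
    abs() handles negative folded values (assumption)."""
    k = (np.abs(xs[:M * N]) * 10000).astype(np.int64) % 256
    return k.reshape(M, N).astype(np.uint8)

def diffuse(cmi, key):
    """Roll each byte left by 3 bits, then XOR with the key image."""
    c = cmi.astype(np.uint16)
    rolled = (((c << 3) | (c >> 5)) & 0xFF).astype(np.uint8)
    return rolled ^ key

def undiffuse(ei, key):
    """Inverse: XOR first, then roll each byte right by 3 bits."""
    r = (ei ^ key).astype(np.uint16)
    return (((r >> 3) | (r << 5)) & 0xFF).astype(np.uint8)
```

Because both the bit roll and the XOR are invertible, decryption simply applies the two operations in reverse order with the same key image.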
5 Experimental Results and Security Analysis

The proposed cipher was implemented in MATLAB 2016a on a personal computer with 4.00 GB RAM, an Intel i5 processor, and Windows 8.1. For simulation and comparison purposes, eight medical images of size 256 × 256 were taken. Figure 4 illustrates the results of confusion and diffusion. For illustration purposes, the system parameters and seed key values are taken as α = 4.6758493827, β = 4.238776234, x0 = 0.2398764523.
Fig. 4 Experimental results: a input image, b confused image, c diffused image
Fig. 5 Histogram illustration of a original and b encrypted image
5.1 Histogram Analysis

The histogram maps each pixel value of the image and exposes the pixel distribution. An effective image cipher should achieve a uniform intensity distribution that is entirely distinct from that of the original input image. For demonstration, the histograms of an original medical image and its corresponding encrypted image are depicted in Fig. 5.
5.2 Correlation Between Adjacent Pixels

In a normal digital image, pixel values are similar to those of nearby pixels [22]. This provides an easy way for an attacker to recover the original image if the image is encrypted using the same pattern of function for all pixels. In the proposed cipher, each pixel is encrypted using the random values generated by the CSM, which increases the random distribution of pixels in the cipher image. Evidence of the uniform distribution of pixels in all directions of the cipher image of Fig. 4c is given in Fig. 6. The correlation coefficient results of the test images and their comparison are represented in Table 1. The correlation diagram and the comparison result show that the developed medical image cipher greatly reduces correlation and increases security in terms of uniform pixel distribution.
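The adjacent-pixel correlation reported in Table 1 is a Pearson coefficient over neighbouring pixel pairs; a sketch of how it can be computed for the three directions:

```python
import numpy as np

def adjacent_correlation(img, direction="horizontal"):
    """Pearson correlation between each pixel and its neighbour in the
    horizontal, vertical, or diagonal direction."""
    a = img.astype(np.float64)
    if direction == "horizontal":
        x, y = a[:, :-1], a[:, 1:]
    elif direction == "vertical":
        x, y = a[:-1, :], a[1:, :]
    else:  # diagonal
        x, y = a[:-1, :-1], a[1:, 1:]
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1])
```

A natural image yields values near 1, while a well-encrypted image yields values near 0, which is the pattern visible in the proposed-cipher rows of Table 1.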
Fig. 6 Cipher image correlation results in the a diagonal, b vertical, and c horizontal directions
Table 1 Comparison of correlation coefficients

                        Sample 1    Sample 2    Sample 3    Sample 4    Sample 5    Sample 6
Ref. [23]  Diagonal     0.0051      0.0095      0.0042      0.0080      0.0098      0.0031
           Vertical     0.0092      0.0104      0.0184      0.0091      0.0163      0.0045
           Horizontal   0.0182      0.0128      0.0117      0.0238      0.0175      0.0039
Proposed   Diagonal    −1.89e−04    0.0059     −0.0033     −5.19e−04   −0.0033     −0.0015
cipher     Vertical     0.0046     −0.0037     −0.0049     −0.0019     −0.0043      0.0013
           Horizontal  −0.0048      0.0029     −9.87e−04   −0.0013      0.0054      0.0009
5.3 Entropy Analysis

Entropy analysis is a widely used measure to quantify the amount of uncertainty in the cipher image, computed using (5). A good image cipher should have an entropy result of 8 or close to 8, since an ideal image has an entropy value of 8 [24]:

E(EI) = Σ_{k=0}^{M²−1} P(EI_k) log2 (1 / P(EI_k))    (5)
Here EI represents the cipher image. The entropy values of the test images and their comparison are given in Table 2. From the comparison, it can be concluded that the proposed model greatly increases the randomness of the cipher image.
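Equation (5) can be evaluated directly from the grey-level histogram; a small sketch:

```python
import numpy as np

def shannon_entropy(img):
    """Entropy of Eq. (5): sum over grey levels of P(k) * log2(1 / P(k))."""
    counts = np.bincount(img.ravel(), minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log(0) terms contribute nothing
    return float(-(p * np.log2(p)).sum())
```

An 8-bit image using all 256 grey levels uniformly attains the ideal value of 8, which is why the values near 7.997 in Table 2 indicate strong randomness of the cipher images.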
Table 2 Entropy result analysis

Test images    Entropy value
Sample 1       7.9973
Sample 2       7.9973
Sample 3       7.9967
Sample 4       7.9971
Sample 5       7.9975
Sample 6       7.9974
Sample 7       7.9972
Sample 8       7.9977
5.4 Key Space and Key Sensitivity Analysis

To overcome brute-force attack, an image cipher should have a key size of at least 2^100 [25]. In the proposed cipher architecture, three parameters act as seed keys (α, β, x0), each with a maximum of 2^14 bits; hence the total key size of the proposed scheme is >2^100, which depicts the strength of the cipher to withstand brute-force attack. Key sensitivity is one of the essential features for checking the strength of a cipher: even a single-bit change in the key should produce an entirely different encrypted image. For evaluating key sensitivity, two key sets with a tiny difference are taken: key 1 as (α = 4.6758493827, β = 4.238776234, x0 = 0.2398764523) and key 2 as (α = 4.6758493828, β = 4.238776234, x0 = 0.2398764523). The Sample 2 image is taken as input and two cipher images are produced by encrypting it with key 1 and key 2; the difference between these two cipher images is visualized in Fig. 7. The first fifty pixels are taken from both cipher images and plotted in Fig. 7; the plot justifies that a tiny change in the keys greatly affects the cipher image and makes the cipher strongly resistant to brute-force attack.
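Key sensitivity can also be checked directly on the chaotic series: changing only the last digit of α (key 1 versus key 2 above) makes the two orbits decorrelate after a few iterations. A sketch (the fmod fold that bounds the orbit is an assumption of this sketch):

```python
import math

def csm_orbit(alpha, beta, x0, n):
    """CSM orbit of Eq. (3), folded into (-alpha, alpha) by fmod (assumption)."""
    xs, x = [], x0
    for _ in range(n):
        x = math.fmod(alpha * math.sin(math.pi * x) + beta * x, alpha)
        xs.append(x)
    return xs

orbit1 = csm_orbit(4.6758493827, 4.238776234, 0.2398764523, 60)  # key 1
orbit2 = csm_orbit(4.6758493828, 4.238776234, 0.2398764523, 60)  # key 2
# Largest gap between the two orbits over the later iterations
divergence = max(abs(p - q) for p, q in zip(orbit1[30:], orbit2[30:]))
```

A change of only 1e-10 in α thus yields a completely different key stream, which is what makes the two cipher images compared in Fig. 7 differ.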
Fig. 7 Key sensitivity analysis
Fig. 8 Robustness analysis: decrypted images affected by salt-and-pepper noise of a 0.02%, b 0.05, c 0.09, and by crop attack with data loss of d 0.5%, e 5.0%
5.5 Robustness Analysis

In real-time e-healthcare applications, medical images are transmitted over a network. During transmission, the cipher image may be affected by crop and noise attacks, so some unwanted values are introduced into the pixels of the cipher image. Moreover, salt-and-pepper noise and Gaussian noise mostly affect image data during transmission. An efficient cipher should recover the original image even when the cipher image is affected by either noise or crop attacks [26]. To assess the proposed cipher, some percentage of the images was purposely cropped and some amount of noise was applied to the cipher images. The decrypted images are taken to evaluate the robustness against these attacks, as illustrated in Fig. 8. Based on the illustration, it can be concluded that the proposed cipher architecture maintains robustness well.
6 Conclusion

In this work, a fusional chaotic map based on two one-dimensional chaotic maps has been developed to provide security for medical images. The fusional chaotic map generates an efficient chaotic series, which increases the resistance of the developed cipher architecture against common cipher attacks and highly enhances the quality of the image cipher. Simulation and performance results indicate the strength of the proposed model under differential, exhaustive, and robustness attack analyses. The comparison study indicates that the proposed model is more efficient than the state of the art. Nowadays, color images are massively used in the medical field for diagnosing different diseases; in future, this work will be extended to secure color images as well.

Acknowledgements The authors gratefully acknowledge the Department of Science and Technology, India for Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (SR/FST/ETI-371/2014), (SR/FST/MSI-107/2015) and Tata Realty-IT City - SASTRA Srinivasa Ramanujan Research Cell of our University for the financial support extended to us in carrying out this research work.
References

1. A. Kannammal, S. Subha Rani, DICOM image authentication and encryption based on RSA and AES algorithms, in Communications in Computer and Information Science 330 CCIS (2012), pp. 349–360
2. S.R. Mugunthan, Soft computing based autonomous low rate DDOS attack detection and security for cloud computing. J. Soft Comput. Paradigm, 80–90 (2019)
3. S. Shakya, An efficient security framework for data migration in a cloud computing environment. J. Artif. Intell. Capsul. Netw. 01, 45–53 (2019)
4. S. Rajendran, M. Doraipandian, Biometric template security triggered by two dimensional logistic sine map. Proc. Comput. Sci. 143, 794–803 (2018)
5. C. Karthikeyan, J. Ramkumar, B. Devendar Rao, J.M., Medical image fusion using Otsu's cluster based thresholding relation, in International Conference on Innovative Data Communication Technologies and Application (2019), pp. 297–305
6. T. Jambhale, M. Sudha, A privacy preserving hybrid neural-crypto computing-based image steganography for medical images, in Intelligent Data Communication Technologies and Internet of Things: Proceedings of ICICI 2020 (2021), pp. 277–290
7. R. Dhaya, Light weight CNN based robust image watermarking scheme for security. J. Inf. Technol. Digit. World 3, 118–132 (2021)
8. G. Hu, D. Xiao, Y. Zhang, T. Xiang, An efficient chaotic image cipher with dynamic lookup table driven bit-level permutation strategy. Nonlinear Dyn. 87, 1359–1375 (2017)
9. J.S. Manoharan, A novel user layer cloud security model based on chaotic Arnold transformation using fingerprint biometric traits. J. Innov. Image Process. 3, 36–51 (2021)
10. Z. Hua, Y. Zhou, H. Huang, Cosine-transform-based chaotic system for image encryption. Inf. Sci. (Ny) 480, 403–419 (2019)
11. M. Doraipandian, S. Rajendran, Design of medical image cryptosystem triggered by fusional chaotic map
12. L. Yao, C. Yuan, J. Qiang, S. Feng, S. Nie, Asymmetric color image encryption based on singular value decomposition. Opt. Lasers Eng. 89, 80–87 (2017)
13. W. Hua, X. Liao, A secret image sharing scheme based on piecewise linear chaotic map and Chinese remainder theorem. Multimed. Tools Appl. 76, 7087–7103 (2017)
14. W. Cao, Y. Zhou, C.L.P. Chen, L. Xia, Medical image encryption using edge maps. Sig. Process. 132, 96–109 (2017)
15. C. Fu, W.H. Meng, Y.F. Zhan, Z.L. Zhu, F.C.M. Lau, C.K. Tse, H.F. Ma, An efficient and secure medical image protection scheme based on chaotic maps. Comput. Biol. Med. 43, 1000–1010 (2013)
16. M. Alawida, A. Samsudin, J.S. Teh, R.S. Alkhawaldeh, A new hybrid digital chaotic system with applications in image encryption. Sig. Process. 160, 45–58 (2019)
17. S. Amina, F.K. Mohamed, An efficient and secure chaotic cipher algorithm for image content preservation. Commun. Nonlinear Sci. Numer. Simul. 60, 12–32 (2018)
18. J.A. Attaullah, T. Shah, Cryptosystem techniques based on the improved Chebyshev map: an application in image encryption. Multimed. Tools Appl. 78, 31467–31484 (2019)
19. M.A. Mokhtar, N.M. Sadek, A.G. Mohamed, Design of image encryption algorithm based on different chaotic mapping, in Natl. Radio Sci. Conf. NRSC Proc. (2017), pp. 197–204
20. B. Idrees, S. Zafar, T. Rashid, W. Gao, Image encryption algorithm using S-box and dynamic Hénon bit level permutation. Multimed. Tools Appl. 79, 6135–6162 (2020)
21. W. Liu, K. Sun, S. He, SF-SIMM high-dimensional hyperchaotic map and its performance analysis. Nonlinear Dyn. 89, 2521–2532 (2017)
22. P. Rakheja, R. Vig, P. Singh, An asymmetric hybrid cryptosystem using hyperchaotic system and random decomposition in hybrid multi resolution wavelet domain. Multimed. Tools Appl. 78, 20809–20834 (2019)
23. H. Nematzadeh, R. Enayatifar, H. Motameni, F.G. Guimarães, V.N. Coelho, Medical image encryption using a hybrid model of modified genetic algorithm and coupled map lattices. Opt. Lasers Eng. 110, 24–32 (2018)
24. Y.G. Yang, B.W. Guan, Y.H. Zhou, W.M. Shi, Double image compression-encryption algorithm based on fractional order hyper chaotic system and DNA approach. Multimed. Tools Appl. (2020)
25. S. Yoosefian Dezfuli Nezhad, N. Safdarian, S.A. Hoseini Zadeh, New method for fingerprint images encryption using DNA sequence and chaotic tent map. Optik (Stuttg) 224, 165661 (2020)
26. M.Z. Talhaoui, X. Wang, M.A. Midoun, Fast image encryption algorithm with high security level using the Bülban chaotic map. J. Real-Time Image Process. (2020)
Innovative Technologies Developed for Autonomous Marine Vehicles by ENDURUNS Project Pedro José Bernalte Sánchez, Fausto Pedro García Márquez, Mayorkinos Papaelias, Simone Marini, Shashank Govindaraj, and Lilian Durand

Abstract In recent years, interest in the exploitation of offshore areas has increased and the marine industry has experienced great growth. These facts motivate the employment of autonomous marine vehicles for monitoring, survey, or maintenance work. This paper presents the ENDURUNS project, a European initiative that develops a sustainable offshore exploration system based on two coordinated autonomous marine vehicles (surface and underwater vessels), both powered by renewable energies. The project involves great technical challenges owing to its goal of zero-emission performance. The communications infrastructure is an important milestone in achieving real-time monitoring of the system by the user from the remote-control centre. The energy systems employed (solar photovoltaic and hydrogen fuel cell technologies) distinguish the system from the current marine vehicle market. Finally, the sensors and instrumentation implemented in these vehicles allow a high inspection capacity, all supported by complex software customization.
P. J. B. Sánchez · F. P. G. Márquez (B)
Ingenium Research Group, University of Castilla La Mancha, Ciudad Real, Spain
e-mail: [email protected]
P. J. B. Sánchez
e-mail: [email protected]
M. Papaelias
School of Metallurgy and Materials, The University of Birmingham, Edgbaston, Birmingham, UK
e-mail: [email protected]
S. Marini
Institute of Marine Sciences, National Research Council of Italy (CNR), La Spezia, Italy
e-mail: [email protected]
S. Govindaraj · L. Durand
Space Applications Services NV/SA, Brussels Area, Belgium
e-mail: [email protected]
L. Durand
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_22
1 State-of-the-Art

In the last decade, the use of autonomous marine vehicles has increased considerably. In fact, for several activities in this environment these vehicles are the only feasible solution, owing to great depths or adverse sea-weather constraints [1]. The global offshore Autonomous Underwater Vehicle (AUV) and Unmanned Surface Vehicle (USV) market was valued at around 55 million US dollars (US$) in 2019 and is expected to reach 145 million US$ by the end of 2026, a trend that confirms the growing demand for these vehicles [2]. Scientifically, the topic has attracted growing interest over the last five years, as Fig. 1 shows. The technical evolution of these vehicles has been supported by the use of modern, light materials, as well as by the application of novel hardware and software elements [3]. Energy management for long endurance, together with the use of renewable energies, constitutes the most notable research challenge in the development of autonomous devices [5]. Along this line, governmental and enterprise initiatives have appeared to support this evolution. The ENDURUNS project, financed by the European Commission within the Horizon 2020 framework, aims to design a modern offshore mapping system composed of two coordinated autonomous vehicles powered by renewable energies [6]. Thanks to the energy systems implemented, it is expected to achieve up to 6 months of energy and operational autonomy. This goal represents a considerable advance in endurance over the current marine vehicle market: among the latest models, the Bluefin 21 AUV by General Dynamics [7] and the C-Enduro USV [8] achieve around 48 h and 70 days of autonomy, respectively. None of these commercial references (electric or conventionally fuelled) approaches the endurance targeted by this project, and none of them combines surface and underwater devices for offshore inspections.
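As a rough consistency check on the market figures quoted above, the compound annual growth rate implied by a US$55 million valuation in 2019 reaching US$145 million by 2026 can be worked out as follows. This is an illustrative calculation, not a figure from the cited report:

```python
# Implied compound annual growth rate (CAGR) for the AUV/USV market
# figures quoted above: US$55M in 2019 growing to US$145M by 2026.
def cagr(start_value, end_value, years):
    """Return the compound annual growth rate as a fraction."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

growth = cagr(55.0, 145.0, 2026 - 2019)
print(f"Implied CAGR: {growth:.1%}")  # roughly 15% per year
```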
Fig. 1 Scientific publications tendency about autonomous marine vehicles [4]

This article describes in detail the latest innovations developed by the ENDURUNS project in different areas. Section 2 presents them with regard to the current project status: the novelties in communications infrastructure, employing the latest positioning and data-transmission technologies; the power and energy source, developed as an innovative hybrid system combining solar photovoltaic panels and a hydrogen fuel cell; and the sensors and instrumentation adapted, customized, and developed for the vehicles, the most innovative of which, the camera and the echo sounder, are presented in that section. Section 3 summarises the conclusions and highlights of the article.
2 Innovative Technologies Implemented

The ENDURUNS project involves great design challenges with respect to current marine technology. The most remarkable innovations concern communications, sensors, and energy systems [9]. The following subsections describe the technologies implemented.
2.1 Infrastructure

Marine and submarine data transmission involves complex hardware and software development that requires the latest technologies available on the market and in the scientific literature. The ENDURUNS vehicles are used within different application contexts and perform their missions autonomously [10]. In these missions, the USV and the AUV reach a specific area of interest and survey it. They collect and store the information, and transmit the relevant acquired data to the Remote Mission Control Centre (RMCC).

Communication with the unmanned devices relies on acoustic and radio signals. Underwater communication between the AUV and the USV is achieved through acoustic modems, and through Wireless Fidelity (Wi-Fi) during surface movements [11]. Data concerning the position, and any commands from the RMCC supervisor, are relayed by the USV to minimize energy consumption. The AUV can also communicate directly with the RMCC via Iridium Short Burst Data (SBD). The AUV uses its acoustic modem to receive the data necessary to complete its mission and verifies its position by pinging the USV whenever required. The USV communicates with the RMCC on shore using the Global Positioning System (GPS) L-band. Figure 2 shows the communication technologies implemented for each transmission link.

Remote management of the system requires communicating with the autonomous vehicles, transmitting the data to the user, and storing the data so that it is easily accessible later. The communication and data types are compliant with most third-party hardware and software solutions. The following points summarize the hardware and software systems employed for communications.
Fig. 2 ENDURUNS project communications' infrastructure
• For long-range, high-performance acoustic communication: the EVOLOGICS USBL model provides underwater acoustic communication with the additional functionality of an acoustic transponder.
• For Wi-Fi transmissions: the WL-WN570HN2 modem and antenna guarantee efficient performance.
• For GPS communications: the UBLOX NEO-M8N module computes the absolute USV position, and the GPS antenna, AQHA.50.A.301111, works with active multi-band Global Navigation Satellite System (GNSS) signals.
• For satellite transmissions: the Thales Vessel LINK provides global satellite coverage for maritime communications, uninterrupted from pole to pole.

The work methodology employed for RMCC development has been agile, using JIRA software to draft tasks and monitor progress [12, 13]. Once the specifications and system architecture were designed, each high-level feature was estimated and included in a work backlog. The RMCC development involves software engineering with robotics, Model View Controller (MVC) software, web technologies, and communication protocols [14].
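The link-selection logic described above (acoustic between AUV and USV underwater, Wi-Fi at the surface, Iridium SBD as the AUV's direct channel to shore, and L-band satellite between USV and RMCC) can be sketched as a simple routing rule. The function below is an illustrative simplification, not the project's actual software; the node names and link labels are assumptions made for the example:

```python
def select_link(sender, receiver, sender_submerged):
    """Pick a transmission link for the ENDURUNS communication
    architecture (illustrative simplification)."""
    pair = {sender, receiver}
    if pair == {"AUV", "USV"}:
        # Underwater legs use the acoustic modem; surfaced legs use Wi-Fi.
        return "acoustic" if sender_submerged else "wifi"
    if pair == {"USV", "RMCC"}:
        # USV-to-shore traffic goes over the L-band satellite terminal.
        return "satellite_L_band"
    if pair == {"AUV", "RMCC"}:
        # Direct AUV-to-shore traffic goes over Iridium SBD.
        return "iridium_sbd"
    raise ValueError(f"no link defined between {sender} and {receiver}")

print(select_link("AUV", "USV", sender_submerged=True))   # acoustic
print(select_link("USV", "RMCC", sender_submerged=False))  # satellite_L_band
```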
2.2 Power and Energy System

The most valuable achievement of this project is its low environmental impact, a consequence of employing renewable energies in the marine vehicles designed. This factor supports the sustainability of the project over its life cycle, as evaluated in [15]. The use of solar energy for the USV and an innovative hydrogen fuel cell design for the AUV have been proposed. Both vehicles must implement an optimal energy-management strategy with the corresponding battery packs to maximize endurance and autonomy during the mission. The following points detail each vehicle's energy system and the battery packs.

The USV incorporates photovoltaic panels on the top of the hull as its energy source. For this purpose, different market and technology options have been examined, and one of the most suitable is the use of thin-film panels, owing to their properties. Thin-film photovoltaic panels are available in amorphous, monocrystalline, and polycrystalline types [16]. Amorphous photovoltaic panels have the highest flexibility and robustness; however, they also have the lowest efficiency, which typically does not exceed 7%. Their maximum theoretical efficiency is 15% for a single-junction cell, considerably lower than that of the monocrystalline and polycrystalline types. More recently, flexible monocrystalline Si photovoltaic panels have become popular for marine applications. These panels employ Si wafers produced using the Czochralski method [17]. Among the photovoltaic panels commercially available, the SXX panels produced by SOLBIAN, with an efficiency of around 20%, have been identified as the most appropriate candidate. Their thickness is in the range of 200 µm [18]. The monocrystalline Si wafers are encapsulated in highly flexible polymer materials that provide the mechanical support required during service.
Commercially available flexible monocrystalline Si photovoltaic panels can reach efficiencies as high as 23%, and their maximum curvature when flexing can be up to 25% [19]. The photovoltaic panels can be connected either in parallel or in series, depending on the voltage required by the system and the voltage of the individual panels. If 24 V photovoltaic panels are installed in the parallel configuration shown in Fig. 3, the voltage supplied to the battery is 24 V; in the series configuration it is 48 V [20].

The AUV developed must be able to perform long, electrically powered missions. This requires storing inside the vehicle an amount of energy proportional to the cruising distance. The use of hydrogen fuel cells allows the direct production of electricity through the chemical reaction between hydrogen and oxygen, with a higher gravimetric capacity than batteries. Thus, by rerouting part of the energy demand to fuel cells, the vehicle is expected to carry a lighter weight load and/or a more durable energy supply [21]. For the ENDURUNS AUV glider and USV, the operation of the fuel cell focuses on providing extended recharging capability for the batteries installed onboard. The fuel cell is therefore tasked with generating the necessary electrical charge at intervals, rather than continuously, as required by the mission profile. In this way, the endurance of both vehicles can be maximised,
Fig. 3 USV photovoltaic electric scheme with parallel configuration

Fig. 4 a AUV hydrogen fuel cell module prototype, b real fuel cell stack developed for the project
and the hydrogen onboard can be used with maximum conservation and efficiency [22]. Different fuel cell technologies exist. Some, such as Alkaline Fuel Cells (AFC), have the advantage that their efficiencies reach 70%, whereas Proton Exchange Membrane Fuel Cells (PEMFC) are unlikely to exceed a power-generation efficiency of 60%. AFCs have been used extensively in space applications, e.g. on the International Space Station, and for hybrid energy support in certain diesel-electric submarine types, such as the Type 214 [23]. AFCs are less tolerant to impurities than PEMFCs and therefore require high-purity fuel and oxygen to operate [24]. PEMFCs can operate with air, but in an isolated environment this is not straightforward and requires further design considerations [25]. However, AFCs are compact and lightweight, and they also have a low
Fig. 5 a DVL model, b acoustic modem model, c camera model, d sensor locations on the ENDURUNS AUV prototype
start-up time, with operational temperatures ranging between 60 and 90 °C and operating pressures between 1 and 5 bar [26]. For these reasons, an AFC prototype has been developed: AFCs are commercially available nowadays in the power-rating range of 1–100 kW, whereas smaller ratings between 50 and 1000 W are not commercially available and need to be custom made [27]. Figure 4 shows the hydrogen fuel cell module designed for the ENDURUNS AUV. This prototype achieves a high ratio of energy production to fuel consumption, delivering output values of 12 V and around 290 W while consuming 16 g/h of H2. The energy-generation capacity is limited by the power of the energy source and the fuel tanks available, whose size and weight condition the manoeuvrability and energy efficiency of the vehicle [28].

On the other hand, the choice of battery is a function of specific energy, specific power, charge and discharge rates, service lifetime, reliability, and safety. Li-ion batteries are ideal for applications requiring high charge-storage capacity combined with low weight and volume, and they are already certified for use in underwater operations, including AUVs. Lithium Polymer (Li-Po) batteries are a recent development in Li-ion battery technology. Their specific energy, depending on the exact chemistry employed, varies between 100 and 260 Wh/kg, whilst their energy density varies between 240 and 720 Wh/l [29]. The Lithium Iron Phosphate (LiFePO4) battery is a relatively recent evolution of Li-Po with approximately 25% lower specific energy than Li-Po batteries. It allows faster discharge rates and is inherently safer
and has a minimum nominal cyclic charge–discharge lifetime up to 500% higher than that of conventional Li-ion and Li-Po batteries, although LiFePO4 batteries are also more expensive than Li-Po batteries [30]. Therefore, LiFePO4 technology has been chosen for the battery pack implementation (classical configuration for the USV and tubular plate for the AUV).
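The fuel-cell figures quoted above (around 290 W of output from 16 g/h of hydrogen) admit a quick back-of-the-envelope check: dividing output power by consumption rate gives the electrical energy delivered per kilogram of hydrogen, which can be compared against hydrogen's lower heating value of roughly 33.3 kWh/kg. The calculation below is an illustrative consistency check, not reported project data:

```python
# Back-of-the-envelope check of the AFC prototype figures given above:
# ~290 W electrical output while consuming ~16 g/h of hydrogen.
H2_LHV_WH_PER_KG = 33_330  # lower heating value of H2, ~120 MJ/kg

power_w = 290.0
h2_consumption_kg_per_h = 16.0 / 1000.0

# Electrical energy delivered per kilogram of hydrogen consumed.
specific_energy_wh_per_kg = power_w / h2_consumption_kg_per_h
efficiency = specific_energy_wh_per_kg / H2_LHV_WH_PER_KG

print(f"{specific_energy_wh_per_kg:.0f} Wh per kg of H2")  # 18125 Wh/kg
print(f"Implied conversion efficiency: {efficiency:.0%}")  # 54%
```

An implied efficiency in the mid-50% range sits plausibly between the PEMFC (~60%) and AFC (~70%) stack efficiencies discussed above, once balance-of-plant losses are accounted for.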
2.3 Instrumentation and Measurement

To achieve the project mission requirements with optimal performance, a sophisticated vehicle configuration is necessary, together with the latest sensor technologies and components for marine applications. The two major sensors for data acquisition in ENDURUNS are the MultiBeam Echo Sounder (MBES) and the camera; both are mounted on the AUV and receive directives directly from the mission planner. All the acquired data is administered by a low-cost embedded microcomputer, an ideal solution to reduce the volume, weight, and power requirements of acquisition and data storage. This implies a big challenge, though, because most such systems are meant to work mounted on ships with an active user connected to the device during acquisition, whereas the ENDURUNS acquisition process is fully autonomous. At another level, navigation data needs to be fed to the MBES to georeference the acquisitions. Integrated systems usually have both navigation and acquisition connected at runtime to the processing software that produces the output data, in this case the georeferenced sea-bottom map. The specific sensor developments are discussed in the following paragraphs.

The MBES is the main mission payload, and it jointly produces two sets of data: the backscattering (related to the intensity of sound waves reflected by the seabed) and the distance map of the seabed itself from the acoustic sensor. Backscattering data are used for seabed classification and distance data for mapping bathymetry [31]. The MBES acquisition system is implemented following a client–server architecture. The device itself acts as a server, listening on multiple channels, each with a specific purpose. In this case, the relevant channels are the command link and the bathymetric data stream link. The command link is used to change the MBES settings, query its state, and issue commands.
The latter is the channel through which the data flow passes: connecting to this link represents a subscription to the bathymetric data stream. During acquisition mode, the MBES sends the data to all registered recipients whenever a ping is ready. Note that, even in acquisition mode, the MBES does not start acquiring until at least one recipient is registered. The acquisition CPU hosts two components: the command server and the acquisition client. The command server acts as a proxy between the mission planner and the MBES: each command issued by the planner to the MBES passes through the command server, which validates and forwards it. The command server is also tasked with starting the acquisition client. The acquisition client is responsible for retrieving the data produced by the MBES. After some processing,
the relevant sections of the data packet are written to the current file. At the end of the acquisition routine, the retrieved data are finalized and packaged [32].

The camera is used to perform a deeper analysis of areas of interest previously selected from the MBES data. Upon completion of the data-processing steps for the intensity and sea-bottom maps, the RMCC or automated systems may decide to review parts of a previous mission in more detail; in this case, the camera can be used to take a closer look. The camera is electrically connected to the AUV and communicates with it through an Ethernet connection, from which it also receives its power supply. When the image-acquisition session starts, the camera is switched on by the AUV mission planner. After bootstrap, the camera connects to the AUV and downloads the image-acquisition parameters (e.g., the image-acquisition frequency), and then the acquisition session starts [33]. For georeferencing the acquired images, a Network Time Protocol (NTP) client installed onboard the camera associates a time stamp with each acquired image [34]. This time stamp is used to associate the acquired image with the GPS position of the AUV, and possibly with the MBES data. The acquired images can be processed during acquisition or at the end of the acquisition session, as scheduled by the mission planner. After the relevant content is identified, only relevant images, or parts of them, are transferred first to the AUV, then to the USV, and finally to the RMCC. Integration with the onboard mission planner is done using the command server, which receives the instructions and either dispatches them to the sensor or executes internal routines; all these routes are internal to the Local Area Network (LAN) of the AUV. For both planner-to-MBES and planner-to-camera traffic, all commands are handled and dispatched by the command server on the acquisition CPU.
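The NTP-based time-stamping scheme means each image can later be matched to the vehicle's position by interpolating the navigation track at the image's timestamp. A minimal sketch of that association step follows; the data layout and function name are hypothetical, for illustration only:

```python
import bisect

def position_at(track, t):
    """Linearly interpolate a (time, lat, lon) navigation track at time t
    (illustrative sketch of image-to-position association)."""
    times = [p[0] for p in track]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return track[0][1:]       # before the first fix: clamp
    if i == len(track):
        return track[-1][1:]      # after the last fix: clamp
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)
    return (lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0))

# Hypothetical GPS fixes relayed from the USV: (unix_time, lat, lon).
nav_track = [(0.0, 44.00, 9.80), (10.0, 44.01, 9.82)]
image_timestamp = 5.0  # NTP-stamped acquisition time of one image
print(position_at(nav_track, image_timestamp))  # midpoint of the two fixes
```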
The camera model "GUARD1", designed by ISMAR (the Marine Sciences Institute in La Spezia), has been adapted for the ENDURUNS AUV [35]. In this project, acquisition is performed on the AUV while processing takes place on the USV. It was decided to store the two streams of data separately and merge them during the subsequent processing step. Navigation data come from the navigation CPU and are time-tagged using a synchronized common time reference. During deep-sea underwater missions, the only way to transfer GPS data from the USV to the AUV is through an acoustic modem. Such acoustic communication has very low bandwidth and an unstable connection; therefore, the vehicle positioning cannot be fed in real time. Inertial navigation data, on the other hand, accumulate quadratic errors over time, making the data less and less reliable as the mission goes on. To mitigate this effect, the navigation system continuously merges data from a high-precision Fiber Optic Gyroscope and a Doppler Velocity Log (DVL) (model 500 by Nortek), and uses the absolute position information coming acoustically from the USV to reset the internal navigation filter error [36]. Apart from exchanging data between surface and underwater vehicles, acoustic modems can also be used for effective and accurate positioning and navigation without the underwater vehicle having to resurface; in this case, the S2CR 18/34 model by Evologics has been selected. Data collected in this way can be processed onboard with a reasonable level of uncertainty, and when the data are retrieved, more complex
and computationally heavy correction algorithms can be used to optimize the reconstruction. Figure 5 shows some of the sensors described previously.
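The navigation strategy described above, dead reckoning whose error grows between fixes, periodically corrected by an absolute acoustic position from the USV, can be illustrated with a toy one-dimensional estimator. This is a didactic sketch, not the project's navigation filter (which fuses a fiber-optic gyroscope and DVL in three dimensions):

```python
class DeadReckoningEstimator:
    """Toy 1-D position estimator: integrates velocity (dead reckoning)
    and resets to an absolute fix when one arrives acoustically."""

    def __init__(self, position=0.0):
        self.position = position

    def propagate(self, velocity, dt):
        # DVL/inertial propagation; error accumulates between fixes.
        self.position += velocity * dt

    def absolute_fix(self, measured_position):
        # An acoustic fix relayed by the USV resets the accumulated drift.
        self.position = measured_position

est = DeadReckoningEstimator()
for _ in range(60):                       # one minute of biased velocity:
    est.propagate(velocity=1.01, dt=1.0)  # measured 1.01 m/s, true 1.0 m/s
print(f"drift before fix: {est.position - 60.0:.2f} m")
est.absolute_fix(60.0)                    # USV-relayed position zeroes it
print(f"error after fix: {est.position - 60.0:.2f} m")
```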
3 Conclusions

The ENDURUNS autonomous marine vehicles are an exceptionally innovative solution for complex offshore works and surveys, owing to the novel systems implemented and the use of sustainable green energies. The main innovations and advances of this project are summarized in the following points:

• The communications technologies developed guarantee optimal real-time monitoring of the vehicles from the onshore remote control centre, with a large volume of data transmission despite the environmental constraints.
• The novel flexible monocrystalline photovoltaic panels on the surface of the Autonomous Surface Vehicle allow efficient green energy harvesting. For the Autonomous Underwater Vehicle, a hydrogen fuel cell achieves optimal output specifications of around 300 W and 12 V.
• The modern storage tanks developed for the vehicles have been simulated successfully. This fact, together with appropriate sizing of the battery packs, achieves an energy storage that advances the state of the art in autonomous marine vehicle endurance, reaching up to 6 months of autonomy.
• The latest technologies employed in sensors and software guarantee high-grade performance, at the forefront of the current marine vehicle market.

The proposed ENDURUNS system provides a sustainable, efficient, and novel offshore inspection tool. These advances in offshore vehicles make the project a major contribution to global science and the marine industry.

Acknowledgements The work reported herewith has been supported by the European Project H2020 under the Research Grants H2020-MG-2018-2019-2020, ENDURUNS.
References

1. Y. Liu, E. Anderlini, S. Wang, S. Ma, Z. Ding, Ocean Explorations Using Autonomy: Technologies, Strategies and Applications (Singapore), pp. 35–58
2. Global Offshore AUV & ROV Market Research Report 2021. Available online: https://www.industryresearch.co/global-offshore-auv-rov-market-17207325. Accessed November 2021
3. P.J.B. Sanchez, F.P.G. Márquez, S. Govindara, A. But, B. Sportich, S. Marini, V. Jantara, M. Papaelias, Use of UIoT for offshore surveys through autonomous vehicles. Pol. Marit. Res. 28, 175–189 (2021)
4. Dimensions.ai, Autonomous Marine Vehicles Publications. Available online: https://app.dimensions.ai/analytics/publication/overview/timeline?search_mode=content&search_text=autonomous%20marine%20vehicles&search_type=kws&search_field=full_search&year_from=2015&year_to=2021. Accessed January
5. H. Weydahl, M. Gilljam, T. Lian, T.C. Johannessen, S.I. Holm, J.Ø. Hasvold, Fuel cell systems for long-endurance autonomous underwater vehicles–challenges and benefits. Int. J. Hydrogen Energ. 45, 5543–5553 (2020)
6. G. Bruzzone, R. Ferretti, A. Odetti, Unmanned Marine Vehicles (Multidisciplinary Digital Publishing Institute, 2021), vol. 9, p. 257
7. G. Schmidt, GPS based navigation systems in difficult environments. Gyroscopy Navig. 10, 41–53 (2019)
8. P. Asgharian, Z.H. Azizul, Proposed efficient design for unmanned surface vehicles. arXiv preprint arXiv:2009.01284 (2020)
9. P.J.B. Sánchez, M. Papaelias, F.P.G. Márquez, Autonomous underwater vehicles: Instrumentation and measurements. IEEE Instrum. Meas. Mag. 23, 105–114 (2020)
10. S. Marini, N. Gjeci, S. Govindaraj, A. But, B. Sportich, E. Ottaviani, F.P.G. Márquez, P.J. Bernalte Sanchez, J. Pedersen, C.V. Clausen, ENDURUNS: An integrated and flexible approach for seabed survey through autonomous mobile vehicles. J. Mar. Sci. Eng. 8, 633 (2020)
11. P. Di Lillo, D. Di Vito, E. Simetti, G. Casalino, G. Antonelli, Satellite-based tele-operation of an underwater vehicle-manipulator system. Preliminary experimental results, in Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018), pp. 7504–7509
12. D. Harned, Hands-On Agile Software Development with JIRA: Design and Manage Software Projects Using the Agile Methodology (Packt Publishing Ltd, 2018)
13. J. Fisher, D. Koning, A. Ludwigsen, Utilizing Atlassian JIRA for Large-Scale Software Development Management (Citeseer, 2013)
14. K.S. Kalupahana Liyanage, M. Ma, P.H. Joo Chong, Controller placement optimization in hierarchical distributed software defined vehicular networks. Comput. Netw. 135, 226–239 (2018). https://doi.org/10.1016/j.comnet.2018.02.022
15. L.L.P. De Souza, E.E.S. Lora, J.C.E. Palacio, M.H. Rocha, M.L.G. Renó, O.J. Venturini, Comparative environmental life cycle assessment of conventional vehicles with different fuel options, plug-in hybrid and electric vehicles for a sustainable transportation system in Brazil. J. Clean. Prod. 203, 444–468 (2018)
16. A. Shah, P. Torres, R. Tscharner, N. Wyrsch, H. Keppner, Photovoltaic technology: The case for thin-film solar cells. Science 285, 692–698 (1999). https://doi.org/10.1126/science.285.5428.692
17. Z. Galazka, Czochralski method. Gallium Oxide Mater. Prop. Cryst. Growth Devices 293, 15 (2020)
18. L.C. Andreani, A. Bozzola, P. Kowalczewski, M. Liscidini, L. Redorici, Silicon solar cells: Toward the efficiency limits. Adv. Phys. X 4, 1548305 (2019)
19. S. Patil, R. Jani, N. Purabiarao, A. Desai, I. Desai, K. Bhargava, Flexible solar cells, in Fundamentals of Solar Cell Design (2021), pp. 505–536. https://doi.org/10.1002/9781119725022.ch16
20. N.B.M. Yusof, A.B. Baharuddin, The study of output current in photovoltaics cell in series and parallel connections. Int. J. Technol. Innov. Humanit. 1, 7–12 (2020)
21. N.F. Thomas, R. Jain, N. Sharma, S. Jaichandar, Advancements in automotive applications of fuel cells–A comprehensive review. Proc. ICDMC 2020, 51–64 (2019)
22. P. Bernalte, F. Márquez, S. Marini, F. Bonofoglio, L. Barbieri, N. Gjeci, E. Ottaviani, S. Govindaraj, S. Coene, A. But, New approaches for renewable energy management in autonomous marine vehicles, in Developments in Renewable Energies Offshore (CRC Press, 2020), pp. 739–745
23. S. Belz, B. Ganzer, E. Messerschmid, K.A. Friedrich, U. Schmid-Staiger, Hybrid life support systems with integrated fuel cells and photobioreactors for a lunar base. Aerosp. Sci. Technol. 24, 169–176 (2013). https://doi.org/10.1016/j.ast.2011.11.004
24. C. Acar, A. Beskese, G.T. Temur, Comparative fuel cell sustainability assessment with a novel approach. Int. J. Hydrogen Energ. (2021)
25. M. Rostami, M. Dehghan Manshadi, E. Afshari, Performance evaluation of two proton exchange membrane and alkaline fuel cells for use in UAVs by investigating the effect of operating altitude. Int. J. Energ. Res. (2021)
26. C. Bernay, M. Marchand, M. Cassir, Prospects of different fuel cell technologies for vehicle applications. J. Power Sources 108, 139–152 (2002). https://doi.org/10.1016/S0378-7753(02)00029-0
27. V. Cigolotti, M. Genovese, P. Fragiacomo, Comprehensive review on fuel cell technology for stationary applications as sustainable and efficient poly-generation energy systems. Energies 14, 4963 (2021)
28. V. Jantara Junior, I.S. Ramirez, F.P.G. Márquez, M. Papaelias, Numerical evaluation of type I pressure vessels for ultra-deep ocean trench exploration
29. J. Garche, E. Karden, P.T. Moseley, D.A. Rand, Lead-Acid Batteries for Future Automobiles (Elsevier, 2017)
30. C. Sun, J. Liu, Y. Gong, D.P. Wilkinson, J. Zhang, Recent advances in all-solid-state rechargeable lithium batteries. Nano Energ. 33, 363–386 (2017)
31. D. Iwen, M. Wąż, Benefits of using ASV MBES surveys in shallow waters and restricted areas, in Proceedings of 2019 European Navigation Conference (ENC) (2019), pp. 1–3
32. D. Nathalie, S. Thierry, G. François, J. Etienne, V. Lucas, B. Romain, Outlier detection for multibeam echo sounder (MBES) data: From past to present, in Proceedings of OCEANS 2019-Marseille (2019), pp. 1–10
33. I.S. Ramírez, P.J. Bernalte Sánchez, M. Papaelias, F.P.G. Márquez, Autonomous underwater vehicles and field of view in underwater operations. J. Mar. Sci. Eng. 9, 277 (2021)
34. G.A. Hatcher, J.A. Warrick, A.C. Ritchie, E.T. Dailey, D.G. Zawada, C. Kranenburg, K.K. Yates, Accurate bathymetric maps from underwater digital imagery without ground control. Front. Mar. Sci. 7, 525 (2020)
35. S. Marini, L. Corgnati, C. Mantovani, M. Bastianini, E. Ottaviani, E. Fanelli, J. Aguzzi, A. Griffa, P.-M. Poulain, Automated estimate of fish abundance through the autonomous imaging device GUARD1. Measurement 126, 72–75 (2018)
36. X. Mu, B. He, S. Wu, X. Zhang, Y. Song, T. Yan, A practical INS/GPS/DVL/PS integrated navigation algorithm and its application on autonomous underwater vehicle. Appl. Ocean Res. 106, 102441 (2021). https://doi.org/10.1016/j.apor.2020.102441
Machine Learning Approaches to Predict Breast Cancer: Bangladesh Perspective

Taminul Islam, Arindom Kundu, Nazmul Islam Khan, Choyon Chandra Bonik, Flora Akter, and Md Jihadul Islam
Abstract Breast cancer has risen to become one of the most prominent causes of death in recent years. Among all malignancies, it is the most frequent cancer and the leading cause of cancer death for women globally. Manually diagnosing this disease requires considerable time and expertise, and because detection is time-consuming, machine-based breast cancer prediction can help reduce the spread of the disease. With machine learning, a system can learn from prior instances and find hard-to-detect patterns in noisy or complicated data sets using various statistical, probabilistic, and optimization approaches. This work compares the classification accuracy, precision, sensitivity, and specificity of several machine learning algorithms on a newly collected dataset. Five approaches, Decision Tree, Random Forest, Logistic Regression, Naïve Bayes, and XGBoost, have been implemented to obtain the best performance on the dataset. The study focuses on finding the algorithm that can forecast breast cancer classes with maximum accuracy, evaluates the quality of each algorithm's data classification in terms of efficiency and effectiveness, and compares the results with other published work in this domain. After implementing the models, the study achieved the best accuracy, 94%, with Random Forest and XGBoost.
T. Islam (B) · A. Kundu · N. Islam Khan · C. Chandra Bonik · F. Akter · M. Jihadul Islam
Department of Computer Science and Engineering, Daffodil International University, Ashulia, Dhaka, Bangladesh
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_23

1 Introduction

Tumors form when a single cell divides unchecked, leading to an unwelcome growth known as cancer. Benign and malignant are the two classes used in cancer detection. A malignant tumor develops fast and damages surrounding tissues by invading them [1]. Malignant tissue forming in the breast is called breast cancer. Breast cancer symptoms include an increase in breast mass, a change in breast size and form, a change in breast skin color, breast discomfort, and changes in the breast's genetic makeup. Worldwide, breast cancer is the second leading cause of
death in women after heart disease, and it affects more than 8% of women at some point in their lives [2]. According to the WHO's annual report, more than 500,000 women develop breast cancer every year, and the prevalence of this disease is predicted to rise in the future due to environmental damage. Risk factors for breast cancer in women include obesity, hormone treatment therapy during menopause, a family medical history of breast cancer, a lack of physical activity, long-term exposure to infrared energy, having children later in life or not at all, and an early age at first menstruation. Because the symptoms of breast cancer vary widely, patients undergo many tests, including ultrasound, biopsy, and mammography, to determine whether they have the disease. The biopsy, which involves the extraction of tissue or cell samples for analysis, is the most indicative of these procedures. A human observer is required to detect specific signal characteristics to monitor and diagnose illnesses. To overcome this challenge, and given the large population of patients in critical care and the necessity of constant surveillance, several computer-aided diagnosis (CAD) techniques have been developed during the last ten years. Using these methods, diagnostic criteria that are predominantly qualitative are transformed into a quantitative feature-classification problem. Breast cancer diagnosis and prognosis can be predicted using a variety of machine learning algorithms. This work aims to assess those algorithms' accuracy, sensitivity, specificity, and precision in terms of their efficiency and performance. The remainder of this paper comprises five sections. The literature review is covered in Sect. 2. Section 3 explains the methodology of this work. Section 4 covers the experimental comparison and results.
Section 5 presents the findings and discussions of this work, and Sect. 6 concludes the paper.
2 Literature Review

V. Chaurasia and S. Pal applied their model to find the best machine learning algorithms to predict breast cancer, using SVM, Naïve Bayes, RBF NN, DT, and basic CART [3]; their model achieved the best AUC of 96.84% with SVM on the Wisconsin Breast Cancer (original) dataset. Breast cancer survival time can be predicted using an ensemble of machine learning algorithms, as explored by Djebbari et al. [4]; compared to earlier results, their method achieved higher accuracy on their own breast cancer dataset. DT, SVM, Naïve Bayes, and K-NN were compared by S. Aruna and L. Nandakishore to determine the best classifier on the WBC dataset [5]; they achieved the best AUC of 96.99% with the SVM classifier. Tumor cells were classified using six machine learning methods developed by A. F. M. Agarap, who built a Gated Recurrent Unit (GRU), a variant of the long short-term memory (LSTM) neural network, in which the softmax layer was replaced with a support-vector machine (SVM) layer; the GRU-SVM model's 99.04% accuracy was the best in that work [6]. Utilizing association rules and a neural network to train the model, Karabatak et al. [7] increased the model's accuracy to 95.6% using cross-validation, applying Naïve Bayes classifiers with a new weight-modification technique. Mohebian et al. [8] investigated the use of ensemble learning to predict cancer recurrence. Gayathri et al. evaluated three machine learning models that produced the most significant outcomes when utilizing a relevance vector [9]. To get the best results, Payam et al. combined preprocessing and data-reduction techniques with a radial basis function network (RBFN) [10]. Breast cancer survival prediction models were developed using data from breast cancer research published in [11]; survivorship prediction algorithms were applied to both benign and malignant tumors in that work. ML algorithms for breast cancer detection have been studied extensively in the past, as shown in [12], which proposed that data augmentation approaches might help alleviate the issue of limited available data. Using computer-aided mammography image characteristics, the authors in [13] demonstrated a method for detecting and identifying cell structure in automated systems. In [14], numerous classification and clustering techniques were compared; according to the findings, classification algorithms outperform clustering algorithms. Table 1 states a clear comparison between this work and other previously published work.
3 Methodology

This section provides an overview of the main technique of the study; the main workflow of this research is shown in Fig. 1. The dataset's origin and features are discussed, along with the surrounding background, and the section concludes with a brief discussion of the specific classification models and assessment techniques. Our researchers collected the data manually; the data was then cleaned of noisy and inconsistent values using preprocessing techniques, which boosted the final performance of this work. We also had to eliminate erroneous data for the model to work. For the trials relevant to this study, we train five classification algorithms.
3.1 Data Description

In this work, researchers collected a total of 456 records from three hospitals in Bangladesh: Dhaka Medical College Hospital, LABAID Specialized Hospital, and Anwar Khan Medical College Hospital. The data contains 254 benign cases, and the remaining 202 are classified as malignant, as shown in Fig. 2.
T. Islam et al.
Table 1 Comparison with previously published work

| Ref  | Year | Contribution | Dataset | Algorithms | Best accuracy (%) |
|------|------|--------------|---------|------------|-------------------|
| [15] | 2020 | Implemented machine learning techniques to predict breast cancer | UCI machine learning repository | ANN, DT, SVM, NB | 86 |
| [16] | 2020 | Developed a model to predict breast cancer | UCI machine learning repository | RF, XGBoost | 74.73 |
| [17] | 2020 | Predicted breast cancer using effective data mining classifiers | Wisconsin Breast Cancer dataset | K-Means clustering, DT | 94.46 |
| [18] | 2018 | Implemented machine learning techniques to predict breast cancer | Wisconsin Breast Cancer dataset | KNN, NB, SVM, RF | 97.9 |
| [19] | 2019 | Predicted whether the person has breast cancer or not | Multidimensional heterogeneous data | KNN, SVM, RF, GB | 93 |
| [20] | 2021 | Predicted breast cancer at an early or malignant stage | Wisconsin Diagnostic dataset | SVM, K-NN, NB, DT, K-Means, ANN | 97 |
| [21] | 2017 | Applied principal component analysis (PCA) to improve attribute quality by addressing the eigenvector problem | Wisconsin Diagnostic dataset | SVM | 93 |
| [22] | 2018 | Implemented a model to classify tumors as benign or malignant | Wisconsin Diagnostic dataset | RF, K-NN, NB | 95 |
The entire dataset was factored into the analysis. Figure 3 plots the dataset's mean radius feature: patients suspected of having cancer tend to have a mean radius greater than 1, whereas those who do not appear to have the disease have values closer to 1. Figure 4 shows a heatmap highlighting the associations between the dataset's characteristics. The correlation heatmap illustrates a two-dimensional matrix between two discrete dimensions, where each row and each column represents one of the two dimension values.

Machine Learning Approaches to Predict Breast Cancer …

Fig. 1 Proposed model workflow

Fig. 2 Data class distribution

Colored pixels on a monochrome scale highlight the correlation between the dataset's attributes: the stronger the color intensity, the stronger the correlation. The color value of each cell is directly proportional to the number of observations meeting the dimensional values, and the proportionality between the two characteristics determines the dimensional value. A positive correlation is obtained when both variables move in the same direction; conversely, a reduction in one measure correlated with a rise in the other indicates a negative correlation. There are six features in this work: mean radius, mean perimeter, mean texture, mean smoothness, mean area, and diagnosis.
Fig. 3 Mean radius of the dataset
Fig. 4 Correlation between the features
3.2 Data Preprocessing

To begin the study, it is necessary to preprocess the collected data. Our first step is to analyze the information gathered so far, which comes from a variety of sources, and to prepare the data needed to address the problem. The data sets contain a wide variety of numerical values, and a single piece of data is analyzed at a time. Machine learning models can process only numerical data. Before analysis, trimmed means and modes were computed: a small percentage of the highest and lowest values is discarded before computing the average [11], so the measurements are trimmed from both ends of the distribution.
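The two-sided trimming described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's actual preprocessing code; the parameter name `proportion` (the fraction cut from each end) is an assumption. Where SciPy is available, `scipy.stats.trim_mean` implements the same idea.

```python
def trimmed_mean(values, proportion):
    """Mean after discarding `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values trimmed per side
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# The extreme values 1 and 100 are discarded before averaging
print(trimmed_mean([1, 2, 3, 4, 100], 0.2))  # -> 3.0
```

Trimming from both directions keeps the estimate symmetric, so a single large outlier cannot drag the mean in either direction.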
3.3 Machine Learning Models

Machine learning is the most accessible way to predict breast cancer. From the Literature Review section, it is clear that most of this work has been done successfully with machine learning and deep learning methods (deep learning being a subset of machine learning). Five different machine learning algorithms have processed this new dataset to discover the most accurate method: Decision Tree, Naïve Bayes, Extreme Gradient Boosting, Logistic Regression, and Random Forest. These models are briefly described in this section.
3.3.1 Decision Tree (DT)
DT is a robust machine learning algorithm that can classify and predict data [12]. Each internal node acts as a test criterion that routes a feature vector onward, and the terminal nodes provide a predicted class or prediction value. DT works well for a small number of class labels, but less well when there are many classes and little training data. Additionally, the computational cost of training DTs can be significant.
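As a minimal illustration of the node-as-test idea (not the paper's actual model), a depth-1 "stump" routes a record through a single threshold test; the feature, threshold, and leaf labels below are all hypothetical:

```python
def stump_predict(mean_radius, threshold=1.0):
    """A one-node decision tree: the root tests one feature, leaves return labels."""
    if mean_radius > threshold:   # internal node: test criterion
        return "malignant"        # hypothetical leaf label
    return "benign"               # hypothetical leaf label

print(stump_predict(1.3))  # -> malignant
```

A full decision tree simply stacks many such tests, with each path from root to leaf encoding one conjunction of feature conditions.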
3.3.2 Random Forest (RF)
Many different paths can be explored in RF. The number of trees in the forest has a direct bearing on the outcome: the more trees, the more accurate the results tend to be. The base classifier in RF is typically C4.5 or J48. Bagging combined with random feature selection for decision trees was proposed by Breiman in 2001 [13]. RF is a supervised classifier.
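The forest's final decision is a majority vote over its trees. A minimal sketch of that aggregation step, with hypothetical per-tree predictions standing in for real trained trees:

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Majority vote over the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical outputs from five trees for one patient record
votes = ["malignant", "benign", "malignant", "malignant", "benign"]
print(forest_vote(votes))  # -> malignant
```

This is why more trees help: each tree's errors are partly independent, so the vote averages them out.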
3.3.3 Extreme Gradient Boosting (XGBoost)
XGBoost is an ensemble decision-tree technique employed within gradient-boosting frameworks. Individual decision trees are simple to visualize and understand, but developing an intuitive understanding of this newer generation of tree-based ensemble algorithms can be challenging [14].
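The core boosting idea is that each new weak learner fits the residual error of the current ensemble. A toy sketch of that loop, with the weak learner reduced to a single constant (real gradient boosting fits a tree at each round); the `rounds` and `lr` values are illustrative assumptions:

```python
def boost_constants(y, rounds=3, lr=0.5):
    """Tiny gradient-boosting sketch for squared loss: each 'weak learner'
    is just the mean of the current residuals, added with a learning rate."""
    pred = [0.0] * len(y)
    for _ in range(rounds):
        residuals = [target - p for target, p in zip(y, pred)]
        step = sum(residuals) / len(residuals)   # weak learner: a constant fit
        pred = [p + lr * step for p in pred]     # shrink the step by lr
    return pred

print(boost_constants([4.0, 4.0], rounds=3, lr=0.5))  # -> [3.5, 3.5]
```

Each round halves the remaining error here (0 → 2.0 → 3.0 → 3.5), which is exactly the sequential error-correction behavior that makes boosted ensembles harder to visualize than a single tree.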
3.3.4 Naïve Bayes (NB)
Naïve Bayes is a classifier that assumes each feature is influenced only by the class [15]; as a result, each feature is merely a child of the class. NB is appealing because it provides an explicit and solid theoretical foundation, ensuring the best possible induction from a given set of assumptions. In many real-world examples the assumption of total independence of the features given the class is violated, yet even in the wake of such violations NB has proven extraordinarily robust. Thanks to its straightforward structure, NB is quick, easy to deploy, and successful, and it is useful for high-dimensional data since each feature's probability is evaluated separately [16].
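The "each feature is a child of the class" structure means the unnormalized posterior is the class prior multiplied by the per-feature likelihoods. A sketch with made-up probability tables (the priors, feature names, and likelihood values are all invented for illustration):

```python
def nb_classify(priors, likelihoods, observed_features):
    """Pick the class maximizing P(c) * product of P(x_i | c) over the features."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed_features:
            score *= likelihoods[cls][feature]   # independence assumption
        scores[cls] = score
    return max(scores, key=scores.get)

priors = {"benign": 0.56, "malignant": 0.44}       # hypothetical class priors
likelihoods = {
    "benign":    {"large_radius": 0.2, "rough_texture": 0.3},
    "malignant": {"large_radius": 0.8, "rough_texture": 0.7},
}
print(nb_classify(priors, likelihoods, ["large_radius", "rough_texture"]))  # -> malignant
```

Because each likelihood is estimated independently, the table stays small even when the number of features grows, which is the high-dimensional advantage noted above.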
3.3.5 Logistic Regression (LR)
Logistic regression is a supervised learning classification approach in which the input X has discrete impacts on the classification problem's target variable (output) y. Logistic regression is, in fact, a regression model: it constructs a regression method to forecast the probability that a data input falls into the "1" category [23]. Using logistic regression, classification challenges such as cancer detection can be addressed quickly.
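The probability forecast is the sigmoid of a linear combination of the inputs. A minimal sketch; the weights and bias below are invented for illustration, not fitted coefficients:

```python
import math

def predict_proba(x, weights, bias):
    """Logistic regression: sigmoid of w.x + b gives P(class = 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients; a probability above 0.5 would be classified as "1"
p = predict_proba([1.4, 0.9], weights=[2.0, -1.0], bias=-0.5)
print(round(p, 3))
```

Training consists of choosing `weights` and `bias` to maximize the likelihood of the observed 0/1 labels; prediction then thresholds the returned probability.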
3.4 Experimental Setup

All five machine learning algorithms were implemented with a training and testing phase. The dataset was split, with distinct portions assigned to training and testing the model: 80% of the data was used for training and 20% for testing. The experiment was run in a Jupyter notebook on Python 3 with 12 GB RAM and an Intel Core i5 10th-generation CPU, using the scikit-learn library together with pandas, TensorFlow, Matplotlib, and Keras.
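The 80/20 split and training loop can be sketched with scikit-learn. This is a hedged reconstruction, not the paper's actual code: the hospital dataset is not public, so synthetic data of the same shape (456 records, 5 numeric features) stands in for it, and sklearn's `GradientBoostingClassifier` stands in for XGBoost so the sketch needs no extra package. All parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 456-record, 5-feature hospital dataset
X, y = make_classification(n_samples=456, n_features=5, n_informative=4,
                           n_redundant=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)  # 80% training / 20% testing

models = {
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "GB (XGBoost stand-in)": GradientBoostingClassifier(random_state=42),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2f}")
```

On the real dataset, each model's accuracy, precision, recall, and F1 would be read off this same loop via `classification_report`.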
3.5 Performance Measurement Unit

Training and generalization errors are the two most common types of error. Increasing model complexity helps reduce training error, since a more complex model lowers the training error rate; generalization error can be analyzed through the bias–variance decomposition (bias + variance). When a drop in training error leads to a rise in test error, the model is overfitting. Accuracy, precision, recall, and F1-score can be used to evaluate the performance of each classification system. Various authors have employed a diverse variety of metrics to evaluate their models' efficacy: while the bulk of the research used several indicators, a small number of studies relied on a single statistic. In this work, accuracy, precision, recall, and F1-score are considered, as these four metrics are well suited to evaluating prediction models. Accuracy relates to the capability to correctly identify and classify instances [18]. Here TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. Equation 1 shows the mathematical expression of accuracy:

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (1)

For statistical analyses, precision is defined as the number of observed positive events divided by the total number of expected positive events [17]. Equation 2 shows the mathematical expression of precision:

Precision = TP / (TP + FP)    (2)

The model's recall measures how well it can identify those people who have cancer [17]. Equation 3 shows the mathematical expression of recall:

Recall = TP / (TP + FN)    (3)

Because it relies on both precision and recall, the F1-score is referred to as their harmonic mean. Equation 4 expresses the F1-score [17]:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (4)
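Equations (1)–(4) can be computed directly from the four confusion counts. The counts below are invented for illustration, not results from this study:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 2 FN
acc, prec, rec, f1 = metrics(tp=40, tn=45, fp=5, fn=2)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Note that F1, being a harmonic mean, always sits at or below the arithmetic mean of precision and recall, and is dragged toward whichever of the two is smaller.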
4 Experimental Evaluation

A total of five machine learning methods were applied to this dataset. Comparing the performance of one algorithm to another, only tight differences are found. Based on accuracy, RF and XGBoost perform better than the other three algorithms: RF and XGBoost achieved the best accuracy at 94%, whereas NB and LR each achieved 93%, and DT achieved 91%. The best precision came from NB and LR, although in terms of overall accuracy NB and LR jointly hold third place. On the other hand, XGBoost and RF provide the best recall among the algorithms, 0.98 and 0.97 respectively. DT showed the lowest performance in this work. Table 2 presents the classification report of the five machine learning algorithms, with each method evaluated on two classes: benign and malignant. Figure 5 shows the accuracy comparison of the five applied algorithms: DT performs worse than all the other algorithms, NB and LR perform equally well, and RF and XGBoost perform best in the AUC comparison. A classification algorithm's performance can be summarized easily using a confusion matrix. Even when a dataset has just two classes, plain accuracy can be deceiving if the number of observations per class is uneven; calculating a confusion matrix (CM) gives a better understanding of how accurate the classification model is [19]. Figures 6, 7, 8, 9 and 10 show the five confusion matrices for the applied algorithms, where the x-axis states the predicted label and the y-axis the true label.

Table 2 Comparison of classification report among five algorithms

| Algorithm | Class     | Precision | Recall | F1 score | Accuracy |
|-----------|-----------|-----------|--------|----------|----------|
| DT        | Benign    | 0.91      | 0.88   | 0.89     | 0.91     |
|           | Malignant | 0.91      | 0.94   | 0.93     |          |
| RF        | Benign    | 0.96      | 0.90   | 0.92     | 0.94     |
|           | Malignant | 0.93      | 0.97   | 0.95     |          |
| NB        | Benign    | 0.83      | 1.00   | 0.91     | 0.93     |
|           | Malignant | 1.00      | 0.89   | 0.94     |          |
| XGBoost   | Benign    | 0.98      | 0.88   | 0.92     | 0.94     |
|           | Malignant | 0.92      | 0.98   | 0.95     |          |
| LR        | Benign    | 1.00      | 0.83   | 0.91     | 0.93     |
|           | Malignant | 0.89      | 1.00   | 0.94     |          |
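The confusion-matrix computation described above can be sketched in plain Python; the labels and predictions below are hypothetical, not drawn from this study's results:

```python
def confusion_matrix(y_true, y_pred, labels=("benign", "malignant")):
    """Rows index the true class, columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        cm[index[true]][index[pred]] += 1
    return cm

y_true = ["benign", "benign", "malignant", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant", "benign"]
print(confusion_matrix(y_true, y_pred))  # -> [[1, 1], [1, 2]]
```

The diagonal holds the correct predictions (TN and TP for a benign/malignant encoding), and the off-diagonal cells are exactly the FP and FN counts that feed Eqs. (1)–(4).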
Fig. 5 AUC comparison of five machine learning algorithms
Fig. 6 Confusion matrix of DT
5 Discussion

The majority of current research in this domain is devoted to improving the ability to predict the incidence of breast cancer, but few works have used a newly collected dataset. Machine learning algorithms were applied to see whether they could predict cancer with the best possible accuracy, and the classification strategy worked effectively in this work. It is essential to compare this work with others to present its contribution to the global community.
Fig. 7 Confusion matrix of RF
Fig. 8 Confusion matrix of NB
The main goal of this work was to find the machine learning techniques that can predict breast cancer with maximum performance. Because breast cancer prediction is such a pressing problem, there is a lot of work in this domain, but most of it has been done with a particular two or three openly accessible datasets. For this reason, the published differences between algorithms are almost uniformly small, and the AUC scores are quite similar across various published works. The UCI machine learning repository and the Wisconsin Diagnostic dataset are the datasets common to most of the work, and those works generally report accuracy above 90%. This research was compared against the various AUC figures reported for breast cancer prediction.

Fig. 9 Confusion matrix of XGBoost

Fig. 10 Confusion matrix of LR

This work found its best accuracy, 94%, with the RF and XGBoost algorithms. After preprocessing all the data, we performed feature extraction and then applied the algorithms. This work applied five machine learning algorithms to a newly collected dataset, which is the major difference from other published work. In terms of the newly collected dataset, we found the best accuracy with the Random Forest and XGBoost algorithms; the accuracy could be higher in the future with a more accurate and balanced dataset.
6 Conclusion

In Bangladesh, breast cancer is the most dangerous disease for women, ranking at the top in death ratio. Several machine learning and data mining techniques are used in medical analysis, and building classifiers for medical diagnostics that are both accurate and computationally efficient poses a significant challenge for data mining and machine learning researchers. This work applied five leading machine learning algorithms, DT, RF, XGBoost, NB, and LR, to predict breast cancer on a new dataset collected from three Bangladeshi hospitals. After implementing the model, this work achieved its best accuracy, 94%, with the RF and XGBoost algorithms. We compared the findings with those of previous research and found that this approach performed well. This research relied heavily on its dataset and methodology, and it has some limitations: collecting high-quality new data under pandemic circumstances was difficult, and more data could be collected. Nevertheless, in terms of the new dataset, the outcome is satisfactory. More efficient training and preprocessing approaches should be employed to achieve better results, and increasing the sample size of the dataset in the future will improve the accuracy and effectiveness of this study.
References

1. T.J. Key, P.K. Verkasalo, E. Banks, Epidemiology of breast cancer. Lancet Oncol. 2(3), 133–140 (2001)
2. U.S. Cancer Statistics Working Group, United States Cancer Statistics: 1999–2008 Incidence and Mortality Web-based Report (Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute, Atlanta, GA, 2012)
3. V. Chaurasia, S. Pal, Data mining techniques: to predict and resolve breast cancer survivability. Int. J. Comput. Sci. Mob. Comput. (IJCSMC) 3(1), 10–22 (2014)
4. A. Djebbari, Z. Liu, S. Phan, F. Famili, An ensemble machine learning approach to predict survival in breast cancer. Int. J. Comput. Biol. Drug Des. 1(3), 275–294 (2008)
5. S. Aruna, S.P. Rajagopalan, L.V. Nandakishore, Knowledge based analysis of various statistical tools in detecting breast cancer. Comput. Sci. Inform. Technol. 2, 37–45 (2011)
6. A.F.M. Agarap, On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset, in Proceedings of the 2nd International Conference on Machine Learning and Soft Computing (2018), pp. 5–9
7. V. Chaurasia, S. Pal, B. Tiwari, Prediction of benign and malignant breast cancer using data mining techniques. J. Algorithms Comput. Technol. 12(2), 119–126 (2018)
8. N. Fatima, L. Liu, S. Hong, H. Ahmed, Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access 8, 150360–150376 (2020)
9. A. Toprak, Extreme learning machine (ELM)-based classification of benign and malignant cells in breast cancer. Med. Sci. Monitor Int. Med. J. Exp. Clin. Res. 24, 6537 (2018)
10. D.S. Jacob, R. Viswan, V. Manju, L. PadmaSuresh, S. Raj, A survey on breast cancer prediction using data mining techniques, in 2018 Conference on Emerging Devices and Smart Systems (ICEDSS) (IEEE, 2018), pp. 256–258
11. T. Padhi, P. Kumar, Breast cancer analysis using WEKA, in 2019 9th International Conference on Cloud Computing, Data Science and Engineering (Confluence) (IEEE, 2019), pp. 229–232
12. T. Thomas, N. Pradhan, V.S. Dhaka, Comparative analysis to predict breast cancer using machine learning algorithms: a survey, in 2020 International Conference on Inventive Computation Technologies (ICICT) (IEEE, 2020), pp. 192–196
13. F. Livingston, Implementation of Breiman's random forest machine learning algorithm. ECE591Q Mach. Learn. J. Paper, 1–13 (2005)
14. R. Mitchell, E. Frank, Accelerating the XGBoost algorithm using GPU computing. PeerJ Comput. Sci. 3, e127 (2017)
15. M. Shalini, S. Radhika, Machine learning techniques for prediction from various breast cancer datasets, in 2020 Sixth International Conference on Bio Signals, Images, and Instrumentation (ICBSII) (IEEE, 2020), pp. 1–5
16. S. Kabiraj, M. Raihan, N. Alvi, M. Afrin, L. Akter, S.A. Sohagi, E. Podder, Breast cancer risk prediction using XGBoost and random forest algorithm, in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (IEEE, 2020), pp. 1–4
17. S. Marne, S. Churi, M. Marne, Predicting breast cancer using effective classification with decision tree and k-means clustering technique, in 2020 International Conference on Emerging Smart Computing and Informatics (ESCI) (IEEE, 2020), pp. 39–42
18. Y. Khourdifi, M. Bahaj, Applying best machine learning algorithms for breast cancer prediction and classification, in 2018 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS) (IEEE, 2018), pp. 1–5
19. M.S. Yarabarla, L.K. Ravi, A. Sivasangari, Breast cancer prediction via machine learning, in 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (IEEE, 2019), pp. 121–124
20. P.S.S. Varma, S. Kumar, K.S.V. Reddy, Machine learning based breast cancer visualization and classification, in 2021 International Conference on Innovative Trends in Information Technology (ICITIIT) (IEEE, 2021), pp. 1–6
21. A. Sharma, S. Kulshrestha, S. Daniel, Machine learning approaches for breast cancer diagnosis and prognosis, in 2017 International Conference on Soft Computing and its Engineering Applications (icSoftComp) (IEEE, 2017), pp. 1–5
22. S. Sharma, A. Aggarwal, T. Choudhury, Breast cancer detection using machine learning algorithms, in 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS) (IEEE, 2018), pp. 114–118
23. G.I. Webb, E. Keogh, R. Miikkulainen, Naïve Bayes. Encyclopedia Mach. Learn. 15, 713–714 (2010)
24. T. Vijayakumar, Posed inverse problem rectification using novel deep convolutional neural network. J. Innov. Image Process. (JIIP) 2(03), 121–127 (2020)
25. A.P. Pandian, Review on image recoloring methods for efficient naturalness by coloring data modeling methods for low visual deficiency. J. Artif. Intell. 3(03), 169–183 (2021)
A Comparative Review on Image Analysis with Machine Learning for Extended Reality (XR) Applications

P. Vijayakumar and E. Dilliraj
Abstract Progress in medicine, Industry 4.0, and training requires more user interaction with data in the real world. Extended reality (XR) can be conceptualized as an intelligent advancement and a capable data-collection tool suitable for remote experimentation in image processing. The technology involves Head Mounted Devices (HMDs) with built-in functionality such as data collection, portability, and reproducibility. This article surveys the different methodologies used for three-dimensional (3D) interaction with particular industrial data sets, in order to refine systems and uncover bugs in machinery. For critical medical issues, the technology can give the medic the comfort of a quick diagnosis, and educators' use of video animation is a rising trend. Important methods for various applications include the improved Scale Invariant Feature Transform (SIFT), Block Orthogonal Matching Pursuit (BOMP), the Oriented FAST and Rotated BRIEF (ORB) feature descriptor, Kanade-Lucas-Tomasi (KLT) tracking, and the Semi-Global Block Matching (SGBM) algorithm. With a high-speed real-time camera, the position-recognition accuracy for a moving object averages less than 65.2% due to noise interference in depth consistency, so further optimization of the depth-estimation algorithm is needed. The processing time of a target tracking system is high (10.670 s) and must be reduced to increase the performance of real-time object motion tracking. XR is a key innovation that will facilitate a paradigm shift in the way users interact with information and has only recently been recognized as a feasible solution to many basic requirements.
P. Vijayakumar (B) · E. Dilliraj SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamilnadu, India e-mail: [email protected] E. Dilliraj e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_24
1 Introduction

Extended Reality is a "catchall" term that includes all the reality technologies, VR, AR, and MR, that enhance or replace our view of the world. Extended reality focuses on improving outcomes for manufacturers, medics, and instructors in education in real-world scenarios. The advancement of time-critical communications over 5G networks makes use of XR for the edge cloud and enhances the user experience [22]. Remote techniques have already been deployed successfully for non-XR research and appear to bring benefits such as more straightforward participant recruitment, reduced recruitment cost, and widened diversity, without introducing significant biases. Nevertheless, there is still a lack of research on the degree to which remote XR experimentation can be, and has been, used to leverage the unique advantages of both XR (natural control, sensory illusions, data collection, replication) and remote methods (interest, practicality, cost savings), as well as the potential effect of their combined constraints. An overview of XR researchers' experiences and opinions concerning remote XR investigation could therefore help us understand how these apply at the current time and grasp the key areas for future improvement in this field [27]. With the expanding availability of consumer XR devices (estimates show 0.5 million high-end units sold in 2020, with XR HMD sales in 2025 expected to rise to 43,500,000), health and safety concerns arise around lab activities, particularly for research involving Head Mounted Devices (HMDs). This appears to be a significant opportunity to understand the conceptions around remote research from specialists who use XR technologies [22].
1.1 Virtual Reality

Virtual reality is generally used outside the lab, which is also where it is typically experienced (e.g., by home users and in gaming). For AR, a greater degree of familiarity is required for users to work with the AR environment: since the interaction between data and user needs a User Interface (UI), users must have knowledge of the UI [22].
1.2 Augmented Reality

Augmented reality is a technology that intelligently integrates virtual data with the real world, using computer-generated text, pictures, three-dimensional models, audio, motion pictures, and other synthetic data to simulate content and relate it to present reality. These two sorts of information complement each other to achieve "enhancement" of the real world [23]. AR can be used to visualize data from many sensors simultaneously, overlaying relevant and noteworthy information over environments through a headset. Mobile phones make AR easily accessible in the field, and with simpler and more open use, a wide range of applications (beyond video gaming) becomes more practical. In summary, AR is an upcoming technology that will significantly affect numerous applications across a wide variety of fields [25]. It is not just mobile applications and retail where AR and ML are having an effect: research is ongoing into how they can be used in surgical procedures and medical care. We have previously discussed how virtual reality (VR) is making waves in medicine, but ML and AR could advance this considerably further and continue to have a genuine effect. In addition, AR and VR add another degree of immersion to training environments, and machine learning allows for more in-depth training not only in medicine but in education, the military, and beyond; it can personalize and adapt training to enhance performance. With all this said, augmented reality has clearly made considerable progress since it first appeared in the tech market. From phone applications like Snapchat to serious use cases, AI could create richer experiences for those using AR and have a significant effect on the way we use technology.
1.3 Machine Learning

Machine learning comprises many varieties of algorithms that extend help to humans in everyday activities. In industry in particular, these technologies reduce manual error in the identification of faults in organization-level machinery [32]. Such identification can be done at very high computational speed with the help of ML algorithms, providing high recognition accuracy for object detection and tracking in industry [30]. To find changes in the binary values representing image data, such systems use algorithms that detect spectral, temporal, and spatial constraints to identify objects [31].

Supervised Learning—the classic example is a classification system, which produces the output of a system with respect to the desired input given by a human (or machine).

Unsupervised Learning—the inputs for these systems are not pre-defined by the user or any other entity; the system's observations are derived from the effects captured from its activities.

Semi-supervised Learning—combines labeled (marker-based) and unlabeled (non-marker-based) data to execute an exact system functionality or classification in real time.
Reinforcement Learning—the algorithms here learn the functional output of a system and generate new outputs from feedback inputs. It is mostly used in closed-loop systems, especially from the perspective of industrial applications.

Multitask Learning—a sub-field of machine learning that aims to solve numerous different tasks simultaneously by exploiting the similarity between them. This can improve learning efficiency and also act as a regularizer.

Ensemble Learning—the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is principally used to improve the performance of a model or to reduce the likelihood of an unfortunate selection of a poor one; other uses include assigning a confidence to the decision made by the model, selecting optimal features, data fusion, incremental learning, non-stationary learning, and error correction.

Instance Based Learning—a term from classification and regression methods: the training set for the model is based on the similarity of the queries asked by the user, and the output is produced based on that similarity.

Neural Network—a series of algorithms that recognizes an input data set and learns the relation between the input and the trained model in the system's memory, in a manner similar to the processing of the human brain. Figure 1 depicts the classification of algorithms present in ML [39].
1.4 Deep Learning

Deep Learning has recently provided good support for defect detection in production and strong security applications in the Internet of Things. The method is very well suited to high-accuracy image recognition [26]. These techniques are self-organizing neural networks, so they can be applied where system complexity is high. The basic deep neural network families are [38]:
• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)
• Restricted Boltzmann Machine (RBM).
More research on medical image analysis could be carried out with clinicians; a lack of data from clinicians prevents researchers from building automated diagnosis systems [38].
A Comparative Review on Image Analysis with Machine Learning …

Fig. 1 Classification of machine learning algorithms. The taxonomy shown: supervised learning (Decision Tree, Naïve Bayes, Support Vector Machine (SVM)); unsupervised learning (Principal Component Analysis, generative models); semi-supervised learning (transductive SVM, self-training); reinforcement learning; multitask learning; ensemble learning (boosting, bagging); instance-based learning (k-nearest neighbor); and neural networks (supervised, unsupervised, and reinforced neural networks)
1.5 Motion Tracking Estimation

Motion tracking estimation is the process of finding real-time motion vectors that map images from 2D into other representations proposed by researchers. Block motion estimation algorithms are the most prominent in real-time applications. To reduce processing time in motion tracking, the following algorithms have been proposed [29]: fast full-search block matching, 3-step search, simple and efficient search, 4-step search, sub-pixel motion estimation using phase correlation, fast block-based true motion estimation using distance-dependent thresholds, variable-shape search, adaptive rood pattern search, simplified block matching for fast motion matching, diamond search, etc.
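As a sketch of the block-matching idea underlying these algorithms, a brute-force full search over a small window can be written as follows (the block size, search radius, and toy frames are illustrative assumptions; the fast variants listed above differ only in how they prune this search):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(
        abs(a - b) for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

def get_block(frame, y, x, n):
    """Extract the n x n sub-block whose top-left corner is (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def full_search(prev, curr, y, x, n=2, p=2):
    """Find the motion vector (dy, dx) whose n x n block in `prev`, within a
    +/-p window, best matches the block of `curr` at (y, x)."""
    target = get_block(curr, y, x, n)
    best = (float("inf"), (0, 0))
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= len(prev) - n and 0 <= xx <= len(prev[0]) - n:
                cost = sad(get_block(prev, yy, xx, n), target)
                best = min(best, (cost, (dy, dx)))
    return best[1]

# A bright 2x2 patch moves one pixel right and one pixel down between frames
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        prev[1 + i][1 + j] = 255
        curr[2 + i][2 + j] = 255

print(full_search(prev, curr, 2, 2))  # → (-1, -1): block came from one up-left
```

The three-step and diamond search algorithms mentioned above trade this exhaustive window scan for a handful of coarse-to-fine probes, which is where the processing-time reduction comes from.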
1.6 Image Processing

Image processing is the conversion of images into digital form so that operations such as enhancement, or extraction of a small portion of an image, can be applied. Digital images in any application can be characterized by qualities such as illumination, color differentiation, entropy, and Signal-to-Noise Ratio (SNR). The histogram is the simplest image-processing technique: it shows the distribution of pixel values, and the quality of an image is assessed using its grayscale histogram. Histogram equalization is used to compare many images acquired under different conditions; the method works by transforming the histogram until it becomes smooth, uniform, and balanced [40].
Image Enhancement: a technique that improves image quality with the support of computer software, concentrating on point and local operations on images. It has two classes, the spatial domain and the transform domain. Spatial-domain methods work directly on pixel values, whereas transform-domain methods concentrate on Fourier and other advanced transforms.
Image Segmentation: the partitioning of an image, whose main purpose is simpler inspection and interpretation with improved image quality. It is also used to track image objects in real time with the help of image boundaries, and it is the main platform for creating 3-dimensional views of objects in many applications. Segmentation divides into two types: local segmentation, which concentrates on a small number of pixels in an image, and global segmentation, which focuses on the full image. By method, segmentation is classified as follows:
• Region method
• Boundary method and
• Edge method.
From the perspective of Industry 4.0, Technical and Vocational Education and Training (TVET) for business employees is fundamental to avoiding failures. XR-based Human-Robot Cooperation (HRC) training makes employees entirely comfortable working with the machinery, and recognizing basic medical problems with the assistance of XR development lets specialists analyze issues as quickly as possible.
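The histogram equalization described above can be sketched in plain Python (a minimal illustration; production pipelines would typically rely on a library such as OpenCV):

```python
def equalize(img, levels=256):
    """Histogram-equalize a grayscale image given as a list of pixel rows."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function of the pixel values
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    # Map each gray level so the output histogram is approximately flat
    lut = [round((cdf[v] - cdf_min) / max(n - cdf_min, 1) * (levels - 1))
           for v in range(levels)]
    return [[lut[p] for p in row] for row in img]

# A low-contrast image clustered in 100..103 spreads to the full 0..255 range
img = [[100, 101], [102, 103]]
print(equalize(img))  # → [[0, 85], [170, 255]]
```

The stretched output illustrates why equalization makes images acquired under different illumination comparable: the mapping depends only on each image's own value distribution.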
XR technologies enable students in engineering education to fully comprehend complex ideas in challenging subjects through the stereoscopic representation of 3D models; likewise, they enable students in clinical education to grasp complex concepts in subjects such as clinical anatomy. Access to clinical training then becomes broad-based and flexible. The steadily growing interest in student-centered teaching methods has secured XR an advanced position in the clinical curriculum as a goal. However, the use of XR (virtual, augmented, or mixed reality) in clinical course curricula is less visible than in clinical-skills practice. Experiential learning depends on the creation of knowledge and meaning from real experiences. In clinical education, the term is most commonly applied to 'learning on the job', primarily accessible to graduates or near-graduates; at lower educational levels, experiential learning is expensive and operationally challenging.
2 Related Works

Figure 2 shows the different applications reviewed in this article to elaborate extended reality applications for the future.
2.1 Medical Perspective

To replace the traditional medical inspection of patients that requires a clinician's physical presence, there is a need for technology-based monitoring of the patient's condition without any physical contact [37]. XR-based patient monitoring is a promising technique for finding abnormalities of the kidney, liver, lungs, heart, lesions, eyes, etc. This article presents a comparative analysis of medical inspection using reality technology and image analysis for various diseases.
2.1.1 Diagnosis of Various Diseases
Abnormalities of various organs in the human body can be analyzed through segmentation, with rendering and filtering processes used to remove noise from Magnetic Resonance Imaging (MRI) data sets [5]. To identify soft tissues inside the human body and diagnose lesions, Computed Tomography (CT) data sets are frequently used. Augmented Reality-guided biopsy helps the
Fig. 2 Different applications discussed in this article
physicians to accurately check the lesion position in a patient: they can align the biopsy needle to the lesion without disturbing bones or other regions beneath the patient's skin [6]. Automatic medical image segmentation and its representation in augmented reality are addressed by a complete pipeline that learns to extract anatomical models from medical images and prepares them to be precisely visualized on a stereoscopic head-mounted display [7].
To handle data sets from real-time 3D sensors such as Red-Green-Blue-Depth (RGB-D) or Time-of-Flight (TOF) cameras, a method for registration of unstructured point clouds first derives invariant shape-context descriptors for 3D data association. To replace the Fast-Marching approach, a vertex-oriented triangle-propagation technique is introduced to calculate the 'angle' and 'radius' in descriptor construction, so that matching precision in bending and folding regions is greatly refined [8]. To address abnormal human situations, an intelligent video surveillance system based on small embedded devices has been proposed; using these small devices, fire, patient falls, loitering, and intruders can be detected [9].
2.1.2 Patient Monitoring
For high-speed real-time motion tracking, one model uses feature points and feature lines as scene elements, combining multiple techniques to build hybrid features and perform parameter estimation [1]. Another establishes a target-perception model, introduces sparse representation into the particle-filter framework, and updates the sparse representation coefficients so that the criterion reaches the ideal solution, guaranteeing the accuracy of target tracking; the success rate of target coverage is then determined through simulation experiments [2]. For human motion feature extraction and descriptor computation, a feature-point detection and positioning technique suited to smart terminals has been proposed in a targeted way, solving the problem of confusion between similar structures [3]. Advances in material processing and printing technologies based on aerosol-jet printing enable reliable manufacturing of skin-like sensors, while a flexible hybrid circuit based on elastomer and chip integration allows comfortable integration with a user's head; analytical and computational study of a data-classification algorithm provides a highly precise tool for continuous detection and classification of eye movements [4] (Table 1).
Table 1 Medical perspective review

Sun et al. [1]
Key methods applied and results achieved: Scale Invariant Feature Transform (SIFT) feature-matching target detection and tracking algorithm used to improve the tracking stability of a camera. An average positioning accuracy of 65.2% with augmented-reality high-speed real-time tracking is achieved for various objects (chess, fire, heads, office, pumpkin, etc.).
Challenges/future scope: due to region-based template matching, the system is not very robust to interference such as image affine transformation and depth-field transition.

Ma et al. [2]
Key methods applied and results achieved: a basic target-tracking algorithm based on sparse representation, with the Block Orthogonal Matching Pursuit (BOMP) image-reconstruction algorithm, used to maintain high tracking accuracy and strong stability. Signal-to-Noise Ratio (SNR) compared with the OMP algorithm: BOMP 28.463 vs. OMP 30.257. The target success coverage rate is 0.92 at maximum.
Challenges/future scope: the complexity of the algorithm makes the system less optimized.

Yue et al. [3]
Key methods applied and results achieved: Oriented FAST and Rotated BRIEF (ORB) feature descriptor and Kanade-Lucas-Tomasi (KLT) tracking algorithm. The average accuracy of human tracking is 94%.
Challenges/future scope: when the human is stationary, the system loses track of the object.

Mishra et al. [4]
Key methods applied and results achieved: a soft, wireless periocular electronic system that integrates a VR environment for home-based therapeutics of eye disorders. Preprocessing is done with a Band-Pass Filter (BPF) and feature extraction through signal processing. The maximum accuracy achieved is 91%, with 3 electrodes placed near the eye.
Challenges/future scope: to improve accuracy, an ensemble classifier with an embedded module for custom feature selection must be incorporated.

Vijayakumar et al. [5]
Key methods applied and results achieved: an image segmentation algorithm with an edge-preserving anisotropic diffusion filter used to remove noise from the image data set. The system achieves 100% accuracy in organ identification and 96% accuracy in identifying 30 abnormalities from image data sets.
Challenges/future scope: image communication using content-based image compression to minimize the memory consumption of the picture database, and memory compensation error.

Bettati et al. [6]
Key methods applied and results achieved: using 3D Slicer, thresholding, and Region-of-Interest restrictions, segmented volumes of bone and skin are extracted from the human body. Identification of soft-tissue/lung lesions shows an average error of 1.52 cm/1.12 cm without AR and 0.75 cm/0.62 cm with AR, measured from the centre of the tumour.
Challenges/future scope: automated model generation using deep learning and intensity thresholding to smooth the interaction and reduce the time needed by a clinician to set up the AR models for biopsy; HoloLens 2 can provide an additional clinical scenario to explore applications.

Cigánek et al. [7]
Key methods applied and results achieved: Convolutional Neural Network (CNN) with U-Net architecture used for automatic segmentation; accuracy of 0.959 on the training data and 0.884 on the test data.
Challenges/future scope: more training data is needed to improve precision.

He et al. [8]
Key methods applied and results achieved: a 3D cylindrical shape descriptor for registration of unstructured point clouds from a Time-of-Flight (TOF) camera. The maximum average error of the descriptor is 0.435 cm and the minimum average error is 0.037 cm; the Root Mean Squared Error (RMSE) of the TOF camera point-cloud data is 0.161 cm.
Challenges/future scope: to pursue adopting the descriptor as a deep-learning feature.

Kim et al. [9]
Key methods applied and results achieved: TensorFlow and AlexNet used for face recognition, with a Haar-like scheme to detect intruders; an HSV color-detection scheme for fire detection; an embedded module with Raspberry Pi, zRAM, etc. Performance: fall detection 93.54%, intruder detection 88.51%, fire detection 92.63%, loitering detection 80%.
Challenges/future scope: to optimize the learning files and reduce the processing time of the intelligent video surveillance.

Rajeswari and Ponnusamy [28]
Key methods applied and results achieved: classical machine-learning algorithms, linear regression and Support Vector Machine (SVM), used to predict diabetes; accuracies of 0.82 on the testing model and 0.75 on the training model were achieved.
Challenges/future scope: to increase the accuracy of the result, larger data sets can be handled with advanced ML algorithms.
2.2 Industry Perspective

Improving industrial activities by reducing human workload, inspecting machinery with improved safety, and providing technology-based instructions to operators handling high-risk machines are the most important challenges in Industry 4.0 implementation. This article therefore compares methodologies implemented in industries.
2.2.1 Autonomous Vehicles
An autonomous robot plays a vital role in industry, shifting machinery from one place to another. Dynamic 3D reconstruction of an object can be performed from multiple synchronized frame streams captured by handheld mobile devices [10]. To find the depth and width of an object, a mobile robot can use a single-image super-resolution method based on image gradient information [11], with GPS implemented for localization of the robot in an industrial environment [12] and a GPU device for large-scale optimization [13].
2.2.2 Fault Identification
Customized production is pushing industrial automation forward and demanding new tools to improve operators' decision-making [14]. To improve the quality of the items they produce, industries need visual inspection on the production/assembly line. A projector-based visual aid for spot-welding using AR helps achieve high accuracy during manual spot-welding, through which an organization can confirm product quality [15].
2.2.3 Assembly Guidance
Conventional assembly instructions are based on the user's cognitive requirements for assembly tasks and aim to improve the user's cognitive efficiency in physical tasks. The meaning of conventional assembly instructions is first restated, and their mathematical relation to AR assembly instructions is clarified. The meaning of AR assembly instructions at the information level is then given, conventional assembly instructions are categorized by this definition, and each category is explained. The roles of the old and new instructions were analyzed in typical assembly use cases. The data show significant differences in performance between the new and old instructions: the new instructions substantially improve the user's performance in assembly time and working experience (including pleasure, focus, confidence, natural intuition, feasibility, effectiveness, convenience, and understandability), using AR instructions at the information level and auxiliary renaming to achieve efficient and concise operation [18] (Table 2).
2.3 Education Perspective

According to many surveys on enriching students' skill sets with respect to industry requirements, the current education system cannot provide every student with sufficient knowledge for industry. Adopting virtual/augmented education lets students grasp the concepts of topics and enrich their skills for industry readiness through self-learning, and this collaborative learning helps educators promote peer exchange of information among students. This article surveys various reality-technology implementations for enriching students' skill sets.
2.3.1 Primary Education
With the emerging technologies of augmented reality (AR) and virtual reality (VR), the learning process in today's classroom is much more compelling and motivational. Overlaying virtual content onto the real world makes learning methods attractive and engaging for students while they perform exercises. AR methods make the learning process easy and fun compared with traditional techniques; these methods require focused learning and interactivity between the educational contents [19].
2.3.2 Higher Education
Virtual and Augmented Reality (VAR) technology is in the early stages of being adopted as a teaching platform in higher education. The technology can enable immersive learning in environments not normally physically accessible to students, via 3D models and interactive 360° videos. To date, adoption rates of VAR technology for teaching have not been well characterized across higher-education institutions, and there is a shortage of information on ideal VAR lab designs and cost per student [20]. Through mobile augmented-reality interaction, a library can keep abreast of the user's current information orientation and lay the foundation for the further establishment of a personal information knowledge base. At the same time, based on network interaction, it plays a good guiding role in the development of library document resources, guaranteeing their effective utilization [21] (Table 3).
Table 2 Industry perspective review

Bortolon et al. [10]
Key methods applied and results achieved: mobile-based dynamic-object 3D construction with on-board Simultaneous Localization and Mapping (SLAM) to estimate poses; higher flexibility and online adjustment through Network Time Protocol (NTP) synchronization. Average pixel error of the algorithm for video poses: easy mode 4.735 ± 1.195, medium mode 5.053 ± 1.072, hard mode 5.645 ± 1.316.
Challenges/future scope: a non-expert member may capture a dynamic image from an ineffective viewpoint, so the user interface should be improved to guide handling of the system.

Meng et al. [11]
Key methods applied and results achieved: a gradient information distillation network (GIDN), a light-weight and fast information distillation network (IDN). Average peak signal-to-noise ratio (PSNR) is 32.15 and the structural similarity index (SSIM) is 0.8947, compared to other networks.
Challenges/future scope: further enhancement of GIDN will improve the computing speed of the system.

Liu et al. [12]
Key methods applied and results achieved: visual Simultaneous Localization and Mapping (vSLAM) with a 2.5D map; the ORBSLAM framework (a typical feature-based monocular SLAM library) and the Direct Sparse Odometry (DSO) direct method to optimize the pose hypothesis and obtain the initial pose. Time performance for forward motion of the mobile robot averages 0.27 s/sequence in ORBSLAM and DSO to identify the depth of building blocks; for backward motion, the average is 0.29 s/sequence with ORBSLAM and 4.80 s/sequence with DSO.
Challenges/future scope: position mapping of the robot works only with structured building blocks; it does not support recognizing other objects such as trees, pedestrians, and industrial equipment, due to the influence of building-depth masking.

Cao et al. [13]
Key methods applied and results achieved: a novel structure-from-motion framework using parallel bundle adjustment to reduce computational cost in large-scale optimization. Computational costs of Reduced/Relative Bundle Adjustment (RBA) vs. Parallel Bundle Adjustment (PBA) were compared for different data sets: Dubrovnik 5.3746 s vs. 3.1632 s; Final 182.5367 s vs. 110.7825 s; Ladybug 38.1235 s vs. 17.983 s; Trafalgar 0.5928 s vs. 0.2614 s.
Challenges/future scope: to further improve efficiency and accuracy with Fast Incremental Structure from Motion.

Ojer et al. [14]
Key methods applied and results achieved: a Spatial Augmented Reality (SAR) system that guides operators in properly assembling electronic components. System Usability Scale (SUS) scores averaged 80 and 90 out of 100.
Challenges/future scope: to improve robustness against severe illumination changes and changed component orientations with a deep-learning approach.

Doshi et al. [15]
Key methods applied and results achieved: a projector-based Spatial Augmented Reality system used to assure spot-welding quality in industries. The average standard deviation (SD) with AR assistance for 19 panels with 114 spot welds is 1.94 mm, compared with 4.08 mm without AR for 45 panels with 270 spot welds; the method increases accuracy by 52% for trained operators.
Challenges/future scope: improving weld distribution within the 10 mm radial range.

Zhai and Chen [16]
Key methods applied and results achieved: depth calculation of an object using a semi-global block matching algorithm to realize tracking of virtual objects, with a Markov random field to enhance the depth map. The system works with a minimal error percentage at an experimental distance of 30 cm (2.19% error); the average running time for depth calculation is 0.50339 ms.
Challenges/future scope: using nonlinear parallax mapping to increase the depth range of a virtual object.

Marino et al. [17]
Key methods applied and results achieved: an Augmented Reality tool used for field experimentation with users in the industry, surveyed with the System Usability Scale (SUS) and NASA-TLX (Task Load Index) standard questionnaires. SUS scores averaged 86.56 for Group 1 (G1, engineers) and 85.94 for Group 2 (G2, factory workers). The overall cognitive workload reported by participants is very low: F(5) = 0.739, p = 0.599 for G1 and F(5) = 0.219, p = 0.953 for G2.
Challenges/future scope: to upgrade and stimulate the effectiveness of workers' skills during inspection activities.

Wang et al. [18]
Key methods applied and results achieved: a user-oriented classification method of assembly instructions, showing assembly properties in 2D and 3D graphics, with assembly guidance in animation and information-level visualization (ILV). Results were compared via questionnaire for Conventional Assembly Instructions (CAI) vs. New Assembly Instructions (NAI): 82.6% of workers agreed that NAI helps improve efficiency and resilience to problems; 86.9% of participants could easily identify the assembly intent implied in the task; 86.9% agreed the system helps improve the operating experience.
Challenges/future scope: the system should be enhanced with a human-computer interaction method for new assembly instructions.

Tripathi [24]
Key methods applied and results achieved: deep-learning-based segmentation of images providing good optimization and high computation speed; accuracy of 99.25% on training data and 100% on test data.
Challenges/future scope: optimization can be improved by improving the training and test images in the system.

Sungheetha [30]
Key methods applied and results achieved: 3D-based images converted to 2D for accurate object identification in a human-machine interaction environment; image recognition done through ML algorithms in an optimized way.
Challenges/future scope: lower complexity and reduced processing time are required in this system.

Ariansyah et al. [33]
Key methods applied and results achieved: a Head-Mounted Device (HMD) with an Augmented Reality (AR) Software Development Kit (SDK) for investigating the impact of information mode and interactive modality on user performance in an assembly task; 3D animation led to a 14% improvement in task-completion time over video instructions.
Challenges/future scope: precision optimization for tracking and registering virtual objects to real components in an assembly task is required to push the improvement beyond 14%.

Gao et al. [34]
Key methods applied and results achieved: quantitative performance is evaluated in 4 scenarios: (1) six vehicle-detection algorithms combined with the three-stage post-processing scheme or filtering by shape index; (2) seven post-processing schemes combined with ten algorithms; (3) ten algorithms combined with the best three post-processing schemes; (4) Fully Convolutional (FC)-DenseNet two-stage machine learning for small-object detection. The combined results show that the enhanced three-stage post-processing scheme achieves a mean average precision (mAP) of 63.9% for feature-extraction methods and 82.8% for the machine-learning approach.
Challenges/future scope: a 3D motion filter needs to be developed to increase the recognition accuracy of small objects.
2.4 Limitations and Issues

Multiple approaches have been proposed from the medical, industrial, and education perspectives using real-time image analysis with various algorithms for different applications. Even so, some flaws need to be addressed in the future, especially the following:
• Accuracy of target tracking is still challenging in real-time video streaming.
• Potential human-machine interoperability issues in streamlining assembly inspection in industries still require advancement from the perspective of Industry 4.0.
• Reduced training time and increased first-time fix rates.
Table 3 Education perspective review

Khan et al. [19]
Key methods applied and results achieved: handheld marker-based Augmented Reality (AR) technology used for teaching; students' learning achievements assessed through a Performance Evaluation Quiz (PEQ). Group A scored 90.92% with AR-based applications, Group B 70.02% with video-based presentation, and Group C 56.34% with traditional book-based teaching.
Challenges/future scope: AR applications should be designed with a proper curriculum and pedagogy, including smart assessment tests and learning games, to monitor and test students' learning opportunities and progress.

Marks and Thomas [20]
Key methods applied and results achieved: Oculus Rift headset technology used for 360° viewing of content in 3D models. Of the 295 students surveyed, 202 (68.5%) showed interest in using this technology for other units in the future.
Challenges/future scope: the most reported discomforts are headaches, dizziness, blurred vision, the weight of the HMDs, and that they did not fit over glasses. The effects of the pandemic have prompted a rapid transformation toward self-learning and better ways to utilize virtual and augmented reality technology.

Lu [21]
Key methods applied and results achieved: mobile Augmented Reality technology performing the task of loading a virtual scene to test the performance of a set of platform resources. The mobile push system is compared in 3 configurations: Pack A (no optimization), Pack B (some optimization), and Pack C (all optimizations). Time consumption: Pack A requires 1.9 s for loading user information and 2 s for downloading; Pack B 1.6 s for loading and 1.2 s for downloading; Pack C 1.5 s for loading and 0.7 s for downloading data from the library.
Challenges/future scope: optimization of the program and interface needs to improve when a large amount of data is incorporated into the system.

Kumar et al. [35]
Key methods applied and results achieved: an Augmented Reality Interactive Tabletop Environment (ARITE) system for studying the Arduino UNO, to help engineering students gain knowledge in an embedded-systems course. Students reported a 95% confidence level in understanding system functionality; their confidence was assessed through a quiz among the participants.
Challenges/future scope: preparing a 3D reconstruction of an embedded hardware setup to find defective/missing components in real time, based on laboratory-course experiments, is challenging.

Phade et al. [36]
Key methods applied and results achieved: in the Vuforia SDK, the Random Sample Consensus (RANSAC) algorithm is used to transform an object from the normal plane to a 3D model image. The accuracy of the AR framework was measured in three ways from the user's experience: components 85.71%, images 95.71%, board 91.43%.
Challenges/future scope: 3D reconstruction of object identification with enhanced marker labels is required to understand the concept of each component in detail.
• Requirement of remote specialist support for maintenance operators in industries.
• Improvement is required in knowledge reachability for individual students (Table 4).
3 Research Gap

The research gaps identified through the comparative analysis are listed below.
1. Depth image analysis algorithms require improvement to avoid noise interference during target tracking and to increase the accuracy of target recognition.
2. Motion tracking systems cannot provide good accuracy when a particular target is stationary.
3. Focus is needed on reducing the processing time of target navigation systems in real time.
4. Enhancements are required in industrial inspection systems to provide good robustness against different lighting conditions and product orientations during the assembly process.
5. Reducing the processing time for fault identification (missing hole, mouse bite, open/short circuit, etc.) in PCBs is still challenging.
6. The impact of processing time on adaptive modality to identify faults in robots (motion under no servo control, short circuit in a motor phase, arm position error, and collision) is high.
7. Preparing 3D reconstructions of hardware-based experiments in laboratory courses is still challenging for educators imparting knowledge to individual students.
Table 4 Summary of algorithms/methods, parameters and applications documented in the related works

Application: Medical
Algorithms/methods: improved Scale Invariant Feature Transform (SIFT); Block Orthogonal Matching Pursuit (BOMP); Oriented FAST and Rotated BRIEF (ORB) feature descriptor with Kanade-Lucas-Tomasi (KLT); edge-preserving anisotropic diffusion filter; 3D Slicer with threshold and Region-of-Interest-restricted segmentation; Convolutional Neural Network (CNN) with U-Net architecture; 3D cylindrical shape descriptor; TensorFlow and AlexNet; linear regression and Support Vector Machine (SVM)
Parameters: real-time video surveillance; target tracking; electrodes near the eyes; kidney; lung lesion; unstructured point cloud with Time-of-Flight (TOF) camera; face recognition; diabetes

Application: Industry
Algorithms/methods: Simultaneous Localization and Mapping on-board (SLAM); gradient information distillation network (GIDN); visual Simultaneous Localization and Mapping (vSLAM); Parallel Bundle Adjustment (PBA); Spatial Augmented Reality (SAR); Semi-Global Block Matching (SGBM) algorithm; deep-learning-based segmentation; Head-Mounted Device (HMD) with Augmented Reality (AR); Fully Convolutional (FC)-DenseNet
Parameters: object pose estimation; buildings; PCB inspection; spot-welding inspection; image segmentation and recognition; assembly task; small objects

Application: Education
Algorithms/methods: handheld marker-based Augmented Reality (AR); Oculus Rift headset technology for 360° viewing of content in 3D models; mobile Augmented Reality technology; Augmented Reality Interactive Tabletop Environment (ARITE); Random Sample Consensus (RANSAC) algorithm
Parameters: evaluation quiz; questionnaire-based survey; experiment-execution-based evaluation; user's experience
4 Proposed Methodology

1. Capture a sequence of images from a Head Mounted Device (HMD)/smartphone camera and generate an image dataset from the sequence of images (Fig. 3).
2. Pre-process the image dataset with a bi-quad filter and pass the outcome to feature extraction.
3. Image feature extraction is carried out with the help of Python programming.
4. Recognition of the object from the gray-scale image is performed using a software tool.
A Comparative Review on Image Analysis with Machine Learning …
325
Fig. 3 Proposed system architecture/flow diagram
5. Object recognition and classification of images in a separated manner is carried out using a Convolutional Neural Network (CNN).
6. Interaction with the image dataset is carried out using XR world-based hand-gesture markers.
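The capture → bi-quad pre-processing → feature extraction steps above can be sketched as follows. This is a minimal illustration only: the bi-quad coefficients and the histogram feature are assumptions for demonstration, not the authors' exact implementation.

```python
import numpy as np

def biquad_filter(x, b, a):
    # Direct Form I second-order IIR (bi-quad) filter over a 1-D signal.
    # a[0] is assumed to be normalised to 1.
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

def preprocess(gray):
    # Smooth each image row with an illustrative low-pass bi-quad
    # (coefficients chosen only for demonstration).
    b, a = [0.25, 0.5, 0.25], [1.0, 0.0, 0.0]
    return np.apply_along_axis(biquad_filter, 1, gray.astype(float), b, a)

def extract_features(img, bins=16):
    # A simple intensity-histogram feature vector, standing in for
    # the richer descriptors a CNN-based pipeline would compute.
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)
```

The filtered images would then be passed to the CNN classifier of steps 5-6; the histogram here merely marks where that feature-extraction stage sits in the pipeline.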
5 Motivation

This article mainly focuses on the removal of depth-of-field transition interference on high-speed real-time cameras, optimisation and improvement of the accuracy of moving-target tracking and positioning, modification of 3D sensor characteristics to improve the accuracy of results, adoption of deep-learning descriptors to improve 3D point-cloud tracking, 3D reconstruction over 5G networks with the help of specific algorithms for online camera calibration, and improvement of processing speed at a given image resolution. The proposed use of XR technology with an HMD will help to improve accuracy in the aforementioned terms.
6 Conclusion

The survey of these respondents suggests that XR research has the potential to be a useful research approach. However, it currently suffers from various limitations regarding data collection, system development and a lack of clarity around participant recruitment. It is predominantly used to study a participant's experience of an XR system in an artificial yet controlled setting (laboratory) using external data collection methods (surveys, cameras, and so on). An XR device with a set of properties for data collection is an essential requirement for finding suitable answers to research queries in real-world environments. To determine answers to these research queries, we have to work in the
backward direction, starting from the data collection already available on an XR device. Further, we must reconsider the research setting in order to develop a potential application, such as a home-based setting instead of the laboratory as the definitive place to develop XR device-based applications. This allows researchers to decide whether an XR-based system is a useful choice over a non-XR-based system.
SWOT Analysis of Behavioural Recognition Through Variable Modalities Abhilasha Sharma, Aakash Garg, Akshat Thapliyal, and Abhishek Rajput
Abstract In this era of technology, recognising human emotion to improve human–machine interaction has become very important. The applications of emotion recognition have continued to grow in the past few years, from healthcare and security to improving customer care service and enhancing the user experience. With increased innovation, the number of use cases in this area will only grow in the future. This forms the motivation for thoroughly analysing the market for recognition of human behaviour. The paper summarises the SWOT analysis of behavioural recognition through the use of variable modalities. It gives an overview of the currently employed techniques, which involve the extraction of features from text, audio and video. The main objective is to get an overview of the internal potential and limitations of this branch of machine learning. The analysis brings out the merits and demerits of all the techniques discussed.
A. Sharma · A. Garg (B) · A. Thapliyal · A. Rajput Department of Software Engineering, Delhi Technological University, Delhi, India e-mail: [email protected] A. Sharma e-mail: [email protected] A. Thapliyal e-mail: [email protected] A. Rajput e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_25

1 Introduction

Given contemporary trends, emotion recognition technology can be a fast-growing (and world-changing) subset of signal processing for years to come. The number of professionals needed in this field is growing, and many companies are searching for talented people who want to be a part of it. The modern era involves heavy use of artificial intelligence bots and assistants like Alexa, Siri, etc., which incorporate emotion classification using speech recognition. Thus the
developing nature of this machine learning domain and its ever-growing real-world applications are the phenomena that drive the motivation for this research work. The primary aim of the paper is twofold: one is the comparative analysis of various machine learning techniques to classify emotions based on multimodal signals; the second is the SWOT analysis of the discussed techniques to uncover the internal capability and limitations of the area. Hence this task is an overlap between the different areas of machine learning/deep learning and software engineering. The datasets used by the researchers include audio, video and text files which are used for feature extraction. The extracted features constitute the sets that are fed to a classification model. The reports from various models are analysed, along with the area's current applications, potential applications and the market outlook. These combined together form the basis of our SWOT analysis for an in-and-out study of the behavioural recognition field (Fig. 1).

Fig. 1 Workflow diagram of fusion of different modes (audio, text, video) and multimodal classification followed by their SWOT analysis
2 Multimodal Behavioural Recognition—An Overview

Emotion classification through multimodal signals has been an area of great interest. This paper discusses the different approaches to this machine learning problem. First is the research conducted by Zhang et al., where three separate models were created [1]. LSTM was used for audio and text signals, and for visual signals, images [9] and audio were separately synthesised by a DenseNet network and LSTM. It was evident from the results that accuracy increased by applying the model to all modes together. In the research by Rozgić et al., 10-fold cross-validation is performed [2]. For each split, textual, visual and speech [15] features were extracted, and a continuous feature vector was formed by concatenating them. A multiway SVM classifier is used, achieving a maximum accuracy of around 73% on "anger". Another novel research
was to generate an ensemble of trees, which performed better and gave a maximum accuracy of around 78% on the same output class. The approach followed by Zhalehpour et al. is that visual and audio features that belong to the same output class are likely to intersect, so they should be combined together [3]. Considering the research made in the recent past, certain visual frames are examined wherever the degree of facial expression is at its peak. The common finding across these studies is that all modes together perform with better accuracy than any single one. Table 1 depicts the comparative analysis of past research involving various datasets, feature extraction techniques and classification models. Feature extraction is an important part of training machine learning models as far as data from audio, video and text is concerned (Fig. 2).

Fig. 2 Feature extraction from dataset

Classification, prediction and recommendation algorithms mostly involve feature extraction to convert some other form of data into a model-understandable form. All the features obtained through different approaches are then combined to form a multimodal fusion layer, which is classified into output classes or emotional categories. The technique proposed in [1] states the collection of speech and textual data from the LSTM method and visual characteristics from DSCNN. All the feature vectors obtained from these models are designed to be 256-dimensional vectors, which are then classified at the feature level by using a fusion feature matrix. In contrast to this technique, [3] has used a decision-level fusion operation. All the probabilities obtained for each modality from the SVM classifier are multiplied for a feature vector, and classification into the final output class is determined by the highest-magnitude product. The feature-level fusion process is of great interest, as shown by Poria et al. [4].
The heterogeneous feature vectors obtained from all three modalities (text, audio and visual) are then concatenated with the help of multiple kernel learning, followed by cross-validation techniques. The multi-objective optimisation algorithm for the decision-level fusion of audio and visual modalities, proposed in [19], is an effective method to combine the two modalities.
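The two fusion strategies contrasted above can be sketched as follows. The vector sizes and class count are illustrative assumptions; the multiplicative rule mirrors the decision-level scheme described for [3], and the concatenation mirrors feature-level (early) fusion.

```python
import numpy as np

def feature_level_fusion(text_f, audio_f, video_f):
    # Early fusion: concatenate per-modality feature vectors
    # into one joint vector before a single classifier sees them.
    return np.concatenate([text_f, audio_f, video_f])

def decision_level_fusion(prob_per_modality):
    # Late fusion: multiply the per-modality class-probability
    # vectors; the class with the highest-magnitude product wins.
    fused = np.prod(np.stack(prob_per_modality), axis=0)
    return int(np.argmax(fused))
```

For example, with per-class probabilities [0.6, 0.4] (text), [0.3, 0.7] (audio) and [0.2, 0.8] (video), the products are [0.036, 0.224], so class 1 is chosen even though the text modality alone favours class 0.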
Table 1 Comparison study of various behavioural recognition techniques

S. No: 1
Author: Zhang et al.
Year of publication: 2020
Journal/conference reference: 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP)
Dataset used: IEMOCAP
Audio feature extraction: MFCC extraction technique
Text feature extraction: Used GloVe model for text segmentation
Video feature extraction: Used facial recognition to extract 150 vibrant features
Classification model (Speech): LSTM
Classification model (Text): Double-layer LSTM
Classification model (Video): DenseNet Convolutional Neural Network
Accuracy: 0.5663 (Speech), 0.4855 (Text), 0.6838 (Video), 0.6472 (Speech + Text), 0.6134 (Speech + Video)
Merits: Better results on complex input
Demerits: Less effective on unimodal dataset

S. No: 2
Author: Rozgić et al.
Year of publication: 2012
Journal/conference reference: Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference
Dataset used: IEMOCAP
Audio feature extraction: MFCC extraction followed by normalisation
Text feature extraction: Used bag-of-words model with stemming
Video feature extraction: Motion capture technique to identify various features
Classification model (Speech/Text/Video): Ensemble of SVM classifier trees
Accuracy: 0.6090 (Speech), 0.5090 (Video), 0.6650 (Fusion)
Merits: One model is sufficient for evaluation of different modals
Demerits: Non-versatile

S. No: 3
Author: Zhalehpour et al.
Year of publication: 2014
Journal/conference reference: 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings
Dataset used: eNTERFACE'05
Audio feature extraction: MFCC extraction along with RASTA-PLP
Text feature extraction: —
Video feature extraction: Peak frame selection
Classification model (Speech): SVM classifier
Classification model (Text): —
Classification model (Video): SVM classifier
Accuracy: 0.7300 (Speech), — (Text), 0.3800 (Video), 0.7640 (Fusion)
Merits: Best for emotion recognition using speech signals
Demerits: Low accuracy results on visual mode

S. No: 4
Author: Poria et al.
Year of publication: 2016
Journal/conference reference: 2016 IEEE 16th International Conference on Data Mining (ICDM)
Dataset used: IEMOCAP
Audio feature extraction: MFCC extraction through openSMILE
Text feature extraction: CNN-based architecture
Video feature extraction: —
Classification model (Speech): RNN-based architecture consisting of multiple-sized kernel layers
Classification model (Text): CNN-based architecture
Classification model (Video): Recurrent Neural Networks
Accuracy: 0.6132 (Speech), 0.5928 (Text), 0.6895 (Video), 0.7685 (Fusion)
Merits: Most optimal deep learning model for fusion layer
Demerits: Comparatively low results on individual modals

S. No: 5
Author: Mingyong et al.
Year of publication: 2021
Journal/conference reference: Wireless Communications and Mobile Computing, vol. 2021, Article ID 6971100, 10 pages, 2021
Dataset used: IEMOCAP
Audio feature extraction: MFCC extraction through openSMILE
Classification model (Speech): Deep CNN
Classification model (Text): —
Classification model (Video): Deep separation CNN
Accuracy: 0.713 (Speech), — (Text), 0.691 (Video), 0.7268 (Fusion)
Merits: The decision-level fusion method utilises a multi-objective optimisation problem to effectively combine the two techniques
Demerits: Absence of text-mode-based classification
Summarization of results

The summarization of the literature review states that, regardless of the model applied and the feature extraction techniques used, the results for multimodal fusion are better than each modality modelled individually. Examining the performance of all techniques, the anger output class has been classified with the most accuracy, followed by happy; this inference shows that the models are successful at classifying extreme emotions as compared to emotions pertaining to the neutral and disgusted classes. The automatic peak frame selection gives better average results on a combination of modalities as compared to the traditional LSTM techniques. Despite the promising results, more work needs to be done on better techniques of feature extraction and classification. The accuracy of visual and audio features can be improved further, as visual features give an accurate idea of the emotion being portrayed. Inclusion of attention-based mechanisms for extracting the important features, and large datasets with a variety of speakers for speaker-independent classification, can significantly improve the recognition accuracy.
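One simple form of the attention-based mechanism suggested above is softmax-weighted pooling over per-time-step feature vectors, so that not all parts of an utterance contribute equally. The scoring vector `w` stands in for learned parameters and is purely illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(features, w):
    # features: (T, d) matrix of per-time-step feature vectors.
    # Score each step against w, normalise the scores with softmax,
    # and return the weighted sum plus the attention weights.
    alpha = softmax(features @ w)
    return alpha @ features, alpha
```

The returned weights make the mechanism inspectable: time steps with higher scores dominate the pooled representation fed to the classifier.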
3 SWOT Analysis Strengths, Weaknesses, Opportunities and Threats, commonly abbreviated as SWOT, is an important tool for analysing the market of any subject, organisation, technique or research area. It lays down all the internal attributes of the area/field (strengths and weaknesses) as well as the external factors affecting the area (opportunities and threats). A thorough analysis has been done for this area by understanding the different techniques and models which have been employed since the inception of emotion recognition as well as by understanding the emotion recognition market. This paper involves the SWOT analysis of different classification techniques performed by researchers in the area of emotion recognition through different modalities. The key idea for this analysis is to evaluate the internal potential and limitations of this area of machine learning. Figure 3 highlights the different aspects of SWOT analysis. Fig. 3 SWOT analysis—an understanding
3.1 SWOT Matrix SWOT analysis is represented by a 2 × 2 matrix composed of four principal components of SWOT i.e. Strengths, Weaknesses, Opportunities and Threats. Strengths and Weaknesses are the internal factors influencing the business or agenda of an organisation while Opportunities and Threats impact from an external environment. The matrix also specifies the positive and negative factors which need to be considered while developing a theory or plan of action. Figure 4 shows the structure of the SWOT matrix.
Fig. 4 SWOT matrix—a structural view
4 SWOT Analysis of Behavioural Recognition Through Variable Modalities SWOT analysis provides a comprehensive study of the different factors related to Behavioural Recognition. Figure 5 contains the SWOT matrix of behavioural analysis through variable modalities.
Fig. 5 SWOT matrix of behavioural analysis
4.1 Strengths

(1) Versatility: Emotion recognition [20] is being used for a variety of purposes today, including but not limited to healthcare, video games, engaging audiences, helping differently abled children, security measures, HR assistance, customer service and distress help lines.
(a) Healthcare: Recognition of facial expressions [12, 14] can help in identifying the patients who need immediate attention.
(b) Video games: During the testing phase of a video game, the emotions of a tester are recognized to get real feedback. This can be ensured with the detection of peak frames [3].
(c) Security: Emotion recognition can help to recognize suspicious activity by monitoring people to prevent criminal activity.
(d) HR assistance: Emotion recognition is used with AI while assessing a candidate to determine his/her honesty.
(e) Customer service: With the use of facial expression recognition [16] through cameras in service centres, and audio expression recognition through calls, the level of satisfaction of a customer can be determined.
(f) Distress help lines: Analysing the emotion of a caller can go a long way in helping someone in need, as the level of anxiety or distress in the speech can help in understanding the level of urgency and threat faced by the caller.
(2) Strong market valuation: As per many market reports, the value of the emotion recognition market has been growing steadily for the past 5 years at a CAGR of 39.9%. The estimated valuation of this area of machine learning has grown to $36 billion in 2021 from $6.7 billion in 2016.
(3) Better adaptability of emotion recognition on different datasets over time: Emotion recognition models have become more adaptive on different datasets. This can ensure higher efficiency in real-world problems where there is variety and diversity in the data.
(4) Interface between humans and machines: Emotion recognition acts as an interface of utmost importance between a machine and a human [22], as it helps decode the needs and reactions of a human, which further helps with the feedback mechanism of the machine.
(5) Emotion recognition being adopted by big brands: The big brands are utilising emotion recognition to understand the behaviour and reactions of their consumer base. For example, Disney has been using an AI-powered model which helps understand the reaction of their viewers. Virtual assistants like Alexa, Siri, Google Assistant and Cortana utilise behavioural recognition and NLP to understand users better and enhance their experience.
(6) Support for multimodal classification: Behavioural recognition supports the fusion of multiple modes for the final classification [11, 13]. With the multimodal implementation, the features are extracted using all three modes, i.e. textual/lexical, audio and visual, which gives a very clear idea of the actual expression. For example, someone might be using professional language and speaking calmly but look infuriated. Here the text and audio modes would classify the emotion as calm or neutral, but the visual mode would correctly classify it as anger.
4.2 Weaknesses

(1) Impurities in the collected data: In machine-based emotion detection, the primary prerequisite is data. However, if the gathered data has flaws in it, it leads to bad results and poor classification. For example, the video could be blurry or the lighting inadequate, hindering visual feature extraction; or the audio could be noisy, causing issues in both the audio-based feature extraction and the text-based extraction in the absence of a transcript (text would have to be extracted through NLP [10] on the audio).
(2) Context-unaware feature extraction: An utterance is like a single unit of speech, preceded and succeeded by a pause. Different utterances in a sentence can depict different emotions, so it is difficult to tell the overall emotion of the sentence [8]. When an utterance is considered without the preceding and succeeding utterances, it leads to bad results.
(3) Presence of negation in an utterance is difficult to interpret: While a model can understand the emotion of 9 out of 10 words very well, if the last word is a negation it changes the meaning and emotion of the sentence completely, resulting in a wrong classification. For example, "he thought that the book would be engaging, entertaining and epic, but it was not". Here there are multiple words of praise in the sentence, but the negation in the last part changes the emotion entirely.
(4) Below-par results with visual features: Of all the works performed in the area of multimodal emotion recognition, only a few could produce decent results with the visual mode, while others had poor results [3]. However, the visual mode conveys a lot of emotional information which, if utilised correctly, could help easily recognize the correct emotion.
(5) Requirement of high computation power: With the complex task of utilising all three modes of emotion and then applying a good enough fusion technique to produce good results, the basic classification models seem to be ineffective. For performing such a complex task, more sophisticated models like bi-directional LSTM, attention-mechanism-based deep learning techniques [5] and different ensemble models need to be researched, which require a lot of computation power [7].
4.3 Opportunities

(1) Wide range of applications possible: Emotion recognition as an area has many applications currently and can also be used in the future in unexplored application areas.
(a) Car safety measures: Emotion recognition is one of the parameters of smart transportation, as video-based emotion recognition can be of vital importance for car safety measures. The facial expressions of a driver can tell a lot about the state of the driver and hence ensure his safety in case of drowsiness. Understanding the driver's emotions can also help in avoiding situations of road rage.
(b) Home healthcare: Healthcare surveillance systems [21] that recognize the behaviour of a blood sugar or blood pressure patient can help identify whenever his/her sugar level is dropping or BP is increasing, so that immediate care can be provided.
(c) Call-routing based on emotion: The emotion of a customer can be recognized to take the appropriate action. If the customer seems to be in a happy mood, the call can be redirected to the sales team for pitching a new product, while if the customer is angry, the call can be redirected to the client-retention team.
(d) Customer-personalised emotions of virtual assistants: The emotions depicted by a virtual assistant during a call or normal conversation can be personalised depending on the emotion of the user. The virtual assistant can show empathy and choose warm words to uplift the mood of an upset user.
(e) Real-time behaviour-recognizing security cameras: Security cameras with real-time behavioural recognition can be installed in societies and residential areas. These cameras can constantly monitor human activity and immediately alert the guards on detection of suspicious behaviour.
(2) Increasing competitiveness in the market with multiple interested companies: There is huge growth in the adoption of emotion recognition among different organisations and individuals in the world. The market is captured by a few players at the moment, but the competition is steadily increasing with the involvement of more and more players.
(3) Ever-growing amounts of data and the need to make sense of it: With the advent of the social web (Twitter, Facebook, Instagram, YouTube, etc.), a huge amount of data is being transferred on a daily basis. The data needs to be presented in a summarised manner so that it can be used efficiently. This opens up scope for the application of emotion recognition, and that too in different modes. Nowadays video-based reviews are becoming very common and a customer can find tons of reviews on every product. Automatic emotion recognition on such videos can help the customer to easily reach a conclusion regarding the product without spending much time on research.
340
A. Sharma et al.
(4) Growing research on new methodologies: Incorporating upcoming methodologies like attention mechanisms [5] (as all parts of a sentence do not have the same importance) and encoder–decoder architectures would help perform better feature extraction on lexical [6] and audio modes and drastically improve emotion recognition results.
(5) Data shift from unimodal to multimodal: Earlier, most data on the internet existed in textual format. Of late, there is an abundance of multimodal data, i.e., texts, images, and videos. This creates many opportunities for multimodal emotion recognition [11, 13], which performs better than unimodal and bimodal recognition [17, 18].

4.4 Threats
(1) Curating a dataset is difficult and might compromise the generalizability of models: A small sample size in the training dataset can cause poor performance on real-world data. To curate a dataset like IEMOCAP, recordings from different speakers have to be collected, with different scripts assigned for reading. This is a troublesome task and cannot ensure generalizability (addressed in [4]), as the number of speakers is small and the scope of the scripts is very limited, which results in a limited sample size of the collected data.
(2) Lack of person-independent methods of emotion recognition: The datasets curated for emotion recognition models are very limited in number, and each uses a very limited number of speakers. This can lead to bias, as the models and techniques could be biased towards this handful of speakers and will not scale. Zhalehpour et al. try to overcome this issue [3].
(3) Limited competition with only a handful of big players: Despite the fact that emotion recognition as a technology has been growing fast and that many individuals and organisations are invested in it, there are only a handful of competitors in the big picture. This could lead to a monopoly of these companies and could also point to an underlying threat because of which other big names have not joined the race.
Summarization of SWOT Analysis This field of machine learning has many strengths, ranging from versatility of applications to support for multimodal fusion. Emotion recognition acts as an interface between humans and machines, which has led to its adoption by several big companies. However, impurities in the collected data, context-unaware feature extraction, and below-par accuracy with visual features are a few weaknesses of this field. Limited generalizability of curated datasets and the absence of speaker independence pose threats for the emotion recognition market. The market has many opportunities due to increased competition among companies for utilising emotion recognition in unexplored areas. With growing research on new methodologies and growing
SWOT Analysis of Behavioural Recognition Through Variable …
341
amount of data which is shifting from unimodal to multimodal, recognition of human behavior through variable modalities has a huge potential.
5 Conclusion and Future Work The emotion recognition area of machine learning is relatively new, with many potential applications. Our research work has a two-fold conclusion. Firstly, the comparative analysis performed on existing work highlights the better performance of multimodal recognition as compared to unimodal or bimodal recognition. Classifying extreme emotions was found to be easier than classifying neutral emotions, and the peak frame detection method proved to be more effective. Secondly, the SWOT analysis presented gives a complete picture of the behavioural recognition field, listing the major pros and cons with respect to internal as well as external factors. Collecting more person-independent data from different sources to increase diversity, and computing the percentage of each emotion during classification, can help in limiting future threats. The opportunity to apply behavioural recognition in areas like smart transportation, security, and healthcare can be utilised by classifying emotions in real time. The analysis can prove helpful for organisations and individuals looking to invest their resources in the emotion recognition field and can also guide researchers and students interested in this domain of machine learning.
References 1. X. Zhang, M.-J. Wang, X.-D. Guo, Multi-modal emotion recognition based on deep learning in speech, video and text, in 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP) (2020), pp. 328–333. https://doi.org/10.1109/ICSIP49896.2020.9339464 2. V. Rozgić, S. Ananthakrishnan, S. Saleem, R. Kumar, R. Prasad, Ensemble of SVM trees for multimodal emotion recognition, in Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (2012), pp. 1–4 3. S. Zhalehpour, Z. Akhtar, C. Eroglu Erdem, Multimodal emotion recognition with automatic peak frame selection, in 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings (2014), pp. 116–121. https://doi.org/10.1109/INISTA.2014.6873606 4. S. Poria, I. Chaturvedi, E. Cambria, A. Hussain, Convolutional MKL based multimodal emotion recognition and sentiment analysis, in 2016 IEEE 16th International Conference on Data Mining (ICDM) (2016), pp. 439–448. https://doi.org/10.1109/ICDM.2016.0055 5. A. Zadeh, P.P. Liang, S. Poria, Multi-attention recurrent network for human communication comprehension, in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (2018), New Orleans, USA, pp. 5642–5649 6. A. Bandhakavi, N. Wiratunga, S. Massie, P. Deepak, Lexicon generation for emotion analysis of text. IEEE Intell. Syst. 32(1), 102–108 (2017) 7. E. Cambria, S. Poria, D. Hazarika, K. Kwok, SenticNet 5: discovering conceptual primitives for sentiment analysis by means of context embeddings, in AAAI (2018), pp. 1795–1802
8. S. Poria, E. Cambria, D. Hazarika, N. Majumder, A. Zadeh, L.-P. Morency, Context-dependent sentiment analysis in user-generated videos (2017), pp. 873–883. https://doi.org/10.18653/v1/P17-1081 9. A. Zadeh, R. Zellers, E. Pincus, L.-P. Morency, Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages. IEEE Intell. Syst. 31(6), 82–88 (2016) 10. A. Ashok, R. Elmasri, G. Natarajan, Comparing different word embeddings for multiword expression identification, in Natural Language Processing and Information Systems (Springer, Cham, 2019) 11. C. Fadil, R. Alvarez, C. Martínez, et al., Multimodal Emotion Recognition Using Deep Networks (Congreso Latinoamericano De Ingeniería Biomédica Claib, 2014) 12. C.D. Alves, G. Sullivan, Facial expression of emotion: interaction processes, emotion regulation and communication, in Handbook on Facial Expression of Emotion, vol. 1 (2013) 13. P.K. Atrey, M.A. Hossain, A. El Saddik, M.S. Kankanhalli, Multimodal fusion for multimedia analysis: a survey. Multimedia Syst. 16, 345–379 (2010) 14. A. Tawari, M.M. Trivedi, Face expression recognition by cross modal data association. IEEE Trans. Multimedia 15, 1543–1552 (2013) 15. R. Gajsek, V. Struc, F. Mihelic, Multi-modal emotion recognition using canonical correlations and acoustic features, in 2010 20th International Conference on Pattern Recognition (ICPR) (2010), pp. 4133–4136 16. X. Zhu, D. Ramanan, Face detection, pose estimation, and landmark localization in the wild, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012), pp. 2879–2886 17. Y. Wang, L. Guan, A.N. Venetsanopoulos, Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition. IEEE Trans. Multimedia 14, 597–607 (2012) 18. M. Meghjani, F. Ferrie, G. Dudek, Bimodal information analysis for emotion recognition, in 2009 Workshop on Applications of Computer Vision (WACV) (2009), pp. 1–6 19. M. Li, X. Qiu, S. Peng, L. Tang, Q. Li, W. Yang, Y. Ma, Multimodal emotion recognition model based on a deep neural network with multiobjective optimization. Wirel. Commun. Mob. Comput. 2021, 10 (2021). https://doi.org/10.1155/2021/6971100, Article ID 6971100 20. A. Kołakowska, A. Landowska, M. Szwoch, W. Szwoch, M. Wróbel, Emotion recognition and its applications. Adv. Intell. Syst. Comput. 300, 51–62 (2014). https://doi.org/10.1007/978-3-319-08491-6_5 21. M. Dhuheir, A. Albaseer, E. Baccour, A. Erbad, M. Abdallah, M. Hamdi, Emotion recognition for healthcare surveillance systems using neural networks: a survey. Int. Wirel. Commun. Mob. Comput. (IWCMC) 2021, 681–687 (2021). https://doi.org/10.1109/IWCMC51323.2021.9498861 22. Y. Bian, L. Zhao, H. Li, G. Yang, L. Geng, X. Deng, Research on multi-modal human-machine interface for aerospace robot, in 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics (2015), pp. 535–538. https://doi.org/10.1109/IHMSC.2015.74
E-Mixup and Siamese Networks for Musical Key Estimation Pranshav Gajjar, Pooja Shah, and Harshil Sanghvi
Abstract With the rapid transition from “Classic” culture to “Popular” music culture, there is no longer a restriction that a song be played around a single global key. It is quite common for a musical piece to change continuously from one key to another during its playback period. In this scenario, it becomes essential to estimate the keys around which the song revolves at different timestamps. This estimation of keys enables musicians and analysts to obtain information about important features of music such as tonality and modulation, which forms an important purpose and motivation for this study. This paper provides a deep learning implementation for key estimation in a musical excerpt. Siamese Networks are used for obtaining high-level embeddings, and the E-Mixup algorithm is used for augmentation; the result is fed into an MLP for further classification. A subset of the famous GTZAN genre collection containing 837 elements and corresponding key truth values is used to train a similarity learning pipeline. The proposed model is thoroughly assessed and compared to a baseline Convolutional Neural Network, which it outperforms with an absolute classification accuracy of 84.02%.
P. Gajjar · P. Shah (B) · H. Sanghvi
Institute of Technology, Nirma University, Ahmedabad, India
e-mail: [email protected]
P. Gajjar e-mail: [email protected]
H. Sanghvi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_26

1 Introduction
A musical key can be defined as the minor or major scale around which a musical piece revolves. A song composed in a minor or major key is said to be based on the minor or major scale, respectively. For instance, a song played in the D minor key revolves around the pitches D, E, F, G, A, B♭, and C. The essential notes that give shape to the bassline, melody, and chords of the song are drawn from that particular group of notes. A song playing in the A minor key revolves around the pitches A, B, C, D, E, F, and G. Similarly, a musical piece can be in a major key and revolve around a natural major scale; for instance, a song playing in the E major key revolves around the pitches E, F♯, G♯, A, B, C♯, and D♯. Any major scale or natural minor scale can serve as the key of a musical piece. Scales and keys provide a base of acceptable notes from which to build melodies and harmonies [27].
A musical key is a song's home. The key tells you several things about a song: the sharps and flats used, the scale the song is based on, the scale note that serves as the song's home note, and much more. There is only one home note in a song, and all of the other notes are related to it depending on how distant from or close to home they are. As a result, comprehending a musical key entails comprehending the connections between notes [9]. Since keys are connected to scales, a song in the key of D is mostly based on the D major scale, using mostly (or only) notes from that scale for its melody and harmony. The listener's ear settles into the notes of the D major scale throughout the song; if the composer then throws in notes from another scale, the result is unsettling to the listener [9].
To approximate the global key of a musical piece, numerous approaches have been put forward to date. For symbolic data, template-based [16] and geometry-based [19] approaches have been presented. For audio-based data, other template-based algorithms [7, 10, 26, 32, 34], HMMs [25], and geometric methods [5] have been described. Determining the dominant key of a musical piece is one part of tonality analysis. Even though a musical composition starts and concludes in a single key, known as the main or global key, it is common for a composer to switch between keys. Modulation is the term used to describe the transition of a piece of music from one key to another.
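The scale construction behind these examples can be sketched programmatically. This is an illustrative sketch, not code from the paper; it uses sharps-only pitch labels, so B♭ prints as A#:

```python
# Derive the pitch set of a key from its tonic using the whole/half-step
# interval patterns of the major and natural minor scales.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR = [2, 2, 1, 2, 2, 2, 1]   # steps in semitones
MINOR = [2, 1, 2, 2, 1, 2, 2]

def scale(tonic, intervals):
    i = NOTES.index(tonic)
    pitches = [tonic]
    for step in intervals[:-1]:   # the last step returns to the tonic
        i = (i + step) % 12
        pitches.append(NOTES[i])
    return pitches

print(scale("D", MINOR))  # ['D', 'E', 'F', 'G', 'A', 'A#', 'C']
print(scale("C", MAJOR))  # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
```

Any of the twelve tonics combined with either interval pattern yields one of the 24 keys discussed later in the paper.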
Pitches in Western tonal music are structured in a succession of key regions around one or more steady tonal centers. In contrast to the global key, these key regions are referred to as local keys [25]. The global key for Western music is defined as the collection of pitches or scales that delineates the tonality of a composition or a musical excerpt. The accurate recognition of musical keys portrays important notes and emphasizes the playability, timbre, and range of various instruments. Manual key identification necessitates high perceptual ability, and studies have shown that persons with varying levels of training recognize tonality in different ways [16]. Within a single song, music genres such as ‘Classical’ hold strongly to a global key, but this notion changes dramatically in ‘Popular’ music owing to notions such as no tonality, local tonality, seeming tonality, and so on. As a result, automated key estimation and analysis is a significant aspect of music information retrieval, and numerous approaches have been developed to accomplish this task. As shown in [29], many tasks that looked impossible have become doable and efficient thanks to recent developments in the fields of Artificial Intelligence, Computing [2, 3, 28], and Deep Learning. With these motivations, this paper proposes a similarity learning model with embedding augmentation for key estimation while thoroughly assessing each associated hyper-parameter. The next section discusses related modern literature, Sect. 3 describes the methodology, and these are followed by the results analysis and the conclusion.
2 Related Work “Key profiles” are the foundation for much work in the field of musical key estimation. Krumhansl and Schmuckler suggested one such notion in [16]. They offered an algorithm based on key profiles developed by Krumhansl and Kessler through their experiments, in which participants were asked to assess how well each pitch class matched a prior context presenting a key, such as a scale or cadence. The recommended method finds the key of a given piece by correlating the piece's “input vector” with each key profile; the key profile with the highest correlation is chosen as the preferred key. David Temperley [30] reviewed and expanded this approach by suggesting several improvements. The first suggestion was to make the matching formula simpler. The second was a recommendation of alternative values for the key profiles themselves. In the third alteration, instead of aggregating the durations of all appearances of a pitch class, the new model splits the piece into smaller segments and marks every pitch class as present or absent in each segment. The most recent update penalized switching keys from one segment to the next. The revised approach was a big success at finding keys. Another method for estimating local keys is to utilize a Hidden Markov Model (HMM) with pre-defined templates. The work in [25] discusses such a model. The authors propose a paradigm in which an audio file is first segmented into portions, and these sections are then labeled with local keys. This approach may be used to estimate the key progression of Western tonal polyphonic passages, and it combines and extends many previous methods for estimating global keys.
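The Krumhansl–Schmuckler correlation step described above can be sketched as follows. The profile values are the published Krumhansl–Kessler ratings; the toy input vector and the Pearson-correlation scoring are an illustrative reconstruction, not the original code:

```python
# Krumhansl-Kessler major/minor key profiles, indexed by semitone from the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def estimate_key(input_vector):
    """input_vector: total duration of each of the 12 pitch classes (C..B)."""
    best = None
    for shift in range(12):
        # Rotate so the candidate tonic aligns with profile position 0.
        rotated = input_vector[shift:] + input_vector[:shift]
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            r = pearson(rotated, profile)
            if best is None or r > best[0]:
                best = (r, f"{KEYS[shift]} {mode}")
    return best[1]

# A piece using mostly C-major scale notes should correlate best with C major.
cmaj = [5, 0, 3, 0, 3, 4, 0, 4, 0, 3, 0, 2]
print(estimate_key(cmaj))  # C major
```

Temperley's refinements then modify the matching formula and profile values on top of this basic correlation scheme.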
Since chord progression is tightly related to key in Western tonal music, the key progression in the model of [25] is described using the chord progression. The approach is based on metrical structure and analysis windows related to bars. For key estimation, segments expressed in terms of the tempo period are employed instead of empirically picked segments. As a consequence, the key estimate is made using parts that are specifically adapted to the piece's musical material. Filip Korzeniowski and Gerhard Widmer introduce a global key estimation technique [14] powered by a convolutional neural network (CNN) [17]. The model can be trained automatically from end to end. This autonomous training capacity removes the need for expert feature design and for pre-processing steps such as tuning correction and spectrum whitening. On the GiantSteps dataset [13] and the Billboard dataset [1], the model received MIREX scores of 74.3 and 83.9, respectively. The model, however, has the drawback of only being able to estimate a single global key for a full piece. Matthias Mauch and Simon Dixon describe a completely automated technique for predicting the chord sequence, bass notes, key, and chord metric positions from an audio waveform in [20]. The method is based on a six-layered dynamic Bayesian network with four hidden layers that represent musical quantities such as bass pitch
class, chord, position, and key, and two observed layers that model low-level audio input related to treble and bass tonal content. Compared to previous methods, the suggested technique delivers much more harmonic information while keeping excellent accuracy when employing 109 distinct chords. On the 176 audio files from the MIREX 2008 Chord Detection Task, the model achieves a classifier accuracy of 71%. The harmonic progression analyzer (HPA), which is based only on machine learning methods, is another option. HPA, a new method for concurrent estimation of bass, key, and chords, is detailed in [22]. HPA uses a chromagram extraction method motivated by research on loudness perception. The approach improves estimation performance, and since it relies only on machine learning methods, further gains can be expected as additional data becomes available. Finally, because of its favourable memory, speed, and time-complexity tradeoffs, HPA is well suited to real-world harmonic analysis applications. In this article, we present a new method for key estimation that combines a Siamese Network [6] with E-Mixup embedding augmentation [33]. This methodology was chosen because of the scarcity of research on similarity learning for musical keys and the constraints in the contemporary literature linked to sparse datasets. When compared to a baseline CNN, the model performs very well on a small dataset thanks to the application of similarity learning.
3 Methodology 3.1 Dataset The famous GTZAN genre collection [31], comprising one thousand 30-second audio files, is used with the key values from [18]. Image data is required for training convolutional neural networks; hence, spectrograms are computed for every 7.5 s of each excerpt for which a global key is available. The final result is 3348 images with an array size of 180 × 40, spread across 9 genre classes and 24 global key types (12 tonics, major or minor mode). 80% of the data is randomly chosen for training, and the remaining data is used to evaluate the model's classification accuracy (Table 1). The dataset has a minor-to-major key ratio of 1.318, signaling a well-generalized dataset for training and testing; the same train–test split is used for assessing the proposed model and the control model.
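A rough sketch of the excerpt-and-spectrogram preparation described above. The paper does not state its STFT parameters, so the FFT size, hop, window, and sample rate here are assumptions, and a random signal stands in for a GTZAN file:

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=256):
    # Magnitude STFT via numpy: Hann-windowed frames, one-sided FFT.
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

sr = 22050
clip = np.random.randn(30 * sr)           # stand-in for one 30 s GTZAN file
excerpt_len = int(7.5 * sr)               # four 7.5 s excerpts per file
excerpts = [clip[i:i + excerpt_len] for i in range(0, len(clip), excerpt_len)]
specs = [spectrogram(e) for e in excerpts]
print(len(excerpts), specs[0].shape)      # 4 excerpts per file
```

With 837 key-annotated files, four excerpts per file give the 3348 images reported; the paper's own parameters would yield its stated 180 × 40 arrays.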
3.2 Siamese Network For predicting the similarity index between input pairs, a Siamese network with twin convolutional neural networks is trained. The Cross-Entropy loss function [11] is linked to the base networks and is calculated using the representations learned at the last fully connected layer. As the network parameters are identical, the twins learn the same transformations, resulting in a favourable feature space. The base network is used for embedding generation, and the resulting embeddings are further classified by an MLP (Multi-Layer Perceptron) [8]. The convolutional layers have a stride of 1 with a unit dilation rate [24] and ReLU activation [23], and each is followed by a Max-pooling layer [15] with a pooling window of 2 × 2. The Max-pool layers reduce computing effort by providing an abstract view of the input and halving the dimensionality. A flattening layer and a fully connected layer of 256 sigmoid-activated neurons [23] follow the final pooling layer. The absolute difference of the two embeddings is used as the input to the final layer, which consists of one sigmoid-activated neuron. The Siamese network is trained on 50,000 pairs (equal numbers of similar and dissimilar pairs) with a validation split of 0.2, using the 'Adam' optimization algorithm [12] for 25 epochs with a batch size of 200 and a learning rate of 0.001. An epoch denotes a computational iteration in which the network works through the entire dataset once (Fig. 1).

Table 1 Major and minor key distribution for full-sized audio

Genre     Major keys   Minor keys
Blues          3           95
Country       94            5
Disco         43           55
Hip-hop       13           68
Jazz          52           27
Metal          4           89
Rock          55           43
Reggae        53           44
Pop           44           50
Fig. 1 Model schematics for the base network, 32, Conv2D (3,3) implies a Convolution layer with 32 units and a filter size of 3 × 3
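The pair-scoring path (shared twin embedding, absolute difference, single sigmoid output) can be illustrated numerically. This is a sketch, not the paper's Keras model: a dense layer stands in for the convolutional base network, and all weights are random placeholders rather than trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embed(x, W, b):
    # Stand-in for the shared "base network": one dense layer with
    # sigmoid units (the paper's base network outputs 256 sigmoid units).
    return sigmoid(x @ W + b)

def similarity(xa, xb, W, b, w_out, b_out):
    # Twin networks share W and b; the absolute embedding difference
    # feeds one sigmoid unit that scores pair similarity in [0, 1].
    diff = np.abs(embed(xa, W, b) - embed(xb, W, b))
    return sigmoid(diff @ w_out + b_out)

rng = np.random.default_rng(0)
W = rng.normal(size=(180 * 40, 256)) * 0.01   # flattened 180x40 spectrogram in
b = np.zeros(256)
w_out, b_out = rng.normal(size=256), 0.0
x = rng.normal(size=180 * 40)
s_same = similarity(x, x, W, b, w_out, b_out)
print(float(s_same))  # identical inputs: |diff| = 0, so sigmoid(b_out) = 0.5
```

In training, the sigmoid output would be pushed towards 1 for same-key pairs and 0 for different-key pairs via the cross-entropy loss.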
3.3 Embedding Augmentation and Classification Embeddings have a dimensionality of 256 and are further augmented using the E-Mixup technique. A weighted average is taken over pairs of embeddings and over the one-hot (binary matrix) representations of their class values, with the weight lambda computed as a random value drawn from a Beta distribution with alpha = 0.2. Using this method, the training set is tripled; it is then classified by an MLP with two ReLU-activated hidden layers of sizes 1024 and 2048 and a Softmax output layer [23] that returns a probability distribution over the 24 key classes. All outputs are in the range [0, 1] and sum to 1; the maximum of the 24 values is selected as the key class. The model is trained with the cross-entropy loss function and Adam optimization with a learning rate of 0.01 for 30 epochs.
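The E-Mixup step described above can be sketched as follows (an illustrative reconstruction based on the description, not the authors' code; the random embeddings are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

def e_mixup(emb_a, emb_b, y_a, y_b, alpha=0.2):
    # lambda ~ Beta(alpha, alpha); the same weight mixes both the 256-d
    # embeddings and their one-hot label vectors.
    lam = rng.beta(alpha, alpha)
    return lam * emb_a + (1 - lam) * emb_b, lam * y_a + (1 - lam) * y_b

def one_hot(k, n=24):
    v = np.zeros(n)
    v[k] = 1.0
    return v

ea, eb = rng.normal(size=256), rng.normal(size=256)
mixed_emb, mixed_y = e_mixup(ea, eb, one_hot(3), one_hot(17))
print(mixed_emb.shape, float(mixed_y.sum()))  # (256,) 1.0
```

Because alpha is small (0.2), lambda is usually close to 0 or 1, so each mixed sample stays near one of its two parents while the soft label still encodes the blend.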
4 Results and Discussion The training time for the proposed Siamese network totalled 124.716 s, or 4.988 s per epoch. The proposed models are validated and compared to a baseline CNN, which consists of three convolutional blocks, a flattening layer, and a Softmax layer. Each convolutional block has a convolutional layer operating with a stride of 1 and a unit dilation rate, and a Max-pool layer with a pooling size of 2 × 2; the first two blocks are regularized by a dropout layer [21] with a dropout rate of 0.2. The convolution layers follow the order of 32 units with a filter size of (7,7), 64 units with a filter size of (5,5), and 128 units with a filter size of (3,3). The model is trained with the cross-entropy loss function and Adam optimization with a learning rate of 0.01 for 20 epochs; this specific number of iterations was used to avoid overfitting. The mentioned methods and deep pipelines were implemented using Keras [4] (Table 2). The use of Siamese networks is deemed successful, as a 2.23% increase in accuracy is observed without embedding augmentation and a 4.17% increase with E-Mixup. Using augmentative methods with the SiameseNet enhanced the percentage accuracy by 1.94 while using the same MLP architecture.
Table 2 Percentage classification accuracy

Architecture                     Accuracy
Baseline CNN                     79.85
SiameseNet^a + MLP               82.08
SiameseNet + E-Mixup + MLP       84.02

^a The Siamese network is referred to as SiameseNet
5 Conclusion and Future Work With the motivation to automate the task of musical key classification, this paper proposed a similarity learning-based approach for embedding generation. E-Mixup was used to augment the embeddings, increasing the training set size with relevant elements. The proposed methodology outperformed a Convolutional Neural Network and a vanilla Siamese network in classification accuracy by significant margins of 4.17 and 1.94 percentage points, respectively. Due to the use of relatively small excerpts, the proposed approach can also lead to a better understanding of key modulation. Automated key identification is a small part of MIR (Music Information Retrieval); in further studies, we would focus on using similarity learning and other augmentative methods for tasks like tempo estimation, audio fingerprinting, genre identification, and Artificial-Intelligence-based recommendation systems.
References 1. J.A. Burgoyne, J. Wild, I. Fujinaga, An expert ground truth set for audio chord recognition and music analysis, in Proceedings of the 12th International Society for Music Information Retrieval Conference (2011) 2. A. Bashar, S. Smys, Physical layer protection against sensor eavesdropper channels in wireless sensor networks. IRO J. Sustain. Wireless Syst. 3(2), 59–67 (2021) 3. J.I.Z. Chen, Modified backscatter communication model for wireless communication network applications. IRO J. Sustain. Wireless Syst. 3(2), 107–117 (2021) 4. F. Chollet, et al., Keras (2015). https://github.com/fchollet/keras 5. C.H. Chuan, E. Chew, Audio key finding: considerations in system design and case studies on Chopin's 24 Preludes. EURASIP J. Adv. Signal Process. 2007, 1–15 (2006) 6. M. Fiaz, A. Mahmood, S.K. Jung, Deep siamese networks toward robust visual tracking, in Visual Object Tracking with Deep Neural Networks. IntechOpen (2019). https://doi.org/10.5772/intechopen.86235 7. E. Gómez, Tonal description of polyphonic audio for music content processing. INFORMS J. Comput. 18(3), 294–304 (2006) 8. S. Hung, H. Adeli, Multi-layer perceptron learning for design problem solving, in Artificial Neural Networks (Elsevier, 1991), pp. 1225–1228. https://doi.org/10.1016/b978-0-444-89178-5.50057-9 9. M.A. Ishiguro, The Affective Properties of Keys in Instrumental Music from the Late Nineteenth and Early Twentieth Centuries (2010) 10. Ö. Izmirli, Template based key finding from audio, in ICMC. Citeseer (2005), pp. 211–214 11. K. Janocha, W.M. Czarnecki, On Loss Functions for Deep Neural Networks in Classification (2017). https://doi.org/10.4467/20838476si.16.004.6185 12. D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization (2014). https://arxiv.org/abs/1412.6980 13. P. Knees, A. Faraldo, P. Herrera, R. Vogl, S. Böck, F. Hörschläger, M. Le Goff, Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections, in 16th International Society for Music Information Retrieval (ISMIR) Conference (2015) 14. F. Korzeniowski, G. Widmer, End-to-end musical key estimation using a convolutional neural network, in 2017 25th European Signal Processing Conference (EUSIPCO), IEEE (2017). https://doi.org/10.23919/eusipco.2017.8081351
15. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks 60(6), 84–90 (2017). https://doi.org/10.1145/3065386 16. C.L. Krumhansl, Cognitive Foundations of Musical Pitch. Oxford University Press (2001). https://doi.org/10.1093/acprof:oso/9780195148367.001.0001 17. S. Kulshrestha, What is a convolutional neural network?, in Developing an Image Classifier Using TensorFlow. Apress (2019). https://doi.org/10.1007/978-1-4842-5572-8_6 18. A. Lerch, Audio Data Set Annotations (2013). https://github.com/alexanderlerch/gtzan_key 19. A. Mardirossian, E. Chew, skefis – a symbolic (MIDI) key-finding system, in 1st Annual Music Information Retrieval Evaluation eXchange, ISMIR (2005) 20. M. Mauch, S. Dixon, Simultaneous estimation of chords and musical context from audio 18(6), 1280–1289 (2010). https://doi.org/10.1109/tasl.2009.2032947 21. G.S. Nandini, A.S. Kumar, Dropout technique for image classification based on extreme learning machine 2(1), 111–116 (2021). https://doi.org/10.1016/j.gltp.2021.01.015 22. Y. Ni, M. McVicar, R. Santos-Rodriguez, T.D. Bie, An end-to-end machine learning system for harmonic analysis of music 20(6), 1771–1783 (2012). https://doi.org/10.1109/tasl.2012.2188516 23. C. Nwankpa, W. Ijomah, A. Gachagan, S. Marshall, Activation Functions: Comparison of Trends in Practice and Research for Deep Learning (2018). http://arxiv.org/abs/1811.03378v1 24. H. Pan, X. Lei, X. Huang, A dilated CNN model for wide-band remote sensing image classification, in 2019 IEEE International Conference on Real-time Computing and Robotics (RCAR), IEEE (2019). https://doi.org/10.1109/rcar47638.2019.9043976 25. H. Papadopoulos, G. Peeters, Local key estimation from an audio signal relying on harmonic and metrical structures 20(4), 1297–1312 (2012). https://doi.org/10.1109/tasl.2011.2175385 26. S. Pauws, Musical key extraction from audio, in ISMIR (2004) 27. A. Pouska, Keys in Music. https://www.studybass.com/lessons/harmony/keys-in-music/ 28. S. Shakya, P. Joby, Heart disease prediction using fog computing based wireless body sensor networks (WSNs). IRO J. Sustain. Wireless Syst. 3(1), 49–58 (2021) 29. K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). http://arxiv.org/abs/1409.1556v6 30. D. Temperley, What's key for key? The Krumhansl–Schmuckler key-finding algorithm reconsidered 17(1), 65–100 (1999). https://doi.org/10.2307/40285812 31. G. Tzanetakis, P. Cook, Musical genre classification of audio signals 10(5), 293–302 (2002). https://doi.org/10.1109/tsa.2002.800560 32. S. Van De Par, M.F. McKinney, A. Redert, Musical key extraction from audio using profile training, in ISMIR (2006), pp. 328–329 33. C.R. Wolfe, K.T. Lundgaard, E-Stitchup: Data Augmentation for Pre-trained Embeddings (2019). https://arxiv.org/abs/1912.00772 34. Y. Zhu, M.S. Kankanhalli, Precise pitch profile feature extraction from musical audio for key detection. IEEE Trans. Multimedia 8(3), 575–584 (2006)
Microarray Data Classification Using Feature Selection and Regularized Methods with Sampling Methods Saddi Jyothi, Y. Sowmya Reddy, and K. Lavanya
Abstract In recent medical studies especially, it is essential to assess the expression levels of genes using microarray technology. Diseases such as breast cancer, lung cancer, and the recent coronavirus disease are assessed using gene expressions. The study in this paper focuses on performing both classification and feature selection on different microarray data. Gene expression data is high dimensional, and extraction of optimal genes from microarray data is a challenging task. The feature selection methods Recursive Feature Elimination (RFE), Relief, LASSO (Least Absolute Shrinkage and Selection Operator), and Ridge were initially applied to extract optimal genes from microarray data. Later, a number of classification methods were applied, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Multilayer Perceptron Networks (MLP), Random Forest (RF), and Logistic Regression (LR). However, the combination of the mentioned feature selection and classification methods required heavy computation. Applying a resampling method (SMOTE, Synthetic Minority Oversampling Technique) prior to feature selection enhances the classification of microarray data. The resampling method, in combination with RFE and LASSO feature selection and SVM and LR classification, outperforms the other methods.
1 Introduction

In most real-time applications, especially in the medical field, gene expression levels are analyzed simultaneously using microarray data [1–4]. The measured expression levels help to determine suitable conclusions about biological activity according to the biological status of the samples. Moreover, it is observed that a strong relationship exists between gene expression and diseases such as lung cancer, breast cancer, and leukemia [5, 6]. However, performing classification on this kind of gene expression data is not an easy task. Also, there are a number

S. Jyothi (B) · Y. Sowmya Reddy · K. Lavanya
Lakireddy Bali Reddy College of Engineering, Mylavaram, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_27
of machine learning approaches that have been explored by researchers to perform classification on microarray data, and their results are recommended for better diagnosis. It is observed that, prior to the classification task on microarray data, a number of methods are required for evaluating the genes and selecting the suitable ones [9]. The study compared the feature selection methods Recursive Feature Elimination (RFE) [10], Relief [18], LASSO [17], Ridge [12], and E-Net [8] with a set of classifiers: Support Vector Machine, Naïve Bayes, Multilayer Perceptron, Random Forest, and Logistic Regression. The RFE method [5] uses backward feature elimination and assigns weights to features, which helps to recursively eliminate the unsuitable ones. An extended version with less computation time is called variable step size RFE (VSSRFE) [10]. Conditional dependencies between features are evaluated by the Relief method [18]. The Least Absolute Shrinkage and Selection Operator (LASSO) is a regularized method with an L1 penalty; Ridge and E-Net are regularized methods with L2 and combined L1 + L2 penalties, respectively. These feature selection methods help to retrieve better features from the gene expression levels in microarray data. Due to its high-dimensional nature, microarray data analysis leads to a class imbalance problem, and this problem affects the classification accuracy. Hence, in addition to feature selection, it is essential to apply a suitable resampling method to the microarray data prior to classification. The most prominent resampling method is SMOTE [10], but it is not well suited to microarray data. A number of ensemble methods have been recommended [13], but this study considered random value-based oversampling (RVOS) [8]. This method balances the samples and thereby solves the small-sample-size problem.
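As an illustration of the resampling step, the sketch below balances a dataset by replicating minority-class samples with small random perturbations. This is a simplified stand-in for RVOS/SMOTE (the noise scale and the function name are illustrative assumptions, not the published algorithm):

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Balance a dataset by replicating minority-class samples with
    small Gaussian perturbations (a simplified RVOS-style step)."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_out, y_out = [X], [y]
    for c, n in zip(classes, counts):
        if n < n_max:
            # draw minority samples (with replacement) and perturb them
            idx = rng.choice(np.flatnonzero(y == c), size=n_max - n)
            noise = rng.normal(0.0, 0.01, size=(n_max - n, X.shape[1]))
            X_out.append(X[idx] + noise)
            y_out.append(np.full(n_max - n, c))
    return np.vstack(X_out), np.concatenate(y_out)
```

After this step every class has the same number of samples, so the classifiers below no longer see a skewed class distribution.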
In this study, a combination of feature selection, resampling, and classification is proposed for microarray data. Classification is one of the most important criteria in microarray data analysis, yet some researchers give it little importance, which affects model performance. The techniques applied over the microarray datasets include K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Multilayer Perceptron networks (MLP), Random Forest (RF), and Logistic Regression (LR) [5]. Microarray analysis using resampling with SVM and LR classification retains better results when combined with RFE and LASSO feature selection [8]. The complete approach is evaluated on 10 multi-class microarray datasets. The resampling method, combined with RFE and LASSO feature selection using SVM and LR classification, outperforms the other methods. The remainder of this paper is organized as follows: Section 2 describes the methods and materials used in this study, including the datasets, feature selection, classification, and cross-validation methods. Section 3 presents the proposed methodology. Section 4 provides the simulation results together with a complete discussion. The last section concludes the study and describes future work.
Table 1 The standard microarray datasets

Notation  Data set  Disease          Samples  Features  Class
D1        Chowdary  Breast cancer    104      22,283    2
D2        Sorlie    Breast cancer    85       456       5
D3        Chin      Breast cancer    118      22,215    2
D4        Pomeroy   CNSET            60       7128      2
D5        Alon      Colon cancer     62       2000      2
D6        Golub     Leukemia         72       7129      2
D7        Yeoh      Leukemia         248      12,625    6
D8        Gordon    Lung cancer      181      12,533    2
D9        Singh     Prostate cancer  102      12,600    2
D10       Khan      SRBCT            63       2308      4
2 Methods and Materials

This section first describes the datasets; then the feature selection, classification, and cross-validation methods are presented.
2.1 Microarray Datasets

A total of 10 microarray datasets were considered, and their complete details are provided in Table 1. The majority of the datasets concern predicting whether a sample belongs to a healthy or a diseased subject. The datasets are in the form of CSV files and can be found at the following URL: https://github.com/kivancguckiran/microarray-data.
2.2 Feature Selection

The main advantage of feature selection methods is that they analyze the given input data and then retain only the suitable or relevant data as output. This is essential in the case of microarray data, which is high dimensional and contains unrelated information. After properly applying feature selection, the optimal features are recommended for further analysis. However, although a number of feature selection
methods exist, penalty-based methods such as LASSO, Ridge, and E-Net are particularly suitable for high-dimensional microarray data. LASSO and Ridge are linear models that penalize the coefficients with L1 and L2 terms, respectively [5], while E-Net combines both the L1 and L2 penalties. With these regularized methods, many of the resulting coefficients shrink to (or close to) zero, effectively discarding the corresponding features. RFE is an iterative feature selection method: in every iteration, a ranking score is estimated for each feature, and the features with the lowest scores are excluded from the final selection.
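As a hedged sketch, the two selectors most used in this study can be run with scikit-learn; the number of retained features, the alpha value, and the function name below are illustrative, not the paper's exact settings:

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

def select_features(X, y, n_features=50, alpha=0.002):
    """Return boolean masks of genes kept by RFE and by LASSO."""
    # RFE: recursively drop the lowest-ranked features of a linear SVM
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_features)
    rfe_mask = rfe.fit(X, y).support_

    # LASSO: keep genes whose L1-penalized coefficients remain non-zero
    lasso = SelectFromModel(Lasso(alpha=alpha, max_iter=10000),
                            threshold=1e-8)
    lasso_mask = lasso.fit(X, y).get_support()
    return rfe_mask, lasso_mask
```

Either mask can then be used to slice the gene-expression matrix (`X[:, mask]`) before classification.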
2.3 Classification

In machine learning, classification and regression are the two central supervised tasks; this study focuses only on classification. The advantage of the classification criterion is that it allows the class likelihood of the data set of concern to be determined. Among the many classification methods, this study focuses on a few: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Multilayer Perceptron networks (MLP), Random Forest (RF), and Logistic Regression (LR). SVM is reported by most research studies to be the best choice of classifier for microarray data and is among the most used classifiers in bioinformatics studies; it was run using two mechanisms, kernel techniques and a Lagrange dual solver. In the present machine learning community, MLP is a powerful classifier composed of a number of hidden layers between the input and output; every connection between the nodes of adjacent layers carries a weight learned using the backpropagation algorithm. Random Forest is a classification and regression algorithm that uses bagging with randomization. LR not only performs two-class classification but also supports the multi-class case, and it provides an estimate of the underlying class probabilities.
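The classifier family above can be instantiated in a few lines with scikit-learn; the hyperparameters shown (including the (100, 50) hidden layers used later in Sect. 3) are illustrative defaults, not tuned settings:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def make_classifiers():
    """The five classifiers compared in the study."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="linear"),                  # kernel + dual solver
        "MLP": MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500),
        "RF": RandomForestClassifier(n_estimators=100),
        "LR": LogisticRegression(max_iter=1000),      # supports multi-class
    }
```

Each model exposes the same `fit`/`predict` interface, so the same evaluation loop can be applied to all five.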
2.4 Cross Validation

Cross-validation is the most essential criterion for assessing the quality of the results. For high-dimensional datasets (i.e., gene-expression-based microarray data), Leave-One-Out Cross-Validation (LOOCV) [10] is recommended. The method leaves out one sample at a time, trains the model on the remaining samples, and then tests the model on the held-out sample. At the end, the overall score is computed by consolidating the results over all samples.
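The LOOCV procedure can be sketched directly with scikit-learn's LeaveOneOut splitter; the default Logistic Regression model and the helper name are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loocv_score(X, y, model=None):
    """Train on n-1 samples and test on the held-out sample, n times,
    then consolidate the per-sample results into one accuracy score."""
    model = model or LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, y, cv=LeaveOneOut())
    return scores.mean()
```

Because each fold holds out a single sample, LOOCV uses nearly all the data for training, which suits the small sample sizes typical of microarray datasets.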
3 Proposed Methodology

The study focuses mainly on microarray data analysis with multiple classifiers using feature selection. Among the classification techniques, MLP is suitable for high-dimensional data. Some earlier works used 2 hidden layers with 100 and 50 neurons, Rectified Linear Unit (ReLU) activations, and a softmax activation function at the output layer. To enhance the capability of the classification task, an additional optimizer (Adamax) was used, and results were produced within 40–60 epochs of training. Later works introduced Random Forest and observed no change in classification performance; however, varying the maximum depth and the estimator count affects the training time but not the accuracy. In this study, the feature selection methods RFE, LASSO, and Ridge were first applied to the microarray datasets to select the desired features. For the regularized methods LASSO and Ridge, alpha parameters of 0.002 and 0.003 were considered. Then, the different classification algorithms, including KNN, SVM, MLP, RF, and LR, were applied to classify the selected features. Finally, the results were validated using LOOCV, and the final score of each method was retained. Moreover, the number of selected features of each microarray dataset after applying feature selection is shown in Tables 2, 3 and 4. The complete methodology is shown in Fig. 2, and the architecture of the MLP classifier with its hidden layers is described in Fig. 1.
Fig. 1 The architecture of MLP with 2 hidden layers with 100 and 50 neurons
Fig. 2 The framework of proposed methodology for microarray dataset
4 Results and Discussion

The results shown in Tables 2, 3 and 4 indicate that LASSO feature selection combined with resampling (LASSO_RS) yields a more promising accuracy score than RFE and plain LASSO feature selection.

Table 2 Results analysis of SVM classification and feature selection methods on microarray dataset

Data set  Features  RFE (%)  LASSO (%)  LASSO_RS (%)
D2        74        87.45    95.28      97.43
D3        112       78.23    92.90      87.25
D4        56        81.82    85.60      89.67
D5        58        75.60    81.69      87.21
D6        64        80.31    82.76      83.80
D7        231       90.57    93.80      98.21
D8        97        85.29    89.56      93.18
D9        84        92.65    94.90      95.58
D10       56        88.42    90.71      92.21

It is also observed that the SVM classifier performs its operations much faster
Table 3 Results analysis of RF classification and feature selection methods on microarray dataset

Data set  Features  RFE (%)  LASSO (%)  LASSO_RS (%)
D2        74        88.65    96.20      98.10
D3        112       87.73    93.58      90.34
D4        56        83.02    87.49      90.70
D5        58        77.80    85.54      89.47
D6        64        81.51    86.48      86.92
D7        231       91.47    95.76      98.65
D8        97        86.49    94.60      94.37
D9        84        93.55    98.21      96.40
D10       56        88.54    93.85      94.10
Table 4 Results analysis of MLP classification and feature selection methods on microarray dataset

Data set  Features  RFE (%)  LASSO (%)  LASSO_RS (%)
D2        74        90.90    99.00      98.17
D3        112       82.38    94.28      91.94
D4        56        90.58    87.62      92.32
D5        58        90.23    93.77      87.95
D6        64        84.10    89.98      93.54
D7        231       93.07    97.20      98.95
D8        97        90.78    98.40      96.54
D9        84        91.19    98.90      98.42
D10       56        90.95    96.41      97.11
compared to the MLP, especially during training. However, the accuracy score of the MLP dominates and is better than that of the SVM. The study also considers Random Forest, which is superior to the MLP in training time but slower than the SVM. As far as the accuracy score is concerned, RF deteriorates compared to both MLP and SVM. Moreover, LOOCV was applied to validate the training of each classification method, and the results show that the LASSO-with-resampling combination outperforms the others. In particular, with the MLP classification (Table 4), the microarray datasets D2: 98.17%, D7: 98.95%, and D9: 98.42% retained high accuracy scores compared to the other datasets. For SVM (Table 2), the corresponding scores are D2: 97.43%, D7: 98.21%, and D9: 95.58%, and for RF (Table 3) they are D2: 98.10%, D7: 98.65%, and D9: 96.40%, respectively.
5 Conclusion

Gene expression data is high dimensional, and extracting the optimal genes from microarray data is a challenging task. The feature selection methods Recursive Feature Elimination (RFE), Relief, LASSO, and Ridge were initially applied to extract optimal genes from the microarray data. The study then focused on applying the feature selection methods RFE, Ridge, and LASSO with the SVM, MLP, and LR classifiers on the microarray datasets. In addition to feature selection, a resampling method was introduced for the microarray datasets to enhance the classification results. The proposed methodology produced superior results compared to the standard methods in terms of accuracy score. Moreover, it is observed that LASSO with MLP, in combination with the resampling approach, produces better results than SVM and RF. The combination of resampling with LASSO is thus well suited to high-dimensional microarray datasets.
References

1. Z.M. Hira, D.F. Gillies, A review of feature selection and feature extraction methods applied on microarray data. Adv. Bioinform. 2015, 198363 (2015)
2. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization (2014). arXiv preprint arXiv:1412.6980
3. R. Nakayama, T. Nemoto, H. Takahashi, T. Ohta, A. Kawai, K. Seki, T. Hasegawa, et al., Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathol. 20(7), 749 (2007)
4. B.C. Christensen, E.A. Houseman, C.J. Marsit, S. Zheng, M.R. Wrensch, J.L. Wiemels, D.J. Sugarbaker, et al., Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 5(8), e1000602 (2009)
5. R. Arias-Michel, M. García-Torres, C.E. Schaerer, F. Divina, Feature selection via approximated Markov blankets using the CFS method, in 2015 International Workshop on Data Mining with Industrial Applications (DMIA) (IEEE, 2015, September), pp. 38–43
6. C. Huertas, R. Juarez-Ramirez, Automatic threshold search for heat map based feature selection: a cancer dataset analysis. World Acad. Sci. Eng. Technol. Int. J. Comput. Electr. Autom. Control Inform. Eng. 10(7), 1341–1347 (2016)
7. H.A. Le Thi, D.N. Phan, DC programming and DCA for sparse Fisher linear discriminant analysis. Neural Comput. Appl. 28(9), 2809–2822 (2017)
8. P.H. Huynh, V.H. Nguyen, T.N. Do, Random ensemble oblique decision stumps for classifying gene expression data, in Proceedings of the Ninth International Symposium on Information and Communication Technology (ACM, 2018, December), pp. 137–144
9. L. Nanni, S. Brahnam, A. Lumini, Combining multiple approaches for gene microarray classification. Bioinformatics 28, 1151–1157 (2012)
10. W. You, Z. Yang, G. Ji, PLS-based recursive feature elimination for high dimensional small sample. Knowl.-Based Syst. 55, 15–28 (2014)
11. W. You, Z. Yang, M. Yuan, G. Ji, Totalpls: local dimension reduction for multicategory microarray data. IEEE Trans. Human-Mach. Syst. 44, 125–138 (2014)
12. J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7, 673–679 (2001)
13. Y. Piao, M. Piao, K. Park, K.H. Ryu, An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data. Bioinformatics 28, 3306–3315 (2012)
14. W. You, Z. Yang, G. Ji, Feature selection for high-dimensional multicategory data using PLS-based local recursive feature elimination. Exp. Syst. Appl. 41, 1463–1475 (2014)
15. K. Yang, Z.P. Cai, J.Z. Li, G.H. Lin, A stable gene selection in microarray data analysis. BMC Bioinform. 7 (2006)
16. I. Tsamardinos, E. Greasidou, G. Borboudakis, Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation. Mach. Learn. 107(12), 1895–1922 (2018)
17. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)
18. Y. Saeys, I. Inza, P. Larranaga, A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
19. C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
20. V. Bolon-Canedo, N. Sanchez-Marono, A. Alonso-Betanzos, Feature selection for high-dimensional data. Prog. Artif. Intell. 5(2), 65–75 (2016)
Visual Place Recognition Using Region of Interest Extraction with Deep Learning Based Approach P. Sasikumar and S. Sathiamoorthy
Abstract Visual Place Recognition (VPR) is the process of correctly identifying a previously visited place under varying viewpoint conditions. VPR is a challenging task due to variations in lighting conditions (daytime or nighttime), shadows, weather conditions, viewpoints, and even seasons. Therefore, VPR algorithms that can handle such variations in visual appearance need to be developed. This article designs a visual place recognition technique using a novel region of interest (RoI) extraction with a deep learning based approach (VPR-ROIDL) for changing environments. The proposed VPR-ROIDL technique first extracts RoIs using a saliency map and then extracts features from the RoIs using local diagonal extrema patterns (LDEP). Next, the extracted RoI features are passed into a convolutional neural network based residual network (ResNet) model, and the computed deep features are stored in a database. Given a query image (QI), the RoIs and their features are extracted in the same way as for the reference images. Moreover, the deep features from the ResNet model are generated, and Euclidean distance based similarity measurement is used to finally recognize the place. By using RoI extraction and descriptor matching, the VPR-ROIDL model is able to recognize places irrespective of changing illumination, seasons, and viewpoints. The simulation results point out the supremacy of the VPR-ROIDL technique over recent state-of-the-art approaches.
1 Introduction

A visual place recognition (VPR) system describes the capability of a computer system to determine whether it has previously visited a place using visual data [1]. The usage of visual semantics aids in resolving complicated problems that require human-like interpretation of the environment and allows considerable interaction with real-time applications. This kind of issue pertains to VPR under viewpoint variation as

P. Sasikumar (B) · S. Sathiamoorthy
Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Chidambaram, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_28
extreme as front- and rear-view image matching [2]. This situation is regularly encountered by humans while driving and is especially difficult since only a subset of the scene is typically observed from both directions; it becomes increasingly difficult when the appearance of the environment differs because of seasonal or weather conditions and day-night cycles. Even though VPR has attracted growing interest and has been widely investigated in the robotics and computer vision research communities, many open challenges remain [3]. Though the VPR problem is well defined, it remains challenging to implement consistently, since there are a number of problems that must be addressed. First, a revisited place can look completely different from when it was first recorded and seen because of different viewpoints, seasonal changes, dynamic elements, illumination levels, etc. [4, 5]. VPR can be regarded as an image retrieval process that consists of determining a match between the present scene and a previously visited location [6]. Advanced VPR algorithms such as FAB-MAP match the appearance of the present scene to a previously visited place by transforming the image into a bag-of-words (BoW) representation [7] based on local features such as SURF or SIFT. However, recent studies show that features extracted from a Convolutional Neural Network (CNN) trained on large datasets considerably outperform SIFT features on different vision tasks [8], such as fine-grained recognition, object recognition, object detection, and scene recognition. Inspired by this result, recent studies have shown that state-of-the-art place recognition performance can be accomplished by applying intermediate representations from a CNN previously trained on object detection datasets [9, 10]. This article introduces an intelligent region of interest extraction with deep learning based visual place recognition (VPR-ROIDL) technique for changing environments.
The VPR-ROIDL technique extracts LDEP features from RoIs identified using a saliency map approach. In addition, the extracted RoI features are fed into the convolutional neural network based residual network (ResNet) model, and the deep features are stored in the database. Given a query image (QI), the RoI features are extracted in the same way as for the reference images. At last, the deep features from the ResNet model are generated, and Euclidean distance based similarity measurement is used to finally recognize the place. To assess the significant outcomes of the VPR-ROIDL technique, a comparative study is made using a set of benchmark datasets.
2 Related Works

Hui et al. [11] developed an efficient point cloud learning network (EPC-Net) for generating global descriptors of point clouds for place detection. While attaining better efficiency, it considerably reduces inference time and computational memory. First, they presented a lightweight but efficient neural network module, named ProxyConv, for aggregating the local geometric features of point clouds; it leverages proxy points and an adjacency matrix to simplify edge convolution for low memory utilization. Next, they designed a lightweight grouped VLAD network for creating global
descriptors for retrieval. To integrate the benefits of appearance and geometry, Guo et al. [12] proposed coupling the traditional geometric data from LiDAR with calibrated intensity returns. This approach extracts beneficial data through a novel descriptor, coined ISHOT, which outperforms current state-of-the-art geometry-only descriptors by an important margin in local descriptor assessment. Uy et al. [13] presented PointNetVLAD, which leverages the recent success of deep networks for solving point cloud based retrieval for place detection. Moreover, the authors presented the "lazy triplet and quadruplet" loss functions, which can accomplish generalizable and discriminative global descriptors for tackling the retrieval process. Hausler et al. [14] presented a multiscale fusion of patch features and showed that the fused feature is invariant to viewpoint (rotation and translation) and condition (illumination, season, and structure) changes. The authors in [15] presented a hybrid model which creates a high-performance initial match hypothesis generator with a short learned sequential descriptor. The sequential descriptor is created by a temporal convolution network dubbed SeqNet, which encodes a short image sequence with 1D convolutions; it is matched against the corresponding temporal descriptors from the reference dataset to provide a ranked list of place match hypotheses. Chen et al. [16] proposed a multi-constraint loss function for optimizing the distance constraint relations in Euclidean space. The novel architecture can assist other types of CNNs, such as VGGNet, AlexNet, and other user-defined networks, in extracting distinguishing features. They compared the results with conventional deep distance learning models, and the results show that the presented approach can enhance the accuracy by 19–28%. Nevertheless, there is still a lack of accuracy in present VPR systems.
3 The Proposed Model

In this study, a new VPR-ROIDL technique has been developed for the identification of visual places irrespective of changes in illumination, season, and viewpoint. Primarily, the proposed VPR-ROIDL technique extracts features using the LDEP model from RoIs identified by a saliency map of the reference images. These features are then fed into the ResNet50 model to derive deep features, which are saved in the database. During the testing process, the QI is provided as input, and the VPR-ROIDL technique derives RoIs from the saliency map and then LDEP features from the RoIs, followed by deep feature extraction using the ResNet50 model. Finally, the similarity between the QI and the reference images is measured using Euclidean distance, and the image with maximum similarity is recognized. Figure 1 illustrates the overall process of the VPR-ROIDL technique.
Fig. 1 Overall process of VPR-ROIDL technique
3.1 LDEP Model

In the primary step, the features of the reference images and the QI are extracted using the LDEP model, an image feature descriptor presented by Dubey et al. [17]. It encodes the relationship of a central pixel with its local diagonal neighbors. For a given central pixel, LDEP finds its local diagonal extremas (minima and maxima) using first-order local diagonal derivatives; the obtained local diagonal extremas are then encoded to create the LDEP pattern. The LDEP pattern for pixel $(k, l)$ is given as follows:

$$LDEP^{k,l} = \left( LDEP^{k,l}_{1}, LDEP^{k,l}_{2}, \ldots, LDEP^{k,l}_{dim} \right) \qquad (1)$$

where $dim$ denotes the length of the LDEP pattern and $LDEP^{k,l}_{j}$ indicates the $j$th component of the LDEP pattern of pixel $(k, l)$. The mathematical definition of the $j$th component of $LDEP^{k,l}$ is given below:

$$LDEP^{k,l}_{j} = \begin{cases} 1, & \text{if } j = (\phi_{\max} + 8\lambda) \text{ or } j = (\phi_{\min} + 4 + 8\lambda) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Here $j = 1, 2, 3, \ldots, dim$; $\phi_{\min}$ and $\phi_{\max}$ indicate the indexes of the local diagonal minimum and local diagonal maximum of pixel $(k, l)$, respectively; and $\lambda$ encodes the relation between the diagonal extremas and the central pixel as:

$$\lambda = \begin{cases} 0, & \text{if } sign\!\left(\Delta^{k,l}_{\max}\right) = 0 \text{ and } sign\!\left(\Delta^{k,l}_{\min}\right) = 0 \\ 1, & \text{if } sign\!\left(\Delta^{k,l}_{\max}\right) = 1 \text{ and } sign\!\left(\Delta^{k,l}_{\min}\right) = 1 \\ 2, & \text{otherwise} \end{cases} \qquad (3)$$

where

$$sign(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (4)$$

$\Delta^{k,l}_{\max}$ and $\Delta^{k,l}_{\min}$ represent the intensity differences between the central pixel $(k, l)$ and its local diagonal extremas:

$$\Delta^{k,l}_{\max} = P^{k,l}_{\phi_{\max}} - P^{k,l} \qquad (5)$$

$$\Delta^{k,l}_{\min} = P^{k,l}_{\phi_{\min}} - P^{k,l} \qquad (6)$$

In these equations, $P^{k,l}_{\phi_{\max}}$ and $P^{k,l}_{\phi_{\min}}$ denote the intensity values at the local diagonal maximum and local diagonal minimum, respectively. The LDEP feature vector for a whole image of size $M \times N$ is calculated by:

$$LDEP = \left( LDEP_{1}, LDEP_{2}, \ldots, LDEP_{dim} \right) \qquad (7)$$

where

$$LDEP_{j} = \frac{1}{(M-2)(N-2)} \sum_{k=2}^{M-1} \sum_{l=2}^{N-1} LDEP^{k,l}_{j} \qquad (8)$$
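As a hedged illustration, Eqs. (1)-(8) can be implemented in a few lines of NumPy; the ordering of the four diagonal neighbours and the 0-based array indexing below are implementation conventions assumed here, not fixed by [17]:

```python
import numpy as np

def ldep_descriptor(image, dim=24):
    """Compute the LDEP feature vector of a grayscale image (Eqs. 1-8).

    Each interior pixel has four diagonal neighbours, indexed 1..4;
    phi_max / phi_min are the indexes of the diagonal maximum and
    minimum, and lambda in {0, 1, 2} encodes the sign relation between
    the central pixel and its diagonal extrema (Eqs. 3-4).
    """
    img = np.asarray(image, dtype=float)
    M, N = img.shape
    hist = np.zeros(dim)
    for k in range(1, M - 1):            # interior pixels, 0-based
        for l in range(1, N - 1):
            center = img[k, l]
            # four diagonal neighbours (the ordering is a convention)
            diag = np.array([img[k - 1, l - 1], img[k - 1, l + 1],
                             img[k + 1, l + 1], img[k + 1, l - 1]])
            phi_max = int(np.argmax(diag)) + 1     # 1-based index
            phi_min = int(np.argmin(diag)) + 1
            d_max = diag[phi_max - 1] - center     # Eq. (5)
            d_min = diag[phi_min - 1] - center     # Eq. (6)
            s_max, s_min = int(d_max >= 0), int(d_min >= 0)  # Eq. (4)
            if s_max == 0 and s_min == 0:          # Eq. (3)
                lam = 0
            elif s_max == 1 and s_min == 1:
                lam = 1
            else:
                lam = 2
            # Eq. (2): set the two pattern bits (j converted to 0-based)
            hist[phi_max + 8 * lam - 1] += 1
            hist[phi_min + 4 + 8 * lam - 1] += 1
    return hist / ((M - 2) * (N - 2))              # Eq. (8)
```

With φ ∈ {1,…,4} and λ ∈ {0, 1, 2}, the largest reachable index is φmin + 4 + 8λ = 24, so the descriptor length dim is 24; each interior pixel contributes exactly two counts, so the normalized histogram sums to 2.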
3.2 ROI Extraction

To extract the RoIs, a saliency region approach is employed. Generally, the salient regions of an image are the regions which attract the viewer's attention and signify the image content. The steps involved in the saliency based RoI detection model are given below.

Saliency detection: The spectral residual (SR) approach is utilized to obtain the saliency map of an image.

Threshold segmentation: In this study, a threshold segmentation approach is used. With $S(x)$ denoting the saliency map of the input image, the object map $O(x)$ is obtained as:

$$O(x) = \begin{cases} 1, & \text{if } S(x) > threshold \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

The threshold is set to $E(S(x)) \times 3$, where $E(S(x))$ denotes the average intensity of the saliency map. Indeed, the selection of the threshold is a trade-off between false alarms and missed objects: raising the threshold reduces the noise, but it increases the number of missed objects. This trade-off between false alarms and missed objects is the trade-off between the recall ratio and the precision ratio. The procedure extracts the entire RoI, but it cannot achieve high accuracy and, in general, cannot eliminate all of the noise. The result of the RoI extraction is then passed on for feature extraction.
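A compact NumPy sketch of the two steps above, assuming the standard spectral residual formulation (log-amplitude minus its local average, recombined with the original phase); the 3×3 averaging filter and the light post-smoothing are simplifications of the original SR method:

```python
import numpy as np

def box_blur(a, size=3):
    """Mean filter via edge padding and shifted window sums."""
    p = size // 2
    padded = np.pad(a, p, mode="edge")
    out = np.zeros_like(a)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (size * size)

def spectral_residual_roi(image):
    """Spectral residual saliency map plus the threshold rule of Eq. (9)."""
    img = np.asarray(image, dtype=float)
    F = np.fft.fft2(img)
    log_amp = np.log(np.abs(F) + 1e-12)
    phase = np.angle(F)
    residual = log_amp - box_blur(log_amp)        # spectral residual
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = box_blur(saliency)                 # light smoothing
    threshold = saliency.mean() * 3.0             # Eq. (9): E(S(x)) * 3
    object_map = (saliency > threshold).astype(np.uint8)
    return saliency, object_map
```

The binary `object_map` marks the RoI pixels whose saliency exceeds three times the mean, matching the threshold rule described above.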
3.3 CNN Based ResNet Model

In this work, the handcrafted features and RoIs are fed into the ResNet model instead of the whole image, reducing the computational burden and improving network efficiency. Compared with classical NNs, CNNs have two properties, weight sharing and local connectivity, that significantly enhance their capability of extracting features and lead to enhanced efficacy with fewer trainable parameters. The structure of a typical CNN comprises input, convolution, pooling, fully connected (FC), and output layers, where the outcome of one layer serves as input to the subsequent layer. The convolution layer comprises several feature maps with several neurons. The function of the pooling layer is to imitate the human visual system by reducing the dimensions of the data and representing the image with higher-level features:

$$X^{l+1}_{j} = X^{l}_{j} \otimes k^{l+1}_{j} + b^{l+1}_{j} \qquad (10)$$

where $\otimes$ implies the pooling function. Essential pooling techniques include median pooling, maximum pooling, and average pooling. In the FC layer, a maximum likelihood operation is utilized for calculating the probability of each sample, and the learned features are mapped to the target label [18].

To solve the problems of gradient vanishing/explosion and performance degradation as the depth increases, ResNets are simpler to optimize and can reach higher accuracy at significantly greater depth. The residual block utilizes shortcut connections, permitting it to directly learn the residual $F(x) = H(x) - x$ so as to form the target output $F(x) + x$, thereby avoiding the performance degradation and accuracy reduction caused by stacking many convolution layers. Such a shortcut connection skips two or more layers and directly performs identity mapping. It provides a reference ($x$) to the input of each block, which thus learns to form a residual function instead of learning an unreferenced function. This residual function is simpler to optimize and allows the number of network layers to be significantly deepened. The ResNet building block contains 2 layers and utilizes the following residual mapping function [18]:

$$F = W_{2} \, \sigma(W_{1} x) \qquad (11)$$

where $\sigma$ implies the ReLU activation function. Then, with a shortcut connection and a second ReLU, the output $y$ is formed:

$$y = F\!\left(x, \{W_{j}\}\right) + x \qquad (12)$$

If the input and output dimensions differ, a linear transformation $W_{s}$ applied to $x$ can take place in the shortcut, as given below:

$$y = F\!\left(x, \{W_{j}\}\right) + W_{s} x \qquad (13)$$
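Eqs. (11)-(13) can be illustrated with a small NumPy sketch of a fully connected residual block; this is a simplification, since real ResNet blocks use convolutions and batch normalization:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2, Ws=None):
    """Two-layer residual block of Eqs. (11)-(13).

    F(x) = W2 @ relu(W1 @ x) as in Eq. (11); the shortcut is the
    identity when dimensions match (Eq. 12) or a linear projection
    Ws @ x otherwise (Eq. 13), with a second ReLU after the addition.
    """
    F = W2 @ relu(W1 @ x)                      # Eq. (11)
    shortcut = x if Ws is None else Ws @ x     # identity vs. projection
    return relu(F + shortcut)                  # second ReLU
```

With W1 and W2 near zero, F(x) vanishes and the block reduces to a (near-)identity mapping, which is why adding residual blocks does not degrade accuracy the way plainly stacked layers can.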
3.4 Similarity Measurement

At the final stage, the Euclidean distance is applied to determine the similarity between the reference images and the QI. The feature vector (FV) of the QI can be represented as $FV_{qi} = [FV_{qi}(1), FV_{qi}(2), \ldots, FV_{qi}(n)]$, and the FV of a reference image can be indicated as $FV_{dbi} = [FV_{dbi}(1), FV_{dbi}(2), \ldots, FV_{dbi}(n)]$. The aim of the similarity metric is to determine the reference image most similar to the QI. The Euclidean distance measure $E_{d}$ is mathematically formulated using Eq. (14):

$$E_{d}(db, q) = \sqrt{\sum_{f=1}^{n} \left( FV_{db}(f) - FV_{q}(f) \right)^{2}} \qquad (14)$$

where $n$ indicates the FV length, and $FV_{qi}$ and $FV_{dbi}$ denote the FVs of the QI and the reference images.
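The matching stage then reduces to a nearest-neighbour search over the stored deep features under Eq. (14); a minimal sketch (the function name is illustrative):

```python
import numpy as np

def recognize_place(query_fv, db_fvs):
    """Return (best_index, distance) of the reference feature vector
    closest to the query under the Euclidean distance of Eq. (14)."""
    db = np.asarray(db_fvs, dtype=float)
    q = np.asarray(query_fv, dtype=float)
    d = np.sqrt(((db - q) ** 2).sum(axis=1))   # one distance per reference
    best = int(np.argmin(d))
    return best, float(d[best])
```

In practice the database features are precomputed offline, so recognizing a query place costs one distance computation per reference image.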
4 Experimental Validation

The performance of the VPR-ROIDL model is validated on four benchmark datasets, namely Gardens Point [19–22], ESSEX3IN1 [23], Synthia [24], and Cross-season [25]. A few sample images are demonstrated in Fig. 2. Figure 3 depicts the query and matched reference images.
P. Sasikumar and S. Sathiamoorthy
Fig. 2 Sample images
Fig. 3 a Query images, b matched reference images
Figure 4 demonstrates the precision-recall curve of the VPR-ROIDL model with recent methods on the Gardens Point dataset. For better recognition outcomes, the value of precision-recall should be as high as possible. From the figure, it is noticed that the HOG model has failed to show effective outcomes with the minimal values of precision-recall. At the same time, the AlexNet, RMAC, and NetVLAD techniques have tried to show moderately closer values of precision-recall. Along with that, the Region-VLAD and CoHOG models have resulted in reasonable values of precision-recall. However, the VPR-ROIDL model has outperformed the other models with the maximum values of precision-recall.
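Curves such as those in Figs. 4–7 are typically produced by sweeping a match-acceptance threshold over the query-to-reference distances. A sketch of how a single precision-recall point is obtained, using synthetic data rather than the paper's results:

```python
import numpy as np

def pr_point(distances, is_correct, threshold):
    """Matches with distance <= threshold are 'accepted'.
    Precision = correct accepted / accepted; recall = correct accepted / all correct."""
    accepted = distances <= threshold
    tp = np.sum(accepted & is_correct)
    precision = tp / max(np.sum(accepted), 1)
    recall = tp / max(np.sum(is_correct), 1)
    return precision, recall

# Four hypothetical query matches: their distances and ground-truth correctness.
d = np.array([0.1, 0.4, 0.35, 0.8])
gt = np.array([True, True, False, True])
p, r = pr_point(d, gt, threshold=0.5)  # three accepted, two of them correct
```

Sweeping `threshold` over the range of distances traces out the full precision-recall curve.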
Fig. 4 Precision recall curve analysis of VPR-ROIDL technique under Gardens Point dataset
Figure 5 displays the precision-recall curve of the VPR-ROIDL approach with recent techniques on the ESSEX3IN1 dataset. For optimum recognition outcomes, the value of precision-recall should be as high as possible. From the figure, it is obvious that the HOG algorithm has failed to depict effective outcomes with the lesser values of precision-recall. Similarly, the AlexNet, RMAC, and NetVLAD approaches have tried to exhibit moderately closer values of precision-recall. Besides,
Fig. 5 Precision recall curve analysis of VPR-ROIDL technique under ESSEX3IN1 dataset
Fig. 6 Precision recall curve analysis of VPR-ROIDL technique under Synthia dataset
the Region-VLAD and CoHOG techniques have resulted in reasonable values of precision-recall. Eventually, the VPR-ROIDL model has outperformed the other models with the higher values of precision-recall. Figure 6 portrays the precision-recall curve of the VPR-ROIDL technique with recent approaches on the Synthia dataset. For better recognition outcomes, the value of precision-recall should be as high as possible. From the figure, it is observed that the HOG model has failed to showcase effective outcomes with the minimal values of precision-recall. In addition, the AlexNet, RMAC, and NetVLAD techniques have tried to depict moderately closer values of precision-recall. Likewise, the Region-VLAD and CoHOG models have resulted in reasonable values of precision-recall. Lastly, the VPR-ROIDL algorithm has outperformed the other models with the increased values of precision-recall. Figure 7 depicts the precision-recall curve of the VPR-ROIDL technique with recent methods on the CrossSeason dataset. For improved recognition outcomes, the value of precision-recall should be as high as possible. From the figure, it can be stated that the HOG method has failed to showcase effective outcomes with the minimal values of precision-recall. Simultaneously, the AlexNet, RMAC, and NetVLAD techniques have tried to depict moderately closer values of precision-recall. Also, the Region-VLAD and CoHOG techniques have resulted in reasonable values of precision-recall. At last, the VPR-ROIDL algorithm has outperformed the other models with the higher values of precision-recall. Finally, an average precision (APE) examination of the VPR-ROIDL model with recent methods on the distinct datasets is offered in Table 1 and Fig. 8 [23]. The experimental values indicate that the VPR-ROIDL model has gained improved
Fig. 7 Precision recall curve analysis of VPR-ROIDL technique under CrossSeason dataset

Table 1 Average precision analysis of VPR-ROIDL technique with recent methods

Dataset        HOG     AlexNet   NetVLAD   Region-VLAD   RMAC    CoHOG   VPR-ROIDL
GardensPoint   4.62    75.49     82.29     92.82         78.69   92.59   97.93
ESSEX3IN1      10.37   23.85     88.42     59.53         22.17   88.75   95.87
Synthia        53.65   94.50     97.84     93.63         95.09   95.38   98.92
CrossSeason    73.03   93.67     96.02     93.18         89.45   70.25   98.61
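The average precision values in Table 1 summarise an entire precision-recall curve in one number. A common formulation averages the precision at each correct match in a ranked list; this is a sketch of that variant (the exact AP definition used by the authors is not stated, so this is an assumption for illustration):

```python
import numpy as np

def average_precision(ranked_correct):
    """AP over a ranked match list: mean of precision@k at each correct rank."""
    ranked_correct = np.asarray(ranked_correct, dtype=bool)
    if not ranked_correct.any():
        return 0.0
    cum_tp = np.cumsum(ranked_correct)                       # correct matches so far
    precision_at_k = cum_tp / (np.arange(len(ranked_correct)) + 1)
    return float(precision_at_k[ranked_correct].mean())

# Toy ranked list: precision@1 = 1 and precision@3 = 2/3, so AP = 5/6.
ap = average_precision([True, False, True])
```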
Fig. 8 Average precision analysis of VPR-ROIDL technique with recent approaches
values of APE under all datasets. For instance, on the GardensPoint dataset, the VPR-ROIDL model has accomplished a higher APE of 97.93%, whereas the HOG, AlexNet, NetVLAD, Region-VLAD, RMAC, and CoHOG models have obtained lower APE of 4.62%, 75.49%, 82.29%, 92.82%, 78.69%, and 92.59% respectively. Meanwhile, on the ESSEX3IN1 dataset, the VPR-ROIDL method has accomplished a superior APE of 95.87%, whereas the HOG, AlexNet, NetVLAD, Region-VLAD, RMAC, and CoHOG techniques have obtained lower APE of 10.37%, 23.85%, 88.42%, 59.53%, 22.17%, and 88.75% respectively. Next, on the Synthia dataset, the VPR-ROIDL methodology has accomplished a maximum APE of 98.92%, whereas the HOG, AlexNet, NetVLAD, Region-VLAD, RMAC, and CoHOG models have obtained lower APE of 53.65%, 94.50%, 97.84%, 93.63%, 95.09%, and 95.38% correspondingly. Eventually, on the CrossSeason dataset, the VPR-ROIDL algorithm has accomplished a higher APE of 98.61%, whereas the HOG, AlexNet, NetVLAD, Region-VLAD, RMAC, and CoHOG models have obtained lower APE of 73.03%, 93.67%, 96.02%, 93.18%, 89.45%, and 70.25% correspondingly. From the above-mentioned results and discussion, it is apparent that the VPR-ROIDL model performs better than the other methods. Therefore, the VPR-ROIDL model can be utilized as an effective tool for VPR under varying conditions.
5 Conclusion

In this study, a new VPR-ROIDL technique has been developed for the identification of visual places irrespective of changes in illumination, season, and viewpoint. The proposed model involves four major components, namely the ROI extractor, LDEP feature extractor, ResNet50 model, and Euclidean distance based similarity measurement. With the utilization of the ROI and descriptor matching process, the VPR-ROIDL model has the ability to recognize places irrespective of changing illuminations, seasons, and viewpoints. For assessing the significant outcomes of the VPR-ROIDL technique, a comparative study was made using benchmark datasets. The experimental results reported better outcomes of the VPR-ROIDL technique over recent state-of-the-art approaches. In future, the presented VPR-ROIDL technique can be extended by incorporating metaheuristics-based hyperparameter optimizers.
References

1. B. Arcanjo, B. Ferrarini, M.J. Milford, K. McDonald-Maier, S. Ehsan, An efficient and scalable collection of fly-inspired voting units for visual place recognition in changing environments. IEEE Robot. Autom. Lett. (2022)
2. S. Lowry et al., Visual place recognition: a survey. IEEE Trans. Robot. 32(1), 1–19 (2016)
3. A.K. Gogineni, R. Kishore, P. Raj, S. Naik, K.K. Sahu, Unsupervised clustering algorithm as region of interest proposals for cancer detection using CNN, in International Conference on Computational Vision and Bio-Inspired Computing (Springer, Cham, 2019), pp. 1386–1396
4. D. Bai, C. Wang, B. Zhang, X. Yi, X. Yang, Sequence searching with CNN features for robust and fast visual place recognition. Comput. Graph. 70, 270–280 (2018)
5. M.J. Milford, G.F. Wyeth, SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights, in Proceedings of IEEE International Conference on Robotics and Automation (2012), pp. 1643–1649
6. M. Zaffar, A. Khaliq, S. Ehsan, M. Milford, K. McDonald-Maier, Levelling the playing field: a comprehensive comparison of visual place recognition approaches under changing conditions (2019). arXiv:1903.09107
7. B. Ferrarini, M. Waheed, S. Waheed, S. Ehsan, M.J. Milford, K.D. McDonald-Maier, Exploring performance bounds of visual place recognition using extended precision. IEEE Robot. Autom. Lett. 5(2), 1688–1695 (2020)
8. S. Garg, N. Suenderhauf, M. Milford, Lost? Appearance-invariant place recognition for opposite viewpoints using visual semantics (2018). arXiv:1804.05526
9. S. Hausler, A. Jacobson, M. Milford, Multi-process fusion: visual place recognition using multiple image processing methods. IEEE Robot. Autom. Lett. 4(2), 1924–1931 (2019)
10. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
11. L. Hui, M. Cheng, J. Xie, J. Yang, M.M. Cheng, Efficient 3D point cloud feature learning for large-scale place recognition. IEEE Trans. Image Process. (2022)
12. J. Guo, P.V. Borges, C. Park, A. Gawel, Local descriptor for robust place recognition using LiDAR intensity. IEEE Robot. Autom. Lett. 4(2), 1470–1477 (2019)
13. M.A. Uy, G.H. Lee, PointNetVLAD: deep point cloud based retrieval for large-scale place recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 4470–4479
14. S. Hausler, S. Garg, M. Xu, M. Milford, T. Fischer, Patch-NetVLAD: multi-scale fusion of locally-global descriptors for place recognition, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 14141–14152
15. S. Garg, M. Milford, SeqNet: learning descriptors for sequence-based hierarchical place recognition. IEEE Robot. Autom. Lett. 6(3), 4305–4312 (2021)
16. L. Chen, S. Jin, Z. Xia, Towards a robust visual place recognition in large-scale vSLAM scenarios based on a deep distance learning. Sensors 21(1), 310 (2021)
17. S.R. Dubey, S.K. Singh, R.K. Singh, Local diagonal extrema pattern: a new and efficient feature descriptor for CT image retrieval. IEEE Sig. Process. Lett. 22(9), 1215–1219 (2015)
18. E. Jing, H. Zhang, Z. Li, Y. Liu, Z. Ji, I. Ganchev, ECG heartbeat classification based on an improved ResNet-18 model. Comput. Math. Methods Med. (2021)
19. N. Sünderhauf, S. Shirazi, F. Dayoub, B. Upcroft, M. Milford, On the performance of ConvNet features for place recognition, in Proceedings of IEEE International Conference on Intelligent Robots and Systems (2015)
20. A. Saravanan, S. Sathiamoorthy, Autocorrelation based chordiogram image descriptor for image retrieval, in International Conference on Communication and Electronics Systems (ICCES) (2019), pp. 1990–1996. https://doi.org/10.1109/ICCES45898.2019.9002528
21. S. Sathiamoorthy, S. Arunachalam, R. Ponnusamy, Chordiogram image descriptor based on visual attention model for image retrieval. Array 7 (2020). https://doi.org/10.1016/j.array.2020.100027
22. A. Saravanan, S. Sathiamoorthy, Image retrieval using autocorrelation based chordiogram image descriptor and support vector machine. Int. J. Rec. Technol. Eng. 8(3) (2019)
23. M. Zaffar, S. Ehsan, M. Milford, K. McDonald-Maier, Memorable maps: a framework for re-defining places in visual place recognition (2018). arXiv:1811.03529
24. G. Ros, L. Sellart, J. Materzynska, D. Vazquez, A.M. Lopez, The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 3234–3243
25. M. Larsson, E. Stenborg, L. Hammarstrand, M. Pollefeys, T. Sattler, F. Kahl, A cross-season correspondence dataset for robust semantic segmentation, in Proceedings of Conference on Computer Vision and Pattern Recognition (2019), pp. 9532–9542
26. Z. Wang, L. Zhu, J. Qi, ROI extraction in dermatosis images using a method of Chan-Vese segmentation based on saliency detection, in Mobile, Ubiquitous, and Intelligent Computing (Springer, Berlin, Heidelberg, 2014), pp. 197–203
27. M. Zaffar, S. Ehsan, M. Milford, K. McDonald-Maier, CoHOG: a light-weight, compute-efficient, and training-free visual place recognition technique for changing environments. IEEE Robot. Autom. Lett. 5(2), 1835–1842 (2020)
Electronic Mobility Aid for Detection of Roadside Tree Trunks and Street-Light Poles Shripad Bhatlawande, Aditya Joshi, Riya Joshi, Kasturi Joshi, Swati Shilaskar, and Jyoti Madake
Abstract This paper presents an electronic system for proactive detection of roadside tree trunks and streetlight poles. The development of the system is based on a need assessment and requirement analysis study with visually impaired people. The system is implemented in the form of a smart clothing system. It uses a chest-mounted camera and a portable computing system. It interprets the surrounding environment and detects the presence of the said objects with an accuracy of 82.24% in 570 ms. It uses a decision tree classifier for recognition of the said objects. It was evaluated with 20 experiments. Four blindfolded users walked through an environment wearing this system along with a white cane. There was a tree trunk and a streetlight pole in the path. Users could correctly detect tree trunks 17 times and streetlight poles 14 times out of 20. These results indicate the relevance of this system as an electronic travel aid for visually impaired people.
S. Bhatlawande (B) · A. Joshi · R. Joshi · K. Joshi · S. Shilaskar · J. Madake Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune 411037, India e-mail: [email protected] A. Joshi e-mail: [email protected] R. Joshi e-mail: [email protected] K. Joshi e-mail: [email protected] S. Shilaskar e-mail: [email protected] J. Madake e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_29
1 Introduction

Visual impairment refers to varying degrees of impairment of a person's ability to see. According to the WHO (World Health Organization), 2.2 billion people have distance or near visual impairment [1]. Though loss of vision can affect people of any age, the majority of people facing visual challenges are above the age of 50 years. As per the National Blindness and Visual Impairment survey presented in 2019 by the Union Health Minister of India, visual disability is most common above the age of 80 years (11.6% of the total population of visually challenged people), followed by the 60–69 years age group (1.6%) and the 50–59 years age group (0.5%). One of the most strenuous challenges faced by these people is the limitation on free and safe mobility. Arora [2] discusses problems faced by the blind while walking on the road, such as fear of proximity to open manholes, uneven surfaces, and bumping into fast-moving vehicles. It is observed that most accidents involving visually impaired people happen outdoors; [3] states that footpaths can be among the most hazardous places for a visually impaired person because of obstacles like stray dogs, open manholes, garbage tanks, or even pedestrians [4]. Aged people are the most affected when it comes to mobility on roads because even if they are able to detect an obstacle, their reflexes cannot stop the accident from occurring. And, as mentioned earlier, the maximum number of visually challenged people are aged above 50 years [5]. The visually challenged also face mobility problems while crossing roads, such as fear of being hit by a vehicle, unexpected potholes, or not knowing the changing traffic signs [6]. Having no substantial signal to notify visually impaired pedestrians leads to many mobility accidents [7].
The work in [8] discusses how the blind face obstacles not only outdoors but also indoors, in schools, along with the different treatment they receive. Visually impaired people thus feel left out of many social activities because of the limitations they face during navigation. Support dogs are also useful for such navigational purposes, but a properly trained dog can be hard to find, and taking care of the dog is a fair amount of work. Today, there are many developments in the technologies used for assisting the blind. These assistive navigation systems can be divided into three categories, namely Electronic Orientation Aids (EOAs), which help the blind find a navigational path; Position Locator Devices (PLDs), which help in accurate positioning using GPS and GIS systems; and Electronic Travel Aids (ETAs), which help in avoiding obstacles. Multiple such ETAs based on sensors and neural networks were reviewed. They are very useful in their own ways but fail in quick response times, and are not particularly comfortable or affordable. The acceptance of the existing systems among the visually impaired is quite low due to factors such as lack of useful functions, inconsistent aesthetics, size, and shape. Addressing the requirements of visually impaired people and the limitations of existing aids, this paper presents a novel aid for proactive detection of streetlight poles and trees.
The literature review and the methodology of the aid are presented in Sects. 2 and 3 of the paper, respectively. The results are presented in Sect. 4 and the conclusions in Sect. 5.
2 Literature Survey

Gada et al. [9] propose computer vision techniques for recognizing objects. Haar cascade is used on an image in parts by dividing the image into multiple sub-parts of a pre-trained dataset. A Raspberry Pi is used to take the image as input, and the output class of this object is then compared with the dataset. This work specifically aims for an end-to-end solution that not only recognizes the object but also avoids accidents, for a hazard-free aid for the visually impaired. They further discuss improving the scope of this project by adding more objects and not limiting it to the visually impaired, but also using it for security purposes. Another use of Raspberry Pi and Haar cascade is for object detection and human identification [10] and human detection [11], which is useful for warning the visually impaired of human presence and recognizing faces when a person comes within range of the ultrasonic sensor. This is a Raspberry Pi based solution which uses Haar cascade for face recognition. The proposed system is unable to detect any hard corners or blind spots, and object detection is limited to the distance up to which the ultrasonic sensor can detect. MAVI (mobility assistance for the visually impaired) helps with ease of mobility for visually challenged people [12, 13]. An RGB camera, GPS, and IMU sensors are used as input sensors. The modules used for image processing include texture detection (TD), signboard detection (SBD), face detection, and animal detection. The algorithm used for object detection is a Convolutional Neural Network (CNN). It was the only system that had a 63.4% mAP value and 45 FPS on real-time data. Hardware components consist of a Jetson Nano and a camera module with Raspberry Pi. The results showed accurate detection of objects in the frame with 94.87% accuracy.
CNN is also used in [14] to develop a system which assists the visually impaired by implementing two modules: object recognition and colour detection. CNN and RNN are used for object recognition, along with a SoftMax function at the end for classification. The input images are convolved with filters and then passed through the rectified linear activation function (ReLU). The system is trained to classify up to 11 colours. The colours are converted from RGB to HSI. The processing is done at pixel level by selecting a window around the central pixel. The scope of this project can be broadened to include large-scale datasets, since this system is limited to specific objects at this stage. The system developed in [15, 16] recognizes up to 1000 classes of objects and gives a voice signal to the user. The user can capture an image using a smartphone, which, when uploaded on the proposed website, classifies the input image. YOLO is used for implementing object recognition. The output is converted into a voice signal using the Google voice library. The system was tested on various smartphones and the highest
accuracy of 75% was recorded. The accuracy of the system changes with respect to the camera and there is a possibility of misclassification; thus, it cannot be completely reliable. The same YOLO method is used by [17] for crosswalk detection and recognition. As an added feature, the model also displays the distance between the objects in front of the user. The objects include streetlights, cars, people, etc. The measured distance is susceptible to errors as the distance between the object and the user increases. YOLO and YOLO_v3 are also used for object detection, as in the indoor navigation system of [18] and the object detection and audio feedback system of [19]. The proposed methodology includes a depth map which is generated using monocular depth estimation techniques. A webcam is used to detect the object and classify it using the algorithms. The classification can be made more accurate by deploying deep learning techniques, and the system can be made more user friendly by using some IoT tools. The system proposed in [20, 21] is a smartphone application for object detection in which the camera can capture real-time images. Rashid Fahim et al. [22] present a similar Android application that captures images in real time to perform object detection. YOLO, R-CNN, and Fast R-CNN are implemented for object detection. YOLO takes an entire image for processing, whereas R-CNN takes the part of the image with the highest probability of containing the object. YOLO uses DARKNET-19, which uses 19 convolution layers for feature extraction from the image. The algorithm cannot detect an object if it is too close to the camera or too far away from it. Another limitation of this system is that if the FPS (frames per second) increases, the object cannot be detected. The works in [23, 24] use a depth-based technique for obstacle detection, with a 3-D sensor to record the colour and depth of the object. The image captured then goes through two phases of image segmentation and obstacle extraction.
After calculating the depth map, the edges are eliminated to get a more accurate image depth. Noise reduction is also performed on the images. The limitation of this system is that it is not efficient in the dark. An obstacle detection model [25], which uses specialized 3-D sounds, works by taking readings from a multidirectional sonar system. The model can be divided into two parts: a compass control unit and a sonar unit. This further incorporates a microcontroller and six ultrasonic sensors, each one pointing in a radial direction surrounding the person using it. It also has a 3-D sound translation unit. However, the system lacks optimum navigational speed and the design is not considered ergonomically feasible. The wearable mobility aid [26] and smart glasses [27] are devices that operate in real time for assisting the visually impaired by guiding them to avoid obstacles. These systems use audio feedback to notify the user in case of an obstacle, using a text-to-speech module. Although the system is extremely lightweight and easy to use because of its compact size, the accuracy of the wearable mobility aid is 72%, which makes it not completely reliable. Another handheld visual aid is the cane, which is modified in [28] by incorporating a microwave radar into the traditional assisting cane used by the unsighted. The entire system is designed around the XMC4500 microcontroller. The antennae transmit a signal which then echoes off the object and is detected by the homodyne receiver. Fast Fourier Transform is
performed on the signal, through which the distance can be measured. The technique, however, cannot reliably detect objects beyond 4 m due to losses. An assistant for the visually impaired is developed in [29] by capturing the image using a Pi Camera along with an ultrasonic sensor and ROS LiDAR. This system helps in detecting obstacles with the help of the beams emitted by the ROS LiDAR, covering 180° of area around the target object. A comparative study of three different cameras, namely the PS Camera, Pi Camera, and Logitech C170, is carried out to ensure the best performing device for the system. The system can be further improved by adding functionalities such as gender detection. SLAM can be used to program an indoor/outdoor navigation system. Object recognition using binocular vision sensors is a method put forward in [30, 31]. Binocular vision offers better understanding of the environment by offering a 3-D composition of the scene. Additionally, it mimics a person's sight. The sensor array passes the captured images to a Convolutional Neural Network hosted in the cloud. This drastically reduces the computational requirements and reduces the size restraints on the actual wearable device. A smartphone-based method for obstacle avoidance is proposed in [32, 33] to guide the user independently in indoor and outdoor environments. In this case, the algorithm measures gyroscope readings of inertial movements and also the visual information from the camera. A sparse depth map is created and obstacles are detected. While the ultrasonic sensor helps in detecting and measuring the obstacle distance, the image captured from the camera is used for obstacle recognition. The output is in the form of audio or vibration. The drawback is the inefficient distinction between ground and hanging obstacles.
A wearable device which helps in converting visual information into vibrational signals, so that visually impaired people can navigate freely, is proposed in [34, 35]. It consists of a few distinct touch points which help in obstacle-free movement: a tractor belt which has 14 vibrator motors placed laterally, a portable camera carried in a backpack, and two web cameras attached onto a camera belt. In the other paper, the device is attached to the top wear of the belt and ultrasonic mapping is done for ease of commuting. But there is no fair differentiation between objects at different eye levels. A similar approach is followed by [36], where the proposed system is a tactile waist belt with complex signals displayed on it. To warn the visually impaired user about approaching obstacles, it has a vibrational output. Multiple levels of actuators are positioned on the waist with varying intensity values. Horizontal and vertical levels are allowed by using a 3 × 3 display on the torso. Distance from the object is coded in the rate of repetition. The limitations of this project are the inability to detect more than four obstacles at a time, to detect ground-based as well as hanging objects, and to provide a wide field of regard. Several ETAs were reviewed which are advantageous in various aspects. However, there is a need to improve these devices, as very few are actually able to attract potential users, some of the reasons being the complexity of the systems and their limitation to Android devices only. In some cases, the navigation system's performance is poor, whereas some systems are too bulky. Miniaturization of the system can be carried out by using nanoelectronics [37] and relevant integrations. Systems that provide
vibratory responses often fail to generate different patterns for different obstacles. Furthermore, they provide less reaction time. These shortcomings could be addressed by mutual discussion about the need and solution between the developer and the unsighted. The authors conducted a detailed survey with visually impaired people. Structured sets of questionnaires were used to understand their requirements. Many incidents were reported by the visually impaired regarding collisions with roadside trees and streetlight poles, and they expressed the need for a technological aid for the proactive detection of these objects. In the literature studied, multiple instances can be found where trees and streetlights are among the obstacles to be identified. However, for the system proposed in this paper, trees and streetlights are the primary obstacles to be detected; no other literature was obtained wherein obstacle detection was solely focused on roadside trees and streetlights. After the exploration of all the aforementioned aids and methodologies, a different approach is proposed in this paper. It helps in detecting and classifying trees and streetlights on roads and footpaths. The proposed aid is made as a wearable device, so it is easy to use and portable.
3 Methodology

This paper presents an electronic travel aid for detection of roadside trees and streetlights. The block diagram of the system is shown in Fig. 1. It consists of a Raspberry Pi camera, a Raspberry Pi (8 GB target board), and an earphone. The camera acquires roadside information. An algorithm has been implemented for classification and detection of roadside tree trunks and streetlight poles. The system converts the detected objects into audio feedback and conveys the information to the visually impaired through the earphone. The system has been implemented in the form of smart clothing. The camera is mounted at chest-level height, just below the sternum.

Fig. 1 System level block diagram (camera → Raspberry Pi 8 GB target board → convert name of classified object to audio → earphone)
3.1 Dataset Generation

Visually impaired people reported many incidents regarding collisions with roadside trees and streetlight poles. The authors explored the existing literature and datasets for the desired images and details; however, no suitable dataset was found for streetlight poles and tree trunks. A total of 9232 images were used to implement the proposed solution. Of these, 60% were compiled by the authors using a smartphone camera (12 megapixel). The remaining 40% of the dataset was obtained from the internet (Google). Sample images from the dataset are shown in Fig. 2, and the distribution of images in the dataset is shown in Table 1. All '.jpg' images in the dataset were resized to 100 × 133 and converted into greyscale.
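The preprocessing described above (greyscale conversion and resizing to 100 × 133) can be sketched with NumPy alone. Nearest-neighbour interpolation and BT.601 luminance weights are assumed here for illustration; the paper does not state which interpolation or conversion method was used:

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted greyscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D greyscale image."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[np.ix_(rows, cols)]

rgb = np.random.rand(266, 200, 3)       # dummy input image, values in [0, 1)
gray = to_grayscale(rgb)                # shape (266, 200)
small = resize_nearest(gray, 133, 100)  # height 133, width 100
```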
3.2 Compilation of Optimized Feature Vector

Scale Invariant Feature Transform (SIFT) was used to extract the features from all images in the dataset. SIFT is scale invariant and rotation invariant, and is also less vulnerable to occlusions and clutter. SIFT performs the detection independent of image properties such as depth, scale, etc. SIFT extracts an N × 128 feature vector array for each image, where N stands for the number of keypoints in the image. Over the full dataset, SIFT provided a large feature vector of size 913,602 × 128, as shown in Fig. 3. This feature vector was optimized in the interest of fast convergence of the algorithm. This optimization (dimensionality reduction) was needed for systematic use of the limited resources on the Raspberry Pi target board. The dimension optimization approach uses K-means clustering and Principal Component Analysis (PCA). The descriptors in the large feature vector were grouped into 5 clusters; the number of clusters (K = 5) was decided based on the elbow-point method. The K-means clustering stage provided a feature vector of size 9232 × 5, which was further reduced to 9232 × 4 using PCA. The 4 principal components were selected based on maximum information content. The overall process of dimensionality reduction is presented in Algorithm 1.
Fig. 2 Sample images from dataset

Table 1 Distribution of images in the dataset

S. No   Name of class   No. of images   Class label
1       Tree            3053            0
2       Streetlights    3179            1
3       Negatives       3000            2
Fig. 3 Final optimized feature vector (SIFT feature vector (913602 × 128) → K-means → feature vector (9232 × 5) → PCA → feature vector (9232 × 4) → input to classifier)
Algorithm 1. Dimensionality Reduction
Input: Large feature vector (913602 x 128)
Output: Final feature vector (9232 x 4)
1. Dataframe (DF) = []
2. Train K-Means for K = 5
3. for image in image dataset do
4.   Extract features of image - SIFT
5.   DF = Append features            // final DF (913602 x 128)
6.   Predict clusters - pretrained K-Means [K = 5]
7.   histogram = predicted clusters (1 x 5)
8.   N = normalize(histogram)
9.   FinalData.append(N)             // FinalData dimension (9232 x 5)
10. end for
11. S = Standardization(FinalData)
12. Fx = PCA(S) with n = 4           // (9232 x 4)
13. return Fx
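Algorithm 1 can be sketched in NumPy. Random arrays stand in for the per-image SIFT descriptors (a real pipeline would use OpenCV's SIFT and typically scikit-learn's KMeans/PCA; the toy k-means and SVD-based PCA below are deliberate simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-image SIFT descriptor arrays of shape (N, 128).
descriptors = [rng.random((rng.integers(50, 120), 128)) for _ in range(20)]

# Step 2: learn K = 5 visual words over all descriptors (toy k-means).
K = 5
all_desc = np.vstack(descriptors)
centroids = all_desc[rng.choice(len(all_desc), K, replace=False)]
for _ in range(10):
    labels = np.argmin(((all_desc[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([all_desc[labels == k].mean(0) if (labels == k).any()
                          else centroids[k] for k in range(K)])

# Steps 3-10: one normalised 1 x 5 cluster histogram per image.
hists = []
for d in descriptors:
    lab = np.argmin(((d[:, None] - centroids) ** 2).sum(-1), axis=1)
    h = np.bincount(lab, minlength=K).astype(float)
    hists.append(h / h.sum())
X = np.array(hists)                               # (n_images, 5)

# Steps 11-12: standardise, then keep the top 4 principal components.
S = (X - X.mean(0)) / (X.std(0) + 1e-12)
Fx = S @ np.linalg.svd(S, full_matrices=False)[2][:4].T   # (n_images, 4)
print(Fx.shape)  # (20, 4)
```

With the full dataset the same steps map the 913,602 × 128 descriptor pool down to the 9232 × 4 matrix used for training.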
3.3 Classification and Recognition of Tree Trunks and Streetlights The final feature vector of size 9232 × 4 was then used to train the classifiers: K-nearest neighbours (KNN), support vector machine (SVM) with one-versus-one (OVO) and one-versus-rest (OVR) approaches, decision tree, and random forest. The overall process for classification and recognition of the desired objects is illustrated in Algorithm 2.
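This comparison can be sketched with scikit-learn; synthetic data stands in for the actual 9232 × 4 feature vector (class labels 0-2 as in Table 1), and the hyperparameters are library defaults, not those used by the authors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 9232 x 4 feature vector, three classes.
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM radial (OVO)": SVC(kernel="rbf", decision_function_shape="ovo"),
    "SVM radial (OVR)": SVC(kernel="rbf", decision_function_shape="ovr"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.3f}")
```

The linear and polynomial SVM kernels evaluated in Sect. 4 would be obtained the same way with `kernel="linear"` or `kernel="poly"`.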
Algorithm 2. Prediction of Trees and Streetlights
Input: Images
Output: Predicted class of each image
1. for image in image dataset do
2.   DF = []
3.   Access one image
4.   DF = Extract features of image - SIFT (N x 128)
5.   Predict cluster - K-Means
6.   S = Standardize(DF)
7.   xpca = PCA(S)
8.   Pass xpca to the model
9.   Predict class of xpca
10.  if xpca == 0:
11.    Image predicted as tree
12.  else if xpca == 1:
13.    Image predicted as streetlight
14.  else:
15.    pass
16. end for

In total, 5 classifiers were used for classification and recognition of trees and streetlight poles. The first classifier used was SVM. Classification in SVM takes place by finding a hyperplane that separates the classes well; among the candidate hyperplanes, the one that best segregates the three classes is chosen:

w · x + b = 0    (1)
In Eq. (1), w is the vector normal to the hyperplane, x is the data point and b is the bias. The second classifier used was KNN, which predicts the class of a point by considering its K nearest data points. Learning in this algorithm is instance-based: weights are not learned from the training data; instead, all training instances are used to predict the class of new data. The distance used by KNN is described in Eq. (2):

d(j, k) = sqrt( Σ_{i=1..x} (j_i − k_i)² )    (2)
Equation (2) is the Euclidean distance: the distance between j and k is given by the square root of the sum of squared differences between the coordinates of the two points. This distance is then used as a probability factor to assign the unknown point to a particular class. The third classifier used was the decision tree. Decision trees are preferred for classification primarily for two reasons: they mimic human thinking during decision making, and the logic behind them is easy to understand. The equation for the decision tree algorithm can be expressed as

E = − Σ_{i=0..x} f × log2(f)    (3)

Fig. 4 Classification of images (flow: pre-processed image → classifier; if classified as a tree, "object is tree" audio is produced; if classified as a streetlight, "object is streetlight" audio is produced)
In Eq. (3), E is the entropy, x is the number of classes and f is the frequency of the classes. The random forest model improves accuracy by averaging the predictive accuracy obtained from each tree rather than relying on a single decision tree. A random forest consists of multiple such decision trees, and the algorithm can be written as

RF = Σ_{i=1..T} c_i    (4)
In Eq. (4), c is the entropy of the individual decision trees in the random forest and T is the number of trees. The stepwise process of object classification is shown in Fig. 4.
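Eq. (3), the Shannon entropy over the class frequencies at a node, can be checked with a few lines of Python; the label lists here are illustrative:

```python
import numpy as np

def entropy(labels):
    """Eq. (3): E = -sum_i f_i * log2(f_i) over the class frequencies."""
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()
    # "+ 0.0" turns the -0.0 produced for a pure node into 0.0.
    return float(-(f * np.log2(f)).sum()) + 0.0

print(entropy([0, 0, 1, 1]))  # 1.0 (two equally likely classes)
print(entropy([0, 0, 0, 0]))  # 0.0 (a pure node)
print(entropy([0, 1, 2]))     # log2(3) ≈ 1.585
```

A decision tree split is chosen to reduce this quantity; a pure node (one class only) has zero entropy.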
4 Result The detailed performance analysis of the classifiers is presented in Table 2. Among the classifiers used, the decision tree provided the highest accuracy (82.24%). The SVM
Table 2 Classifiers and their accuracies

S. No   Classifier                      Accuracy (%)
1       Decision tree                   82.24
2       Random forest                   75.90
3       KNN                             79.10
4       SVM Linear kernel (OVR)         80.50
5       SVM Polynomial kernel (OVR)     79.80
6       SVM Radial kernel (OVR)         81.19
7       SVM Linear kernel (OVO)         80.50
8       SVM Polynomial kernel (OVO)     79.80
9       SVM Radial kernel (OVO)         81.19

Table 3 Precision, recall and F1-score of the classifiers

Classifier                      Precision   Recall   F1-score
Random forest                   0.74        0.70     0.72
Decision tree                   0.52        0.55     0.53
KNN                             0.70        0.69     0.69
SVM Linear kernel (OVR)         0.74        0.66     0.68
SVM Polynomial kernel (OVR)     0.78        0.65     0.68
SVM Radial kernel (OVR)         0.74        0.68     0.70
SVM Linear kernel (OVO)         0.74        0.66     0.68
SVM Polynomial kernel (OVO)     0.78        0.65     0.68
SVM Radial kernel (OVO)         0.74        0.68     0.70
kernels exhibited a pattern: the OVO and OVR decision functions provided nearly the same accuracies irrespective of the kernel. Performance parameters such as recall, precision and F1-score were used to evaluate the classification algorithms; these values are given in Table 3. The precision score for the majority of the classifiers was 0.74, with the SVM polynomial kernels having the highest precision of 0.78. The decision tree showed the lowest overall precision, recall and F1-score of 0.52, 0.55 and 0.53, respectively. The classifiers were applied to 200 images comprising trees, streetlights and potholes, and all classifiers provided consistent classification results. Upon examination of the precision, recall and F1-scores of all the classifiers, it can be noted that for practical usage these values need to be improved further, since with the listed values classification may fail if the live feed is even slightly blurred. The proposed system was evaluated with 4 blindfolded users in a controlled environment. Each user was asked to walk on a 5 m wide and 50 m long straight road, with a street pole on the left side and a tree trunk on the right side of the walkable path. Each user walked in this environment wearing the proposed system along with a white cane. Two parameters, namely (i) tree trunk detection and (ii) street pole detection, were
measured during each evaluation. In these 20 experiments, the users could detect tree trunks in 17 experiments and streetlights in 14 experiments.
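Expressed as detection rates, these counts work out as follows:

```python
trials = 20
tree_hits, pole_hits = 17, 14
print(f"tree-trunk detection: {100 * tree_hits / trials:.0f}%")   # 85%
print(f"streetlight detection: {100 * pole_hits / trials:.0f}%")  # 70%
```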
5 Conclusion This paper presented a novel system for proactive recognition of streetlight poles and tree trunks. The functions implemented in the system are based on the requirements of visually impaired people. The system was realised in the form of smart clothing; its overall aesthetics, cosmetics, carrying method, usage method and size were decided in consultation with visually impaired users. The system uses the decision tree algorithm for classification of detected objects, and a detection accuracy of 82.24% was achieved on the target board. A total of 20 experiments were carried out to assess the utility of the system in a real-world environment. The system proactively and correctly detected the presence of tree trunks and street poles in real time, and it consistently provided audio feedback about the detected object to the user via an earphone. The system involves minimal interference with the natural sensory channels of the user. It requires uniform illumination for consistent performance; this illumination dependency can be overcome with a multisensory system based on ultrasonic range finders and LiDARs. Acknowledgements We express our sincere gratitude to the visually impaired participants in this study, the orientation and mobility (O&M) experts, and the authorities at The Poona Blind Men's Association, Pune. The authors thank the Rajiv Gandhi Science and Technology Commission, Government of Maharashtra, Mumbai, and Vishwakarma Institute of Technology, Pune, for providing financial support (RGSTC/File-2016/DPP-158/CR-19) to carry out this research work.
Real Time Video Image Edge Detection System A. Geetha Devi, B. Surya Prasada Rao, Sd. Abdul Rahaman, and V. Sri Sai Akhileswar
Abstract In surveillance and medical imaging applications, edge detection plays a vital role. Problems such as missing edges due to noise may arise during edge localization, and any edge detection technique takes high computational time; therefore, edge detection with high accuracy is a critical issue in these applications. One solution is Canny edge detection, one of the superior edge extraction schemes: it reduces the number of false edges and consequently makes a superior starting point for additional processing. This work deals with real-time video image edge detection using MyRIO (Reconfigurable Input Output) hardware. A camera supported by the device is used to capture the images, and the Canny edge operator is utilized to identify the edges present in each image. The detector can be adapted to a variety of settings, and edges are detected depending on the threshold values. Edge detection plays an essential role in many applications such as fingerprint recognition. This work developed a portable and manageable device that detects edges in a real-time environment and can be utilised in any of these applications.
1 Introduction Edge detection [1–9] is an approach for partitioning an image into regions of discontinuities. It is widely utilized in digital image processing tasks such as pattern recognition, morphological image processing and feature extraction. Edge detection [2] lets users identify the features of an image from a significant change in brightness level: an edge marking the end of one region in the image is the start of another. Edges reduce the amount of information in an image while preserving its primary structural properties. A. Geetha Devi (B) · B. Surya Prasada Rao · Sd. Abdul Rahaman · V. Sri Sai Akhileswar Department of ECE, Prasad V. Potluri Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_30
389
390
A. Geetha Devi et al.
Edge detectors such as step edge detectors have been an essential part of numerous vision frameworks [10–14]. Edge detection improves image analysis by drastically reducing the amount of information to be processed while preserving useful structural information about object boundaries. There is certainly much variety in edge detection practice, yet a common set of requirements applies across numerous applications. These requirements yield an abstract edge detection problem whose solution can be applied in any of the original subject areas. This paper utilises the Canny edge operator for edge detection: real-time video is captured and split into frames, the edge detection algorithm is applied to each frame, and the edges in the video are recognised. The functionality is developed on NI MyRIO hardware. The paper is organised into six sections. The first section introduces the paper and the second reviews the existing methodologies in the area of interest. The third section deals with the various edge operators and the fourth with the basic MyRIO architecture. The fifth section discusses the results and the sixth concludes the paper.
2 Review of Literature Edge detection algorithms are useful for moving-object identification [15–17], and many edge detection algorithms have been proposed in the literature for applications such as video surveillance. Roshni and Raju proposed an image segmentation technique utilizing multiresolution texture gradient and the watershed algorithm [18]. Wang et al. presented a methodology using multiple adaptive threshold values and boundary evaluation for detecting moving objects against the background [19]. Ganesan et al. [20] extracted video objects using various edge detection algorithms. Kaur et al. [21] concentrated on a comparative study of various edge detection algorithms. Sabirin et al. suggested a spatio-temporal graph to detect and track video objects [22]. Sathesh and Adam [23] proposed a three-stage thinning process to remove foreground pixels of a binary image. They concluded that the end result of their hybrid parallel processing gives better results when compared with the Abu-ain, K3M, Zhang and Huang methods, and they implemented parallel processing without affecting the topological shape and connectivity of the image. Darney et al. [24] proposed a work in which the reconstructed images obtained a superior peak signal-to-noise ratio (PSNR) value. From the noisy image set, the produced images are fairly clear; the image receives sparse free parameters from the rain streak due to accurate coarse estimation. Because actual rain is diverse and complicated, developing a distinctive framework to accommodate the combination of model-driven and data-driven approaches is a difficult task. Rain streaks from distant thundershowers build up across the landscape, distorting vision in the same way that
fog does, namely by scattering light, creating haze and lowering visibility. Mittal et al. [25] proposed an edge detection algorithm to detect changes in remote sensing data using particle swarm optimization. Basha et al. [26] suggested a medical image processing technique that utilizes X-ray images to diagnose different orthopaedic and radiology-based muscle disorders; for classifying these disorders, they developed a Canny edge detection and machine learning optimization technique.
3 Edge Detection Techniques Various types of edge detection techniques are available in the literature. The most widely used edge detection techniques are:

1. Roberts edge operator
2. Sobel operator
3. Prewitt's operator
4. Kirsch edge detection
5. Robinson edge detection
6. Marr-Hildreth edge operator
7. Canny edge detection
Among these techniques, the Canny algorithm is considered the best edge detection technique for the following reasons:

• Non-maximum suppression
• Hysteresis process
• Smoothing of the image with a Gaussian filter to reduce noise

From the literature and the results obtained after experimentation, it is found that Canny edge detection is superior to all other edge detection techniques. Hence, the Canny edge detection technique has been utilized to develop the edge detection system using MyRIO.
3.1 Algorithm for Canny Edge Detection Canny edge detection is a technique for extracting useful structural information (edges) from graphical objects while substantially reducing the amount of data to be processed. It is widely utilised in various computer vision frameworks. Canny found that the requirements for applying edge detection in different vision contexts are generally similar, so an edge detector that meets these requirements can be applied in a wide range of situations. The general requirements for edge detection are as follows:
Fig. 1 Block diagram of Canny edge detection
1. Edge detection with a minimum error rate, meaning the detector should accurately find as many edges as are visible in the image.
2. The edge point identified by the operator should localize precisely on the centre of the edge.
3. A given edge in the image should be marked only once and, wherever feasible, image noise should not create false edges.
In order to fulfil these needs, Canny utilized the calculus of variations, a method for finding the function that optimizes a given functional. The optimal function in Canny's detector is described by the sum of four exponential terms, but it can be approximated by the first derivative of a Gaussian. The Canny edge detection algorithm is one of the most rigorously defined techniques among all the strategies proposed so far, and it provides good and reliable detection. Owing to its optimality with respect to the three criteria for edge detection and the simplicity of the process for implementation, it became one of the most popular algorithms for edge detection (Fig. 1).
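The pipeline described above can be illustrated with a compact NumPy sketch. Non-maximum suppression and full iterative hysteresis tracking are simplified away here, so this is a didactic approximation of the Canny stages, not a complete detector; the default thresholds loosely follow the low/high values reported later in Sect. 5:

```python
import numpy as np

def canny_sketch(img, sigma=1.0, low=0.2, high=0.9):
    """Simplified Canny-style pipeline: Gaussian smoothing, gradient
    magnitude, and a single-pass double threshold."""
    # 1. Separable Gaussian smoothing to reduce noise.
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    sm = np.apply_along_axis(lambda v: np.convolve(v, g, "same"), 0, img)
    sm = np.apply_along_axis(lambda v: np.convolve(v, g, "same"), 1, sm)
    # 2. Gradient magnitude, normalised to [0, 1].
    gy, gx = np.gradient(sm)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12
    # 3. Double threshold: strong edges are kept; weak edges survive only
    #    if they touch a strong edge (one-pass hysteresis approximation).
    strong = mag >= high
    weak = (mag >= low) & ~strong
    near_strong = strong.copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            near_strong |= np.roll(np.roll(strong, dy, 0), dx, 1)
    return strong | (weak & near_strong)

# A vertical step edge at column 10 should be detected near that column.
img = np.zeros((21, 21))
img[:, 10:] = 1.0
edges = canny_sketch(img)
print(edges[10, 9:12].any())  # True
```

A production system would instead call a library implementation (e.g. the NI Vision or OpenCV Canny operator), which performs the full non-maximum suppression and hysteresis steps.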
4 MyRIO Architecture The NI Academic RIO Device is the academic member of the National Instruments family of reconfigurable I/O (RIO) engineering platforms, which also includes CompactRIO and Single-Board RIO, intended for embedded control and monitoring applications. The RIO architecture fundamentally comprises two targets offering complementary capabilities:
1. Real-Time (RT) processor: runs a LabVIEW VI much like a PC, but on a real-time operating system to achieve deterministic (predictable) loop timing; the RT processor also manages a flash-based file system, USB port, UART, and network connectors for both wired and wireless networking.
2. Field-Programmable Gate Array (FPGA): "runs" a LabVIEW VI by loading a bitstream configuration file (the FPGA "personality") compiled directly from the VI source code; the FPGA manages the hardware interface to the sensors, actuators, and peripheral devices that make up the embedded system.
Fig. 2 Block diagram of MyRIO hardware
In the Academic RIO Device platform, both the RT and FPGA targets physically reside on the Xilinx Zynq-7000 system-on-chip (SoC) device (Fig. 2).
5 Results and Discussion The Canny edge detection algorithm was developed on the MyRIO hardware and used for real-time edge detection. Figure 3 shows the actual, physically connected circuit: a web cam is connected to the MyRIO, and the MyRIO is configured with the LabVIEW software. The video signal is acquired through the web cam and it
Fig. 3 Edge detection circuit
is processed with the help of the MyRIO to obtain the edge-detected image, as shown in the figures below. Figure 4 shows the input image on the left and the edge-detected output on the right; the filter parameters used in the edge detection process were sigma 1.00, high threshold 0.90, low threshold 0.20 and a window size of 9, to detect the edges with high accuracy. Figure 5 likewise shows the input image on the left and the edge-detected output on the right; here the filter parameters were sigma 1.00, high threshold 0.93, low threshold 0.20 and a window size of 9 (Fig. 6).
Fig. 4 Output obtained for input given through web cam
Fig. 5 Output obtained for input given through USB web cam
Fig. 6 Output obtained for input given through mobile camera
Applications

• Surveillance
• Medical science
• Fingerprint recognition
• Satellite images
• Robotics vision
6 Conclusion The motivation behind this paper is to present a review of different methodologies for image segmentation using edge detection operators, based on discontinuities in image intensities. The Canny edge detector is superior in detecting edges; hence this methodology has been adopted for hardware development using NI MyRIO. This paper realizes a portable hardware-based video edge detection system on MyRIO. After simulating the code in the LabVIEW software, the code is deployed to the MyRIO hardware, and a camera connected to the MyRIO supplies the input video signal for edge detection. All edges present in the video signal are displayed on the monitor after running the code. A portable and handy device for real-time edge detection has been developed which can be easily used in any practical environment. The work can be extended by developing the algorithm in the MyRIO FPGA module to obtain faster results compared with the MyRIO real-time module.
References 1. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–714 (1986) 2. W. Zhang, F. Bergholm, Multi-scale blur estimation and edge type classification for scene analysis. Int. J. Comput. Vision 24(3), 219–250 (1997) 3. D. Ziou, S. Tabbone, Edge detection techniques: an overview. Int. J. Pattern Recognit Image Anal. (1998) 4. T. Lindeberg, Edge detection and ridge detection with automatic scale selection. Int. J. Comput. Vis. (1998) 5. T Lindeberg, Edge detection, in Encyclopedia of Mathematics (EMS press, 2001) 6. J.M. Park, Y. Lu, Edge detection in gray scale, color and range images (2008) 7. P. Zhou, Q. Wang (2011) An improved canny algorithm for edge detection 8. T. Geback, P. Koumoutsakos, Edge detection in microscopy images using curvelets 9. T. Moeslund, Canny edge detection (2009) 10. M.H. Asghari, B. Jalali, Edge detection in digital images using dispersive phase stretch. Int. J. Biomed. Imaging 2015 (2015) 11. M.H. Asghari, B. Jalali, Physics-inspired image edge detection, in IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2014), pp. 293–296 12. R. Haralick, Digital step edges from zero crossing of second directional derivatives. IEEE Trans Pattern Anal Mach Intell (1984) 13. A. Khashman, Automatic detection, extraction and recognition of moving objects. Int. J. Syst. Appl. Eng. Dev. 2(1) (2008) 14. Y. Ramadevi, T. Sridevi, B. Poornima, B. Kalyani, Segmentation and object recognition using edge detection techniques. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 2(6) (2010) 15. K.A. Joshi, D.G. Thakore, A survey on moving object detection and tracking in video surveillance system. Int. J. Soft Comput. Eng. (IJSCE) 2(3) (2012). ISSN: 2231-2307
16. B. Bansal, J.S. Saini, V. Bansal, G. Kaur, Comparison of various edge detection techniques. J. Inf. Oper. Manag. 3(1), 103–106 (2012). ISSN: 0976–7754 & E-ISSN: 0976–7762 17. J.C. Nascimento, J.S. Marques, Performance evaluation of object detection algorithms for video surveillance. IEEE Trans. Multimedia 8(4) (2006) 18. V.S. Roshni, G. Raju, Image segmentation using multiresolution texture gradient and watershed algorithm. Int. J. Comput. Appl. 22(6), (0975–8887) (2011) 19. L. Wang, N.H.C. Yung, Extraction of moving objects from their background based on multiple adaptive thresholds and boundary evaluation. IEEE Trans. Intell. Transp. Syst. 11(1) (2010) 20. K. Ganesan, S. Jalla, Video object extraction based on a comparative study of efficient edge detection techniques. Int Arab J. Inf. Technol. 6(2) (2009) 21. B. Kaur, A. Garg, Comparative study of different edge detection techniques. Int. J. Eng. Sci. Technol. (IJEST) 22. H. Sabirin, M. Kim, Moving object detection and tracking using a spatio-temporal graph in H.264/Avc bitstreams for video surveillance. IEEE Trans. Multimedia 14(3) (2012) 23. A. Sathesh, E.E.B. Adam, Hybrid parallel image processing algorithm for binary images with image thinning technique. J. Artif. Intell. 3(03), 243–258 (2021) 24. P.E. Darney, I.J. Jacob, Rain streaks removal in digital images by dictionary based sparsity process with MCA estimation. J. Innovative Image Process. 3(3), 174–189 (2021) 25. N. Mittal, A. Gelbukh, Change detection in remote-sensed data by particle swarm optimized edge detection image segmentation technique, in Innovative Data Communication Technologies and Application (Springer, Singapore, 2021), pp. 809–817 26. C. Zeelan Basha, T. Sai Teja, T. Ravi Teja, C. Harshita, M. Rohith Sri Sai, Advancement in classification of X-ray images using radial basis function with support of Canny edge detection model, in Computational Vision and Bio-Inspired Computing (Springer, Singapore, 2021), pp. 29–40
Research Paper to Design and Develop an Algorithm for Optimization Chatbot

Bedre Nagaraj and Kiran B. Malagi
Abstract The digital era has witnessed the emergence of a special category of software bot, the chatbot, which is manifested in millions of popular and demanding applications. Studying what lies behind the popularity of chatbots, and the characteristics that have created a milestone and drastic changes in modern applications, is quite fascinating. The aim is to detect potential knowledge gaps in identifying, comparing and evaluating chatbots. The objective is to propose an optimization chatbot model for evaluation, and to design and develop an optimization chatbot algorithm that predicts the level of chatbot optimization. The focus is on evaluating 16 chatbots (Botmother, Botpress, Botsify, Botsociety, Botstar, Bot.xo, Chatize, Chatfuel, Chengo, Clustaar, Crisp, Drift, Engati, Flow.xo, Flow.ai and Freshchat) against 14 features: visual flow builder, text chatbot, ease of use, ease of setup, tutorials, documentation, help, keywords, intents, entities, Dialogflow integration, optimization A/B testing, multiple languages and live chat. The algorithm systematically computes chatbot and feature scores, levels of optimization, mean, median, range, standard deviation and the percentage of features optimized. The results depict the importance of the distribution of key performance indicator features across chatbots: 9 optimized chatbots, 74.55% of features incorporated in chatbots and 64.28% of features optimized. Thus, the optimization chatbot algorithm is useful for predicting aspects related to improving overall chatbot performance.
1 Introduction

The best conversational responses can be obtained by semi- or fully automated chatbots using suitable methods, tools, trends, technologies, capabilities and dimensions. The merits of chatbots drive their use in many applications, which requires attention to some of

B. Nagaraj (B) Research Scholar, Department of Computer Science and Engineering, School of Engineering, Dayananda Sagar University, Bangalore, India, e-mail: [email protected]

K. B. Malagi Associate Professor, Department of Computer Science and Engineering, School of Engineering, Dayananda Sagar University, Bangalore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_31
its challenges. Optimization plays an important role in the best and most effective use of computer resources (input, output, memory and CPU) in the design of high-performance chatbot software. A high-performance chatbot can be an optimized chatbot with support for plentiful features. Optimization is the use of resources in the design and operation of a particular part of a system or process to increase performance. This study is motivated to detect different knowledge gaps in chatbot evaluation and to fill those gaps with the best possible algorithm. The design of an efficient algorithm for optimizing chatbots is significant due to several key aspects, including:
• Proposing a model to predict whether a chatbot is optimized or not.
• Identifying, comparing and contrasting the different features supported by chatbots.
• Evaluating chatbot tools.
• Detecting whether a chatbot is optimized or needs optimization.
• Identifying the level of chatbot optimization (maximum, average, minimum).
• Indicating the percentage of features supported by a chatbot.
Our study is motivated to detect knowledge gaps in the literature for the different functionalities of chatbot tools and to evaluate the chatbots, along with the analysis of the following queries, which are addressed in the corresponding sections:
1.1 What is the importance of chatbot evaluation?
1.2 How has the existing research classified and evaluated chatbots?
1.3 Can we propose a new algorithm for optimized chatbot evaluation?
1.4 Which base characteristics are used in evaluating chatbots?
1.5 What are the results of the chatbot evaluation?
1.6 Is there any relation between the selected features and the chatbots?
1.7 How can future enhancements be made?
1.8 How can users pick the best optimized chatbot for their corresponding application, and how can developers predict the improvements needed for a chatbot while designing it?
2 Related Work

Robotic process automation is a popular software configuration paradigm for implementing automated tasks. With recent technological advancements, software bots have attracted considerable attention for performing repetitive tasks across widespread application domains. Chatbots mimic real conversations by providing automated text communication to the user. The era of chatbots dates from 1950 and Alan Turing's paper "Computing Machinery and Intelligence". An overview of chatbot history reveals several landmark systems. ELIZA (1964–66), by Joseph Weizenbaum at the MIT AI Lab, is an open-source, text-based chatbot. PARRY (1972), by Kenneth Colby at Stanford University, is an advanced form of ELIZA with complex assumptions and emotional responses. JABBERWACKY (1982), by Rollo Carpenter, supports text, natural human simulation, contextual pattern matching, association-based responses and self-learning. Dr. Sbaitso (1991) was used for speech synthesis. ALICE (1995)
by Richard Wallace features heuristic pattern matching of input, AIML, open source, text-based operation and NLP; it was the first most-human computer program, winning three Loebner Prizes, in 2000, 2001 and 2004. SmarterChild (2001) was built for ActiveBuddy, instant messaging and SMS networks. IBM Watson (2006) supports question answering, text, self-learning, IBM DeepQA and Apache UIMA. SIRI (2011) is an intelligent personal assistant with self-learning, built with Java, JavaScript and Objective-C, supporting voice queries, natural language user input and NLP. Google Now (2012) was released by Google for Galaxy Nexus smartphones. MITSUKU (2012) makes use of AIML, its own training and text. Alexa (2014/2015) supports audio and its own learning. Tay (2016), by Microsoft, was used for posting on Twitter. Google Allo (2016) is an instant messaging mobile app with voice and auto-reply [1–5]. A review of chatbot trends for the general domain [6] explores the basic aspects of chatbots along with chatbot classifications. An algorithm design has been proposed to read natural language documents [7]. Chatbot comparisons with respect to algorithms, platforms and tools have been carried out using AIML, XML, WEKA and Dialogflow [8]. For education, an efficient rule-based chatbot answering school-information FAQs [9] handles user queries with the merits of high accuracy and time saving, powered by an AIML-based divide-and-conquer methodology; general chatbots are compared. A bag-of-words algorithm, beam search decoding and a seq2seq model have been used with a corpus data set for training to answer a particular domain [10]. An e-learning chatbot [11] with machine learning and naive Bayes is used for e-learner classification and to understand behaviour. A university chatbot [12] is used to study student performance and chatbot acceptability. A pizza-ordering chatbot [13] with a self-learning model for user interaction is proposed using MongoDB and the RASA NLU model. A general retrieval-based chatbot [14] showed better responses. A college education chatbot [15] provides efficient usage of the college lab and avoids manual work.
An education-domain chatbot [16] studies student acceptance of chatbots based on studio learning and the Technology Acceptance Model, with feedback from iterative and critique sessions. Improved learning is exhibited [17] for AI in education. Easy learning [18] is exhibited by a student chatbot inquiry model. University education [19] with AIML tags is discussed, but without the concept of chatbot development. A general chatbot [20] is discussed with an evaluation of chatbots and AIML. Content-oriented user modeling [21] incorporates NLTK and machine learning (random forest and SVM). A general question-answer chatbot [22] uses NLTK. A chatbot [23] for the automatic generation of questions from children's stories uses the Stanford tool and machine learning (logistic regression). We now present developments from a review of 24 papers in the existing literature [4–8, 14–18, 24–28, 30–38]. The reviews in papers [4, 6, 14, 18, 25] indicate that chatbots are best for acquiring the best responses for users, and that a chatbot framework should be formulated around user tasks with flexibility. Capability-based chatbot evaluation includes self-consciousness (self-awareness; the same answer for the same question at different times), humor (no violent content), purity (good answers), intelligence quotient (IQ) (access to human intelligence), emotional quotient (EQ) (emotion management), memory (short/long/working), self-learning (concept/entity learning and identification) and charisma (attractiveness).
The frameworks proposed in [5, 7, 16, 26–28] consist of a dialog manager, inference engine, knowledge base, planner and external service interface with text. Most outstanding frameworks utilize the If-This-Then-That procedure, with common chatbot features consisting of modifiability, accuracy, security, privacy, interoperability and reliability. The base factors of prediction, direction, type, interaction, guidance and channel of chatbot communication are introduced. One classification framework holds that timing, AI/non-AI chatbot, effectiveness, efficiency and satisfaction are essential for the "quality of chatbot". Chatbots are classified based on services, knowledge domain, goals, input and methods, and amount of human aid, and on these facts the chatbot is evaluated. The reviews in [24, 27] suggest that a chatbot framework can also classify and evaluate based on type, direction, guidance, predictability, interaction style and communication channel [27]. A taxonomy-based chatbot classification uses three dimensions, environment, intrinsic and interaction, for evaluation [24]. Another study classified chatbot frameworks on the basis of timing, flow, platform and understanding [38]. Chatbots classified by content, complexity, model, AI/non-AI and domain are capable of providing natural interactions with reliability [4, 19, 24, 25, 30–32].
Chatbot classifications categorize chatbots along different dimensions/base factors, including services, knowledge domain, goals, input and methods, amount of human aid, build method, type, direction, probability, interaction style, communication channel, content, model, AI/non-AI, domain, application areas, environmental, intrinsic, interaction, timing, flow, platform and understanding [4, 5, 9, 14, 15, 17, 19, 24, 25, 27, 30–32]. Papers [5, 8, 25, 27, 29, 31, 32] suggest that the popular chatbot evaluation methods are automatic content evaluation (CE), session user satisfaction (US) and final product functional evaluation (FE), as in Table 1. None of the existing papers considered measures in terms of a percentage evaluation of the bot, which is significant for identifying bot performance. In this paper a percentage calculation (Table 1) is included: depending on whether all three, any two or only one of the evaluations are present, the percentage is 100%, 66.66% or 33.33%, respectively. From the existing literature, potential gaps are found in comparing the different chatbot tools by evaluation. The functionality of open-source chatbot tools (Microsoft Bot/LUIS, Botpress, Botkit, Pandorabots, Rasa) is similar to that of closed-source chatbot tools (Amazon Lex, Dialogflow, Gupshup, IBM Watson, Botsociety, Wit.ai). Open-source tool software can be viewed, analyzed and updated by any user; in closed-source chatbot tools, bugs are fixed only by the manufacturer. In terms of privacy and trust, both have merits and demerits. Existing models have certain issues and limitations, including:

• No standards set to identify, compare and contrast the different features supported by chatbots.
• No proper method for evaluating a chatbot tool.
• No calculations to predict whether a chatbot is optimized or needs further optimization.
• No classification of the levels of optimization (maximum, average, minimum).
• The percentage of features supported by a chatbot is not computed.
Table 1 Chatbot classification, comparison, evaluation with evaluation methods

| Year | Name | Domain | Base | CE | US | FE | Evaluation percentage |
|---|---|---|---|---|---|---|---|
| 2018 | AIMe | Open | AI | Y | Y | Y | 100 |
| 2005 | Fruedbot | Education | Rule | Y | Y | Y | 100 |
| 2018 | Li et al. | Business | AI | N | Y | Y | 66.66 |
| 2018 | SOGO | Open | AI | N | Y | Y | 66.66 |
| 2017 | Xu et al. | Business | AI | Y | Y | N | 66.66 |
| 2014 | Higashinaka | Open | Both | N | Y | Y | 66.66 |
| 2018 | Divya et al. | Health | AI | N | N | Y | 33.33 |
| 2018 | LaLiga | Sports | AI | N | Y | N | 33.33 |
| 2018 | Farmbot | Agriculture | AI | N | N | Y | 33.33 |
| 2017 | Super agent | E-com | Rule | Y | N | N | 33.33 |
| 2017 | Allergybot | Health | AI | N | N | Y | 33.33 |
| 2017 | Mandy | Health | AI | N | Y | N | 33.33 |
| 2015 | Nombot | Health | AI | N | Y | N | 33.33 |
| 2015 | Pharmbot | Health | Rule | N | Y | N | 33.33 |
| 2014 | Mcclendon | Open | AI | Y | N | N | 33.33 |
| 2012 | Medchatbot | Health | AI | N | Y | N | 33.33 |
| 2009 | CSIEC | Education | Rule | N | Y | N | 33.33 |
| 2007 | Calmsystem | Education | AI | N | N | Y | 33.33 |

CE (Content Evaluation), US (User Satisfaction) and FE (Functional Evaluation)
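The evaluation-percentage column in Table 1 is simply the share of the three evaluation methods (CE, US, FE) that a chatbot uses. A minimal Python sketch (the function name is ours, not the paper's; note the paper truncates 2/3 to 66.66, whereas `round()` yields 66.67):

```python
# Evaluation percentage as in Table 1: the share of the three evaluation
# methods (content, user satisfaction, functional) that a chatbot uses.
def evaluation_percentage(ce: bool, us: bool, fe: bool) -> float:
    # True counts as 1 and False as 0 when summed
    return round((ce + us + fe) / 3 * 100, 2)

print(evaluation_percentage(True, True, True))    # 100.0  (e.g. AIMe)
print(evaluation_percentage(False, True, True))   # 66.67  (e.g. SOGO)
print(evaluation_percentage(True, False, False))  # 33.33  (e.g. Super agent)
```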
• Users of chatbots lack information to select the best possible chatbot for the intended application.
• Developers of chatbots lack guidance on which features to include in a chatbot during its development.
• Key performance indicators are not considered.
• Chatbot scores and feature scores are not computed.
3 Research Gaps

Research gaps found from the literature review are as follows:
• Explore, compare and evaluate different chatbots [5]
• Increase user satisfaction [8]
• Examine beneficial attributes of chatbots [18]
• Examine the functionality of chatbots [5]
• Compare chatbots based on key performance indicators [39]
• Improve design functionality and developer interaction [27]
• None of the papers calculated a feature score (Fscore) for each chatbot during evaluation. No work has been proposed on how the selected features and chatbots are distributed or related, and no statistical evaluation has been carried out [4, 5, 9, 14, 15, 17, 19, 24, 25, 27, 30–32].
4 Methodology: Proposed Model and Design of Algorithm

In this section, a chatbot optimization model is proposed. The model has several components: a set of inputs, a set of outputs, a set of features (f1, f2, …, fm), a set of chatbots (c1, c2, …, cn), feature scores (fscores) ranging from 0 to n, chatbot scores (cscores) ranging from 0 to m, and a category of chatbot, i.e. the level of chatbot optimization (A: maximum, B: average, C: minimum or not optimized). If the input to the model is a chatbot, it is evaluated against the features to predict its level of optimization. The corresponding output is the cscore of the chatbot along with its category, indicating whether it is a best optimized chatbot, an average optimized chatbot or a chatbot that is not optimized. The numbers of features and chatbots are denoted by NF and NC, respectively. If instead the input to the model is a feature, the output indicates whether that feature is incorporated in most chatbots, and predicts whether the feature is optimized to the maximum or minimum extent.

Design of an optimization chatbot algorithm

In this section, the algorithm for the optimized chatbot is presented along with a discussion of the method. The design of the algorithm for the optimized chatbot tool evaluation method (with respect to Sect. 1) is outlined as follows.

Algorithm: Chatbot optimization
NF: number of key performance indicator features; NC: number of chatbots

Step 1: Identify factors for comparison (column 2, Table 2).
Step 2: Include chatbots for comparison (from column 3, Table 2).
Step 3: Mark the features available in each chatbot tool (✔ indicates the feature is supported; ✖ indicates it is not).
Step 4: Compute the score of each chatbot from the features it supports (vertical sum, i.e. chatbot score or cscore).
Step 5: Compute the score of each feature from the tools that support it (horizontal sum, i.e. fscore).
Step 6: Assign a grade (A: high, B: average, C: low) to each chatbot based on its chatbot score (Tables 3, 4 and 5).
Step 7: Assign a grade (A: high, B: average, C: low) to each feature based on its feature score (Table 6).
Step 8: The highest graded chatbots are the optimized chatbots; the lowest graded chatbots still need optimization.
Table 2 Chatbot (C1, C2, …C16) evaluation across features (F1, F2, …F14)

| C/F | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | FScore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F1 | ✖ | ✔ | ✖ | ✖ | ✔ | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | 10 |
| F2 | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 15 |
| F3 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 16 |
| F4 | ✖ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | 13 |
| F5 | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖ | 14 |
| F6 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 16 |
| F7 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 16 |
| F8 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 16 |
| F9 | ✖ | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✖ | ✔ | ✔ | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | 10 |
| F10 | ✖ | ✔ | ✖ | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ | ✔ | ✖ | ✔ | ✖ | 05 |
| F11 | ✖ | ✖ | ✔ | ✖ | ✔ | ✖ | ✖ | ✖ | ✔ | ✖ | ✔ | ✖ | ✖ | ✔ | ✖ | ✖ | 05 |
| F12 | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ | ✖ | ✖ | 02 |
| F13 | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 13 |
| F14 | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 15 |
| Cscore | 6 | 11 | 10 | 10 | 11 | 11 | 9 | 11 | 10 | 12 | 11 | 11 | 12 | 10 | 12 | 10 | 167 |
Table 3 Final score card and calculation of categories for Botmother, Botpress, Botsify, Botsociety, Botstar and Bot.xo

| Chatbot | Botmother (C1) | Botpress (C2) | Botsify (C3) | Botsociety (C4) | Botstar (C5) | BotXO (C6) |
|---|---|---|---|---|---|---|
| CScore | 06 | 11 | 10 | 10 | 11 | 11 |
| Category | C | A | B | B | A | A |
| Optimization | ✖ | ✔ | ✖ | ✖ | ✔ | ✔ |
Table 4 Final score card and calculation of categories for Chatize, Chatfuel, Chengo, Clustaar, Crisp and Drift

| Chatbot | Chatize (C7) | Chatfuel (C8) | Chengo (C9) | Clustaar (C10) | Crisp (C11) | Drift (C12) |
|---|---|---|---|---|---|---|
| CScore | 09 | 11 | 10 | 12 | 11 | 11 |
| Category | B | A | B | A | A | A |
| Optimization | ✖ | ✔ | ✖ | ✔ | ✔ | ✔ |
Table 5 Final score card and calculation of categories for Engati, Flow.xo, Flow.ai and Freshchat

| Chatbot | Engati (C13) | Flow.xo (C14) | Flow.ai (C15) | Freshchat (C16) |
|---|---|---|---|---|
| CScore | 12 | 10 | 12 | 10 |
| Category | A | B | A | B |
| Optimization | ✔ | ✖ | ✔ | ✖ |
Table 6 Final Fscore for each feature, grades and optimized features

| Feature | Fscore | Grade | Feature optimized |
|---|---|---|---|
| Visual flow builder | 10 | B | ✖ |
| Text chatbot | 15 | A | ✔ |
| Easy use | 16 | A | ✔ |
| Easy setup | 13 | A | ✔ |
| Tutorials | 14 | A | ✔ |
| Documentation | 16 | A | ✔ |
| Customer help | 16 | A | ✔ |
| Keywords | 16 | A | ✔ |
| Intents | 10 | B | ✖ |
| Entities | 05 | C | ✖ |
| DialogFlow integration | 05 | C | ✖ |
| Optimization A/B testing | 02 | C | ✖ |
| Multiple languages | 13 | A | ✔ |
| Live chat | 15 | A | ✔ |
Step 9: The highest graded features are optimized and incorporated in most chatbot tools, whereas a lowest graded feature is not yet widely incorporated and requires attention from developers.
Step 10: The percentage of features incorporated is computed as (sum of all fscores × 100)/(number of features × number of chatbots).
Step 11: Mean, median, mode, range and standard deviation are computed: the mean indicates the central value, the mode the highest peak value, the range the spread of the feature values, and the standard deviation whether the features are distributed properly.
5 Evaluation and Results

We applied our methodology with NF = 14 and NC = 16 (Steps 1 and 2). The key performance indicator features considered (F1, F2, …, F14) (m = 14) are visual flow builder, text chatbot, easy use, easy setup, tutorials, documentation, customer help, keywords, intents, entities, DialogFlow integration, optimization A/B testing, multiple languages and live chat. The chatbots considered (C1, C2, …, C16) (n = 16) are Botmother, Botpress, Botsify, Botsociety, Botstar, Bot.xo, Chatize, Chatfuel, Chengo, Clustaar, Crisp, Drift, Engati, Flow.xo, Flow.ai and Freshchat. As in Table 2, if a chatbot supports the corresponding feature it is marked ✔, otherwise ✖ (Step 3). For example, the chatbot Botmother (column 2, Table 2) supports the documentation feature, so that cell is marked ✔. The chatbot score (cscore) is the vertical sum: Botmother supports 6 features, so its cscore is 6 (Step 4). Similarly, the feature score (fscore) is the horizontal sum: the visual flow builder feature is supported by 10 chatbots, so its fscore is 10 (Step 5). Final chatbot scores are graded A, B or C: a cscore between 0 and 6 receives grade C, between 7 and 10 grade B, and between 11 and 14 grade A, as shown in Tables 3, 4 and 5. For instance, the cscore of Botmother is 6 (Table 2), so it is marked with grade C in Table 3 (Step 6). Based on its fscore (obtained in Table 2), each feature is likewise graded A, B or C (Table 6): fscores 0–6 receive grade C, 7–10 grade B and 11–16 grade A. The visual flow builder feature has an fscore of 10, so it is graded B and considered not optimized (Table 6). A chatbot is considered optimized if and only if its grade is A (e.g. Chatfuel is an optimized chatbot, while Chatize, with grade B, still needs optimization; Table 4) (Step 8).
A feature is considered optimized if and only if its grade is A (e.g. easy use is an optimized feature, whereas visual flow builder, with grade B, is not; Table 6) (Step 9). The percentage of features incorporated = (sum of all fscores × 100)/(number of features × number of chatbots) = (167 × 100)/(14 × 16) = 74.55% (Step 10). Let tf denote the total number of features supported in Table 2, i.e. 167 (the sum of the fscore row). Based on the fscores, mean = (tf/NF) = (167/14) = 11.9285. The fscores are 10, 15, 16, 13, 14, 16, 16, 16, 10, 5, 5, 2, 13, 15.
Fig. 1 Cscores and Fscores (bar charts of the cscore of each chatbot and the fscore of each feature)
Median = 13.5, mode = 16 and range = 14 (Step 11). The results depict the importance of the distribution of features across chatbots, and show which chatbots and which features are optimized. Fscores range from 2 to 16; chatbot scores range from 6 to 12. Figure 1 charts the results. Table 2 shows that only 167 of the (14 (NF) × 16 (NC) =) 224 possible feature-chatbot pairs are incorporated, giving a feature-inclusion percentage of 74.55% (as in the Step 10 calculation). Tables 3, 4 and 5 show that Botpress, Botstar and Bot.xo have chatbot scores of 11 and can be considered optimized. Botsify and Botsociety, in category B, need some more features to be supported, while Botmother, which supports only 6 features, needs to incorporate the other features. The other category-A chatbots by score are Chatfuel, Clustaar, Crisp, Drift, Engati and Flow.ai; Chatize, Chengo, Flow.xo and Freshchat are in category B. Overall, we found 9 optimized and 7 not optimized chatbots out of the 16 considered. Tables 6 and 7 indicate which features are widely included in the chatbot tools. The features text chatbot, easy use, easy setup, tutorials, documentation, customer help, keywords, multiple languages and live chat are optimized features. The features visual flow builder and intents are supported in only some of the tools. Entities, DialogFlow integration and optimization A/B testing are found in very few tools and must be included by developers. Table 7 shows the frequency distribution of features across the different slabs: nine of the fourteen features fall in the 13–16 slab, i.e. 64.28% of the features are in the optimized category.

Table 7 Distribution of frequency of features in different slabs

| Fscores | 0–4 | 5–8 | 9–12 | 13–16 |
|---|---|---|---|---|
| No. of features | 1 | 2 | 2 | 09 |
Table 8 Distribution of frequency of chatbots in different slabs

| Cscores | 0–6 (C) | 7–10 (B) | 11–14 (A) |
|---|---|---|---|
| No. of chatbots | 1 | 6 | 9 |
Table 9 Chatbot (col 1), cscore (col 2) and squared deviation from the mean cscore (col 3)

| Chatbot | Cscore | Squared deviation |
|---|---|---|
| C1 | 6 | 19.69 |
| C2 | 11 | 0.316 |
| C3 | 10 | 0.191 |
| C4 | 10 | 0.191 |
| C5 | 11 | 0.316 |
| C6 | 11 | 0.316 |
| C7 | 9 | 2.066 |
| C8 | 11 | 0.316 |
| C9 | 10 | 0.191 |
| C10 | 12 | 2.441 |
| C11 | 11 | 0.316 |
| C12 | 11 | 0.316 |
| C13 | 12 | 2.441 |
| C14 | 10 | 0.191 |
| C15 | 12 | 2.441 |
| C16 | 10 | 0.191 |
| Sum | 167 | 31.93 |
We found, for the fscores, mean = (tf/NF) = (167/14) = 11.9285; the fscores are 10, 15, 16, 13, 14, 16, 16, 16, 10, 5, 5, 2, 13, 15, with median = 13.5, mode = 16 and range = 14. Table 9 shows the calculation of the standard deviation of the chatbot scores: the squared deviations from the mean cscore (167/16 = 10.4375) sum to 31.93, and the standard deviation is √(31.93/16) = 1.412, indicating that the scores are distributed properly.

[Chart: Fscore v/s Features (bar chart of the fscore of each feature)]
Comparison of results

Similar work in paper [33] (January 2021) considered only cscores, assigning that score to each chatbot. It did not consider the concept of an optimized chatbot, did not assign scores to features, performed no calculation of the percentage of features or of chatbots optimized, and did not compute the mean, median, range or standard deviation. Our work therefore gives better results, with evaluation into categories A, B and C and chatbot optimization. We have presented a clear algorithm and tested it successfully, producing chatbot scores, feature scores, the numbers of chatbots and of features in grades A, B and C, the numbers of optimized chatbots and optimized features, the percentages of features and of chatbots optimized, and calculations of the mean, median, mode, range and standard deviation for the fscores and the deviations.
6 Conclusion and Future Enhancements

This study is worthwhile for identifying and comparing bots: 18 chatbots (Table 1) were evaluated and classified as 100, 66.66 or 33.33% evaluated, based on content, user satisfaction and functionality, with the importance of all three types of evaluation for chatbot performance, which is not discussed in any previous paper. This paper distinctly throws light on three aspects: (i) Sects. 1 and 2 reveal what is known, namely chatbot evaluation based on frameworks of different dimensions; (ii) Sect. 3 detects what is unknown and what knowledge gap we want to fill; (iii) Sects. 4 and 5 show how this knowledge gap is filled. A new algorithm is proposed for chatbot optimization and tested on a data set of 14 features across 16 chatbots.
This work proposed an optimization chatbot evaluation model and an optimization chatbot algorithm, which are useful for: (i) comparing, identifying and evaluating chatbots based on key performance indicator features; (ii) users selecting the best optimized chatbot for a required task; (iii) developers improving chatbot development by incorporating the right features that optimize a chatbot; (iv) identifying optimized chatbots; (v) predicting the relationship between the chosen features and chatbots; (vi) predicting a chatbot's level of optimization as maximum, average or minimum.
Thus, our work addressed and resolved the issues and research gaps in the existing literature (Sect. 3). Optimization plays a key role in providing the best performance by maximizing or minimizing certain characteristics, and it improves the overall capabilities and performance of a chatbot in terms of how it serves its users. Future enhancements include identifying the relationship between optimized chatbot tools and optimized features, considering larger numbers of chatbots and features, performing correlation or regression analysis, and including significantly more key performance indicators.
References

1. M. Lacity, L. Willcocks, A. Craig, Robotic process automation at Telefónica O2. The Outsourcing Unit working research paper series, paper 15/02 (2015)
2. L. Willcocks, M. Lacity, A. Craig, The IT function and robotic process automation. The Outsourcing Unit working research paper series, paper 15/05 (2015)
3. B. Kohli, T. Choudhury, S. Sharma, P. Kumar, A platform for human-chatbot interaction using Python, in IEEE Second International Conference on Green Computing and Internet of Things (ICGCIoT), Bangalore, India (2018), pp. 439–444
4. C. Wei, Z. Yu, S. Fong, How to build a chatbot: chatbot framework and its capabilities, in ACM, ICMLC (2018), pp. 369–373
5. E. Adamopoulou, L. Moussiades, An overview of chatbot technology, in Artificial Intelligence Applications and Innovations, AIAI 2020, eds. by I. Maglogiannis, L. Iliadis, E. Pimenidis. IFIP Advances in Information and Communication Technology, vol. 584 (Springer, Cham, 2020)
6. T.P. Nagarhalli, V. Vaze, N.K. Rana, A review of current trends in the development of chatbot systems, in IEEE 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India (2020), pp. 706–710
7. N. Madaan, M. Saxena, H. Patel, S. Mehta, Feedback-based keyphrase extraction from unstructured text documents, in IEEE International Conference on Communications Systems and NETworkS (COMSNETS), Bengaluru, India (2020), pp. 674–676
8. S. Fernandes, R. Gawas, P. Alvares, M. Femandes, D. Kale, S. Aswale, Survey on various conversational systems, in 2020 IEEE International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India (2020), pp. 1–8
9. N.N. Khin, K.M. Soe, University chatbot using artificial intelligence markup language, in IEEE Conference on Computer Applications (ICCA), Yangon, Myanmar (2020), pp. 1–5
10. S.P. Reddy Karri, B. Santhosh Kumar, Deep learning techniques for implementation of chatbots, in IEEE International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India (2020), pp. 1–5
11. R. Rajkumar, P. Ganapathy, Bio-inspiring learning style chatbot inventory using brain computing interface to increase the efficiency of e-learning. IEEE Access 8, 67377–67395 (2020)
12. F.A.J. Almahri, D. Bell, M. Merhi, Understanding student acceptance and use of chatbots in the United Kingdom universities: a structural equation modelling approach, in IEEE 6th International Conference on Information Management (ICIM), London, United Kingdom (2020), pp. 284–288
13. P. Thosani, M. Sinkar, J. Vaghasiya, R. Shankarmani, A self learning chat-bot from user interactions and preferences, in IEEE 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India (2020), pp. 224–229
14. L. Zhang, Y. Yang, J. Zhou, C. Chen, L. He, Retrieval-polished response generation for chatbot. IEEE Access 8 (2020)
15. M. Ghadge, A. Dhumale, G. Daki, U.D. Kolekar, N. Shaikh, Chatbot for efficient utilization of college laboratories, in IEEE 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India (2020), pp. 834–838
16. J.A. Kumar, P.A. Silva, Work-in-progress: a preliminary study on students' acceptance of chatbots for studio-based learning, in IEEE Global Engineering Education Conference (EDUCON), Porto, Portugal (2020), pp. 1627–1631
17. L. Chen, P. Chen, Z. Lin, Artificial intelligence in education: a review. IEEE Access 8, 75264–75278 (2020)
18. A. Alkhoori, M.A. Kuhail, A. Alkhoori, UniBud: a virtual academic adviser, in IEEE 12th Annual Undergraduate Research Conference on Applied Computing (URC), Dubai, United Arab Emirates (2020)
19. B.R. Ranoliya, N. Raghuwanshi, S. Singh, Chatbot for university related FAQs, in IEEE International Conference on Advances in Computing, Communication and Informatics (ICACCI) (2017)
20. W. Liu, J. Zhang, S. Feng, An ergonomics evaluation to chatbot equipped with knowledge-rich mind, in IEEE 3rd International Symposium on Computational and Business Intelligence (2015), pp. 95–99
21. B. Liu, Z. Xu, C. Sun, B. Wang, X. Wang, D.F. Wong, M. Zhang, Content-oriented user modeling for personalized response ranking in chatbots. IEEE/ACM Trans. Audio, Speech, Lang. Process. 26(1), 122–133 (2018)
22. S.A. Abdul-Kader, J. Woods, Question answer system for online feedable new born chatbot, in IEEE Intelligent Systems Conference, London, UK (2017), pp. 863–869
23. C.-H. Lee, T.-Y. Chen, L.-P. Chen, P.-C. Yang, R.T.-H. Tsai, Automatic question generation from children's stories for companion chatbot, in IEEE International Conference on Information Reuse and Integration for Data Science (2018), pp. 491–494
24. L.N. Michaud, Observations of a new chatbot: drawing conclusions from early interactions with users. IT Prof. IEEE Comput. Soc. 20(5), 40–47 (2018)
25. W. Maroengsit, T. Piyakulpinyo, K. Phonyiam, S. Pongnumkul, P. Chaovalit, T. Theeramunkong, A survey on evaluation methods for chatbot, in ACM (2019), pp. 111–119
26. P. Kucherbaev, A. Bozzon, G. Houben, Human-aided bots. IEEE Internet Comput. 22(6), 36–43 (2018)
27. E. Paikari, A. van der Hoek, A framework for understanding chatbots and their future, in IEEE/ACM 11th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), Gothenburg (2018), pp. 13–16
28. V. Hristidis, Chatbot technologies and challenges, in IEEE First International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills, CA, USA (2018), pp. 126–126
29. S. Das, E. Kumar, Determining accuracy of chatbot by applying algorithm design and defined process, in IEEE 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India (2018), pp. 1–6
30. C. Lebeuf, A. Zagalsky, M. Foucault, M. Storey, Defining and classifying software bots: a faceted taxonomy, in IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE), Montreal, QC, Canada (2019), pp. 1–6
31. G. Molnár, Z. Szüts, The role of chatbots in formal education, in IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), Subotica (2018), pp. 197–202
32. C. Lebeuf, M. Storey, A. Zagalsky, Software bots. IEEE Softw. 35(1), 18–23 (2018)
33. C.D. Gkikas, K.P. Theodoridis, G. Tzavella, G. Vlachopoulou, I. Kondili, M. Tzioli, Chatbot tools evaluation (ResearchGate publication, 2021)
34. C.-H. Li, K. Chen, Y.-J. Chang, When there is no progress with a task oriented chatbot: a conversation agent, in ACM (2019)
35. A.M. Rahman, A.A. Mamun, A. Islam, Programming challenges of chatbot: current and future prospective, in IEEE Region 10 Humanitarian Technology Conference (R10-HTC), Dhaka (2017)
36. M. Samyuktha, M. Supriya, Automation of admission enquiry process through chatbot—a feedback-enabled learning, in International Conference on Communication, Computing and Electronics Systems: Proceedings of ICCCES 2019, vol. 637 (Springer Nature, 2020), p. 193
Research Paper to Design and Develop an Algorithm …
413
37. A. Suresan, S.S. Mohan, M.P. Arya, V. Anjana Gangadharan, P.V. Bindu, A conversational AI chatbot in energy informatics, in Proceedings of International Conference on Intelligent Computing, Information and Control Systems (Springer, Singapore, 2021), pp. 543–554 38. J. Cerezo, J. Kubelka, R. Robbes, A. Bergel, Building an expert recommender chatbot, in IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE), Montreal, QC, Canada (2019), pp. 59–63 39. R. Sindhgatta, A.H.M. Hofstede, A. Ghose, Resource Based Adaptive Robotic Process Automation (LNCS 12127, Springer, 2020), pp 451–466
Analysis of MRI Images to Discover Brain Tumor Detection Using CNN and VGG-16 Aravind Vasudevan and N. Preethi
Abstract A brain tumor is a malignant illness in which irregular, excess and uncontrollable cells grow inside the brain. Nowadays, image processing plays a major role in the early-stage discovery of breast cancer, lung cancer and brain tumors. With image processing, even the smallest part of a tumor can be sensed, so that suitable treatment can be given at an early stage. Biomedical image processing is a growing field comprising many imaging approaches such as CT scans, X-ray and MRI, and medical image processing remains a challenging and complex area. A convolutional neural network (CNN) is used for image recognition and is expressly designed to process pixel data. The performance of the models is measured using two different datasets merged into one. In this paper, two models, CNN and VGG-16, are used, and the better model is identified by comparing their accuracy.
1 Introduction
As the central organ of the nervous system, the brain controls many functions in the human body. It regulates emotions, vision, motor skills, respiration, reactions, and many other body functions. Analysis of a brain tumor is very complicated and challenging compared with cancer in any other part of the body [1]. Early diagnosis and prompt treatment of brain tumors considerably raise the chances of survival [2]. The presence of a tumor in the brain is harder to assess because the brain is the most complex part of the human body [3]. Compared with other imaging techniques, MRI is the most widely used, as it delivers better-contrast images of the brain [4], and the presence of a tumor can be found by MRI. Normally in the human body, newer cells are formed which replace the old and injured
A. Vasudevan · N. Preethi (B) Christ University, Bangalore, India e-mail: [email protected] A. Vasudevan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_32
cells in an orderly way. But in the case of tumors in the brain, tumorous cells keep multiplying and become unmanageable [5]. Treatment depends on the phase of the tumor at the time of diagnosis. Deep learning has been used extensively in the imaging study of brain tumors [6]. In the medical field, MRI is generally used to detect and visualize details of the interior body structure. It is principally used for spotting changes in body tissues and is considered a better method compared to computed tomography imaging [7]. According to the National Brain Tumor Society, around 70,000 people in the United States have a primary brain tumor. The most common type of brain tumor is glioma; in India, it is the tenth most frequent tumor. The presence of a tumor is detected by Magnetic Resonance Imaging (MRI) scanning. The MRI scan must then be diagnosed by a physician, and the therapies are determined by the physician based on the results before treatment can begin; this procedure may take some time. As a result, the proposed work offers a solution: a computerized system that determines whether or not a patient is afflicted with a brain tumor [5].
2 Literature Review
MRI images of a brain tumor cannot by themselves precisely indicate the location of the tumor; to extract the exact position of cancerous cells inside the MRI scans, pre-processing, segmentation, morphological operations and feature extraction are used. This gives the exact shape of the cancerous cells in the MRI scan, and finally tumor detection in the brain using MRI images is achieved [8]. MRI-scanned brain tumor images have been processed end to end with the use of Faster R-CNN, which combines the AlexNet model and a region proposal network (RPN). The proposed process attained confident outcomes in comparison with other brain tumor segmentation and recognition systems; the method was also applied to a stomach cancer dataset and obtained improved performance [9]. A major trial conducted on MRI brain images shows inspiring results: features inferred from the CIELab color model can give reputable segmentation performance, so the location of a tumor or lesion can often be exactly separated from the colored image. In addition, the proposed system simply combines color translation, K-means clustering and histogram clustering, making it practical and very easy to apply [10]. The trials of the considered method were evaluated with respect to brain tumor segmentation accuracy using 120 MR images of dissimilar cases. The scans used for testing are of size 676 × 624 pixels, eight bits per color channel, and cover brain tumors of varying intensity, size and shape. In order to assess the accuracy of the automatically segmented tumor region, the tumor in every scan was segmented manually by an ophthalmologist, and the manually segmented scans are used as ground truth. The true positive rate is the ratio of the number of true positives (pixels that really belong to the tumor) to the total number of tumor pixels within the MR scan.
The false positive rate is the
ratio of the number of false positives (pixels that do not belong to the tumor) to the total number of non-tumor pixels of the MR image [11]. Image segmentation is used in the health field for the identification of brain tumors; MRI helps detect a tumor present in the brain. The presented segmentation technique addresses the multimodal brain segmentation benchmark (MICCAI BraTS 2013). The features extracted are intensity differences, local neighborhood and texture. Each extracted feature is studied and differentiated by applying the random forest method, which helps estimate diverse classes using numerous regions. The goal of that research is the precise categorization of cancer cells from healthy cells in comparison to other methodologies [3]. Raouia Ayachi et al. focus predominantly on segmentation of MRI scans of the brain. Their research mainly treats segmentation as a classification problem whose aim is to distinguish between abnormal and normal pixels on the basis of several kinds of features, namely intensities and textures. After this segmentation process, training was completed using a Support Vector Machine (SVM). In the experiments, a Gliomas dataset is used containing different sizes, image intensities, areas and different forms of tumor [12]. It is shown that the output produced is fairly accurate and clear. The accuracy attained at the end depends upon the handling of each phase; there are many competing methods for each stage, hence the methods that offer improved results are chosen. In the last step, brain tumor classification takes place. For brain tumor detection there exist different standard methods, but the existing work uses the conventional neural network method, where brain tumor detection in scans depends upon the neighborhood pixels [13].
3 Overview
The dataset consists of 3146 MRI images organized in two folders, tumorous and non-tumorous: 1500 images in the tumorous folder and 1586 in the non-tumorous folder. The dataset is of Magnetic Resonance Imaging (MRI) scans. The existing work aims to develop a detection model that identifies a brain tumor in the MRI scan of a patient. The overall process for detecting brain tumors is shown in Fig. 1.
1. Input: It is assumed that the patient is well enough to undergo MRI scanning according to medical advice. The existing work takes the patient's MRI scan as input.
2. Data Pre-processing: The workflow of data pre-processing is shown in Fig. 2.
3. Algorithm used: The procedures used in the suggested work are:
Fig. 1 The flowchart shows the general process for detecting brain tumors
Fig. 2 Data pre-processing system
3.1 Convolutional Neural Network
A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is expressly designed to process pixel data. Each layer in a CNN has a diverse collection of filters, usually hundreds or thousands, and combines their outputs; the CNN automatically learns which value to use for each filter. The convolution layer is the fundamental building block of a convolutional neural network: its parameters consist of a collection of K learnable filters (i.e., kernels) applied during the forward pass of the CNN [5]. The flow of the CNN architecture is shown in Fig. 3.
Training the Model
The model was trained for 50 epochs; the loss and accuracy plots are shown in Figs. 4 and 5.
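To make the convolution layer described above concrete — a bank of K learnable kernels slid over the input — the following NumPy sketch (our illustration only; the paper's actual implementation is not shown) applies K filters to a single-channel image with "valid" padding and stride 1:

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Apply a bank of K kernels to a 2-D image ('valid' padding, stride 1).

    image:   (H, W) array
    kernels: (K, kh, kw) array of learnable filters
    returns: (K, H-kh+1, W-kw+1) feature maps
    """
    K, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                # dot product of the kernel with the image patch under it
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[k])
    return out

# A 6x6 image convolved with K=8 random 3x3 filters gives 8 feature maps of 4x4.
feature_maps = conv2d_valid(np.arange(36.0).reshape(6, 6),
                            np.random.randn(8, 3, 3))
print(feature_maps.shape)  # (8, 4, 4)
```

In a real CNN these kernel values are the learnable parameters updated by backpropagation, and the loop is replaced by vectorized library kernels.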
3.2 VGG-16
VGGNet was developed by the Visual Geometry Group at Oxford University. In the ImageNet classification task, this architecture was first runner-up to GoogLeNet. VGG-16 is a CNN that is 16 layers deep. A pretrained version of the network, trained on more than a million images from the ImageNet database, is available; it can categorize images into 1000 object classes, such as pencil, keyboard, mouse and many animals. The VGG network was introduced by Simonyan and Zisserman in the 2014 paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". Figure 6 shows the architecture of VGG-16 as taken from the implementation.
Fig. 3 The flowgraph of CNN
Fig. 4 CNN loss table
Fig. 5 CNN accuracy table
Fig. 6 VGG-16 architecture
The VGG family of CNNs can be characterized by two crucial facts:
1. All convolution layers in the network use only 3 × 3 filters.
2. Multiple convolution + ReLU groups are stacked (the number of consecutive convolution + ReLU layers typically increases the deeper we go) before a pooling operation is applied.
In the presented work, we have used the transfer learning approach: we built on the VGG-16 architecture and made the required alterations to it to obtain better accuracy.
Training the model
The model was trained for 50 epochs; the loss and accuracy plots are shown in Figs. 7 and 8.
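A quick calculation (our illustration, not from the paper) shows why stacking only 3 × 3 filters, as VGG does, is economical: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as one 5 × 5 convolution, and three match a 7 × 7, yet with fewer weights:

```python
def conv_params(kernel, channels, layers):
    """Weights in `layers` stacked kernel x kernel convolutions, assuming
    `channels` input and output channels throughout (biases ignored)."""
    return layers * kernel * kernel * channels * channels

def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked kernel x kernel convs, stride 1."""
    return layers * (kernel - 1) + 1

C = 64
# Two 3x3 layers see a 5x5 patch, three 3x3 layers see a 7x7 patch...
assert receptive_field(3, 2) == receptive_field(5, 1) == 5
assert receptive_field(3, 3) == receptive_field(7, 1) == 7
# ...with noticeably fewer parameters than the single large filter.
print(conv_params(3, C, 2), "<", conv_params(5, C, 1))   # 73728 < 102400
print(conv_params(3, C, 3), "<", conv_params(7, C, 1))   # 110592 < 200704
```

Stacking also interleaves extra ReLU nonlinearities between the layers, which is the second design fact noted above.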
Fig. 7 VGG-16 loss table
Fig. 8 VGG-16 accuracy table
4 Result
Two models were built, CNN and VGG-16. Training was stopped at 50 epochs for both models; CNN achieved an accuracy of 94.3% and VGG-16 an accuracy of 97.2%. The metrics of the two models are given in Table 1.
Table 1 Metrics-wise evaluation

Parameter            CNN     VGG-16
No. of images used   3146    3146
Epochs carried out   50      50
Accuracy             0.943   0.972
5 Conclusion
In this proposed work, two models were built, CNN and VGG-16. Both give effective results, but in terms of accuracy VGG-16 is more effective. Hence, the conclusion is that VGG-16 is the better deep learning model of the two for brain tumor recognition. In the future, the performance of this CNN-based CAD system can be further improved by conducting more research and exploring other deep networks, variants of CNN, feature maps, and augmentation techniques.
References
1. A. Naseer, T. Yasir, A. Azhar, T. Shakeel, K. Zafar, Computer-aided brain tumor diagnosis: performance evaluation of deep learner CNN using augmented brain MRI (2021)
2. G. Hemanth, M. Janardhan, L. Sujihelen, Design and implementing brain tumor detection using machine learning approach (2019)
3. G.S. Rao, D. Vydeki, Brain tumor detection approaches: a review
4. P.T. Gamage, Identification of brain tumor using image processing techniques (2017)
5. S. Grampurohit, V. Shalavadi, V.R. Dhotargavi, M. Kudari, S. Jolad, Brain tumor detection using deep learning models (2020)
6. S. Deepak, P.M. Ameer, Brain tumor classification using deep CNN features via transfer learning (2019)
7. S.S. Hunnur, A. Raut, S. Kulkarni, Implementation of image processing for detection of brain tumors
8. P. Natarajan, N. Krishnan, N.S. Kenkre, S. Nancy, B.P. Singh, Tumor detection using threshold operation in MRI brain images
9. R. Ezhilarasi, P. Varalakshmi, Tumor detection in the brain using faster R-CNN
10. J. Amina, M. Sharif, M. Yasmina, S.L. Fernandes, A distinctive approach in brain tumor detection and classification using MRI
11. M.U. Akram, A. Usman, Computer aided system for brain tumor detection and segmentation
12. P.M. Shakeel, T.E.E. Tobely, H. Al-Feel, G. Manogaran, S. Baskar, Neural network based brain tumor detection using wireless infrared imaging sensor
13. P.B. Kanade, P.P. Gumaste, Brain tumor detection using MRI images
Missing Data Recovery Using Tensor Completion-Based Models for IoT-Based Air Quality Monitoring System Govind P. Gupta and Hrishikesh Khandare
Abstract In an IoT-based air quality monitoring system, a set of IoT devices is deployed for sensing air quality data at different junctions of a smart city. These deployed IoT devices periodically forward the sensed data to the base station for further processing and analytics. Missing Air Quality Index (AQI) data is a very challenging issue in real-time monitoring of AQI in a smart city, arising from failure of IoT devices, data corruption in wireless transmission, malfunction of sensors, etc. Missing data recovery is thus a fundamental issue for a real-time IoT-based AQI monitoring system. To solve the missing data recovery problem, this paper uses tensor completion-based data recovery models, namely Bayesian Gaussian Canonical Polyadic (BGCP) decomposition, Bayesian Augmented Tensor Factorization (BATF) and High-accuracy Low-Rank Tensor Completion (HaLRTC), to recover the missing AQI data. Performance of the tensor completion-based data recovery models is evaluated using a real-time AQI dataset in terms of Root Mean Square Error and Mean Absolute Percentage Error.
1 Introduction
In an IoT-based air quality monitoring system, a set of IoT devices is deployed over different junction points in a smart city. Each IoT device periodically senses and forwards the AQI parameters to the nearby base station (BS) using wireless communication. To obtain high-precision and accurate AQI information from these systems, rich and complete data is needed [1–4]. However, due to various factors such as noise or interference in the environment, loss of data during transport from source to destination, malfunction of intermediate systems or malfunction of the IoT devices themselves, missing data arises in the observed AQI dataset at the base station. Thus, the missing data recovery issue is a fundamental challenge: the missing entries must be imputed for accurate data analytics [1–4].
G. P. Gupta (B) · H. Khandare Department of Information Technology, National Institute of Technology, Raipur, Chhattisgarh 492010, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_33
There are various studies [3–10] available in the literature on recovering missing data. Most of the known techniques are modeled using either purely temporal or purely spatial correlation. To utilize both spatial and temporal correlation, researchers have recently focused on matrix completion and tensor completion-based techniques. Matrix completion-based techniques are very useful if the missing ratio is low, whereas tensor completion-based techniques remain efficient even when the missing ratio in the observed dataset is high. Thus, in this research work, we focus on tensor completion-based data recovery techniques to impute the missing data in the observed AQI readings at the base station of an IoT-based AQI monitoring system. Various techniques are available for recovering missing entries in big datasets; for example, missing values can be filled using nearest-neighbor techniques or the mean of several values. However, these techniques prove inefficient for spatio-temporal or multidimensional data. Spatio-temporal means related to both time and space, i.e., the data varies with respect to space as well as time. Also, traditional techniques can work well for small amounts of missing data, say less than 5%; when the missing-data percentage increases beyond this, the accuracy of the recovered data is very low. Thus, traditional data recovery techniques cannot be used on this type of big data, since they will only worsen the problem rather than solve it. To address the sparse missing-data problem, the tensor completion approach can be very useful. Hence, this paper evaluates the performance of some existing tensor completion algorithms on an air quality index dataset. Air quality depends on multiple factors such as PM 2.5, PM 10.0, PM 1.0, NO2, SO2, CO, NH3, etc. Each of these readings is captured by a variety of sensors.
For this research work, we have considered only readings from the PM 2.5, PM 10.0 and PM 1.0 sensors. PM denotes particulate matter, i.e., small particles present in the air, and the numbers 2.5, 10.0 and 1.0 denote the particle size in microns; for example, PM 2.5 stands for particulate matter smaller than 2.5 µm in width. High amounts of PM 2.5, PM 10.0 and PM 1.0 in the air can cause many health hazards. This paper uses three tensor completion-based schemes: Bayesian Gaussian CP decomposition (BGCP), Bayesian Augmented Tensor Factorization (BATF) and High-accuracy Low-Rank Tensor Completion (HaLRTC). These algorithms take advantage of the spatio-temporal correlation to impute the missing data. We then evaluate their performance based on Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). This paper is organized as follows. Section 2 discusses related work. In Sect. 3, the missing data problem is presented. Section 4 discusses tensor completion-based missing data recovery schemes. In Sect. 5, result analysis and discussion are presented. Section 6 finally concludes the paper.
2 Related Work
This section explores related work on missing data imputation and discusses its limitations. In Alsaber et al. [1], the authors used statistical modeling techniques for imputing missing data in an air quality dataset. The main limitation of this work is its poor accuracy, and it does not exploit spatial–temporal correlations. In Rivera-Muñoz et al. [3], a matrix factorization-based missing data imputation method is proposed and evaluated using air quality observations; this method also suffers from high error. Yang et al. [4] proposed a federated learning-based soft-impute scheme for recovering missing values in a multidimensional dataset, tested using a real-world dataset. In Chen et al. [5], a tensor-train rank method is proposed for modelling low-rank tensor completion to capture latent features of tensor data for imputing missing values; the main limitation of this method is its computational complexity, which is not suitable for low-power IoT devices. Ahmed et al. proposed a tensor completion scheme that considers the spatiotemporal correlation of the observed dataset for imputing urban traffic data. In [7], Chen et al. extended probabilistic matrix factorization; the proposed model is called Bayesian Gaussian CANDECOMP/PARAFAC. This work considered tensors of high rank and used traffic data collected from Guangzhou, China, for imputation of missing values; it is an example of a spatio-temporal dataset collected over a period of 9 weeks. In that paper, the authors compared their proposed work with some other methods across three types of data representation, i.e., matrix representation, third-order tensor and fourth-order tensor. The missing data percentage was varied in the range of 10–50% for all algorithms, under both random missing and fiber missing data. BGCP was found to offer ample performance in both scenarios.
It was also found that this model worked best with a third-order tensor in this scenario. In Chen et al. [8], the authors proposed the Bayesian Augmented Tensor Factorization model. The dataset used is freely available speed data from Guangzhou, China. This method claims better imputation capabilities than Bayesian tensor factorization. The missing scenarios considered are both random and non-random, at 10%, 30% and 50% missing respectively. The RMSE and MAPE values of BATF are compared with other techniques such as BGCP, BCPF and STD. Under the random missing scenario, BATF is better than STD and is less affected by a spike in the missing rate; BATF and BGCP have a slight advantage over BCPF. Under non-random missing, BATF is still better than the other two techniques. Hence, the authors claim that the proposed BATF method is able to predict values by exploiting spatio-temporal correlation. In [9], the authors proposed the LRTC-TNN technique, which stands for Low-Rank Tensor Completion with Truncated Nuclear Norm. This technique has been applied to traffic data in the format location × day × time of day. The authors introduced a universal rate parameter to control the degree of truncation on all tensor modes in the proposed LRTC-TNN model. Experiments were performed over four spatio-temporal traffic datasets. High random missing rates, i.e., 50, 60 and
70%, were used in this study in both random and non-random missing scenarios for all four datasets. Throughout the experiments, LRTC-TNN performs better than BGCP, BTMF and HaLRTC.
3 The Missing Data Problem
Many times, during transmission of data from sender to receiver, an observed data packet is lost due to problems in the underlying infrastructure, noise in the medium or interference, which can lead to loss of data or corruption of the dataset. In addition, malfunction of IoT hardware such as sensors can also lead to missing data. To handle this situation, multiple redundant sensors are deployed in proximity to each other, since it is unlikely that all of them start malfunctioning at the same time. To handle missing data in these scenarios, we need to deploy a robust data recovery model in the IoT-based air quality monitoring system. In this paper, we have considered three types of missing data: the random missing, non-random missing and blackout missing scenarios, described as follows.
3.1 Random Missing (RM)
In the random missing scenario, many IoT-based sensors are deployed in an area of a smart city, and any of these sensors can lose data at any random time interval. This leads to randomly missing data; no specific pattern can be found in this kind of missing values. Figure 1 illustrates the random missing scenario for the PM 2.5 attribute of the AQI dataset.
Fig. 1 Random missing scenario for PM 2.5 attribute
3.2 Non-random Missing (NM)
In the non-random missing scenario, an IoT-based sensor may lose data during specific days or times. In other words, a pattern can be identified in the times at which values are dropped by a sensor. Figure 2 illustrates the non-random missing scenario for the PM 2.5 attribute of the AQI dataset.
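For experimentation, the two scenarios can be simulated by masking entries of a complete sensor × time matrix. The sketch below is our illustration (the variable names and block length are assumptions, not from the paper): RM drops entries uniformly at random, while NM drops whole per-sensor time blocks, e.g. full days:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_missing(data, ratio):
    """RM: each entry is dropped independently with probability `ratio`."""
    mask = rng.random(data.shape) >= ratio          # True = observed
    return np.where(mask, data, np.nan), mask

def nonrandom_missing(data, ratio, block=24):
    """NM: each sensor loses whole `block`-long time windows (e.g. full days)."""
    corrupted, mask = data.astype(float).copy(), np.ones(data.shape, bool)
    n_blocks = data.shape[1] // block
    for s in range(data.shape[0]):                  # per sensor
        drop = rng.random(n_blocks) < ratio         # drop whole days
        for b in np.flatnonzero(drop):
            corrupted[s, b * block:(b + 1) * block] = np.nan
            mask[s, b * block:(b + 1) * block] = False
    return corrupted, mask

pm25 = rng.uniform(20, 180, size=(5, 48 * 24))      # 5 sensors, 48 days, hourly
rm_data, rm_mask = random_missing(pm25, 0.4)
nm_data, nm_mask = nonrandom_missing(pm25, 0.4)
print(round(1 - rm_mask.mean(), 2), round(1 - nm_mask.mean(), 2))
```

Holding out entries this way from a complete dataset is what allows the recovered values to be scored against ground truth later.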
4 Tensor Completion-Based Missing Data Recovery Scheme
In the IoT-based air quality monitoring system, an edge node is responsible for storing the observed data coming from the different IoT devices equipped with AQI sensors. At the edge node, a data recovery module is deployed for imputing the missing data; this paper focuses on the design of that module using tensor completion techniques. The purpose of a tensor completion technique is to retrieve a low-rank tensor from a sparse dataset; it is the logical higher-order version of matrix completion [4–6]. Figure 3 shows the representation of the time series data collected from different sensors as a tensor. In the literature, various tensor completion-based techniques are given for exploiting the spatiotemporal correlation of the dataset. IoT-based sensor networks deployed for AQI monitoring are often large-scale, and the time series data collected at the edge node is incomplete, with considerable corruption and missing values that make accurate prediction difficult. There are many types of missing data, such as random missing, non-random missing, etc. This paper focuses on recovering the missing data using well-known tensor decomposition models, namely Bayesian Gaussian CP
Fig. 2 Non-random missing values
Fig. 3 Arrangement of data into tensor
decomposition (BGCP) [7], Bayesian Augmented Tensor Factorization (BATF) [8] and High-accuracy Low-Rank Tensor Completion (HaLRTC) [9]. A brief description of these tensor completion techniques is presented as follows.
(a)
Bayesian Gaussian CP decomposition (BGCP): This tensor decomposition model is used in this work to fill in missing values in a tensor-represented dataset. The method has three ingredients: Canonical Polyadic (CP) decomposition, Gaussian assumptions, and a Bayesian inference model. Each is described as follows:
• CP decomposition: This is also known as tensor rank decomposition. It is a tensor generalization of the matrix singular value decomposition (SVD) used in statistics, signal processing, computer vision, psychometrics, and chemometrics. The tensor rank decomposition, introduced by Hitchcock [7], is given as follows:

\hat{\mathcal{Y}} = \sum_{s=1}^{r} u_s \circ v_s \circ x_s, \qquad \hat{y}_{ijt} = \sum_{s=1}^{r} u_{is} v_{js} x_{ts}, \quad \forall (i, j, t) \qquad (1)
Here, \mathcal{Y} \in \mathbb{R}^{m \times n \times f} is a third-order tensor. The vectors u_s \in \mathbb{R}^m, v_s \in \mathbb{R}^n, x_s \in \mathbb{R}^f are columns of the factor matrices U \in \mathbb{R}^{m \times r}, V \in \mathbb{R}^{n \times r}, X \in \mathbb{R}^{f \times r}, respectively. The symbol \circ denotes the vector outer product.
• Gaussian assumption: Given a third-order tensor \mathcal{Y} \in \mathbb{R}^{m \times n \times f} which suffers from missing values, the factorization can be applied to reconstruct the missing values within \mathcal{Y} using the following formula [7]:
y_{ijt} \sim \mathcal{N}\!\left( \sum_{s=1}^{r} u_{is} v_{js} x_{ts},\ \tau^{-1} \right), \quad \forall (i, j, t) \qquad (2)
Here, the vectors u_s \in \mathbb{R}^m, v_s \in \mathbb{R}^n, x_s \in \mathbb{R}^f are columns of the latent factor matrices, and u_{is}, v_{js}, x_{ts} are their elements.
• Bayesian inference model: This is used for learning the parameters in Eq. (2). Based on the Gaussian assumption over the tensor elements y_{ijt}, (i, j, t) \in \Omega (where \Omega is an index set indicating the observed tensor elements), conjugate priors are placed on the model parameters (i.e., the latent factors and the precision term) and hyperparameters, and are used to infer u_i, v_j, x_t as follows [7]:

u_i \sim \mathcal{N}(\mu_u, \Lambda_u^{-1}), \ \forall i
v_j \sim \mathcal{N}(\mu_v, \Lambda_v^{-1}), \ \forall j
x_t \sim \mathcal{N}(\mu_x, \Lambda_x^{-1}), \ \forall t
\tau \sim \mathrm{Gamma}(a_0, b_0)
\mu_u \sim \mathcal{N}(\mu_0, (\beta_0 \Lambda_u)^{-1}), \quad \Lambda_u \sim \mathcal{W}(W_0, \nu_0)
\mu_v \sim \mathcal{N}(\mu_0, (\beta_0 \Lambda_v)^{-1}), \quad \Lambda_v \sim \mathcal{W}(W_0, \nu_0)
\mu_x \sim \mathcal{N}(\mu_0, (\beta_0 \Lambda_x)^{-1}), \quad \Lambda_x \sim \mathcal{W}(W_0, \nu_0) \qquad (3)

(b) Bayesian Augmented Tensor Factorization (BATF) [8]: This model uses augmented tensor factorization techniques for the decomposition of the third-order tensor [8]. It exploits the low-rank tensor structure by folding the data along the day dimension. The model involves four phases: CP decomposition, vector combination, tensor unfolding and Gibbs sampling. In tensor unfolding, a tensor is folded into a matrix and a matrix into a tensor. Gibbs sampling is a Markov chain Monte Carlo (MCMC) approach for obtaining a sequence of observations approximated from a specified multivariate probability distribution when direct sampling is problematic. In this scheme, a fully Bayesian inference technique is used to estimate the model parameters and hence the values of the missing entries [8].
(c) High-accuracy Low-Rank Tensor Completion (HaLRTC) [9]: This technique uses a tensor-norm approach and formulates tensor completion as a convex optimization problem [9]. The key idea of HaLRTC is to provide an additional temporal dimension to the original multivariate time series matrix, allowing the intrinsic rhythms and seasonality of the time series to be described as global patterns. This scheme turns the time series prediction and missing data imputation problems into a universal low-rank tensor completion problem using the tensor structure. In HaLRTC, a unique autoregressive norm on the original matrix representation is added to the objective function in addition to minimizing the tensor rank; the two components have distinct functions [9].
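The CP reconstruction in Eq. (1), which the decomposition-based models above ultimately rely on, can be sketched in a few lines of NumPy (our illustration, not the authors' code): given factor matrices U, V, X, the full tensor is recovered with a single einsum, and imputed values are read off wherever the observation mask is false:

```python
import numpy as np

def cp_reconstruct(U, V, X):
    """Eq. (1): Y_hat[i, j, t] = sum_s U[i, s] * V[j, s] * X[t, s]."""
    return np.einsum('is,js,ts->ijt', U, V, X)

rng = np.random.default_rng(1)
m, n, f, r = 5, 7, 24, 3                   # sensors x days x hours, rank 3
U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))
X = rng.standard_normal((f, r))
Y_hat = cp_reconstruct(U, V, X)

# Same tensor via the explicit sum of rank-1 outer products u_s o v_s o x_s.
manual = sum(np.multiply.outer(np.multiply.outer(U[:, s], V[:, s]), X[:, s])
             for s in range(r))
print(np.allclose(Y_hat, manual))          # True
```

In BGCP and BATF, the factor matrices themselves are sampled from the posterior in Eq. (3) rather than drawn at random as here; the reconstruction step is the same.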
430
G. P. Gupta and H. Khandare
5 Performance Analysis and Discussion This section first presents a brief description of the air quality dataset used for testing the data recovery model that imputes the missing values. Next, it presents a performance comparison of the models under different scenarios.
5.1 Description of Air Quality Dataset In this work, a real-time air quality sensor dataset taken from 'PurpleAir' [11] is used. The AQI dataset covers the Delhi region, specifically the IIT Delhi campus. PurpleAir provides various AQI parameters such as temperature, humidity, PM 2.5, PM 10.0, PM 1.0, etc. The dataset is collected using five sensors deployed within the IIT Delhi campus, named SAMOSA_0024, SAMOSA_0032, SAMOSA_0035, SAMOSA_0069 and SAMOSA_0138, and the PM 2.5 (µg/m³), PM 10.0 (µg/m³) and PM 1.0 (µg/m³) data, which give hourly readings, are selected from these sensors. For this research work, data of the five sensors during the period 2 September 2021–20 October 2021 is used for the experiments. A tensor is created for each attribute (PM 2.5, PM 10.0 and PM 1.0); the tensor representation contains the observations of the five sensors over the given time (minute-wise readings) [12–15]. Figure 4 shows the time series plot of PM 2.5 against date; the PM 10.0 and PM 1.0 datasets behave like the PM 2.5 data except for the magnitudes of the readings, and similar plots can be obtained for those datasets too. For performance analysis, two metrics are used: Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Figure 5 shows the sensor locations on the IIT Delhi campus [11].
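The two evaluation metrics are standard; a generic NumPy sketch (not the authors' code) is shown below, to be applied only over the originally missing entries being imputed:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error over the given entries."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (in %); assumes y_true has no zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Toy example: true vs. imputed PM 2.5 values
print(rmse([50, 80, 120], [55, 78, 110]))  # ≈ 6.56
print(mape([50, 80, 120], [55, 78, 110]))  # ≈ 6.94
```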
Fig. 4 Plot of PM 2.5 (µg/m³) against date
Missing Data Recovery Using Tensor Completion-Based Models …
431
Fig. 5 Location pins showing the PM 2.5, PM 1.0 and PM 10 sensors on the map of the IIT Delhi campus
5.2 Result Analysis Figures 6 and 7 illustrate the performance of BGCP, BATF and HaLRTC on the given dataset for the PM 2.5 sensor data, showing the results in terms of MAPE and RMSE at different missing ratios. The missing ratios at which the tests are conducted vary from 20 to 60% for random missing (RM), and the tests are run at 20, 30, 40 and 60% for the non-random missing (NRM) scenario. Figure 6 clearly shows that, for all three algorithms in the RM case, the lower the missing ratio, the better the performance of the model in terms of MAPE and RMSE. However, for NRM, all three algorithms show better performance in terms of both MAPE and RMSE as the missing data percentage increases.
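The two missing-data scenarios can be simulated by masking a complete tensor. The sketch below is an illustration of the general idea, not the authors' exact procedure: RM drops entries independently at random, while NRM drops whole sensor-day fibers together; the tensor sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.uniform(20, 200, size=(5, 48, 60))  # sensors x days x readings/day (arbitrary)

def random_missing(tensor, ratio):
    """Random missing (RM): each entry is dropped independently with prob. `ratio`."""
    mask = rng.random(tensor.shape) >= ratio        # True = observed
    return np.where(mask, tensor, np.nan), mask

def nonrandom_missing(tensor, ratio):
    """Non-random missing (NRM): whole sensor-day fibers are dropped together."""
    fiber_mask = rng.random(tensor.shape[:2]) >= ratio
    mask = np.broadcast_to(fiber_mask[:, :, None], tensor.shape)
    return np.where(mask, tensor, np.nan), mask

rm_data, _ = random_missing(data, 0.4)
nrm_data, _ = nonrandom_missing(data, 0.4)
print(np.isnan(rm_data).mean(), np.isnan(nrm_data).mean())  # both near 0.4
```

The imputation models are then evaluated by comparing the recovered values against the held-out ground truth at the masked positions.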
Fig. 6 Results showing comparison in terms of RMSE and MAPE for the PM 2.5 AQI attribute in the random missing scenario
Fig. 7 Results showing comparison in terms of RMSE and MAPE for the PM 2.5 AQI attribute in the non-random missing scenario
Fig. 8 Results showing comparison in terms of RMSE and MAPE for the PM 10 AQI attribute in the random missing scenario
Figure 8 compares the performance of the BATF, BGCP and HaLRTC algorithms in terms of MAPE and RMSE for the PM 10 sensor readings, varying the random missing ratio from 20 to 60%. From Fig. 8, it can be observed that for the BGCP algorithm the RMSE and MAPE values tend to increase as the random missing rate increases; however, the RMSE is lowest at 20% RM, rises sharply at 30% random missing, and then keeps decreasing up to the 60% random missing rate. No such pattern is found in the behavior of the MAPE values for BGCP. For the BATF algorithm, the MAPE and RMSE values increase with the missing rate in the RM case, whereas the MAPE and RMSE values increase with the missing rate only up to 50%, with 60% again showing a decline in both error measures. For HaLRTC, the error values increase with the random missing rate; for the non-random case, the RMSE increases with the missing rate while the MAPE decreases up to the 40% missing rate and then fluctuates, increasing slightly at 60%.
6 Conclusion This paper has presented a missing data recovery process using tensor completion-based techniques for an IoT-based AQI monitoring system. For data imputation, three well-known tensor completion techniques, the BGCP, BATF and HaLRTC algorithms, are employed in the proposed framework. The performance of the proposed missing data recovery framework is evaluated on a real-time AQI dataset in terms of RMSE and MAPE under two different scenarios, random and non-random missing. For the air quality dataset, the RMSE values in the non-random missing scenario are not very good for all three algorithms and for both types of sensor readings when compared to the random missing case. However, it is observed from the results that the error values are initially high and tend to decrease with increasing missing rate, or increase up to a point and then start to descend. Also, the BATF algorithm tends to perform better than the BGCP and HaLRTC algorithms in most cases, for both random and non-random missing. The difference between the random missing errors of BGCP and BATF is not considerable, but the difference is larger in the non-random missing case. Acknowledgements This work was supported in part by the Mathematical Research Impact Centric Support (MATRICS) project funded by the Science and Engineering Research Board (SERB), India (Reference No. MTR/2019/001285).
References 1. A.R. Alsaber, J. Pan, A. Al-Hurban, Handling complex missing data using random forest approach for an air quality monitoring dataset: a case study of Kuwait environmental data (2012–2018). Int. J. Environ. Res. Public Health 18(3), 1333 (2021)
2. U.P. Chinchole, S. Raut, Federated learning for estimating air quality, in 12th International Conference on Computing Communication and Networking Technologies (ICCCNT) (IEEE, 2021), pp. 1–7 3. L.M. Rivera-Muñoz, J.D. Gallego-Villada, A.F. Giraldo-Forero, J.D. Martinez-Vargas, Missing data estimation in a low-cost sensor network for measuring air quality: a case study in Aburrá Valley. Water Air Soil Pollut. 232(10), 1–15 (2021) 4. J. Yang, C. Fu, H. Lu, Optimized and federated soft-impute for privacy-preserving tensor completion in cyber-physical-social systems. Inf. Sci. 564, 103–123 (2021) 5. C. Chen, Z.-B. Wu, Z.-T. Chen, Z.-B. Zheng, X.-J. Zhang, Auto-weighted robust low-rank tensor completion via tensor-train. Inf. Sci. 567, 100–115 (2021) 6. A.B. Said, A. Erradi, Spatiotemporal tensor completion for improved urban traffic imputation. IEEE Trans. Intell. Transport. Syst. (2021) 7. X. Chen, Z. He, L. Sun, A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Trans. Res. Part C: Emerg. Technol. 98, 73–84 (2019) 8. X. Chen, Z. He, Y. Chen, Y. Lu, J. Wang, Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transp. Res. Part C: Emerg. Technol. 104, 66–77 (2019) 9. J. Liu, P. Musialski, P. Wonka, J. Ye, Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 208–220 (2012) 10. X. Chen, J. Yang, L. Sun, A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation. Transp. Res. Part C: Emerg. Technol. 117, 102673 (2020) 11. Air Quality Dataset. https://www2.purpleair.com/ 12. S.K. Khadka, S. Shakya, Imputing block of missing data using deep autoencoder. In International Conference on Mobile Computing and Sustainable Informatics (Springer, Cham, 2020), pp. 697–707 13. G.P. Gupta, B. 
Saha, Load balanced clustering scheme using hybrid metaheuristic technique for mobile sink based wireless sensor networks. J. Ambient Intell. Human Comput. (2020). https://doi.org/10.1007/s12652-020-01909-z 14. D. Liu, Y. Zhang, W. Wang, K. Dev, S.A. Khowaja, Flexible data integrity checking with original data recovery in IoT-enabled maritime transportation systems. IEEE Trans. Intell. Transp. Syst. (2021) 15. G.S.W. Hagler, R. Williams, V. Papapostolou, A. Polidori, Air quality sensors and data adjustment algorithms: when is it no longer a measurement? 5530–5531 (2018)
Stock Market Prediction Through a Chatbot: A Human-Centered AI Approach Anoushka Halder, Aayush Saxena, and S. Priya
Abstract Accurate prediction of stock market prices is a very challenging task due to the volatile and non-linear nature of financial stock markets. An even more difficult job is to present these predictions and insights to users in a human-centric and user-friendly manner. In this paper, a deep learning model is used to predict the closing value of stocks and to respond to related questions asked by users via a chatbot. A long short-term memory (LSTM) model was used to predict stock market prices based on data provided by Yahoo Finance; these insights and information are then served to investors through a chatbot. The aim of this paper is to provide full insights in a graphical and easy-to-understand manner, so that people with no experience in either information technology or the financial world can interpret the insights provided by the model. Finally, the results are tested and the chatbot is trained using the Wizard of Oz experiment to ensure user satisfaction. The proposed method achieved a MAPE value of 2.38, and the chatbot responses were recorded using the aforementioned method.
1 Introduction Detecting changes in the stock market price trend has always been an important and widely discussed problem in the financial sector. The price of a stock determines and affects many factors, including the growth of the company, industry prospects, financial relations with other industries or companies, and the flow of the stock market.
A. Halder · A. Saxena · S. Priya (B) SRM Institute of Science and Technology, Kattankulathur, India e-mail: [email protected] A. Halder e-mail: [email protected] A. Saxena e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_34
436
A. Halder et al.
Thus, predicting the price of a stock is considered important, and it has been given a lot of emphasis by researchers around the globe. Many different methods have been introduced that predict the price or the trend of a particular stock or industry. These methods can be classified into two main categories: the fundamental analysis method and the technical analysis method. The fundamental analysis method emphasizes calculating the stock price trend using parameters such as exchange rate, interest rate, inflation, international or domestic relations of the company, and political factors. The technical analysis method takes into account the current price, trading volume, and market sentiment; it essentially captures the trajectory of the stock and uses the K-line chart method and other tools to predict upcoming trends [1, 2]. There are questions regarding the accuracy of the traditional fundamental analysis method, as the prediction results depend on the quality and experience of the analyst. Machine learning approaches were introduced to get better results and more accurate predictions. Some researchers use statistics and probability theory to predict the value of a stock: using a linear forecasting model, they can make short-term price predictions from a large dataset containing previous stock performance [3]. Using neural networks has become the preferred method to predict stock prices in the long or short term; combining an ARIMA model [4] or a GARCH model with neural networks fetched accurate results [5–7]. Currently, there is no user interface that interacts with the user and creates a friendly environment that engages the user in a one-on-one conversation to get stock price predictions and insights.
There are some chatbots available on stock investment and banking websites that focus on giving the user information about a stock and its company, but none of them predict the price of the stock to help the user decide when to invest. Thus, when a person wants to start learning about how and when to invest, there is no interactive platform that makes it easy. This paper proposes a method that uses an RNN-LSTM [8] to obtain accurate stock market predictions. It predicts the stock's closing price for the next day and is able to plot an accurate stock price trend graph. Using the data obtained from the model, a chatbot was created that arranges the data into a carefully designed table. This table gives the user insights into the highs and lows of the particular stock, as requested by the user. An LSTM model was used because it addresses common issues posed by classical RNN models, such as the vanishing gradient problem in error back-propagation, and it has a shorter training time than other, more complex RNN architectures, making it better suited for our purpose [9]. The LSTM model uses real-time details of the stock requested by the user through the chatbot interface, obtained from the Yahoo Finance API. The Google stock price was used because its price series has numerous peaks and troughs, so the comparison between the predicted price and the actual price would be critical.
2 Related Work Currently, the financial market is surrounded by a lot of noise and unwanted news, which makes it seem like an unpredictable, tottering structure to a novice. There are two main kinds of methods to predict a stock price: traditional analysis methods and machine learning methods [10]. The market mentality and the growth of the particular company or industry play a very important role in the price of the stock. Thus, using neural networks has been the trend in recent years for predicting stock prices, as they can handle enormous amounts of raw data and extract the important data features required to predict price movements without relying on prior knowledge. H. White was one of the first to use artificial neural networks to predict the price of the IBM stock; alas, the results were not as expected, and the predictions were inaccurate [11]. A different approach combined neural networks with the ARIMA model to predict prices [4]. The results showed that neural networks were better at forecasting non-linear data, but there was still a problem of low accuracy that needed attention, as the trend detection was not up to the mark [12]. For many years, various methods were proposed that increased the prediction accuracy slightly. In 2020, a method combined CNN, MLP and LSTM models to predict the stock prices of four major US-based companies. The experimental results determined that these three methods combined fetched better and more accurate stock price predictions than other studies that predicted the same price trends [13]. It was after 2020 that many researchers achieved accurate price prediction and precise trend-change detection using neural networks. There was still a gap in technology: no solution catered to the user with a simplified interface. Rin Tohsaka, a Discord bot, was made by Axelsson for community management; it was successful at creating an actively growing and interactive community within Discord.
It was one of the first successful community bots that catered to several users at once, although it was a basic bot with no intelligence [14]. Keyner, in 2019, made a chatbot that had 250 handcrafted messages and used the RASA framework [15]; it had the ability to search multiple openly available online datasets and serve the information to the user, but it still did not have the ability to make predictions from the data acquired [16, 17]. In 2021, researchers successfully created a chatbot that catered to users in the e-commerce field. It used least squares regression to engage users in a friendly manner, gain their trust, and serve information. It still had some drawbacks, which were exposed when users switched to professional questions: the change in the chatbot's tone was rather jarring and did not deliver an intuitive user experience [18]. According to all the work already published, there are methods that predict accurate stock prices, and chatbots that make the user experience friendly and serve the requested information. Still, there is no method that combines both fields to produce one chatbot that gives the user a better experience when they want to predict prices or get information about their favourite stocks. Even though there are chatbots that can provide financial and stock market information to the user, none of them can predict the price of a given stock, let alone give the user a friendly and intuitive experience as well.
3 Proposed Approach Due to the lack of such user-friendly platforms to assist investors and give accurate predictions, a chatbot is proposed to help users in their investment journey by predicting the future closing prices of a requested stock. Our objective is to predict the stock closing price using real-time data and to serve the predicted information to the user through a chatbot, generating the illusion that the user is interacting with a stock market analyst.
3.1 Pre-processing The training data, obtained using the Yahoo Finance API, is imported as a CSV file covering the maximum available duration of stock data. The Google stock closing price data is used to predict the upcoming trend. Figure 1 shows a snippet of the data and Fig. 2 shows the scatter plot.
Fig. 1 Snippet of data
Fig. 2 Scatter plot of Google stock closing prices
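A minimal sketch of this import step is shown below. The column layout (Date, Open, High, Low, Close, Volume) is the usual Yahoo Finance export, and the tiny inline CSV stands in for the real downloaded file; only the closing price column is kept:

```python
import io
import pandas as pd

# Toy stand-in for the CSV exported from the Yahoo Finance API
csv_text = """Date,Open,High,Low,Close,Volume
2021-09-02,2880.0,2905.0,2870.0,2900.0,1200000
2021-09-03,2900.0,2930.0,2895.0,2925.0,1100000
2021-09-07,2925.0,2950.0,2910.0,2940.0,1300000"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
close = df["Close"].values.reshape(-1, 1)   # keep only the closing price column
print(close.ravel())  # [2900. 2925. 2940.]
```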
3.2 Feature Scaling and Reshaping The next step is to scale the stock prices to the range (0, 1) to avoid intensive computation. The usual methods are normalization and standardization. In the proposed method, normalization is performed, with a sigmoid function as the output layer. The sigmoid function generates an S-shaped curve, which makes the trained model more useful as it makes the mapping non-linear, essentially making the model capable of adapting to non-linear changes. Normalization of the data:

x_norm = (x − min(x)) / (max(x) − min(x))    (1)
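Equation (1) can be implemented directly; a generic NumPy sketch is shown below (scikit-learn's MinMaxScaler performs the same transform):

```python
import numpy as np

def min_max_normalize(x):
    """Scale values into [0, 1] per Eq. (1): (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

prices = np.array([2900.0, 2925.0, 2940.0, 2910.0])
norm = min_max_normalize(prices)
print(norm)  # the minimum maps to 0.0 and the maximum to 1.0
```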
It is observed that using 60 days of past data gives the best results for predicting future stock prices. Here, 60 days correspond to 3 months, i.e., each month with 20 trading days of data. A special data structure covering 60 time stamps is required, based on which the RNN will predict the 61st price. As discussed, only the stock closing price is considered to predict the trend, so only one indicator or feature is used in the data structure.
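The 60-time-stamp structure can be built with a sliding window over the (normalized) price series; the function and array names below are illustrative:

```python
import numpy as np

def make_windows(series, lookback=60):
    """Build (X, y): each X row holds `lookback` past prices; y is the next price."""
    X, y = [], []
    for i in range(lookback, len(series)):
        X.append(series[i - lookback:i])
        y.append(series[i])
    X = np.array(X).reshape(-1, lookback, 1)   # (samples, timesteps, 1 feature)
    return X, np.array(y)

series = np.arange(100, dtype=float)           # toy normalized price series
X, y = make_windows(series, lookback=60)
print(X.shape, y.shape)  # (40, 60, 1) (40,)
```

The trailing dimension of size 1 reflects the single feature (the closing price), which is the input shape an LSTM layer expects.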
3.3 Building the Model A neural network regressor is built using LSTM to ensure continuous-value prediction. The LSTM layers are then added, with 50 units, i.e., 50 neurons per LSTM layer, to capture the trends in the stock price. Finally, after adding the LSTM layers, an output layer is added to predict one price at a time.
3.4 Training the Model The RNN model is compiled with a stochastic gradient descent (SGD) algorithm and a loss function. Using the SGD algorithm increases the randomness of the updates and also reduces the computational load. Adam is used as the optimizer. The network is trained with a batch size of 32, ensuring that the weights are updated after every 32 stock prices. During training, the model's loss dropped from 0.062 at the start to 0.0015 at epoch 100.
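The batch-size-32 update rule itself is easy to sketch. The snippet below illustrates mini-batch SGD on a plain linear model (an illustration of the update scheme only, not the LSTM itself); all values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(320, 1))
y = 3.0 * X[:, 0] + 0.5                       # noiseless true relation to recover

w, b, lr, batch = 0.0, 0.0, 0.1, 32
for epoch in range(100):
    for i in range(0, len(X), batch):         # one weight update per 32 samples
        xb, yb = X[i:i + batch, 0], y[i:i + batch]
        err = (w * xb + b) - yb               # prediction error on the mini-batch
        w -= lr * np.mean(err * xb)           # gradient of MSE w.r.t. w
        b -= lr * np.mean(err)                # gradient of MSE w.r.t. b

print(round(w, 3), round(b, 3))  # converges to ≈ 3.0 and 0.5
```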
3.5 Model Evaluation The test data was imported and the same pre-processing steps were performed as described earlier. To create the input for prediction, the index started from the date 60 days before the first date in the test dataset. The data was then reshaped into a single column, and the scaler fitted on the training set was used to scale the test inputs. After creating the data structure, the test data was ready for prediction. Finally, the acquired predictions were plotted and visualized to review and compare the accuracy; Fig. 3 shows the difference between the predicted and actual values. It is to be noted that the model can accurately predict the trend of the stock price when the change is linear, and it follows the trend accurately, but it has some difficulty in predicting the precise spikes when the price prepares for a trend change. Nevertheless, it does detect the trend change and gives some insight into which path the price will take. A MAPE (Mean Absolute Percentage Error) value [19] of 2.38 was achieved by the proposed model, which is fairly accurate and better than previous models, giving a better grip on the prediction values to serve through our chatbot (Fig. 3).
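Since the model predicts in the normalized (0, 1) space, the outputs must be mapped back to the price scale before errors are computed. A sketch with hypothetical training-set bounds and toy predictions:

```python
import numpy as np

train_min, train_max = 2700.0, 3000.0          # hypothetical training-set bounds

def inverse_normalize(x_norm):
    """Undo Eq. (1): x = x_norm * (max - min) + min."""
    return np.asarray(x_norm, dtype=float) * (train_max - train_min) + train_min

pred_norm = np.array([0.40, 0.55, 0.62])        # toy model outputs in (0, 1)
actual = np.array([2815.0, 2870.0, 2890.0])
pred = inverse_normalize(pred_norm)             # back to the price scale

mape_val = np.mean(np.abs((actual - pred) / actual)) * 100
print(pred, round(mape_val, 2))                 # MAPE ≈ 0.16% on this toy data
```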
4 Building the Chatbot Our primary aim is to make sure that the insights obtained using the model are delivered to our investors in easy language. Before building the chatbot, our aim was to conduct a Wizard of Oz experiment to understand the user's mind before they
Fig. 3 Prediction: real prices versus predicted prices
invest in any stock. Prior research on the kinds of chatbots used by popular investment platforms revealed chatbots like "AskMotaBhai", released by a popular stock exchange in India, the Bombay Stock Exchange, in partnership with companies like Microsoft. That chatbot gave insights on stock details and the latest news about a stock. Similarly, many chatbots are used on investment and banking platforms; however, these chatbots only aim at providing investors with basic stock and market information. Currently, when a budding investor wants to enter the stock market, he/she/they do not have a support system that gives tips on when to buy or sell a particular stock; thus, there is a lack of user-friendly platforms to assist them.
4.1 The Chatbot Under Development This paper proposes work on an AI-based chatbot modelled on a stock market advisor. The primary purpose of this chatbot is to make users feel comfortable expressing their concerns about where to invest, thus helping them make more ideal choices. In matters of money, it is very important to gain the trust of our users, and to achieve that, one must ensure that the user believes in the efficiency of the bot and that there is complete transparency about the bot's predictions. Furthermore, the bot needs an advisory and friendly manner, giving responses based on the user's answers, which would make the interface more intuitive.
4.2 Preparation for the WoZ Study The conversations took place on a platform called Discord, most popularly used by communities of people with similar interests such as gaming, finance, etc. Three different chatbot personalities, named Bruno, Tom and Jerry, were created. Using gender-neutral names ensured that the users didn't feel biased by any personal experiences they might have had. The personalities are explained in Fig. 4.
Fig. 4 Chatbot personalities
Our primary aim was to understand which kind of bot personality the users would trust the most and have a friendly experience with. Finally, a collection of the types of questions investors usually ask was prepared to help in the creation of our chatbot.
4.3 Participants The experiment had a total of 5 participants, carefully picked from different backgrounds to ensure the diversity of the target audience for our bot. Two of them belonged to the 20–25 age category, have barely invested anywhere, and have just started earning. One of them is a chartered accountant (42 years old), hence coming from a financial background but with zero experience of the stock market. Two of our participants (35 and 46 years old, respectively) have been investing for 5+ years and have invested across multiple investment platforms. All participants were fluent in English and Hindi and had experience with Discord. They were not given any information on the identity of the bot, since the aim of the experiment was to find out with which of the bot personalities they felt they were interacting with a real person. The strength of our study is that this demographic covers the majority of investing users and is the target audience of most investment platforms. The limitation is the possibility of extreme opinions regarding the risk factors, i.e., how much money participants are willing to invest and can afford to lose at the same time.
4.4 The Wizard The wizard was a professional working in the customer service department of a well-known investment platform, with 8+ years of experience. He is 37 years old and holds a master's degree in Economics. He spoke fluent English and Hindi, was well experienced with Discord, and had previous experience with user research and case studies.
4.5 Documentation A collection of all 15 chat sessions with the users was made; some snippets of the conversations can be seen below. Figure 5 depicts a conversation with Bruno, a typical chatbot like those most investment platforms have added to their applications to assist users. When asked questions Bruno doesn't have the answer to, it directs the users to websites that could have the answer.
Fig. 5 Conversation snippet with Bruno
Figure 6 depicts a conversation with Tom, who is as friendly as any general-purpose chatbot. The major difference is that Tom wants to know the problem specifically and tries to look for the correct answer. However, the bot isn't very friendly while counter-questioning our participants.
Fig. 6 Conversation snippet with Tom
Fig. 7 Conversation snippet with Jerry
Figure 7 depicts a conversation with Jerry, who behaves like a friendly advisor. In terms of functionality, there is not much difference from Tom; however, Jerry's responses have a tint of emotion, and it appears as if they are trying to understand you better.
4.6 Results The conclusion was that Jerry was most helpful to our users. 5/5 users felt that Jerry was human, whereas only 3/5 users felt that Tom was human, and none of them found Bruno to be human. To our surprise, 3/5 users felt comfortable investing on the advice given by Jerry; however, the other two felt that, since they were talking to a human, Jerry's calculations might not be as efficient as an AI bot's. One of the major reasons the participants felt that Jerry was the only human was the kind of easy language used while communicating. As can be seen, Tom and Bruno used words that most of the participants didn't understand at once, and their replies seemed copied from websites or books. On the prediction side, achieving a MAPE value of 2.38 showed that using the LSTM model was a success; those accurate values were then served to the user through the chatbot. While investing money, users are comfortable with a bot that can not only provide friendly responses but also give reasons why it thinks the user should invest in a particular stock at that time. Hence, the inference from this experiment was that overly friendly or casual responses might suggest that the bot is not taking the situation seriously and thus result in losing the users' trust. Another helpful
feature that our young investors recommended was a calculator: some means for a user to understand how much return they could expect if they invest in a certain pattern. However, a survey showed that this feature would be more relevant in the case of mutual funds rather than stocks. Lastly, a prediction or calculation facility alone is not entirely helpful if the user comes from a background with no knowledge of the stock market. Hence, including some general FAQs would also be beneficial for our investors. Some of the common questions collected that were addressed to our chatbots were:
1. What are some of the most traded stocks?
2. How many stocks should I buy?
3. During what time will the stock have a low value?
4. When should I sell the stock?
5 Conclusion An RNN-LSTM approach was used to predict stock market closing prices and generate a graph forecasting the upward and downward trends accurately. This was plotted into a date-versus-price graph for easy understanding by the user. The data was then converted into JSON format and forwarded to the chatbot framework. The chatbot was tested before the building process using the Wizard of Oz testing method; the test had 5 users with 0–100% understanding of the interface and the financial market. The conclusion was that users not only looked for a friendly interface but also needed precise predictions and the reasons behind those predictions. The main chatbot is currently under development, as our aim is to make an intuitive and highly interactive chatbot experience that bridges the gap between the user and the chatbot interface to such an extent that the user can fully depend on the chatbot. Our future research work will mainly focus on improving the chatbot to the point that the user no longer feels the need to consult an experienced stockbroker after using it; we would also like to improve on the MAPE value of 2.38 by implementing better prediction models.
References 1. J. Sousa, J. Montevechi, R. Miranda, Economic lot-size using machine learning, parallelism, metaheuristic and simulation. J. Logistics, Inform. Serv. Sci. 18(2), 205–216 (2019) 2. A. Coser, M.M. Maer-Matei, C. Albu, Predictive models for loan default risk assessment. Econ. Comput. Econ. Cybern. Stud. Res. 53(2), 149–165 (2019) 3. C. Jung, R. Boyd, Forecasting UK stock prices. Appl. Financ. Econ. 6(3), 279–286 (1996) 4. G.E.P. Box, D.A. Pierce, Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 65(332), 1509–1526 (1970)
5. A. Adebiyi, A. Adewumi, C. Ayo, Stock price prediction using the ARIMA model, in Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, IEEE, Cambridge, UK (2014) 6. C. Zhang, X. Cheng, M. Wang, An empirical research in the stock market of Shanghai by GARCH model. Oper. Res. Manag. Sci. 4, 144–146 (2005) 7. C. Anand, Comparison of stock price prediction models using pretrained neural networks. J. Ubiquitous Comput. Commun. Technol. (UCCT) 3(02), 122–134 (2021) 8. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 9. H.K. Andi, An accurate bitcoin price prediction using logistic regression with LSTM machine learning model. J. Soft Comput. Paradigm 3(3), 205–217 (2021) 10. J. Li, S. Pan, L. Huang, X. Zhu, A machine learning based method for customer behavior prediction. Tehnicki Vjesnik-Tech. Gazette 26(6), 1670–1676 (2019) 11. H. White, Economic prediction using neural networks: the case of IBM daily stock returns. Earth Surf. Proc. Land. 8(5), 409–422 (1988) 12. G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50(1), 159–175 (2003) 13. E. Alibasic, B. Fazo, I. Petrovic, A new approach to calculating electrical energy losses on power lines with a new improved three-mode method. Tehnicki Vjesnik-Tech. Gazette 26(2), 405–411 (2019) 14. E. Axelsson, A. Fathallah, M. Schertell, Rin Tohsaka—a discord bot for community management (2018) 15. T. Bocklisch et al., Rasa: open source language understanding and dialogue management. arXiv preprint arXiv:1712.05181 (2017) 16. S. Keyner, V. Savenkov, S. Vakulenko, Open data chatbot, in European Semantic Web Conference (Springer, Cham, 2019) 17. N. Mehta, P. Shah, P. Gajjar, Oil spill detection over ocean surface using deep learning: a comparative study. Mar. Syst. Ocean Technol. 16(3), 213–220 (2021) 18. X. Cheng, Y. Bao, A. Zarifis, A. Zarifis, W. Gong, J. 
Mou, Exploring consumers’ response to text-based chatbots in e-commerce: the moderating role of task complexity and chatbot disclosure (2021) 19. A. De Myttenaere, B. Golden, B. Le Grand, F. Rossi, Mean absolute percentage error for regression models. Neurocomputing 192, 38–48 (2016)
Air Writing Recognition Using Mediapipe and Opencv R. Nitin Kumar, Makkena Vaishnavi, K. R. Gayatri, Venigalla Prashanthi, and M. Supriya
Abstract Have you ever wished you could draw something just by waving your finger in the air? In this day and age, this can be accomplished using a variety of technologies. This paper describes an application that uses the Python modules MediaPipe, NumPy, and OpenCV to write/draw geometrical shapes in real time in the air in front of a camera. The proposed method accomplishes two goals: first, it recognizes the hand and the fingers from video frames, and then it monitors hand movements to create diverse designs. Furthermore, this technology allows for genuine Human–System interaction without the use of any form of character input device. The application is created in Python and tested in a variety of circumstances with variable illumination. This software-based solution is relatively basic, quick, and straightforward to use.
1 Introduction
A global health crisis triggered by the outbreak of Coronavirus disease 2019 (COVID-19) has profoundly altered the way we see the world and how we live our lives. Life has gone from walking around freely to wearing masks every day when going out. Most countries have closed schools, training institutes and higher education facilities due to the lockdown and social isolation caused by the COVID-19 pandemic. The world is shifting towards online learning at a rapid pace. Even though the challenges posed to both educators and learners are considerable, online learning, distance learning, and continuing education have become panaceas for this unprecedented global pandemic.
R. Nitin Kumar (B) · M. Vaishnavi · K. R. Gayatri · V. Prashanthi · M. Supriya Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] M. Supriya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_35
An intelligent interactive system should be developed to improve learner experience and learning effectiveness in multimedia online education, and from this follows the necessity of creating a Virtual Canvas tool. Schools and teachers are going above and beyond to engage their students in the classroom by moving to online discussions, using online whiteboards, soliciting input from parents, and closely monitoring their performance. A Virtual Canvas tool enables teachers to write in the air, making students feel more involved in the class and also making it more interactive. The Virtual Canvas tool developed in this paper is a Python application that makes use of the MediaPipe, NumPy and OpenCV modules. These packages provide a smooth pipeline from tracking hands and fingers in real-time video frames to drawing geometrical shapes in the air. OpenCV is an open-source computer vision library; it has been used to capture the video from the webcam and to resize, edit and display it frame by frame. MediaPipe is a Python library with ready-to-use solutions. MediaPipe Hands, a Machine Learning solution, is used in the Virtual Canvas tool to infer 21 3D landmarks of a hand from a single frame.
2 Problem Statement
This project aims to develop a Virtual Canvas tool that is basic, quick and easy to use. First, it must recognize the hand and the fingers from video frames, and then it should monitor hand movements to create diverse designs. Furthermore, the tool must support Human–System interaction without any type of character input device.
3 Literature Survey
Various research papers have informed the development of the Virtual Canvas tool. The Air Canvas [1] tool is one such project that tracks finger movement using NumPy and OpenCV. Another similar tool [2] uses a colored marker to draw. The work proposed in [3] deals with the same problem statement; that project, however, does not implement features such as save and clear, which are required in this paper. These projects can be very useful in this global pandemic for online teaching, as discussed in [4], which is the main reason behind its development. The work proposed in [5] discussed converting finger movements to text, and another approach, using an external device for interaction, is discussed in [6]. Most of these papers have made use of common libraries like OpenCV and NumPy. The present work uses the MediaPipe library along with the other libraries mentioned above.
In a variety of technological domains and platforms, the ability to perceive the shape and motion of hands can improve the user experience. It can be used to understand sign language and to enable hand-gesture control, as well as to overlay digital information on top of the physical world in augmented reality. The work presented in [7] is one such paper that has discussed interaction using gestures in Virtual Reality. The authors of [8–10] have demonstrated the use of gesture recognition in real time. These gestures can also be recognized by training a model; an insight into this is provided in [11]. A use of this type of model is explained in [12], which makes use of a trained CNN model to recognize gestures. These papers have shown the importance of Machine Learning in the field of Computer Vision and how accuracy can be greatly improved by using it. Literature also presents a newer approach that uses gesture recognition to paint on the canvas. The approach discussed in [13] focuses mainly on fingertip detection. Another comparable approach, discussed in [14], uses gesture recognition to paint. The working of real-time finger tracking and gesture recognition is discussed in [15, 16]. An alternative approach is to use a glove to detect gestures in real time, which can be seen in [17]. Gesture recognition has a vast range of uses; one such benefit is discussed in [18]. These studies have aided in the use of gestures to choose and use various tools. An insight into OpenCV is presented in [16, 19]: the work in [16] shows real-time tracking while [19] explores image pre-processing. These works have provided insights on how to make use of the OpenCV library, as this project makes use of real-time tracking as well. Together, these papers helped to establish the technologies needed to bring the Virtual Canvas tool into existence.
4 Proposed System
This paper presents a tool that allows users to draw various geometric shapes in the air. This application is designed to make online discussions and meetings more interactive. It not only allows us to draw geometric forms, but it also includes a tool for freehand sketching, erasing, and clearing the drawing on the canvas, and there is also an option to save the drawing locally. It provides a simple user-friendly interface that allows novice users to get the most out of the program right away. After palm detection across the entire image, the subsequent hand landmark model provides accurate localization of 21 3D coordinates of hand knuckles inside the detected hand regions via regression, which is direct coordinate prediction. For ground truth, 30 K real-world images were manually annotated with 21 3D coordinates, as shown in Fig. 1. MediaPipe Hands provides a Machine Learning model that can infer 21 3D landmarks on a hand, and it has demonstrated a high degree of accuracy in tracking movements. Figure 1 depicts how the landmarks are laid out. This model makes the Virtual Canvas tool unique and also provides a really smooth implementation of the same.
Fig. 1 Hand landmark model [19]
The flowchart of the Virtual Canvas tool, representing the steps in the proposed model, is presented in Fig. 2. The webcam has been used to get real-time input, and the OpenCV library takes these inputs in frames. Since the input is taken using a webcam, each frame is flipped to make it easier for the user to write/draw. These frames are provided to the MediaPipe library, which detects the hand in the frame and highlights it. As discussed, MediaPipe provides 21 coordinates if a hand is detected in a frame. Out of these 21 coordinates, the coordinates of the fingertips are extracted and stored separately. These coordinates can be used to select and use the tools, and they can also be used for gesture recognition.
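As a concrete illustration, the fingertip-extraction step can be sketched in a few lines of Python. The landmark indices (4, 8, 12, 16 and 20 for the five fingertips) follow MediaPipe's hand landmark model; the function itself is an illustrative sketch rather than the exact code of the tool.

```python
# Sketch of fingertip extraction from MediaPipe's 21 hand landmarks.
# Indices follow the MediaPipe Hands model: 4 = thumb tip, 8 = index tip,
# 12 = middle tip, 16 = ring tip, 20 = pinky tip.
FINGERTIP_IDS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}

def fingertip_pixels(landmarks, frame_w, frame_h):
    """Convert normalized (x, y) landmarks to pixel coordinates per fingertip.

    `landmarks` is a list of 21 (x, y) pairs in [0, 1], as MediaPipe reports
    per hand; frame_w/frame_h are the frame dimensions in pixels.
    """
    return {name: (int(landmarks[i][0] * frame_w), int(landmarks[i][1] * frame_h))
            for name, i in FINGERTIP_IDS.items()}

# Example with synthetic landmarks (all placed at the frame centre):
pts = fingertip_pixels([(0.5, 0.5)] * 21, 640, 480)
```

The returned pixel coordinates are what the canvas logic compares against the menu regions and drawing area.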
Fig. 2 Flow-chart of virtual canvas
5 Experiment
The MediaPipe library plays a major role in detecting and tracking the 21 landmarks on the hand in this project. To track/detect the hand, an input must be provided; this is done using the OpenCV library. OpenCV captures the video and provides it as input to the MediaPipe library, which detects the hand in the frame as shown in Fig. 1. Two sub-modules from MediaPipe, namely drawing_utils and hands, have been used to develop this project. drawing_utils has been used mostly to sketch and draw the geometric forms. Three parameters have been tuned in the hands module of MediaPipe to get the current results. The first parameter is max_num_hands, which sets the maximum number of hands the model can detect; it was assigned a value of one. The second parameter is min_detection_confidence: a hand is reported as detected only if the model's confidence is above this threshold. min_tracking_confidence is the third parameter; the landmarks on the hand are tracked only if the confidence value is above this minimum.
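A minimal sketch of the parameter configuration described above (the 0.7 confidence thresholds are assumed values for illustration; the paper only states that a minimum confidence must be exceeded):

```python
# Tuned MediaPipe Hands parameters for the canvas tool. The 0.7 thresholds
# are illustrative assumptions, not values stated in the paper.
HANDS_CONFIG = dict(
    max_num_hands=1,               # track a single hand
    min_detection_confidence=0.7,  # ignore detections below this score
    min_tracking_confidence=0.7,   # re-detect when tracking drops below this
)

try:
    import mediapipe as mp
    hands = mp.solutions.hands.Hands(**HANDS_CONFIG)
except Exception:
    # mediapipe absent or unavailable; HANDS_CONFIG still documents the values
    hands = None
```

Each captured frame is then passed to `hands.process(...)`, which returns the per-hand landmark list used for fingertip extraction.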
6 Results
The index finger can be used to select a tool by hovering it over one of the tools provided in the menu. A tool can be selected by pointing at it for 3 s. Once the tool is selected, both the forefinger and the middle finger can be used to activate the tool; bringing down the middle finger will deactivate it. The top section of Fig. 3 shows the sketching tools provided in the menu. The line tool, rectangle tool, circle tool and the freehand tool have been used to draw the objects in Fig. 3. The erase feature is demonstrated in Fig. 4. Additionally, to clear all the sketches on the screen, a clear-all button has been provided. Further, a save tool has been provided to save the sketches onto the local system. This saved image can be ported to any document for later use.
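The 3-second dwell selection described above can be sketched as a small state machine (an illustrative sketch; the class and method names are our own):

```python
import time

class DwellSelector:
    """Select a menu tool once the index fingertip hovers over it for `dwell` seconds."""

    def __init__(self, dwell=3.0):
        self.dwell = dwell
        self._hovered = None   # tool currently under the fingertip
        self._since = None     # time at which hovering began

    def update(self, tool, now=None):
        """Feed the tool under the fingertip (or None); return a tool name
        once the dwell time elapses, else None."""
        now = time.monotonic() if now is None else now
        if tool != self._hovered:
            # Fingertip moved to a different tool (or off the menu): restart timer.
            self._hovered, self._since = tool, now
            return None
        if tool is not None and now - self._since >= self.dwell:
            self._hovered, self._since = None, None  # reset after selection
            return tool
        return None

sel = DwellSelector(dwell=3.0)
sel.update("circle", now=0.0)
picked = sel.update("circle", now=3.1)  # held long enough, so "circle" is selected
```

Per frame, the caller passes whichever menu region the index fingertip lies in; drawing is then activated or deactivated by the middle-finger gesture described above.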
7 Conclusion
The field of Human–Computer interaction is still in its early stages. The development of potentially natural interfaces to computer-controlled settings could be enabled through visual interpretation of hand movement. A number of diverse approaches to video-based hand tracking have been explored in recent years as a result of this potential.
Fig. 3 Output screen for draw feature
Fig. 4 Output screen for erase feature
This video-based Virtual Canvas tool is a mode of Human–Computer interaction. This technology allows both viewers and users to learn in an interactive manner. The possibilities of this model are immense in today's era, as the entire world is shifting toward online learning. This model has proven to be basic, quick and very straightforward to use. This work can be improved by incorporating gesture recognition for using different tools, which will make the model more user-friendly. 3D object representation can also be explored. A color palette can be included to allow users to write/draw in a variety of colors, making the activity more engaging. In addition, a whiteboard can be added to show the user's drawing without the background; as a result, viewers will be able to see it more clearly.
References
1. S.U. Saoji, N. Dua, A.K. Chodhary, B. Phogat, Air canvas application using OpenCV and numpy in python. IRJET 8 (2021). e-ISSN: 2395-0056, p-ISSN: 2395-0072
2. S. Pranavi, M. Eswar Pavan, S. Srilekkha, V. Sai Pavan Kalyan, C. Anuradha, Virtual sketch using OpenCV. Int. J. Innovative Technol. Exploring Eng. (IJITEE) 10(8) (2021). ISSN: 2278-3075 (Online)
3. S. Dhaigude, S. Bansode, S. Waghmare, S. Varkhad, S. Suryawanshi, Computer vision based virtual sketch using detection. IJRASET 10(1) (2022). ISSN: 2321-9653
4. G.R. Kommu, An efficient tool for online teaching using OpenCV. IJCRT 9(6) (2021). ISSN: 2320-2882
5. P. Ramasamy, G. Prabhu, R. Srinivasan, An economical air writing system is converting finger movements to text using a web camera, in International Conference on Recent Trends in Information Technology (ICRTIT), Chennai (2016), pp. 1–6
6. S.K. Vasudevan, T. Naveen, K.V. Padminy, J. Shruthi Krithika, Marker-based augmented reality interface with gesture interaction to access remote file system. Int. J. Adv. Intell. Paradigms 10(3), 236 (2018)
7. A.K. Ingale, J. Divya Udayan, Gesture based interaction in immersive virtual reality: a case study, in International Conference on Artificial Intelligent, Automation Engineering and Information Technology, Singapore (2020)
8. Y. Araga, M. Shirabayashi, K. Kaida, H. Hikawa, Real time gesture recognition system using posture classifier and Jordan recurrent neural network, in IEEE World Congress on Computational Intelligence, Brisbane, Australia (2012)
9. E. Ohn-Bar, M.M. Trivedi, Hand gesture recognition in real-time for automotive interfaces. IEEE Trans. Intell. Transp. Syst. 15(6), 2368–2377 (2014)
10. V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human–computer interaction: a review. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 677–695 (1997)
11. L. Dai, J. Shanahan, An introduction to computer vision and real time deep learning-based object detection, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020), pp. 3523–3524
12. S. Preetha Lakshmi, S. Aparna, V. Gokila, P. Rajalakshmi, Hand gesture recognition using CNN, in Ubiquitous Intelligent Systems, Proceedings of ICUIS (2021), pp. 371–382
13. P. Vidhate, R. Kadse, S. Rasal, Virtual paint application by hand gesture recognition system. Int. J. Tech. Res. Appl. 7(3), 36–39 (2019). e-ISSN: 2320-8163
14. G. Rokade, P. Kurund, A. Ahire, P. Bhagat, V. Kamble, Paint using hand gesture. Int. Res. J. Eng. Technol. (IRJET) 07(02) (2020). e-ISSN: 2395-0056, p-ISSN: 2395-0072
15. K. Oka, Y. Sato, H. Koike, Real-time fingertip tracking and gesture recognition. IEEE Comput. Graph. Appl. 64–71 (2002)
16. R.M. Guravv, The real time tracking using finger and contour detection for gesture recognition using OpenCV, in International Conference on Industrial Instrumentation and Control (ICIC) (2015), pp. 974–977
17. H. Bharadwaj, M. Dhaker, K. Sivani, N. Vardhan, Smart gloves: a novel 3-D work space generation for compound two hand gestures, in Proceedings of the 2018 International Conference on Control and Computer Vision, pp. 28–32
18. V. Dasari, H.M. Aishwarya, K. Nishkala, B. Toshitha Royan, T.K. Ramesh, Sign language to speech conversion, in IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (2017)
19. V.S. Sethupathy, OpenCV based disease identification of mango leaves. Int. J. Eng. Technol. 8(5), 1990–1998 (2016)
Blockchain Based Email Communication with SHA-256 Algorithm L. Sherin Beevi, R. Vijayalakshmi, P. Ilampiray, and K. Hema Priya
Abstract Cryptocurrency transactions using bitcoins are now common in industry, with blockchain as the appealing underlying technology. Businesses are increasingly embracing blockchain systems because they make transactions faster, cost-effective, and efficient. The proposed system aims to develop a decentralized application for secure email transmission using the Secure Hash Algorithm (SHA)-256. The existing system for email transmission is the Gmail application. Gmail is centralized under Google, where all authority rests with the provider and personal data is held centrally, which leads to various security issues. The proposed work is a decentralized email solution on a mission to protect email users' digital rights in a decentralized manner.
1 Introduction
A blockchain is a peer-to-peer network that stores distributed transactions in an immutable ledger in the form of blocks. Each block is linked to the next in the form of a chain using a cryptographic method. All the blocks in the blockchain are connected with a hashing algorithm, which is used for security purposes. Modification of the data is possible only after the removal of a block from the chain. Multiple transaction details are stored in a single block, and data is condensed into a fixed number of digits. A client's own block is called a node. For authentication, a node maintains the client's digital signature and the timestamp of each transaction. Blockchain
L. Sherin Beevi (B) R.M.D Engineering College, Kavaraipettai, India e-mail: [email protected]; [email protected]
R. Vijayalakshmi Rajalakshmi Institute of Technology, Kuthambakkam, Chennai, India
P. Ilampiray R.M.K Engineering College, Kavaraipettai, Chennai, India
K. Hema Priya Panimalar Institute of Technology, Poonamallee, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_36
is a decentralized network, so all the nodes are validated with the help of miners. Miners are selected through the consensus mechanism, and after successful validation miners receive block rewards. Viruses easily spread through email attachments, and spam, spyware, and viruses are difficult to detect. Emails can be transferred by accident at the push of a button, and a message can mistakenly go to the wrong person, exposing confidential data and important corporate information. Most of the free email services out there, like Gmail and Outlook, keep your data safe from outside hackers. Still, more frequently than not, they intrude on your data for the purpose of serving advertisements. Decentralization, in contrast, provides a high level of security against all kinds of attacks, along with hashing of data, and decision making and problem solving are done much faster. Hence the prevailing system, Gmail, is secure only to a certain extent, because it is centralized and the messages stored or transmitted can be handled by third parties. The centralization of the current traditional procedure also results in excessive traffic. Decentralized approaches are a new way of leveraging modern technology that will be more effective and reduce the risk of attackers.
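The hash-linked block structure described in this introduction can be illustrated with a few lines of Python using the standard hashlib module (a minimal sketch, not the proposed system's actual implementation):

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    """Fixed-length (256-bit, 64 hex digit) digest of arbitrary-length input."""
    return hashlib.sha256(data).hexdigest()

def make_block(prev_hash, transactions, timestamp):
    """A block stores its transactions plus the previous block's hash;
    that stored hash is what chains the blocks together."""
    block = {"prev_hash": prev_hash, "transactions": transactions,
             "timestamp": timestamp}
    block["hash"] = sha256(json.dumps(block, sort_keys=True).encode())
    return block

genesis = make_block("0" * 64, ["genesis"], 0)
block1 = make_block(genesis["hash"], ["alice -> bob: mail digest"], 1)

# Tampering with the genesis block would change its hash, so block1's
# stored prev_hash would no longer match -- the modification is detectable.
```

This is why modifying data is only possible by removing a block from the chain: every later block stores a digest of its predecessor.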
2 Related Works
Data is now one of the most precious assets [1]. Centralized organizations, public or private, store and use vast amounts of personal and sensitive data, while users have little or no control over data storage and usage. Blockchains, by contrast, are decentralized blocks of records that use Public Key Infrastructure (PKI) to authenticate people and devices, resulting in increased data privacy and security. The objective is for trusted parties to digitally certify the keys and documents [2]. PKI technology is used in a variety of industries, including secure mail and virtual private networks, to ensure the secrecy and integrity of information transfer, as well as authenticity and non-repudiation of identity, which are assured through signatures issued by certificate authorities. PKI technology, on the other hand, has some drawbacks in specific applications [4]. The computer and the Internet are not completely secure storage media, so, to ensure the security of stored and sent data, digital signatures and message authentication are employed to prevent data from being disclosed or tampered with. The incremental hash function was proposed by Goldreich and Goldwasser, who advocated that, similar to Facebook and Gmail, these services be made available to everybody in the world. The data structure for all nodes in the block is a Merkle tree. It is faster to incrementally update the hash value of a modified message from the previous hash value than to re-compute it from scratch as with a standard hash function [5]. The fundamental concept is iterative hash functions processing messages block by block, together with a new hash structure that prevents attacks by combining an incremental hash function with an iterative hash function [6]. A hash function algorithm converts a message input of any length into a fixed-length output.
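The Merkle-tree organization of block data mentioned above can be sketched as follows (an illustrative stdlib sketch; the pairing convention for an odd number of nodes varies between systems):

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings by repeatedly
    hashing adjacent pairs; an odd leftover node is promoted unchanged
    (one common convention among several)."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:           # odd node carried up to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()

root = merkle_root([b"tx1", b"tx2", b"tx3"])
# Any change to a single transaction changes the root:
changed = merkle_root([b"tx1", b"tx2", b"txX"])
```

Because the root summarizes every transaction below it, a block only needs to commit to one digest to protect all of its contents.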
MAC calculation is a hash function that takes a secret key, known only to the sender and recipient, as an additional input parameter. Hash functions have fundamental properties such as collision
resistance, preimage resistance (the difficulty of computing an input from a known output), and second-preimage resistance (the difficulty of finding a pair of inputs with the same output) [6]. Two frequent forms of attack are key recovery attacks and forgery attacks [5, 7]. Bitcoin is a safe and secure decentralized payment system that uses blockchain technology to store, verify, and audit data [8]. The methods of gaining access to a blockchain network are growing increasingly sophisticated; interference occurs often, and security concerns are becoming increasingly significant. It is also vital to continuously monitor the activities of network users and to detect new intrusions that may occur at any time. More intelligent systems and models are proposed for future developments in LedgerMail. Because of the features of blockchain technology, tracking of quality assurance, data integrity, record management, and so on are taken into account while integrating blockchain with a smart LedgerMail system [13]. The use of authenticated encryption for storage of image data in a blockchain-based e-mail system has been proposed. Using blockchain technology, the signature is exchanged amongst the numerous terminals, allowing for a safe interaction between the sender and the receiver. The data supplied in this method will be traceable, irreversible, and unaltered [9]. To make the proposed work more practical, a storage option that combines an off-chain server and a blockchain is employed, bypassing the block's compute and storage limits. The new LedgerMail technology will provide more data privacy protection. There are several algorithms that prove to be more efficient than the proposed technique when compared to other methodologies [8]. However, the proposed work continues to provide the highest level of security.
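The keyed MAC described earlier in this section, where a secret key known only to sender and recipient enters the hash computation, is available directly in Python's standard library:

```python
import hashlib
import hmac

key = b"shared-secret"          # known only to sender and recipient
message = b"meet at noon"

# The sender computes the tag and transmits it alongside the message.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The recipient recomputes the tag and compares in constant time; any
# change to the message (or a wrong key) makes verification fail.
ok = hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).hexdigest())
bad = hmac.compare_digest(tag, hmac.new(key, b"meet at one", hashlib.sha256).hexdigest())
```

Constant-time comparison (`compare_digest`) matters because a naive byte-by-byte comparison can leak timing information about the tag.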
3 System Architecture
The proposed system involves the creation of a block. Initially the user needs to create an account to be added to the block; each user has their own user id and password. After sender authentication, the message is transformed to a fixed length: during transmission, messages are hashed by the cryptographic algorithm (SHA-256) into a fixed-length digest. After the cryptographic algorithm is applied, miners validate the communication, the sender's and receiver's addresses, the digital signature and the timestamp; the message is then transferred through the protocol and added to the node in the form of a Merkle tree. When a significant number of nodes are linked together, a block is produced, and the chain is extended with the addition of the new block. After the block is created, it is disseminated to the other nodes in the network. The hash value of the preceding node is likewise preserved in the receiver's node. As a result, if the receiver or anyone else tries to change the original value, the hash value also changes and no longer matches the preceding node. Consequently, once the message is successfully received, changing the original message is not possible. This immutability is one of the most appealing features of the blockchain e-mail network. The Proof of Work (PoW) algorithm chooses the miners. PoW is a protocol designed to prevent an
Fig. 1 How email communication gets into the blockchain
attack or unsolicited e-mail messages, known as spam, from disrupting the operation of any system: it deters the mass transmission of unwanted or offensive e-mail (Fig. 1).
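A hashcash-style proof of work of the kind described above can be sketched as follows (illustrative only; real systems use different encodings and a much higher difficulty):

```python
import hashlib
from itertools import count

def prove(data: bytes, difficulty: int = 2) -> int:
    """Find a nonce whose SHA-256 over (data + nonce) starts with
    `difficulty` zero hex digits -- a hashcash-style proof of work."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(data: bytes, nonce: int, difficulty: int = 2) -> bool:
    """Verification is a single hash: cheap for the recipient, while
    finding nonces for millions of spam messages is expensive."""
    digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = prove(b"mail from alice to bob")
```

Raising `difficulty` by one hex digit multiplies the expected search cost by 16 while leaving verification cost unchanged, which is exactly the asymmetry that makes bulk spam uneconomical.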
4 Methodology
The proposed work aims to provide message transfer through secure email in a decentralized way using blockchain. LedgerMail is an email service that provides cost-effective and customizable secure mail transfer with the help of blockchain technology. The methodologies are as follows:
A. LedgerMail Block Creation
B. Storage Layer
C. Cryptographic Layer
D. Consensus Layer
E. Performing Message Transmission
(A) LedgerMail Block Creation
LedgerMail combines standard email transfer protocols like IMAP/SMTP with blockchain technology, which is immutable, tamper-proof, and revolutionary. Each email transfer is treated as a blockchain transaction, which is validated via the PoW consensus process. LedgerMail uses a hybrid blockchain platform, which is lightning fast, enterprise-ready, extremely secure and accountable, and based on the PoW algorithm, in partnership with a blockchain network. LedgerMail is ID-agnostic, which means that users can sign up for it with any legitimate email address, regardless of domain. Users can use any custom domain they like, such as gmail.com, yahoo.com, outlook.com, or any premium company domain.
Each email transfer is treated as a blockchain transaction by LedgerMail, and users are assigned a unique Wallet ID, which is mapped internally to the supplied Email ID. Users, however, do not have to worry about the complicated backend procedures and can concentrate on having a smooth experience with LedgerMail, whose user interface resembles the email systems we are already familiar with. When sending encrypted emails with LedgerMail, the sender must ensure that the recipient is also a member of the LedgerMail community. If the recipient is not already a member of the LedgerMail ecosystem, the sender can invite them and have them sign up with just a few clicks. This is quite similar to using messaging apps like Signal, Telegram, or WhatsApp, which need both the sender and the receiver to be using the same programme. It is an email blockchain system that is entirely segregated and safe. The application will continue to receive emails, and users can also use their existing email address to log in (Fig. 2).
Fig. 2 How LedgerMail works
(B) Storage Layer
The decentralized email storage layer is in charge of storing mailbox information, node information, and exit information. (1)
(1) Mailbox information
• Address
• Public-key based mailbox address
• Username id
• Hash value
• Message node id
• Signature of mailbox owner
(2) Node information
• Node id
• Node priority
• Number of nodes participating in the historical consensus
• Message
(3) Exit information
• Sender
• Receiver
• Message content
• Message signature (storage of evidence)
(C) Cryptographic Layer
Emails are a common form of communication in our day-to-day lives. With the advent of the decentralized email solution, users will have complete control over their data and will not have to worry about email security or privacy. As with a blockchain transaction, each email sent is confirmed and validated to provide the highest level of security and to develop a tamper-resistant decentralized email environment. The SHA-256 hash function generates a fixed-length (256-bit) hash value from plaintext; recovering the plaintext from the hash value is practically impossible. Its key properties are the avalanche effect, determinism, uniqueness, speed, and resistance to reverse engineering. Hash functions play a key role in linking blocks together and maintaining the integrity of the data recorded inside each block. The distributed email blockchain is built on a foundation of hashing, public–private key pairs, and digital signatures. These cryptographic features enable blocks to be securely linked to other blocks, as well as ensuring the stability and immutability of the data recorded on the blockchain. The public key is used as the person's address and is exposed globally, which means it may be seen by any participant. The private key is a secret value that is used to gain access to address data and authorize any of the 'address' actions, which are typically used to transmit messages (Fig. 3).
Fig. 3 How hashing algorithm works
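Two SHA-256 properties relied on here, the fixed 256-bit output length and the avalanche effect, can be demonstrated directly with Python's hashlib:

```python
import hashlib

# Two inputs that differ in a single character.
h1 = hashlib.sha256(b"Pay 100 to Alice").hexdigest()
h2 = hashlib.sha256(b"Pay 100 to AlicE").hexdigest()

# Both digests are a fixed 64 hex digits (256 bits) regardless of input
# size, and the one-character change scrambles the output thoroughly
# (avalanche effect): most hex positions end up differing.
differing = sum(a != b for a, b in zip(h1, h2))
```

This is why a tampered message is immediately detectable: even the smallest edit produces a digest unrelated to the original.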
(D) Consensus Layer
Miners must show proof of work covering all block data in order for a block to be accepted by network members. Hashcash, for example, uses proof of work to combat email spam: because a single email does not require much work, legitimate senders can produce the proof quickly, while producing the requisite evidence for widespread spam emails necessitates massive computing resources. The difficulty of the task can be tweaked so that new blocks in the network are formed at a rate of one every ten minutes. As miners compete to prove their efforts, this protocol provides fresh value to them: the network accepts a new block every time a miner supplies a new proof of work, which happens every 10 min. Miners who follow the rules earn the rewards. Solving work proofs on a miner's computer demands high processing power, electricity, and cooling, all of which can be costly. The proof is a nonce such that hashing it together with the block data meets the difficulty target, demonstrating the miner's effort in solving the puzzle. Miners who lose the race expend energy and resources constructing blocks that take a great deal of computational effort, yet receive no reward for doing so. To secure the long-term usage of blockchain technology, a consensus layer that enables cost-effective and rapid techniques is required. Nodes compete with each other during the mining process to verify that the information in each block of transactions is correct; miners attempt to solve puzzles in order to authenticate the validity of recent blockchain transaction blocks. Proof of work (PoW) is a form of cryptographic proof in which one party, the prover, shows other parties that a specified amount of computation has been expended. There is no central body or direction that keeps track of users
and their communications in LedgerMail. Instead, miners, agents in the ecosystem who leave evidence of their efforts, bring blocks to life.
(E) Performing Message Transmission
The name "blockchain" is frequently associated with digital currency, but it has a wide range of applications, including email. Every year, people send 112.6 trillion emails, and more than 92 percent of everyone over the age of 15 in the United States uses email. Furthermore, 66% of consumers subscribe to email lists to receive promotional offers, compared to 25% of consumers who follow brands on social media for the same purpose. Email marketing is a key growth driver for many firms, with a potential return on investment of up to 4400 percent. Hacking a blockchain network is difficult due to the large number of nodes involved. To gain control of the entire blockchain, a cybercriminal would need to take over 51% of existing nodes, which is no minor task when dealing with billions of nodes. Spam, spoofing, and phishing are all greatly reduced as a result of this enhanced protection. Because a blockchain is an immutable set of records authenticated by each of the machines that hold copies of it, a blockchain email system would have a message database that accurately reflects the sending and receiving activity of everyone using it. Despite the introduction of security tools like SPF, DKIM, and DMARC, spam continues to haunt our inboxes, and nearly all of us have been the victims of phishing and spear phishing attacks. Blockchain email, on the other hand, would give a single source of truth for all messages sent and received, allowing users to simply verify that everything they receive is from a trusted sender. Bad actors would be unable to impersonate others, or would at least have a very difficult time doing so. By storing email on a decentralized blockchain, no single party will be able to control users' accounts or communications. Many consumers nowadays "pay" for free email services by allowing firms to analyze their messages for advertising signals.
Even if the snoopers are silicon-based, blockchain email would allow users to keep their correspondence safe from prying eyes. Furthermore, unlike many free services that merely put accounts to sleep when users try to shut them down, someone who wanted to shut down their blockchain email account could do so without fear of their information not being totally wiped. If a single third party controls the data, it can do whatever it wants with it; however, if a blockchain email database is replicated over multiple computers, only users have control over how their data is handled.

• Direct interaction with others: Two distinct nodes can connect directly with each other thanks to blockchain technology. In some circumstances, this method eliminates the need for a third-party provider and also ensures that no one else has access to your company's data.

• Nobody would ever lose another email: The database of a blockchain email system could not become corrupted because there would be no central email server to be damaged or fail. Even deleted communications
Blockchain Based Email Communication with SHA-256 Algorithm
could be saved via a technique like an off-chain node, allowing users to create an archive that they could access at any time.
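The tamper-evidence property described above can be illustrated with a minimal hash-chain sketch in Python. This is not the actual LedgerMail implementation; the `EmailLedger` class, field names, and addresses are hypothetical, showing only how SHA-256 links make past records immutable:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 digest of a block's canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class EmailLedger:
    """Toy append-only ledger: each record stores the hash of its predecessor."""
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "mail": None}]  # genesis block

    def append_mail(self, sender: str, recipient: str, body: str) -> None:
        self.chain.append({"index": len(self.chain),
                           "prev": block_hash(self.chain[-1]),
                           "mail": {"from": sender, "to": recipient, "body": body}})

    def is_valid(self) -> bool:
        # Every block must reference the true hash of the block before it.
        return all(self.chain[i]["prev"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

ledger = EmailLedger()
ledger.append_mail("alice@example.com", "bob@example.com", "hello")
ledger.append_mail("bob@example.com", "alice@example.com", "hi back")
print(ledger.is_valid())                    # True: untouched chain verifies
ledger.chain[1]["mail"]["body"] = "forged"
print(ledger.is_valid())                    # False: tampering breaks the next link
```

Because every node holds a copy of such a chain, a forged message fails verification everywhere at once; an off-chain archive only needs to retain the message bodies, since the chain itself proves their integrity.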
5 Results and Discussion

In this article, we looked at how to send email using a Node.js web application, covering SMTP and Nodemailer as well as the LedgerMail transactional email service (Fig. 4). The most prevalent transport mechanism is the Simple Mail Transfer Protocol (SMTP), a technique for transmitting outgoing emails across networks.
Fig. 4 LedgerMail communication
L. Sherin Beevi et al.
Fig. 5 Comparison graph between SMTP and distributed LedgerMail
It functions as a relay service for email transmission from one server to another. When you use an email client like Gmail to send an email to a friend, an outgoing (SMTP) server receives it and communicates with your friend's receiving server. The SMTP protocol is used for communication between the two servers, defining who the receiver is and how they can receive the incoming mail. When it comes to email delivery, most email clients use an SMTP server. A ledger email service is relatively simple to set up and use, especially because most providers come with extensive instructions. They offer email delivery monitoring as well as web analytics and reporting, including bounce-rate, open, click, and unsubscribe tracking (Fig. 5). The system transaction delay time increases after the improved node quality control strategy is applied: the strategy increases communication between nodes, which lengthens system processing time and therefore the transaction confirmation delay. The next stage will be to thoroughly verify and optimize this issue.
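The relay flow above can be sketched with Python's standard `smtplib` rather than the Node.js/Nodemailer stack used in the article; the host, port, and credentials below are placeholders, not real endpoints:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a message for the outgoing server to relay."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, recipient, subject
    msg.set_content(body)
    return msg

def send_via_smtp(msg: EmailMessage, host: str = "smtp.example.com",
                  port: int = 587, password: str = "app-password") -> None:
    # The client hands the message to its outgoing SMTP server, which then
    # speaks SMTP to the recipient's receiving server.
    with smtplib.SMTP(host, port) as server:
        server.starttls()                 # upgrade to an encrypted session
        server.login(msg["From"], password)
        server.send_message(msg)

msg = build_message("me@example.com", "friend@example.com", "Hi", "Sent over SMTP")
```

Calling `send_via_smtp(msg)` against a real server would perform the relay; the message object itself carries the sender, receiver, and content fields the two servers negotiate over.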
6 Conclusion and Future Work

This paper offers a decentralized e-mail system with trust and accountability to address the shortcomings of the untrustworthy and vulnerable centralized e-mail system built on SMTP. We propose a node quality control technique to improve the SHA-256 algorithm in the cryptographic layer and the PoW consensus
algorithm in the blockchain consensus layer. With the help of our blockchain-based email system, we can provide reliable evidence storage and a dispute resolution service. A future enhancement is to let the other users in the block know and identify who the active users are, that is, whether a user is online or offline while a message is being transferred, and to confirm whether the message has been seen and read by the recipient. The message size limit needs to be increased in order to transfer large files via email, exceeding Gmail's size limitations. Instead of using a user ID and password to log in each time, other security features such as fingerprint, face recognition, or voice commands can be added.
References

1. D. Piedrahita, J. Bermejo, F. Machio, A secure email solution based on blockchain, in Blockchain and Applications. BLOCKCHAIN 2021. Lecture Notes in Networks and Systems, ed. by J. Prieto, A. Partida, P. Leitao, A. Pinto, vol. 320 (Springer, Cham, 2022). https://doi.org/10.1007/978-3-030-86161-9_36
2. S.K. Dhurandher, J. Singh, P. Nicopolitidis, et al., A blockchain-based secure routing protocol for opportunistic networks. J. Ambient Intell. Human. Comput. (2021). https://doi.org/10.1007/s12652-021-02981-9
3. C. Pinzón, C. Rocha, J. Finke, Algorithmic analysis of blockchain efficiency with communication delay. Fundamental Approaches Softw. Eng. 12076, 400–419 (2020). https://doi.org/10.1007/978-3-030-45234-6_20
4. R. Ben Fekish, M. Lahami, Application of blockchain technology in healthcare: a comprehensive study, in The Impact of Digital Technologies on Public Health in Developed and Developing Countries. ICOST 2020. Lecture Notes in Computer Science, ed. by M. Jmaiel, M. Mokhtari, B. Abdulrazak, H. Aloulou, S. Kallel, vol. 12157 (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-51517-1_23
5. X. Bao, A decentralized secure mailbox system based on blockchain, in 2020 International Conference on Computer Communication and Network Security (CCNS) (2020), pp. 136–141. https://doi.org/10.1109/CCNS50731.2020.00038
6. C. Pinzón, C. Rocha, J. Finke, Algorithmic analysis of blockchain efficiency with communication delay, in Fundamental Approaches to Software Engineering, FASE 2020. Lecture Notes in Computer Science, ed. by H. Wehrheim, J. Cabot, vol. 12076 (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-45234-6_20
7. D. Dasgupta, J.M. Sherin, K.D. Gupta, A survey of blockchain from security perspective. J. Bank Financ. Technol. 3, 1–17 (2019). https://doi.org/10.1007/s42786-018-00002-6
8. R. Wang, J. He, C. Liu, Q. Li, W. Tsai, E. Deng, A privacy-aware PKI system based on permissioned blockchains, in 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS) (2018), pp. 928–931. https://doi.org/10.1109/ICSESS.2018.8663738
9. A. Manimuthu, V. Raja Sreedharan, G. Rejikumar, D. Marwaha, A literature review on Bitcoin: transformation of cryptocurrency into a global phenomenon, in National Conference of Singapore University of Technology and Design (IEEE, Singapore, 2018), pp. 34–51
10. Dataeum, First blockchain solution that produces 100% accurate data through crowdsourcing (2018). Retrieved November 1, 2018, from https://www.cnbcafrica.com/apo/2018/04/12/dataeumfrst-blockchain-solution-that-produces-100-accurate-data-throughcrowdsourcing/
11. A. de Vries, Bitcoin's growing energy problem. Joule 2(5), 801–805 (2018); Android random number flaw implicated in Bitcoin thefts (2013). Retrieved November 1, 2018, from https://nakedsecurity.sophos.com/2013/08/12/android-random-number-faw-implicatedin-bitcoin-thefts/
12. M. Amine Ferrag, M. Derdour, M. Mukherjee, A. Derhab, Blockchain Technologies for the Internet of Things: Research Issues and Challenges (IEEE, New York, 2018)
13. S.K. Dhurandher, A. Kumar, M.S. Obaidat, Cryptography-based misbehavior detection and trust control mechanism for opportunistic network systems. IEEE Syst. J. 12(4), 3191–3202 (2018). https://doi.org/10.1109/JSYST.2017.2720757
14. J.H. Jeon, K.-H. Kim, J.-H. Kim, Blockchain based data security enhanced IoT server platform, in 2018 International Conference on Information Networking (ICOIN) (Kuala Lumpur, Malaysia, 2018). https://doi.org/10.1109/ICOIN.2018.8343262
15. S. Yunling, M. Xianghua, An overview of incremental hash function based on pair block chaining. Int. Forum Inform. Technol. Appl. 2010, 332–335 (2010). https://doi.org/10.1109/IFITA.2010.332
Sentimental Analysis on Amazon Reviews Using Machine Learning

Rajashekhargouda C. Patil and N. S. Chandrashekar
Abstract Because of the rapid advancement of web technology, Internet users now have access to a significant amount of data on the web. This information is primarily derived from social media platforms such as Facebook and Twitter, where millions of people express their views in their everyday interactions. Many online shopping platforms, such as Amazon, Flipkart, and Ajio, contain a wealth of information in the form of reviews and ratings. Amazon is one of the e-commerce platforms that people use every day to shop online, since it allows you to browse thousands of other consumers' evaluations of the products you are interested in. These evaluations offer useful information about the product, such as its features, quality, and suggestions. The goal of this research is to perform sentiment analysis on product-based evaluations. Product-based reviews from online shopping sites such as Amazon.com can be classified as positive, negative, or neutral. Random Forest and Logistic Regression, two machine learning techniques, are used to examine the proposed task and have obtained an overall accuracy of 96%.
1 Introduction

Because of its high quality, quick logistics, and large discounts, online shopping has grown in popularity in recent years [1]. It has also made buying exceedingly convenient. As a result, user feedback has become critical information for establishing what people think about a product, which benefits businesses by allowing them to improve their offerings. As technology and the IT industry develop, customers are adopting social media to express unorganized ideas and opinions. Opinions shared on social media are frequently classified to distinguish favorable, negative, and neutral responses to the content uploaded [2]. In recent years, sentiment analysis has received a lot of attention. Sentiment categorization is a technique for analyzing and obtaining feedback on product remarks.

R. C. Patil (B) · N. S. Chandrashekar
Don Bosco Institute of Technology, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_37
Fig. 1 Rating system for Amazon.com
Sentiment analysis is a machine learning technique for classifying text based on the Sentiment Orientation (SO) of the opinions being evaluated. The sentiment analysis technique can help service providers and product manufacturers achieve their goals by examining the enormous amount of available data and generating an opinion from it. As the popularity of the Internet grows, so does the number of user reviews. Some customers leave a one-word review, such as "wonderful," "excellent," or "poor," while others detail issues that arise after they use the products. Some individuals have claimed that delivery problems, poor packaging, late arrivals, … have nothing to do with the quality of the product. However, some people accurately describe the quality of a product by giving specific details about it. Along with the reviews in text format, most e-commerce websites accept a grading or rating of the product, as shown in Fig. 1. Therefore, it is necessary to identify individual opinions and characteristics and calculate a polarity score for each trait.
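The idea of a per-trait polarity score can be sketched with a toy lexicon approach; the `LEXICON`, `ASPECTS`, and window size below are invented for illustration and are not the paper's actual resources:

```python
# Toy aspect-level polarity: for each product trait mentioned in a review,
# average the polarity of opinion words found within a small token window.
LEXICON = {"great": 1, "good": 1, "excellent": 1, "poor": -1, "bad": -1, "late": -1}
ASPECTS = {"battery", "screen", "delivery", "price"}

def aspect_polarity(review: str, window: int = 2) -> dict:
    tokens = review.lower().replace(".", " ").split()
    scores = {}
    for i, tok in enumerate(tokens):
        if tok in ASPECTS:
            nearby = tokens[max(0, i - window): i + window + 1]
            vals = [LEXICON[w] for w in nearby if w in LEXICON]
            scores[tok] = sum(vals) / len(vals) if vals else 0.0
    return scores

print(aspect_polarity("great screen but delivery was late"))
# {'screen': 1.0, 'delivery': -1.0}
```

A single review can thus score one trait positively and another negatively, which a whole-review label would flatten out.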
2 Background

Customers can submit evaluations of a wide variety of products on e-commerce websites, which is why these sites are growing increasingly popular [3]. Every day, millions of reviews are written by customers, making it tough for manufacturers to keep track of their opinions on a product. As a result, it is critical to process massive and complicated data in order to extract usable information from it. Classification algorithms are the best technique to deal with such problems. Classification is the process of dividing data into groups or classes based on common qualities. The ability to automate the classification process when working with enormous datasets is a major concern for businesses [4]. Sentiment analysis, also known as opinion mining, is a type of natural language processing (NLP) task that entails extracting subjective data from text sources and
abstracting it. The goal of emotion classification is to look at user feedback and identify it as positive or negative. This avoids the requirement for the system to completely comprehend each sentence's semantics. However, just classifying opinions as positive or negative is insufficient; there are several challenges to overcome. Positive and negative polarity cannot always be used to classify words and sentences. For example, the word "amazing" usually has a positive connotation, but when coupled with a negative word like "not," the meaning can radically shift. Emotion classification has been tried in a variety of contexts, including product reviews, movie reviews, and hotel reviews. To identify feelings, machine learning methods are frequently utilized.
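The "not amazing" case above can be handled with a minimal negation rule: flip a word's polarity when the preceding token is a negator. The word lists are toy examples, not a real lexicon:

```python
NEGATORS = {"not", "never", "no"}
POLARITY = {"amazing": 1, "good": 1, "bad": -1, "terrible": -1}

def score(sentence: str) -> int:
    """Sum word polarities, flipping the sign after a negating token."""
    tokens = sentence.lower().split()
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            sign = -1 if i > 0 and tokens[i - 1] in NEGATORS else 1
            total += sign * POLARITY[tok]
    return total

print(score("this is amazing"))       # 1
print(score("this is not amazing"))   # -1
```

Real systems extend the negation scope beyond one token and handle intensifiers ("very"), but the sign-flip is the core mechanism.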
2.1 Literature Survey

In this section, we review the most recent papers, limiting the survey to the last three years. Table 1 summarizes the papers considered; the parameters used for comparison are the dataset on which the methodology was evaluated, the methodology used, and the result.

Table 1 Literature survey restricted to the last three years

| Ref | Dataset | Methodology | Result |
|-----|---------|-------------|--------|
| [5] | Amazon dataset: 252,000 reviews from SNAP dataset | Bag-of-words model with Naïve Bayes and SVM | Naïve Bayes accuracy: 92.72%; SVM accuracy: 93.20% |
| [6] | Amazon book reviews [7] | Term frequency–inverse document frequency with random forest | Accuracy: 90.15% |
| [8] | Reviews of cellphones and accessories: 21,600 reviews; electronics: 24,352 reviews; musical instruments: 2548 reviews [9] | Term frequency–inverse document frequency with linear support vector machine | Accuracy: 93.52% to 94.02% |
| [10] | Customer reviews from Amazon products | Apriori algorithm with Naïve Bayes and SVM | Naïve Bayes accuracy: 90.423%; SVM accuracy: 83.423% |
| [11] | Amazon products [12] | SVM classifier (RBF kernel, BP-ALSA) | Accuracy: 97% |
| [13] | Amazon DVD musical products: .net crawler, 9555 reviews | POS tagging (Constituent Likelihood Automatic Word-tagging System, CLAWS) hybrid approaches | Precision: 0.89; Recall: 0.84; F-measure: 0.86 |
3 Methodology

Existing approaches use a variety of methodologies, most of them machine learning algorithms such as SVM, Random Forest, and KNN. The proposed method builds on this previous work. Sentiment analysis has been done in a variety of ways, with differing levels of analysis for each method. For all efforts that classify text as positive or negative, the result is essentially the same. Some authors use a graph to explain their findings, while others use a table. According to the results, KNN has the lowest classification accuracy. Most of the work done so far has relied on probabilistic methods. The limitations of the existing systems are that they are time-consuming and suffer from low accuracy [14, 15].
3.1 Proposed Methodology

Every word in a phrase has a syntactic role that determines how it is used; these syntactic roles are also known as the parts of speech. Nouns and pronouns, for example, are frequently devoid of sentiment, whereas words like adjectives and verbs, with the use of negative prefixes, can communicate the opposite sentiment. Our solution to this ambiguity is shown in Fig. 2.
Fig. 2 Block diagram of proposed methodology
1. Collect the reviews from the Amazon website, store them in an Excel file, and import those reviews into the database.
2. The user provides the product name; the keyword is sent to the database, and the matching reviews are retrieved.
3. Pre-processing is applied to the reviews, comprising data cleaning, data integration, data transformation, and data reduction techniques such as tokenization and stop-word removal.
4. After pre-processing, the reviews undergo sentiment analysis, where different algorithms such as SVM and the Random Forest classifier are applied.
5. The result, with positive and negative reviews, is stored in the review management component.
6. If the reviews contain unidentified keywords such as emojis or punctuation, they undergo a self-learning process in which the system trains on the data to decide whether it is positive or negative, and the result is stored in the sentiment keywords.
7. The reviews then undergo sentiment analysis again, and the process repeats.
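Steps 5 and 6 above (storing results and self-learning unidentified keywords) can be sketched as follows; the `sentiment_keywords` store and `learn` feedback hook are hypothetical simplifications of the block diagram, not the actual implementation:

```python
# Toy self-learning keyword store: unknown tokens (emojis, slang) are set
# aside; once labelled, they join the sentiment keywords so that later
# reviews containing them can be classified directly.
sentiment_keywords = {"good": "positive", "bad": "negative"}

def classify(tokens):
    labels = [sentiment_keywords[t] for t in tokens if t in sentiment_keywords]
    unknown = [t for t in tokens if t not in sentiment_keywords]
    return labels, unknown

def learn(token, label):
    sentiment_keywords[token] = label   # feedback from the self-learning step

labels, unknown = classify(["good", "🙂"])
print(unknown)                 # ['🙂'] — routed to the self-learning process
learn("🙂", "positive")
labels, unknown = classify(["good", "🙂"])
print(labels, unknown)         # ['positive', 'positive'] []
```

Each pass through `learn` shrinks the unknown set, which is the "process repeats" loop in step 7.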
4 Implementation

Review analysis assists online buyers in learning more about a product they wish to purchase. Typically, a person who wants to buy items online will first read reviews [16] for the specific product and, based on the star rating, decide whether or not to buy. In this research, which illustrates the review analysis of Amazon items to help online customers make the best selection, the accuracy of the reviews was also assessed using the Random Forest and Logistic Regression algorithms [17]. The steps involved include data collection, data preprocessing, feature selection, the detection process, and sentiment classification [18]. Researchers have even used CNN models for the detection of keratin pearls [19].
4.1 Dataset Collection (Amazon Product Review Dataset)

The data collected for this project is the Amazon Product Review dataset [20]. It consists of 34,600 records, of which 50% are used as training data and 50% as test data; 10,000 records were used for training, and 17,000 were used for the testing process. The collected dataset is converted to a CSV file and used for further classification, as shown in Fig. 3.
Fig. 3 Amazon product review dataset
4.2 Preprocessing (Removal of Punctuation, Stop Words)

Because the dataset contains a large amount of information [21], preprocessing or data cleansing is critical to avoid overloading and storage difficulties. All superfluous values and other symbols that are not required for classification purposes [22] are removed. Preprocessing includes tokenization, stop-word removal, and other steps.

• Tokenization is the process of breaking down a text into separate tokens such as symbols, keywords, and phrases. Punctuation characters, such as semicolons and exclamation marks, are removed during tokenization.
• Removing stop words: Stop words are words in a sentence that are not required for text mining, so we usually remove them to improve the analysis's efficiency. Stop-word lists differ depending on the language and country.
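The two preprocessing steps can be sketched in plain Python; the regular expression and the tiny stop-word list below are illustrative only (libraries such as NLTK ship fuller, language-specific lists):

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "it"}  # tiny sample list

def tokenize(text: str):
    """Lowercase the text and split on non-alphanumeric runs, dropping punctuation."""
    return [t for t in re.split(r"[^a-z0-9']+", text.lower()) if t]

def remove_stop_words(tokens):
    """Drop tokens that carry no sentiment-bearing content."""
    return [t for t in tokens if t not in STOP_WORDS]

tokens = tokenize("The battery life is great!")
print(remove_stop_words(tokens))   # ['battery', 'life', 'great']
```

The exclamation mark disappears at the tokenization step, and "the"/"is" are filtered by the stop-word step, leaving only content words for the classifier.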
4.3 Selection of Characteristics

After preprocessing, feature selection is performed; the features needed for the analysis are shown in Table 2, together with their descriptions. A sample of the dataset is given in Fig. 4.
Table 2 Selected features

| Features | Description |
|----------|-------------|
| Review rating | Star rating between 1 and 5 |
| Reviews | Reviews from the customer based on the product purchased |
| Review title | Title of the review |
Fig. 4 Feature selection from dataset
4.4 Sentiment Classification

Sentiment classification, also known as polarity categorization, is the act of identifying and categorizing a particular viewpoint based on its orientation (positive, negative, neutral). Opinion mining is a technique for assessing and interpreting subjective data from a big data set (text). The Amazon dataset comprises product reviews and includes characteristics such as the ID of the review, the rating of the review, the title of the review, and so forth. The product reviews in the dataset are divided into classes: negative reviews are represented by classes 0 and 1, neutral reviews by classes 2 and 3, and positive reviews by classes 4 and 5. This class split simplifies the process and makes classification much easier. Random Forest and Logistic Regression were the classification models used for categorization. Following the preceding steps, the classification operation is carried out using the machine learning algorithms Random Forest and Logistic Regression to determine the correctness of the imported data. Finally, Random Forest is compared to Logistic Regression to assess the overall accuracy of the models.
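A hedged sketch of this Random Forest versus Logistic Regression comparison using scikit-learn on a toy corpus; the six reviews below are invented, not the Amazon dataset, so the scores are purely illustrative:

```python
# TF-IDF features feeding both classifiers, then a side-by-side score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

reviews = ["great product love it", "terrible waste of money",
           "excellent quality works well", "bad quality broke fast",
           "love this great buy", "awful terrible experience"]
labels = [1, 0, 1, 0, 1, 0]            # 1 = positive, 0 = negative

X = TfidfVectorizer().fit_transform(reviews)
for model in (RandomForestClassifier(n_estimators=50, random_state=0),
              LogisticRegression()):
    model.fit(X, labels)
    acc = model.score(X, labels)        # training accuracy on this toy set
    print(type(model).__name__, acc)
```

On real data the comparison would use a held-out test split rather than training accuracy; the structure (shared features, two fitted models, one scoring call each) is the point of the sketch.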
Fig. 5 a Word cloud for reviews, b Word cloud of results
4.5 Word Cloud

The most frequently recurring terms are discovered in this research, which can give both the customer and the designer a sense of how users feel about the product and what its essential features are [23]. The terms shown in the word cloud with a predetermined frequency help emphasize the most often mentioned words in the evaluations; the frequency of a term is indicated by its size. From Fig. 5, it is clearly visible that outstanding words like "use", "love", "tablet", "easy", "bought", "great", "price", "good", and "read" appear in rating 5. For reviews with rating 1, we can see the words "fun", "used", and "enjoy", which suggests that people who gave a rating of 1 describe facts and feelings together. The word cloud is only a big picture of the words, however; more work is needed to extract more specific insights.
5 Results

The term "word cloud" refers to a data visualization approach for showing text data, where the size of each word represents its frequency or relevance. A word cloud can be used to display significant textual data points. To implement this, we need to install packages such as pandas, matplotlib, and wordcloud. The major uses of a word cloud include analyzing customer and staff feedback and identifying new SEO keywords to target. In our project, "use" and "love" are the most frequently used words, so they are displayed largest in Fig. 5b; words that occur less frequently are displayed smaller. The data distribution based on our prediction
Fig. 6 Positive and negative review plot (96% positive, 4% negative)
is provided in Fig. 6. The orange bar represents positive reviews, whereas the green bar represents negative reviews. This result achieves an accuracy of 92.4%.
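Under the hood, a word cloud is driven by term frequencies. A minimal sketch with the standard library follows (the review texts are invented for the sketch; the `wordcloud` package can then render such counts, sizing each word by its frequency):

```python
from collections import Counter

reviews = ["love this tablet easy to use",
           "great price good tablet",
           "easy to use love it"]

# Count term occurrences across all reviews; in a word cloud, higher
# counts are drawn in larger type.
freq = Counter(word for review in reviews for word in review.split())
print(freq["love"], freq["tablet"], freq["price"])   # 2 2 1
```

Feeding a frequency mapping like this to a renderer reproduces the effect seen in Fig. 5: "use" and "love" dominate because they recur across reviews.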
6 Conclusion

This document brings together a collection of papers that support several ways of identifying polarity in online reviews and categorizing them into different sentiments. The Logistic Regression and Random Forest approaches have shown great results on their respective datasets. This experiment suggests that, given a large enough training data set, correctly trained machine learning algorithms can perform remarkably well in categorization. Although the differences are not significant, Logistic Regression outperforms Random Forest in terms of accuracy, and both algorithms classify correctly more than 96 percent of the time. The method is precise enough for the Amazon reviews test scenario. We created our own sentiment analysis methodology that leverages existing sentiment research approaches. As a result of the review classification and sentiment analysis, the system's accuracy improved, and the user received more accurate reviews. It is simple to evaluate the polarity of basic
statements using the built-in methods and accompanying libraries, but it is more difficult to handle complex phrase patterns, such as sarcastic comments and multiple languages. In this project we used two algorithms, Logistic Regression and Random Forest, and the analysis is limited to these two. We also tried to implement other algorithms such as SVM and Decision Tree, but they did not suit the system, because the dataset fed here is of string type. Advanced machine learning and deep learning technologies may be utilized in the future to improve sentiment analysis applications. E-commerce applications will also benefit from this change.
References

1. R. Vasundhara Raj, Monika, Sentiment analysis on product reviews, in International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (2019). https://ieeexplore.ieee.org/document/8974527
2. T. a/p Sinnasamy, N.N.A. Sjaif, A survey on sentiment analysis approaches in e-commerce. (IJACSA) Int. J. Adv. Comput. Sci. Appl. 12(10) (2021)
3. P.M. Surya Prabha, B. Subbulakshmi, Sentimental analysis using naïve bayes classifier, in International Conference on ViTECoN (2019). https://ieeexplore.ieee.org/document/8899618
4. P. Karthika, R. Murugesari, R. Manoranjithem, Sentiment analysis of social media network using random forest algorithm. Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education (2019). https://ieeexplore.ieee.org/document/8951367
5. S. Paknejad, Sentiment classification on Amazon reviews using machine learning approaches. KTH Computer Science and Communication (2018), pp. 1–23
6. K.S. Srujan, S.S. Nikhil, H. Raghav Rao, K. Karthik, B.S. Harish, H.M. Keerthi Kumar, Classification of Amazon book reviews based on sentiment analysis (Springer, 2018), pp. 1–12
7. Zhi Liu, UCI Machine Learning Repository, 11 06 2011. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/Amazon+book+review
8. T. Ul Haque, N. Nawal Saber, F. Muhammad Shah, Sentiment analysis on large scale Amazon product reviews (IEEE, 2018), pp. 1–7
9. R. He, J. McAuley, Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering, Feb (2016). [Online]. Available: https://arxiv.org/abs/1602.0158
10. N. Nandal, R. Tanwar, J. Pruthi, Machine learning based aspect level sentiment analysis for Amazon products (Springer, 2020), pp. 601–607
11. R. Mitchell, Web Scraping with Python: Collecting More Data from the Modern Web, April 2018. [Online]. Available: https://www.oreilly.com/library/view/web-scraping-with/9781491985564
12. U.A. Chauhan, M.T. Afzal, A. Shahid, M. Abdar, M.E. Basiri, X. Zhou, A comprehensive analysis of adverb types for mining user sentiments on Amazon product reviews (Springer, 2020), pp. 1811–1829
13. V.D. Kaur, Sentiment analysis of book reviews using unsupervised semantic orientation and supervised machine learning approaches. Netaji Subhas Institute of Technology, New Delhi, India (2018). https://ieeexplore.ieee.org/document/8753089
14. A.P. Rodrigues, N.N. Chiplunkar, Aspect based sentiment analysis on product reviews. Department of Computer Science and Engineering, NMAM Institute of Technology (2018). https://ieeexplore.ieee.org/document/9096796
15. Z. Singla, S. Randhawa, S. Jain, Statistical and sentiment analysis of consumer product reviews, in 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2017), pp. 1–6. https://doi.org/10.1109/ICCCNT.2017.8203960
16. C. Chauhan, S. Sehgal, Sentiment analysis on product reviews, in 2017 International Conference on Computing, Communication and Automation (ICCCA) (2017), pp. 26–31. https://doi.org/10.1109/CCAA.2017.8229825
17. C. Sindhu, D. Veda Vyas, K. Prodyoth, Sentiment analysis based product rating using textual reviews, in International Conference on Electronics, Communication and Aerospace Technology (2017). https://ieeexplore.ieee.org/document/8212762
18. R.C. Patil, P.K. Mahesh, Analysis of various CNN models for locating keratin pearls in photomicrographs, in Emerging Research in Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, ed. by V. Sridhar, M. Padma, K. Rao, vol. 545 (Springer, Singapore, 2019), pp. 493–500. Online ISBN 978-981-13-5802-9. https://doi.org/10.1007/978-981-13-5802-9_45
19. S. Chawla, G. Dubey, A. Rana, Product opinion mining using sentiment analysis on smartphone reviews, in 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (2017), pp. 377–383. https://doi.org/10.1109/ICRITO.2017.8342455
20. T.U. Haque, N.N. Saber, F.M. Shah, Sentiment analysis on large scale Amazon product reviews, in 2018 IEEE International Conference on Innovative Research and Development (ICIRD) (2018), pp. 1–6. https://doi.org/10.1109/ICIRD.2018.8376299
21. X. Fang, J. Zhan, Sentiment analysis using product review data. J. Big Data 2, 5 (2015). https://doi.org/10.1186/s40537-015-0015-2
22. https://www.analyticsvidhya.com/blog/2021/08/text-preprocessing-techniques-for-performing-sentiment-analysis/
23. K. Ahmed Imran, A. Koushik, K. Ridoan, Word cloud and sentiment analysis of Amazon earphones reviews with R programming language. Inform. Econom. Bucharest 24(4), 55–71 (2020). https://doi.org/10.24818/issn14531305/24.4.2020.05
Speed Breaker Identification Using Deep Learning Convolutional Neural Network

B. Manikandan, R. Athilingam, M. Arivalagan, C. Nandhini, T. Tamilselvi, and R. Preethicaa
Abstract Speed breakers are one of the major causes of road accidents in recent years. Speed breakers are constructed for human safety near schools and hospitals, but improper dimensions, the absence of signboards, and unmarked speed breakers are a major accident threat. Real-time speed breaker detection is important to avoid road accidents, and all autonomous vehicles need this facility. Real-time identification of speed breakers is important and also difficult because of their varying dimensions and colour. Existing speed breaker detection methods use accelerometers, GPS-geotagged smartphones, and image processing techniques. Sensor-data vibration patterns, GPS error, network overload, and battery depletion are some of the shortcomings of these technologies; because of these drawbacks, such approaches cannot be used to identify road conditions in real time. In this work, a deep learning neural network is used for the detection of speed breakers. Deep learning neural networks have proven effective for a range of computer vision and medical image processing tasks. This research provides a new architecture for a convolutional neural network (CNN) based speed breaker detection system. In the CNN, detection accuracy is increased by creating a bounding box with a localization approach. The CNN is trained on a library of speed breaker photos gathered over time, and the trained CNN is used to identify speed bumps in real time. A prediction accuracy of 55% is obtained for real-time data by the trained network.
1 Introduction

One of the most common causes of road accidents is driving too fast for the conditions. Speed breakers are placed across the road to help slow traffic in high-traffic areas. As a result, speed breakers are primarily employed to slow down vehicles so that pedestrians remain safe. However, if the motorist fails to see the speed breaker in time, an accident may result, with injuries, damage to the car, and deaths.

B. Manikandan (B) · R. Athilingam · M. Arivalagan · C. Nandhini · T. Tamilselvi · R. Preethicaa
Department of Electronics and Communication Engineering, Nadar Saraswathi College of Engineering and Technology, Theni, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_39
Having speed breakers on national roads is a huge challenge because of the high vehicle speeds. When installing speed breakers, the most frustrating flaws are the ones that go unnoticed, such as unstandardized dimensions (a width of 9 m and a height of 6 to 30 cm is recommended by the National Highway Administration) [1]. Aside from speed breakers, roads are plagued by potholes, trash, and mud puddles. Speed limits are determined by factors such as the kind of route and the number of speed breakers. Because of the speed of the vehicles, detection must be swift enough to alert drivers before they reach the speed breaker. According to a government survey, about 1.42 lakh (142,485) Indians die in road accidents every year. According to a report by India's Road Transport and Highway Ministry, speed bumps, potholes, and speed breakers have killed 6672 people and injured 4746 more. Accidents can be prevented by assisting the driver with technologies that detect speed breakers ahead of the vehicle; keeping an eye out for speed bumps and potholes on the road helps keep people safe. It has been postulated in the literature that a 3-axis accelerometer or GPS can be used to detect speed breakers, and the use of smartphones and image processing techniques for road condition detection is also being considered. Zhang [2] proposed using LIDARs and forward-looking cameras to identify speed bumps on the road, and created a method for detecting road and road-boundary features from LIDAR data. In [3], Hull et al. (2006) proposed CarTel to monitor traffic and identify speed breaker locations [4]. To identify speed bumps, this system uses vibration sensors installed on the car that send data to a centralised programme, so that a single place holds all the information on road conditions [4].
Continuous queries run by a delay-tolerant continuous query processor allow applications to access the data stored in this centralised place. The "Pothole Patrol" system developed by Eriksson et al. [5] for road surface monitoring is a similar concept: data about the state of the road is collected using vibration sensors and GPS, and numerous artificial intelligence (AI) algorithms analyse the collected data to identify potential roadblocks. The rise in smartphone ownership has sparked interest in using phones to monitor road conditions [6]. According to Mohan et al. [7], a phone's microphone, accelerometer, GPS sensor, and GSM radio may be used to monitor road conditions and identify speed breakers. With smartphone-based methods, however, the phone's orientation relative to the motion is critical, so the accelerometer must be virtually reoriented with respect to the vehicle before it can be used to monitor acceleration. Nericell uses a heuristics-based system to monitor road conditions. Bhoraskar et al. [8] addressed the fundamental shortcoming of the Nericell system with a system called "Wolverine". As opposed to optical reorientation, this technique employed magnetometers to determine the horizontal orientation of the phone, and determined traffic and road conditions using K-means clustering and a Support Vector Machine (SVM). Data on the road's state may be gleaned from the accelerometers in Android phones and analysed in real time; to better inform drivers about road conditions, this data is kept on a server [9]. The suggested real-time solution uses a background Android service and the Google Maps app on a smartphone [10–12]. In 2014, 1,42,485 persons were killed in road accidents in India, according to a survey.
Speed Breaker Identification Using Deep Learning …
In the event of a speed limit violation or a shaky road, the motorist will receive an early warning from this service [13]. Using smartphones in such a system is hampered by sensor vibration patterns, GPS errors, network overload, latency, and battery drain; because of these constraints, smartphones cannot be used for real-time road condition sensing [14]. BUMPSTER is a mobile cloud computing solution for detecting speed humps [15]. Deep learning is an approach that uses high-level feature representations built by deep architectures, i.e., networks with more than one hidden layer. With cloud-based data collection and support vector machines (SVM), drivers can be warned of approaching speed bumps [16]. An image processing approach has also been developed for use in driver assistance systems: [17] presents an autonomous driver assistance system that detects speed breakers using edge detection and morphological image processing. The Hough Transform (HT) is used for lane recognition, colour segmentation, and shape modelling of speed breakers [18]. A nearest-neighbour classifier is used to identify road signs and classify the remaining items. Using a clustering-based detection technique, potholes may be detected more accurately [19, 20]. Deep learning relies on hierarchical features or representations of observed data to learn from it [21]; observations at the lower levels are described by higher-level features or variables derived from the measured data [22].
Genetic (Ge-Al), memetic (Me-Al), and adaptive direct search (M-ADS) algorithms are employed to carry out the optimizations in this technique [23]. To achieve the most precise adjustment of the various parameters, integral and derivative controllers are essential in vehicle speed regulation. The work in [24] is aimed at slowing down vehicles that violate the STOP sign's (red signal) average speed limit, causing an ALERT sign (yellow signal) to flash. In [25], the approaching speed bump is predicted using a deep learning technique called SegNet, a CNN architecture for semantic pixel-wise segmentation. Table 1 summarizes related results on speed breaker identification. The remainder of this paper is organized as follows: Sect. 2 explains the key components of a convolutional neural network and the steps involved in using it for speed breaker detection. Sect. 3 describes the CNN architecture. Sect. 4 presents the simulations run to assess the validity of the new model and discusses the results. Sect. 5 concludes the paper.
2 Methodology CNN was initially used for object prediction and pattern recognition tasks such as handwritten characters, traffic signs, house numbers, and pedestrian detection.
Table 1 Related articles based on speed breaker identification

| Method | Sensors | Smart phone | Accuracy |
|---|---|---|---|
| Fernández et al. [1] | Camera, LIDAR sensor | No | — |
| Cartel [3] | Camera, WiFi, OBD device | No | — |
| Pothole Patrol [5] | Accelerometer, GPS | No | < 0.2% false positive |
| Nericell [7] | Accelerometer, GPS, microphone, GSM | Yes | 11.1% false positives and 22% false negatives |
| Wolverine [8] | Accelerometer, magnetometer, GPS | Yes | 10% false negative, 21.6% false negative and 2.7% false positive |
| Goregaonkar et al. [16] | Accelerometer, GPS | Yes | 90% true positive |
| Bharathi [17] | Camera, image processing | No | Nearly all categories 90% |
| Arunpriyan [25] | CNN SegNet | No | 91% |
Nowadays CNN is used for object detection and classification. A CNN is organized in three tiers: the input layer, the hidden layers, and the final output layer. Each layer has a specific number of neurons, which are used to determine the classification of the picture in question. The CNN is trained using a collection of labelled data, and the trained network is then used to classify real-time inputs. First, a large data set of speed breaker images is collected at different angles and at different times. These images are pre-processed (resizing, grayscale conversion, etc.) and then given to the CNN. The data set is split into training and testing samples; the training images are used to train the CNN, and the test samples are then used to predict the speed breaker region of each image. The proposed method, consisting of the data set, pre-processing, training the network model, and classification of the test data, is depicted in Fig. 1.
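The train/test split described above can be sketched in Python. The 80/20 split fraction, the `split_dataset` helper, and the placeholder image entries are illustrative assumptions; the paper does not state how the 1150 images were partitioned.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle and split a list of (image, label) pairs into train/test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 1150 labelled images (650 positive, 500 negative),
# mirroring the class balance described in the text.
dataset = [("img", 1)] * 650 + [("img", 0)] * 500
train, test = split_dataset(dataset)
print(len(train), len(test))  # 920 230
```

Fixing the random seed makes the split reproducible across runs, which matters when comparing training configurations.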
2.1 Image Acquisition Image Acquisition is the process of getting an input image for the detection of a speed breaker using digital image processing.
2.2 Data Set In order to achieve the detection of speed breakers, images of rural and urban areas were collected, covering different categories and their subcases.

Fig. 1 Flowchart of speed breaker detection

Speed breaker data sets are divided into classes: black-white, black-yellow, yellow-white, and fully unmarked black speed breakers. The full data set contains 1150 images: 650 with a speed breaker and 500 without.
2.3 Data Pre-processing The performance of the CNN has been evaluated on the publicly available road marking data set. The data set consists of 1150 images at a resolution of 256 × 256, with ground truth annotations for 4 classes of road markings. Of the 1150 samples, 650 contain a speed breaker and 500 do not. We retain the first four classes of road markings for this work; classes with fewer than 60 images each are excluded from the original data set, as they are not sufficient for training CNNs.
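The grayscale-conversion step of the pre-processing can be illustrated with a minimal pure-Python sketch. The BT.601 luminance weights used here are a common convention; the paper does not specify which formula was actually used.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (nested lists of (R, G, B) tuples) to grayscale
    using the common ITU-R BT.601 luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

# Tiny 2x2 example image: red, green / blue, white.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
print(to_grayscale(img))  # [[76, 150], [29, 255]]
```

In practice a library such as OpenCV or Pillow would handle both the grayscale conversion and the 256 × 256 resize, but the arithmetic is the same.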
2.4 Training Parameters End-to-end learning is demonstrated in this part by using a convolutional neural network for speed breaker detection. The weights, learning rate, gradient moment, and number of hidden neurons are initialized as tabulated in Table 2. The CNN has
Table 2 Training parameters

| Parameter | Value |
|---|---|
| Learning rate | 0.0001 |
| Weight | 0.0002 |
| Bias | 0.1 |
| Gradient moment | 0.9 |
| Hidden neurons | 250 |
two layers of each kind: two convolution layers and two subsampling layers, which are used to increase the detection accuracy. The hidden units start with zero initial values and are updated whenever the weights change. The parameters used for training the neural network model are shown in Table 2. Matlab was used to simulate the CNN model and extract visual characteristics from the data set.
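Given the learning rate and gradient moment from Table 2, a single momentum-SGD weight update can be sketched as follows. The weight vector and gradients are hypothetical values for illustration; the paper does not describe its exact update rule.

```python
# Parameters from Table 2.
LEARNING_RATE = 0.0001
MOMENTUM = 0.9  # "gradient moment"

def sgd_momentum_step(weights, gradients, velocity):
    """One weight update with classical momentum SGD."""
    new_velocity = [MOMENTUM * v - LEARNING_RATE * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + nv for w, nv in zip(weights, new_velocity)]
    return new_weights, new_velocity

w, vel = [0.5, -0.3], [0.0, 0.0]   # hypothetical weights, zero initial velocity
grads = [2.0, -1.0]                # hypothetical gradients
w, vel = sgd_momentum_step(w, grads, vel)
print(w)  # each weight moves slightly against its gradient
```

With such a small learning rate, many epochs are needed before the weights move appreciably, which is why the momentum term (0.9) is used to accumulate consistent gradient directions.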
3 CNN Architecture Neural networks are trained with weights and biases. Each neuron receives certain inputs, computes the dot product with its weights, and applies a nonlinearity. From one end of the network to the other, the network transforms the raw picture pixels into class scores. As Fig. 2 illustrates, the CNN architecture contains two such layers, each with a different performance function. All speed breaker data sets are given to the trained CNN, which reduces the dimensions of the image by using a pooling operation. Finally, the images are classified using the softmax function in the fully connected layer.
Fig. 2 CNN architecture of the proposed system
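The softmax used in the fully connected layer maps the class scores to probabilities. A minimal, numerically stable sketch (the two scores shown are hypothetical logits for the speed-breaker / no-speed-breaker classes):

```python
import math

def softmax(scores):
    """Numerically stable softmax over the fully connected layer's outputs."""
    m = max(scores)                       # subtract max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0])  # hypothetical logits for the two classes
print(probs)                 # probabilities summing to 1
```

The predicted class is simply the index of the largest probability; the bounding-box localization described in the paper operates on the spatial feature maps before this final layer.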
Fig. 3 Convolutional layer
3.1 Convolutional Layer The convolutional layer consists of a rectangular grid of neurons. Each neuron convolves the input image with its weights, producing feature maps; each set of weights produces a different feature map. In the convolutional layer, the weights are called the convolutional kernel (filter). For speed breaker identification, a filter size of 3 × 3 was used. The convolutional layer is shown in Fig. 3. The input feature maps $X_i^{l-1}$ to the convolution layer produce the output feature map $X_j^l$, where $X_j^l$ denotes the $j$th feature map of the $l$th layer. The output feature map of the convolution layer is generated by Eq. (1):

$$X_j^l = f\Bigg(\sum_{i \in M_j} X_i^{l-1} * w_{ij}^l + b_j^l\Bigg) \tag{1}$$

where $M_j$ is the set of input feature maps, $w_{ij}^l$ is the weight matrix (kernel) connecting feature map $i$ of layer $l-1$ to feature map $j$ of layer $l$, $*$ represents the convolution operation, and $b_j^l$ is the bias.
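Eq. (1) can be made concrete with a pure-Python "valid" convolution for a single input map. Note that most CNN frameworks implement the cross-correlation form shown here; the 2 × 2 kernel and identity activation are illustrative only, whereas the paper uses 3 × 3 filters.

```python
import math

def conv2d_valid(x, w, b=0.0, f=math.tanh):
    """'Valid' 2D convolution (cross-correlation form) of one input map x
    with kernel w, plus bias b, followed by activation f.
    A single-input-map instance of Eq. (1)."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            s = sum(x[i + a][j + c] * w[a][c]
                    for a in range(kh) for c in range(kw))
            row.append(f(s + b))
        out.append(row)
    return out

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]  # hypothetical 2x2 kernel
feat = conv2d_valid(x, w, f=lambda s: s)  # identity activation for clarity
print(feat)  # [[-4, -4], [-4, -4]]
```

Each output cell is the weighted sum of the kernel-sized window beneath it, so a 3 × 3 filter on a 256 × 256 input yields a 254 × 254 feature map under this "valid" scheme.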
3.2 Pooling Layer The pooling layer, also known as a downsampling layer, reduces the feature map dimensions while maintaining scale invariance. The pooling size is selected based on the convolution-layer feature maps; for example, 2 × 2 or 4 × 4.
Fig. 4 Pooling layer
Different pooling rules exist, such as max pooling, average pooling, stochastic pooling, and overlapped pooling. Figure 4 shows the max pooling of the network. The pooling layer output for input feature map $X_j^{l-1}$ is evaluated by Eq. (2):

$$X_j^l = f\big(\beta_j^l \,\mathrm{pooling}(X_j^{l-1}) + b_j^l\big) \tag{2}$$

where $\mathrm{pooling}(x)$ is the downsampling function, $\beta_j^l$ is the (fixed) pooling weight, $b_j^l$ is the bias, and $f$ is the activation function. After the convolutional and max pooling layers, the feature maps are linked together into a one-dimensional feature vector, which supports high-level reasoning in the neural network. The fully connected layer takes all the neurons from the convolutional and pooling layers; after it, there is no further convolution or pooling operation. The fully connected layer applies the activation function to the weighted sum of its inputs, as shown in Eq. (3):

$$X^l = f\big(w^l X^{l-1} + b^l\big) \tag{3}$$
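The pooling(·) operator of Eq. (2) can be sketched as a non-overlapping max pool in pure Python; the 4 × 4 feature map below is a hypothetical example.

```python
def max_pool(x, size=2):
    """Non-overlapping size x size max pooling: the pooling(.) operator in Eq. (2)."""
    return [[max(x[i + a][j + c] for a in range(size) for c in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 9, 8],
        [5, 1, 7, 3]]
print(max_pool(fmap))  # [[6, 2], [5, 9]]
```

Each 2 × 2 block collapses to its maximum, halving both spatial dimensions while keeping the strongest activation in each region, which is what gives the network a degree of translation invariance.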
4 Results and Discussion Speed breaker detection is complicated in India due to the different dimensions and colours of speed breakers. Most speed breakers can be detected using colour features with image processing techniques. In this study, the effectiveness of the method for detecting speed breakers is evaluated through a CNN simulation. BLOB analysis is used to identify the area around the speed breaker in the detected images. The method is tested on all types of speed breaker images, and the output for some samples is shown in Figs. 5 and 6.
Fig. 5 Identification of speed breaker images and result of the detection
Fig. 5 (continued)
Fig. 6 Identification of without speed breaker images and result of the detection
Table 3 Confusion matrix output data

| Images | True positive | True negative | False positive | False negative | Accuracy (%) |
|---|---|---|---|---|---|
| Black-white | 8 | 6 | 1 | 0 | 90.667 |
| Black-yellow | 8 | 5 | 2 | 0 | 83 |
| White-yellow | 7 | 4 | 4 | 0 | 82 |
| Black | 5 | 4 | 6 | 0 | 78.233 |
| Overall | 28 | 19 | 13 | 0 | 90 |
The confusion matrix for the CNN gives true positives, true negatives, false positives, and false negatives. A true positive is a correctly classified speed breaker image; a false positive is an image misclassified as containing a speed breaker. The values from the confusion matrix for all types of images are given in Table 3. It is found from the table that the prediction accuracy for the black speed breaker is lowest, at 78.2%, while black-white speed breakers are predicted with good accuracy (90.7%). The overall accuracy of the CNN is found to be 90%. From the 20 test samples, 18 images are detected correctly and 2 are not.
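The accuracy figures follow the standard confusion-matrix definition; a small sketch with hypothetical counts (chosen for illustration, not taken from the paper's data):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
print(accuracy(45, 45, 5, 5))  # 0.9
```

Because the test set here is small (20 images), per-class accuracies such as those in Table 3 are sensitive to a single misclassification.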
5 Conclusion Speed breakers are used on roads for pedestrian safety, yet speed breakers that are improperly dimensioned or lack sufficient warning signs are a primary cause of traffic accidents. Detecting speed breakers automatically would therefore increase road safety. Smartphones with Google Maps are mostly used today to predict speed breakers. In this work, a CNN is used instead: the network is trained on 1150 images, with and without speed breakers, and the trained network is used to identify speed breakers in real-time photographs. Twenty real-time test images were considered in this work. The results are promising (90.7%) for black-white speed breakers and weaker (78.2%) for black speed breakers; the overall accuracy obtained is 90%. The prediction accuracy can be further improved by enlarging the training and testing data sets. The CNN-based speed breaker detection method can be used in autonomous vehicles when speed breakers are properly marked.
References

1. C. Fernández, M. Gavilán, D. Fernández Llorca, I. Parra, R. Quintero, A.G. Lorente, L. Vlacic, M.A. Sotelo, Free space and speed humps detection using lidar and vision for urban autonomous navigation, in 2012 IEEE Intelligent Vehicles Symposium (IEEE, 2012), pp. 698–703
2. Z. Wende, Lidar-based road and road-edge detection, in 2010 IEEE Intelligent Vehicles Symposium (IEEE, 2010), pp. 845–848
3. H. Bret, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, S. Madden, CarTel: a distributed mobile sensor computing system, in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (2006), pp. 125–138
4. H. Al-Barazanchi, A. Verma, S.X. Wang, Intelligent plankton image classification with deep learning. Int. J. Comput. Vision Robot. 8(6), 561–571 (2018)
5. E. Jakob, L. Girod, B. Hull, R. Newton, S. Madden, H. Balakrishnan, The pothole patrol: using a mobile sensor network for road surface monitoring, in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (2008), pp. 29–39
6. G. Chugh, D. Bansal, S. Sofat, Road condition detection using smartphone sensors: a survey. Int. J. Electron. Electr. Eng. 7(6), 595–602 (2014)
7. M. Prashanth, V.N. Padmanabhan, R. Ramjee, Nericell: rich monitoring of road and traffic conditions using mobile smartphones, in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems (2008), pp. 323–336
8. B. Ravi, N. Vankadhara, B. Raman, P. Kulkarni, Wolverine: traffic and road condition estimation using smartphone sensors, in 2012 Fourth International Conference on Communication Systems and Networks (COMSNETS 2012) (IEEE, 2012), pp. 1–6
9. V.P. Tonde, A. Jadhav, S. Shinde, A. Dhoka, S. Bablade, Road quality and ghats complexity analysis using Android sensors. Int. J. Adv. Res. Comput. Commun. Eng. 4(3), 101–104 (2015)
10. P. Shubhangi, D. Jawalkar, R. Chate, S. Sangle, M.D. Umale, S.S. Awate, Safe driving using android based device. Int. J. Eng. Trends Technol. (IJETT) 18, 151–154 (2014)
11. K. Vimalkumar, R.E. Vinodhini, R. Archanaa, An early detection-warning system to identify speed breakers and bumpy roads using sensors in smartphones. Int. J. Electr. Comput. Eng. 7(3), 1377 (2017)
12. Y. Koichi, Extensional smartphone probe for road bump detection, in 17th ITS World Congress (ITS Japan, ITS America, ERTICO, 2010)
13. A.D. Drume, S.R. Dubey, A.S. Jalal, Emotion recognition from facial expressions based on multi-level classification. Int. J. Comput. Vision Robot. 4(4), 365–389 (2014)
14. R.K. Goregaonkar, S. Bhosale, Assistance to driver and monitoring the accidents on road by using three axis accelerometer and GPS system, in Proceedings 1st International Conference, vol. 10 (2014), pp. 1–8
15. M. Fazeen, B. Gozick, R. Dantu, M. Bhukhiya, M.C. González, Safe driving using mobile phones. IEEE Trans. Intell. Transp. Syst. 13(3), 1462–1468 (2012)
16. M.S. Sivakumar, J. Murji, L.D. Jacob, F. Nyange, M. Banupriya, Speech controlled automatic wheelchair, in 2013 Pan African International Conference on Information Science, Computing and Telecommunications (PACT) (IEEE, 2013), pp. 70–73
17. M. Bharathi, A. Amsaveni, B. Manikandan, Speed breaker detection using GLCM features. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(2), 384–389 (2018)
18. D. Huiming, Z. Xin, Y. Dacheng, Road traffic sign recognition algorithm based on computer vision. Int. J. Comput. Vision Robot. 8(1), 85–93 (2018)
19. B. Manikandan, M. Bharathi, Speed breaker detection using blob analysis. Int. J. Pure Appl. Math. 118(20), 3671–3677 (2018)
20. M.S.K. Reddy, L. Devasena, N. Jegadeesan, Optimal search agents of dragonfly algorithm for reconfiguration of radial distribution system to reduce the distribution losses. Int. J. Pure Appl. Math. 116(11), 41–49 (2017)
21. P.M. Arabi, G. Joshi, N. VamshaDeepa, Performance evaluation of GLCM and pixel intensity matrix for skin texture analysis. Perspect. Sci. 8, 203–206 (2018)
22. B. Manikandan, S. Ragavi, Sathya, Review based on different deep learning architecture and their applications. Int. J. Adv. Res. Innov. Ideas Educ. 4(2), 2395–2398 (2018)
23. A. Sathesh, Metaheuristics optimizations for speed regulation in self driving vehicles. J. Inform. Technol. Digital World 2(1), 43–52 (2020)
24. S. Gowri, J.S. Vimali, D.U. Karthik, G.A. John Jeffrey, Real time traffic signal and speed violation control system of vehicles using IoT, in International Conference on Computer Networks, Big Data and IoT (Springer, Cham, 2019), pp. 953–958
25. J. Arunpriyan, V.V. Variyar, K.P. Soman, S. Adarsh, Real-time speed bump detection using image segmentation for autonomous vehicles, in International Conference on Intelligent Computing, Information and Control Systems (Springer, Cham, 2019), pp. 308–315
Intelligent System for Diagnosis of Pulmonary Tuberculosis Using XGBoosting Method Sıraj Sebhatu, Pooja, and Parmd Nand
Abstract Tuberculosis (TB) is a human infectious disease that affects the respiratory and other body systems. The WHO reported an average TB incidence of 2.2 million new cases per annum. A major public health concern for developing countries is reducing the reproductive rate of transmission and outbreaks of tuberculosis, which requires improving the diagnosis process and encouraging patient adherence to medical treatment. This research study evaluates several recent ensemble classification algorithms to choose a base model that achieves high accuracy. The extreme gradient boosting (XGBoost) model attains the highest testing score, an AUC of 95.86%, using the fully optimized trained model.
1 Introduction Tuberculosis (TB) is still among the top ten causes of death worldwide and one of the most widespread infectious human diseases. Poor access to medical diagnostic testing and care is a major problem for developing nations; perhaps their most pressing issue is the lack of medical staff, diagnostic tools, and other resources. Mycobacterium tuberculosis is the pathogenic bacterium that causes Tuberculosis [1]. The global prevalence of Tuberculosis was estimated at nearly 10.4 million incident cases, 82% of them multidrug-resistant and 13% co-infected with HIV, as confirmed in 2017 by the World Health Organization (WHO) [2–4]. S. Sebhatu (B) · Pooja · P. Nand Computer Science and Engineering Department, Sharda University, Plot No.32-34, KP III, Greater Noida, UP 201310, India e-mail: [email protected] Pooja e-mail: [email protected] P. Nand e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_41
Bacteriologically positive pulmonary Tuberculosis (PTB) prevalence at the national level was 400 per 100,000 population. TB remains a great public health issue, with approximately two million cases and half a million TB deaths each year [5, 6]. Tuberculosis has been identified using several approaches, including clinical symptoms, tuberculin examination, sputum microscopy, the rapid test GeneXpert MTB/RIF, and chest radiography. Diagnosis is time-consuming and a burden even for skilled pathologists [7, 8]; this exhausting task frequently leads to low performance rates and misdiagnoses [8, 9]. Moreover, delaying the treatment and diagnosis of the disease aggravates the transmission of PTB infection [10–13]. Some studies address the problem by boosting the efficiency and accuracy of diagnostic algorithms to improve early PTB screening. A number of new methods and technologies were introduced, including PCR and RNA tests; nevertheless, until now, none of these has proved robust and widely accepted. If a TB case is detected in a population, it is quite costly to perform a medical test for everyone in the community; instead, the government may regulate the investigation of TB transmission through a cheap and effective initial screening [14–16]. Several research studies were conducted recently to aid medical doctors with decision support systems and provide solutions that help prevent the spread of the disease and, to a significant extent, save human lives [4]. Early diagnosis of Tuberculosis is particularly important to help guide follow-up therapies, but some study reports indicate that errors are possible due to fatigue or inexperience [8], as are delays in the correct diagnosis leading to exposure to inappropriate medication [4, 17, 18].
Therefore, the design of tools and techniques for intelligent prediction of pulmonary tuberculosis disease is widely applied in medical informatics [9]. Machine learning based intelligent prediction systems could improve tuberculosis diagnosis capability and availability [19] and reduce diagnosis errors [20, 21]. Moreover, these systems can provide accurate decision support for medical practitioners in the early identification of pulmonary Tuberculosis, an opportunity to provide rapid treatment and control the prevalence of multidrug resistance [19, 22]. Under these difficult circumstances, potential alternative approaches for PTB diagnosis are important for bringing down cost and time, particularly in remote areas where access to better health care services is a challenge. This research aims to build a classification model for the initial screening of pulmonary Tuberculosis (PTB). The model can be applied to determine whether an individual has been infected with PTB, based on clinical symptoms and other key factors in the population [5]. The physical examination and clinical symptoms of active pulmonary tuberculosis are cough, fever, hemoptysis, weight loss, night sweats, predominant symptom duration (PDSD), visual appearance of sputum, HIV, diabetes, etc. [23]. Key population factors around the exposed individual, such as age, gender, contact with a TB person, tobacco use, prison inmates, miners, migrants, refugees, urban slums, and healthcare workers, together with medical expertise, have been used to train the model [4]. Still, existing approaches have limitations such
as low accuracy, long observation time, etc. Consequently, an efficient TB screening method is needed [24]. The first part provides a detailed description of Tuberculosis, the background of the problem area, the aim of the research, and the relevance of the research findings. The second part describes previous research. The third part discusses the methodologies used to carry out this research. Finally, the fourth part discusses the results and conclusions.
1.1 Research Contribution • To increase the accuracy of the initial screening of pulmonary tuberculosis disease diagnosis. • To minimize the diagnosis time for the initial screening of pulmonary tuberculosis disease and categorize the appropriate attributes/features to achieve greater diagnostic accuracy. • To optimize an ensemble classification algorithm to reduce the false positive error rate, increase diagnosis accuracy, and reduce screening time.
2 Related Work An ensemble classifier builds a series of classifiers, instead of a single classifier, to assign a class to a new instance; combining the outputs of many (various) classifiers improves the performance of the model. Analyzing and pre-processing digital images using ANNs trained to determine the appropriate segmentation features of smear microscopy images reduces the time required, compared with segmenting the image without processing, for identifying pulmonary Tuberculosis [9]. The latest improvements in ensemble models and the availability of massive datasets have supported algorithms that perform numerous diagnostic tasks for respiratory diseases, such as PTB classification [24]. Such models significantly improved screening accuracy compared with rule-based methods. One study used deep learning and classic machine learning based on physical symptom indicators and biochemical observations to diagnose adult asthma, reporting lung and bronchial challenge check accuracy, 60% SVM accuracy, and 65% logistic analysis accuracy [25]. Other work considered ensemble techniques enhancing the integration of various classifiers, achieving higher accuracy than a single model and providing a solution to reduce the isolation of PTB patients [26]. In the same year, TB detection implemented with deep learning and machine learning, especially a convolutional neural network model, achieved high sensitivity and moderate specificity [8]. A neural network used to determine high-risk patients for the assessment of pulmonary TB diagnostic accuracy showed that
sensitivity was 75–87.2%, specificity 53.5–60%, and AUC 64–74% [21]. Among other work, artificial neural networks have been applied to prediction and classification, and MLP models are widely used for feature extraction tasks and for building powerful decision rules; both models achieved high sensitivity values compared with previous studies [19]. Compared with other studies, artificial neural network (ANN) models reduce the time and cost caused by insufficient medical laboratory equipment and support better medical decisions. Moreover, one study improves the detection rate using a self-organizing map (SOM) network and achieves a sensitivity of 89% [27]. According to an analysis applying a Support Vector Machine (SVM) and a decision tree (C5.0) based on multi-objective gradient evaluation, a medical PTB test can easily detect infection and achieve a more reliable diagnostic outcome [20]. Classification of PTB based on a random forest model performs early diagnostic optimization with 81% area under the curve (AUC) [28], and another study in 2017 using SVM and C5.0 reports a better accuracy of 85.54% [29]. In the same year, an assessment of PTB characteristics using a neural network performed with the highest accuracy using the pruning method [13]. A decision tree also optimizes the verification time compared with k-nearest neighbors [30]. A classification model based on a single MLP obtained a sensitivity of 83.3% and a specificity of 94.3% [31]. In several studies of PTB diagnosis, a continuous scenario pattern is used to determine the relationship between signs and symptoms as the basic diagnostic process for the disease [32].
We concluded in our previous study that, following the most widely accepted international standard guideline for directly observed therapy [33], the categorical and numerical variables for diagnosing tuberculosis are age, cough of more than two weeks, fever, hemoptysis, weight loss, night sweats, and the visual appearance of sputum as respiratory signs and symptoms [4, 9, 29–31, 33–37]. Other independent features identified in this study are history of pulmonary disease, demographic history, physical examination, HIV, and diabetes [1, 31], together with key-population factors such as contact with a tuberculosis patient, tobacco use, prison inmates, miners, migrants, refugees, urban slum dwellers, and healthcare workers [13, 29].
2.1 Ensemble Learning for Classification

Ensemble learning is a rich field of machine learning (ML) and artificial intelligence (AI) in which many learners are merged to improve a model's predictive power. For supervised classification, ensemble learning has been advocated to minimize classification bias; bias is a systematic learning error in which the algorithm itself affects the results.
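As a concrete sketch of this idea, the snippet below combines several base learners with hard voting using scikit-learn; the synthetic 19-feature dataset stands in for the PTB data, and all estimator choices and parameters are illustrative, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the PTB dataset (19 features, binary outcome).
X, y = make_classification(n_samples=500, n_features=19, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Hard-voting ensemble: each base learner casts one vote per instance,
# and the majority class is assigned.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier()),
], voting="hard")
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 2))
```

Because the base learners make different kinds of errors, the combined vote is typically at least as accurate as the best single member.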
Intelligent System for Diagnosis of Pulmonary Tuberculosis …
497
3 Research Materials and Methods

Ensemble methods (EM) mirror the way people decide collectively when facing hard decisions. The core idea in machine learning is to create strong predictors by combining weak but distinct models [38]. In most cases, EM aims to achieve more effective and robust solutions than individual models for complex problems [38, 39]. The scikit-learn library provides a select-k-best routine that chooses a feature subset using a suite of statistical tests such as univariate selection with chi-square (chi2). Univariate selection is a statistical test that selects the features most related to the output variable. These steps make a significant contribution to the proposed model, reducing overfitting, improving model accuracy, and reducing the training time of the algorithm. The steps of the designed model are demonstrated in detail in Fig. 1.
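The selection step described above could be sketched as follows; the toy binary matrix stands in for the encoded PTB attributes, and k = 10 is an assumed value, not the paper's.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy non-negative feature matrix (chi2 requires non-negative values),
# standing in for the binary-encoded PTB attributes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 19))
y = rng.integers(0, 2, size=200)

# Keep the 10 features with the highest chi-square scores w.r.t. the target.
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # → (200, 10)
```

`selector.get_support()` then reports which of the original columns survived the test.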
3.1 Data Source

The dataset in this study comprises 4322 pulmonary tuberculosis outpatients and inpatients collected from the Sharda University Hospital patient history record books (2015–2019) in Greater Noida, India. The Research and Ethics Committee of Sharda University approved this work, and informed consent was not required. The live dataset was obtained from the respiratory medicine department, specifically the pulmonary tuberculosis unit, and was collected from outpatient and inpatient records handled manually.
Fig. 1 Proposed approach for PTB diagnosis
Table 1 List of all variables and data types

No  Name of variable                        Data type
1   Age                                     Numerical
2   Gender                                  Categorical
3   Contact TB person                       Categorical
4   Tobacco                                 Categorical
5   Prison inmates                          Categorical
6   Miner                                   Categorical
7   Migrant                                 Categorical
8   Refugee                                 Categorical
10  Healthcare worker                       Categorical
11  Cough                                   Categorical
12  Fever                                   Categorical
13  Hemoptysis                              Categorical
14  Weight loss                             Categorical
15  Night sweat                             Categorical
16  Duration of symptom                     Numerical
17  Sputum: (a) Mucopurulent                Categorical
            (b) Saliva                      Categorical
            (c) Bloodstained                Categorical
18  Human immunodeficiency virus (HIV)      Categorical
19  Diabetics                               Categorical
We faced challenges during data collection and data entry. Most fields are complete; some fields are incomplete and unusable, but the most important variables are included in the recorded patient history documents, and the history cards/books are written clearly and legibly. We entered the dataset manually and converted the records to Excel format to ease analysis and prediction, which took a long time. Nineteen (19) variables (eighteen categorical and two numeric) were recorded from each patient history card and PTB patient record book. The target class, PTB detection, has two values (PTB_positive, PTB_negative). A full description of the PTB dataset is shown in Table 1. The database is the patients' most significant medical record. Attributes were selected based on those respiratory physicians use to identify suspected patients: clinical signs, symptoms, physical exams, demographic background, and key-population external factors relevant to initial pulmonary tuberculosis diagnosis models.
3.2 Characteristics of PTB Data Exploration

Data analysis is an important task for describing and visualizing data attributes and for selecting data and attribute subsets. Describing the findings with charts, plots, and other visualization techniques for correlating the variables also makes these relationships easier for users to see (Figs. 2, 3 and 9). Understanding the relationships between variables is fundamental to constructing a reasonable model. In this study, missing values are treated by replacing categorical values with the mode and numerical values with the mean, and outliers are treated.
Fig. 2 Distribution of age
Fig. 3 Distribution of the predominant symptom
3.3 Data Preparation

Data processing is a necessary step in applying medical information, and the entire data preparation process was carried out in this work. Tools that manage data more efficiently help explain the data clearly, and classifying features as categorical or numerical helps extract information, leading to a better classification model for disease prediction as a pre-stage of the machine learning methods. We also used standardization estimators to meet different data visualization requirements.
3.4 Data Cleaning

Pre-processing is a preliminary stage of building an efficient and effective classifier. It is a process in which missing attribute values are replaced by appropriate values depending on the objective function, and closely related noisy data are filtered. Attributes used in this study, such as cough, fever, sputum, weight loss, and night sweats, take "Yes" and "No" values. Missing values, data inconsistencies, and ambiguous attribute values were treated and normalized.
3.5 Missing Value Treatment

The cleaned PTB patient record does not contain complete information for every patient, for many reasons. One source of missingness is medical doctors forgetting to record important features, which makes it challenging to obtain complete information from manual medical records. Even though machine learning techniques can adjust for missing values and noisy data efficiently, such records are not removed or ignored from the dataset, since they may contain valuable information. Therefore, pre-processing offers several alternative options to replace missing data, such as backward and forward filling, and mean, median, and mode imputation, applied sequentially or in parallel. Several scholars have analyzed and compared these approaches for handling missing values. The most appropriate solution is to substitute the mean/median of the attribute for numerical values and the most common value (the feature mode) for categorical attributes.
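A minimal pandas sketch of this imputation rule; the tiny frame and its column names are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34, None, 51, 28],            # numerical → fill with the mean
    "cough": ["Yes", "No", None, "Yes"],  # categorical → fill with the mode
})
df["age"] = df["age"].fillna(df["age"].mean())
df["cough"] = df["cough"].fillna(df["cough"].mode()[0])
print(df.isna().sum().sum())  # → 0, no missing values remain
```

The same two lines generalize to a full dataset by looping over numeric and categorical columns separately.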
3.6 Outlier Treatment

An outlier is an extreme data value far from the other instances in a dataset; data values sometimes diverge from the overall trends. It is fundamental to detect extreme or divergent values in the distribution: unless outliers are detected early, they cause serious issues for statistical analysis, with a significant impact on the mean and standard deviation. Therefore, removing them before analysis and before fitting training data to the classifier is necessary. In Fig. 2, points outside the upper and lower limits are outliers; the set of values lying between the upper and lower bounds is kept for further analysis, and other values are removed using distance-based clustering techniques [16]. Most outliers in the PTB data distribution come from human or data-entry error and natural error. The study used the inter-quartile range (IQR) method to remove them. The IQR describes the spread of the instance values in the dataset; it is the difference between the upper quartile (75%) and the lower quartile (25%).
Outliers are defined as values below Q1 − 1.5 (Q3 − Q1) or above Q3 + 1.5 (Q3 − Q1), (1)

where the quartile positions are

Q1 = (N + 1)/4 th, Q2 = (N + 1)/2 th, Q3 = 3(N + 1)/4 th. (2)

In code, extreme values were capped at the 1st and 99th percentiles:

    def outlier_capping(x):
        # cap values above the 99th and below the 1st percentile
        x = x.clip(upper=x.quantile(0.99), lower=x.quantile(0.01))
        return x

    mod_df_num = mod_df_num.apply(lambda x: outlier_capping(x))
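Besides capping, the 1.5 IQR fences described above can be used to drop outliers outright; a small sketch with an invented age column.

```python
import numpy as np

def iqr_bounds(values):
    """Lower/upper fences at 1.5 IQR beyond the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ages = np.array([23, 27, 31, 35, 38, 41, 44, 120])  # 120 is an entry error
lo, hi = iqr_bounds(ages)
kept = ages[(ages >= lo) & (ages <= hi)]  # drops only the extreme value
print(kept)
```

Values inside the fences are retained for further analysis, matching the rule in Eq. (1).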
3.7 Data Transformation and Discretization

Data may be in discrete or continuous form, which is a major issue in data pre-processing in general: several techniques cannot handle continuous features, and some visualization and analysis techniques require data in binary form. The discretization process therefore converts continuous attributes into categorical ones. Discretization was achieved by applying equal-width binning to the age attribute and the length of symptoms.
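Equal-width binning of the age attribute can be sketched with pandas; the bin count and labels here are assumptions for illustration, not the paper's choices.

```python
import pandas as pd

ages = pd.Series([19, 25, 34, 47, 52, 66, 73])
# Equal-width binning into 4 intervals, then labelled categories.
age_bins = pd.cut(ages, bins=4, labels=["young", "adult", "middle", "senior"])
print(age_bins.value_counts().to_dict())
```

Each interval spans the same width of the age range, so the resulting categorical column can be fed to algorithms that cannot handle continuous input.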
4 Modeling

The main purpose of this study is to classify pulmonary tuberculosis. The study focuses on pre-screening and compares the results of different models on a training and testing dataset; finally, this study will be a resource for future researchers. The developed models always depend on the training sets, expressed as classification rules or decision trees. Constructing models with proper generalization capacity, which accurately predict class labels, is the central goal of the learning algorithms. For the experiments, we used bagging, logistic regression, SVM, an MLP neural network, random forest, k-nearest neighbors, and XGBoost.
5 Result and Discussion

The main purpose of this study is to design a classification model for pulmonary tuberculosis diagnosis by applying machine learning algorithms. TB screening is particularly difficult because of the many patients who present as outpatients and inpatients for pulmonary tuberculosis treatment, and because asthma, bronchial, lung, and other associated infections can have symptoms similar to tuberculosis. Physicians use international guidelines and clinical variables to minimize misdiagnosis for these common symptoms, since delayed effective treatment and misdiagnosis increase the risk of morbidity and mortality and raise the prevalence of the disease in society. Moreover, long delays in pulmonary tuberculosis diagnosis contribute to an increased risk of developing multi-drug-resistant TB, and individuals who do not receive appropriate therapy can contribute to the spread of the disease in the community [40]. We therefore discussed with respiratory medical doctors which inappropriate attributes could be removed to help improve the model's accuracy, and agreed to remove some of them from the dataset. This study selects the important attributes: gender, age, cough, fever, hemoptysis, weight loss, night sweats, predominant symptom duration (PDSD), visual appearance of sputum, HIV, diabetes, and the main population factors of contact with a TB person, tobacco use, prison inmates, miners, migrants, refugees, urban slum dwellers, and healthcare workers, along with the class outcome attribute, namely early diagnosis. Classification techniques have recently become common in diagnosing pulmonary tuberculosis. The algorithms used here include logistic regression, random forest, decision tree, XGB, multilayer perceptron neural networks (MLP), and support vector machines (SVM). The dataset was used for the experiments and categorized using the seven experimental classifiers.
Experimental datasets were trained with tenfold cross-validation. Both SVM and MLP showed the most accurate results at 89.4%, but the AUC results showed that SVM reached 71.9% while MLP reached 86.7%, followed by random forest and k-nearest neighbors with accuracies of 89.24% and 88.17%, respectively.
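Tenfold cross-validation of this kind can be sketched with scikit-learn; the synthetic data and classifier settings are illustrative, not the experiment's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 19-attribute PTB dataset.
X, y = make_classification(n_samples=300, n_features=19, random_state=0)

# Ten folds, scored by ROC AUC as in the experiments above.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10, scoring="roc_auc")
print(round(scores.mean(), 3))
```

Each of the ten scores comes from training on nine folds and evaluating on the held-out fold; the mean estimates out-of-sample performance.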
5.1 Performance Evaluation Metrics

The performance of the classification techniques has been evaluated using four quantities: (i) the number of correctly classified positive instances (true positives, TP); (ii) the number of correctly rejected instances that do not belong to the class (true negatives, TN); (iii) the number of instances incorrectly assigned to the class (false positives, FP); and (iv) the instances that belong to the class but were misclassified (false negatives, FN). These four components constitute the confusion matrix, from which the accuracy of the classification techniques is computed. Accuracy is one of the primary metrics
that has been widely used in the evaluation. Several further measures are available to evaluate the models' performance, such as sensitivity and specificity. We applied cross-validation to illustrate predictive differences in the existing datasets (Fig. 4).

Accuracy = (TP + TN) / (TP + FP + FN + TN)  (6)
Recall = TP / (TP + FN)  (7)
Specificity = TN / (FP + TN)  (8)
Precision = TP / (TP + FP)  (9)
These evaluation metric equations are used to assess the performance of each classifier, together with the AUROC, which measures the actual quality of the model. The confusion matrix is a cross-tabulation of observed and predicted values for the classifier: predicted values of the dependent variable are represented in the columns, while observed (actual) values are represented in the rows. The receiver operating characteristic (ROC) curve with cross-validation is a two-dimensional graph plotting the false positive rate on the X-axis against the true positive rate on the Y-axis for each cutoff point in the class probabilities; the line with intercept 0 and slope 1 represents an AUC of 0.5. It is especially useful for evaluating binary classifiers with imbalanced classes or inconsistent misclassification costs. Moreover, the curve indicates the variance across the training-set splits shown in Fig. 4, and the K-fold cross-validation results for each split are listed in Table 2.

Table 2 Model performance evaluation

Author                     Classifier        AUC (%)  Specificity  Sensitivity
Benjwan et al. (2020)      Random forest     92.31    65%          86.84%
Toktam et al. (2020)       Stacked ensemble  92.89    83%          94%
Rusdah and Mohamed (2020)  Random forest     90.59    –            90.53%
Zhixu et al. (2020)        Xpert/MTB         91       75%          79.2%
Hooman et al. (2021)       Neural Network    94.16    74.55        90.51
Proposed model             XGBoosting        95.86    89%          98%
Fig. 4 Cross-validation with receiver operating characteristics (ROC)
Table 3 Classification model comparison

Model/classifier     Train time  Test time  Training accuracy  Testing accuracy  Accuracy of the model
Logistic regression  0.0809      0.0040     92.8               82.7              89.2
Decision tree        0.0044      0.543      90                 89                90
Random forest        0.1937      0.0442     99                 95                94.38%
K-nearest neighbors  0.1745      0.0562     89                 88                88.17%
MLP                  0.8792      0.0050     98                 97                89.24%
SVM (RBF)            0.1699      0.0975     92.54              89.4              89.4%
AdaBoost             0.1252      0.0335     92.46              90.02             90.16
XGB                  0.3808      0.0141     99.2               98.05             97
5.2 Classification Results of the Ensemble Classifier Algorithms

We split the entire dataset into training and testing sets and compare the models, evaluating the performance of each ensemble classifier algorithm on each set. Table 3 shows that our proposed model achieves state-of-the-art accuracy over the other feature-based machine learning methods. The comparison of classification accuracy among the models is shown in the histogram of Fig. 5. The MLP neural network spends more time in training than the other classifiers, while the decision tree classifier takes the least time to test. The detailed results of tenfold cross-validation are listed in Table 3.
Fig. 5 Classification model performance comparison
5.3 Support Vector Machine

The concept of the SVM arises from statistics. SVMs were initially created for binary classification, but they can be appropriately extended to multiclass problems. To distinguish the data points, the SVM classifier requires a hyperplane; building a hyperplane in the feature space to separate the focused information is one of the SVM's attractive features [41]. The cost and kernel parameters (linear and RBF kernels, C = [0.001, 0.01, 0.1, 1, 10], gamma = [0.001, 0.01, 0.1, 1]) were tuned by hyperparameter optimization with the Keras tool Talos [42]. The grid-search scoring was based on the mean receiver operating characteristic AUC achieved under tenfold cross-validation. The same form of machine learning was used to train and test the current dataset. The trained model used the complete set of 19 features and variables, composed of physical and clinical signs and symptoms, and obtained a model accuracy of 89.4% (Fig. 6). The ensemble model evaluation can provide more accurate information and further knowledge to boost diagnostic accuracy; the ROC curve quantifies the accuracy and evaluates the output of the classification method (Table 2), and was therefore used for interpretation and computation.
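The grid search described here can be sketched with scikit-learn's GridSearchCV, using the C and gamma grids quoted above; the data are synthetic, and Talos itself is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the 19-feature PTB data.
X, y = make_classification(n_samples=200, n_features=19, random_state=0)

# The C and gamma grids quoted in the text.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10],
              "gamma": [0.001, 0.01, 0.1, 1]}

# Score each (C, gamma) pair by mean ROC AUC over tenfold CV.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` is then the RBF SVM refit on all data with the winning parameter pair.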
Fig. 6 ROC curve for SVM
5.4 Random Forest

Random forest is an ensemble-based supervised learning algorithm used for classification, regression, and other tasks; it operates by constructing a multitude of decision trees at training time, after which a voting mechanism selects the prominent nodes in the forest. This classification technique overcomes the decision tree's overfitting error [11].
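A minimal random forest sketch on synthetic stand-in data; the fitted model's `feature_importances_` is the kind of ranking that underlies a feature-importance plot such as Fig. 7 (tree count and data are illustrative).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=19, random_state=0)

# Each of the 200 trees is trained on a bootstrap sample with random
# feature subsets; predictions are made by majority vote.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Indices of the five most important features (importances sum to 1).
print(forest.feature_importances_.argsort()[::-1][:5])
```

Averaging over many decorrelated trees is what reduces the variance, and hence the overfitting, of a single decision tree.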
5.5 XGBoost

XGBoost, which implements gradient-boosted decision trees designed for speed and performance, has recently dominated applied competitive machine learning. We applied XGBoost first to speed up the second model's performance, and it helps minimize the loss as new models are added. This approach suits our research: it supports classification for prediction problems and is among the best techniques for feature selection, determining the appropriate variables [42–44]. The diagnostic accuracy of the XGBoost classification based on the signs and symptoms of PTB disease was an AUC of 95.86%. The categorical features in the dataset were converted to binary variables and normalized so that the machine learning algorithms can perform the analysis in a better and more consistent manner; values such as yes/no, reactive/non-reactive, and unknown were normalized to the element values 0 and 1. Missing pertinent information was imputed by treating missing values and detecting outliers. Moreover, the dataset was grouped using binning techniques to reduce bias error and aggregate many values for better analysis. Furthermore, to select important features for model building, we used a parameter optimization tool and refined the random forest and XGBoost models on the full feature set (Fig. 8).
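Since the xgboost package may not be installed everywhere, the gradient-boosting idea can be sketched with scikit-learn's GradientBoostingClassifier as a stand-in; the data and hyperparameters are illustrative, not the paper's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=19, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Boosting adds shallow trees sequentially, each fitted to the current
# residual loss, so every new model reduces the remaining error.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbm.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

With the real xgboost library the equivalent call would be an `XGBClassifier` with the same fit/predict interface.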
The model was checked using the test dataset, and its performance is compared with the PTB prediction models found in the related studies using the important variables of Fig. 8. The proposed model provides a significant improvement on the grounds of these steps.
5.6 Misclassification Error

The model with the highest efficiency and the lowest misclassification error rate validates the dataset. The results show accuracy, misclassification error rate, positive predictive values, and sensitivities evaluated with the existing estimates (Fig. 8). Moreover, different input variables have their own requirements for improving the ability of the PTB diagnosis model. Assessing the dataset's quality through pre-processing is important for identifying bias, outliers, and missing and duplicated values. Varied values of K were tested to identify pulmonary tuberculosis patients, obtaining a higher classification rate; a classification is accepted when the probability for the selected K reaches the minimum threshold of 0.5, and an observation with a value below 0.5 is left unclassified. Highly associated features are commonly redundant and may affect the efficiency of machine learning algorithms. The correlations between the attributes are shown in Figs. 7 and 9; following [20], we ranked and reduced the 19 features using both the XGBoost and random forest classifiers.
Fig. 7 Rankings of PTB features
Fig. 8 Misclassification error
Fig. 9 Correlation plot for PTB attributes
6 Conclusion

The proposed ensemble-classification-based diagnosis model for the initial screening of PTB is highly reliable for an accurate diagnosis. The classification algorithms and feature selection technique are applied to clinical symptoms and physical examinations. This study was designed to implement a machine learning and intelligent approach for the initial diagnosis of PTB. The proposed ensemble method maximizes the capability of the classifiers, with the SVM achieving a high accuracy of 89.4% and the MLP neural network 89.2% using the 19 attributes. The model comparison for initial pulmonary tuberculosis diagnosis is shown in Table 2. The extreme gradient boosting (XGBoost) model performs best, with an accuracy of 97.02% and a testing AUC score of 95.86% using the fully optimized trained model. These classifiers would improve the accuracy and effectiveness of initial PTB diagnostics, support health-care monitoring, and minimize the cost and time of patients' treatment service.
References 1. P. Dande, P. Samant, Acquaintance to artificial neural networks and use of artificial intelligence as a diagnostic tool for tuberculosis, a review. Tuberculosis 108, 1–9 (2018) 2. E. Winarko, R. Wardoyo, Preliminary diagnosis of pulmonary tuberculosis using ensemble method. in 2015 International Conference on Data and Software Engineering (ICoDSE). (IEEE, 2015), pp. 175–180 3. S. Natarajan, K.N. Murthy, A data mining approach to the diagnosis of tuberculosis by cascading clustering and classification. arXiv preprint arXiv, (2011), pp. 1108–1045 4. S.S. Meraj, R. Yaakob, A. Azman, S.N. Rum, A.A. Nazri, Artificial Intelligence in diagnosing tuberculosis: a review. Int. J. Adv. Sci. Eng. Inform. Technol 81–91 (2019) 5. R. Sarin, V. Vohra, U.K. Khalid, P.P. Sharma, V. Chadha, M.A. Sharada, Prevalence of pulmonary tuberculosis among adults in selected slums of Delhi city. Indian J. Tuberculosis 130–134 (2018) 6. S. Gupta, V. Arora, O.P. Sharma, L. Satyanarayana, A.K. Gupta, Prevalence and pattern of respiratory diseases including Tuberculosis in elderly in Ghaziabad–Delhi–NCR. Indian J. Tuberculosis 236–41 (2016) 7. A.B. Suthar, P.K. Moonan, H.L. Alexander, Towards national systems for continuous surveillance of antimicrobial resistance: lessons from tuberculosis, PLoS Med. (2018) 8. D.J. Horne, M. Kohli, J.S. Zifodya, I. Schiller, N. Dendukuri, D. Tollefson, S.G. Schumacher, E.A. Ochodo, M. Pai, K.R. Steingart, Xpert MTB/RIF and Xpert MTB/RIF ultra for pulmonary tuberculosis and rifampicin resistance in adults. Cochrane Database of Systemat. Rev. (2019) 9. J.L. Díaz-Huerta, A. del Carmen Téllez-Anguiano, J.A. Gutiérrez-Gnecchi, O.Y. ColinGonzález, F.L. Zavala-Santoyo, S. Arellano-Calderón, Image preprocessing to improve AcidFast Bacilli (AFB) detection in smear microscopy to diagnose pulmonary tuberculosis. in 2019 International Conference on Electronics, Communications and Computers (CONIELECOMP), (IEEE Press, 2019), pp. 66–73 10. C.T. 
Sreeramareddy, Z.Z. Qin, S. Satyanarayana, R. Subbaraman, M. Pai, Delays in diagnosis and treatment of pulmonary tuberculosis in India: a systematic review. Int. J. Tuberculosis Lung Disease 255–266 (2014)
11. P. Ghosh, D. Bhattacharjee, M. Nasipuri, A hybrid approach to diagnosis of tuberculosis ˙ from sputum. in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), (IEEE Press, 2016), pp. 771–776 12. I. Goni, C.U. Ngene, I. Manga, N. Auwal, J.C. Sunday, Intelligent system for diagnosing tuberculosis using adaptive neuro-fuzzy. Asian J. Res. Comput. Sci. 1–9 (2018) 13. F.F. Jahantigh, H. Ameri, Evaluation of TB patients characteristics based on predictive data mining approaches. J. Tuberculosis Res. 13–22 (2017) 14. A.A. Shazzadur Rahman, I. Langley, R. Galliez, A. Kritski, E. Tomeny, S.B. Squire, Modelling the impact of chest X-ray and alternative triage approaches prior to seeking a tuberculosis diagnosis. BMC Infect. Diseases 1–1 (2019) 15. S. Jaeger, A. Karargyris, S. Candemir, L. Folio, J. Siegelman, F. Callaghan, Z. Xue, K. Palaniappan, R.K. Singh, S. Antani, G. Thoma, Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging. (IEEE Press, 2013), pp. 233–45 16. N. Umar, Cost-effectiveness analysis of tuberculosis control strategies among migrants from Nigeria in the United Kingdom (Doctoral dissertation, University of East Anglia) (2015) 17. W. Rusdah, E. Edi, Review on data mining methods for tuberculosis diagnosis. Inform. Syst. 563–568 (2013) 18. N. Khan, ERP-communication framework: aerospace smart. Int. J. Comput. Sci. Inform. Secur. (2011) 19. J.B. Souza Filho, M. Sanchez, J.M. Seixas, C. Maidantchik, R. Galliez, A.D. Moreira, P.A. da Costa, M.M. Oliveira, A.D. Harries, A.L. Kritski, Screening for active pulmonary tuberculosis: development and applicability of artificial neural network models. Tuberculosis (Edinburgh, Scotland, 2018), pp. 94–101 20. F.E. Zulvia, R.J. Kuo, E. Roflin, An initial screening method for tuberculosis diseases using a multi-objective gradient evolution-based support vector machine and c5. 0 decision tree. 
in IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), (IEEE Press, 2017), pp. 204–209 21. V.I. Klassen, A.A. Safin, A.V. Maltsev, N.G. Andrianov, S.P. Morozov, A.V. Vladzymyrskyy, AI-based screening of pulmonary tuberculosis: diagnostic accuracy. J. eHealth Technol. Appl. 28–32 (2018) 22. R. Zachoval, P. Nencka, M. Vasakova, E. Kopecka, V. Boroviˇcka, J. Wallenfels, P. Cermak, The incidence of subclinical forms of urogenital tuberculosis in patients with pulmonary tuberculosis. J. Infect. Public Health 243–245 (2018) 23. K.S. Mithra, W.S. Emmanuel , GFNN: gaussian-Fuzzy-neural network for diagnosis of tuberculosis using sputum smear microscopic images. J. King Saud Univers.-Comput. Inform. Sci. 1084–95 (2021) 24. O. Stephen, M. Sain, U.J. Maduh, D.U. Jeong, An efficient deep learning approach to pneumonia classification in healthcare. J. Healthcare Eng. (2019) 25. K. Tomita, R. Nagao, H. Touge, T. Ikeuchi, H. Sano, A. Yamasaki, Y. Tohda, Deep learning facilitates the diagnosis of adult asthma. Allergology Int 456–461 (2019) 26. E.D. Alves, J.B. Souza Filho, A.L. Kritski, An ensemble approach for supporting the respiratory isolation of presumed tuberculosis inpatients. Neurocomputing 289–300 (2019) 27. A.D. Orjuela-Cañón, J.E. Mendoza, C.E. García, E.P. Vela, Tuberculosis diagnosis support analysis for precarious health information systems. Comput. Methods Programs Biomed. 11–17 (2018) 28. Y. Wu, H. Wang, F. Wu, Automatic classification of pulmonary tuberculosis and sarcoidosis based on random forest. in 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). (IEEE Press, 2017), pp. 1–5 29. E. Rusdah, Winarko, R. Wardoyo, Predicting the suspect of new pulmonary tuberculosis case using SVM, C5. 0 and modified moran’s I. Int. J. Comput. Sci. Netw. Secur. 164–71 (2017) 30. S. Benbelkacem, B. Atmani, Benamina, Treatment tuberculosis retrieval using decision tree. 
in ˙ 2013 International Conference on Control, Decision and Information Technologies (CoDIT). (IEEE Press, 2013), pp. 283–288
31. E.D. Alves, J.B. Souza Filho, R.M. Galliez, A. Kritski, Specialized MLP classifiers to support the isolation of patients suspected of pulmonary tuberculosis. in 2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence. (IEEE Press, 2013), pp. 40–45 32. M.S. Hossain, F. Ahmed, K. Andersson, A belief rule based expert system to assess tuberculosis under uncertainty. J. Med. Syst. 1–11 (2017) 33. S. Sebhatu, A. Kumar, S. Pooja, Applications of soft computing techniques for pulmonary tuberculosis diagnosis. Int. J. Recent Technol. Eng. 1–9 (2019) 34. S. Kulkarni, S. Jha, Artificial intelligence, radiology, and tuberculosis: a review. Academic Radiol 71–75 (2020) 35. M.T. Khan, A.C. Kaushik, L. Ji, S.I. Malik, S. Ali, D.Q. Wei, Artificial neural networks for prediction of tuberculosis disease. Frontiers Microbiol 395–403 (2019) 36. A. Yahiaoui, O. Er, N. Yumu¸sak, A new method of automatic recognition for tuberculosis disease diagnosis using support vector machines. Biomed. Res. 4208–4212 (2017) 37. N. Aini, H.R. Hatta, F. Agus, Z. Ariffin, Certain factor analysis for extrapulmonary tuberculosis diagnosis. in 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI) (IEEE Press, 2017), pp. 1–7 38. B. Kaewseekhao, N. Nuntawong, P. Eiamchai, S. Roytrakul, W. Reechaipichitkul, K. Faksri, Diagnosis of active tuberculosis and latent tuberculosis infection based on Raman spectroscopy and surface-enhanced Raman spectroscopy. Tuberculosis 462–491 (2020) 39. T. Khatibi, A. Farahani, S.H. Armadian, Proposing a two-step decision support system (TPIS) based on stacked ensemble classifier for early and low cost (step-1) and final (step-2) differential diagnosis of Mycobacterium tuberculosis from non-tuberculosis Pneumonia. arXiv preprint (2020) 40. M. Beccaria, T.R. Mellors, J.S. Petion, C.A. Rees, M. Nasir, H.K. Systrom, J.W. Sairistil, M.A. Jean-Juste, V. Rivera, K. Lavoile, P. 
Severe, Preliminary investigation of human exhaled breath for tuberculosis diagnosis by multidimensional gas chromatography–time of flight mass spectrometry and machine learning. J. Chromatography 1074, 46–50 (2018) 41. M. Claesen, F. De Smet, J. Suykens, B. De Moor, EnsembleSVM: a library for ensemble learning using support vector machines. arXiv preprint (2014) 42. M. Syafrullah, Diagnosis of smear-negative pulmonary tuberculosis using ensemble method: a preliminary research. in 2019 6th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). (IEEE Press, 2019), pp. 112–116 43. Z. Chen, H. Jiang, Y. Tan, T. Kudinha, J. Cui, L. Zheng, C. Cai, W. Li, C. Zhuo, Value of the Xpert MTB/RIF assay in diagnosis of presumptive pulmonary tuberculosis in general hospitals in China. Radiol. Infectious Diseases 147–53 (2020) 44. H.H. Rashidi, L.T. Dang, S. Albahra, R. Ravindran, I.H. Khan, Automated machine learning for endemic active tuberculosis prediction from multiplex serological data. Scientif. Rep. 1–12 (2021) 45. M.H. Lino Ferreira da Silva Barros, G. Oliveira Alves, L. Morais Florêncio Souza, E. da Silva Rocha, J.F. Lorenzato de Oliveira, T. Lynn, V. Sampaio, P.T. Endo, Benchmarking machine learning models to assist in the prognosis of tuberculosis. (Informatics, Multidisciplinary Digital Publishing Institute, 2021) 46. H.H. Rashidi, L.T. Dang, S. Albahra, R. Ravindran, I.H. Khan: Automated machine learning for endemic active tuberculosis prediction from multiplex serological data. Scientif. Reports 1–12 (2021) 47. S.P. Kailasam, Prediction of tuberculosis diagnosis using weighted KNN classifier. 502–509 (2021)
Web Based Voice Assistant for Railways Using Deep Learning Approach Prasad Vadamodula, R. Cristin, and T. Daniya
Abstract Speech recognition is the ability of a machine or program to respond to spoken commands. It works using algorithms for acoustic and language modelling, and it enables hands-free control of various devices and interaction with machines. This project develops a speaking machine using the deep learning technique named Linear Predictive Coding (LPC), or Linear Predictive Speech Synthesis (LPSS), which can be used by customers or passengers in a railway station for enquiries. It helps people enquire about train details for a limited questionnaire: trains identified by name with a reference station location, and their real-time availability for required destinations. The system gives the platform number for all train arrivals together with the train names, and the ticket fare for different destinations. All of this is done by giving voice input to the developed system.
1 Introduction Deep learning, also known as deep structured learning, is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Most modern deep learning models are based on artificial neural networks, specifically convolutional neural networks (CNNs), although they can also include propositional formulas or latent variables organized layer-wise in deep generative models, such as the nodes in deep belief networks and deep Boltzmann machines. It allows us to P. Vadamodula (B) · R. Cristin Department of Computer Science and Engineering, GMRIT, Rajam, India e-mail: [email protected] R. Cristin e-mail: [email protected] T. Daniya Department of Information Technology, GMRIT, Rajam, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_42
train Artificial Intelligence (AI) to predict outputs given a group of inputs. Both supervised and unsupervised learning can be used to train the AI. In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation.
2 Literature Survey This study [1] focuses on the basic ideology of Linear Predictive Coding (LPC) together with Mel-Frequency Cepstrum Coefficients (MFCC). The idea of LPC is to capture the basic, precise parameters of speech given through audio or text files, known as discourse. The tests demonstrate the ability to process the linear prediction model of speech, while MFCC on top of LPC concerns the capabilities used in speech recognition. A Hamming window is applied in this method to limit the discontinuity of a signal. The study in [2] presented a review of Automatic Speech Recognition (ASR) systems and concluded that, according to the works and implementations surveyed, neural networks are the most used solution for developing such a system; it also explained that an ASR system contains two phases. In [3], the voice spoken by the user is recorded and the level of sound produced by the system is compared. This study took five different voices, a combination of male and female speakers. Each one spoke fifteen different words, and it was checked whether the targets were the same or not. The training process consists of preprocessing followed by the signal extraction process; these two steps are done for both the source and the target speaker. For neural data acquisition [4], three participants were given three different-channel ECoG arrays: participants 1 and 3 had left-hemisphere coverage and participant 2 had right-hemisphere coverage. A data acquisition (DAQ) rig processed the local field potentials recorded from these arrays at multiple cortical sites for each participant. For high gamma feature extraction, the Real-Time and Network Systems (rTNSR) package implemented a filter chain comprising three processes to measure high gamma activity in real time. This gamma band was used because research has shown that activity in this band is correlated with multi-unit firing processes in the cortex.
Restricted Boltzmann Machines (RBMs) [5] for speech synthesis are used for modelling speech signals in tasks such as speech recognition and spectrogram coding. In these applications, RBMs are often used for pre-training deep auto-encoders (DAEs). Multi-Distribution Deep Belief Networks (MDBN) [6] for speech synthesis are a method of modelling the joint distribution of context information; they model the continuous spectral, discrete voiced/unvoiced (V/UV) parameters and the multi-space F0 simultaneously with three types of RBMs. LPC [7] is a method for signal source modelling in speech signal processing, based on the source-filter model of the speech signal. In linear prediction, the unknown output sample is represented as a linear combination of past samples, and the prediction coefficients are selected
to minimize the mean square error. The Nearest Neighbour rule in a web framework [8, 9] achieves consistently high performance without prior assumptions about the distributions from which the training examples are drawn. A new sample is classified by calculating the distance to the nearest training case; the sign of that point then determines the classification of the sample. The conclusion from the above literature study gave me an idea to prepare a novel system in a vernacular language.
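The linear prediction step described for LPC in [7], representing each sample as a linear combination of past samples with coefficients chosen to minimize the mean square error, can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is an illustrative numpy version, not the project's implementation:

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Estimate LPC coefficients by the autocorrelation method,
    solving the normal equations with Levinson-Durbin recursion.
    Returns (a, err): a[0] = 1 and a[1:] are the prediction
    coefficients; err is the residual prediction energy."""
    n = len(signal)
    # Autocorrelation for lags 0..order
    r = [float(np.dot(signal[:n - k], signal[k:])) for k in range(order + 1)]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err
        # Order-update of the coefficient vector
        new_a = a.copy()
        for j in range(1, i + 1):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For a purely geometric signal x[t] = 0.9^t, the order-1 predictor recovers a coefficient close to -0.9, i.e., the next sample is predicted as 0.9 times the previous one.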
2.1 Existing System The existing system manages the details of trains, stations, timetables, seat availability and fares. The purpose of the system is to reduce the manual work of managing trains. It shows information and descriptions of the trains and their timetables, and it also monitors the information and transactions of seat availability. It generates reports on trains, stations and routes. The existing system may be ideal for those who know English or those who can use the system correctly; hence, an enhancement is required in this scenario.
2.2 Proposed System The proposed system can be used by customers or passengers in a railway station for an enquiry. The system works with speech recognition alongside manual entry. Anyone can enquire about the required details by giving voice input in their own mother tongue. It helps people enquire about train details for the listed questionnaire only: trains identified by name with a reference station location, and their real-time availability for required destinations. It also gives the platform number for all train arrivals together with the train names, and the ticket fare for different destinations.
3 Methodology The implemented application is a simple application with speech-to-text functionality. The system was developed using a Python framework named Flask (Fig. 1). The Main Module is used by the passengers at the railway station, and the second module is the admin module, where all the backend implementation for the main module is done. In the main module, the passengers enquire about the trains, their status or the fare details. The questionnaire can help people enquire about the details of
Fig. 1 Developed system use-case
trains identified by name with a reference station location, and their real-time availability for required destinations. It also gives the platform number for all train arrivals together with the train names, and the ticket fare for different destinations. All of this is done by giving voice input to the developed system. The Second Module is the backend implementation for the main module using Flask, which contains various libraries to implement speech recognition; the speech synthesis is done through LPC. The project uses the Python libraries SpeechRecognition and pydub for the implementation and speech synthesis. These packages are used to recognize the speech input from the microphone, transcribe it into text, save the audio data to an audio file and show extended recognition results. They are used in the backend implementation for text-to-speech and speech-to-text conversion.
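Once an utterance has been transcribed, the backend must decide which of the four enquiry modules it belongs to. A minimal sketch of such routing is shown below; the keyword lists and function name are hypothetical illustrations, not taken from the project's code:

```python
# Hypothetical keyword-based intent router for the four enquiry modules.
# Dict insertion order (Python 3.7+) fixes the matching priority.
INTENT_KEYWORDS = {
    "location": ["location", "where", "running"],
    "availability": ["available", "availability", "seat"],
    "platform": ["platform"],
    "fare": ["fare", "price", "cost"],
}

def route_enquiry(transcript):
    """Map transcribed speech to one of the four enquiry modules."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "unknown"
```

In the real system, the recognized text would come from the SpeechRecognition library and the selected intent would drive the corresponding database query.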
3.1 Location Module In this module, the system asks for the train name through either voice or form data. The voice is recognized, synthesized and passed to the database for data retrieval. Based on the input given and the data present in the database, the system processes the input and checks for the train's location with respect to the system time (Fig. 2).
Fig. 2 Location module
3.2 Availability Module In the availability module, the system asks for the train name through either voice or form data. The system checks for the data in the database and asks for the station the user wants to visit. If the train stops at the station provided, the system reports whether tickets are available or not (Fig. 3).
3.3 Platform Module In the platform module, the system synthesizes the train name and asks for the station where the user wants to board the train. If the train stops at the station provided, the system shows the platform number at which the train stops (Fig. 4).
3.4 Ticket Fare Module In the ticket fare module, the system first asks for the train name through voice or form, and then asks the user for the origin and destination, i.e., from where to where the user wants to travel. Upon getting the data, the system checks the fare of a single ticket, asks for the total number of tickets the user would like for the journey, and finally calculates the total fare (Fig. 5).
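The fare computation just described (per-ticket fare times passenger count, with the six-tickets-per-user limit noted later in Sect. 4.5) can be sketched as follows; the fare table and station names are hypothetical stand-ins for the system's database lookup:

```python
MAX_PASSENGERS = 6  # the system accepts at most 6 tickets per user

# Hypothetical per-ticket fares keyed by (origin, destination);
# the real system retrieves these from its database.
FARES = {
    ("Rajam", "Srikakulam"): 45.0,
}

def total_fare(origin, destination, passengers):
    """Total fare for a journey, enforcing the per-user ticket limit."""
    if passengers < 1 or passengers > MAX_PASSENGERS:
        raise ValueError("number of passengers must be between 1 and 6")
    per_ticket = FARES.get((origin, destination))
    if per_ticket is None:
        raise KeyError("no fare on record for this journey")
    return per_ticket * passengers
```

In the application, the origin, destination and passenger count would arrive as recognized speech before this calculation runs.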
Fig. 3 Availability module
Fig. 4 Platform module
4 Results and Discussion As discussed, there are four modules, also referred to as the questionnaire. The results of each module are discussed below.
Fig. 5 Ticket fare module
4.1 Home Page The Home Page contains four buttons, one for each module; each button leads to the corresponding module discussed above (Fig. 6). Every module is developed in such a way that the system speaks in both Telugu and English.
4.2 Location Module Initially, the module presents a text field that takes voice as input and converts it into text form; it takes the train name as input (Fig. 7). Upon submission, the system checks the live location of the train with respect to the current system time and retrieves it (Fig. 8).
Fig. 6 Home page
Fig. 7 Location module in real working app
Fig. 8 Location module output
Fig. 9 Availability module in real working app
Beyond the visual output, the system is built on fully voice-based technology: the output is also spoken aloud by the system.
4.3 Availability Module This module states whether tickets are available at a particular station for the given train. Initially, the user gives voice input of the train name; the system checks that the train exists and asks for the station name (Fig. 9). Upon receiving the station name in speech form, the system converts it into text format and checks the number of vacancies at that particular station for the given train (Fig. 10).
4.4 Platform Module This module states the platform at which the given train stops for the given station. It checks whether the mentioned train stops at the given station or not; if it does, the system looks up the platform number at which the train stops (Fig. 11). This is also a complete voice module: the station name is given through speech and later converted into text format. The system finds the platform number for the station and gives the output through voice (Fig. 12).
Fig. 10 Availability module output
Fig. 11 Platform module in real working app
Fig. 12 Platform module output for arrival
If the train does not stop at the given station, the system gives an error output saying that the train does not visit the given station (Fig. 13).
Fig. 13 Platform module output for departure
4.5 Ticket Fare Module In this module, the user first gives the name of the train they want to board (Fig. 14). If the given train exists, the system asks for the station where the user wants to board the train (Fig. 15). If the train stops at the station provided, the entered station is saved and the system asks for the station where the user wants to get off the train (Fig. 16). If the train also stops at that station, it is saved and the system asks for the number of passengers likely to board the train (Fig. 17). The system accepts only 6 passengers per user; if the user enters more than 6, it raises a message regarding the issue. The final output of this module is shown in Fig. 18.
Fig. 14 Ticket fare module in real working app
Fig. 15 Ticket fare from
Fig. 16 Ticket fare to
Fig. 17 Ticket fare passengers
Fig. 18 Ticket fare final
5 Conclusion Speech recognition technologies such as Alexa, Cortana, Google Assistant and Siri are changing the way people interact with their devices, homes, cars and jobs. The technology allows us to talk to a computer or device that interprets what we are saying and responds to our question or command. The use of digital assistants has
moved quickly from our mobile phones to our homes, and its application in industries such as business, banking, marketing, travel and healthcare is quickly becoming apparent. This project showed that a speech recognition system can be developed especially for people with mobility, speech or visual impairments, and for senior citizens who may have trouble using current technologies. The system, developed for railway enquiry, can be deployed in every possible railway station to help people, and it has been implemented successfully. The system can take the user's mother tongue as input and carry out the whole process in that language.
5.1 Future Scope This system helps passengers get information more easily. It is a collection of static and dynamic data; in this project only static data was handled, and in future we will work on dynamic data for a larger number of trains, based on the original database. More features and services can be added to the system for passenger satisfaction.
References 1. H. Shanthi, R.G. Pasumarthi, P. Suneel Kumar, Estimation of speech parameters using linear predictive coding (LPC). J. Compos. Theor. 8(7) (2020) 2. S. Benk, Y. Elmir, A. Dennai, A study on automatic speech recognition. J. Inf. Technol. 10(3), 77–85 (2019) 3. F.M. Mukhneri, I. Wijayanto, S. Hadiyoso, Voice conversion for dubbing using linear predictive coding and hidden markov model. J. Southwest Jiaotong Univ. 55(5) (2020) 4. D.A. Moses, M.K. Leonard, J.G. Makin, E.F. Chang, Real-time decoding of question-and-answer speech dialogue using human cortical activity. J. Nat. Commun (2019) 5. Y. Ning, S. He, Z. Wu, C. Xing, L.-J. Zhang, A review of deep learning based speech synthesis. J. Appl. Sci. (2019) 6. J.I. Jacob, P.E. Darney, Design of deep learning algorithm for IoT application by Image based Recognition. J. ISMAC 3(3), 276–290 (2021) 7. P.B. Patil, Linear predictive codes for speech recognition system at 121bps. Int. J. Hybrid Inf. Technol. 13(1) (2020) 8. J.I.Z. Chen, L.-T. Yeh, Graphene based web framework for energy efficient IoT applications. J. Inf. Technol. 3(01), 18–28 (2021) 9. G. Ranganathan, A study to find facts behind preprocessing on deep learning algorithms. J. Innov. Image Proc. (JIIP) 3(01), 66–74 (2021)
Segmentation and Classification Approach to Improve Breast Cancer Screening Simone Singh, Sudaksh Puri, and Anupama Bhan
Abstract The most common type of cancer among women is breast cancer. Early diagnosis is pivotal in the treatment process, and a radiology decision support system in the diagnostic process permits faster and more accurate radiographic assessment. The aim of this work is to improve the detection result and obtain more accurate outcomes. Segmentation, pre-processing, feature extraction, classification of the feature table and other significant computations were performed. Recent studies have established a deep underlying connection between mammographic parenchymal patterns and breast cancer risk; notwithstanding, there is a lack of freely available data and software for rigorous evaluation and clinical validation. This paper presents an open and versatile implementation of a fully automated system for mammographic image detection of breast cancer. The methodology employs mammographic image analysis in four stages: breast segmentation, detection of regions of interest, feature extraction and risk scoring. It is tested on a set of 305 full-field digital mammography images corresponding to 84 patients (51 cases and 49 controls) from the Breast Cancer Digital Repository (BCDR). The results achieve an AUC of 0.847 for cancer within the breast. Furthermore, used together with generally accepted risk factors such as patient age and breast density, mammographic image analysis with this methodology shows a statistically significant improvement in performance, with an AUC of 0.867 (p < 0.001). The proposed framework will be made openly available, and new methods are easy to incorporate. The Dice Index calculated in most of the analysed cases was greater than 92%. Techniques such as SVM, region growing, ST-GLCM and GLCM were used to make the model produce better results.
S. Singh · S. Puri · A. Bhan (B) Department of Electronics and Communication Engineering, Amity University, Uttar Pradesh, Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_43
1 Introduction Breast cancer is a disease characterized by uncontrolled growth of cells in the breast. A breast is made up of lobules, ducts, and connective tissue; the connective tissue (made up of fibrous and fatty tissue) surrounds and holds everything together. Breast cancer usually begins in the ducts or lobules and can spread to other areas through blood vessels and lymphatic vessels; breast cancer that spreads to other parts of the body is termed metastatic. It is the most common cancer among women in the world, aside from skin cancers. An average woman's lifetime risk of developing breast cancer is 13%, which means she has a one-in-eight chance of developing it. Among women, breast cancer is the second leading cause of cancer death; only lung cancer kills more women every year. The risk of dying from breast cancer is approximately one in 39 (around 2.6%). The death rate from breast cancer has been stable in women under 50 since 2007, but has consistently decreased in older women; from 2013 to 2018, the death rate decreased by 1% each year. A tumour is an abnormal growth of cells that serves no purpose. The proposed model classifies the tumour as either benign or malignant. Its major goal is to help detect early indicators of breast cancer, which can help reduce the rate of women's mortality and greatly raise their chances of survival by providing individualised, effective and efficient therapy. The current work is centred in this setting. Figure 1 depicts the suggested system, which can be broken down into the following steps: pre-processing and segmentation are the first two.
1.1 Objectives This project mainly focuses on three objectives. Work has been carried out under the umbrella of the following objectives.
Fig. 1 a Original image. b Pre-processed image after gaussian filtering
• Segmentation of breast cancer to check the presence of a tumour. • To validate the segmentation accuracy against the gold standard. • Classification as benign or malignant.
1.1.1 Segmentation
The phrase “image segmentation”, or simply “segmentation”, in computer vision refers to the division of an image into groups of pixels depending on some criterion. A picture is fed into a segmentation algorithm, which produces a collection of regions (or segments) that can be represented as: • A grouping of outlines. • A mask (grayscale or colour), with a unique grayscale value or colour assigned to each segment to identify it. In the region growing technique of image segmentation, when a region can no longer grow, the algorithm selects a new seed pixel that may or may not belong to any existing region. When a region accepts too many pixels, it can take up most of the image; to prevent such an error, region growing algorithms grow multiple regions at once. For images with a great deal of noise, region growing algorithms should be used rather than threshold methods, because the noise makes it challenging to find edges. In this project, mammograms have been utilized.
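The region growing procedure just described can be sketched as a simple breadth-first flood from a seed pixel. This is an illustrative numpy/Python version; the 4-connectivity and fixed intensity tolerance are assumptions, not the paper's exact settings:

```python
from collections import deque

import numpy as np

def region_grow(image, seed, tol):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    intensity lies within `tol` of the seed intensity."""
    h, w = image.shape
    seed_val = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(float(image[ny, nx]) - seed_val) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```

Growing several such regions from different seeds, as the text notes, prevents one region from swallowing the whole image.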
1.1.2 Validate the Segmentation Accuracy with the Gold Standard
After tumour detection is done with the help of image segmentation, the result is compared with the gold standard. The hand tracing of a region of interest by experts (radiologists) is the gold standard for segmenting medical images. The quality of a segmentation is inversely proportional to its deviation from the expert tracing. This comparison is carried out to assess the accuracy of the segmentation process; afterwards, the project proceeds to classifying the tumour as benign or malignant.
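The agreement with the expert tracing is scored here with the Dice Index mentioned in the abstract. A minimal sketch of that overlap measure, for two boolean masks:

```python
import numpy as np

def dice_index(seg, gold):
    """Dice similarity coefficient between a segmentation mask and the
    expert-traced gold standard (both interpretable as boolean arrays)."""
    seg = np.asarray(seg, dtype=bool)
    gold = np.asarray(gold, dtype=bool)
    intersection = np.logical_and(seg, gold).sum()
    total = seg.sum() + gold.sum()
    if total == 0:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * intersection / total
```

A value above 0.92, as reported for most analysed cases, means the automatic mask and the hand tracing overlap almost entirely.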
1.1.3 Purpose of Classification
The goal of classification is to find the most effective treatment. For a given type of breast cancer, the efficacy of a treatment is established (ordinarily by randomized controlled trials); that therapy might not work for a different type of breast cancer. Some breast cancers are severe and hazardous, requiring intensive treatments with significant side effects, while other types are less aggressive and can be treated with less invasive procedures such as lumpectomy.
2 Literature Review The literature review for this work was done by digging deep into the work assembled by radiologists and by engineers who worked with different models to improve the accuracy of the segmentation process and of the classification. Along with that, many other papers and journals were read to understand and assimilate the ideal methodology for the purpose of this project. The literature review is, at its best, a comparison of different research efforts on detecting breast cancer tissue.
2.1 Existing Systems The first paper reviewed was by Bhogal et al. [1]. Their study of breast diagnosis approaches reveals that deep learning has aided in improving the diagnostic accuracy of breast CAD systems; however, clinical usability of such methods remains difficult and additional research is needed. The literature presented there seeks to aid the development of a robust, computational CAD system to help doctors in the early detection of breast cancer. The second paper reviewed was by Rashmi et al. [2]. For image denoising, they utilised a mean filter. The study investigates image pre-processing techniques such as global equalisation transformation, denoising, binarization, breast orientation determination and pectoral muscle suppression to obtain more accurate breast segmentation before mass detection. In the third paper reviewed, Dheeba et al. [3] used Particle Swarm Optimized Wavelet Neural Networks (PSOWNN) to develop a classification system for detecting breast abnormalities in digital mammograms. The proposed abnormality detection system is based on extracting Laws texture energy measures from mammograms and using a pattern classifier to classify suspicious regions. Rampun et al. [4] suggested two automated methods for detecting benign and malignant tumour types in mammograms. In the first technique, segmentation is performed using an automated region growing algorithm whose threshold is decided by a trained artificial neural network (ANN); in the second technique, segmentation is performed using a cellular neural network (CNN) whose parameters are chosen by a genetic algorithm (GA). Krithiga and Geetha [5] used morphological pre-processing and seeded region growing (SRG) in their suggested technique to remove digitization noise, suppress radiopaque artefacts and isolate the background region, and to remove the pectoral muscle from the breast profile in order to emphasise the breast profile region.
They used a three-by-three neighbourhood connection with a two-dimensional (2D) median filtering strategy for noise removal, and a thresholding method for artefact suppression and background separation. Digital mammograms from two different sources are examined using Ground Truth (GT) images
for evaluation of performance characteristics, to demonstrate the capabilities of the suggested approach. The experimental results show that the extracted breast sections accurately match the corresponding GT images. Elmoufidi et al. [6] presented a fully automated breast segmentation for mammographic images. This algorithm's key contribution is that it uses a combination of thresholding and morphological pre-processing to separate the background region from the breast profile and to eliminate and label radiopaque artefacts in the background region. The segmentation algorithm is extensively tested using all mammographic images from the MIAS database to demonstrate its validity. There are 322 images in the MIAS database with high-intensity rectangular labels, and bright scanning artefacts were discovered in the bulk of the database images.
2.2 Software Used MATLAB® combines a desktop environment tuned for iterative analysis and design processes with a programming language that expresses matrix and array mathematics directly. It includes the Live Editor for creating scripts that combine code, output and formatted text in an executable notebook.
3 Methodology and Theory The following diagram shows the workflow for this minor project.
Fig. 2 Pre-processing stage and flowchart
3.1 Collection of Dataset Digital Database for Screening Mammography: the DDSM is a database of 2620 scanned film mammography studies. It contains normal, benign and malignant cases with confirmed pathology data. The size of the database, along with ground truth validation, makes the DDSM a useful tool (Fig. 2).
3.2 Gaussian Filtering The digital mammograms are then preprocessed using Gaussian filters for noise removal.
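Gaussian pre-filtering of this kind can be sketched with a numpy-only separable convolution. This is an illustrative stand-in for the project's MATLAB filtering; the kernel radius of three sigma and edge-replicating borders are assumptions:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel (radius defaults to 3*sigma)."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing of a 2-D image with edge replication."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    # Filter along rows, then along columns (separability of the Gaussian)
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)
```

Because the kernel is normalized, flat regions of the mammogram keep their intensity while pixel-level noise is averaged away.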
3.3 Optimized Region Growing Region growing is a pixel-based segmentation approach in which similarity constraints such as intensity, texture and so on are considered for grouping the pixels into regions (Fig. 3).
3.3.1 Methodology Post Pre-processing (Flow Chart)
Fig. 3 a Ground truth result. b Result with optimised region growing technique post pre-processing
3.3.2 Approaches and Methods Used Post Pre-processing
1. Gray Level Co-occurrence Matrix: the Gray Level Co-occurrence Matrix (GLCM) method is used for extracting four statistical texture parameters, i.e., entropy, inverse difference moment, angular second moment and correlation.
2. Statistical Gray Level Co-occurrence: ST-GLCM merges seven statistical features (mean, standard deviation, smoothness, skewness, entropy, energy, kurtosis) with the seven texture features extracted from the GLCM algorithm. ST-GLCM features are extracted from 3 × 3 sub-images of the ROI. Calculating the number of GLCM and ST-GLCM features:
Number of features (GLCM) = number of directions × 0.5 × d × number of extracted GLCM features × number of sub-images
Number of features (ST-GLCM) = number of features (GLCM) + 7 statistical features × number of sub-images
3. Neighbourhood Gray-Tone Difference: higher-order parameters were calculated using neighbourhood gray-tone difference matrices (NGTDM) to describe local features, including the image standard deviation, skewness, kurtosis and balance.
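The GLCM-based texture features above can be illustrated with a small, self-contained numpy sketch (not the project's MATLAB code). It builds a normalized co-occurrence matrix for one displacement and computes three of the four named parameters (angular second moment, entropy, inverse difference moment):

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Gray-level co-occurrence matrix for one pixel displacement,
    normalized into a joint probability distribution."""
    dy, dx = offset
    h, w = image.shape
    m = np.zeros((levels, levels), dtype=float)
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m / m.sum()

def glcm_features(p):
    """Angular second moment, entropy and inverse difference moment
    of a normalized co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    asm = float((p ** 2).sum())
    nz = p[p > 0]
    entropy = float(-(nz * np.log2(nz)).sum())
    idm = float((p / (1.0 + (i - j) ** 2)).sum())
    return {"asm": asm, "entropy": entropy, "idm": idm}
```

In the ST-GLCM scheme described above, such features would be computed per 3 × 3 sub-image of the ROI and concatenated with the seven first-order statistics.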
Fig. 4 Breast segmentation and nipple detection. a Breast segmentation. b Nipple detection
4 Experimental Observations 4.1 Breast Segmentation Breast segmentation comprises two main steps: foreground/background detection and chest wall detection (only for MLO views), shown in Fig. 4a. In MLO views, it is possible to perform automatic nipple detection from the information of the breast contour and the chest wall line (see Fig. 4b).
4.2 Anatomical Mapping Anatomical mapping of the breast is important for gathering insights such as implicit image registration and ROI detection. Here, anatomical mapping, namely st-mapping, is implemented. The following figures illustrate the forward and inverse st-mapping of a mammography image (Fig. 5a, b).
Fig. 5 Anatomical breast mapping. a St-mapping. b St-map of a. c Inverse st-mapping of b
Fig. 6 ROI-detection. From left to right: full breast, largest square, RA region and lattice-based sampling
4.3 ROI-Detection Following different approaches that exist, the methodology adopted allows for the detection of four different ROIs within the breast: the full breast region, the largest circumscribed square, the retro areolar region and multiple lattice-based sampling. In the figures presented below, it is assumed that the variables im, mask, contour and cwall have been computed previously. Obtained results are shown in Fig. 6.
4.4 Risk Assessment For reproducibility, the system incorporates tools for building a risk model mdl based on the texture features computed previously. To this end, consider a training set of m images, from which a feature matrix is generated. The feature extraction methods for breast cancer risk assessment are listed below. The acronym of each method is initialized according to its working principle: s (statistical features), c (co-occurrence features), r (run-length features), g (gradient-based features) and f (spatial-frequency analysis).

Acronym   Feature                      References
smin      Minimum gray-level value     [4–6]
smax      Maximum gray-level value     [4–6]
savg      Mean gray-level value        [4–8, 10]
sran      Gray-level range             [11]
svar      Gray-level variance          [11, 12]
sent      Entropy                      —
sske      Skewness                     [2, 4–6, 8, 10–13]
skur      Kurtosis                     [11–13]
sp05      5th percentile               [4–6]
sp30      30th percentile              [4–6, 8, 10]
sp70      70th percentile              [4–6, 8, 10]
sp90      90th percentile              [4–6]
sba1      Balance 1                    [4–6]
sba2      Balance 2                    [4–6, 8, 10]
cene      Energy                       [10, 12, 13]
ccor      Correlation                  [12, 13]
ccon      Contrast                     [10–13]
chom      Homogeneity                  [11–13]
cent      Entropy                      [10–13]
rrln      Run-length non-uniformity    [13]
rgln      Gray-level non-uniformity    [13]
rlre      Long run emphasis            [13]
rsre      Short run emphasis           [13]
rrpe      Run percentage               [13]
rhgr      High gray-level run          [13]
rlgr      Low gray-level run           [13]
gene      Gradient energy              [1]
gvar      Gradient variance            [1]
glap      Modified Laplacian           [14]
fwas      Wavelet sum                  [3, 14]
fwav      Wavelet variance             [14]
fwar      Wavelet ratio                [14]
fdim      Fractal dimension            [8]
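As a hedged illustration of the s* (statistical) group in the table above, the following pure-Python sketch computes a few of the listed features from a flat list of gray levels (the function name and the nearest-rank percentile rule are choices of this sketch, not of the paper):

```python
import math

def statistical_features(pixels):
    """Compute a few of the s* (statistical) features from a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    ranked = sorted(pixels)
    def percentile(q):            # simple nearest-rank rule
        return ranked[min(n - 1, int(q / 100 * n))]
    return {
        "smin": min(pixels), "smax": max(pixels), "savg": mean,
        "sran": max(pixels) - min(pixels), "svar": var,
        "sske": skew, "sp05": percentile(5), "sp90": percentile(90),
    }

feats = statistical_features([10, 20, 30, 40, 200])
```

A bright outlier (200) pulls the skewness positive, which is exactly the kind of asymmetry the sske feature is meant to capture.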
Fig. 7 Risk scoring. This figure shows the visualization of the risk score for an input image with r = 0.88. The distributions of the risk scores of the low- and high-risk training images are shown in green and red, respectively
5 The Confusion Matrix for Quadratic SVM and Cubic SVM with ROC Plots The project was carried out in MATLAB, which was used for the major stages, such as segmentation, pre-processing, feature extraction and classification, and for performing the various required computations. The Dice index
Fig. 7 Risk score graph
Fig. 8 Predictions model (fine tree)
computed in the majority of the analyzed cases was greater than 92%. Techniques such as the Dragonfly Algorithm, SVM, region growing, ST-GLCM, and GLCM were used to improve the model's results. The desired results were obtained, and the numerical values of the true positive rate, false positive rate, true negative rate, prevalence and precision were obtained for both the quadratic and cubic SVM (Figs. 8, 9, 10, 11 and 12).
S. Singh et al.
Fig. 9 Predictions model (quadratic SVM)
Fig. 10 Predictions model (cubic SVM)
Type of      Accuracy  Error     True positive  False positive  True negative  Precision  Prevalence  Matthews correlation
SVM kernel   (%)       rate (%)  rate (%)       rate (%)        rate (%)       (%)        (%)         coefficient
Quadratic    97.7      2.28      97.51          1.93            98.08          98.88      63.62       0.952
Cubic        97.7      2.28      97.51          1.93            98.06          98.87      63.62       0.95
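All of the table's metrics derive from the four raw confusion-matrix counts, which are not given in this section. The sketch below therefore only encodes the standard definitions, illustrated with hypothetical counts:

```python
import math

def svm_metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion-matrix counts (as fractions, not %)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    tpr = tp / (tp + fn)                  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)                  # false positive rate
    tnr = tn / (fp + tn)                  # true negative rate (specificity)
    precision = tp / (tp + fp)
    prevalence = (tp + fn) / total
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "error_rate": error_rate, "tpr": tpr,
            "fpr": fpr, "tnr": tnr, "precision": precision,
            "prevalence": prevalence, "mcc": mcc}

m = svm_metrics(tp=90, fp=10, tn=90, fn=10)   # hypothetical counts
```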
Fig. 11 ROC curve (quadratic SVM)
Fig. 12 ROC curve (cubic SVM)
6 Conclusion and Future Work This paper introduced an open computational framework and protocol for breast cancer assessment based on the fully automated, computerized analysis of mammography images. The framework incorporates four principal stages:
breast segmentation, region of interest (ROI) detection, feature extraction, and risk scoring. The experiment yielded an AUC of 0.847 when the analysis was performed in the largest circumscribed square region within the breast. Further experimentation improved performance from AUC = 0.665 to AUC = 0.867 by employing both clinical factors and computerized texture analysis (p < 0.001). The confusion matrices for the fine tree model and the quadratic and cubic SVMs were also obtained to visualize the results, which enabled the plotting of the ROC curves to obtain the AUC. For the cubic and quadratic SVMs the AUC = 1.00, and for the fine tree model it was 0.99. Future work would extend the feature extraction to additional data such as histopathological images, which are simpler and more affordable to obtain than mammography images, which are acquired at the risk of exposing the body to high radiation. In addition, future work would apply deep learning algorithms to dive deeper into the feature extraction process and to match, if not surpass, the accuracy achieved by the image segmentation approach. A further goal of the project is to make the breast cancer detection process accessible to a layperson at reasonable cost and to provide real-time results with precision and accuracy endorsed by specialists. The implementation of a deep learning model will improve the efficiency of obtaining results quickly, with the desired accuracy, satisfying clinical standards.
References 1. R.K. Bhogal, P.D. Suchit, C. Naresh, Review: breast cancer detection using deep learning, in 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI) (2021), pp. 847–854. https://doi.org/10.1109/ICOEI51242.2021.9452835 2. R. Rashmi, K. Prasad, C.B.K. Udupa, V. Shwetha, A comparative evaluation of texture features for semantic segmentation of breast histopathological images. IEEE Access 8, 64331–64346 (2020) 3. J. Dheeba, N. Albert Singh, S. Tamil Selvi, Computer-aided detection of breast cancer on mammograms: a swarm intelligence optimized wavelet neural network approach. J. Biomed. Inf. (2020). https://doi.org/10.1016/j.jbi.2014.01.010 4. A. Rampun, K. López-Linares, P.J. Morrow, B.W. Scotney, H. Wang, I.G. Ocaña, G. Maclair, R. Zwiggelaar, M.A.G. Ballester, I. Macía, Breast pectoral muscle segmentation in mammograms using a modified holistically-nested edge detection network. Med. Image Anal. 57, 1–17 (2019) 5. R. Krithiga, P. Geetha, Breast cancer detection segmentation and classification on histopathology images analysis: a systematic review, in Archives of Computational Methods in Engineering (2020), pp. 1–13 6. A. Elmoufidi, K. El Fahssi, S. Jai-Andaloussi, A. Sekkaki, G. Quellec, M. Lamard, G. Cazuguel (2020) 7. H. Shen, et al., Deep active learning for breast cancer segmentation on immunohistochemistry images, in Medical Image Computing and Computer Assisted Intervention—MICCAI 2020. Lecture Notes in Computer Science, vol 12265, ed. by A.L. Martel (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-59722-1_49 8. P. Raha, R.V. Menon, I. Chakrabarti, Fully automated computer aided diagnosis system for classification of breast mass from ultrasound images, in 2017 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET) (2017)
9. A.A. Jothi, V.M.A. Rajam, A survey on automated cancer diagnosis from histopathology images. Artif. Intell. Rev. 48(1), 31–81 (2017) 10. A. AlQoud, M.A. Jaffar, Hybrid gabor based local binary patterns texture features for classification of breast mammograms. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 16(4), 16 (2016) 11. A. Moh'd Rasoul Al-Hadidi, Y. Mohammed Al-Gawagzeh, B. Alsaaidah, Solving mammography problems of breast cancer detection using artificial neural networks and image processing techniques, vol 5, pp. 2520–2528 (2012) 12. L. Lee, S. Liew, Breast ultrasound automated ROI segmentation with region growing, in 2015 4th International Conference on Software Engineering and Computer Systems (ICSECS) (2015) 13. H. Shah, Automatic classification of breast masses for diagnosis of breast cancer in digital mammograms using neural network. Int. J. Sci. Technol. Eng. 1(11), 47–52 14. K. Yu, L. Tan, L. Lin, X. Cheng, Z. Yi, T. Sato, et al., Deep-learning-empowered breast cancer auxiliary diagnosis for 5GB remote e-health. IEEE (2021)
Study of Impact and Reflected Waves in Computer Echolocation Oleksandr Khoshaba, Viktor Grechaninov, Tetiana Molodetska, Anatoliy Lopushanskyi, and Kostiantyn Zavertailo
Abstract The paper presents wave models of trajectories and signals for loading effects in computer echolocation. Based on these models and their parameters, experimental studies were carried out using computer echolocation in distributed structures, and conclusions were drawn from the wave models. It is shown that several markers, present in both the time and frequency regions of the reflected wave spectrum, indicate an increase in the influence of the load (wave impact) on the dynamics of changes in the structural and functional features of the object of study in distributed structures. The time domain of the reflected wave spectrum for the response time or modified response time shows the structural features of the object of study, that is, its location relative to the subject of study (wave generator). The frequency domain of the spectrum of the reflected wave shows the dynamics of changes in the functional features (the state) of the object of study. The paper describes these markers and their characteristic features in detail.
1 Introduction Echolocation came to us from bionics, where scientists observed useful phenomena in wildlife and transferred them to the inanimate world. Echolocation is a specific ability to perceive the surrounding world, which began to be studied in bats and some aquatic mammals [20]. There are also many works devoted to studying echolocation in humans. Thus, according to Kolarik et al. [14] and Kupers et al. [15], echolocation in humans is the ability to use one's sounds and their echo-reverberations to build a mental image of the surrounding space. The sound can be generated by clicking your mouth, clapping, footsteps, or hitting a cane on the floor. Using self-generated O. Khoshaba (B) · V. Grechaninov · A. Lopushanskyi · K. Zavertailo Institute of Mathematical Machines and Systems Problems of the Ukraine National Academy of Science, 42 Academician Glushkov Avenue, Kyiv 03187, Ukraine e-mail: [email protected] URL: http://khoshaba.vk.vntu.edu.ua/ T. Molodetska Vinnytsia National Technical University, 95 Khmelnytske shose, Vinnytsia 21021, Ukraine © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_45
sounds and echoes stimulates the visual cortex in humans [14, 21], leading to a form of sensory substitution in which visual input is replaced by auditory input. There is also much current research in which echolocation is used to develop technologies for people with visual impairment (PVI). In this area, much attention is paid to the development of tools for navigation [1, 8, 11, 12] and to mobility training [24]. Some works are devoted to creating, transmitting, and visualizing graphic information [2, 4, 18]. However, less attention is paid to people's leisure activities such as creativity [9], travel [22], dating [13], sporting events or physical activity [6, 10, 17], and composing music [5]. Unfortunately, theoretical developments and the practice of using computer echolocation find their application only in virtual environments and computer games, and even there the possibilities of computer echolocation are not fully exploited. Consider the main directions in the use of computer echolocation. The most popular are studies in which virtual experiences occur outside the physical world and include virtual environments (VE) and video games. There is overlap between VE and video games: a video game is usually contained in a VE, and a VE may or may not include gameplay [3]. In this regard, the total number of publications describing virtual experiences intended for people with visual impairment is growing. Thus, in publications [7, 18, 19], the terms virtual environment and video game are used interchangeably. At the same time, a sensory replacement of a person's visual image is performed with the help of mental maps on which objects are located and their properties are determined. No less relevant are works on signal analysis based on machine learning [16], from which several important markers can be obtained.
For example, work [23] allows, based on ECG signals, the use of neural networks, a fusion-technique architecture, and other methods to recognize such tasks as myocardial infarction and AF detection. Thus, the work of existing systems is mainly aimed at solving virtual problems in PVI, VE, and video games. In contrast to these works, the proposed methods solve significant engineering problems. It should also be noted that the relevant studies using computer echolocation establish the location of objects and determine their properties.
2 Purpose of Work The study aims to consider wave models for the trajectories of loading effects in computer echolocation. Based on these models and their parameters, experimental studies are conducted using computer echolocation in distributed structures, and conclusions are formed from the results obtained with the wave models.
3 Basic Concepts of Computer Echolocation Computer echolocation is a method of generating a load impact wave and analyzing the delays of reflected signals (Fig. 1) at different frequencies to detect the structural and functional features of the operation of research objects in computer distributed structures. The use of computer echolocation allows a qualitatively higher level of information to be obtained about the structural and functional features of objects in distributed systems. The structural features of the object of study consist of determining its localization with respect to the installed echo sounder (the generator and the sensor receiving the reflected signal). The functional features show the load stage on the object of study during exposure to impact (load) waves (Fig. 1). The importance of echolocation research is increasing because the infrastructure of companies can consist of heterogeneous software and hardware systems and devices. In this regard, it becomes possible to diagnose violations in the operation of research objects, which can manifest themselves at both earlier and later stages of the operation of corporate software and hardware systems. At the same time, computer echolocation can diagnose structural and functional disturbances in distributed systems through various estimates in the amplitude and frequency spectra of signals obtained as a result of load effects on the objects of study. As a result, computer echolocation makes it possible to identify abnormal areas of software and hardware functioning in distributed structures. Computer echolocation studies the processes of load impact on the object of study and data processing using utilities that send requests via the ICMP, TCP, or UDP protocols (TCP/IP stack) to a given node or service in a distributed structure. The echo sounders generate signals and record the time between sending a request and receiving its response (response time, RT).
This time is measured in nanoseconds or milliseconds, depending on the process under study. It allows round-trip time delays to be determined along the data transmission channels and the request processing route. For the current request in the communication channel, let the time of its sending be tib and the time of receiving its response be tie. Then the response time (RTi) for the current request in the communication channel is:

RTi = tie − tib   (1)
Also, the concept of a request period (PT) is introduced, which determines the time interval for request generation (or creation). In the case of a uniform law of random variables for the creation of n requests over some time T, PTi will be equal to:
Fig. 1 Fundamentals of computer echolocation
Fig. 2 Demonstration of the process of creating (generation) and executing queries
PTi = T/n   (2)
The process of creating (generating) and executing queries is shown in Fig. 2. During some time (T), n queries are created. On the time interval from t2 to t4, there are two requests in the system: the current one and the previous one, which has not yet completed its execution. Therefore, each request has its own execution time (RTi), and the RTi estimates cover the execution of all n requests over the time T. To increase the information content of the response time (RT), a modified RT (mRT) is used, which is calculated by the formula:

mRT = k · lg(RTi / PTi)   (3)
where k is a scaling factor, usually equal to 10. For k = 10, we have the following relations:

RT/PT = 10^(0.1·mRT)   (4)

RT = PT · 10^(0.1·mRT)   (5)

The increased information content of mRT provides additional information about the workload state of the object of study: under a low workload, the mRT score has a negative sign; otherwise, a positive sign.
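Formulas (1)–(5) are easy to check numerically. The following plain-Python sketch (with k = 10) computes mRT and illustrates the sign behavior described above:

```python
import math

def modified_rt(rt, pt, k=10):
    """mRT = k * lg(RT / PT), Eq. (3)."""
    return k * math.log10(rt / pt)

def rt_from_mrt(mrt, pt):
    """Inverse relation RT = PT * 10**(0.1 * mRT), Eq. (5), for k = 10."""
    return pt * 10 ** (0.1 * mrt)

low = modified_rt(rt=0.5, pt=1.0)    # request finishes within its period -> negative
high = modified_rt(rt=2.0, pt=1.0)   # requests pile up -> positive
```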
3.1 Features of Deterministic Models of the Load Impact on the Object of Study Models of load impact and data processing describe the processes occurring in distributed structures. Computer echolocation can describe such processes using deterministic (Fig. 3) and probabilistic models. These categories differ in their nature of functioning and in the features of applying specific methods.
Fig. 3 Classification of deterministic models for load impact and data processing in distributed structures
Deterministic models directly describe the processes of loading influences. The model of the loading impact on the object of study is deterministic (or non-random) when its exact behavior can be predicted over any time. The deterministic model of the process of load impact on the object of study and data processing (MLI) has the following form:

MLI = F(t, z, ω, …, A, B, C, …)   (6)

where t, z, ω, … are independent arguments (time, spatial coordinate, frequency, etc.), and A, B, C, … are parameters of the deterministic model of the loading and data processing process.
3.2 Wave Models of Computer Echolocation Wave models form the basis of computer echolocation. At the same time, essential aspects in the study of computer echolocation are the construction of impact models
and reflected wave signals from the object of study. Trajectories and signals describe the impact and reflected waves (Table 1). The wave representation of the model as a signal is described as:

Tr_LI(t) = B + A sin(2π f0 t + ϕ) = B + A sin(ω0 t + ϕ)   (7)

or

Tr_LI(t) = B + A cos(ω0 t + φ)   (8)

where
B — the constant component (in units of measurement);
A — the signal amplitude (in units of measurement);
f0 — the cyclic frequency (in hertz);
ω0 = 2π f0 — the angular frequency (in radians);
ϕ and φ — the initial phase angles (in radians).

In this case, the period of one oscillation is:

T = 1/f0 = 2π/ω0   (9)

Formula (9) also shows the relationship between the cyclic and angular frequency (Fig. 4). Note that for φ = ϕ − π/2 the sine and cosine forms describe the same signal.
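As a numerical illustration of formulas (7) and (9), the following Python sketch samples the loading-action signal using the parameters of signal A from the experimental section (B = 30, A = 20); the 2 Hz frequency is an assumed value, since the experiments' common frequency is not stated numerically:

```python
import math

def tr_li(t, b, a, f0, phase=0.0):
    """Loading-action signal of Eq. (7): Tr_LI(t) = B + A*sin(2*pi*f0*t + phase)."""
    return b + a * math.sin(2 * math.pi * f0 * t + phase)

f0 = 2.0                      # assumed cyclic frequency in Hz
T = 1 / f0                    # period of one oscillation, Eq. (9)
# Signal A from the experiments: constant component B = 30, amplitude A = 20.
samples = [tr_li(n * T / 8, b=30, a=20, f0=f0) for n in range(8)]
```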
Table 1 Wave representations of models for loading actions

Wave representation          Level            Parameters
In the form of signals       Lower, physical  Constant component, signal amplitude, cyclic and angular frequencies, initial phase angle
In the form of trajectories  Upper, abstract  Quantity, speed and acceleration
Fig. 4 Relationship between cyclic frequency and angular displacement
3.3 Wave Models for the Trajectory of Loading Effects In wave models of the trajectories of load impacts on the object of study, such indicators as quantity (the number of processed requests over some time t), speed (v), and acceleration (a) of request processing are used, determined as follows (Fig. 5). The request-processing-rate data are approximated by a linear function or an n-th degree polynomial using the least-squares method, yielding the quantity (number of processed requests) and acceleration functions. For example, if the approximation is a linear relationship, the coefficients a and b are chosen so that the sum of the squared deviations of the given data from the fitted line is smallest; the problem reduces to finding the extremum of a function of two variables. As a result of applying the least-squares method, we obtain a polynomial of the n-th degree that describes the speed (v) of the request-exchange process between the subject and the object of study. To obtain the quantity indicator (q), we integrate this polynomial:

q = ∫_{t1}^{t2} v(t) dt   (10)

Fig. 5 An example of defining the function of the best fit to given points
where t1 is the beginning and t2 the end of the period. Differentiating the speed (v) of the request-exchange process between the subject and the object of research gives the rate of change of this speed, i.e., the acceleration indicator (a):

a = v′   (11)
Estimates of the trajectory indicators of the load impact (10, 11) are determined when analyzing the subject area or designing the echolocation work, where the following actions are performed. First, the points of load actions are plotted on the coordinate plane (Fig. 5). These points correspond to the goals and objectives identified during the analysis of the subject area, and their values correspond to varying degrees of load effects on the object of study. The degree of load impact on the object of study is determined using preliminary studies. The abscissa axis (Fig. 5) corresponds to time, and the ordinate axis to the number of requests the generator needs to create in a certain period. Figure 5 also shows the process of selecting a function for the trajectory of loading actions: labels y correspond to the given values of the points, and labels y4 and y5 to polynomials of the fourth and fifth degree. In this case, the fifth degree gives the best approximation to the given points. Thus, using the least-squares method, the trajectory of loading effects is selected to determine its estimates (10, 11).
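The quantity and acceleration indicators (10) and (11) follow mechanically from the fitted polynomial's coefficients. A minimal Python sketch (the example coefficients for v(t) are hypothetical, not from the paper):

```python
def integrate_poly(coeffs, t1, t2):
    """Definite integral of v(t) = c0 + c1*t + ... over [t1, t2]: quantity q, Eq. (10)."""
    def antiderivative(t):
        return sum(c * t ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))
    return antiderivative(t2) - antiderivative(t1)

def differentiate_poly(coeffs):
    """Coefficients of a(t) = v'(t), the acceleration indicator of Eq. (11)."""
    return [c * i for i, c in enumerate(coeffs)][1:]

v = [5.0, 2.0]                     # hypothetical fit: v(t) = 5 + 2t requests/s
q = integrate_poly(v, 0.0, 10.0)   # requests processed over [0, 10] s
a = differentiate_poly(v)          # constant acceleration: [2.0]
```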
3.4 Discrete Fourier Transform Fourier series provide an alternative way of representing data: instead of representing the signal amplitude as a function of time, the signal is represented by how much information is contained at different frequencies. Fourier analysis is essential in data acquisition and isolates specific frequency ranges. A Fourier series takes a signal and decomposes it into a sum of sines and cosines of different frequencies. Assume that we have a signal that lasts for 1 s, where 0 < t < 1; we conjecture that we can represent that signal by the infinite series:

f(t) = a0 + Σ_{n=1}^{∞} (an sin(2πnt) + bn cos(2πnt))   (12)
where f(t) is the signal in the time domain, and an and bn are unknown series coefficients. The integer n has units of hertz (Hz) = 1/s and corresponds to the wave's frequency. In computer echolocation, the discrete Fourier transform is used to determine the structural and functional features of the object of study in distributed structures, as follows. The Fourier transform converts a signal in the time domain to the frequency domain (spectrum); the inverse Fourier transform converts the frequency-domain components back into the original time-domain signal. For the continuous-time Fourier transform, we have:

F(jω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt   (13)

and

f(t) = (1/2π) ∫_{−∞}^{+∞} F(jω) e^{jωt} dω   (14)

For the discrete-time Fourier transform (DTFT), we have:

X(e^{jω}) = Σ_{n=−∞}^{+∞} x[n] e^{−jωn}   (15)

and

x[n] = (1/2π) ∫_{2π} X(e^{jω}) e^{jωn} dω   (16)
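For a finite number of samples, formula (15) reduces to the discrete Fourier transform used in the experiments. A naive Python sketch (in practice an FFT library routine would be used):

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N), the discrete form of Eq. (15)."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]

# A cosine completing one cycle over 8 samples concentrates its energy
# in bins k = 1 and k = N - 1 of the spectrum.
x = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = [abs(v) for v in dft(x)]
```

This spectrum is what the marker analysis below inspects: a load effect at a given frequency shows up as an increased magnitude in the corresponding bin.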
3.5 Limitation of Existing Methods in Computer Echolocation The methods discussed above, which are used in computer echolocation, are quite effective for determining structural and functional disorders in distributed structures. However, there are currently some limitations associated with using these methods. First of all, this is due to the technology of computer echolocation itself: standard utilities like ping or nping can interact only with certain research objects. Such restrictions do not apply to hosts in distributed structures or services that support the standard ping or nping utilities. The
solution to this limitation for other objects of study (for example, a processor, hard disk, etc.) is to write separate software. The next, no less critical, limitation in computer echolocation is the analysis of the reflected wave using the Fourier method: the reflected wave may be interpreted incorrectly when the impact wave was created with different frequencies in separate time sections. A solution to this limitation is to use another mathematical apparatus in the analysis, for example, wavelet analysis.
4 Experimental Studies The study of structural and functional features in a distributed structure was the goal of experimental studies. Experimental studies were carried out in a distributed structure (Fig. 6), which consisted of several local networks. At the same time, the data storage system was the object of study (Fig. 6, object, campus A, upper part of the figure). The object of study was subjected to a load effect from different places of the distributed structure. The loading effect was of the same frequency but different amplitude. The wave representation of the load signal is shown in the Fig. 7. For the lower level (Table 1) carried out such a wave action according to formula 7, where the ratio parameters were as follows: for signal A: B = 30; A = 20; for signal B: B = 60; A = 50; for signal C: B = 130; A = 120; As can be seen from these parameters, the most significant load effect was carried out by signal C. The subject of the study was a generator that, based on the trajectory, created signals of a load effect. This subject of the study (wave generator) was located on campuses A, B, and D (Fig. 6). In experimental studies, we received reflected waves that differed in amplitude and, in some cases, the additional frequency for RT (1) and m RT (3) indicators. For example, for the m RT (3) index and signals A, B, and C for case A (Fig. 6), the following graphs of reflected waves were obtained (Fig. 8). Such results of loading effects made it possible to obtain the structural and functional features of the work of the object of study, which was as follows. Several markers show the increase in the load impact on the dynamics of changes in the structural and functional features of the object of study in distributed structures. Let’s look at these markers in order. First of all, these markers are located in the time and frequency domains of the reflected wave spectrum. The time domain of the reflected wave spectrum for m RT (Fig. 
9) shows the structural features of the object of study, that is, its location relative to the subject of the study (wave generator). Thus, a decrease in the amplitude of the signal in this spectrum indicates the approach of the subject of research to the object. An increase
Fig. 6 Distributed structure for echolocation research
in the load effect on the object of study in this time domain also shows a decrease in the signal amplitude. In echolocation research, sign areas in the location of signal amplitudes also have their meaning. So, by the appearance of amplitudes in the region of positive values, it is necessary to judge the increase in load effects on the object of study. The frequency domain of the reflected wave spectrum for m RT (Fig. 10) shows the dynamics of changes in the functional features of the object of study, which is as follows. First of all, it is necessary to pay attention to the appearance of additional frequencies and changes in their amplitudes. Thus, an increase in the amplitude in frequencies indicates an increase in the load effects (impact wave) on the object of study.
Fig. 7 Wave representations of load action signals
Fig. 8 Graphs of reflected waves for the m RT (3) index
Fig. 9 The time domain of the reflected wave spectrum for m RT
Fig. 10 The frequency domain of the reflected wave spectrum for m RT
Also of great importance is the relationship between the amplitude at zero frequency and the amplitudes at all other frequencies in the spectrum of the reflected wave. An increase in the zero-frequency amplitude corresponds to an increased load on the object of study.
5 Conclusions The paper proposes the basics of computer echolocation in distributed structures, based on the generation of load waves and the analysis of delays of reflected signals of different frequencies to detect structural and functional disorders. A classification of models of loading and data processing based on deterministic and probabilistic research methods is shown. Examples of the use of computer echolocation in distributed structures are described, along with a comparative characterization of the levels of subjects of echolocation research. Several markers show the increase in the load impact on the dynamics of changes in the structural and functional features of the object of study in distributed structures. These markers are located in the time and frequency domains of the reflected wave spectrum. The time domain of the reflected wave spectrum for the response time or modified response time shows the structural features of the object of study, that is, its location relative to the subject of study (wave generator). The frequency domain of the spectrum of the reflected wave shows the dynamics of changes in the functional features (the state) of the object of study.
Enneaontology: A Proposed Enneagram Ontology Esraa Abdelhamid , Sally Ismail , and Mostafa Aref
Abstract Automated personality detection attracts many researchers nowadays. Recognizing personality aids many domains; social media platforms, for example, use it to engage users. However, the models commonly used in personality detection lack a deep understanding of personality. The Enneagram is a personality model that captures motivations, desires, and fears. Psychiatrists use the Enneagram because it gives an intense understanding of a patient's personality, which helps them provide psychological support and offers tools that facilitate the patient's recovery. The Enneagram can also be used in education and in dating applications. Assessment tests are currently used to identify a person's Enneagram type, whereas automated personality detection is less time-consuming and requires no effort from the subject. The lack of a knowledge representation for the Enneagram has made it difficult to build personality detection systems. This paper presents an ontology for the Enneagram personality model. The ontology consists of seven classes: Enneagram, feature, fear, desire, key motivation, problem and best. There are also three instances: reformer, helper and achiever. Enneaontology is built according to the design principles of METHONTOLOGY.
1 Introduction

The importance of providing health care with supporting tools became apparent during COVID-19. Psychological support for patients plays a crucial role in medicine, and fast identification of personality means patients receive targeted support in less time. Some patients may not be able to complete an assessment test, which gives automated detection an extra advantage; people are generally not eager to fill in a questionnaire, as it is impractical and time-consuming [1]. Researchers in many domains are concerned with personality detection, including psychology, artificial intelligence, natural language processing, behavioral analytics and machine learning [2]. Automated personality detection is a hot topic that attracts many researchers nowadays, and it draws attention in various fields such as social media, robotics, human-computer interaction and speech processing [3]. Personality detection is a hard and complex task for humans, as it requires expert knowledge. One of the roadblocks to building automated personality detection is the lack of a knowledge representation. Current research is directed towards improvements such as applying more suitable algorithms and preprocessing techniques and implementing other personality models [4].

Personality comprises the behavioral, temperamental, emotional and mental traits that identify a person's uniqueness [5]. There are many personality models, such as the Big Five model, the Myers-Briggs Type Indicator (MBTI) and the three-factor model; these models measure certain qualities such as introversion versus extroversion or thinking versus feeling. The Enneagram is a personality model consisting of nine personalities: it maps the nine basic personality types of human nature and their compound mutual relations [6]. The Enneagram personalities are the reformer, the helper, the achiever, the individualist, the investigator, the loyalist, the enthusiast, the challenger and the peacemaker. The Enneagram is studied at many USA universities in medicine, psychology, education, arts and business [7]. Its advantage is that it captures human behavior at a deep level: a person can learn from the Enneagram his points of strength and his limitations, and can target his growth [8]. The Enneagram helps to identify the desires, fears, motivations, problems and the best of a personality. It is an excellent theory for human development and a significant tool for enhancing relationships with family, friends and coworkers [9]. The Enneagram helps a counselor detect client behavior, and this knowledge promotes healing and growth [10]. Psychiatrists have used the Enneagram since 1970 as a way to understand and identify a patient's personality [11].

E. Abdelhamid (B) · S. Ismail · M. Aref
Faculty of Computer and Information Science, Ainshams University, Cairo, Egypt
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_46
This knowledge helps them give the patient appropriate psychological support. Other physicians can also use the Enneagram to support their patients, which means faster recovery. The Enneagram can be used in education to support students and enhance their performance, and it can be applied in dating applications to match personalities.

An ontology links the concepts, relations and rules of a specific area and abstractly describes domain knowledge. An ontology consists of concepts, relations, functions, instances and axioms [12]. Ontologies have many advantages: sharing a common understanding of knowledge between different applications, reusing domain knowledge, providing analysis of domain knowledge, separating domain knowledge from application logic, and helping to clarify and map domain knowledge [13]. This allows machines to reason over the knowledge of a particular domain [14], which opens the door to many applications and fields. A good ontology aids in developing semantic and knowledge management applications [15]. METHONTOLOGY is an orderly method used to create ontologies according to standards [16].

Personality prediction is a recent research topic. Most research does not use ontologies; machine learning is far more widely used, though some systems do use an ontology as a knowledge representation. An ontology for text classification based on a three-factor
personality model is described. The proposed method detects personality using linguistic feature analysis based on personality ontologies, supervised machine learning techniques, and questionnaire-based personality data [17]. A personality measurement system has also been presented in which social media text was classified into the Big Five personality traits; the knowledge was represented by an ontology model and implemented as a dictionary system, with five algorithms applied alongside the ontology: radix tree conversion, database saving, CSV processing, sentence processing and trait estimation [18, 19]. A job applicant system employed a personality measurement based on an ontology model using Twitter text, analyzing user texts with an ontology model of the Big Five personality traits [20].

This paper is the first to model an ontology for the Enneagram personality model. It addresses the design of the Enneaontology, which is based on the METHONTOLOGY method, a set of guidelines that define how the ontology development process is applied. The phases of building the ontology are specification, knowledge acquisition, conceptualization, integration, implementation, evaluation and documentation. The Enneaontology design consists of seven classes: Enneagram, feature, fear, desire, key motivation, problem and best. It also contains three instances: reformer, helper and achiever. The design details of the ontology are demonstrated below.
2 Building

The development of the Enneaontology follows the METHONTOLOGY methodology, whose main phases are specification, knowledge acquisition, conceptualization, integration, implementation, evaluation and documentation. The questions that arose while developing the ontology were:

• What are the main concepts in the Enneagram?
• How can these concepts be grouped together?
• What are the relations between these concepts?
• How can these relations be expressed as object properties?
• Which entities are selected to detect personality?
The main purpose of the Enneaontology is to enable automated personality detection for the Enneagram, and the aim is to provide a generic model of it. The Enneaontology uses a semi-formal natural-language format in order to represent the meaning of concepts in an abstract way. The scope covers the basic characteristics of each personality. The main source of knowledge is the official Enneagram Institute's description of each personality [21]; informal and formal analysis of that text is used as the knowledge acquisition method, and the conceptualization, consisting of classes and object relations, is derived from this knowledge. The main concepts and their descriptions are discussed in the next section. There is no integration with other ontologies: the Enneaontology does not depend on any existing ontology. The Enneaontology is implemented in RDF/XML format, which can be embedded in many programming languages such as Java and Python. The representation of the ontology is relational, which benefits clear visualization and easy maintenance. The implementation is done using Protege 5.5.0, which supports the required concepts, facilities and operators [22]. The evaluation of the Enneaontology is based on the METHONTOLOGY framework, under which the Enneaontology is evaluated as consistent, partially complete and correct: consistent because all concepts map the domain without contradiction, partially complete because the ontology covers the terms of the scope it is intended to handle, and correct because it maps the scope of the domain accurately. Information documents were collected from the knowledge sources during the knowledge acquisition phase, and the conceptual documentation, consisting of classes and object properties, follows the same guidelines for all concepts. This paper itself serves as documentation.
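To illustrate the RDF/XML implementation, the seven classes and six object properties can be generated with nothing more than the Python standard library. This is a minimal sketch, not the actual Protege project: the namespace URI is a placeholder and the property spellings are our own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ENNEA = "http://example.org/enneaontology#"  # placeholder namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)

root = ET.Element(f"{{{RDF}}}RDF")

# The seven Enneaontology classes
for name in ["Enneagram", "Feature", "Fear", "Desire",
             "Key_Motivation", "Problem", "Best"]:
    cls = ET.SubElement(root, f"{{{OWL}}}Class")
    cls.set(f"{{{RDF}}}about", ENNEA + name)

# The six object properties linking Enneagram to the other classes
for prop in ["hasFeature", "hasFear", "hasDesire",
             "hasKeyMotivation", "hasProblem", "hasBest"]:
    p = ET.SubElement(root, f"{{{OWL}}}ObjectProperty")
    p.set(f"{{{RDF}}}about", ENNEA + prop)

xml_text = ET.tostring(root, encoding="unicode")
```

The resulting `xml_text` string contains an rdf:RDF root with seven owl:Class declarations and six owl:ObjectProperty declarations, which any RDF-aware library can then load.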
3 Design

The Enneaontology consists of seven classes: the Enneagram, feature, fear, desire, key motivation, problem and best classes. The Enneagram class represents an Enneagram personality type. The feature class describes the characteristics of the personality. The fear class declares the personality's fears. The desire class represents the personality's wishes and goals. The key motivation class explains the drives behind the personality's behavior. The problem class illustrates the issues and troubles of the personality. The best class states the superior characteristics a personality shows when the person uses his or her qualities efficiently. Every Enneagram personality has features, fears, desires, key motivations, problems and a best version of himself or herself, so the Enneagram class has connections with all of these classes, as shown in Fig. 1.
Fig. 1 Enneaontology class
Table 1 Enneagram relations

Domain      Object property      Range
Enneagram   has Feature          Feature
Enneagram   has Fear             Fear
Enneagram   has Desire           Desire
Enneagram   has Key Motivation   Key Motivation
Enneagram   has Problem          Problem
Enneagram   has Best             Best

Table 2 Classes and attributes

Class            Attribute               Type          Description
Feature          FName                   string 1..*   Features names
Fear             FName                   string 1..*   Fears names
Desire           DName                   string 1..*   Desires names
Key_Motivation   KMName                  string 1..*   Key motivations names
Problem          PName                   string 1..*   Problems names
Best             BName                   string 1..*   Best names
Enneagram        ENumber                 short [1]     Enneagram's number (1–9)
Enneagram        Feature.FName           string 1..*   Enneagram's features
Enneagram        Fear.FName              string 1..*   Enneagram's fears
Enneagram        Desire.DName            string 1..*   Enneagram's desires
Enneagram        Key_Motivation.KMName   string 1..*   Enneagram's key motivations
Enneagram        Problem.PName           string 1..*   Enneagram's problems
Enneagram        Best.BName              string 1..*   Enneagram's best
As shown in Table 1, the Enneagram class is connected by has Feature to the feature class, has Fear to the fear class, has Desire to the desire class, has Key Motivation to the key motivation class, has Problem to the problem class and has Best to the best class. The feature class has the attribute FName, which holds one to many strings describing feature names. The fear class likewise has an FName attribute representing one to many fear names. The desire class has the attribute DName for one to many desire names. The key motivation class has the attribute KMName for one to many key motivation names. The problem class has the attribute PName for one to many problem names. The best class has the attribute BName, carrying one to many best names, as shown in Table 2. There are nine Enneagram personality types, each an individual of the Enneagram class: reformer, helper, achiever, individualist, investigator, loyalist, enthusiast, challenger and peacemaker. The reformer is the first Enneagram type.
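The attribute scheme of Table 2 can be mirrored in code. The following Python sketch is our own mapping of the ontology's attributes, not the paper's implementation: string 1..* is rendered as a list of strings and short [1] as a single bounded integer.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EnneagramType:
    """One Enneagram individual with the attributes of Table 2."""
    e_number: int                                             # ENumber (1-9)
    features: List[str] = field(default_factory=list)         # Feature.FName
    fears: List[str] = field(default_factory=list)            # Fear.FName
    desires: List[str] = field(default_factory=list)          # Desire.DName
    key_motivations: List[str] = field(default_factory=list)  # Key_Motivation.KMName
    problems: List[str] = field(default_factory=list)         # Problem.PName
    best: List[str] = field(default_factory=list)             # Best.BName

    def __post_init__(self):
        # ENumber is restricted to the nine Enneagram types
        if not 1 <= self.e_number <= 9:
            raise ValueError("ENumber must be in 1..9")

reformer = EnneagramType(1, features=["rational", "idealistic"],
                         problems=["resentment", "impatience"])
```

The validation in `__post_init__` reflects the short [1] cardinality of ENumber; all other attributes remain open-ended lists, matching 1..*.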
Fig. 2 Reformer object
Fig. 3 Helper object
The reformer is an object of the Enneagram class with its own features, desires, key motivations, fears, problems and best. The reformer's features are rational, idealistic, principled, purposeful, self-controlled, perfectionistic, conscientious, ethical, teacher, crusader, organized, orderly and fastidious. The reformer's fears are to be corrupt, to be defective and to be evil. The reformer's desires are to be balanced and to have integrity. The reformer's key motivations are to be beyond criticism, to be consistent, to improve, to justify, to not be condemned, to be right and to strive. The reformer's problems are resentment and impatience. The reformer at its best is wise, discerning, realistic, noble and heroic, as shown in Fig. 2.

The helper is the second Enneagram type and an object of the Enneagram class with its own features, desires, key motivations, fears, problems and best. The helper's features are interpersonal, caring, generous, demonstrative, pleasing, possessive, empathetic, sincere, warm hearted, friendly, self-sacrificing, sentimental and flattering. The helper's fears are to be unwanted and to be unworthy. The helper's desire is to be loved. The helper's key motivations are to express, to be needed, to be appreciated and to vindicate. The helper's problems are possessiveness and self-denial. The helper at its best is unselfish and altruistic, as shown in Fig. 3.

The achiever is the third Enneagram type and an object of the Enneagram class with its own features, desires, key motivations, fears, problems and best. The achiever's features are pragmatic, adaptable, excelling, driven, self-assured, attractive, charming, ambitious, competent, energetic, diplomatic and poised. The achiever's fear is to be worthless. The achiever's desires are to be valuable, to be worthwhile, to have success, and to be image focused and status oriented. The achiever's key motivations are to be affirmed, to distinguish himself, to have attention, to be admired and to impress. The achiever's problems are workaholism and competitiveness. The achiever at its best shows self-acceptance, is authentic and is a role model, as shown in Fig. 4.

Fig. 4 Achiever object
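Taken together, the three individuals can be queried like a small knowledge base. The sketch below uses plain Python dictionaries populated from the descriptions above; the lookup helper is our own illustration of how a detection tool might traverse the has Fear relation, not part of the ontology itself.

```python
# Instance data for the three modeled individuals, abridged from the text.
TYPES = {
    "reformer": {
        "number": 1,
        "fears": ["to be corrupt", "to be defective", "to be evil"],
        "problems": ["resentment", "impatience"],
    },
    "helper": {
        "number": 2,
        "fears": ["to be unwanted", "to be unworthy"],
        "problems": ["possessiveness", "self-denial"],
    },
    "achiever": {
        "number": 3,
        "fears": ["to be worthless"],
        "problems": ["workaholism", "competitiveness"],
    },
}

def types_with_fear(fear):
    """Return the names of all individuals linked to this fear by has Fear."""
    return [name for name, data in TYPES.items() if fear in data["fears"]]
```

For example, `types_with_fear("to be unwanted")` returns `["helper"]`, the kind of reverse lookup a personality detection system would perform once textual evidence of a fear is found.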
4 Discussion

The Enneaontology presents part of the Enneagram knowledge. Every Enneagram personality has an Enneagram number, features, desires, key motivations, problems and a best, and these parameters shed light on the direction of the personality. This ontology can be used in personality prediction systems based on the Enneagram model, unlike the traditional way of determining the Enneagram type through an assessment test; people are not keen on performing such a test, as it is boring and time-consuming. Current personality prediction systems are based on different models such as the Big Five, the three-factor model and MBTI. The Enneagram has a great advantage over these models in that it describes personality in depth. The effectiveness of the Enneaontology cannot be measured on its own: the ontology has to be applied in a personality prediction system in order to measure the accuracy of the results.
5 Conclusion and Future Work

The Enneaontology has been proposed. It holds seven classes, namely the Enneagram, feature, fear, desire, key motivation, problem and best classes, with the Enneagram number, feature name, fear name, desire name, key motivation name, problem name and best name as their attributes. The Enneaontology also includes three objects: reformer, helper and achiever. The ontology is built according to the METHONTOLOGY design principles, following the sequence of specification, knowledge acquisition, conceptualization, integration, implementation, evaluation and documentation. It represents a generic model of the Enneagram personality and provides a starting point for researchers to develop further Enneagram ontologies or to build automated personality detection tools. As future work, the remaining Enneaontology objects will be completed, and an automated personality detection system will be built using the Enneaontology, together with a system that can report the state of a personality as healthy, average or unhealthy; unhealthy states of a personality can predict suicide attempts and proneness to specific disorders.
References

1. G. Farnadi, G. Sitaraman, S. Sushmita, F. Celli, M. Kosinski, D. Stillwell, S. Davalos, M.F. Moens, M. De Cock, Computational personality recognition in social media. User Model. User-Adap. Interact. 26(2), 109–142 (2016)
2. B. Agarwal, Personality detection from text: a review. Int. J. Comput. Syst. 1(1), 1–4 (2014)
3. A. Vinciarelli, G. Mohammadi, A survey of personality computing. IEEE Trans. Affect. Comput. 5(3), 273–291 (2014)
4. V. Ong, A.D.S. Rahmanto, W. Williem, D. Suhartono, Exploring personality prediction from text on social media: a literature review. Internetworking Indonesia 9(1), 65–70 (2017)
5. V. Kaushal, M. Patwardhan, Emerging trends in personality identification using online social networks—a literature survey. ACM Trans. Knowl. Discov. Data (TKDD) 12(2), 1–30 (2018)
6. D.R. Riso, R. Hudson, The Wisdom of the Enneagram: The Complete Guide to Psychological and Spiritual Growth for the Nine Personality Types (Bantam, 1999)
7. A. Demir, O. Rakhmanov, K. Tastan, S. Dane, Z. Akturk, Development and validation of the nile personality assessment tool based on enneagram. J. Res. Med. Dental Sci. 8(4), 24–32 (2020)
8. A.M. Bland, The enneagram: a review of the empirical and transformational literature. J. Humanistic Couns. Educ. Dev. 49(1), 16–31 (2010)
9. R. Baron, E. Wagele, The Enneagram Made Easy: Discover the 9 Types of People (Harper Collins, 2009)
10. M. Matise, The enneagram: an enhancement to family therapy. Contemp. Family Ther. 41(1), 68–78 (2019)
11. M. Alexander, B. Schnipke, The enneagram: a primer for psychiatry residents. Am. J. Psychiatry Residents' J. (2020)
12. A. Gómez-Pérez, M. Fernández-López, O. Corcho, Ontological Engineering: With Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web (Springer Science & Business Media, 2006)
13. N.F. Noy, D.L. McGuinness et al., Ontology development 101: a guide to creating your first ontology (2001)
14. T. Berners-Lee, J. Hendler, O. Lassila, The semantic web. Sci. Am. 284(5), 34–43 (2001)
15. J. Raad, C. Cruz, A survey on ontology evaluation methods, in Proceedings of the International Conference on Knowledge Engineering and Ontology Development, Part of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (2015)
16. M. Fernandez, A. Gomez-Perez, N. Juristo, Methontology: from ontological art towards ontological engineering, in Proceedings of the AAAI97 Spring Symposium Series on Ontological Engineering (Stanford, USA, 1997), pp. 33–40
17. D. Sewwandi, K. Perera, S. Sandaruwan, O. Lakchani, A. Nugaliyadde, S. Thelijjagoda, Linguistic features based personality recognition using social media data, in 2017 6th National Conference on Technology and Management (NCTM) (IEEE, 2017), pp. 63–68
18. A. Alamsyah, M.F. Rachman, C.S. Hudaya, R.P. Putra, A.I. Rifkyano, F. Nurwianti, A progress on the personality measurement model using ontology based on social media text, in 2019 International Conference on Information Management and Technology (ICIMTech), vol. 1 (IEEE, 2019), pp. 581–586
19. A. Alamsyah, S. Widiyanesti, R.D. Putra, P.K. Sari, Personality measurement design for ontology based platform using social media text. Adv. Sci. Technol. Eng. Syst. 5(3), 100–107 (2020)
20. Y.M.F. Geovanni, A. Alamsyah, N. Dudija et al., Identifying personality of the new job applicants using the ontology model on twitter data, in 2021 2nd International Conference on ICT for Rural Development (IC-ICTRuDev) (IEEE, 2021), pp. 1–5
21. Enneagram Official Institute. https://www.enneagraminstitute.com/
22. M. Horridge, H. Knublauch, A. Rector, R. Stevens, C. Wroe, A Practical Guide to Building OWL Ontologies Using the Protégé-OWL Plugin and CO-ODE Tools Edition 1.0 (University of Manchester, 2004)
IoT Based Signal Patrolling for Precision Vehicle Control K. Sridhar and R. Srinivasan
Abstract The Internet of Things (IoT) provides connectivity to devices located even in remote places. Data can be gathered from sensors, moving vehicles and other devices, then stored and accessed at a centralised hub with the help of IoT devices. In this paper, precision vehicle control is achieved by signal patrolling with the help of IoT. Signal patrolling is performed by a radar at the signal junction, while sensors and cameras in the car work in tandem to ensure that the traffic signal is respected, avoiding crashes and accidental deaths. When the signal is sent by the radar, it is received and interpreted by the sensors in the vehicle, and the cameras at the front and back of the vehicle work together to stop the vehicle. The engine control unit plays a major part in bringing the vehicle to a halt, and the on-board diagnostics device plays a major role in achieving this. Vehicles can also be controlled in blind spots to avoid over-speeding with the help of the sensors fitted in the vehicle. Using signal patrolling, signal jumping, accidental deaths, speeding, rash driving and similar problems can be controlled in urban and suburban areas.
1 Introduction

With economic development and population growth in cities, the vehicle count rises as well. The more vehicles there are, the higher the possibility of accidents, so safety measures and countermeasures have to be implemented at all levels; reckless driving and autonomous vehicles alike must be controlled. IoT, used as the medium for creating the smart city and its infrastructure, must ensure connectivity and an optimized network. Smart city software and hardware play a major role in shaping the city, where devices are used to optimize and automate everything. Since traffic congestion is a severe problem that worsens as cities grow, smart infrastructure is an important part of smart city initiatives. Autonomous vehicles and roadside units work in tandem to increase efficiency while consuming the least energy. Rapid modifications at traffic junctions, such as autonomous systems, will become frequent as autonomous vehicles become more prevalent. Traffic junctions control the traffic and maintain vehicle flow on the roads. Even in existing systems, changing the phase duration results in a compromise between the throughput of the intersection and the time it takes vehicles to pass through it. Despite making up only a small percentage of the road network, traffic crossroads are responsible for a significant amount of traffic accidents and delay; as a result, traffic engineers are constantly concerned with the safe and efficient management of intersections. Fixed lights have therefore given way to dynamic traffic signal lights, which manage the increased traffic flow using a variety of communication, processing and sensor technologies. Devices are deployed for smart infrastructure and traffic control at signal junctions, but often without proper use of the resulting data to plan against congestion; making use of the real-time situation on the spot avoids delays in decision-making.

The aim of this project is to prevent the loss of human and animal lives using signal patrolling with the help of software and hardware. The scope of the project is to reduce the number of road accidents caused by rash driving, signal jumping and uncontrolled vehicles, which lead to the loss of human and animal lives; uncontrolled vehicles need to be restrained in such a manner that they do not harm others on the road. The Internet of Things is the tool used to achieve this: through IoT, real-time validations and decisions are made, and sensors, radars and vehicles are interconnected, with data passed from the radars to the vehicle sensors in real time over IoT devices and connections.

K. Sridhar · R. Srinivasan (B)
Department of Computer Science and Engineering, Vel Tech Rangarajan Dr Sagunthala R&D Institute of Science and Technology, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_47
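The radar-to-vehicle data described above could be carried as a small structured message. The sketch below shows one possible JSON payload and its interpretation on the vehicle side; the field names are hypothetical illustrations, not a standard or the paper's actual protocol.

```python
import json
import time

def make_signal_message(junction_id, state, speed_limit_kmh):
    """Hypothetical payload a junction radar might broadcast to nearby vehicles."""
    return json.dumps({
        "junction_id": junction_id,
        "signal_state": state,          # "RED", "AMBER" or "GREEN"
        "speed_limit_kmh": speed_limit_kmh,
        "timestamp": time.time(),
    })

def vehicle_must_stop(message):
    """Vehicle-side interpretation: stop when the junction reports RED."""
    return json.loads(message)["signal_state"] == "RED"
```

A vehicle receiving `make_signal_message("J1", "RED", 50)` would decide to stop, while a GREEN payload leaves it free to proceed.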
2 Literature Survey

Sarrab et al. [1] suggested advancing the concept of smart cities. The infrastructure of a smart city is filled with smart devices that continuously produce huge amounts of data in multiple places, which is subsequently processed to give the infrastructure its intelligence. AI/ML was applied to improve the socioeconomic processes of society. Smart traffic management encompasses intelligent transportation systems with integrated radars and sensors, highway management and roadside units.

Liyanage et al. [2] proposed that MEC offers computational resources close to IoT devices, allowing for major benefits such as low-cost computation offloading. Edge computing services are available to a wide range of devices through the Internet of Things, allowing for further adoption and advancement of MEC. Furthermore, the cloud infrastructure was decoupled.
Santa et al. [3] implemented a basic on-board unit in moving vehicles to connect them to the IoT; data were collected efficiently over an integrated network of Low-Power Wide Area Network, NB-IoT and LoRaWAN technologies.

Simsek et al. [4] proposed that autonomous vehicles be used as mobile testing facilities to identify infected cases early. Data were acquired through a wireless communication setup, with 5G, Wireless Fidelity and LTE used for connectivity.

Krasniqi et al. [5] proposed a unit designed for low-power, small-size devices with a focus on mobility, which can be extended to traffic management in smart city infrastructure. Such systems collect real-time traffic data and take the necessary steps to control and alleviate any problems that develop as a result of traffic congestion.

Seif et al. [6] pointed out that self-driving automobiles are rapidly approaching commercialization. Enhanced Driver Assistance Systems (EDAS) in the latest luxury cars now allow for a safe and reliable self-driving experience on highways. A lidar that scans the environment around the car provides this functionality; the information is supplied to actuators for steering, braking and accelerating, enabling automatic driving in less complex situations. This paved the way for the first step in the development of self-driving cars.

Kassu et al. [7] illustrated that sensor/actuator devices in today's vehicles are commonly connected using a cabled network technology known as CAN (Controller Area Network), and that the number of traffic fatalities in urban areas is 34% higher.

Yousefnezhad et al. [8] examined the product life cycle of IoT devices. IoT devices were planned, designed and developed, but security was mostly ignored on the design and development side, which left the devices vulnerable to hackers.
The main problem arose when the devices were manufactured by different manufacturers, who all had access to the devices' hardware and manufacturing details, and common default passwords were usually used during development.

Ramapriya et al. [9] observed that the presence of traditional sirens and hooters on emergency vehicles waiting in traffic lanes makes them easily identifiable; ambulances and fire trucks typically have these sounding devices installed on their roofs. The lane in which an ambulance approaches was detected by a sound sensor positioned along the side of the road, connected in a circuit with a microprocessor programmed to detect its authorised spectrum of frequencies. When an ambulance or fire truck sounded its siren, the relevant traffic light in that lane turned green until the vehicle passed through the intersection.

Philip et al. [10] illustrated that the number of cables required grows in tandem with the number of sensors on cars, resulting in increasing weight and maintenance expenses, and sensor placement is also limited by the connections. The complexity of the audio/visual data used in Enhanced Driver Assistance Systems (EDAS) and entertainment is increasing. As a result, additional hardware
was required in addition to software; devices such as sensors were necessary to complete functions equivalent to those of an AI. Audio/video sensors produce data such as vehicle speed, gas levels, tyre pressure and geo-tagging in real time, and the total data created per year by 100,000 cars exceeds 100 terabytes.

Nurzaman et al. [11] noted that each framework module is a collection of many networks with multi-hop communication, so the end-to-end delay of the dense network topology was the key concern. The most common sources of delay are transmission delays, which lead to slow processing; hop count, since the time taken is higher across multiple hops; and propagation delay, defined by the Euclidean distance between wireless sites. Technologies such as NB-IoT were created for long-range connections, but slow processing capabilities make them unsuitable for large-scale data transfer and fog computation. The WiLD network, on the other hand, may be able to accommodate both; WiLD is a network that connects rural areas with Internet access sites.

Anandraj et al. [12] provided a model for autonomously spotting the location of an accident or vehicle breakdown, together with ambient information, as soon as possible using a communication module incorporated in the system. The developed framework had no geographical or temporal constraints. The Python programming language was utilised to access and handle the detected data for notification services. A Raspberry Pi 3 model, together with a MEMS sensor and a vibration sensor, was used to create the prototype, and sensor data such as position and an ambient image were transmitted to a rescue or authorised email address. The experimental study covered a variety of metrics to demonstrate its efficiency and accuracy in comparison to other real-time methodologies.

Karami et al. [13] proposed smart transportation planning using data, models and algorithms.
Multiple models for making transportation smart were discussed, and connected vehicles were also included in the research article. Prediction techniques including Kalman filtering, random walk, deep learning, KNN and ARIMA were covered along with their drawbacks, and forecasting models for business adoption were presented. Khayyam et al. [14] proposed combining the Internet of Things and artificial intelligence for autonomous vehicles. Machine learning tools gathered large amounts of information from real-world subjects such as smart cities and connected vehicles. Data gathered from these sources were used for decision making and analysed for new developments. AI and IoT were used to assist autonomous vehicles. Yao et al. [15] explained the challenges and opportunities of the IoT in terms of security and privacy issues. The physical security of the IoT was discussed along with protective technologies. Once devices reached the end of their life span, IoT devices were discarded, but not according to the prescribed terms, which led to data leakage, access-control and other environmental issues.
IoT Based Signal Patrolling for Precision Vehicle Control
3 Proposed Algorithm
The proposed system consists of radars, sensors, connected vehicles and an on-board diagnostic device. These devices work in tandem and communicate with each other to control the vehicle. This is achieved using the following steps.
1. Radar/sensor implementation on the signals.
2. Vehicle On-Board Diagnostic (OBD) sensor implementation.
Precision vehicle control is achieved by signal patrolling with the help of IoT. Signal patrolling is performed by a radar at the signal junction. Sensors and cameras in the car work in tandem to ensure that the signal light is obeyed, avoiding crashes and accidental deaths. When signals are sent by the radar, they are received by the sensors in the vehicles waiting at or passing through the signal. The received signals are interpreted by the OBD sensors in the vehicle, which control the engine to stop the vehicle if required.
3.1 Signal Patrolling: Radar/Sensor Implementation on the Signals
The radar should be installed at roadside signal junctions to send data to the vehicles. The radar sends the traffic light state to the vehicles whenever the traffic signal changes. Here, the radar sends the information (data) it received from the admin, or the pre-programmed information available at the signal junction, to the information sensor present in the car. The radar broadcasts the data repeatedly so that all devices can receive it at different intervals, because passing the data only once is not enough: the signal changes every second, and the updated data needs to be sent to the sensor in order to control the vehicle. Data passing happens every millisecond. If the radar needs to be configured for enhancement or to change the signal settings, this can be done using the connected grid or the satellite connection that is already available.
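The radar-side broadcast step described above can be sketched in Python. The frame layout, field names and phase names below are illustrative assumptions, not part of the proposed system:

```python
import itertools
import json
import time

# Sketch of the radar-side broadcast loop (Sect. 3.1). The JSON frame layout,
# field names and phase names are illustrative assumptions.
SIGNAL_PHASES = ("RED", "YELLOW", "GREEN")

def make_frame(phase, junction_id, seq):
    """Encode one broadcast frame for the current signal phase."""
    if phase not in SIGNAL_PHASES:
        raise ValueError("unknown phase: %s" % phase)
    return json.dumps({"junction": junction_id, "phase": phase,
                       "seq": seq, "ts_ms": int(time.time() * 1000)}).encode()

def broadcast(phases, junction_id="J-01", repeats=3):
    """Re-send each phase several times so that vehicles arriving at different
    moments still pick up the current state (the repeated passing of data)."""
    seq = itertools.count()
    return [make_frame(p, junction_id, next(seq))
            for p in phases for _ in range(repeats)]

frames = broadcast(["RED", "GREEN"])
print(len(frames))  # 2 phases x 3 repeats = 6 frames
```

Re-broadcasting with a sequence number lets a receiver discard stale frames, which is one simple way to realize the "data passing happens every millisecond" behaviour.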
3.2 Signal Patrolling: Vehicle On-Board Diagnostic (OBD) Sensor Implementation
Data from the radar are received to check the signal status and to control the vehicle. The signals (data) passed from the radar are received by the sensors, or the On-Board Diagnostic (OBD) sensor, in the car, which basically has access to almost all devices in the car. The Electronic Control Unit (ECU) is also controlled through the OBD. By tapping the OBD unit, access to the ECU to control the car engine can be obtained.
Fig. 1 Architecture of signal patrolling
Once the data are passed to the ECU, the sensors and cameras around the car are activated to check for proximity while controlling the vehicle, in order to avoid a collision. The proximity sensor and camera pass their input to the ECU to control the vehicle if needed. All of these work in tandem to ensure that accidents do not occur, because signal skipping is prevented. Likewise, if the signal changes again, the data are passed from the radar to the vehicles' OBD devices, which process the data for the received signal. The vehicle comes to a halt depending on the data received from the radar. Once the traffic signal changes, the radar again passes the signal information to the vehicles on the road, and this happens over and over (Fig. 1).
3.3 Pseudocode for Proposed Model
Step 1: Radars should be set up at the signal junctions.
Step 2: Set up the radar with the centralized hub or respective authorizer.
Step 3: Create the application programming interface for the service and access.
Step 4: Initiate the connection with the radar present at the signal junction.
Step 5: The radar is configured to transmit the signal to the vehicles present on the road when the vehicles come within range of the radar.
• The On-Board Diagnostics device sensor receives the signal from the radar, and once the signal is received it is processed by the OBD sensor.
• If the signal junction stretch is long, multiple sensors are placed at certain distances to ensure the transmitted signal reaches vehicles at a farther distance or on curved roads.
– The OBD sensor interprets the data and controls the Engine Control Unit depending on the signal it interprets.
– The cameras and sensors on the sides of the car are activated for safer and more effective braking to avoid collision with nearby vehicles.
Step 6: When the traffic signal changes, the changed signal is again transmitted to the vehicles.
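The vehicle-side part of the steps above (the OBD sensor mapping a received frame to an ECU action) can be sketched as follows. The command names and frame layout are hypothetical; a real implementation would act through the vehicle's CAN bus rather than returning strings:

```python
# Hypothetical mapping from a received radar frame to an ECU command.
# Unknown phases fall through to a stop, as a fail-safe default.

def obd_interpret(frame):
    phase = frame.get("phase")
    if phase == "GREEN":
        return "ENGINE_NORMAL"   # no intervention
    if phase == "YELLOW":
        return "ENGINE_SLOW"     # proximity sensors/cameras guide braking
    return "ENGINE_STOP"         # RED (or anything unexpected): halt

# Step 6: a phase change is simply the next frame in the stream.
stream = [{"phase": "GREEN"}, {"phase": "RED"}, {"phase": "GREEN"}]
print([obd_interpret(f) for f in stream])
```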
4 Results and Discussion
As seen in Fig. 2, the device located at the traffic signal sends out the signal to the vehicles on the road; the vehicle comes in contact with the signal from the radar, and the received signal is processed by the OBD device, which sends the information to the engine control unit. Signals change from time to time, and when the process starts, it is repeated all over again. In this research, it was found that the device is secure as it is connected to the wired network, and all the sensors are connected directly and linked to each other, which prevents tampering by external factors. Figure 3 depicts the performance analysis of the existing research, where the values show the availability, the device up-time as a percentage of working time, and other features.
Fig. 2 Result/expected output
Fig. 3 Performance analysis of the existing research
Fig. 4 Performance analysis of the current research
Figure 4 depicts the performance analysis of the proposed research model, which shows the analysis of different aspects and metrics of IoT-based signal patrolling for precision vehicle control. When IoT devices are used for implementation, the main device-related concern is the device up-time. Since most of the IoT devices connect to a Raspberry Pi, their start-up time is almost the same. Device security concerns connecting and transferring data from source to destination. In previous research, wireless devices were used for connecting and transferring data, since this was suitable for those requirements. In signal patrolling, however, data integrity and security are of the utmost importance, as the system controls vehicles in real time; hence wired connectivity is used and preferred, which moreover prevents tampering. As it is connected through the wired connection, the speed is constant throughout the connection. Existing models gathered data using radar and sensors in real time from the radar/sensors placed at the roadside; those models identified congestion, the time when congestion might occur, predicted accidents using the gathered data, etc. In the proposed model, the radar/sensor is used to control the vehicle in real time at signal junctions, which gives it an edge in controlling accidents and rash driving. Vehicles cannot speed in or around the signal junctions or skip the signal, because whenever a vehicle approaches the traffic signal, it is slowed down depending on the signal received by the OBD device in the car, which controls the engine. The range of signal patrolling is 70 m by default, but this can be increased by adding extra sensors along the signal length. The response time of the OBD sensor is 18 ms, i.e., once the information is received, it is processed and the information is sent to the engine control unit.
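A back-of-envelope check (not from the paper) of whether the quoted 70 m range and 18 ms OBD response time leave enough room to stop: the assumed 7 m/s² braking deceleration is a typical dry-road figure, not a measured value of the system.

```python
def stopping_distance(speed_kmh, response_s=0.018, decel=7.0):
    """Distance covered during the OBD response time plus braking distance,
    using the quoted 18 ms response and an assumed 7 m/s^2 deceleration."""
    v = speed_kmh / 3.6                  # convert km/h to m/s
    return v * response_s + v * v / (2 * decel)

for kmh in (40, 60, 80):
    d = stopping_distance(kmh)
    print("%d km/h -> %.1f m (within 70 m: %s)" % (kmh, d, d <= 70.0))
```

Under these assumptions even 80 km/h stops well inside 70 m; the 18 ms processing delay contributes less than half a metre, so the braking distance dominates.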
5 Limitations and Future Work
A limitation of the model is that the transmitted signals from the radar may not reach the vehicles if the roads are curved and the traffic signal is not in a straight line. In this study, the energy usage of the devices is not considered; the energy used could come from a renewable source such as solar or wind. For connectivity to the Internet, only broadband is considered. However, multiple connectivity options such as Wi-Fi and satellite connectivity can be explored, along with multiple miniature radars for curved roads. Adding exceptions for emergency vehicles such as ambulances and VIP vehicles can also be considered.
6 Conclusion
With the help of radars at the signal junction connected through the Internet of Things, the sensors and the vehicles' on-board diagnostic devices work in tandem to effectively control accidents, rash driving and signal skipping, which in turn cause deaths. A feature of this proposed model is that the respective authorities can configure
and control the radar present at the junctions at any given time. Once the radar is configured, it is enabled to send signals repeatedly to the vehicles to control their speed and communicate the current signal state. When the data are received by the vehicle's OBD sensor, it sends the input to and controls the engine control unit of the vehicle. Whenever the signal changes, the radar again transmits the signal to the vehicles. Pedestrians, animals and commuters are thus protected from the accidents that happen because of rash driving and signal skipping. The engine control unit, the Internet of Things and the other hardware are interconnected and work together to control accidents.
References
1. M. Sarrab, S. Pulparambil, M. Awadalla, Development of an IoT based real-time traffic monitoring system for city governance. Glob. Trans. 2, 230–245 (2020)
2. M. Liyanage, P. Porambage, A.Y. Ding, A. Kalla, Driving forces for multi-access edge computing (MEC) IoT integration in 5G. ICT Exp. 7(2), 127–137 (2021)
3. J. Santa, L. Bernal-Escobedo, R. Sanchez-Iborra, On-board unit to connect personal mobility vehicles to the IoT. Procedia Comput. Sci. 175, 173–180 (2020)
4. M. Simsek, A. Boukerche, B. Kantarci, S. Khan, AI driven autonomous vehicles as COVID-19 assessment centers: a novel crowd sensing-enabled strategy. Pervasive Mob. Comput. 75, 101426 (2021)
5. X. Krasniqi, E. Hajrizi, Use of IoT technology to drive the automotive industry from connected to full autonomous vehicles. IFAC-PapersOnLine 49(29), 269–274 (2016)
6. H.G. Seif, X. Hu, Autonomous driving in the iCity—HD maps as a key challenge of the automotive industry. Engineering 2(2), 159–162 (2016)
7. A. Kassu, M. Hasan, Factors associated with traffic crashes on urban freeways. Transp. Eng. 2, 100014 (2020)
8. N. Yousefnezhad, A. Malhi, K. Framling, Security in product lifecycle of IoT devices: a survey. J. Netw. Comput. Appl. 171, 102779 (2020)
9. R. Ramapriya, P. Mp, G. Ap, A. Kamath, A. Srinivas, M. Rajasekar, IoT green corridor. Procedia Comput. Sci. 151, 953–958 (2019)
10. S. Sachdev, J. Macwan, C. Patel, N. Doshi, Voice controlled autonomous vehicle using IoT. Procedia Comput. Sci. 160, 712–717 (2019)
11. N. Ahmed, D. De, I. Hussain, Internet of Things (IoT) for smart precision agriculture and farming in rural areas. IEEE Internet Things J. (2018)
12. A.P.S. Anandraj, P. Nandhini, A.A.A. Punitha, K.R., A new vehicular emergency model based on IoT, in 2021 6th International Conference on Communication and Electronics Systems (ICCES) (2021), pp. 643–648. https://doi.org/10.1109/ICCES51350.2021.9489092
13. Z. Karami, R. Kashef, Smart transportation planning: data, models, and algorithms. Transp. Eng. 2, 100013 (2020)
14. H. Khayyam, B. Javadi, M. Jalili, R.N. Jazar, Artificial intelligence and internet of things for autonomous vehicles, in Nonlinear Approaches in Engineering Applications (Springer International Publishing, Cham, 2020), pp. 39–68
15. X. Yao, F. Farha, R. Li, I. Psychoula, L. Chen, H. Ning, Security and privacy issues of physical objects in the IoT: challenges and opportunities. Digit. Commun. Netw. 7(3), 373–384 (2021)
Land Use/Cover Novel Dataset Based on Deep Learning: Case Study of Fayoum, Egypt Rehab Mahmoud, Haytham Al Feel, and Rasha M. Badry
Abstract Land use and land cover classification is essential for monitoring studies, resource management and planning activities. The accuracy of such classification depends on a set of factors, such as the resolution of the input data/images and the size of the dataset. The resolution and size of the input dataset/images determine the definitions of the classes; thus, low-resolution images allow only a small number of classes to be defined. We select the Fayoum governorate, Egypt, as a case study for this research; previous land use/cover classification studies for that study area were applied to low-resolution satellite images with 30 m spatial resolution and a small dataset, and were conducted using supervised classifiers. The objective of this work is to build a novel land use/cover dataset for the Fayoum governorate using a deep learning model. In the proposed study, we use images with a high resolution of 3 m; the proposed dataset is called FayPDT. The proposed classification model is trained on two types of data, raster and vector, using deep and machine learning algorithms such as Artificial Neural Networks (ANN), Random Forests (RF) and SVM. The proposed classification model is called the deep learning classification model. It is evaluated with precision, recall, F-score and the kappa index, and the test result for the proposed model is 97.1% overall accuracy for the artificial neural network.
R. Mahmoud (B) · R. M. Badry
Faculty of Computers and Information, Fayoum University, Faiyum, Egypt
e-mail: [email protected]
R. M. Badry e-mail: [email protected]
H. A. Feel
Faculty of Applied College, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_48

1 Introduction
Land cover refers to the surface cover on the ground, whether vegetation, urban infrastructure, water, bare soil or other. Identifying, delineating and mapping land cover are important for global monitoring studies, resource management, and planning
activities. Identification of land cover establishes the baseline from which monitoring activities (change detection) can be performed, and provides the ground cover information for baseline thematic maps [1]. The function or purpose of the land, in turn, is called land use; examples of land use are recreation, wildlife habitat or agriculture. Land use applications involve both baseline mapping and subsequent monitoring, since timely information is required to know the current quantity of land, the type of use it is in, and the land use changes from year to year [2, 3]. Machine Learning (ML) is defined as programming computers to optimize a performance criterion using training data or past experience. ML uses the theory of statistics in building mathematical models, because the core task is making inferences from a sample. It solves complex problems such as handling large datasets and continuous data [4]. Deep learning is a sub-field of ML concerned with algorithms such as Deep Neural Networks (DNN) [5, 6]; these algorithms simulate the human brain. Recently, deep learning has been applied in the remote sensing domain; one of its advantages is working on large datasets, which enhances the learning process. Moreover, the large number and high dimensionality of spatial data vectors can be processed with deep learning. Remote sensing is concerned with scanning the earth by a set of satellites; the images given by these satellites are important to collect and analyze [5]. The use of deep learning and machine learning models in LULC map generation or LULC dataset creation is discussed in many recent papers. Earlier, some studies used images with a low resolution of 30 m, so the quality of the resulting LULC dataset was not high because of the low quality of the images used. These images held few details and objects to be classified [7]; because of this, most of the generated LULC maps contained at most three or four classes.
For example, Allam et al. proposed a study on a certain area in Egypt, Fayoum; this study was applied to a dataset of images with a spatial resolution of 30 m. The resulting maps were classified into four main classes (agricultural land, urban area, barren land and water) [7]. The objective of this work is to use high-quality satellite imagery to establish a novel dataset for the Fayoum governorate, Egypt. Fayoum is located 100 km southwest of Cairo in Middle Egypt, at latitude 29°18′35.82″N and longitude 30°50′30.48″E, as shown in Fig. 1, and it has a variety of types of land use and land cover. The proposed novel dataset for Fayoum is called FayPDT; the dataset was built on images with 3.0 m resolution (spatial distance). This is considered a high resolution compared with the resolution of the images used in previous studies [7]. Thus, the proposed FayPDT dataset adds new class definitions for land use/cover based on location and spatial distance. Furthermore, the proposed dataset is generated using a deep learning model; the model is trained, validated and tested on raster and vector data. The results of the deep learning model are then compared with the results given by other machine learning models such as SVM and random forests. The remaining sections of this paper are organized as follows: Sect. 2 related work, Sect. 3 framework and methodology, Sect. 4 results, and Sect. 5 conclusion.
Fig. 1 4-band PlanetScope scenes in RGB color space
2 Related Work
Many recent studies use ML to predict land use and land cover. Some of these studies focus on object detection, while others focus on land use/cover classification or both techniques; these are discussed here in detail. In [7], the objective was to use Maximum Likelihood Classification (MLC), a supervised learning method, and the normalized difference vegetation index to track the shift in LULC in the Fayoum governorate, Egypt. The changes in LULC in the study region (3077 km²) were assessed on Landsat satellite images from 1984, 2001 and 2010. In 2016, ground truth points were gathered and used to conduct the accuracy test for the classified maps. Four LULC classes were determined based on MLC: agricultural land, urban area, barren land and water. Two further categories were distinguished: vegetated and non-vegetated. The results showed that the MLC accurately identified the LULC classes with an overall accuracy of 96%. The authors in [4] constructed a benchmark dataset for land use/cover classification. The novel dataset consists of 10 classes based on 13 different spectral bands of 27,000 labeled images. To classify the dataset they used a Convolutional Neural Network (CNN). By comparing the proposed dataset with existing ones, the authors found the accuracy of their approach to be 98.57%. On the other
hand, their contribution opens the gate for many land use applications to monitor and detect changes on the earth's surface. The authors in [2] examined how multispectral satellite imaging (optical and radar) affects land use and land cover classification. The study site is located in southwest France, and they aimed to detect six types of LULC (wheat, barley, rapeseed, grassland and two classes of bare soil). The dataset covers 211 plots, and the multispectral images have different resolutions, from 2 to 8 m spatial distance. They classified the 211 plots into four main groups. To conduct the classification, they applied machine learning classifiers such as RFs and SVMs, achieving an overall accuracy of 0.85 and a kappa of 0.81 for RF. Some previous studies focused on improving the accuracy of object detection, such as [3, 8]. The authors in [3] established the WFCNN classification model based on a CNN; this model was trained on six aerial four-band images and tested on 587 labeled images, and two versions were experimentally tested on the basis of Gaofen 6 images and aerial images. They developed a new Weight Feature value Convolutional Neural Network (WFCNN) to perform remote sensing image segmentation and extract enhanced land use information from remote sensing imagery. One encoder and one classifier are included in the method. A collection of spectral features and five levels of semantic features are obtained by the encoder. It uses linear fusion to hierarchically fuse the semantic characteristics and an adjustment layer to optimize the fused characteristics of every stage. The results clearly show that the WFCNN can use remote sensing imagery to boost the accuracy and automation level of large-scale land-use mapping and the extraction of other information. The classification was built on pixel-by-pixel segmentation, and their model was compared with the widely used SegNet, U-Net and RefineNet models.
They obtained 91.93% precision, 94.14% recall and an F1-score of 0.9271. The WFCNN's precision, accuracy, recall and F1-score were higher than those of the other models, so they recommended the use of pixel-by-pixel segmentation. The authors in [8] demonstrated Networks on Convolutional feature maps (NoCs) to improve the accuracy of object detection. Unlike existing systems that use techniques such as SPPnet and Fast/Faster R-CNN, the proposed NoC uses deep convolutional classifiers for feature extraction. Based on their observations, they took first place in the 2015 MS COCO challenge by presenting a novel model beyond Faster R-CNN with ResNets. From the related studies discussed above, we found that the performance of classification and map generation depends on a set of factors such as the number of bands and the resolution of satellite images. The use of deep learning and supervised learning gives higher accuracy than traditional classifiers.
3 Framework and Methodology
The proposed LULC framework shows the main phases and processes of the proposed deep learning model and the FayPDT dataset. It holds two phases, as shown in Fig. 3: a preprocessing phase and a classification phase. The proposed model is trained on raster and vector data; in the following we illustrate raster data, vector data and the conversion from raster to vector data, and then discuss the phases of the proposed framework. Satellite imagery is always in raster format, with a unique signature for each pixel. Each cell in an image has its reflection measured at a certain wavelength. The cells may cover several hundred square meters on the earth's surface, with the area covered being a function of the number of cells [9]. A cell must be small enough to capture the required detail, while still being large enough to allow for effective computer storage and analysis. The more homogeneous an area is with respect to crucial characteristics such as topography and land use, the greater the cell size can be without impacting accuracy [9, 10]. Vector data is a type of object described using mathematical symbols and stored as geometric objects. Simple geometric objects such as points, lines, polygons, arcs and circles are used to represent vector data. A city, for example, could be represented by a point, a road by a collection of lines, and a state by a polygon. A raster or a vector model can be used to reflect a given real-world situation [11]. The process of obtaining vector data from raster data is shown in Fig. 2. The raster data holds a set of objects identified by shape and color; to create a vector layer from raster data, a set of objects is determined and represented as polygons with unique colors. The first step is dividing the raster images into a set of objects with color segments, as proposed in Fig. 2; then the object features, such as shape and texture, are determined and defined for digitizing. After the digitizing process, a unique color is assigned to the objects. If the data are to be classified, the class number is given to those objects.
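The first grouping step, collecting same-colour pixels into objects before digitizing them, can be sketched as connected-component labeling; a GIS tool would then trace each labeled group into a polygon. The 4-connectivity choice and the toy grid below are assumptions for illustration:

```python
from collections import deque

def label_objects(grid):
    """Assign an object id to every 4-connected region of equal pixel value."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                continue                      # pixel already belongs to an object
            next_id += 1
            colour, queue = grid[y][x], deque([(y, x)])
            labels[y][x] = next_id
            while queue:                      # breadth-first flood fill
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not labels[ny][nx] \
                            and grid[ny][nx] == colour:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels

raster = [[5, 5, 9],     # toy "pixel colours": three objects in total
          [5, 9, 9],
          [7, 7, 9]]
print(label_objects(raster))  # [[1, 1, 2], [1, 2, 2], [3, 3, 2]]
```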
Fig. 2 Vector data from raster dataset
Fig. 3 Classification framework for LULC Dataset
The input for the proposed model is raster and vector data; the framework is as follows.
• Preprocessing
– Mosaic images: the process of converting the input multi-band images with lower resolution into one image with a higher resolution.
– Define objects: concentrates on answering the questions: what is the object? What category does it belong to? And finally, what are the different categories in our image dataset?
– Define classes: the classes are defined according to the interpreted objects and features; first a set of classes is defined, then a class is assigned to the digitized objects.
– Label objects: the process of assigning a class number to a defined object; the assignment is based on the attributes of each class, in addition to the shape and color of the objects.
• Generate vector layer: the vector layer holds a sample of all objects extracted from the input images. The first step is creating a sample vector layer for each unique feature/object, then digitizing the different objects and storing the layer.
• Classification phase: building the classifier model from the training dataset; at this phase we apply one or more ML classifiers. The model building phase consists of the following processes:
• Machine learning classifier: at this step we apply a set of ML classification algorithms such as SVM, RFs and ANN. The model has two inputs: the first is the raster mosaic image and the second is the vector layer holding the predefined samples. A further input to the ML classifier is the classifier's parameters; the parameter settings and their values are discussed in detail in the results section.
The deep learning and ML models used in the proposed framework are discussed here. The random forests classifier is a collection of tree classifiers, where each classifier is constructed using a random vector selected separately from the input vector, and each tree casts a unit vote for the most popular class to categorize an input vector [12]. The random forest classifier uses the Gini Index as an attribute selection measure, which measures the impurity of an attribute in relation to the classes. Choosing one instance (pixel) at random from a given training set T and declaring that it belongs to some class Ci, the Gini index formula can be written as [13, 14]:
\sum_{j \neq i} \bigl( f(C_i, T)/|T| \bigr)\, \bigl( f(C_j, T)/|T| \bigr) \qquad (1)
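The measure in Eq. (1) is the familiar Gini impurity, which can equivalently be computed as 1 minus the sum of squared class probabilities. A small sketch of the measure (not the authors' implementation):

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a set of class labels: sum over i != j of p_i * p_j,
    computed via the identity 1 - sum_i p_i**2."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return 1.0 - sum(p * p for p in probs)

print(gini_index([1, 1, 2, 2]))  # two balanced classes -> 0.5
print(gini_index([1, 1, 1, 1]))  # pure node -> 0.0
```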
where f (Ci, T )/|T | is the probability that the selected case belongs to class C. SVMs which are based on statistical learning theory, is to determine the position of decision borders that yield the best separation of classes. SVMs choose the one linear decision boundary that leaves the most margin between the two classes in a two-class pattern recognition task when classes are linearly separable. [15] The margin is the total of the distances from the nearest points of the two classes to the hyperplane. SVMs were originally intended to solve binary (two-class) issues. When working with many classes, a multi-class method is required. For multi-class situations, techniques such as ‘one versus one’ and ‘one against the others’ are frequently used. SVMs can be written as [16]. Maximi ze
n i=1
αi −
n n 1 αi α j yi y j .K (xi , x j )Subject to : αi yi , 0 ≤ αi ≤ C 2 i, j=1 i=1
(2) where the coefficients α j are non-negative. The xi with α j > 0 are called support vectors. C is a parameter used to trade off the training accuracy and the model complexity so that a superior generalization capability can be achieved. K is a kernel
function, which transforms the data into a higher-dimensional feature space to make it possible to perform the linear separation. SVM has two cost parameters, C and nu. The C parameter trades the margin of the decision function against the proper classification of training examples. A smaller margin is tolerated for greater values of C if the decision function is better at correctly classifying all training points. At the cost of training precision, a lower C encourages a greater margin, and thus a simpler decision function. In other words, C acts as a regularization parameter in the SVM. The parameter nu is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors relative to the total number of training examples. For instance, if it is set to 0.05, at most 5% of the training examples are misclassified (at the expense of a small margin, though), and support vectors make up at least 5% of the training examples. ANN is one of the classifier models applied to build the model, and its formula is given in Eq. 3:

y_i = f\Bigl( \sum_i x_i w_{ij} \Bigr), \quad f(x) = \frac{1}{1 + e^{-x}} \qquad (3)
where y_i is the output of the network, x_i the input, and w_{ij} the connection weight between two layers; f(x) is the activation function of the neuron.
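Eq. 3 amounts to a one-neuron forward pass, which can be sketched directly:

```python
import math

def sigmoid(x):
    """The activation f(x) = 1 / (1 + e^-x) from Eq. 3."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    """y = f(sum_i x_i * w_i): weighted sum followed by the activation."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)))

print(neuron([1.0, 0.5], [0.0, 0.0]))  # zero weights -> f(0) = 0.5
```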
4 Results and Analysis
4.1 Dataset and Constructional Environment
In this work, the dataset used covers 10 km² of Fayoum city with 65 images measuring 7939 × 3775 pixels at a spatial resolution of 0.51 m per pixel. The images were extracted from a 4-band PlanetScope scene in RGB color space. Additional images were captured from Google Earth to enhance the image resolution. The dataset is not pre-classified; the classes were identified manually based on experts, as shown in Table 1. Table 1 shows the proposed eight classes with sub-class definitions for LU/LC; the sub-class description defines the features included in the main class. The proposed model is developed using the QGIS open-source tool in addition to ArcGIS Desktop on the Windows 10 platform. According to the proposed model, the data were collected and represented. Figure 4a shows different images of the captured scene of Fayoum city over 10 km². These images are gathered and displayed to present the overall scene in RGB color, as presented in Fig. 4b.
Table 1 The proposed classes: types (LU), sub-class description, land cover (LC), and their corresponding colors

LU          | Sub-classes description                                | LC
Residential | Houses, green spaces, buildings, green areas           | Building, flat areas
Agriculture | Permanent crops, seasonal crops                        | Grassland, crops, orchards, green space
Bare soil   | Bare soil                                              | Grey color, parable color
Water       | Water lines, sea                                       | Water, asphalt
Building    | Governmental, non-governmental, universities, schools  | Building, white color
Trees       | Small green areas, discrete small green circles        | Discrete small green circles
Roads       | Highway, railway                                       | Asphalt road
Desert      | Desert land                                            | Yellow land, stone land, empty space
4.2 Results
A mosaic image was generated from the multi-band input images (Fig. 4a, b). It is a high-resolution image with 4 bands (red, green, blue and near-infrared) generated by re-sampling the input images using the KNN approach. Figure 4c presents a sample of a generated mosaic image. After that, the features of the objects are detected. Then, the objects are classified based on their type, color, shape and land use/cover. The output classes are categorized into eight classes associated with the description of sub-classes, as shown in Table 1. A new set of classes is thus defined at this step. Then a training dataset was created. At the beginning we digitize the detected objects for each class and store them as an isolated class in the vector layer. The second step, after finishing the digitizing process, is labeling the classes in the training set, so a unique number is assigned, as shown in Fig. 4d; finally, we give a unique color to each class, as shown in Table 1. For the classification process, the input images for the model are divided according to the vector layer into a set of samples. The sampling strategy is applied to all classes
588
R. Mahmoud et al.
Fig. 4 Training set: (a) raster images captured from PlanetScope; (b) scene images with 4 bands in RGB color before KNN re-sampling; (c) mosaic layer created by KNN re-sampling; (d) training set: raster mosaic and vector layer
based on the number of samples in the smallest class, which is called the number of required samples. The sampling strategy fixes the number of samples per class based on the smallest class, which has 119 samples (class 2). The 119 samples are divided 50% for training and 50% for testing. The confusion matrix shows the number of predicted labels for the samples of each class. The RF confusion matrix shown in Table 2 indicates that class 2 and class 5 are fully correctly predicted, with the lowest interference; class 4 follows with 58 correct samples out of 59. Class 6, class 7, and class 8 come next with 55, 53, and 50 correct samples out of 59, respectively. Class 3 has 45 correct samples out of 59, and the highest interference is in class 1, with 38 correct samples out of 59. Table 3 shows the confusion matrix of SVM; the class with the most interference with other classes is class 1, and the second is class 3, with 49 correctly predicted samples out of 59. Class 4 and class 6 are well separated, each with 58 correct samples out of 59, while class 5 lies in the third level of low-interference classes with 57
Table 2 Confusion matrix for RF (rows: predicted label; columns: reference label)

Predicted  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]
[1]         38    0    9    0    0    4    6    0
[2]          0   59    0    0    0    0    0    0
[3]          2    0   45    0    0    0    0    3
[4]          0    0    2   58    0    0    0    0
[5]          0    0    0    0   59    0    0    6
[6]          0    0    1    0    0   55    0    0
[7]         19    0    0    1    0    0   53    0
[8]          0    0    2    0    0    0    0   50
Table 3 Confusion matrix for SVM (rows: predicted label; columns: reference label)

Predicted  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]
[1]         39    0    6    0    0    1    3    0
[2]          0   59    0    0    0    0    0    0
[3]          5    0   49    0    0    0    0    0
[4]          0    0    1   58    0    0    0    0
[5]          0    0    0    0   57    0    0    2
[6]          0    0    3    0    0   58    0    0
[7]         15    0    0    1    1    0   56    1
[8]          0    0    0    0    1    0    0   56
correctly predicted samples out of 59. Both class 7 and class 8 have 56 correct samples out of their 59 total samples. The ANN confusion matrix in Table 4 shows predicted labels versus reference labels; class 2 and class 5 are the classes of maximum separation, with no interference with other classes. The majority of the other classes fall in the range of 56 to 58 correctly predicted samples out of 59, i.e., in the high-separation (low-interference) group.
4.3 Evaluation

The accuracy of the applied models is measured by precision, recall, and F-score. These accuracy measures are obtained by Eqs. (4), (5), and (6). Table 5 contains the results of the accuracy tests performed for the three classification models.
Table 4 Confusion matrix for ANN (rows: predicted label; columns: reference label)

Predicted  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]
[1]         58    0    2    0    0    0    2    0
[2]          0   59    0    0    0    1    0    0
[3]          1    0   56    0    0    1    0    1
[4]          0    0    0   58    0    0    0    0
[5]          0    0    0    0   59    0    0    0
[6]          0    0    1    0    0   57    0    0
[7]          0    0    0    1    0    0   57    0
[8]          0    0    0    0    0    0    0   58
Table 5 Accuracy measures (precision, recall, F-score) for SVM, RF, and ANN

Model/measure   SVMs       RFs        ANN
Precision       0.917814   0.888842   0.974244
Recall          0.915254   0.883475   0.972054
F-score         0.914115   0.883465   0.9743

Table 6 Kappa index for SVM, RFs, and ANN

Classifier    ANN       RFs        SVM
Kappa index   0.93130   0.866828   0.903148
$$\mathrm{Precision} = \frac{\sum \mathrm{precision}_{\mathrm{class}}}{\text{total no. of classes}} \tag{4}$$

$$\mathrm{Recall} = \frac{\sum \mathrm{recall}_{\mathrm{class}}}{\text{total no. of classes}} \tag{5}$$

$$\mathrm{F\text{-}score} = \frac{\sum \mathrm{F\text{-}score}_{\mathrm{class}}}{\text{total no. of classes}} \tag{6}$$
The Kappa index is measured for the three models and recorded in Table 6. The Kappa index is derived from Cohen's Kappa formula: the probability of agreement minus the probability of random agreement, divided by 1 minus the probability of random agreement. ANN is the model with the highest Kappa index, 0.93130; the higher the Kappa index, the more efficient and consistent the classification.
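Both the macro-averaged measures of Eqs. (4)–(6) and Cohen's Kappa can be computed from a confusion matrix. A minimal sketch (the 3 × 3 counts below are illustrative, not taken from the paper's tables):

```python
# Macro-averaged precision/recall/F-score and Cohen's kappa from a
# confusion matrix (rows: predicted label, columns: reference label).
def metrics(cm):
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, fscores = [], [], []
    for c in range(n):
        tp = cm[c][c]
        pred_c = sum(cm[c])                      # samples predicted as c
        ref_c = sum(cm[r][c] for r in range(n))  # samples with reference c
        p = tp / pred_c if pred_c else 0.0
        r = tp / ref_c if ref_c else 0.0
        precisions.append(p)
        recalls.append(r)
        fscores.append(2 * p * r / (p + r) if p + r else 0.0)
    # Macro averages, as in Eqs. (4)-(6): sum over classes / no. of classes
    macro = tuple(sum(v) / n for v in (precisions, recalls, fscores))
    # Cohen's kappa: (p_o - p_e) / (1 - p_e)
    p_o = sum(cm[c][c] for c in range(n)) / total
    p_e = sum(sum(cm[c]) * sum(cm[r][c] for r in range(n))
              for c in range(n)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return macro, kappa

cm = [[55, 3, 1],   # predicted class 1
      [2, 50, 4],   # predicted class 2
      [2, 6, 54]]   # predicted class 3
(macro_p, macro_r, macro_f), kappa = metrics(cm)
print(macro_p, macro_r, macro_f, kappa)
```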
4.4 Comparative Analysis

We conduct a comparison with related studies; the comparison is separated into two parts: first, papers that work on the same geographic area, and second, papers that applied the same classifier models. Allam et al. focused on the Fayoum governorate, Egypt; their study covers an area of 3077 m2. The satellite images they used have a 30 m spatial resolution, and they aimed to detect LULC shifts in the Fayoum governorate. The dataset they used consisted of only four classes, and it was labeled data, as mentioned in Table 7. They used the supervised learning classifiers MLC and NDVI. The proposed deep learning model is applied to 3 m spatial-resolution satellite images covering a 10 km2 area of the Fayoum governorate, Egypt. The high-resolution images used in the proposed FayPDT dataset give the ability to detect large numbers of image features. On the other hand, they used images with three bands (RGB) while we used images with four bands (RGB plus near-infrared), which makes objects and their characteristics more clearly identifiable. Because of that, the proposed FayPDT dataset has eight defined classes, compared with 4 classes in their study. They trained their model with six three-band satellite images, while the proposed model is trained on raster and vector data: 65 four-band satellite images and a vector layer. Moreover, they used the MLC supervised learning model for classification, while the proposed deep learning model applied three classification models: SVM, RF, and ANN. Finally, the proposed deep learning model outperformed the related work and achieved 97.1% accuracy, while the related work achieved 96%. Table 7 shows some comparison elements of these two studies. In the second study, Marais Sicre et al. applied machine learning classifiers such as RFs and SVMs to multispectral satellite imagery (optical and radar). Their study is conducted on 211 plots at a site in southwest France.
They used multispectral images with different spatial resolutions, from 2 to 8 m. The plots were grouped into four classes. In the classification process, RFs and SVMs are applied, and they achieved an overall accuracy of 0.85 and a Kappa of 0.81 for RF. The proposed models for FayPDT achieved a Kappa index of 0.86 for RFs, 0.9031 for SVM, and 0.93130 for ANN. They avoided the use of deep learning, which may be
Table 7 Comparison of datasets for the study area (Fayoum city, Egypt): related studies and the proposed FayPDT dataset

Paper/year             Learning model                   No. of bands                      Labeled   No. of classes
Allam et al. (2019)    Supervised learning: MLC, NDVI   Three bands (RGB)                 Yes       4 for MLC, 2 for NDVI
Proposed FayPDT 2022   Deep learning: ANN, RFs, SVMs    Four bands (RGB, near-infrared)   No        8
Table 8 A comparison of related studies and the proposed study: classifier models and obtained accuracy

Acc/model                     SVM        RF         ANN
Marais et al. (overall acc.)  0.85       0.85       –
Proposed model: Precision     0.917814   0.888842   0.974244
Proposed model: Recall        0.915254   0.883475   0.972054
Proposed model: F-score       0.914115   0.883465   0.9743
more robust to the large size of the satellite images; on the other hand, the satellite images they used have a different number of spectral (radar) bands. A comparison of the accuracy of RFs, SVMs, and ANN for these two studies is presented in Table 8.
5 Conclusion and Future Work

In this context, a novel dataset for LULC is proposed. The novel dataset is built for Fayoum city, Egypt, and is called FayPDT. FayPDT was built with a deep learning model (ANN) and machine learning models (SVMs and random forests). This study is conducted on a dataset of high-resolution satellite images with 3.0 m spatial resolution; these images are captured from the PlanetScope satellite and cover 10 km2 of Fayoum city. We define a set of eight LULC classes to generate FayPDT. The defined classes and high-resolution images are used to train the models to build and generate the land use/cover dataset. So, the proposed classification model is trained using raster high-resolution images and vector data. The proposed FayPDT framework consists of two main phases: a preprocessing phase and a classification phase. The preprocessing phase is concerned with enhancing the input images and creating vector data from raster images (the input data for the model is a vector set and satellite images), and the classification phase applies a set of classifier models: SVM, RF, and ANN. The obtained results are tested and compared with related studies, and it was observed that ANN achieved higher accuracy (97.1%) than the other classifiers. Future work will focus on the prediction and optimization of LULC map generation.
References

1. Q. Hu, L. Zhen, Y. Mao, X. Zhou, G. Zhou, Automated building extraction using satellite remote sensing imagery. Autom. Constr. 123, 103509 (2021)
2. C. Marais Sicre, R. Fieuzal, F. Baup, Contribution of multispectral (optical and radar) satellite images to the classification of agricultural surfaces. Int. J. Appl. Earth Observat. Geoinformation 84, 101972 (2020)
3. C. Zhang, Y. Chen, X. Yang, S. Gao, F. Li, A. Kong, D. Zu, L. Sun, Improved remote sensing image classification based on multi-scale feature fusion. Rem. Sens. 12(2) (2020)
4. H. Krueger, V. Noonan, D. Williams, L. Trenaman, C. Rivers, EuroSAT: a novel dataset and deep learning benchmark for land use and land cover classification. Appl. Earth Observat. Rem. Sens. 51(4), 260–266 (2017)
5. D. Tomè, F. Monti, L. Baroffio, L. Bondi, M. Tagliasacchi, S. Tubaro, Deep convolutional neural networks for pedestrian detection. Sig. Proc.: Image Commun. 47, 482–489 (2016)
6. L. Subirats, L. Ceccaroni, Real-time Pedestrian Detection with Deep Network Cascades, vol. 7094 (Springer, Berlin Heidelberg, 2017), pp. 549–559
7. M. Allam, N. Bakr, W. Elbably, Multi-temporal assessment of land use/land cover change in arid region based on Landsat satellite imagery: case study in Fayoum Region, Egypt. Remote Sens. Appl.: Soc. Environ. 14, 8–19 (2019)
8. C. Boldt, Object detection networks on convolutional feature maps knowledge representation for prognosis of health status in rehabilitation. Fut. Internet 4(3), 762–775 (2016)
9. H. Lim, Raster data, in Encyclopedia of GIS, ed. by S. Shekhar, H. Xiong (Springer, US, 2008), pp. 949–955
10. D. Schraik, P. Varvia, L. Korhonen, M. Rautiainen, Bayesian inversion of a forest reflectance model using Sentinel-2 and Landsat 8 satellite images. J. Quant. Spectrosc. Radiat. Transfer 233, 1–12 (2019)
11. V. Gandhi, Vector data, in Encyclopedia of GIS, ed. by S. Shekhar, H. Xiong, X. Zhou (Springer International Publishing, 2017), pp. 2411–2416
12. M. Pal, Random forest classifier for remote sensing classification. Int. J. Rem. Sens. 26(1) (2005)
13. S. Tian, X. Zhang, J. Tian, Q. Sun, Random forest classification of wetland land covers from multi-sensor data in the arid region of Xinjiang. J. Rem. Sens. 8(11), 954 (2016)
14. M. Belgiu, L. Drăguţ, Random forest in remote sensing: a review of applications and future directions. J. Photogrammetry Rem. Sens. 114, 24–31 (2016)
15. D. Liu, J. Chen, G. Wu, H. Duan, SVM-based remote sensing image classification and monitoring of Lijiang Chenghai, in International Conference on Remote Sensing, Environment and Transportation Engineering (Nanjing, 2012), pp. 1–4
16. A. Kross, E. Znoj, D. Callegari, G. Kaur, M. Sunohara, D. Lapen, H. McNairn, Using deep neural networks and remotely sensed data to evaluate the relative importance of variables for prediction of within-field corn and soybean yields. Rem. Sens. 12(14), 2230
Exploring the Effect of Word Embeddings and Bag-of-Words for Vietnamese Sentiment Analysis Duc-Hong Pham
Abstract In the field of natural language processing, word embeddings and bag-of-words are two objects used to represent initial features for many machine learning models. Bag-of-words is widely used and suitable for traditional models such as SVM, logistic regression, and random forest. Meanwhile, word embeddings are suited to deep learning models designed on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). Many sentiment analysis studies have effectively used word embeddings; however, they have ignored the role of bag-of-words in their proposed models. This paper proposes a CNN model that simultaneously exploits the effectiveness of word embeddings and bag-of-words for Vietnamese sentiment analysis. In the experiment, we used 4009 actual textual reviews in the data domain of mobile phone products; the experimental results have demonstrated the ability of the proposed model as well as the role of bag-of-words.
1 Introduction

Today, there are many sources of customer-review data about commercial products and services provided online, such as the social network Facebook, e-commerce websites thegioididong.com and tripadvisor.com, and the electronic newspaper vnexpress.net. These contain a lot of useful information for service managers and salespeople as well as new customers: managers really want to know customers' opinions on whether the service they provide is good or not, and in which aspects it is good or bad, so that they can improve their service or provide higher-quality products. Many tasks have been studied to support service managers, such as classifying sentiment and extracting opinions [1, 2], identifying comparative sentences in review texts [3, 4], and extracting and summarizing information from review texts [5, 6]. Some other tasks exploit aspects of products and services, such as aspect-based sentiment analysis and social listening systems [7, 8], and mining aspect ranks and weights [9, 10].

D.-H. Pham (B) Faculty of Information Technology, Electric Power University, Hanoi, Vietnam e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_49
Over the past 10 years, the problem of sentiment analysis for Vietnamese reviews has attracted the attention of many researchers. Case studies include [11], which uses the statistical information of n-grams (bag-of-words) in textual reviews, and Kieu and Pham [12], who proposed rule-based models. Besides that, Duyen et al. [13] proposed a method combining support vector machine (SVM) classification and the maximum entropy technique to solve the classification task. Trinh et al. [14] built a dictionary containing sentiment words for a specific domain and used dictionary-based methods. Deep learning models using word embeddings as input give better prediction results; typically, [15–17] proposed models combining LSTM and CNN. In this paper, we propose a CNN model exploiting two common feature objects, word embeddings and bag-of-words. Here, word embeddings are learned from a large dataset, and the bag-of-words is simply determined through statistics on the frequency of words. The CNN model has proven effective in many natural language processing tasks such as sentence-level classification [18], aspect-based sentiment analysis [19], and toxic comment classification [20]. However, most of these studies only used word embeddings as a starting point and did not consider combining them with the single features of a previously constructed bag-of-words. The remaining structure of the paper is organized as follows: in Sect. 2, we present some background knowledge required for this research. Our proposed model is presented in Sect. 3. Section 4 presents the data and our experimental results, and some conclusions are presented in Sect. 5.
2 Preliminaries

2.1 Bag of Words

In natural language processing and text representation (word level, sentence level, whole-text level), the bag-of-words representation is the most common and simplest. Given a text dataset, a bag of words can be constructed by selecting a list of the most influential typical words that appear in the dataset. However, this selection often takes a lot of the data maker's time and effort. A simpler way is to compute the frequency of occurrence of words and select the words whose frequency reaches a pre-determined threshold. For example, consider two textual sentences:

(1) A little better battery is good;
(2) The vibe's battery is its weak point.

Based on these two sentences, we build a dictionary containing 11 words: {'A': 1, 'little': 1, 'better': 1, 'battery': 2, 'is': 2, 'good': 1, 'The': 1, 'vibe's': 1, 'its': 1, 'weak': 1, 'point': 1}; the corresponding representation vectors are:

(1) [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0];
(2) [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1].
This method also has the limitation that we may not be able to choose words with good features but few occurrences.
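The counting procedure above can be sketched as follows (reproducing the two example vectors; the vocabulary is kept in first-appearance order):

```python
def build_vocab(sentences):
    """Vocabulary in first-appearance order over the whole corpus."""
    vocab = []
    for sent in sentences:
        for word in sent.split():
            if word not in vocab:
                vocab.append(word)
    return vocab

def bow_vector(sentence, vocab):
    """Count how often each vocabulary term occurs in the sentence."""
    words = sentence.split()
    return [words.count(term) for term in vocab]

sents = ["A little better battery is good",
         "The vibe's battery is its weak point"]
vocab = build_vocab(sents)
print(bow_vector(sents[0], vocab))  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(bow_vector(sents[1], vocab))  # [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
```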
2.2 Word Embedding

Given a large dataset, we can easily build a dictionary containing the words that appear in it and represent each word as a one-hot vector; Fig. 1 illustrates the representation vectors for the words 'strong' and 'weak' based on a dictionary of N words. However, with these representations we cannot evaluate the relationship between words, even though in fact the words may have a clear relationship. Furthermore, these vectors are sparse and have high dimensionality when the number of words in the dictionary is very large. Word embedding plays the role of representing words by vectors of real numbers with low dimensionality; in particular, these vectors can be used to better evaluate the relationships between words. In typical word embedding techniques such as [21], the common features of the words are embedded and encoded in a continuous vector space with a lower dimension than the original, where semantically similar words are represented as close points. This follows from the idea that words share semantics when they appear in the same contexts. Figure 2 is an illustration of the word embedding representation with K dimensions for the words 'strong' and 'weak', which are vectors of real numbers. Word2vec has two main techniques: the Skip-Gram model and the Continuous Bag-of-Words (CBOW) model. These models use three layers in one neural network that maps words in the same contexts to learn weights that are the representation vectors of the words. The CBOW model is trained to predict a word from its given

Fig. 1 Represents one-hot vectors for the words 'strong' and 'weak'
Fig. 2 Represents word embeddings for the words ‘strong’ and ‘weak’
contexts, while the Skip-gram model is trained to predict the contexts from a given word.
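The contrast above can be illustrated with a toy sketch (the dense vectors below are invented for the illustration, not trained embeddings): one-hot vectors of distinct words always have cosine similarity 0, while dense embeddings can express that 'strong' and 'powerful' are closer than 'strong' and 'weak'.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors over a dictionary of N = 5 words: every pair of
# distinct words has similarity 0, so no relationship is expressed.
strong_oh = np.array([1, 0, 0, 0, 0])
weak_oh   = np.array([0, 1, 0, 0, 0])
print(cosine(strong_oh, weak_oh))  # 0.0

# Hypothetical K = 3 dimensional embeddings: related words end up
# as close points in the continuous vector space.
strong_e   = np.array([0.9, 0.8, 0.1])
powerful_e = np.array([0.85, 0.75, 0.2])
weak_e     = np.array([-0.8, -0.7, 0.1])
print(cosine(strong_e, powerful_e) > cosine(strong_e, weak_e))  # True
```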
2.3 Basic Convolutional Neural Network

The convolutional neural network (CNN) and LSTM are the two most used architectures in deep learning models. A common convolutional neural network, when applied to an input sequence to generate the corresponding feature representation, passes through two layers: convolution and pooling. The general model of a convolutional network is shown in Fig. 3. Consider an input string d that consists of words w1, w2, ..., wn whose corresponding representations are x1, x2, ..., xn. Note that these representations

Fig. 3 Illustrates a convolutional neural network generating the feature representation of a string
can include word embedding vectors or feature vectors represented by a previously defined bag of words. The basic CNN model works as follows: the convolution layer uses $x_1, x_2, \ldots, x_n$ as input and applies convolution operations to generate a new vector sequence $y_1^1, y_2^1, \ldots, y_n^1$ according to the following equation:

$$y_i^1 = f(U \cdot x_{i:i+k-1} + b),$$

where $x_{i:i+k-1}$ is the combination of $x_i, x_{i+1}, \ldots, x_{i+k-1}$, and $k$ is the window size for the embedding vectors to be combined. $f(\cdot)$ is an element-wise activation function, such as the non-linear function $f(v) = \tanh(v) = \frac{e^v - e^{-v}}{e^v + e^{-v}}$. $U \in \mathbb{R}^{C \times m}$ and $b \in \mathbb{R}^C$ are parameters of the CNN which will be learned in the training phase. The work of collecting the best features is done at the pooling layer; according to [18, 19, 22], good features are those that reach the max value, and they are combined to form a $C$-dimensional vector with the following formula:

$$y^2 = [\max_i(y_{i1}^1), \max_i(y_{i2}^1), \ldots, \max_i(y_{iC}^1)],$$

where $y_{ij}^1$ is the $j$th dimension of the vector $y_i^1$. The vector $y^2 \in \mathbb{R}^C$ is considered the output of the basic CNN.
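The two formulas above can be sketched directly in NumPy (a minimal sketch with random parameters; the sizes n, k, C, and the embedding dimension are chosen arbitrarily for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, emb_dim = 10, 8        # 10 words, 8-dimensional representations
k, C = 3, 5               # window size k, number of filters C
m = k * emb_dim           # length of one concatenated window x_{i:i+k-1}

X = rng.normal(size=(n, emb_dim))   # word representations x_1..x_n
U = rng.normal(size=(C, m))         # convolution parameters
b = rng.normal(size=C)

# Convolution layer: y_i^1 = tanh(U . x_{i:i+k-1} + b)
y1 = np.stack([np.tanh(U @ X[i:i + k].ravel() + b)
               for i in range(n - k + 1)])      # shape (n-k+1, C)

# Max pooling layer: y^2 = [max_i y_i1, ..., max_i y_iC]
y2 = y1.max(axis=0)                             # shape (C,)
print(y1.shape, y2.shape)  # (8, 5) (5,)
```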
3 Proposed Method

Different from common CNN models, our proposed model can be seen as a hybrid of two different CNN networks. For an input textual review, we first take its representation in the form of a word-embedding matrix and a feature vector represented by a bag of words. Then one CNN is applied to the word-embedding matrix and another CNN is applied to the feature vector. In each CNN, the convolution operations along with their filters are performed, and the max-pooling operation is applied to select the best features. These features are combined to form a vector and serve as input to the next layers of the neural network. Figure 4 illustrates our proposed model, where DNN (Deep Neural Network) is a multi-layer model that uses this vector as input and whose output is a global representation vector of the textual review. This vector then passes through a softmax function to compute (predict) the opinion label of the original textual review. Given a dataset of labeled textual reviews, using the filter matrices together with the CNN operations presented in the previous paragraph and the parameters of an appropriate DNN model, we build an error function for the proposed model. The model is learned by applying the back-propagation algorithm to minimize this function. Finally, the model's learned parameters are used to predict labels for new textual reviews.
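A forward pass of this two-branch idea can be sketched in NumPy (random parameters and toy sizes, purely illustrative; the actual model in Sect. 4.3 is built with Keras):

```python
import numpy as np

rng = np.random.default_rng(1)

def conv_maxpool(X, U, b, k):
    """Convolution y_i = tanh(U . x_{i:i+k-1} + b) followed by max pooling."""
    n = X.shape[0]
    y = np.stack([np.tanh(U @ X[i:i + k].ravel() + b)
                  for i in range(n - k + 1)])
    return y.max(axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Branch 1: CNN over the word-embedding matrix (10 words x 8 dims).
E = rng.normal(size=(10, 8))
h1 = conv_maxpool(E, rng.normal(size=(5, 3 * 8)), rng.normal(size=5), k=3)

# Branch 2: CNN over the bag-of-words feature vector (length 50, 1 channel).
bow = rng.normal(size=(50, 1))
h2 = conv_maxpool(bow, rng.normal(size=(4, 5 * 1)), rng.normal(size=4), k=5)

# Fuse both feature vectors, then a dense layer and softmax over 2 labels.
h = np.concatenate([h1, h2])            # length 5 + 4 = 9
W, c = rng.normal(size=(2, 9)), rng.normal(size=2)
p = softmax(W @ h + c)
print(p.shape, round(float(p.sum()), 6))  # (2,) 1.0
```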
Fig. 4 Illustrates the proposed model
4 Experiments

4.1 Data and Preprocessing

We use a dataset containing 4009 textual reviews of mobile phone products (e.g., Samsung, Nokia, iPhone, etc.), collected from the electronic newspaper website vnexpress.net, in which each textual review is labeled 0 or 1. Two people were selected to perform this labeling; 1 means that the review content is rated positively by the product users, and 0 means that the content of the review is negative. These reviews are preprocessed as follows: using a list of original words, we remove stop words, meaningless words, and special characters which do not affect the contents of the textual reviews. Uppercase words are also converted to lowercase. Then a tokenizer is developed based on a dictionary containing 46,659 bigrams and 25,327 trigrams, and the longest-matching algorithm is used to segment the words of each textual review. Table 1 shows the statistics of the reviews used in the experiment.
Table 1 Textual review data statistics

Number of textual reviews                      4009
Number of textual reviews as positive          2628
Number of textual reviews as negative          1381
Average number of words in a textual review      13
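The longest-matching segmentation mentioned above can be sketched as follows (a toy dictionary stands in for the 46,659 bigrams and 25,327 trigrams; syllables are assumed to be whitespace-separated):

```python
# Greedy longest-matching word segmentation over a phrase dictionary.
DICT = {"dien thoai", "pin tot", "man hinh", "man hinh dep"}  # toy entries
MAX_WORDS = 3  # the dictionary holds bigrams and trigrams

def segment(sentence):
    syllables = sentence.split()
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first (3 syllables, then 2, then 1).
        for size in range(min(MAX_WORDS, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + size])
            if size == 1 or candidate in DICT:
                tokens.append(candidate.replace(" ", "_"))
                i += size
                break
    return tokens

print(segment("dien thoai man hinh dep pin tot"))
# ['dien_thoai', 'man_hinh_dep', 'pin_tot']
```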
4.2 Word Embeddings and Bag of Words

We use a dataset containing 50 M textual sentences which are split and filtered from product descriptions and textual reviews. These documents are collected from websites such as sendo.vn and vnexpress.net. The continuous bag-of-words (CBOW) model of Word2Vec in the Gensim tool is selected for learning the word embeddings. Contexts have a window size of 6, the term-frequency threshold is 3, and the dimension of the word embeddings is 50. The bag of words is determined simply by counting the number of times words appear in the product-review text dataset. Words with a frequency greater than 3 are selected. In addition, we also use some words with a frequency of occurrence less than 3 which, on manual inspection, have an effect in the experimental dataset. The total number of words in the bag of words is 6867.
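The frequency-threshold selection of the vocabulary can be sketched as follows (toy corpus and threshold; the whitelist stands in for the sub-threshold words kept after manual inspection):

```python
from collections import Counter

corpus = [
    "pin tot va ben",
    "pin yeu man hinh dep",
    "man hinh dep pin tot",
    "gia tot",
]
MIN_FREQ = 2                 # the paper uses a threshold of 3
WHITELIST = {"ben"}          # low-frequency words kept after inspection

# Count every word in the corpus, then keep the frequent or whitelisted ones.
counts = Counter(w for sent in corpus for w in sent.split())
vocab = sorted(w for w, c in counts.items() if c >= MIN_FREQ or w in WHITELIST)
print(vocab)  # ['ben', 'dep', 'hinh', 'man', 'pin', 'tot']
```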
4.3 Experimental Result

The experimental dataset is randomly split: 80% for model learning and the remaining 20% for model evaluation. We use the open-source Keras library to implement the proposed model, with the parameters shown in Table 2. Accordingly, the CNN working on word embeddings takes each textual review with a length of 30 tokens and applies 3 convolution operations (conv2d, conv2d_1, conv2d_2) with sliding-window sizes of 2, 3, and 4, respectively. Each convolution operation uses 512 filters. The CNN working on the bag of words has one convolution operation (conv1d) using 50 filters on a feature vector of 6867 dimensions, with a sliding window of size 5. MaxPooling1D and MaxPooling2D operations are used to collect the best features. These features are then combined to make the final feature vector representing the input textual review. There are a total of 21,253,828 parameters learned in the proposed model. After learning the model, we use the test dataset to make predictions, reporting the Precision, Recall, and F1-Score measures. Table 3 gives the detailed prediction results for the two classes: the results on class 1 (positive) are slightly above 90%, while on class 0 (negative) the results are 82.44, 81.85, and 82.14%, respectively. To compare the proposed model with other models, we use Logistic regression, SVM, Random forest, CNN, LSTM, and CNN + LSTM models. All of
Table 2 Statistics of the types of parameters of the proposed model

Layer (type)                    Output shape        Param #     Connected to
input_1 (InputLayer)            [(None, 30)]        0
embedding (Embedding)           (None, 30, 50)      343,350     input_1[0][0]
reshape (Reshape)               (None, 30, 50, 1)   0           embedding[0][0]
input_3 (InputLayer)            [(None, 6867, 1)]   0
conv2d (Conv2D)                 (None, 29, 1, 512)  51,712      reshape[0][0]
conv2d_1 (Conv2D)               (None, 28, 1, 512)  77,312      reshape[0][0]
conv2d_2 (Conv2D)               (None, 27, 1, 512)  102,912     reshape[0][0]
conv1d (Conv1D)                 (None, 6863, 50)    300         input_3[0][0]
max_pooling2d (MaxPooling2D)    (None, 1, 1, 512)   0           conv2d[0][0]
max_pooling2d_1 (MaxPooling2D)  (None, 1, 1, 512)   0           conv2d_1[0][0]
max_pooling2d_2 (MaxPooling2D)  (None, 1, 1, 512)   0           conv2d_2[0][0]
max_pooling1d (MaxPooling1D)    (None, 3431, 50)    0           conv1d[0][0]
concatenate (Concatenate)       (None, 3, 1, 512)   0           max_pooling2d[0][0], max_pooling2d_1[0][0], max_pooling2d_2[0][0]
flatten_1 (Flatten)             (None, 171550)      0           max_pooling1d[0][0]
flatten (Flatten)               (None, 1536)        0           concatenate[0][0]
dense (Dense)                   (None, 120)         20,586,120  flatten_1[0][0]
dropout (Dropout)               (None, 1536)        0           flatten[0][0]
dense_1 (Dense)                 (None, 80)          9,680       dense[0][0]
concatenate_1 (Concatenate)     (None, 1616)        0           dropout[0][0], dense_1[0][0]
dense_2 (Dense)                 (None, 50)          80,850      concatenate_1[0][0]
dense_3 (Dense)                 (None, 30)          1,530       dense_2[0][0]
dense_4 (Dense)                 (None, 2)           62          dense_3[0][0]
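The parameter counts in Table 2 can be reproduced from the layer shapes with the standard dense/convolution parameter formulas (a quick consistency check):

```python
# Reproduce selected parameter counts of Table 2 from layer shapes.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out           # weights + biases

def conv_params(kernel, channels_in, filters):
    return kernel * channels_in * filters + filters

assert 6_867 * 50 == 343_350                      # embedding (vocab x dim)
assert conv_params(2 * 50, 1, 512) == 51_712      # conv2d, window 2 x 50
assert conv_params(5, 1, 50) == 300               # conv1d, window 5
assert dense_params(171_550, 120) == 20_586_120   # dense
assert dense_params(120, 80) == 9_680             # dense_1
assert dense_params(1_616, 50) == 80_850          # dense_2
print("all parameter counts match")
```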
Total params: 21,253,828

Table 3 The evaluation results of the proposed model on each class (label)

Class   Precision   Recall   F1-Score
0       0.8244      0.8185   0.8214
1       0.9025      0.9060   0.9042
Table 4 Compare the proposed model with other methods

Input                            Method                Precision   Recall   F1-Score
Bag of words                     Logistic regression   78.39       81.01    79.68
                                 SVM                   83.82       82.35    83.07
                                 Random forest         81.31       79.02    80.15
                                 CNN                   84.82       80.89    82.81
Word embeddings                  CNN                   86.13       81.27    83.63
                                 LSTM                  85.38       83.05    84.20
                                 CNN + LSTM            82.79       82.28    83.55
Bag of words + word embeddings   Our proposed model    86.35       86.23    86.28
these models are also implemented with Keras. Note that the CNN, LSTM, and CNN + LSTM models have the same number of layers as those selected in the proposed model. Table 4 shows the results obtained for each model. As a general observation, the group using word embeddings as input achieves better results than the group using the bag of words. Considering each group separately: in the group of models using only the bag of words as input, the SVM model achieved the best result on the F1-Score measure, followed by CNN. In the group of models using word embeddings, the LSTM model achieved the highest result, 84.20%. The CNN + LSTM model is a combination of the CNN and LSTM models, but its results are lower than those of each model alone. The proposed model, using a combination of CNNs on the bag of words and word embeddings, gives the best results on all three measures, with Precision, Recall, and F1-Score of 86.35, 86.23, and 86.28%, respectively. Although using CNN alone on the bag of words or on word embeddings gives lower results than the LSTM model, the combination gives better results. All of this shows the important role of the bag of words in the proposed CNN model.
5 Conclusion

In this paper, we have proposed a CNN model exploiting the efficiency of the two most popular feature objects today, word embeddings and bag-of-words. The experimental results have demonstrated the role of each type of feature as well as of their combination. We have also clearly seen the appropriateness of the CNN model for feature extraction and representation. In the future, we plan to use these types of features on larger datasets and in other problems such as aspect-based sentiment analysis, text search, and automatic answer suggestion for question-answering systems.
References

1. K. Dave, S. Lawrence, D.M. Pennock, Mining the peanut gallery: opinion extraction and semantic classification of product reviews, in Proceedings of WWW (2003), pp. 519–528
2. A. Devitt, K. Ahmad, Sentiment polarity identification in financial news: a cohesion-based approach, in Proceedings of ACL (2007), pp. 984–991
3. N. Jindal, B. Jindal, Identifying comparative sentences in text documents, in Proceedings of SIGIR (2006), pp. 244–251
4. H. Kim, C. Zhai, Generating comparative summaries of contradictory opinions in text, in Proceedings of CIKM (2009), pp. 385–394
5. B. Liu, M. Hu, J. Cheng, Opinion observer: analyzing and comparing opinions on the web, in Proceedings of WWW (2005), pp. 342–351
6. L. Zhuang, F. Jing, X. Zhu, Movie review mining and summarization, in Proceedings of CIKM (2006), pp. 43–50
7. L.L. Phan, P.H. Pham, K.T.-T. Nguyen, T.T. Nguyen, S.K. Huynh, L.T. Nguyen, T.V. Huynh, K.V. Nguyen, SA2SL: from aspect-based sentiment analysis to social listening system for business intelligence (2021)
8. S., M.C., A., M., A., C.L., Improving customer relations with social listening: a case study of an American academic library. Int. J. Cust. Relation. Market. Manage. 8(1), 49–63 (2017)
9. D.-H. Pham, A.-C. Le, Learning multiple layers of knowledge representation for aspect-based sentiment analysis. Data Knowl. Eng. (2018), pp. 26–39
10. S. Ghosal, A. Jain, S. Sharma et al., ARMLOWA: aspect rating analysis with multi-layer approach. Prog. Artif. Intell. 10, 505–516 (2021)
11. X.B. Ngo, M.P. Tu, Leveraging user ratings for resource-poor sentiment classification. Procedia Comput. Sci. 60, 322–331 (2015)
12. B.T. Kieu, S.B. Pham, Sentiment analysis for Vietnamese, in 2010 Second International Conference on Knowledge and Systems Engineering (KSE) (2010), pp. 152–157
13. N.T. Duyen, N.X. Bach, T.M. Phuong, An empirical study on sentiment analysis for Vietnamese, in 2014 International Conference on Advanced Technologies for Communications (ATC) (Hanoi, Vietnam, 2014), pp. 309–314
14. S. Trinh, L. Nguyen, M. Vo, P. Do, Lexicon-based sentiment analysis of Facebook comments in Vietnamese language, in Recent Developments in Intelligent Information and Database Systems (Springer, 2016), pp. 263–276
15. Q.-H. Vo, H.-T. Nguyen, B. Le, M.-L. Nguyen, Multi-channel LSTM-CNN model for Vietnamese sentiment analysis, in 2017 9th International Conference on Knowledge and Systems Engineering (KSE) (2017), pp. 24–29
16. D. Nguyen, K. Vo, D. Pham, M. Nguyen, T. Quan, A deep architecture for sentiment analysis of news articles, in International Conference on Computer Science, Applied Mathematics and Applications (Berlin, Germany, 2017), pp. 129–140
17. K. Vo, T. Nguyen, D. Pham, M. Nguyen, M. Truong, D. Nguyen, T. Quan, Handling negative mentions on social media channels using deep learning. J. Inf. Telecommun. 3(3), 271–293 (2019)
18. Y. Kim, Convolutional neural networks for sentence classification, in Proceedings of EMNLP (2014), pp. 1746–1751
19. D.-H. Pham, A.-C. Le, Exploiting multiple word embeddings and one-hot character vectors for aspect-based sentiment analysis (2018), pp. 1–10
20. M.I. Pavel, R. Razzak, K. Sengupta, M.D.K. Niloy, M.B. Muqith, S.Y. Tan, Toxic comment classification implementing CNN combining word embedding technique, in Inventive Computation and Information Technologies, Lecture Notes in Networks and Systems, vol. 173, ed. by S. Smys, V.E. Balas, K.A. Kamel, P. Lafata (Springer, Singapore, 2021)
21. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems, vol. 2 of NIPS'13 (Curran Associates Inc., USA, 2013), pp. 3111–3119
Exploring the Effect of Word Embeddings …
605
22. R. Collobert, R. Weston, A unified architecture for natural language processing, in Proceedings of the ICML (2008), pp. 160–167
A Portable System for Automated Measurement of Striped Catfish Length Using Computer Vision

Le Hong Phong, Nguyen Phuc Truong, Luong Vinh Quoc Danh, Vo Hoai Nam, Nguyen Thanh Tung, and Tu Thanh Dung
Abstract In fish farming, regular measurement of fish length is necessary to evaluate the growth and health status of fish. However, manual measurement of fish length using a measuring board is time-consuming and subject to error due to human bias. In this paper, the authors present the use of image processing techniques and smartphone cameras for the automated measurement of striped catfish length. The system was designed to be portable by taking advantage of laptop computers and the high-performance cameras found on most smartphones. The image processing algorithm includes the following steps: take an image of the fish, convert the image to binary, find contours and draw masks, extract the region of interest, determine fish orientation, find the position of the fish's caudal peduncle, and calculate fish length. The algorithms were implemented in the Python programming language in combination with the OpenCV library. Experimental validation with 30 striped catfish samples shows that the designed system provides an average accuracy of 97.71%. The proposed method could be a potential alternative for fish length measurement to support the fisheries industry.
1 Introduction

In fish farming, regular measurement of fish length is necessary to evaluate the growth and health status of fish. In addition, fish length data are useful for determining feeding amounts and for grading or sorting in the fishery industries [1]. Presently, the fish farming industry still uses wooden measuring
L. H. Phong · L. V. Q. Danh (B) · V. H. Nam · N. T. Tung College of Engineering, Can Tho University, Can Tho City, Vietnam e-mail: [email protected] N. P. Truong Vinh Long University of Technology Education, Vinh Long Province, Vietnam T. T. Dung College of Aquaculture and Fisheries, Can Tho University, Can Tho City, Vietnam © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_50
boards or acrylic plastic rulers to measure fish size. However, manual measurement of fish length is time-consuming and subject to error due to human bias. Several studies have applied computer vision and image processing to automated fish length measurement [2–11]. For example, the authors in [2] introduced advanced techniques including homomorphic filtering, contrast-limited adaptive histogram equalization, and guided filtering for fish image enhancement. Regional convolutional neural networks were employed to better predict fish body position in an image [3]. Deep learning techniques were also employed to locate landmarks in fish images for improved fish size estimation [4]. Besides, the authors in [5] proposed a vision system with dual synchronized orthogonal webcams for better tracking of free-swimming fish activities. In [6], laser fiducial markers were employed to reduce error in fish size measurement for single-camera photogrammetry. However, most advanced methods are expensive as they require complex setups. Although some recent works have introduced simple and inexpensive systems for automated fish length measurement in the aquaculture sector [7–11], the use of high-end digital cameras makes these methods less attractive for small-scale or commercial fisheries. Therefore, there is still a demand for simpler and more effective methods to measure fish size. In Vietnam, striped catfish (Pangasianodon hypophthalmus) is one of the main freshwater aquaculture products of the Mekong Delta region, with a total export turnover of about 2 billion USD in 2019 [12, 13]. The aim of this work is to develop a portable and low-cost system for automated measurement of striped catfish length using computer vision.
The advantages of the proposed approach come from the use of the high-quality cameras available on now-ubiquitous smartphones and the open-source computer vision library OpenCV [14].
2 Methodology

2.1 System Overview

Figure 1 shows the principle diagram of the proposed computer vision system for automated measurement of fish length. The system consists of a laptop computer, a smartphone, and a styrofoam box. The fish is placed at the bottom of the box, and its picture is taken from the top of the box using the smartphone camera. The images are then transferred to the computer via a USB cable with the help of the DroidCamX application [15] installed on the smartphone. The image processing algorithms are implemented in the Python programming language (version 3.10.0) [16] with the OpenCV library (version 4.5.4.60) on a Microsoft Windows-based computer. The measured fish length is displayed on the computer screen.
Fig. 1 The principle diagram of the designed system for automated fish length measurement
2.2 Image Collection

The fish used in the experiment were purchased from a catfish farm in Phu Tan District, An Giang Province. A total of 30 images of striped catfish fingerlings were gathered using the 16-megapixel camera of a Vsmart Aris smartphone [17]. The distance between the camera and the fish is fixed at about 23 cm, chosen to achieve the best image quality. The captured images, with dimensions of 4608 × 3456 pixels, are resized to 800 × 600 pixels in PNG format before being processed. This allows both fast processing and reasonable measurement accuracy.
2.3 Image Processing Algorithms

Figure 2 shows the flowchart of the image processing algorithms of the designed system. The algorithm consists of six processing steps: convert the image to binary, find contours and draw masks, extract the region of interest, determine fish orientation, find the position of the fish's caudal peduncle, and calculate fish length.
2.3.1 Convert Image to Binary One
In this step, the original image is converted to a binary one. The RGB image is first converted to a grayscale image using the cv2.cvtColor function of the OpenCV library. The grayscale image is then blurred to filter out noise and smooth edges using the cv2.GaussianBlur function. Finally, adaptive thresholding is applied to transform the grayscale image into a binary one using the cv2.adaptiveThreshold function. Figure 3 shows the results of the RGB-to-binary conversion.
Fig. 2 The flowchart of the image processing algorithms
Fig. 3 Conversion from RGB to binary image. a Original RGB image; b Grayscale image; c Blurring; and d Binary image
Fig. 4 Image masking. a Finding contours; b Filtering noise; c Drawing masks
2.3.2 Find Contours and Draw Masks
In this step, the cv2.findContours function is applied to detect objects in the binary image. To eliminate noise in the image, the contours.sort_contours function of the Imutils library [18] is used to arrange contours in order and remove unwanted objects having small areas. For image masking, the numpy.zeros function of the NumPy library [19] and the cv2.drawContours function are used to create the image mask. The results of the image masking process are shown in Fig. 4.
2.3.3 Extract the Region of Interest
The region of interest (RoI) is the part of the image that contains the fish. Determining the RoI also helps to reduce processing time. A rectangular RoI is extracted using Python's max function and the cv2.boundingRect function of OpenCV. The results of extracting the RoI are depicted in Fig. 5.
Fig. 5 Extracting the region of interest. a Binary image; b Original RGB image
Fig. 6 Result of changing fish orientation. a The RoI of binary image; b The original image
2.3.4 Determine Fish Orientation
In this study, it is assumed that the fish body is placed in the horizontal direction. The purpose of this step is to transform the RoI of the binary image so that the fish lies horizontally with its head facing left. The cv2.flip() function of the OpenCV library is used to flip the image when the fish's head faces right. Figure 6 shows the results of applying this transformation.
2.3.5 Find the Position of the Fish's Caudal Peduncle
There are three commonly used metrics for measuring fish length: standard length, fork length, and total length [20]. In this work, the length measured from the tip of the fish's nose to the narrowest part of its caudal peduncle is used for the striped catfish. Figure 7 describes the method for determining the position of the narrowest part of the fish's caudal peduncle. The RoI of the fish image from position 0.5L to 0.9L (where L is the fish length) is scanned to find the smallest value of the body width. A function called width_measuring is created to calculate the distance between the first and last white pixels in each image column over the 0.5L to 0.9L range. The location of the image column having the smallest distance provides the position of the narrowest part of the
Fig. 7 Description of the method for finding the narrowest position of fish’s caudal peduncle
Fig. 8 The flowchart of the width_measuring function
fish’s caudal peduncle. The flowcharts of the above algorithms are shown in Figs. 8 and 9. The results of this processing step are depicted in Fig. 10.
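The column scan described above can be sketched as follows (a plain re-implementation of the width_measuring idea, not the authors' code):

```python
import numpy as np

def width_measuring(roi_binary):
    """Scan columns from 0.5L to 0.9L and return (column, width) of the narrowest
    body section, i.e. the caudal peduncle (fish horizontal, head at the left)."""
    h, w = roi_binary.shape
    best_col, best_width = None, h + 1
    for col in range(int(0.5 * w), int(0.9 * w)):
        rows = np.flatnonzero(roi_binary[:, col])   # white pixels in this column
        if rows.size == 0:
            continue
        width = rows[-1] - rows[0] + 1              # first-to-last white pixel
        if width < best_width:
            best_col, best_width = col, width
    return best_col, best_width
```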
2.3.6 Calculate Fish Length
In this stage, the distance.euclidean function of the SciPy library [21] is used to calculate the distance from the fish's nose to the position with the smallest width determined in the previous step. This distance is the fish length to be measured. The fish length is calculated using the following formula:

Length_cm = Length_px × A/W    (1)
Fig. 9 The algorithm for finding the position of the narrowest part of fish’s caudal peduncle
Fig. 10 Location of the narrowest part of the fish’s caudal peduncle is marked (by blue line) in a the RoI of binary image and b the original image
where
Length_cm: fish length measured in centimeters,
Length_px: fish length measured in pixels,
A: the distance between the two edges of the image in centimeters,
W: the width of the image in pixels.
In this experiment, the values of A and W are 20 cm and 800 pixels, respectively. Figure 11 shows the calculation results for a typical fish with a length of 394.08 pixels, or 9.852 cm.
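Equation (1) reduces to a one-line conversion; with the stated A = 20 cm and W = 800 px, the 394.08 px example yields 9.852 cm:

```python
def px_to_cm(length_px, a_cm=20.0, w_px=800):
    """Convert a pixel length to centimeters: Length_cm = Length_px * A / W."""
    return length_px * a_cm / w_px
```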
Fig. 11 Fish length measurement a in pixels and b in centimeters
3 Experimental Results

To validate the effectiveness of the proposed method, a total of 30 images of striped catfish fingerlings were taken for this experiment. The fish under test is placed at the bottom of a styrofoam box with dimensions of 31 × 22 × 24 cm (Fig. 12). Because the striped catfish has a dark color, the background of the fish images was chosen to be white so that the fish shape is easily detected by the image processing algorithms. After starting the system program, users can carry out new measurements by pressing the spacebar. Figure 13 shows the results of a typical measurement, with fish length displayed in centimeters and execution time in seconds. The fish length measurements taken by the automated system are compared with manual measurements. Table 1 shows the manual and automated length measurements, absolute error, relative error, and accuracy obtained from the 30 fish samples. The experimental results show that the designed system can provide an average accuracy of 97.71%. The average processing time for a fish sample is less than 0.2 s.

Fig. 12 Fish sample placed in the styrofoam box for image acquisition
Fig. 13 Fish length measurement displayed on computer screen
4 Conclusions and Future Work

This paper introduced the implementation of a computer vision system for automated striped catfish length measurement. The system was designed to be portable by taking advantage of laptop computers and the high-performance cameras found on most smartphones, allowing users to conveniently take measurements anywhere. The proposed method could be a potential alternative for fish length measurement to support the fisheries industry. In future work, the algorithm implementations can be ported to mobile platforms. In addition, the system can be equipped with conveyors and electronic control circuits for application in automatic fish sorting.
Table 1 Experimental results of fish length measurement (mi: manual measurement; si: automated measurement)

No.  mi (cm)  si (cm)  |si − mi| (cm)  |si − mi|/mi (%)  Accuracy (%)
1    10.00    9.85     0.15            1.48              98.52
2    11.80    11.98    0.18            1.54              98.46
3    12.30    12.16    0.14            1.12              98.88
4    10.30    10.19    0.11            1.09              98.91
5    11.00    10.96    0.04            0.40              99.60
6    11.90    12.00    0.09            0.80              99.20
7    12.60    12.72    0.12            0.93              99.07
8    11.70    11.90    0.20            1.72              98.28
9    12.70    12.98    0.28            2.20              97.80
10   12.80    12.98    0.17            1.37              98.63
11   9.00     8.75     0.25            2.79              97.21
12   10.20    10.87    0.67            6.58              93.42
13   10.90    10.20    0.70            6.39              93.61
14   8.80     8.69     0.11            1.24              98.76
15   12.10    11.71    0.39            3.25              96.75
16   10.20    9.94     0.26            2.58              97.42
17   10.10    9.84     0.26            2.62              97.38
18   10.70    10.75    0.05            0.50              99.50
19   8.80     8.71     0.09            1.07              98.93
20   9.80     9.86     0.06            0.58              99.42
21   9.70     9.88     0.18            1.88              98.12
22   8.40     8.75     0.35            4.13              95.87
23   9.30     9.64     0.34            3.70              96.30
24   10.00    9.52     0.48            4.79              95.21
25   8.60     8.85     0.25            2.91              97.09
26   9.80     9.62     0.18            1.82              98.18
27   9.40     9.32     0.08            0.81              99.19
28   9.70     9.55     0.15            1.53              98.47
29   8.40     8.07     0.33            3.95              96.05
30   9.50     9.22     0.28            2.99              97.01
Mean value             0.23            2.29              97.71
References

1. M. Hao, H. Yu, D. Li, The measurement of fish size by machine vision: a review, in 9th International Conference on Computer and Computing Technologies in Agriculture (CCTA) (Beijing, China, 2015), pp. 15–32
2. Sánchez-Torres, Ceballos-Arroyo, Automatic measurement of fish weight and size by processing underwater hatchery images. Eng. Lett. 26(4), EL_26_4_09 (2018)
3. G.G. Monkman, K. Hyder, M.J. Kaiser, F.P. Vidal, Using machine vision to estimate fish length from images using regional convolutional neural networks. Methods Ecol. Evol. 10(12), 2045–2056 (2019)
4. N. Petrellis, Measurement of fish morphological features through image processing and deep learning techniques. Appl. Sci. 11, 4416 (2021)
5. A.-J. Qussay, A.-N. Waleed, T. Majid, Y. Iain, An automated vision system for measurement of zebrafish length using low-cost orthogonal web cameras. Aquac. Eng. (2017). https://doi.org/10.1016/j.aquaeng.2017.07.003
6. G.G. Monkman, K. Hyder, M.J. Kaiser, F.P. Vidal, Accurate estimation of fish length in single camera photogrammetry with a fiducial marker. ICES J. Marine Sci. 77(6), 2245–2254 (2020)
7. M. Man, N. Abdullah, M.S.M. Rahim, I. Mat Amin, Fish length measurement: the results from different types of digital camera. J. Adv. Agric. Technol. 3(1), 67–71 (2016)
8. N.S. Damanhuri, M.F.M. Zamri, N.A. Othman, S.A. Shamsuddin, B.C.C. Meng, M.H. Abbas, A. Ahmad, An automated length measurement system for tilapia fish based on image processing technique, in IOP Conference Series: Materials Science and Engineering, vol. 1088 (2021), p. 012049
9. F. Sun, J. Yu, Z. Gu, H. Zheng, N. Wang, B. Zheng, A practical system of fish size measurement, in OCEANS 2017 (Aberdeen, 2017), pp. 1–5
10. M.H. Jamaluddin, C. Siong Seng, A.Z. Shukor, F. Ali Ibrahim, M.F. Miskon, M.S. Mohd Aras, M. Md Ghazaly, R. Ranom, The effectiveness of fish length measurement system using non-contact measuring approach. Jurnal Teknologi 77(20) (2015)
11. D.J. White, C. Svellingen, N.J.C. Strachan, Automated measurement of species and length of fish by computer vision. Fish. Res. 80(2–3), 203–210 (2006)
12. VASEP, Xuất khẩu cá tra năm 2019 đạt 2 tỷ USD (Pangasius exports in 2019 reached 2 billion USD). https://vasep.com.vn/san-pham-xuat-khau/ca-tra/xuat-nhap-khau/xuat-khau-ca-tra-nam-2019-dat-2-ty-usd-9580.html. Last accessed 2 Feb 2022
13. FAO, Viet Nam on track for USD 2 billion annual pangasius export target as high prices continue. https://www.fao.org/in-action/globefish/market-reports/resource-detail/en/c/1176222/. Last accessed 2 Feb 2022
14. OpenCV library. https://opencv.org. Last accessed 2 Feb 2022
15. DroidCam. https://play.google.com/store/apps/developer?id=Dev47Apps&hl=en&gl=US. Last accessed 2 Feb 2022
16. Python. https://www.python.org/downloads/. Last accessed 2 Feb 2022
17. Vsmart Aris. https://www.vsmart.net/en/aris. Last accessed 19 Feb 2022
18. Imutils library. https://pypi.org/project/imutils. Last accessed 2 Feb 2022
19. Numpy library. https://numpy.org. Last accessed 2 Feb 2022
20. Standard length. https://fishionary.fisheries.org/tag/standard-length/. Last accessed 2 Feb 2022
21. Scipy library. https://scipy.org. Last accessed 2 Feb 2022
IoT Based Automated Monitoring System for the Measurement of Soil Quality

Pratoy Kumar Proshad, Anish Bajla, Adib Hossin Srijon, Rituparna Talukder, and Md. Sadekur Rahman
Abstract Agriculture has long been the principal source of income in our country. However, agriculture is being hampered as a result of people migrating from rural to urban areas. The purpose of the study is to build a device that measures soil and air quality automatically, without human intervention. The device collects data from the soil and sends it over the internet to various devices. The project includes other features such as a wireless monitoring system and soil moisture and temperature sensing, as well as comparisons of the measured results in Excel sheets for assessing soil quality. The sensors collect readings from the soil and supply the data through the internet. The primary focus of the paper is to demonstrate a project that helps farming with advanced techniques. The data obtained from the soil are pushed to the cloud for future analysis. The motive of this research is to help farmers and agricultural officers in soil and weather testing.
P. K. Proshad (B) · Md. S. Rahman Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh e-mail: [email protected] Md. S. Rahman e-mail: [email protected] A. Bajla Department of Material Science and Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh e-mail: [email protected] A. H. Srijon Department of Industrial and Production Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh e-mail: [email protected] R. Talukder Department of Civil Engineering, Bangladesh Army University of Engineering and Technology, Natore, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_51
1 Introduction

Moving agriculture toward new technologies is a crucial objective as the globe adopts new implementations. Much research has already been conducted in this area, and different research organizations are already working on it. Precision agriculture and farming can be described as the art and science of utilizing technology to boost crop yields [1]. The Internet of Things (IoT) is a network of interconnected devices equipped with electronics, software, sensors, and a network connection to collect and exchange data. An IoT-based automated system makes a massive change in farming because anybody can get information about different soil parameters, such as temperature, moisture, and water level, without human effort. Air quality management and climate monitoring are required to reduce polluting gases in the local atmosphere and to improve crop yields in the region [2]. The wireless sensors gather data from the soil and transmit it via a wireless protocol. The data give an idea about the environmental factors that increase crop yield, but many factors also decrease productivity. As a result, simply monitoring the soil is not a complete answer for crop yields, and automation is required for a complete solution. The device must be permanently implanted in the soil to keep the measurements up to date. Such a system, which continuously reports soil quality, helps improve farming in every situation and makes handling critical cases easier. The growth of plantations depends on photosynthesis, which in turn depends on radiation from the sun, so complete monitoring of soil temperature and humidity is required. High humidity increases the chance of disease, and water availability also affects the growth of crops.
This IoT-based automated project for measuring soil quality will help farmers achieve higher crop growth. This paper deals with developing the automated system for observing soil quality and providing the device to farmers. However, the approach has disadvantages: higher cost and network issues. The main obstacle is to build the device at a minimum price, because a high price is not affordable for every farmer, so the paper also aims to make the device affordable.
2 Research Motivation

The key to good agricultural production is keeping a regular update of the local weather and soil, which is difficult for both the farmer and the agricultural officer because many fields are impossible to test manually; in addition, such monitoring builds a good database of the changing climate and soil quality. We are driven to work on this
project because we want to be able to evaluate the soil and weather around the field automatically and remotely.
3 Literature Review

Farmers have traditionally used manual methods to measure soil quality, determining different soil parameters by hand. In the past, different types of soil indicators were used to measure soil quality [3], and later various microbiological indicators were applied [4]. Now, soil quality can be determined by collecting soil data using sensors, which is easier and more accurate. Farmers must take advantage of every chance to improve production efficiency, address other difficulties, and monitor yields. Water stress is one factor that influences crop output and quality. Farmers must keep enough water in the root zone to ensure optimum crop production, and irrigation has become a vital risk-control strategy [5]. Farmers must have a good understanding of soil moisture management before making irrigation management decisions. Irrigation water management and soil moisture control are the most effective ways to regulate root-zone soil water. Soil moisture monitoring technology has advanced to the point that it is now a cost-effective risk management tool, allowing the right crop to be chosen without additional effort; otherwise, crop growth is delayed and affected [6, 7]. If water is readily available from soil and plant surfaces, potential evapotranspiration is affected by (i) temperature, (ii) humidity, and (iii) moisture. Water evaporates from the field surface due to two thermal sources: solar radiation and temperature. The aerodynamic forces that influence evapotranspiration are air movement and humidity. Humidity affects the vapor pressure gradient of the atmosphere, and wind mixes and changes that gradient. The amount of water that plant roots can absorb is referred to as the total available water capacity.
It is the amount of water stored, or released, between the field capacity and the permanent wilting point water contents. Table 1 shows the average quantity of total available water in the root zone for various soil types [8].
4 Methodology

The goal of this project is to create a system that uses ThingSpeak, an IoT platform, to display sensor data online. The procedure is split into two parts: hardware development and software development. Hardware development entails the creation of circuits and the development of prototypes, while the software part covers IoT programming, circuit schematic diagrams, circuit simulation, and data collection [9].
Table 1 Moisture content of different types of soil

Soil type        Total available water (%)  Total available water (in./ft)
Loamy sand       17                         2.0
Loam             32                         3.8
Coarse sand      5                          0.6
Sandy loam       20                         2.4
Sandy clay loam  16                         1.9
Fine sand        15                         1.8
Silt loam        35                         4.2
Peat             50                         6.0
Clay             20                         2.4
Silty clay       22                         2.6
Clay loam        18                         2.2
Silty clay loam  20                         2.4
The system displays the weather condition by analyzing the current weather from the sensor values, using the sensors to monitor temperature, humidity, and air quality. All of the data are handled by an ESP8266 microcontroller, with the client receiving sensor data from the ESP8266 and displaying it on a serial monitor. The data are also visible on the ThingSpeak channel, which makes it easy for users to check online. To assure accuracy, the data are examined and compared with current conditions from Google Weather. The Internet of Things (IoT) connects the system to the user wirelessly and online, without manual checks (Fig. 1). As the dataflow diagram shows, the user can trigger the NodeMCU to connect with ThingSpeak over local WiFi and send the sensor data to the channel, which is then shown on the web page and in the ThingSpeak channel's private view. The data can also be stored, and the user can retrieve it at any time.
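The update path to ThingSpeak is a plain HTTP GET against its REST `/update` endpoint; the sketch below only builds the request URL (the field-number-to-sensor mapping is an assumption that must match the channel configuration, and the API key shown in the usage is a placeholder):

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def build_update_url(api_key, temperature, humidity, moisture):
    """Build the ThingSpeak channel-update URL for one set of sensor readings."""
    params = urlencode({
        "api_key": api_key,    # the channel's Write API Key
        "field1": temperature, # field assignments are assumed, not from the paper
        "field2": humidity,
        "field3": moisture,
    })
    return f"{THINGSPEAK_UPDATE}?{params}"
```

Requesting this URL (for example, every 15 s, the sampling interval used here) appends one row of readings to the channel.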
5 Hardware Components

5.1 ESP8266 NodeMCU

The ESP8266 NodeMCU (Espressif module) can be used like an Arduino with WiFi connectivity, which makes our work simple. The ESP8266 NodeMCU uses a Tensilica Xtensa LX106 32-bit RISC microcontroller. It has 16 general-purpose input/output pins, one analog pin, four SPI communication pins, and two UART interfaces, UART0 (RXD0 and TXD0) and UART1
Fig. 1 Dataflow diagram of the system
(RXD1 and TXD1). The firmware/program is uploaded through UART1. The ESP8266 NodeMCU is an open-source hardware and software platform that is simple to use [10, 11]. The board has an LDO voltage regulator to keep the voltage stable at 3.3 V, while the ESP8266's operational voltage range is 3–3.6 V. The regulator can dependably supply up to 600 mA, which should be more than enough given that the ESP8266 draws up to 80 mA during RF transmissions. The output of the regulator is also broken out to one side of the board and labeled 3V3; power can be supplied to external components via this pin. The onboard Micro-B USB connector provides power to the ESP8266 NodeMCU. Alternatively, the ESP8266 and its peripherals can be powered directly from the VIN pin; the maximum voltage in this scenario is 5 V [12, 13] (Fig. 2).
5.2 DHT11

The DHT11 is a basic digital temperature and humidity sensor. It measures the ambient air with a capacitive humidity sensor and a thermistor and outputs a digital signal on the data pin. It is easy to use, and new data can be acquired from it once every two seconds. It includes a 4.7 kΩ or 10 kΩ resistor that can be used to pull the data pin up to VCC [6]. The sensor includes a separate NTC thermistor for temperature measurement and an 8-bit microprocessor for serial output of the temperature and humidity measurements [14]. The sensor comes factory calibrated and is simple to
Fig. 2 ESP8266 NodeMCU (own captured)
Fig. 3 DHT11 sensor (own captured)
connect to other microcontrollers. With an accuracy of 1 °C and 1%, the sensor can monitor temperature from 0 to 50 °C and humidity from 20 to 90% [15] (Fig. 3).
5.3 Soil Moisture Sensor

This soil moisture sensor detects the moisture of the soil. It determines the volumetric water content of the soil and outputs the moisture level. The module has digital and analog outputs, as well as a potentiometer for adjusting the threshold level. It has four pins: VCC, GND, DO, and AO. The VCC pin powers the sensor, GND supplies the ground, AO is the analog output, and DO is the digital output. It runs on +5 V [6]. It consists of two parts:

The probe: The sensor contains a bifurcated probe with two exposed conductors, inserted into the ground or elsewhere to measure moisture content. It works as a variable resistor whose resistance varies with the amount of moisture in the soil.
Fig. 4 Soil moisture sensor (own captured)
The module: Based on the resistance of the probe, the module generates an output voltage that is available on the analog output (AO) pin. The same signal is delivered to a precision LM393 comparator, which converts it to digital data made available on the digital output (DO) pin. The module contains a built-in potentiometer for adjusting the digital output (DO) sensitivity [14, 16] (Fig. 4).
6 System Architecture

The system comprises the DHT11 temperature and humidity sensor, a capacitive soil moisture sensor, a PC, and the ESP8266 NodeMCU module (Fig. 5). The DHT11 module provides the microcontroller with digital values for temperature and relative humidity. The soil moisture sensor detects moisture in the soil and outputs an analog voltage proportional to the moisture content, which the NodeMCU then converts into a soil moisture value. The data are then transmitted to the ESP8266 through the serial port, which sends them over the internet to the ThingSpeak server. Once uploaded to the cloud server, the data can be accessed remotely; the temperature, relative humidity, and soil moisture data can all be found on the ThingSpeak channels [15, 17]. The data sent to the ThingSpeak channel can be accessed and shown in any app or website using the API, and the data can be exported as a CSV file. The data are displayed in Figs. 6, 7, and 8. Soil moisture sensors monitor the amount of water in the soil, a crucial parameter in agricultural environment studies. Measurement and monitoring of soil moisture are required to know when and how much to water the crops. Here the soil moisture is calculated using the gravimetric method. First, a soil sample is oven-dried to remove its moisture content and weighed. The dry sample
Fig. 5 Soil monitoring system diagram (own made)
Fig. 6 Humidity measurement
Fig. 7 Temperature measurement
Fig. 8 Moisture measurement
is then weighed again after a small amount of water is introduced. The weights of the dry sample and the wet sample are then used to compute the moisture. Both soil moisture sensors are put into the dried soil sample, and the resulting voltages are recorded. These voltages are then passed through the ESP8266 NodeMCU, which transforms them into moisture values [7, 18]. In the graphs, Fig. 6 shows the humidity data we captured while testing; it was near 32.75 to 33 most of the time, with readings taken every 15 s. Figure 7 shows about 72 degrees Fahrenheit most of the time, also measured every 15 s [19]. In Fig. 8, we can see the soil moisture, which is 0 when the probe is inserted into dry soil and greater than 0 when it is inserted into wet soil. We can get more reliable data by averaging the readings taken every 15 s over a longer period; for example, averaging all the data collected over 6 h gives a proper value.
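The gravimetric calculation described above is simply the mass of water relative to the mass of dry soil; a minimal sketch:

```python
def gravimetric_moisture(wet_g, dry_g):
    """Gravimetric soil moisture (%): water mass relative to the dry-soil mass."""
    return 100.0 * (wet_g - dry_g) / dry_g
```

For example, a sample weighing 100 g dry and 120 g wet contains 20 g of water, i.e. 20% gravimetric moisture.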
7 System Implementation and Testing

The diagram depicts the system's general design. Using the NodeMCU, the data input controller collects soil data via sensors. The collected data may be processed on a cloud platform, and the results are presented in the form of charts, with CSV files as values stored in the cloud. The user can examine the outcome of the data analysis. In Fig. 9, real-time sensor values with the time and date appear in the CSV file; those values can also be retrieved using the write API key [20]. The project was tested in Dhaka on 15 May 2021 and gave good results in testing, with accurate data: about 94% accuracy when compared with data obtained from Google and from other devices. The data flow was flawless as well.
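Reading the exported ThingSpeak CSV back into (timestamp, value) pairs is straightforward with the standard csv module. A hedged sketch (the column names created_at, entry_id, field1 follow ThingSpeak's usual export layout and should be adjusted if the channel is configured differently):

```python
import csv
import io

def parse_thingspeak_csv(text, field="field1"):
    """Parse a ThingSpeak channel CSV export into (timestamp, value) pairs."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if rec.get(field):  # skip rows where this field is empty
            rows.append((rec["created_at"], float(rec[field])))
    return rows

# Illustrative sample in ThingSpeak's export format (values are made up).
sample = """created_at,entry_id,field1
2021-05-15 10:00:00,1,32.75
2021-05-15 10:00:15,2,33.00
"""
```

Calling `parse_thingspeak_csv(sample)` yields one (timestamp, humidity) pair per 15-s reading, ready for averaging or plotting.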
8 Conclusion

In agriculture, this technology is used to remotely measure temperature, relative humidity, and soil moisture. The purpose of the inquiry is to determine temperature and humidity using the DHT11 and soil quality using a soil moisture sensor, so that soil and weather quality can be measured and the farmers helped by analyzing the data obtained.
Fig. 9 Exported data as CSV file
Acknowledgements The success of this initiative is primarily dependent on the support and guidance of many others. Md. Sadekur Rahman, Faculty of Computer Science and Engineering, Daffodil International University, has my heartfelt gratitude for his encouragement, support, and direct engagement in the successful completion of this project. Without my parents' support and supervision, no endeavour at any level can be completed satisfactorily. I would like to express my gratitude to my parents, who assisted me greatly in obtaining various information, collecting statistics, and coaching me from time to time in the creation of this project. Despite their busy schedules, they provided me with various ideas to complete this project.
References

1. J. Panuska, Methods to Monitor Soil Moisture. University of Wisconsin Extension, Cooperative Extension. Scott Sanford and Astrid
2. M. Monteiro, F. de Caldas Filho, L. Barbosa, L. Martins, J. de Menezes, D. da Silva Filho, University campus microclimate monitoring using IoT, in 2019 Workshop on Communication Networks and Power Systems (WCNPS), 2019. Available: https://doi.org/10.1109/wcnps.2019.8896242. Accessed 28 Jan 2022
3. B. Stenberg, Monitoring soil quality of arable land: microbiological indicators. Acta Agric. Scandinavica Section B: Soil Plant Sci. 49(1) (1999). https://doi.org/10.1080/09064719950135669
IoT Based Automated Monitoring System …
629
4. O. Heinemeyer, H. Insam, E.A. Kaiser, G. Walenzik, Soil microbial biomass and respiration measurements: an automated technique based on infra-red gas analysis. Plant Soil 116(2) (1989). https://doi.org/10.1007/BF02214547
5. D.V. Wattington, "Soil and Related Problems", Alliance of Crop, Soil, and Environmental Science Societies (1969)
6. N. Gahlot, V. Gundkal, S. Kothimbire, A. Thite, Zigbee based weather monitoring system. Int. J. Eng. Sci. (IJES) 4(4), 61–66 (2022). Accessed 28 Jan 2022
7. A. Sungheetha, R. Sharma, Real time monitoring and fire detection using internet of things and cloud based drones. J. Soft Comput. Paradigm (JSCP) 2(03), 168–174 (2020)
8. V. Kadam, S. Tamane, V. Solanki, Smart and connected cities through technologies (IGI-Global)
9. F. Joe, J. Joseph, IoT based weather monitoring system for effective analytics. Int. J. Eng. Adv. Technol. (IJEAT) 8(4), 311–315 (2022). Accessed 28 Jan 2022
10. R. Kodali, A. Sahu, An IoT based weather information prototype using WeMos, in 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (2016). Available: https://doi.org/10.1109/ic3i.2016.7918036. Accessed 28 Jan 2022
11. Southamptonweather.co.uk, 2022. [Online]. Available: http://southamptonweather.co.uk/evapotranspirationinline.php. Accessed: 28 Jan 2022
12. Insight Into ESP8266 NodeMCU Features and Using It With Arduino IDE (Easy Steps). Last Minute Engineers, 2022. [Online]. Available: https://lastminuteengineers.com/esp8266-nodemcu-arduino-tutorial/. Accessed: 28 Jan 2022
13. "Loading", Appchallenge.tsaweb.org, 2022. [Online]. Available: https://appchallenge.tsaweb.org/x/pdf/G1I6R3/esp8266-programming-nodemcu-using-arduino-ide-get-started-with-esp8266-internet-of-things-iot-projects-in-internet-of-things-internet-of-things-for-beginnersnodemcu-programming-esp8266-_pdf. Accessed: 28 Jan 2022
14. "DHT11, DHT22 and AM2302 Sensors", Adafruit Learning System, 2022. [Online]. Available: https://learn.adafruit.com/dht.
Accessed: 28 Jan 2022
15. "DHT11–Temperature and Humidity Sensor", Components101, 2022. [Online]. Available: https://components101.com/sensors/dht11-temperature-sensor. Accessed: 28 Jan 2022
16. "Soil Moisture Sensor Module", Components101, 2022. [Online]. Available: https://components101.com/modules/soil-moisture-sensor-module. Accessed: 28 Jan 2022
17. "Big Data Analytics for Smart and Connected Cities", Advances in Civil and Industrial Engineering, 2019. Available: https://doi.org/10.4018/978-1-5225-6207-8. Accessed 28 Jan 2022
18. H. Koresh, J. Deva, Analysis of soil nutrients based on potential productivity tests with balanced minerals for maize chickpea crop. J. Electron. Inf. 3(1), 23–35 (2021)
19. P. Darshini, S. Mohana Kumar, K. Prasad, S.N. Jagadeesha, A cost and power analysis of farmer using smart farming IoT system, in Computer Networks, Big Data and IoT (Springer, Singapore, 2021), pp. 251–260
20. D.D. Sanju, A. Subramani, V.K. Solanki, Smart city: IoT based prototype for parking monitoring and parking management system
Pattern Recognition on Railway Points with Machine Learning: A Real Case Study Alba Muñoz del Río, Isaac Segovia Ramirez, and Fausto Pedro García Márquez
Abstract Railway points are crucial components to ensure the reliability of railway networks. Several types of condition monitoring systems are widely applied in the industry with the aim of early fault detection, since the presence of faults can cause reductions in operational safety, delays and increased maintenance costs. The application of fault detection systems and pattern recognition tools is essential to ensure new improvements in the industry. The novelty proposed in this work is the application of statistical analysis and Machine Learning techniques in power curves defined by the movement of the motors in the opening and closing movements. The Shapelets algorithm is selected for pattern recognition, analyzing curves with abnormal distribution that demonstrate the presence of faults. The results provide high accuracy with performance measures above 90%.
1 Introduction

The railway industry is capable of covering long distances with high speed, suitability and reduced CO2 emissions [1]. The rail sector can provide significant advantages for the energy sector and for the environment. China, India, Japan and Russia are leading the industry, although new improvements are required to ensure the reliability, availability, maintainability and safety of the railway network. The future of the railway sector will be determined by the increase in transport demand and its adaptation to achieve competitiveness and technological innovation. Railway transportation systems are expected to grow around 2% between 2020 and 2025, with rail networks expanding by more than 430,000 track kilometres and carrying over 100 trillion passenger-kilometres by

A. Muñoz del Río · I. Segovia Ramirez (B) · F. P. García Márquez
Ingenium Research Group, Universidad Castilla-La Mancha, 13071 Ciudad Real, Spain
e-mail: [email protected]
A. Muñoz del Río
e-mail: [email protected]
F. P. García Márquez
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_52
A. Muñoz del Río et al.
Fig. 1 Passenger transport activity 2017, 2030 and 2050
2050, an increase of 27% over 2017, see Fig. 1 [2]. New technologies, enhanced reliability and safety, higher levels of automation and improvements in infrastructure are required to achieve significant improvements in the railway sector.

Current maintenance strategies are based on periodic inspections that cause elevated downtimes and issues in the railway network. Several condition monitoring approaches are widely applied to increase the efficiency of inspections and maintenance operations. The automation of fault detection is one of the critical challenges of the railway industry in maximizing railway capacity to achieve economic viability.

Railway points are critical components for the availability of the railway infrastructure, since they connect different tracks at certain places. Switches are one of the most essential components, presenting two positions, normal and reverse, in which the train either follows its normal trajectory or is deviated to another trajectory with a change of tracks, respectively, controlled by the movement of the blades. These movements are produced in specific areas with the installation of mechanical and electric devices, as shown in Fig. 2 [3]. Several types of railway turnouts based on hydraulic or electro-mechanic motors are widely applied in the industry. They must be positioned efficiently, maintaining the position while the train is crossing [4]. Railway switches are affected by hard environmental conditions and long downtime periods caused by failures arising from continuous openings and closings, e.g., misalignments, issues in lubrication or high vibration. The appearance of any of these faults affects the performance of the motors, modifying the opening and closing duration. The duration of the movement of the blades is established at less than 10 s, although it varies depending on the conditions or failures present in the system [5].
For this reason, railway points require advanced condition-based maintenance to achieve reliable normal and reverse movements. Railway points and switches require high availability and reliability with suitable maintenance operations. The data acquisition and the post-processing of the data
Pattern Recognition on Railway Points …
Fig. 2 Diagram of railway point (labelled parts: stock rails, switch blades, slide chairs, drive arm, detector rods). Adapted from [6]
allow condition-based maintenance and fault detection. Several types of Condition Monitoring Systems (CMS) are installed in railway systems to acquire reliable data about the real state of critical components. The main parameters widely acquired at railway points are vibration, opening and closing periods, and electronic parameters such as current and voltage, among others. The power curves summarise the information on voltage and current provided by the different CMS, being one of the most representative graphs of the real state of the motors. Hard environmental conditions and high railway traffic may cause undesired vibration signals and noise that affect the signal dataset acquired from the CMS. Maintenance operations are widely based on threshold limits that activate alarms once the failure has occurred, reducing the scope for preventive maintenance [7]. Fault detection algorithms and novel classification techniques are required to ensure suitable maintenance management according to proper levels of reliability, availability, maintainability and suitability [8]. Machine Learning algorithms have proven to be extremely useful in a variety of fields [9–12]. The application of novel algorithms for data classification, signal processing and data mapping is a fundamental phase for fault detection [13]. Several studies are found in the literature on data processing applying statistical analysis, wavelet analysis and Machine Learning techniques, among others. Kalman filters are applied by García et al. [14], detecting 100% of faults in the reverse movement and 97.1% in the normal movement. Novel robust models have been developed to obtain reliable references for fault detection [15, 16]. Kim et al. [17] proposed dynamic time warping for data classification of the standard form of current curves. This approach can identify normal and faulty curves with high accuracy. The electric current and voltage of the motors have been introduced in different research studies.
García et al. [18] analyzed the force and the electric consumption
of the motors using continuous polynomial B-spline functions. The method showed robust results, detecting 80% of the specific type of fault and the presence of anomalies in 100% of the cases. Several authors combined Support Vector Machines (SVM) with other machine learning techniques to increase the reliability of the results, optimizing the fault detection process [19]. A novel CMS based on sound analysis is proposed by Yongkui et al. [20]. They proposed an efficient feature extraction with SVM, showing the best performance in comparison with traditional techniques. The combination of SVM with statistical tools can detect failures in more than 80% of cases [21]. SVM combined with neural networks has been applied to define prediction models with reduced error rates [22]. Asada et al. [23] detected misalignment faults combining SVM with wavelet transforms, analyzing voltage data. Zhang [24] applied a neural network to analyze variations in the current curve. The author proposed basic tests, obtaining accuracy close to 100%. Traditional Machine Learning algorithms require high computational costs and training with limited accuracy, so novel techniques are required to improve the reliability of the analysis. The Shapelets algorithm is one of the most relevant techniques for time series analysis and pattern recognition. Different applications of the shapelet algorithm have been found in the literature; however, in no case has it been applied to the analysis of power curves. Several authors have demonstrated the reliability of the technique for time series analysis and pattern recognition [25, 26]. Hills et al. [27] applied the shapelets algorithm to a specific time series problem. The authors quantify the measures of the shapelets and the shapelet selection, presenting a real case study of image classification and providing results with high accuracy. Ji et al. [28] used the shapelet algorithm for time series classification.
The authors introduced the main concepts about shapelets, presenting a real case study with different datasets. The results showed high accuracy with higher speed than other time series classification techniques.

The novelty presented in this paper is the application of shapelets algorithms for fault pattern recognition of railway switches, analyzing the power curve defined by the motors in the opening and closing movements. The shapelet algorithm has already been applied in different application fields, so the novelty of this paper does not lie in the application of the shapelets algorithm itself but in the whole methodology that analyzes patterns associated with failures in the power curves of the motors. This paper is organized as follows: Sect. 2 presents the approach and the main concepts of the algorithm applied in this paper; Sect. 3 presents a real case study applying shapelets; Sect. 4 presents the results of the shapelets algorithm; and Sect. 5 summarizes the results and conclusions of the paper.
2 Approach

After reviewing the current state of the art in railway switch fault detection, it is determined that SVM is one of the most applied techniques, although novel techniques
with higher reliability and speed are required. The aim of this paper is the detection of faults in the switches of railway points through the analysis of power curves with a new machine learning technique not applied until now in this application field. These curves have been selected for this approach because of their reliability and the possibility of finding patterns associated with failures in them. The approach described in this work applies the machine learning shapelets algorithm to detect patterns associated with faults, as observed in Fig. 3. The first phase is the sensing and data acquisition process of real data. A signal pre-processing is applied to obtain the normal curves associated with habitual movements without failure. The feature extraction and pattern recognition are developed with the Shapelets algorithm. It is necessary to know the values of a normal power curve to identify faults in the electrical system. For this purpose, a statistical analysis of all power curves available in the data is carried out. Once the normal values of a power curve are known, one curve is selected from the available data as a reference pattern for a normal curve. The Pearson correlation coefficient is then used to classify the rest of the data curves into normal and non-normal with respect to the reference curve. Pearson's correlation coefficient is a measure of linear dependence between two quantitative variables X and Y. The coefficient is given by (1), where cov(X, Y) is the covariance of X and Y, and var(X) and var(Y) are the variances of X and Y:

ρ(X, Y) = cov(X, Y) / √(var(X) var(Y))    (1)
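As a concrete illustration, Pearson's coefficient (1) and the threshold-based labelling can be sketched in a few lines of Python. This assumes the two curves have already been resampled to a common length; the function names and the 0.7 default are illustrative choices, not the authors' code:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def label_curve(curve, reference, threshold=0.7):
    """Label a power curve against the reference pattern as normal/non-normal."""
    return "normal" if pearson(curve, reference) >= threshold else "non-normal"
```

A curve that is a scaled copy of the reference gets a coefficient of 1 and is labelled normal; an uncorrelated or inverted curve falls below the threshold and is labelled non-normal.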
Once the Pearson correlation coefficient has been applied, a labelled dataset of curves classified as normal and non-normal is obtained. This dataset is used to train the selected Machine Learning algorithm for pattern recognition. The term pattern
Fig. 3 Block schematic of proposed approach
recognition is used in this paper in accordance with its formal definition as the discipline of classifying objects into a number of categories or classes [29]. In this particular study, the objects are data structured as time series that are classified into different classes. The Shapelets algorithm has been chosen because it is a data mining algorithm for time series classification with high efficiency in classifying different patterns. The algorithm is based on the use of subsequences or patterns of the time series called shapelets. A shapelet is defined as a subsequence of the time series that characterises a class; a shapelet is discriminant if it occurs in the majority of the time series of a class. The algorithm uses the shapelets extracted during training to identify the most discriminant shapelet in the time series to be classified, and the series is tagged with the label corresponding to that shapelet. A distance between a shapelet and the pattern is required to quantify the similarity between them. The Euclidean distance is usually applied between the pattern x to be analysed and the shapelet s. It is calculated as described in (2), where x(t, t+l) is the subsequence of x starting at time index t and ending at t + l:

d(x, s) = min_t ‖x(t, t+l) − s‖₂    (2)
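Equation (2) translates directly into a sliding-window minimum over all subsequences of x with the shapelet's length. A minimal Python sketch (the naming is ours):

```python
import math

def shapelet_distance(x, s):
    """Minimum Euclidean distance between shapelet s and any subsequence
    of series x of the same length, Eq. (2)."""
    l = len(s)
    best = math.inf
    for t in range(len(x) - l + 1):
        # Euclidean distance between s and the window x[t:t+l]
        d = math.sqrt(sum((x[t + i] - s[i]) ** 2 for i in range(l)))
        best = min(best, d)
    return best
```

If the shapelet occurs exactly somewhere in the series, the distance is zero; otherwise it measures how far the closest matching window is.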
The k-fold cross-validation method is used to validate the results of the algorithms. This method repeats the training and testing phases of the algorithm k times with different partitions of the original dataset. The objective is to verify that the obtained results do not depend on the partition of the data used for training and testing. The final result is calculated as the average of the results obtained in each iteration. The results of the algorithm are evaluated on the basis of the well-known performance measures accuracy, recall and precision. These measures are calculated by (3)–(5), where true positives (TP) and true negatives (TN) are the items correctly classified as positive or negative by the algorithm, and false positives (FP) and false negatives (FN) are the elements wrongly classified by the algorithm in the opposite class:

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (3)

Recall = TP / (TP + FN)    (4)

Precision = TP / (TP + FP)    (5)
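The three measures follow directly from the confusion-matrix counts; a minimal helper using the standard denominators (recall = TP/(TP + FN), precision = TP/(TP + FP)):

```python
def performance(tp, tn, fp, fn):
    """Accuracy, recall and precision from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)     # share of real positives that are recovered
    precision = tp / (tp + fp)  # share of predicted positives that are real
    return accuracy, recall, precision
```

For example, with TP = 8, TN = 9, FP = 1, FN = 2 this gives an accuracy of 0.85 and a recall of 0.8.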
3 Case Study

This paper presents a real case study analyzing power curves recorded at several railway points in Spain. The power curves represent the amperage evolution over time, providing characteristic patterns in normal and reverse movements. This analysis focuses on data for a full month, since two critical maintenance activities were carried out in that month. The original dataset has been provided by an international company and consists of 1,048,576 records of engine power of real trains collected every 10 ms over the course of a month. The direction of point movement and other values related to engine power are also collected in the original dataset. The recording is carried out using CMS, with the engine power curves as the object of study in this analysis. The curves are statistically analysed, obtaining an average normal curve length of 270 values and an average value of 2.72 A. A normal curve without detectable alterations is selected from the available data, showing a suitable shape with respect to a standard power curve without defects. The selected normal curve is plotted using the real dataset, as shown in Fig. 4, and it is used as a reference pattern to classify the remaining curves of the dataset. Pearson's correlation coefficient is applied to classify the remaining curves into normal and non-normal. High values of the coefficient, between 0.8 and 0.9 according to the literature, are tested first. These values produce a very strict selection of normal curves, labelling as non-normal a large number of curves that are in fact very similar to the normal curve. However, the aim of this study is the detection of non-normal curves that are distinctive and clearly differentiated from the normal curve, in order to detect patterns associated with failures. For this purpose, a Pearson correlation coefficient
Fig. 4 Normal power curve selected for the analysis
lower than 0.7 is adjusted to obtain a classification with differentiated non-normal curves. After applying the Pearson correlation coefficient, a labelled dataset is generated from this classification to train the Machine Learning algorithms. This dataset contains 93% normal curves, and the remaining 7% are classified as non-normal curves. The normal curves composing the dataset are represented in Fig. 5a, showing similar shape and length to the reference curve shown in Fig. 4, and only a few
Fig. 5 a Normal curves. b Non-normal curves
curves seem to deviate from the normal distribution. Non-normal curves are plotted in Fig. 5b, where the curves have no similarities either in values or in length with respect to the initial pattern. Since the dataset is highly unbalanced, it is automatically balanced before applying the Machine Learning algorithms. The balancing process randomly removes normal curves until a balanced dataset is obtained. This process is carried out in order to perform adequate training of the algorithms.
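The random undersampling step can be sketched as follows. The paper does not name a specific library or procedure, so this is only an illustration; the fixed seed exists solely to make the sketch reproducible:

```python
import random

def balance_by_undersampling(normal, non_normal, seed=0):
    """Randomly drop normal curves until both classes are the same size."""
    rng = random.Random(seed)
    kept = rng.sample(normal, len(non_normal))  # random subset, no replacement
    return kept, non_normal
```

Applied to the case-study proportions (93 normal curves per 7 non-normal), this keeps a random 7 of the 93 normal curves, yielding a 50/50 training set.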
4 Shapelet Results

The results obtained have been validated with k-fold cross-validation. In this case, a 5-fold cross-validation is applied in accordance with the size of the dataset. The results obtained from applying the 5-fold cross-validation to the shapelet algorithm are presented in Table 1. The results are accurate, with performance measures above 90% in each case. A single execution of the Shapelets algorithm is analysed to verify that the results are in line with the reality of each execution. These results are collected in Table 2, which shows that the Shapelets algorithm is indeed correctly classifying the power curves. After executing Shapelets several times, it is verified that the algorithm recognises the curves that correspond to the dates of maintenance activities as non-normal in all cases. This result is critical because the aim of this work is to recognise the real faults produced in railway switches. The application of the Shapelets algorithm to the classification of specific types of failures is proposed as future work, with new case studies and tests.

Table 1 Results of 5-fold cross-validation (Shapelet algorithm)

             | Average values (%) | Standard deviation
Precision    | 92.88              | 0.0272
Recall       | 94.23              | 0.0183
Accuracy     | 93.34              | 0.0199

Table 2 Results for one execution of the Shapelet algorithm

Real values       | Classified non-normal (%) | Classified normal (%)
Non-normal curve  | 94.27                     | 5.73
Normal curve      | 4.3                       | 95.7
5 Conclusions

Suitable inspection techniques and maintenance operations for railway points are needed to ensure high safety and reduced downtimes in railway transportation. Advanced analysis tools are essential to analyze the large amounts of data produced by the different monitoring systems. This paper proposes the application of statistical analysis and Machine Learning algorithms for the detection of faults in railway points. The approach is based on pattern recognition in power curves defined by the motors in the opening and closing movements. A real case study is presented with power curve data collected from underground facilities in Spain. The Shapelets algorithm is selected for this work because of its high speed and reliability in time series pattern classification. The results obtained by the Shapelets algorithm are validated with the k-fold cross-validation method. One execution of the algorithm is also analysed to verify that the results obtained by k-fold cross-validation are reliable. The Shapelets algorithm correctly classifies the majority of the curves, obtaining performance measures above 90% in all cases. It is demonstrated that the algorithm correctly classifies the non-normal curves corresponding to maintenance tasks. It is concluded that advanced data analysis techniques, such as Machine Learning and statistical algorithms, are accurate tools for pattern recognition and fault detection, increasing the reliability of maintenance management plans.
References

1. P. Singh et al., Deployment of autonomous trains in rail transportation: current trends and existing challenges. IEEE Access 9, 91427–91461 (2021)
2. M. Intelligence, Gaming market-growth, trends, COVID-19 impact, and forecasts (2021–2026) (2020)
3. F.P. García Márquez, C. Roberts, A.M. Tobias, Railway point mechanisms: condition monitoring and fault detection. Proc. Instit. Mech. Eng. Part F: J. Rail Rapid Transit. 224(1), 35–44 (2010)
4. V. Atamuradov et al., Railway point machine prognostics based on feature fusion and health state assessment. IEEE Trans. Instrum. Meas. 68(8), 2691–2704 (2018)
5. V. Atamuradov et al., Failure diagnostics for railway point machines using expert systems, in 2009 IEEE International Symposium on Diagnostics for Electric Machines, Power Electronics and Drives (IEEE, 2009)
6. M. Hamadache et al., On the fault detection and diagnosis of railway switch and crossing systems: an overview. Appl. Sci. 9(23), 5129 (2019)
7. I.S. Ramirez, B. Mohammadi-Ivatloob, F.P.G. Márqueza, Alarms management by supervisory control and data acquisition system for wind turbines. Eksploatacja i Niezawodność 23(1) (2021)
8. F.P.G. Márquez, F. Schmid, J.C. Collado, Wear assessment employing remote condition monitoring: a case study. Wear 255(7–12), 1209–1220 (2003)
9. P. Karuppusamy, Building detection using two-layered novel convolutional neural networks. J. Soft Comput. Paradigm (JSCP) 3(01), 29–37 (2021)
10. T. Smitha, A study on various mesh generation techniques used for engineering applications. J. Innov. Image Proc. 3(2), 75–84 (2021)
11. H.K. Andi, An accurate bitcoin price prediction using logistic regression with LSTM machine learning model. J. Soft Comput. Paradigm 3(3), 205–217 (2021)
12. J.I.Z. Chen, P. Hengjinda, Early prediction of coronary artery disease (CAD) by machine learning method-a comparative study. J. Artif. Intell. 3(01), 17–33 (2021)
13. M. McHutchon, W. Staszewski, F. Schmid, Signal processing for remote condition monitoring of railway points. Strain 41(2), 71–85 (2005)
14. F.P. García Márquez et al., A reliability centered approach to remote condition monitoring. A railway points case study. Reliab. Eng. Syst. Saf. 80(1), 33–40 (2003)
15. F.P.G. Marquez, D.J.P. Tercero, F. Schmid, Unobserved component models applied to the assessment of wear in railway points: a case study. Eur. J. Oper. Res. 176(3), 1703–1712 (2007)
16. F.P.G. Márquez, D.J. Pedregal, Applied RCM 2 algorithms based on statistical methods. Int. J. Autom. Comput. 4(2), 109–116 (2007)
17. H. Kim et al., Fault diagnosis of railway point machines using dynamic time warping. Electron. Lett. 52(10), 818–819 (2016)
18. F.P. García Márquez, J.M. Chacón Muñoz, A.M. Tobias, B-spline approach for failure detection and diagnosis on railway point mechanisms case study. Q. Eng. 27(2), 177–185 (2015)
19. Y. Yang, C. Tao, R. Zhang, Fault diagnosis of switch control circuit using support vector machine optimized by genetic algorithm. Comput. Measur. Control 21(1), 48–50 (2013)
20. S. Yongkui et al., Condition monitoring for railway point machines based on sound analysis and support vector machine. Chin. J. Electron. 29(4), 786–792 (2020)
21. O. Eker, F. Camci, U. Kumar, SVM based diagnostics on railway turnouts. Int. J. Performability Eng. 8(3), 289 (2012)
22. B. Arslan, H. Tiryaki, Prediction of railway switch point failures by artificial intelligence methods. Turk. J. Electr. Eng. Comput. Sci. 28(2), 1044–1058 (2020)
23. T. Asada, C. Roberts, T.
Koseki, An algorithm for improved performance of railway condition monitoring equipment: alternating-current point machine case study. Transp. Res. Part C: Emerg. Technol. 30, 81–92 (2013)
24. K. Zhang, The railway turnout fault diagnosis algorithm based on BP neural network, in 2014 IEEE International Conference on Control Science and Systems Engineering (IEEE, 2014)
25. L. Ye, E. Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min. Knowl. Disc. 22(1), 149–182 (2011)
26. J. Lines et al., A shapelet transform for time series classification, in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012)
27. J. Hills et al., Classification of time series by shapelet transformation. Data Min. Knowl. Disc. 28(4), 851–881 (2014)
28. C. Ji et al., A fast shapelet selection algorithm for time series classification. Comput. Netw. 148, 231–240 (2019)
29. S. Theodoridis, K. Koutroumbas, Pattern Recognition (Elsevier, 2006)
Sustainability in Development of Grant Applications Sylvia Encheva
Abstract The aim of this work is to present a novel method for transitioning from the competition to the collaboration phase in the process of preparing applications for external support of research projects. Concept similarity is suggested as a means to search for semantic similarities among differently formulated grant applications and thereafter offer interested parties possibilities for collaborative work. During the entire process of similarity search and eventual suggestions for collaboration among competing teams, there is no disclosure of the proposals' contents or the ideas they are built upon. The presented approach may very well be used in other types of project application cases, where both public and private organizations are involved.
1 Introduction

Writing proposals for external support of research requires a lot of time and effort, and yet, quite naturally, most proposals are not or cannot be granted. Looking at the figures for applications sent to the Research Council of Norway (NRC) in February 2022 can help to illustrate this point. While the NRC has 18 billion Norwegian kroner at its disposal to support project applications in 2022, the amount researchers applied for is nearly 21 billion [1]. This alone exemplifies very well the need to find a way to make use of all the good work put into application development. A large number of good ideas are lost in all those cases where a significant number of researchers feel discouraged after rejection and, as a result, lose interest in working on further applications. For the rest of this article, proposals and applications will be used interchangeably.

Let us focus on applications that do not receive support. Here we propose an approach that granting organizations could use to foster collaboration between previously competing teams in the preparation of their future applications. Note that

S. Encheva (B) Western Norway University of Applied Sciences, Inndalsveien 28, Post Box 7030, 5020 Bergen, Norway
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_53
granting organizations do not disclose information about the contents of submitted applications. Their job is usually completed by informing all applicants about the final status of their proposals. Typical feedback consists of marks or points assigned to each criterion, along with a couple of sentences addressing possible areas that need improvement. What we suggest is that, once this is done, they can start looking at the rejected applications and find out more about the concepts and approaches these applications contain. Some proposals may address similar research questions, have a similar scientific context, have similarities in primary and secondary objectives, plan on using similar methodology, and foresee similar outcomes and impacts (see, e.g., [2] and [3]). Such teams could be put in contact with other teams who have submitted applications with a degree of similarity in some of the above-mentioned parts of their proposals. This way, teams that agree that they need additional expertise, for example, might be interested in participating in this arrangement and afterward establishing new fruitful collaborations. Concept similarity techniques used in text analytics [4] and near sets [5] are employed to realize the proposed approach. The rest of the paper is organized as follows. Theoretical background is presented in Sect. 2, concept similarities are discussed in Sect. 3, and the paper ends with a conclusion in Sect. 4.
2 Theoretical Background Text mining, or text data mining, is the process of extracting valuable information from text by applying methods from machine learning, statistics, and computational linguistics [6, 7]. Inter-document similarity computation is the focus of [8]. The authors present there a new technique, Context Semantic Analysis, based on a model called Semantic Context Vector. Semantic similarity is often referred to as a metric based on the likeness of documents' meaning/semantic content. It can be measured by applying a variety of algorithms like the Jaccard index, Sørensen–Dice index, Manhattan distance, Euclidean distance, and cosine similarity:

$$\text{Similarity}(D_1, D_2) = \frac{\sum_{i=1}^{n} D_{1i}\, D_{2i}}{\sqrt{\sum_{i=1}^{n} (D_{1i})^2}\;\sqrt{\sum_{i=1}^{n} (D_{2i})^2}},$$

where D1, D2 are two documents and n is the size of the feature vector. A cosine similarity measure between two vectors is employed in [9]. The k-nearest neighbors (KNN) algorithm is a popular supervised machine learning algorithm and can be used for classification and regression [10]. Similarity hashing [9, 11] transforms documents into similarity vectors and can also be used to facilitate the process of finding thematically close proposals. A widget based on the SimHash method from [9] can be found in [12]. For recent results
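The cosine similarity formula above can be sketched in a few lines over term-count vectors. This is a minimal illustration, not the tooling used in the paper; the function and variable names are our own:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents over their term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    terms = set(a) | set(b)                       # shared feature space of size n
    dot = sum(a[t] * b[t] for t in terms)         # sum of D1_i * D2_i
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical documents score 1, documents with no shared terms score 0, and partially overlapping documents fall in between.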
Sustainability in Development of Grant Applications
on pre-trained neural networks and big data analytics techniques, see [13] and [14], respectively. Near set theory considers two disjoint sets of objects to be near each other if the descriptions of their objects are found to be the same under an indiscernibility relation [15]. Near sets can be used to find similarities in graphically represented ideas and concepts. The Petersen graph [16] is a 3-regular graph with 10 vertices and 15 edges. It is well discussed in the graph theory literature and is named after the Danish mathematician Julius Petersen. Petersen graph pattern techniques are applied for automated detection of heart valve diseases with PCG signals [17]. The unit-distance Clebsch graph, or the Greenwood–Gleason graph [18], is a strongly regular quintic graph with 16 vertices and 40 edges. The Golomb graph [19] is a unit-distance graph with 10 vertices and 18 edges.
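The Petersen graph's defining counts (10 vertices, 15 edges, 3-regular) can be checked directly from its standard construction: an outer 5-cycle, five spokes, and an inner pentagram. A small sketch for illustration:

```python
def petersen_edges():
    """Edge list of the Petersen graph: outer 5-cycle, spokes, inner pentagram."""
    outer = [(i, (i + 1) % 5) for i in range(5)]           # outer cycle 0..4
    spokes = [(i, i + 5) for i in range(5)]                # outer i -- inner i+5
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # pentagram on 5..9
    return outer + spokes + inner

edges = petersen_edges()
degree = {v: 0 for v in range(10)}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

Running the checks confirms 15 edges on 10 vertices, each of degree 3.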
3 Concepts Similarities Among Applications for External Support to Research Projects While a lot of research is done on project management and project leadership, there is not much focus on what happens with rejected proposals. Most of the time, a team behind a rejected proposal is advised to take into account the referees' comments and recommendations, improve the current version, and resubmit it. More often than not, applying teams receive information about the status of their proposal, the titles of the granted ones, and eventually the titles of the rest of the submitted proposals along with the names of coordinating organizations. While authors' rights to their ideas are well taken care of, there are not many attempts to facilitate collaboration among researchers. In this work, we propose an approach where granting organizations can make use of their knowledge about submitted proposals and come up with suggestions to less fortunate applicants to consider collaborating with other teams. Teams behind rejected applications can be asked whether they would be interested in participating in a similarity search among non-granted applications, knowing that their project could be found somewhat similar to others. At the same time, it has to be made crystal clear to all of the involved teams that they can be part of this arrangement if and only if they are willing to do so. After a similarity has been established, the corresponding teams will receive information about it and a suggestion to consider making contact, aiming at future collaboration. Figure 1 illustrates where in the process of proposal ranking a semantic similarity search could take place. The similarity ratio may vary depending on the topics' descriptions. Once its value is established, it will be up to the corresponding teams to consider its importance and act upon it. Examples of portfolios are health, climate changes, education, the ocean, industry and services, energy, transport, etc. Calls usually contain descriptions of more specific
Fig. 1 Project similarities
topics and provide details on requirements, evaluation criteria, deadlines, and so on. In this work we consider the following evaluation criteria:
• scientific context (C1),
• primary and secondary objectives (C2),
• research methodology (C3),
• outcomes (C4), and
• impacts (C5).
The outcome of a concept similarity search [20] is illustrated in Fig. 2. The meaning of a text unit is converted to a semantic fingerprint where each subunit is represented by a blue dot, and the overlapping ones are colored orange. The semantic similarity between texts is represented as a score in the interval [0, 1], where the meaning is closer as the value approaches 1. This tool can be used to find degrees of concept similarity between rejected proposals and thereafter suggest to those that are found similar to consider preparing a joint proposal next time.
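One simple way to turn overlapping fingerprint positions into a [0, 1] score is a Jaccard-style ratio over the sets of active positions ("dots"). The actual scoring in [20] may differ; this is a hedged illustration with our own encoding:

```python
def fingerprint_overlap(fp_a: set, fp_b: set) -> float:
    """Score in [0, 1]: fraction of fingerprint positions shared.

    Fingerprints are modeled as sets of active positions; the shared
    positions correspond to the overlapping (orange) dots in Fig. 2.
    """
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Identical fingerprints score 1, disjoint ones score 0, and the score grows toward 1 as the overlap grows.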
Fig. 2 Similarity
Fig. 3 Proposals with high marks on two criteria
3.1 Visualization A Petersen graph, Fig. 3, is employed to visualize proposals with high marks on certain criteria, shown in the last five elements in Fig. 1, and at the same time having concept similarities. Proposals placed in a node have received high marks on the criteria in the node's label. An edge between two nodes indicates concept similarity in the description of a criterion not found among their labels. Thus: • An edge between the nodes (C2, C3) and (C1, C4) in Fig. 3 connects proposals with high marks on (C2, C3) and (C1, C4), respectively, and indicates similarity in C5. It can be useful to look at similarities, e.g., in C5, where the proposals are not given high marks.
Fig. 4 Proposals with high marks on two and four criteria
• (C2, C3) can contact (C4, C5), (C1, C4), and (C1, C5) if (C2, C3) is interested in similarity in C1, C5, and C4, respectively. • (C2, C3) and (C1, C4) can contact (C1, C5), (C2, C5), and (C3, C5) if they would like to collaborate with a team whose application has received a high mark on C5. Other graphs, like the Clebsch graph and the Golomb graph, can be used to visualize cases with different numbers of evaluation criteria and similarities. The Clebsch graph can be used to visualize cases where, in addition to proposals with high marks on two criteria (as illustrated with the Petersen graph), there are also proposals with high marks on four criteria. An edge between a node containing proposals with high marks on four criteria and a node containing proposals with high marks on two criteria indicates similarity in the criterion that can be found in the two-criteria node but not in the four-criteria node, Fig. 4. Thus an edge between the nodes labeled (C1 C2 C3 C5) and (C2 C4) indicates similarity in criterion C4, while an edge between the nodes labeled (C1 C3 C4 C5) and (C2 C3) indicates similarity in criterion C2. Such edges are colored green, in contrast to those colored black, which connect two nodes containing proposals with high marks on two criteria. Proposals placed in the green node in Fig. 5 have high grades on all criteria, and an edge between two nodes indicates similarity. Proposals in red nodes have high grades on 4 criteria, whereas proposals in one of them do not share similarities with any of those placed in the green node. Proposals in blue nodes have high grades on 3 criteria, whereas proposals in one of them do not share similarities with any of those placed in the green node. Proposals in yellow nodes have high grades on 2 criteria, where proposals in one of them do not share similarities with any of those placed in the green node. However, all proposals in the red, blue, and yellow
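For the two-criteria nodes of the Petersen-graph layout, the edge rule (an edge marks similarity in the criterion absent from both labels) reduces to a small set computation. A sketch with names of our own choosing:

```python
CRITERIA = {"C1", "C2", "C3", "C4", "C5"}

def edge_similarity_criteria(node_a: set, node_b: set) -> set:
    """Criteria a connecting edge may indicate similarity in: those
    appearing in neither node's label. For two two-criteria nodes in the
    Petersen-graph layout this is a single criterion."""
    return CRITERIA - node_a - node_b
```

For example, the edge between (C2, C3) and (C1, C4) yields C5, matching the bullet above; the Clebsch-graph case for four-criteria nodes uses a different rule and is not covered here.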
Fig. 5 Golomb graph
nodes that do not share similarities with any of the ones placed in the green node share similarities with each other. Discussion If applications are found to be near in the sense of near set theory, the corresponding teams could be advised to collaborate in order to develop a joint proposal. If they are not near, it is worth investigating whether they are complementary. In that case, they can also be advised to collaborate and thus develop a stronger application for the future. Semantic similarity can also be applied among granted projects. Examples of questions that could receive answers after a similarity search are: What do the top-ranked projects have in common apart from being given excellent marks by evaluation committees? What is the difference between a granted proposal and a rejected one in case they have received similar grading? Is there a way to encourage such teams to collaborate in the future? The tool described in [20] can be used by a granting organization to place proposals in separate pools based on their similarity and thus rank them more easily within each pool. Proposals that are selected for the last round of ranking ought to come from different pools, and thus cover several areas. This would definitely save time and effort for the evaluation committee members.
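The pooling idea above can be sketched as single-linkage grouping under a similarity threshold: two proposals land in the same pool whenever a chain of sufficiently similar proposals connects them. The function name, threshold, and toy similarity below are our own assumptions, not part of the paper's tooling:

```python
def pool_by_similarity(proposals, similarity, threshold=0.7):
    """Group proposals into pools: two proposals share a pool when a chain
    of pairwise similarities >= threshold connects them (single linkage)."""
    pools = []
    for p in proposals:
        # Find every existing pool linked to p and merge them with p.
        linked = [pool for pool in pools
                  if any(similarity(p, q) >= threshold for q in pool)]
        merged = [p]
        for pool in linked:
            merged.extend(pool)
            pools.remove(pool)
        pools.append(merged)
    return pools
```

With a toy similarity that matches proposals by their first letter, "apple" and "ant" end up pooled together while "bee" stays alone.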
4 Conclusion In this work, we raise a question about sustainability in project proposals’ development.
Rejected and somewhat similar projects can be advised to consider further collaboration, while somewhat complementary projects can be advised to look at the possibility of developing a larger project together. Last but not least, teams with rejected proposals conceptually close to a granted one can receive a hint that it is in their best interest to consider a new focus area for their next proposal rather than work on improving the current one.
References
1. https://khrono.no/sa-mange-prosjekter-kniver-om-forskningsradet-milliarder/660598
2. https://ec.europa.eu/info/research-and-innovation/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-europe_en
3. https://www.forskningsradet.no/en/
4. C. Zong, R. Xia, J. Zhang, Text Data Mining (Springer, Berlin, 2021)
5. J.F. Peters, Near sets. Special theory about nearness of objects. Fundam. Inf. 75(1–4), 407–433 (2007)
6. M. Allahyari et al., A brief survey of text mining: classification, clustering and extraction techniques (2017). Retrieved from https://www.semanticscholar.org/paper/A-Brief-Survey-of-Text-Mining
7. J.A. Atkinson-Abutridy, Text Analytics: An Introduction to the Science and Applications of Unstructured Information Analysis (Independently published, 2020)
8. F. Benedetti, D. Beneventano, S. Bergamaschi, G. Simonini, Computing inter-document similarity with context semantic analysis. Inf. Syst. 80, 136–147 (2019)
9. M. Charikar, Similarity estimation techniques from rounding algorithms, in STOC'02: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing (2002), pp. 380–388
10. K. Jamsa, Introduction to Data Mining and Analytics (Navigate, 2021)
11. K. Li, G.-J. Qi, J. Ye, T. Yusuph, K.A. Hua, Supervised ranking hash for semantic similarity search, in IEEE International Symposium on Multimedia (ISM) (2016), pp. 551–558
12. https://orange3-text.readthedocs.io/en/latest/widgets/similarityhashing.html
13. C. Anand, Comparison of stock price prediction models using pre-trained neural networks. J. Ubiquitous Comput. Commun. Technol. (UCCT) 3(02), 122–134 (2021)
14. S. Subarna, S. Smys, Big data analytics for improved risk management and customer segregation in banking applications. J. ISMAC 3(03), 235–249 (2021)
15. C. Henry, J.F. Peters, Arthritic hand-finger movement similarity measurements: tolerance near set approach. Comput. Math. Methods Med. (2011)
16. G. Chartrand, H. Hevia, R.J. Wilson, The ubiquitous Petersen graph. Discrete Math. 100(1–3), 303–311 (1992)
17. T. Tuncer, S. Dogan, R.-S. Tan, U.R. Acharya, Application of Petersen graph pattern technique for automated detection of heart valve diseases with PCG signals. Inf. Sci. 565, 91–104 (2021)
18. https://www.win.tue.nl/~aeb/drg/graphs/Clebsch.html
19. G. Exoo, D. Ismailescu, The chromatic number of the plane is at least 5: a new proof. Discrete Comput. Geom. 64, 216–226 (2020)
20. https://www.cortical.io/
New Category of Equivalence Classes of Intuitionistic Fuzzy Delta-Algebras with Their Applications Azeez Lafta Jaber and Shuker Mahmood Khalil
1 Introduction Zadeh [1] investigated the idea of a fuzzy set as a class of objects having a continuum of membership grades in 1965. In 1986, Atanassov [2] introduced the intuitionistic fuzzy set (IFS) by using the notion of non-membership, which appears to be more accurate for uncertainty quantification and gives the possibility to properly describe a problem based on existing knowledge and observations. Following that, many mathematicians have been investigating the concept of intuitionistic fuzzy sets in many domains [3–7]. The notion of (IFS) is a non-classical set, like soft sets [8–13], fuzzy sets [14–18], permutation sets [19–24], nano sets [25], neutrosophic sets [26–29], and others [30]. Imai and Iseki [31] looked into BCK-algebra, a kind of logic algebra. After that, Neggers and Kim [32] looked into the concept of d-algebras. Khalil and Abud Alradha [33] introduced the notions of ρ-algebra, ρ-ideal, ρ-subalgebra, and permutation topological ρ-algebra for the first time in 2017. Following that, several extensions of this concept, such as soft ρ-algebra [34], fuzzy ρ-algebra [35], and intuitionistic fuzzy ρ-algebra [36], were studied. Khalil and Hassan [37] introduced the concept of δ-algebra in 2021. The concept of δ-algebra was described and analyzed in fuzzy sets [38], and its applications were addressed, in which the efficacy of (COVID-19) pharmaceuticals is investigated using a fuzzy logical algebra algorithm. As a result, in this study, we would like to look into the idea of δ-algebra by using another non-classical set, intuitionistic fuzzy sets, to analyze a new structure in algebra known as intuitionistic fuzzy δ-algebra. In addition, various concepts and applications will be introduced and investigated in this study.
A. L. Jaber · S. M. Khalil (B) Department of Mathematics, College of Science, University of Basrah, Basrah 61004, Iraq e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_54
We shall look at concepts like (IFδs), (IFδi), and δ-homomorphism in this paper. We will also illustrate several applications on the set IFδS(W) = {A | A is an IFδs of W}, such as the relations ≈θ, ≈ϕ, and ∇r on IFδS(W). We will also look at and discuss their equivalence classes.
2 Preliminaries Here, in this part, some definitions are recalled that we will need for our new results. Definition 2.1 Atanassov [2] An (IFS) Z over the universal set W is given by Z = {≺ s, θZ(s), ϕZ(s) | s ∈ W}, where θZ: W → [0, 1] and ϕZ: W → [0, 1] with 0 ≤ θZ(s) + ϕZ(s) ≤ 1, ∀ s ∈ W. Definition 2.2 Atanassov [2] We say the whole set for the (IFS) is 1 = {≺ s, (1, 0) | s ∈ W} and the empty set for the (IFS) is 0 = {≺ s, (0, 1) | s ∈ W}.
2.1 Some Operations on IF Sets Definition 2.3 Atanassov [2] Assume that Λ = {≺ ι, (θΛ(ι), ϕΛ(ι)) | ι ∈ W} and ζ = {≺ ι, (θζ(ι), ϕζ(ι)) | ι ∈ W} are IF sets of W. Then the following relationships are defined:
(1) Λ ⊆ ζ iff θΛ(ι) ≤ θζ(ι) and ϕΛ(ι) ≥ ϕζ(ι), ∀ ι ∈ W,
(2) Λ = ζ iff Λ ⊆ ζ and ζ ⊆ Λ,
(3) Λ ∧ ζ = {(ι, min{θΛ(ι), θζ(ι)}, max{ϕΛ(ι), ϕζ(ι)}) : ι ∈ W},
(4) Λ ∨ ζ = {(ι, max{θΛ(ι), θζ(ι)}, min{ϕΛ(ι), ϕζ(ι)}) : ι ∈ W},
(5) Λ^c = {(ι, ϕΛ(ι), θΛ(ι)) : ι ∈ W}.
Definition 2.4 Khalil and Hassan [37] We say (W, o, α) is a δ-algebra (δ−A) if α ∈ W and the following assumptions are fulfilled:
(i) υoυ = α,
(ii) αoυ = α,
(iii) υoω = α and ωoυ = α → ω = υ, for all υ, ω ∈ W,
(iv) for all υ ≠ ω ∈ W − {α}: υoω = ωoυ ≠ α,
(v) for all υ ≠ ω ∈ W − {α}: (υo(υoω))o(ωoυ) = α.
Definition 2.5 Khalil and Hassan [37] Let ∅ ≠ λ ⊆ W, where (W, o, α) is a (δ−A); λ is said to be a δ-subalgebra (δ−SA) of W if υoω ∈ λ for any υ, ω ∈ λ.
Definition 2.6 Khalil and Hassan [37] Let (W, o, α) be a (δ−A) and ∅ ≠ E ⊆ W. E is said to be a δ-ideal of W if
(1) ι, t ∈ E → ιot ∈ E,
(2) ιot ∈ E and t ∈ E → ι ∈ E, for all ι, t ∈ W.
Definition 2.7 Jun et al. [4] Let Λ = (θΛ, ϕΛ) be an IFS in W and r ∈ [0, 1]. We say M(θΛ, r) = {s ∈ W | θΛ(s) ≥ r} is a θ-level r-cut of Λ. Definition 2.8 Jun et al. [4] Let Λ = (θΛ, ϕΛ) be an IFS in W and r ∈ [0, 1]. We say N(ϕΛ, r) = {s ∈ W | ϕΛ(s) ≤ r} is a ϕ-level r-cut of Λ.
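The two level cuts of Definitions 2.7 and 2.8 can be computed directly. The dictionary encoding of an IFS (element → (θ, ϕ)) and the function names are our own choices for this sketch:

```python
def theta_level_cut(ifs, r):
    """M(theta, r): elements whose membership degree is at least r."""
    return {s for s, (theta, phi) in ifs.items() if theta >= r}

def phi_level_cut(ifs, r):
    """N(phi, r): elements whose non-membership degree is at most r."""
    return {s for s, (theta, phi) in ifs.items() if phi <= r}
```

For instance, on the IFS of Example 5.1 below, the 0.5-cut of θ keeps α and l, while the 0.3-cut of ϕ keeps α and σ.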
3 Motivation Khalil and Hassan [38] described the idea of δ-algebra, then examined it in fuzzy sets and addressed its applications, in which the effectiveness of medicines for (COVID-19) is investigated using an algorithm based on fuzzy logical δ-algebra. So, in this research, we would like to investigate the concept of δ-algebra by employing another non-classical set, intuitionistic fuzzy sets, to examine a new structure in algebra, which is referred to as intuitionistic fuzzy δ-algebra. This work will also introduce and investigate some concepts and applications.
4 Objectives We shall define and analyze the ideas of (IFS) in δ-algebras concerning the notions of ideals and subalgebras, as well as various associated characterizations. We will also introduce and analyze some new notions like (IFδi), (IFδs), and δ-homomorphism. Furthermore, we will demonstrate several applications on the set IFδS(W), such as ≈θ, ≈ϕ, and ∇r on IFδS(W). We will also talk about their equivalence classes.
5 Intuitionistic Fuzzy δ-Subalgebras in δ-Algebras We investigate and explore several new notions in this part, such as (IFδs), (IFδi), and δ-homomorphism, and some fundamental characteristics are presented.
Table 1 (W, o, α) is a δ-algebra

o | α  ξ  l  σ
α | α  α  α  α
ξ | ξ  α  ξ  ξ
l | l  ξ  α  ξ
σ | σ  ξ  ξ  α
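The axioms of Definition 2.4 can be verified mechanically on a finite operation table such as Table 1. The sketch below encodes the table as we read it (treat the entries, the single-letter element names, and the helper name as assumptions) and checks axioms (i)–(v):

```python
# Operation table on W = {alpha, xi, l, sigma}; rows are the left operand.
W = ["a", "x", "l", "s"]          # a = alpha, x = xi, l = l, s = sigma
op = {
    "a": {"a": "a", "x": "a", "l": "a", "s": "a"},
    "x": {"a": "x", "x": "a", "l": "x", "s": "x"},
    "l": {"a": "l", "x": "x", "l": "a", "s": "x"},
    "s": {"a": "s", "x": "x", "l": "x", "s": "a"},
}

def is_delta_algebra(W, op, alpha="a"):
    """Check the delta-algebra axioms (i)-(v) of Definition 2.4."""
    others = [u for u in W if u != alpha]
    return (
        all(op[u][u] == alpha for u in W)                        # (i)  u o u = alpha
        and all(op[alpha][u] == alpha for u in W)                # (ii) alpha o u = alpha
        and all(op[u][v] != alpha or op[v][u] != alpha
                for u in W for v in W if u != v)                 # (iii)
        and all(op[u][v] == op[v][u] != alpha
                for u in others for v in others if u != v)       # (iv)
        and all(op[op[u][op[u][v]]][op[v][u]] == alpha
                for u in others for v in others if u != v)       # (v)
    )
```

Breaking the symmetry of a single off-diagonal entry makes axiom (iv) fail, so the check rejects the perturbed table.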
Definition 5.1 Let (W, o, α) be a (δ−A) and Λ = {≺ ι, (θΛ(ι), ϕΛ(ι)) | ι ∈ W} an IFS of W. We say Λ is an intuitionistic fuzzy δ-subalgebra (IFδs) of W if θΛ(ιoσ) ≥ min{θΛ(ι), θΛ(σ)} and ϕΛ(ιoσ) ≤ max{ϕΛ(ι), ϕΛ(σ)}, ∀ ι, σ ∈ W. Example 5.1 Let W = {α, ξ, l, σ} be a δ-algebra, see Table 1. Hence, Λ = {≺ ι, (θΛ(ι), ϕΛ(ι)) | ι ∈ W} = {(α, 0.8, 0.2), (ξ, 0.3, 0.4), (l, 0.6, 0.4), (σ, 0.4, 0.3)} is an (IFδs) of W.
Definition 5.2 Let (W, o, α) be a δ-algebra and Λ = {≺ ι, (θΛ(ι), ϕΛ(ι)) | ι ∈ W} an IFS of W. Λ is called an intuitionistic fuzzy δ-ideal (IFδi) of W if
(1) θΛ(ιoσ) ≥ min{θΛ(ι), θΛ(σ)} and ϕΛ(ιoσ) ≤ max{ϕΛ(ι), ϕΛ(σ)},
(2) θΛ(ι) ≥ min{θΛ(ιoσ), θΛ(σ)} and ϕΛ(ι) ≤ max{ϕΛ(ιoσ), ϕΛ(σ)}, ∀ ι, σ ∈ W.
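Conditions (1) and (2) can be checked exhaustively on a finite example. Below, the operation table is our reading of Table 1 and the IFS is that of Example 5.2; treat both, and the helper name, as assumed data for illustration:

```python
# Reconstruction of Table 1 on W = {alpha, xi, l, sigma} (assumed data).
OP = {
    "a": {"a": "a", "x": "a", "l": "a", "s": "a"},
    "x": {"a": "x", "x": "a", "l": "x", "s": "x"},
    "l": {"a": "l", "x": "x", "l": "a", "s": "x"},
    "s": {"a": "s", "x": "x", "l": "x", "s": "a"},
}
# The IFS zeta of Example 5.2: element -> (theta, phi).
ZETA = {"a": (0.9, 0.1), "x": (0.4, 0.3), "l": (0.8, 0.3), "s": (0.4, 0.1)}

def is_if_delta_ideal(ifs, op):
    """Conditions (1) and (2) of Definition 5.2 for every pair of elements."""
    for i in ifs:
        for s in ifs:
            th_i, ph_i = ifs[i]
            th_s, ph_s = ifs[s]
            th_is, ph_is = ifs[op[i][s]]
            if th_is < min(th_i, th_s) or ph_is > max(ph_i, ph_s):   # (1)
                return False
            if th_i < min(th_is, th_s) or ph_i > max(ph_is, ph_s):   # (2)
                return False
    return True
```

Under these assumptions the check accepts ζ, and lowering θ at α breaks condition (1) at ξoξ = α.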
Example 5.2 Suppose that (W, o, α) is the δ-algebra in Example (5.1). Also, let ζ = {≺ s, (θζ(s), ϕζ(s)) | s ∈ W} = {(α, 0.9, 0.1), (ξ, 0.4, 0.3), (l, 0.8, 0.3), (σ, 0.4, 0.1)} be an IFS of W. Hence, ζ is an (IFδi) of W. Definition 5.3 Let (W, o, α) be a δ-algebra and Λ = {≺ ι, (θΛ(ι), ϕΛ(ι)) | ι ∈ W} an IFS of W. We say Λ is an intuitionistic fuzzy δ-ideal (IFδi) of W if
(1) θΛ(α) ≥ θΛ(ι) and ϕΛ(α) ≤ ϕΛ(ι),
(2) θΛ(ιoσ) ≥ min{θΛ(ι), θΛ(σ)} and ϕΛ(ιoσ) ≤ max{ϕΛ(ι), ϕΛ(σ)}, ∀ ι, σ ∈ W.
Example 5.3 Assume that (W, o, α) is the δ-algebra in Example (5.2). Also, let C = {≺ s, (θC(s), ϕC(s)) | s ∈ W} = {(α, 0.7, 0.2), (ξ, 0.3, 0.4), (l, 0.5, 0.4), (σ, 0.3, 0.3)} be an IF set of W. Hence, C is an (IFδi) of W. Remarks 5.1 From the above definitions, we consider the following:
(1) Any (IFδi) is an (IFδs),
(2) Any (IFδs) is an (IFδi) if condition (2) in Definition (5.3) holds,
(3) Any (IFδi) is an (IFδs),
(4) Any (IFδs) is an (IFδi) if condition (1) in Definition (5.5) holds.
Lemma 5.1 Let (W, o, α) be a (δ− A) and = {≺ γ , (θ (γ ), ϕ (γ )) |γ ∈ W } be an (I Fδs), then θ (α) ≥ θ (γ ) and ϕ (α) ≤ ϕ (γ ) ,∀ γ ∈ W . Proof Assume that γ ∈ W. Hence θ (α) = θ (γ o γ ) ≥ min{θ (γ ) , θ (γ )} = θ (γ ), also ϕ (α) = ϕ (γ o γ ) ≤ max{ϕ (γ ) , ϕ (γ )} = ϕ (γ ). Theorem 5.1 Let (W, o, α) be a (δ− A) and { i =≺ s, (θi (s),ϕi (s)) |s ∈ W ,i ∈ } be a collection of (I Fδs) of W , then ∧i∈ i is an (I Fδi) of W , where ∧i∈ i = {≺ s, (min{θi (s)}, max{ϕi (s)}) |s ∈ W }. Proof Let s, τ ∈ W . Then min{θi (s o τ )} ≥ min{ min{θi (s),θi (τ )} } = Furthermore,max{ ϕi (s o τ )} ≤ min{ min{θi (s)} ,min{θi (τ )} } . max{ max{ ϕi (s),ϕi (τ )} } = max{ max{ ϕi (s)} , max{ ϕi (τ )}}. Hence ∧i∈ i = {≺ s, (min{θi (s)}, max{ϕi (s)}) |s ∈ W } such that condition (2) in Definition (5.5). Now, let s ∈ W. Thus, we have: (i) (ii)
min{θi (α)} = min{θi (s o s)} ≥min{θi (s) , θi (s)} = min{θi (s)}. In other side, max{ϕi (α)} = max{ϕ(s o s)} ≤ max{ϕi (s), ϕi (s)} = max{ϕi (s)}.
From (i) and (ii), we get (1) in Definition (5.5) is satisfied. So, ∧i∈ i is an (I Fδi) of W . Theorem 5.2 Let (W, o, α) be a (δ− A) and = {≺ γ , (θ (γ ), ϕ (γ )) |γ ∈ W } be an (I Fδi), then U = ≺ γ , θ (γ ), 1−θ (γ ) is an (I Fδi) of W . Proof We have to prove that 1−θ (γ ) such that the (1) and (2) in Definition (5.3). Let γ , ι ∈ W . Thus 1−θ (γ o ι) ≤ 1 - min{ θ (γ ), θ (ι) } = max{ 1 - θ (γ ), 1−θ (ι )} . Moreover, 1−θ (γ ) ≤ 1 - min{ θ (γ o ι), θ (ι) } = max{ 1 - θ (γ o ι), 1−θ (ι )} . Then U is an (I Fδi) of W . Theorem 5.3 Let (W, o, α) be a (δ− A) and = {≺ γ , (θ (γ ), ϕ (γ )) |γ ∈ W } is an (I Fδs) of W, then both of the sets Gθ = { γ ∈ W |θ (γ ) = θ (α)} and Gϕ = { γ ∈ W |ϕ (γ ) = ϕ (α)} are δ- subalgebras of W . Proof Let γ ,ι ∈ Gθ . Then θ (γ ) = θ (α) = θ (ι), also θ (γ o ι) ≥ min{θ (γ ), θ (ι)} = θ (α). By From Lemma (5.8), we get θ (γ o ι) = θ (α) or equivalently γ o ι ∈ Gθ . Furthermore, let γ , ι ∈ Gϕ . Therefore ϕ (γ o ι) ≤ max{ϕ (γ ), ϕ (ι)} = ϕ (α). So, by Lemma (5.8), we obtain ϕ (γ o ι) = ϕ (α). Hence t o ι ∈ Gϕ . Definition 5.4 Let = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδs) of W . We say has finite image, if any image of θ and ϕ with finite cardinality (i.e. Im(θ ) = {θ (s)|s ∈ W } and Im(ϕ ) = {ϕ (s)|s ∈ W } such that |Im(θ )| < ∞ and |Im(ϕ )| < ∞). Theorem 5.4 Let (W, o, α) be a (δ− A) and = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδs) of W , then each one of the θ -level l-cut and ϕ-level l-cut of is (δ− SA) of W . Any l ∈ [0, 1] with l ∈ Im(θ ) ∩ Im(ϕ ) are said to be θ -level δsubalgebra (θ Lδ− SA) and ϕ-level δ- subalgebra (ϕ Lδ− SA).
Proof Let s, τ ∈ M(θ , l). Therefore θ (s) ≥ l and θ (τ ) ≥ l. So, we consider that θ (so τ ) ≥ min{θ (s), θ (τ )} ≥ l so that so τ ∈ M(θ , l). Hence M(θ , l) is a δ- subalgebra of W . Moreover, let s, τ ∈ N (ϕ , l). We get ϕ (so τ ) ≤ max{ϕ (s), ϕ (τ )} ≤ l and so τ ∈ N (ϕ , l)..Then N (ϕ , l) is a (δ− SA) of W . Theorem 5.5 Let (W, o, α) be a (δ− A) and = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be IFS of W with M(θ , l) and N (ϕ , l) are δ- subalgebras of W . Then is an (I Fδs) of W . Proof Let σ1 and σ2 be two elements in W such that θ (σ1 o σ2 ) < min{θ (σ1 ), θ (σ2 )} . Let γ = [θ (σ1 o σ2 ) + min{2 θ (σ1 ),θ (σ2 )} ] . Then θ (σ1 o σ2 ) < / M(θ ,γ ). However σ1 , σ2 ∈ M(θ ,γ ). γ < θ (σ2 )} min{θ (σ1 ), thus σ1 o σ2 ∈ But that is a contradiction. Hence θ (so τ ) ≥ min{θ (s), θ (τ )} , ∀s, τ ∈ M. In other side, if ϕ (σ1 oσ2 ) > min{ϕ (σ1 ), ϕ (σ2 )} for some σ1 , σ2 ∈ M. Put ϕ (σ1 ),ϕI (σ2 )} ] , then we get that ϕ (σ1 o σ2 ) > l > max{ϕ (σ1 ), l= [ϕ (σ1 o σ2 ) + min{ 2 / L(ϕ ,l). This is a contradicϕ (σ2 )} and hence σ1 , σ2 ∈ L(ϕ ,l) and σ1 o σ2 ∈ tion. Thus, we have ϕ (so τ ) ≤ max{ϕ (s), ϕ (τ )} , ∀ s, τ ∈ W . Hence is an (I Fδs) of W . Theorem 5.6 Let (W, o, α) be a (δ− A) and be (δ − SA) of W , then λ can be realized as both (θ Lδ− SA) and (ϕ Lδ− SA) of some (I Fδs) of W . Proof Let λ be a (δ − SA) of W , θ and ϕ be fuzzy sets in W defined by. θ (s) =
l if s ∈ λ, and 0 otherwise, and ϕΛ(s) = ι if s ∈ λ, and 1 otherwise, ∀ s ∈ W, where l, ι ∈ (0, 1) are fixed real numbers with l + ι < 1. Let s, τ ∈ W. Then soτ ∈ λ whenever s, τ ∈ λ. So, we get θΛ(soτ) = min{θΛ(s), θΛ(τ)} and ϕΛ(soτ) ≤ max{ϕΛ(s), ϕΛ(τ)}. If either s ∉ λ or τ ∉ λ, then either θΛ(s) = 0 or θΛ(τ) = 0. Moreover, we have either ϕΛ(s) = 1 or ϕΛ(τ) = 1. It follows that θΛ(soτ) ≥ 0 = min{θΛ(s), θΛ(τ)} and ϕΛ(soτ) ≤ 1 = max{ϕΛ(s), ϕΛ(τ)}. Then Λ = {≺ s, (θΛ(s), ϕΛ(s)) | s ∈ W} is an (IFδs) of W. Obviously, M(θΛ, l) = λ = N(ϕΛ, ι).
Definition 5.5 Let I : (W, o, αW) → (C, ∗, αC) be a mapping of δ-algebras. I is said to be a δ-homomorphism if I(soτ) = I(s) ∗ I(τ), ∀ s, τ ∈ W. Also, I−1(ζ) = {≺ s, (I−1θζ(s), I−1ϕζ(s)) | s ∈ W} is an IFS in the (δ−A) W for any IFS ζ = {≺ τ, (θζ(τ), ϕζ(τ)) | τ ∈ C} of the (δ−A) C. Furthermore, if Λ = {≺ s, (θΛ(s), ϕΛ(s)) | s ∈ W} is an IFS in the (δ−A) W, then I(Λ) is an IFS in C, defined by I(Λ) = {≺ τ, (Isup θΛ(τ), Iinf ϕΛ(τ)) | τ ∈ C}, where Isup θΛ(τ) = sup{θΛ(s) | s ∈ I−1(τ)} if I−1(τ) ≠ ∅, and 0 otherwise,
and Iinf ϕΛ(τ) = inf{ϕΛ(s) | s ∈ I−1(τ)} if I−1(τ) ≠ ∅, and 1 otherwise, ∀ τ ∈ C.
Theorem 5.7 Let I : (W, o, αW ) → (C, ∗, αC ) be a δ-homomorphism from (δ− A) W into a (δ− A) C and let U be an (I Fδs) of C. Then I−1 (U ) is an (I Fδs) of W . Proof For any s, τ ∈ W , we have θI−1 (U ) (so τ ) = θU (I(so τ )) = θU (I(s) ∗ I(τ )) ≥ min{θU (I(s)), θU (I(τ ))} = min{θI−1 (U ) (s), θI−1 (U ) (τ )} and ϕI−1 (U ) (so τ ) = ϕU (I(so τ )) = ϕU (I(s) ∗ I(τ )) ≤ max{ϕU (I(s)), ϕU (I(τ ))} = max{ϕI−1 (U ) (s), ϕI−1 (U ) (τ )}. Hence I−1 (U) is an (I Fδs) of W . Theorem 5.8 Let I : (W, o, αW ) → (C, ∗, αC ) be a δ-homomorphism from (δ− A) W into a (δ− A) C and let = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδs) of W . Then I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδs) of C. Proof Let = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδs) of W and let σ1 , σ2 ∈ C. Noticing that. { s1 o s2 |s1 ∈ I−1 (σ1 ) and s2 ∈ I−1 (σ2 )} ⊆ { s ∈ W |s ∈ I−1 (σ1 ∗ σ2 )} , we have Isup (θ )(σ1 ∗ σ2 ) = sup{ θ (s)|s ∈ I−1 (σ1 ∗ σ2 )} ≥ sup{ θ (s1 o s2 )|s1 ∈ I−1 (σ1 ) and s2 ∈ I−1 (σ2 )} ≥ sup{ min{ θ (s1 ),θ (s2 )} | s1 ∈ I−1 (σ1 ) and x2 ∈ I−1 (σ2 )} = min{sup { θ (s1 )|s1 ∈ I−1 (σ1 ) } ,sup{ θ (s2 )|s2 ∈ I−1 (σ2 )}} = min{ Isup (θ )(σ1 ),Isup (θ )(σ2 )} and Iinf (ϕ )(σ1 ∗ σ2 ) = inf{ θ (s)|s ∈ I−1 (σ1 ∗ σ2 )} ≤ inf{ ϕ (s1 o s2 )|s1 ∈ I−1 (s1 ) and s2 ∈ I−1 (σ2 )} ≤ inf { max{ ϕ (s1 ),ϕ (s2 )} | s1 ∈ I−1 (σ1 ) and s2 ∈ I−1 (σ2 )}
= max{inf{ ϕ (s1 )|s1 ∈ I−1 (σ1 ) } ,inf{ ϕ (s2 )|s2 ∈ I−1 (σ2 )}} = max{ Isup (ϕ )(σ1 ),Isup (ϕ )(σ2 )} . Then I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδs) of C. Theorem 5.9 Let I : (W, o, αW ) → (C, ∗, αC ) be a δ-homomorphism from (δ− A) W onto a (δ− A) C and let = {≺ s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδi) of W . Then I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδi) of C. Proof Since = {≺ s, (θ (s), ϕ (s)) |s ∈ W } is an (I Fδi) of W . Hence from theorem (5.8) and remark (5.7) we have I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδi)of C. Therefore condition (1) in definition (5.3) is held. Since I is onto, thus for any. σ1 , σ2 ∈ C, ∃s1 , s2 ∈ W such that s1 ∈ I−1 I(s1 ) = I−1 (σ1 ) and I−1 (σ2 ).s2 ∈ −1 I I(s2 ) = Also, s1 o s2 ∈ I−1 (σ1 ) o I−1 (σ1 ) = I−1 (σ1 ∗ σ1 ) Furthermore, noticing that θ (s1 ) ≥ min{θ (s1 o s2 ) , θ I (s2 )} and ϕ A (s1 ) ≤ max{ϕ (s1 o s2 ) , ϕ (s2 )}. For any σ1 , σ2 ∈ C, we have Isup (θ )(σ1 ) = sup{ θ (s)|s ∈ I−1 (σ1 )} ≥ sup{ min{θ (s1 o s2 ) , μ (s2 )}|s1 o s2 ∈ I−1 (σ1 ∗ σ2 ) and s2 ∈ I−1 (σ2 )} = min { sup{θ (s1 o s2 ) |s1 o s2 ∈ I−1 (σ1 ∗ σ2 ) } ,sup{θ (s2 )|s2 ∈ θ −1 (σ2 )} } = min{ Isup (θ )(σ1 ∗ σ2 ),Isup (θ )(σ2 )} Also, Isup (ϕ )(σ1 ) = sup{ ϕ (s)|s ∈ I−1 (σ1 )} ≤ sup{ max{θ (s1 o s2 ) , ϕ (s2 )}|s1 o s2 ∈ I−1 (s1 o s2 ) and s2 ∈ I−1 (σ2 )} = max { sup{ϕ (s1 o s2 ) |s1 o s2 ∈ I−1 (σ1 ∗ σ2 ) } ,sup{ϕ (s2 )|s2 ∈ I−1 (σ2 )} } = max{ Isup (ϕ )(σ1 ∗ σ2 ),Isup (ϕ )(σ2 )} .
Hence I( ) =≺ c, (Isup (θ ), Iinf (ϕ )) is an (I Fδi) of C. Theorem 5.10 Let I : (W, o, αW ) → (C, ∗, αC ) be a δ-homomorphism from (δ− A) W onto a (δ− A) C and let = {s, (θ (s), ϕ (s)) |s ∈ W } be an (I Fδi) of W , then I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδi) of C. Proof Since = {≺ s, (θ (s), ϕ (s)) |s ∈ W } is an (I Fδi) of W . Thus from theorem (5.8) and remark (5.7) we have I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδs) of C. We obtain condition (2) in definition (5.5) is held. Since = {≺ s, (θ (s), ϕ (s)) |s ∈ W } is an (I Fδi) of W , hence θ (αW ) ≥ θ (s) and ϕ (αW ) ≤ ϕ (s) , for any s ∈ W . Since I is δ-homomorphism of δ-algebras, then I(αW ) = αC . Also, αW ∈ I−1 (αC ) and { s|s ∈ I−1 (τ )} ⊆ { s|s ∈ W }, ∀τ ∈ C. Hence we get, Isup (θ )(αC ) = sup{ θ (s)|s ∈ I−1 (αC )} =θ (αW ) ≥ sup{ θ (s)|s ∈ W } ≥ sup { θ (s)|s ∈ I−1 (τ )} = Isup (θ )(τ ) In other side, Isup (ϕ )(αC ) = inf{ ϕ (s)|s ∈ I−1 (αC )} =ϕ (αW ) ≤ inf{ ϕ (s)|s ∈ W } ≤ inf { ϕ (s) |s ∈ I−1 (τ )} = Isup (ϕ )(τ ) Then I( ) =≺ τ, (Isup (θ ), Iinf (ϕ )) is an (I Fδi) of C.
6 The Equivalence Classes of Intuitionistic Fuzzy Delta-Algebras This section illustrates certain applications on IFδS(W), such as ≈θ, ≈ϕ, and ∇r on IFδS(W). In addition, some of their important components are provided in this section.
6.1 The Equivalence Classes of IFδS(W) Modulo ≈θ (≈ϕ)

Symbolize the set of all (IFδs) of W by IFδS(W). Also, let ≈θ and ≈ϕ be two binary relations on IFδS(W), defined by Λ ≈θ ζ ⇔ M(θΛ, l) = M(θζ, l) and Λ ≈ϕ ζ ⇔ N(ϕΛ, l) = N(ϕζ, l), respectively, for Λ = ≺ s, θΛ, ϕΛ and ζ = ≺ s, θζ, ϕζ in IFδS(W). Furthermore, it is easy to show that ≈θ and ≈ϕ are equivalence relations on IFδS(W). Let Λ = ≺ s, θΛ, ϕΛ ∈ IFδS(W). The equivalence class of Λ modulo ≈θ (resp. ≈ϕ) is symbolized by <Λ>θ (resp. <Λ>ϕ). Moreover, the set of all equivalence classes modulo ≈θ (resp. ≈ϕ) is symbolized by IFδS(W)/≈θ (resp. IFδS(W)/≈ϕ). That means IFδS(W)/≈θ = {<Λ>θ | Λ = ≺ s, θΛ, ϕΛ ∈ IFδS(W)} (resp. IFδS(W)/≈ϕ = {<Λ>ϕ | Λ = ≺ s, θΛ, ϕΛ ∈ IFδS(W)}).
Next, symbolize the set of all δ-ideals of W by δ(W). Define the maps ∂l, Ωl : IFδS(W) → δ(W) ∪ {∅} by ∂l(Λ) = M(θΛ, l) and Ωl(Λ) = N(ϕΛ, l), where l ∈ [0, 1], ∀ Λ = ≺ s, θΛ, ϕΛ ∈ IFδS(W). Also, ∂l and Ωl are well defined. Theorem 6.1 Let ∂l, Ωl : IFδS(W) → δ(W) ∪ {∅} be these maps. Then ∂l and Ωl are surjective, ∀ l ∈ (0, 1).
Proof Let l ∈ (0, 1). Then 0 = ≺ s,0,1 is in I Fδ S (W ) where 0 and 1 are defined by 0(s) = 0 and 1(s) = 1, ∀ s ∈ W . Also, ∂l ( 0 ) =M(0,l) = φ = N (1,l) = l ( 1 ). 1, i f s ∈ λ Let φ = λ ∈ δ (W ). ∀s ∈ W , let θλ (s) = , and ϕλ (s) = 1 − θλ (s), 0, i f s ∈ /λ
thus ∂r ( λ ) = M(θλ ,r ) = λ = N (ϕλ ,r ) = r ( λ ). Now, we want to show that λ =≺ s, θλ ,ϕλ ∈ I Fδ S (W ). From [Definition (2.6)-(1)] we get λ is δ- subalgebra of W [since λ ∈ δ (W )]. So, M(θλ ,l), N (ϕλ ,l) are δ- subalgebras of W . Also, λ =≺ s, θλ ,ϕλ ∈ I Fδ S (W )[From Theorem (5.5)]. Therefore, ∀ λ ∈ δ (W ) we consider ∂l ( λ ) =λ and l ( λ ) = λ for some λ ∈ I Fδ S (W ). This completes the proof. Theorem 6.2 Assume that I Fδ S (W )/ ≈ and I Fδ S (W )/ ≈ are quotient sets. Then θ
for any l ∈ (0, 1) they are equipotent to δ (W ) ∪ {φ}.
θ
Proof Suppose that l ∈ (0, 1) and assume ∂l :I Fδ S (W )/ ≈ → δ (W ) ∪ {φ} and θ
l :I Fδ S (W )/ ≈→ δ (W ) ∪ {φ} are maps, where they are defined as ∂l ( < >θ θ
) = ∂l ( ) (resp. l ( < >ϕ ) =l ( )), ∀ = ≺ s,θ ,ϕ ∈I Fδ S (W ). Hence, ≈ ζ and ≈ ζ , ∀ = ≺ s,θ ,ϕ and ζ = ≺ s,θζ ,ϕζ in θ
ϕ
I Fδ S (W ), if M(μ ,l) = M(μζ ,l) and N (ϕ ,l) = N (ϕζ ,l). Then < >θ = < ζ >θ and < >ϕ = < ζ >ϕ . Then ∂l and l are injective. Furthermore, let 1, i f x ∈ λ φ = λ ∈ δ (W ) and ∀ s ∈ W , let θλ (s) = , ϕλ (s) = 1 − θλ (s), 0, i f x ∈ /λ
New Category of Equivalence Classes …
661
thus Ω_λ = ≺s, θ_λ, ϕ_λ≻ ∈ IFδS(W). We see that ∂_l(⟨Ω_λ⟩_θ) = ∂_l(Ω_λ) = M(θ_λ, l) = λ and Γ_l(⟨Ω_λ⟩_ϕ) = Γ_l(Ω_λ) = N(ϕ_λ, l) = λ. Next, for Ω_0 = ≺s, 0, 1≻ ∈ IFδS(W) we have ∂_l(⟨Ω_0⟩_θ) = ∂_l(Ω_0) = M(0, l) = ∅ and Γ_l(⟨Ω_0⟩_ϕ) = Γ_l(Ω_0) = N(1, l) = ∅. Thus ∂_l and Γ_l are surjective, and hence IFδS(W)/≈_θ and IFδS(W)/≈_ϕ are equipotent to δI(W) ∪ {∅}.

6.2 The Equivalence Classes of IFδS(W) Modulo ∇_l

We define the relation ∇_l on IFδS(W) by

(Ω, ζ) ∈ ∇_l ⇔ M(θ_Ω, l) ∩ N(ϕ_Ω, l) = M(θ_ζ, l) ∩ N(ϕ_ζ, l),

for all l ∈ [0, 1] and all Ω = ≺s, θ_Ω, ϕ_Ω≻, ζ = ≺s, θ_ζ, ϕ_ζ≻ ∈ IFδS(W). The relation ∇_l is also an equivalence relation on IFδS(W). Furthermore, the equivalence class of Ω = ≺s, θ_Ω, ϕ_Ω≻ modulo ∇_l is denoted by ⟨Ω⟩_∇l.

Theorem 6.3 The map λ_l : IFδS(W) → δI(W) ∪ {∅} defined by λ_l(Ω) = ∂_l(Ω) ∩ Γ_l(Ω), for every Ω = ≺s, θ_Ω, ϕ_Ω≻ ∈ IFδS(W) and l ∈ (0, 1), is surjective.

Proof Assume that l ∈ (0, 1). Then λ_l(Ω_0) = ∂_l(Ω_0) ∩ Γ_l(Ω_0) = M(0, l) ∩ N(1, l) = ∅. For every λ ∈ δI(W) there exists Ω_λ = ≺s, θ_λ, ϕ_λ≻ ∈ IFδS(W), where θ_λ(s) = 1 if s ∈ λ, 0 if s ∉ λ, and ϕ_λ(s) = 1 − θ_λ(s), such that λ_l(Ω_λ) = ∂_l(Ω_λ) ∩ Γ_l(Ω_λ) = M(θ_λ, l) ∩ N(ϕ_λ, l) = λ. Then λ_l is surjective.

Theorem 6.4 The quotient set IFδS(W)/∇_l is equipotent to δI(W) ∪ {∅}, for all l ∈ (0, 1).

Proof Suppose that l ∈ (0, 1) and λ_l : IFδS(W)/∇_l → δI(W) ∪ {∅} is the map with λ_l(⟨Ω⟩_∇l) = λ_l(Ω), for all ⟨Ω⟩_∇l ∈ IFδS(W)/∇_l. Assume that λ_l(⟨Ω⟩_∇l) = λ_l(⟨ζ⟩_∇l) for ⟨Ω⟩_∇l, ⟨ζ⟩_∇l ∈ IFδS(W)/∇_l. We obtain ∂_l(Ω) ∩ Γ_l(Ω) = ∂_l(ζ) ∩ Γ_l(ζ), that is, M(θ_Ω, l) ∩ N(ϕ_Ω, l) = M(θ_ζ, l) ∩ N(ϕ_ζ, l). Therefore (Ω, ζ) ∈ ∇_l, and hence ⟨Ω⟩_∇l = ⟨ζ⟩_∇l; so λ_l is injective. Also, for Ω_0 = ≺s, 0, 1≻ ∈ IFδS(W) we have λ_l(⟨Ω_0⟩_∇l) = λ_l(Ω_0) = ∂_l(Ω_0) ∩ Γ_l(Ω_0) = M(0, l) ∩ N(1, l) = ∅. Take Ω_λ = ≺s, θ_λ, ϕ_λ≻ ∈ IFδS(W), for every λ ∈ δI(W), to be the same (IFδs) as in the proof of Theorem (5.5). Hence λ_l(⟨Ω_λ⟩_∇l) = λ_l(Ω_λ) = ∂_l(Ω_λ) ∩ Γ_l(Ω_λ) = M(θ_λ, l) ∩ N(ϕ_λ, l) = λ. Hence λ_l is surjective. Then IFδS(W)/∇_l is equipotent to δI(W) ∪ {∅}.
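The claim that ≈_θ, ≈_ϕ (and likewise ∇_l) are equivalence relations follows immediately because each is defined through an equality of cut sets. A sketch in LaTeX form, writing Ω, ζ, ξ for arbitrary members of IFδS(W):

```latex
% Reflexivity, symmetry and transitivity of $\approx_{\theta}$ reduce to the
% corresponding properties of equality of the level sets $M(\theta,l)$:
\begin{align*}
\Omega \approx_{\theta} \Omega
  &\iff M(\theta_{\Omega},l) = M(\theta_{\Omega},l),\\
\Omega \approx_{\theta} \zeta \implies \zeta \approx_{\theta} \Omega
  &\iff \bigl(M(\theta_{\Omega},l) = M(\theta_{\zeta},l)
        \implies M(\theta_{\zeta},l) = M(\theta_{\Omega},l)\bigr),\\
\Omega \approx_{\theta} \zeta,\ \zeta \approx_{\theta} \xi
  \implies \Omega \approx_{\theta} \xi
  &\iff \text{transitivity of } = .
\end{align*}
```

The same argument applies verbatim to ≈_ϕ with N(ϕ, l) and to ∇_l with the intersection M(θ, l) ∩ N(ϕ, l).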
7 Conclusion

In this paper, various novel algebraic implications of (IFS) are presented, and their applications are addressed. In addition, several important δ-homomorphism theorems are presented. Next, we deduce the binary relations ≈_θ, ≈_ϕ and ∇_l on IFδS(W), as well as some of their fundamental properties. In the future, we will investigate IF in new types of algebras, such as UP-algebras, UP-subalgebras, UP-ideals and others, and then examine their characteristics.
Deep Neural Networks for Stock Market Price Predictions in VUCA Environments Dennis Murekachiro
Abbreviations

BGRU   Bidirectional gated recurrent unit
BLSTM  Bidirectional long short-term memory
GRU    Gated recurrent unit
LSTM   Long short-term memory
MAD    Mean absolute deviation
MAPE   Mean absolute percentage error
MFNN   Multi-filters neural network
MSE    Mean square error
OHLCV  Open price, high price, low price, closing price, volume
RMSE   Root mean square error
SVM    Support vector machine
VUCA   Volatile, uncertain, complex and ambiguous
D. Murekachiro (B)
Ministry of Higher and Tertiary Education, Innovation, Science and Technology Development, Cnr Samora Machel Avenue/Simon Muzenda Street, Harare, Zimbabwe
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_55

1 Introduction

Volatile, uncertain, complex and ambiguous (VUCA) environments such as stock markets make future investment decisions a challenge for investors and traders. Accurate stock market prediction is of paramount importance to traders and investors in that it permits them to alleviate risks and make knowledgeable, profitable investment decisions by determining future asset prices appropriately [1, 2]. Stock markets are noisy, chaotic, dynamic systems which are non-parametric,
unpredictable, non-stationary, nonlinear and highly volatile [2–6], which poses a challenge for investment decision making in VUCA environments. To avert this challenge, future asset price determination has become a practicable task with the advent of deep learning technologies, which have been established to be very effective in forecasting stock prices and are applicable to the financial markets domain [7, 8]. This paper introduces two deep learning technologies, namely Bidirectional Long Short-Term Memory (BLSTM) and Bidirectional Gated Recurrent Units (BGRU), to predict the closing prices of the top three S&P 500 counters, namely Apple, Microsoft and Amazon. The strong deep learning prediction accuracies obtained in this study attest to the successful application of deep neural networks to financial markets prediction. The results also show that in VUCA periods of financial crisis and Covid-19, deep learning models are useful for stock market prediction. The main contributions of this paper are summarised as follows:

Contribution 1: successful application of deep neural networks to the nonlinear and complex financial time series domain.
Contribution 2: development of two deep learning architectures for effective stock price prediction in a Covid-induced financial crisis.

This paper is structured as follows: Sect. 2 discusses previous stock market applications of deep neural networks. Section 3 describes the deep learning architectures and the details of the empirical experiments. Section 4 discusses the results with regard to predictive performance, whilst Sect. 5 summarises and concludes the paper and provides suggestions for further research.
2 Literature Review

The use of deep learning technologies for stock market prediction is on the increase, with various algorithms and architectures continuously being released. This arises from the fact that deep learning models have the potential to learn and generalize from experience [9], have proved that they can learn complex, non-linear functional mappings [10], and demonstrate advanced feature extraction capabilities with each added hidden layer [10]. Two key parts of intelligent system-oriented financial decision making relate to prediction models whose objective is either to ascertain future asset prices or to derive a trading strategy for profit generation. This has been achieved through various implementations of deep neural networks for stock prediction, showing that deep learning is becoming a frontline application in the financial domain over traditional architectures. With regard to price prediction, [19] implemented a deep learning end-to-end model named multi-filters neural network (MFNN) on the Chinese stock market index CSI300, which outperformed traditional machine learning methods and statistical
methods on the prediction task. Taking a similar approach, [1] predicted daily price movements of 88 NASDAQ-listed stocks, with the best accuracy of 59.44% achieved for STOCKNET. Both results show that adding a feature selection component to the models does not enhance prediction accuracy. It is evident from these studies that deep neural networks produce better accuracy than traditional neural architectures. However, the need for better-performing deep neural architectures remains a key concern in prediction domains, especially for financial time series. Ingle and Deshmukh [9] implemented an ensemble deep learning framework achieving an 85% prediction accuracy using sentiment analysis. In another experiment, [5] achieved 51.7, 51.4, 45.8 and 44.3% for the Gated Recurrent Unit-Support Vector Machine (GRU-SVM), GRU, Deep Neural Network (DNN) and SVM respectively on the HSI, DAX and S&P 500 indices, and [12] achieved an accuracy of 74% on the Korean market. Other deep learning experiments on stock markets have been assessed through Mean Absolute Deviation (MAD), Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) metrics [3, 4, 13–15]. Trend predictions using deep neural networks are also on the rise, with prediction accuracies ranging from 50 to 74% for most studies. For instance, [19] achieved 73.59% on CITIC Securities, whilst [16], experimenting on 10 S&P BSE-Bankex stocks, found the GRU to be the best model with a prediction accuracy of 71.95%. Zhang et al. [2] attained prediction accuracies of 62.12 and 73.29% using a deep learning fuzzy algorithm and a BP algorithm respectively for three stocks on the Shanghai market, whilst [17] attained an average of 70% for the Shenzhen market. Haq et al. [1], experimenting with 88 NASDAQ-listed stocks, attained an average 59.44% accuracy level, and [18] attained classification accuracies ranging between 50 and 65% on seven stock indices. Focusing on the S&P 500, [11] achieved accuracy levels of 59.14, 66.18 and 66.47% for Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and BGRU respectively. To improve on Huynh's work and other previous studies and attain better accuracy, this paper extends the models to include BLSTM for selected S&P 500 stocks and develops a better prediction model. To the best of the author's knowledge, there has been no implementation and comparison of BLSTM and BGRU prediction models on the top three counters of one of the world's top indices (S&P 500) to ascertain whether deep neural architectures can be improved to predict successfully both in a financial crisis and on financial time series. Among previous applications of deep neural networks to financial time series, none was implemented in a financial crisis period or under the Covid era, when there were great financial market disturbances. Ascertaining whether deep neural networks predict successfully in a Covid-induced financial crisis is the key question this paper addresses.
3 Proposed Methodology

To address the question of whether deep neural networks can predict successfully in the Covid-induced financial crisis of the USA, the following methodology, comprising the six steps outlined in Table 1, is proposed. Using daily data extracted from the Yahoo Finance engine for the period 2010–2021, each stock's data is uploaded in CSV file format and converted into a data frame to enable pandas operations. The models use only five inputs, namely open price, low price, high price, volume and adjusted closing price (OHLCV), to predict the next day's closing price. These variables are chosen on the basis of previous experiments that use OHLCV data for predictions. The regression formula examining this relationship is

Close_{t+1} = f(Open_t, High_t, Low_t, Volume_t, Close_t)    (1)

All input and output parameters (Open, High, Low, Volume and Adjusted Closing Price) are normalized to the range (0, 1). An 80/20 split between the in-sample (training) and out-of-sample (test) sets was used. This was premised on Granger's (1993) postulate that at least 20% of the sample should be retained as out-of-sample data and 80% used as the training set. The splits between training and test samples for the three stocks are exhibited in Table 2. As a rule of thumb, a minimum of 1000 data points per experiment is acceptable; all three stocks meet this criterion. The graphical representations of the BGRU and BLSTM models are shown in Figs. 1 and 2, with the model parameters shown after the figures.
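The normalization and 80/20 split described above can be sketched as follows. This is an illustrative reconstruction, not the author's actual code; the function names are invented, and the 2863/2290/573 figures come from Table 2:

```python
import numpy as np

def minmax_scale(x):
    """Scale a 1-D series into the (0, 1) range, as done for each OHLCV input."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def train_test_split_80_20(n_points):
    """Return (train_size, test_size) under the 80/20 rule attributed to Granger (1993)."""
    train = int(n_points * 0.8)
    return train, n_points - train

# 2863 daily observations per stock, as in Table 2
train, test = train_test_split_80_20(2863)
print(train, test)  # 2290 573
```

Note that the split is chronological (the first 80% of trading days train the model), since shuffling a financial time series would leak future information into training.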
Table 1 Six steps in designing a deep neural network forecasting model

Step 1: Uploading of libraries
Step 2: Data uploading and exploratory data analysis
Step 3: Data pre-processing
Step 4: Model built up
Step 5: Training and validation of network
Step 6: Evaluation criteria
Table 2 Train–test splits

Stock      Country  Start date  End date    Data points  Training set  Test set
Apple      USA      2010-01-04  2021-06-21  2863         2290          573
Microsoft  USA      2010-01-04  2021-06-21  2863         2290          573
Amazon     USA      2010-01-04  2021-06-21  2863         2290          573
Fig. 1 BGRU architecture
Fig. 2 BLSTM architecture
BGRU model summary (Fig. 1):

Layer (type)                     Output shape     Param #
bidirectional_2 (Bidirectional)  (None, 22, 256)  102,912
dropout_2 (Dropout)              (None, 22, 256)  0
bidirectional_3 (Bidirectional)  (None, 256)      295,680
dropout_3 (Dropout)              (None, 256)      0
dense_2 (Dense)                  (None, 16)       4,112
dense_3 (Dense)                  (None, 1)        17

Total params: 402,721
Trainable params: 402,721
Non-trainable params: 0

BLSTM model summary (Fig. 2):

Layer (type)                     Output shape     Param #
bidirectional (Bidirectional)    (None, 22, 256)  137,216
dropout (Dropout)                (None, 22, 256)  0
bidirectional_1 (Bidirectional)  (None, 256)      394,240
dropout_1 (Dropout)              (None, 256)      0
dense (Dense)                    (None, 16)       4,112
dense_1 (Dense)                  (None, 1)        17

Total params: 535,585
Trainable params: 535,585
Non-trainable params: 0
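The layer sizes behind these parameter counts are not stated in the text, but they can be recovered arithmetically. The sketch below is a hypothetical reconstruction assuming 128 units per direction, 5 OHLCV input features, and one bias vector per gate (reset_after=False in Keras terms); under those assumptions it reproduces both totals:

```python
def bi_rnn_params(gates, units, input_dim):
    """Parameter count of a bidirectional RNN layer: `gates` gate sets per cell
    (3 for GRU, 4 for LSTM), one bias vector per gate, times two directions."""
    return 2 * gates * units * (input_dim + units + 1)

def dense_params(inp, out):
    """Weights plus one bias per output unit."""
    return inp * out + out

units, features = 128, 5  # assumed: 128 units per direction, 5 OHLCV inputs

bgru = (bi_rnn_params(3, units, features)      # 102,912
        + bi_rnn_params(3, units, 2 * units)   # 295,680 (input is the 256-wide bi output)
        + dense_params(2 * units, 16)          # 4,112
        + dense_params(16, 1))                 # 17

blstm = (bi_rnn_params(4, units, features)     # 137,216
         + bi_rnn_params(4, units, 2 * units)  # 394,240
         + dense_params(2 * units, 16)
         + dense_params(16, 1))

print(bgru, blstm)  # 402721 535585
```

Since the GRU has three gate sets to the LSTM's four, the first summary (402,721 parameters) must belong to the BGRU and the second (535,585) to the BLSTM.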
4 Results and Analysis

The experiment results reveal that the BLSTM and BGRU models used in this paper were able to predict the out-of-sample sets successfully, as exhibited below through the MAD, MSE, RMSE and MAPE evaluation metrics and graphical accuracy presentations.

Panel 1: Summary of prediction results (BLSTM)

Stock      MAD     MSE     RMSE    MAPE     Accuracy (%)
Apple      0.0047  0.0000  0.0059  19.0004  81.39
Microsoft  0.0087  0.0001  0.0094  93.6721  59.80
Amazon     0.0026  0.0000  0.0031  48.3660  80.57

Panel 2: Summary of prediction results (BGRU)

Stock      MAD     MSE     RMSE    MAPE     Accuracy (%)
Apple      0.0049  0.0000  0.0059  83.4197  79.85
Microsoft  0.0285  0.0008  0.0285  5.2215   95.04
Amazon     0.0044  0.0000  0.0051  87.2912  73.56
Based on the MAD, MSE, RMSE and MAPE evaluation metrics, the proposed bidirectional deep learning models performed well, with most metrics very close to zero, implying very little divergence of the predicted values from the actual values. In addition to these metrics, graphical representations of the accuracy are exhibited below, with the predicted values shown in red against the actual values in blue.
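The four evaluation metrics can be computed as follows. This is a generic sketch of the standard definitions, not the author's code, and it assumes the near-zero MAD/MSE/RMSE figures in the panels are reported on the normalized (0, 1) prices:

```python
import math

def metrics(actual, predicted):
    """MAD, MSE, RMSE and MAPE (in percent) for paired price series."""
    n = len(actual)
    errs = [p - a for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / a) for a, e in zip(actual, errs)) / n
    return mad, mse, rmse, mape

# toy illustration on three points
mad, mse, rmse, mape = metrics([1.0, 2.0, 4.0], [1.1, 1.9, 4.0])
```

One caveat visible in the panels: MAPE is scale-sensitive, which is why it can be large (e.g. 93.67 for Microsoft under BLSTM) even when MAD and RMSE on the normalized series are tiny.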
[Graphs (omitted here) of predicted closing prices against actual values for the BLSTM and BGRU models on the Apple, Microsoft and Amazon stocks.]
The highest prediction accuracy for the BGRU application is 95.04% on Microsoft, whilst the highest accuracy for the BLSTM is 81.39% on Apple stocks. The current model outperformed previous deep neural network applications on financial time series: whilst Huynh achieved accuracies of 59.14, 66.18 and 66.47% for LSTM, GRU and BGRU on the S&P 500, this study achieved 79.85, 95.04 and 73.56% for Apple, Microsoft and Amazon using the BGRU. In line with Huynh's findings, the BGRU proved to be the best prediction model for the selected S&P 500 stocks.
5 Summary and Conclusion

This paper proposed BGRU and BLSTM deep learning architectures to predict the next day's closing price for the top three S&P 500 stocks, namely Apple, Microsoft and Amazon. The results show the capability of deep neural networks in financial prediction, attaining 81.39%, 59.80% and 80.57% for Apple, Microsoft and Amazon respectively using the BLSTM, whilst the BGRU results were 79.85%, 95.04% and 73.56% respectively. Irrespective of the Covid-19-induced 2021 USA financial crisis, both architectures were able to predict the three stocks successfully, thereby aiding investment decision making in a VUCA environment and showing the ability of deep neural networks in financial time series prediction.

Availability of Data and Materials Data and materials are available upon request from the corresponding author.
Acknowledgements There are no acknowledgements to date.
Funding There is no funding for this research article.
Contribution The author contributed to this research article, read and approved the final manuscript.
Ethical Declarations
Competing Interests The author declares no competing interests.
Conflict of Interest There is no conflict of interest to declare.
References 1. A.U. Haq, A. Zeb, Z. Lei, D. Zhang, Forecasting daily stock trend using multi-filter feature selection and deep learning. Exp. Syst. Appl. 168, 114444 (2021) 2. Y. Zhang, B. Yan, M. Aasma, A novel deep learning framework: prediction and analysis of financial time series using CEEMD and LSTM. Exp. Syst. Appl. 159, 113609 (2020) 3. H. Rezaei, H. Faaljou, G. Mansourfar, Stock price prediction using deep learning and frequency decomposition. Exp. Syst. Appl. 169, 114332 (2021) 4. M. Vijh, D. Chandola, V.A. Tikkiwal, A. Kumar, Stock closing price prediction using machine learning techniques. Procedia Comput. Sci. 167, 599–606 (2020) 5. G. Shen, Q. Tan, H. Zhang, P. Zeng, J. Xu, Deep learning with gated recurrent unit networks for financial sequence predictions. Procedia Comput. Sci. 131, 895–903 (2018) 6. A. Nayak, M.M.M. Pai, R.M. Pai, Prediction models for Indian stock market. Procedia Comput. Sci. 89, 441–449 (2016) 7. D. Murekachiro, T. Mokoteli, H. Vadapalli, Predicting emerging and frontier stock markets using deep neural networks. In: Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol. 1037 ed. by Y. Bi, R. Bhatia, S. Kapoor. (Springer, Cham, 2020). https://doi.org/10.1007/978-3-030-29516-5_68 8. A.S. Saud, S. Shakya, Analysis of look back period for stock price prediction with RNN variants: a case study on banking sector of NEPSE. Procedia Comput. Sci. 167, 788–798 (2020) 9. V. Ingle, S. Deshmukh, Ensemble deep learning framework for stock market data prediction (EDLF-DP). Glob. Transitions Proc. 2, 47–66 (2021) 10. A. Thakkar, K. Chaudhari, A comprehensive survey on deep neural networks for stock market: the need, challenges and future direction. Exp. Syst. Appl. 177, 114800 (2021) 11. H.D. Huynh, D.M. Doung, L.M. Dang, A new model for stock price movements using deep neural networks. ACM (2017). https://doi.org/10.1145/3155133.3155202 12. H. Na, S. 
Kim, Predicting stock prices based on informed traders’ activities using deep neural networks. Econ. Lett. 204(C) 2021. Elsevier 13. Q. Liu, Z. Tao, Y. Tse, C. Wang, Stock market prediction with deep learning: the case of China. Finan. Res. Lett. (2021) 14. H. Liu, Z. Long, An improved deep learning model for predicting stock market price time series. Dig. Sig. Process. 102, 102741 (2020) 15. M. Hiransha, E.A. Gopalakrishman, K.M. Vijay, K.P. Soman, NSE stock market prediction using deep-learning models. Procedia Comput. Sci. 132, 1351–1362 (2018) 16. A.J. Balaji, D.S.H. Ram, B.B. Nair, Applicability of deep learning models for stock price forecasting: an empirical study on BANKEX data. Procedia Comput. Sci. 143, 947–953 (2018) 17. J. Zhang, S. Cui, Y. Xu, Q. Li, T. Li, A Novel data-driven stock price trend prediction system. Exp. Syst. Appl. 97, 60–69 (2018) 18. Y. Peng, P.H.M. Albuquerque, H. Kimura, C.A. Portela, B. Saavedra, Feature selection and deep neural networks for stock price direction forecasting using technical analysis indicators. Mach. Learn. Appl. 5, 100060 (2021) 19. W. Long, Z. Lu, L. Cui, Deep learning-based feature engineering for stock price movement prediction. Knowl.-Based Syst. 164, 163–173 (2019) 20. D. Zhang, S. Lou, The application research of neural network and BP algorithm in stock price pattern classification and prediction. Futur. Gener. Comput. Syst. 115, 872–879 (2021)
Building a Traffic Flow Management System Based on Neural Networks Varlamova Lyudmila Petrovna and Nabiev Timur Erikovich
Abstract A great deal of work has been devoted to traffic flow control, but the problem of creating a traffic flow control system remains relevant. The number of transport units grows every year, and the number of types of transport units is increasing. Existing traffic management systems are not up to the task of managing this flow. The use of control methods based on artificial neural networks and fuzzy logic makes it possible to control the traffic flow, recognize road users and make quick decisions. Since the traffic stream has the property of emergence, the control system must be flexible. Neuro-fuzzy control methods and models make it possible to take into account all the features of the traffic flow.
1 Introduction

Research on traffic flow and the creation of mathematical macroscopic hydrodynamic models developed by the mid-1950s and continues at the present time. Macroscopic hydrodynamic models do not reflect all properties of the traffic flow. The number of transport units increases every year, the number of road users grows, and the number of different types of transport units increases. All these issues must be taken into account in order to regulate the traffic flow. Conventional models cannot always cope with an increasing traffic flow; therefore, one approach to traffic flow control is proposed in this paper, which makes the topic relevant. This paper considers the implementation of a traffic light control system model based on fuzzy logic and hybrid neural networks. For the first time, this work uses intelligent TV cameras (multifunctional television motion-control sensors) to register transport units, which makes it possible to detect vehicles and recognize them. To increase the control system's speed in the local controller unit, we used a probabilistic neural network.
V. Lyudmila Petrovna (B) · N. Timur Erikovich National University of Uzbekistan, Tashkent, Uzbekistan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_56
2 The Problem Formulation

The papers [1–7] proposed the use of artificial neural networks and a fuzzy approach for traffic flow control. The disadvantage of these systems lies in controlling the traffic signaling without taking local situations into account. In the proposed control system, the neural network sets the type and location of the membership functions of the fuzzy controller and, with the help of the central controller, reduces the time delay of vehicles when the considered intersection is constantly loaded. The traffic flow control system is built on the basis of local control systems with fuzzy logic controllers (FLC). Considering the shortcomings of [8], a structural diagram of a neuro-fuzzy traffic flow control system was implemented. The control scheme for several interconnected intersections is shown in Fig. 1, and the group traffic control in Fig. 2. In this scheme, each local controller works independently and interacts with the main controller, which also contains fuzzy logic rules in the control loop. To determine the continuity of the traffic flow, a motion detector or a camera is usually used, but these have disadvantages: they do not take into account the situation or the types of transport units. We used multifunctional television motion-control sensors [9] (smart cameras), installed at the intersection at the stop line. After testing the software, it was found that the multi-level architecture of the control system is more flexible, despite the control of fixed traffic lights, and the waiting time is reduced in terms of density and traffic flow. With this approach, the fuzzy logic rule is taken into account twice when setting up controls. The proposed model of the traffic flow management system includes isolated intersections and takes pedestrian movements into account in accordance with the traffic light phases. Let us open up the fuzzy logic controller in multilevel form (Fig. 3).
Fig. 1 The scheme of a system with a fuzzy logic controller and smart cameras (blocks: cameras, FLC, adaptation module, quality control module, traffic controller, sequencer, detector, and signal controller feedback)
Fig. 2 The control and management scheme of traffic flows at several intersections (a general controller handling the workload degree and cycle-time regulation, fed by flow rate and flux density, coordinates local controllers that handle the local situation with adjustable and fixed green times)
Fig. 3 Multilevel fuzzy logic controller (FLC): predefined rules feed a fuzzifier, whose output passes through Level 1 and Level 2 FLCs with defuzzifiers, producing the signal to the controller
The predefined rules are located in the knowledge base of the management system. Two-level fuzzification of the input signal makes it possible to take into account the predefined operating rules of the fuzzy logic controller, and thus the quality of the system increases by 9.5–10%. This scheme allows the use of multiphase control and determines the degree of intersection load. In [10–14], a traffic control approach is proposed that combines fuzzy logic and neural networks in a traffic light signaling (LSE) control system. In the proposed control system, the neural network sets the type and location of the membership functions of the fuzzy controller and, with the help of the central controller, reduces the time delay of vehicles when the considered intersection is constantly loaded. The operation of the local controllers running the fuzzy logic algorithm is carried out in accordance with [13]; with a multi-level and multi-agent approach, the block diagram of the fuzzy control system takes the form of Fig. 5. The proposed system takes into account the arrival of vehicles at the intersection, fuzzy logic rules for public transport, green phases, priorities of movement through the intersection, and detection using video cameras and TV (television) sensors.
678
V. Lyudmila Petrovna and N. Timur Erikovich
Based on fuzzy-logical rules, a signal is issued to increase the duration of the green phase in accordance with the observation, the fuzzy-logical decision, and the traffic-situation model. This structure of the fuzzy-logical system makes it possible to control the traffic flow in accordance with the observed traffic situation. The traffic-situation modeling block operates according to its parameters and the input data coming from sensors and surveillance cameras. Consider this block as multilevel, similar to the multilevel controller (FLC) of Fig. 4, where the fuzzy rule generation unit is a multilayer neural network whose predicted parameters serve as inputs for the fuzzy logic unit itself (Fig. 5).
Fig. 4 Multilevel neuro-fuzzy-logical system
Fig. 5 Multilevel neural network with fuzzy logic functions; [w1ij, w2ij, w3ij, w4ij] are the weight functions of levels 1–4 (inputs x1, x2, x3, output y)
Each layer of the neural network performs certain functions. The first layer processes the crisp values of the input variables. In the second layer, the function F = (VehicleIn / VehicleOut) × (WaitingTime / DrawingTime) is calculated. The expression gives the ratio of the number of vehicles entering the intersection, i.e., crossing the stop line (VehicleIn), to the number leaving the intersection (VehicleOut), weighted by the ratio of waiting time to drawing time. The third level is the "level of rules": each node of the third level represents one fuzzy rule. The last (fourth) level is the defuzzification level, which produces a crisp value of the selected output variable. Such a control-system structure, built on the basis of a hybrid neural network with fuzzification and defuzzification functions, makes it possible to adjust the fuzzy rules in accordance with the traffic situation, the density and intensity of the traffic flow, and the speed of vehicles arriving at the intersection, thereby reducing the load and preventing traffic jams. The aim of the work was to determine the offset parameter, that is, the time difference between the beginning of the green phase at the first and second intersections. The control system received the intensity values and the cycle length, and at the output the degree of the necessary change in the traffic light signal was obtained. The tested fuzzy control system showed higher efficiency (throughput) than adaptive control near the saturation point of the transport diagram. Compared with conventional dynamic control, the fuzzy logic controller (FLC) achieved a 10–21% reduction in vehicle delay. However, such a system has no traffic flow prediction function. This problem is solved by a multilayer adaptive fuzzy probabilistic neural network.
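The second-layer ratio F described above can be computed directly. The sketch below uses the names from the formula; the guard against division by zero is our own assumption, since the paper does not discuss the degenerate case.

```python
# Second-layer load ratio F = (VehicleIn / VehicleOut) * (WaitingTime / DrawingTime).
# The zero-denominator guard is an assumption added for robustness.

def load_ratio(vehicle_in, vehicle_out, waiting_time, drawing_time):
    """Ratio of entering to leaving vehicles, weighted by waiting/drawing time."""
    if vehicle_out == 0 or drawing_time == 0:
        raise ValueError("VehicleOut and DrawingTime must be non-zero")
    return (vehicle_in / vehicle_out) * (waiting_time / drawing_time)
```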
3 Method of Solution
The congestion of the road transport system and the influence of various disturbing factors lead to traffic jams that stretch over long distances and cause the collapse of the road network. For this reason, a neuro-fuzzy approach was used in creating traffic control systems, taking into account the instability and non-stationarity of the traffic flow as a control object [14, 15]. Since traffic flows may contain different types of vehicles, their classification and adaptation are required in the management of such objects. Let a traffic stream be given containing N vehicles described by n-dimensional feature vectors, where some of the vehicles are classified and some are not. It is assumed a priori that m different classes can be allocated in the array, to which these motor transport units may belong, and that these classes can have different shapes in the m-dimensional space and mutually overlap. It is necessary to
create a classifying neuro-fuzzy system that allows simple and efficient classification under the condition of mutually overlapping classes, and to propose an architecture of a classifying fuzzy probabilistic network that can classify vehicles passing an intersection in terms of both Bayesian and fuzzy classification at the same time. The network should be easy to implement and suitable for processing incoming observations in a sequential online mode. A fairly effective tool for solving the problem of classifying input parameters is the probabilistic neural network (PNN) introduced by D.F. Specht [16], which is trained according to the principle of "neurons at data points", making it extremely simple and fast. The literature contains modifications of PNNs designed for processing visual information, which differ in the presence of competition in the learning process and the possibility of correcting the receptive fields of the kernel activation functions [17]. In [18], fuzzy modifications of probabilistic networks were introduced that solve the classification problem under conditions of intersecting classes. At the same time, the use of PNNs and FPNNs (fuzzy probabilistic neural networks) in visual information processing becomes more complicated when the volume of information being analyzed is large and the feature vectors (images) have a sufficiently high dimension.
This difficulty is explained by the fact that, both in a probabilistic neural network and in other networks trained according to the "neurons at data points" principle [19], the number of neurons in the first hidden layer (the layer of images) is determined by the number of image vectors in the training sample, N. This leads to a decrease in performance and requires storing all data used during training, which naturally makes on-line operation difficult. To overcome this shortcoming, we proposed an improved probabilistic neural network, where the first hidden layer is formed not by images but by class prototypes calculated using ordinary K-means in batch mode. Since the number of possible classes m in classification problems is usually significantly less than the size of the training sample N, the fuzzy probabilistic network (FPNN) is much better suited to solving real problems than the standard probabilistic neural network (PNN). In view of the above disadvantages of probabilistic neural networks, we used a fuzzy probabilistic network in which the first hidden layer performs adaptive refinement of prototypes using T. Kohonen's WTA ("winner takes all") learning rule [18, 19], and the output layer estimates the membership levels of incoming images in each class using the Fuzzy C-means (FCM) procedure [18]. Such a network contains the minimum possible number of neurons, equal to the number of classes, and is therefore characterized by high performance. At the same time, the network takes into account neither the sizes of the classes nor the frequency of occurrence of images in each of them, which naturally limits its capabilities when processing data whose prototypes are located at different distances from each other and, moreover, may change over time.
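The output-layer membership estimation mentioned above can be illustrated with the standard FCM membership formula for fixed prototypes. The fuzzifier value m = 2 and the prototypes used in the example are assumptions; the paper only states that Fuzzy C-means memberships are used.

```python
import math

# Illustrative FCM-style membership computation for fixed class prototypes.
# The fuzzifier m = 2 is an assumption; the paper does not state its value.

def fcm_memberships(x, prototypes, m=2.0):
    """Membership of point x in each class, given one prototype per class."""
    d = [math.dist(x, c) for c in prototypes]
    if any(di == 0.0 for di in d):  # x coincides with a prototype: crisp membership
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)); memberships sum to 1.
    return [1.0 / sum((d[j] / d[k]) ** p for k in range(len(d)))
            for j in range(len(d))]
```

For example, a point close to one prototype receives a membership near 1 in that class and correspondingly small memberships in the others.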
A standard probabilistic network (PNN) consists of an input (receptor) layer, a first hidden layer called the image layer, a second hidden layer called the summation
layer, and an output layer formed by a comparator that selects the maximum value at the output of the second hidden layer. The initial information for network synthesis is a training sample of images formed by a "package" of n-dimensional vectors x(1), x(2), …, x(k), …, x(N) with a known classification. It is assumed that N_A vectors belong to class A, N_B to class B, and N_C to class C, i.e., N_A + N_B + N_C = N, and the prior probabilities can be calculated using the elementary relations

P_A = N_A / N, P_B = N_B / N, P_C = N_C / N, P_A + P_B + P_C = 1.
Training of Specht networks reduces to a one-time setting of weights: the number of neurons in the image layer is N, and their synaptic weights are determined by the components of the images according to the "neurons at data points" principle:

w_li = x_i(l), i = 1, 2, …, n; l = 1, 2, …, N,

or, in vector form, w_l = x(l) = (x_1(l), x_2(l), …, x_n(l))^T. The bell-shaped activation function of the image layer converts the signal x(k) into a scalar neuron output

o_l^[1](k) = φ(‖x(k) − w_l‖, σ), (1)

based on the Gaussian

o_l^[1](k) = exp(−‖x(k) − w_l‖² / (2σ²)), (2)
where o_l^[1](k) is the hidden-layer output. When processing visual data, the number of vehicle classes can be large, and in on-line mode their classification using a standard probabilistic network can be difficult. For this purpose, a simple architecture is introduced whose number of neurons equals the number of classes (for example, trucks, cars, buses: m = 3, classes A, B, C), and classification is performed by estimating the distance to the class prototypes, calculated as the arithmetic mean

c_j = (1 / N_j) Σ_{k=1}^{N_j} x(k, j), j = 1, 2, …, m. (3)
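The prototype-based variant just described can be sketched directly from these pieces: class prototypes are arithmetic means as in (3), and classification picks the class maximizing the prior-weighted Gaussian kernel of (2). The kernel width sigma and the toy data in the example are assumptions for illustration.

```python
import math

# Minimal prototype-based PNN-style classifier sketch:
# prototypes via arithmetic means (Eq. 3), Gaussian kernel (Eq. 2),
# priors P_j = N_j / N. Sigma and the data are illustrative assumptions.

def prototypes(samples_by_class):
    """Arithmetic-mean prototype c_j for each class (Eq. 3)."""
    protos = {}
    for label, xs in samples_by_class.items():
        n = len(xs)
        protos[label] = tuple(sum(v[i] for v in xs) / n for i in range(len(xs[0])))
    return protos

def classify(x, samples_by_class, sigma=1.0):
    """Pick the class maximizing prior * Gaussian kernel at its prototype."""
    total = sum(len(xs) for xs in samples_by_class.values())
    protos = prototypes(samples_by_class)
    best, best_score = None, -1.0
    for label, c in protos.items():
        prior = len(samples_by_class[label]) / total  # P_j = N_j / N
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        score = prior * math.exp(-dist2 / (2 * sigma ** 2))  # Eq. (2)
        if score > best_score:
            best, best_score = label, score
    return best
```

Note that this stores only m prototypes rather than all N training images, which is exactly the performance argument made above.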
With a larger number of classes, such a scheme cannot estimate class sizes and their overlaps. For this purpose, a multilayer adaptive fuzzy probabilistic neural network is proposed, shown in Fig. 6. It can be seen from the figure that the hidden layer contains m blocks of the same type, one per class; the number of blocks can change during the learning process, and learning is carried out according to the "neurons in data points" principle. Each block contains the same number of neurons, Ñ + 1 (Ñ_A = Ñ_B = Ñ_C = Ñ): in each block, Ñ neurons (here, for m = 3 classes) are trained according to the "neurons in data points" principle, and one neuron C_j (C_A, C_B, C_C) calculates the class prototype; c_j is the arithmetic mean and Ñ is the number of neurons in the hidden layer.
Fig. 6 Appearance of an adaptive fuzzy probabilistic neural network with five layers for three classes (A, B, C)
At the same time, within each block, between individual neurons, and between blocks as a whole, a "competition" process in the sense of Kohonen is organized along intrablock and interblock lateral connections, which makes it possible to evaluate both the prototypes (centroids) of the classes and their sizes. The second hidden layer of adders is similar to the corresponding layer of the Specht network; in the third hidden layer, which corrects the a priori probabilities, the frequencies of appearance of images in each class are calculated; and the output comparator layer implements the actual classification of the presented image. The network training process begins with setting the initial synaptic weights of all neurons. For the architecture shown in Fig. 6, it is necessary to have Ñ × m = 9 classified images, three for each of the classes A, B, and C:

x(1, A) = w_1(0), x(2, A) = w_2(0), x(3, A) = w_3(0),
x(4, B) = w_4(0), x(5, B) = w_5(0), x(6, B) = w_6(0),
x(7, C) = w_7(0), x(8, C) = w_8(0), x(9, C) = w_9(0),

C_A(0) = (1/3) Σ_{k=1}^{3} x(k, A), C_B(0) = (1/3) Σ_{k=4}^{6} x(k, B), C_C(0) = (1/3) Σ_{k=7}^{9} x(k, C). (4)
Further, the image vectors that participated in forming the initial conditions are not used, and all subsequent signals are denoted by x(k, j) if they belong to the training sample and by x(k) if they are subject to classification. The first image x(1, j), whose membership in a particular class A, B, or C is known, is fed to the network input. As a result of interblock competition, a winner prototype j* is determined (here j is not necessarily equal to j*) whose parameter vector c_{j*}(0) is closest to the input signal x(1, j) in the sense of the accepted metric (usually Euclidean), i.e.,

j* = arg min_p D(x(1, j), c_p(0)) = arg min_p ‖x(1, j) − c_p(0)‖² = arg max_p x^T(1, j) c_p(0) = arg max_p cos(x(1, j), c_p(0)), ∀p = 1, 2, …, m,
−1 ≤ cos(x(1, j), c_p(0)) = x^T(1, j) c_p(0) ≤ 1, 0 ≤ ‖x(1, j) − c_p(0)‖² ≤ 4, (5)
where D is the Euclidean metric and x(k, j) is the image-vector signal that participated in the classification. When applying T. Kohonen's WTA learning rule, two situations may arise: • the input vector x(1, j) and the winner c_{j*}(0) belong to different classes, i.e., j ≠ j*;
• the input vector x(1, j) and the winner prototype c_{j*}(0) belong to the same class, j = j*.
Next, the parameters of the neurons and prototypes are tuned using a fuzzy learning rule (LVQ, Learning Vector Quantization) [18, 19]:

c_j(1) = c_{j*}(0) + η(1)(x(1, j) − c_{j*}(0)), if j = j*;
c_j(1) = c_{j*}(0) − η(1)(x(1, j) − c_{j*}(0)), if j ≠ j*;
c_j(1) = c_j(0), if the j-th neuron did not win. (6)
Here 0 < η(1) < 1 is a training-step parameter chosen empirically. After training and choosing a training step, the degree of membership of the image x(1, j) in each of the m available classes is determined. The stage of intrablock competition ends and the stage of interblock competition begins, where the distances

D(c_j(1), w_q(0)) = ‖c_j(1) − w_q(0)‖², (7)

are calculated, with q running through all numbers of neurons corresponding to the j-th class.
Here D is the Euclidean metric, c_j the average, j the winning neuron, j* the winner-prototype index, k the observation number, and w_q a synaptic weight. Then, similarly to (5), the winner closest to the prototype of the j-th class is calculated using (7), and if the condition D(c_j(1), w_{q*}(0)) < D(c_j(1), x(1, j)) holds, the center vector of the q*-th activation function is replaced by x(1, j), thereby increasing the size of the class, i.e., w_{q*}(1) = x(1, j). Otherwise, all w_q(0) remain unchanged, incrementing only their index so that w_q(1) = w_q(0). Thus, only observations that are far from the current value of the prototype are included in the learning process according to the "neurons in data points" principle. Let us assume that by the moment of arrival of the k-th observation of the training sample, all prototypes c_j(k − 1) and neuron parameter vectors w_l(k − 1), with a total number of Ñ × m, have been formed. Then the learning process of the first hidden layer can be written as the following sequence of steps:
• arrival at the network input of the image vector x(k, j) with a known classification;
• determination of the winning prototype c_{j*}(k − 1) such that j*(k − 1) = arg min_p D(x(k, j), c_p(k − 1)), p = 1, 2, …, m;
• tuning of the winning prototype's parameters at step k:

c_j(k) = c_{j*}(k − 1) + η(k)(x(k, j) − c_{j*}(k − 1)), if j = j*;
c_j(k) = c_{j*}(k − 1) − η(k)(x(k, j) − c_{j*}(k − 1)), if j ≠ j*;
c_j(k) = c_j(k − 1), if the j-th neuron did not win. (8)
• calculation of the intra-block distances in the j-th class, D(c_j(k), w_q(k − 1)), where q runs over all indices of the j-block neurons;
• determination of the intrablock winner w_{q*}(k − 1) such that q*(k − 1) = arg min_q D(c_j(k), w_q(k − 1));
• under the condition D(c_j(k), w_{q*}(k − 1)) < D(c_j(k), x(k, j)), replacement of w_{q*}(k) = x(k, j); otherwise w_q(k) = w_q(k − 1).
The learning process of this layer is carried out until the training sample is exhausted, i.e., it ends with the calculation of all c_j(N) and all Ñ × m weights w_l(N). At the same time, the third hidden layer calculates the relative frequencies of appearance of images from the different classes:

P_j = N_j / N.
The training of the multilayer fuzzy probabilistic neural network is now complete; recognition is then carried out via stages (3)–(8). The use of a neuro-fuzzy probabilistic network has a number of advantages over other types of networks, including Bayesian networks, for solving the problem of classification (membership in a class) under conditions of overlapping classes [19–21]. This network makes it possible to determine more accurately the probabilities that incoming data belong to the potential classes. The proposed classification method is characterized by ease of implementation, high speed, and the ability to process data as information becomes available.
3.1 The Concept of a Neuro-Fuzzy Traffic Control System
The principles underlying the proposed solutions for the adaptive transport system (the use of neuro-fuzzy logic methods and of complex electronic equipment that collects traffic-flow data for subsequent processing) are based on a multi-agent approach. Various types of sensors are used in different systems for vehicle detection [21, 22]; they are capable of recognizing vehicles passing through a given section and continuously transmit data to local control systems. Due to the instability of the traffic flow as a control object and the variability of its characteristics at different times of day, the use of neuro-fuzzy control methods with
a probabilistic forecasting approach is indicated. The efficiency of intelligent traffic management systems can be improved using the following neuro-fuzzy approaches:
1. Processing of big data, their accumulation and analysis, and, on this basis, the development of fuzzy rules for management, decision-making, and prioritization;
2. Modeling the traffic situation with the help of fuzzy-logical solutions;
3. Development of a neuro-fuzzy algorithm for controlling traffic light phases;
4. Development of a traffic flow control algorithm based on an adaptive neuro-fuzzy probabilistic model;
5. The use of intelligent monitoring tools in local control systems, with data processing in top-level systems;
6. Application of neuro-fuzzy methods for vehicle classification;
7. Interaction of all levels and subsystems within the framework of global governance.
A conceptual diagram of an intelligent traffic management system operating in accordance with the above approaches, combined with the wide capabilities of modern technical means and centralized data processing, makes it possible to quickly analyze traffic flows based on data received from all available sources and to automatically develop regulation decisions, including control data for traffic light objects at intersections and pedestrian crossings.
4 Conclusion
This approach to building an automatic traffic control system of the adaptive class is the most preferable, since it contains all the necessary stages of input-information processing while taking changes in operating conditions into account. The scientific novelty of this work lies in the development of a conceptual diagram of a road traffic management system based on a neuro-fuzzy approach to control, the use of intelligent monitoring tools, and the integration of local control systems into a higher-level system.
References
1. E.H. Mamdani, Application of fuzzy logic to approximate reasoning using linguistic synthesis. Fuzzy Sets Syst. 26(12), 1182–1191 (1977)
2. M. Nakatsuyama, H. Nagahashi, N. Nishizuka, Fuzzy logic phase controller for traffic junctions in the one-way arterial road, in Proceedings of the IFAC 9th Triennial World Congress, Budapest, Hungary (1984), pp. 2865–2870
3. K.K. Tan, M. Khalid, R. Yusof, Intelligent traffic lights control by fuzzy logic. Malays. J. Comput. Sci. 9(2), 29–35 (1996)
4. J. Favilla, A. Machion, F. Gomide, Fuzzy traffic control: adaptive strategies, in Proceedings of the 2nd IEEE International Conference on Fuzzy Systems, San Francisco, CA, USA, vol. 1 (1993), pp. 505–511
5. G. Beauchamp-Baez, E. Rodriguez-Morales, E.L. Muniz-Marrero, A fuzzy logic based phase controller for traffic control, in Proceedings of the 6th IEEE International Conference on Fuzzy Systems, July 1997, pp. 1533–1539
6. M.B. Trabia, M.S. Kaseko, M. Ande, A two-stage fuzzy logic controller for traffic signals. Transp. Res. Part C Emerg. Technol. 7(6), 353–367 (1999)
7. K. Kagolanu, R. Fink, H. Smartt, R. Powell, E. Larson, An intelligent traffic controller, in Proceedings of the 2nd World Congress on Intelligent Transportation Systems (1995)
8. J. Niittymaki, R. Nevala, Fuzzy adaptive traffic signal control—principles and results, in Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, vol. 5 (2001), pp. 2870–2875
9. N.R. Yusupbekov, A.R. Marakhimov, Synthesis of the intelligent traffic control systems in conditions saturated transport stream. Int. J. Chem. Technol. Control Manage. 3(4), 12–18 (2015)
10. J. Favilla, A. Machion, F. Gomide, Fuzzy traffic control: adaptive strategies, in Second IEEE International Conference on Fuzzy Systems, San Francisco, CA (1993), pp. 506–511
11. K.M. Passino, S. Yurkovich, Fuzzy Control (Addison-Wesley, 1998)
12. R.A. Aliev, R.R. Aliev, Soft Computing and its Applications (World Scientific, 2001)
13. J. Niittymäki, M. Pursula, Signal control using fuzzy logic. Fuzzy Sets Syst. 116(1), 11–22 (2000)
14. E. Bingham, Reinforcement learning in neurofuzzy traffic signal control. Eur. J. Oper. Res. 131(2), 232–241 (2001)
15. W. Wei, Y. Zhang, J.B. Mbede, Z. Zhang, J. Song, Traffic signal control using fuzzy logic and MOGA, in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 2 (2001), pp. 1335–1340
16. D.F. Specht, Probabilistic neural networks. Neural Netw. 3(1), 109–118 (1990)
17. Y. Bodyanskiy, Y. Gorshkov, V. Kolodyazhniy, J. Wernstedt, Probabilistic neuro-fuzzy network with nonconventional activation functions, in Knowledge-Based Intelligent Information and Engineering Systems: 7th International Conference KES 2003, Oxford, 3–5 September 2003: Proceedings (Springer, Berlin-Heidelberg, New York, 2003), pp. 973–979 (Lecture Notes in Computer Science, vol. 2774)
18. E. Tron, M. Margaliot, Mathematical modeling of observed natural behavior: a fuzzy logic approach. Fuzzy Sets Syst. 146(3), 437–450 (2004)
19. F. Qu, Y. Hu, Y. Yang, S. Sun, A convergence theorem for improved kernel based fuzzy C-means clustering algorithm, in 3rd International Workshop on Intelligent Systems and Applications, Wuhan, China (2011), pp. 1–4
20. Japan's Kyosan electric manufacturing Co Ltd forays into Indian market. https://in.finance.yahoo.com/news/japans-kyosan-electric-manufacturing-co-072813783.html
21. G.H.A. Alnovani, A.I. Diveev, K.A. Pupkov, E.A. Sofronova, Control synthesis for traffic simulation in the urban road network, in Proceedings of the 18th IFAC World Congress, Milano, Italy, 28 August–2 September 2011, pp. 2196–2201
22. L.P. Varlamova, M.A. Artikova, Z. Aripova, M.D. Khashimkhodjaeva, Fuzzy logic traffic management model, in Scientific Collection «InterConf», (38): with the Proceedings of the 1st International Scientific and Practical Conference «Science, Education, Innovation: Topical Issues and Modern Aspects», Tallinn, Estonia (Uhingu Teadus juhatus, 2020), p. 1376. https://ojs.ukrlogos.in.ua/index.php/interconf/article/view/7812
The Impact of Ruang Cerita Application on the Neuro Depression and Anxiety of Private University Students
Satria Devona Algista, Fakhri Dhiya ‘Ulhaq, Bella Natalia, Ignasius Christopher, Ford Lumban Gaol, Tokuro Matsuo, and Chew Fong Peng
Abstract Mental health is an important part of overall health and well-being. In the COVID-19 pandemic era, many people are facing mental health problems, especially depression and anxiety. In this study we discuss the impact of the RuangCerita application on the depression and anxiety of private university students. The study aims to determine what impact the RuangCerita application has on the depression and anxiety of private university students, especially in the COVID-19 pandemic era. The data were collected with a questionnaire (Google Forms) distributed to 22 respondents over 3 days. The population of this research is students of private universities. The results show that 77.3% of all respondents felt lonely during the pandemic, and 88.4% of all respondents are interested in using applications that focus on mental health. In conclusion, the COVID-19 pandemic can cause people to have mental health issues. It is therefore hoped that, with the RuangCerita application, many people can more easily overcome their mental health problems.
S. Devona Algista · F. Dhiya ‘Ulhaq · B. Natalia · I. Christopher School of Information System, Bina Nusantara University, Alam Sutera, Indonesia e-mail: [email protected] F. Dhiya ‘Ulhaq e-mail: [email protected] B. Natalia e-mail: [email protected] F. Lumban Gaol (B) Doctor of Computer Science, Binus Graduate Program, Jakarta, Indonesia e-mail: [email protected] T. Matsuo Advanced Institute of Industrial Technology, Tokyo, Japan e-mail: [email protected] C. Fong Peng University of Malaya, Kuala Lumpur, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_57
690
S. Devona Algista et al.
1 Introduction
The COVID-19 pandemic has had a major impact on our lives. Many of us face challenges that can be stressful, overwhelming, and cause strong emotions in both adults and children. Public health measures, such as social distancing, are necessary to reduce the spread of COVID-19, but they can leave us feeling isolated and lonely and can increase stress and anxiety [2]. Riskesdas (Basic Health Research) shows an increase in emotional problems in all provinces in Indonesia compared to 5 years earlier, based on Riskesdas 2013, and of the 6.1% prevalence of depression in people over 15 years old, only 9% received treatment [2]. One in ten people in Indonesia suffers from a mental disorder; the most common is anxiety disorder, followed by depression. The World Health Organization (WHO) states that being in a pandemic situation, such as the COVID-19 pandemic, can also be a triggering factor for stress, which then makes people more susceptible to mental disorders [8]. Mental health is an important part of overall health and well-being. It includes our emotional, psychological, and social well-being. It affects the way we think, feel, and act, and helps determine how we handle stress, relate to others, and make healthy choices [12]. Therefore, mental health disorders cannot be underestimated, because the current number of cases is still quite worrying. There are about 450 million people suffering from mental and behavioral disorders worldwide, and it is estimated that one in four people will suffer from a mental disorder during their lifetime. According to the WHO South-East Asia region (WHO SEARO), the highest number of cases of depression is in India (56,675,969 cases, or 4.5% of the total population) and the lowest in the Maldives (12,739 cases, or 3.7% of the population); in Indonesia, there were 9,162,886 cases, or 3.7% of the population [8].
Despite this, the handling of problems related to mental health in Indonesia, especially in large areas such as DKI Jakarta and especially among college students, is still very limited. With its diverse population and its various biological, psychological, and social factors, the number of cases of mental disorders in Indonesia is likely to continue to grow. It is therefore important for every country to make efforts to overcome the consequences of mental health disorders [8]. RuangCerita is an application that helps its users to overcome mental health issues. In general, the RuangCerita application consists of 5 main features, namely Consultation, Drugstore, Activity, Community, and Profile [2].
1.1 Medical Purposes
Mental health services among health workers in Indonesia are still limited. These conditions affect the process of diagnosis, care, and treatment of patients, as well as the family's understanding of the condition and of how to treat patients [19]. Another form of easy access to mental health services is the government's effort to place psychologists in primary health services, namely Puskesmas, so that they are easily accessible and can reach a wider level of society. Puskesmas can be the starting point for health workers to get closer to the community directly. Although the placement of psychologists in Puskesmas has not yet become an evenly distributed program throughout Indonesia, it is a very good start in improving mental health services. The digital era can be seen as an opportunity to participate in improving people's mental health. The RuangCerita application can be replicated through various channels, as can the services of a psychologist or psychiatrist. By collaborating with internet-based information service providers, mental health education will become wider in scope, mental health services in Indonesia will be facilitated more thoroughly, and people, especially students, will be able to overcome their mental health issues easily, just through their phone.
1.2 Educational Purpose
Many people do not know about the importance of mental health, especially in the COVID-19 pandemic. Mental illness is a general term for a group of illnesses whose symptoms can affect a person's thinking, perceptions, mood, or behavior [21]. There are many mental health issues, such as anxiety, depression, and stress. These issues can disturb the mind and consciousness of the sufferer, cause delusions and hallucinations, and produce changes in behavior. Such mental health issues can be dangerous if not resolved: they can trigger aggressive behavior, depression, suicide attempts, social isolation, and alcohol or drug abuse [2]. The RuangCerita application offers many insights about mental health, including articles about mental health, mental health activities, mental health medicine, and more, so people can learn a lot about mental health.
1.3 Promotion Purposes
According to Kotler [9], promotion includes all activities undertaken to communicate and promote products to the target market. In accordance with the context already discussed regarding mental health, and driven by a willingness to find solutions to the existing problems in the form of the RuangCerita application, this
research paper intends to promote the RuangCerita application, which will help solve the existing problems.
Problem Formulation:
1. Can the RuangCerita mental health application help students reduce their mental health problems?
2. How does a student's mental health affect his or her performance as a student?
3. What is the impact of the COVID-19 pandemic on the mental health of students?
By knowing the impact of the RuangCerita application on mental health, the user base of the application can grow, because people become interested in using it once they know what impact and features RuangCerita offers for overcoming mental health issues. It can therefore also serve as a promotion alternative for the RuangCerita application.
2 Research Technique We use a questionnaire as the research method. A questionnaire is a research instrument consisting of a series of questions that aim to collect information from respondents [13]; it can be considered a written interview. We use the Google Forms platform to make the questionnaire more efficient. College students, workers, and psychiatrists are the participants in this research, because they have the potential to have mental health issues and are the most likely to use this application (Fig. 1). The following is the list of questionnaire questions:
• During this pandemic, do you often feel lonely?
• What activities do you usually do during this pandemic?
• If there were an application for mental health, would you be interested in using it?
• What do you currently feel is interfering with your daily activities?
• When do you feel that things like excessive worry interfere with your daily activity?
• Does anyone else know how you are feeling right now?
• How do you deal with boredom during a pandemic like now?
• What features do you expect to have in our application?
• Do you feel your workload as a student/worker makes you stressed?
• How do your parents support you with your personal problems?
• In your opinion, what impact will impaired mental health have on students' and workers' performance in college and work, both academically and in an organizational setting?
• At this time, when doing the assignments given by lecturers or tasks within your scope of work, do you often feel like complaining and tired of your lecture or work responsibilities?
The following are the steps in the research technique:
The Impact of Ruang Cerita Application on the Neuro Depression …
Fig. 1 Step in research technique
1. Distribute the questionnaire to respondents at random, to be filled in according to the rules that have been prepared.
2. Withdraw the questionnaires that have been filled out by the respondents.
3. Analyze the research data carefully.
3 Research Method In this research, we use a quantitative research design to collect information. Quantitative research design is the process of collecting and analyzing numerical data. It can be used to find averages and patterns, make predictions, and generalize results to wider populations [5]. Within quantitative research, we use descriptive research, which prioritizes in-depth analysis of the data and facts found [22]. We use quantitative research because we need to know, on average, how stressed people, especially private university students, are in this pandemic era, and to make predictions about the impact of the RuangCerita application on mental health (Fig. 2). In this research, we use three instruments: depression, anxiety, and OCD. Depression (major depressive disorder) is a common and serious medical illness that negatively affects how you feel, the way you think, and how you act [17]. We will measure whether the depression level of the RuangCerita application's users
Fig. 2 Research model of Ruang Cerita impact
increases or decreases. Besides that, we also use anxiety as a research instrument. Anxiety is a feeling of fear, dread, and uneasiness [11]. We measure the anxiety level of the application's users to see whether there is a change after using the features in the RuangCerita application. In addition, OCD (obsessive-compulsive disorder) is a mental health disorder that affects people of all ages and walks of life, and occurs when a person gets caught in a cycle of obsessions and compulsions [1]. We also measure the OCD level as a research instrument. Based on these three instruments, we can determine the impact of the RuangCerita application on the mental health of private university students. The population used in this research consists of college students who study at a private university in the information systems department, giving a population of 1000 people. The sample is drawn using Slovin's formula [14], namely
n = N / (1 + N × e²)
where n = sample size, N = population size, and e = allowance for inaccuracy due to tolerable sampling error. Based on Slovin's formula, the sample size is obtained as follows:
n = 1000 / (1 + (1000 × (0.05)²))
n = 1000 / (1 + (1000 × 0.0025))
n = 1000 / (1 + 2.5)
n = 1000 / 3.5
n ≈ 285
So the sample size in this study is 285 students who study at private universities in the information systems department and who would serve as respondents; of these, a total of 22 people filled out the questionnaires and interviews. Meanwhile, the sampling technique used is a non-probability technique, in which samples are selected based on the researcher's chosen criteria rather than drawn from a known sampling frame [20]. The method chosen by the researchers is accidental sampling, a technique that determines the sample by chance, namely people who happen to meet the researcher and have sample characteristics in accordance with what has been determined, as described by Sugiyono [16].
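The Slovin calculation above can be reproduced in a few lines of code. The following is an illustrative sketch (not part of the original study); the function name is our own:

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula: n = N / (1 + N * e^2), rounded down."""
    return math.floor(population / (1 + population * margin_of_error ** 2))

# Population of 1000 information systems students, 5% margin of error, as in the text
print(slovin_sample_size(1000, 0.05))  # 285
```

With N = 1000 and e = 0.05 the denominator is 3.5, so n = 1000 / 3.5 ≈ 285.7, which rounds down to the 285 respondents quoted above.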
4 Research Result 4.1 Data Collection Result After the data collection, we obtained data from 22 respondents. The data collection results are as follows (Figs. 3, 4, 5, 6 and 7). In addition to the questionnaire data, the researchers also investigated the activities the respondents often perform to manage stress so that depression does not occur. The data obtained are as follows (Table 1).
Fig. 3 Age questionnaire result ("And how old are you?", 22 answers)
Fig. 4 Job questionnaire result ("What you do for living? (Job)", 22 answers)
Fig. 5 Activities questionnaire result ("What activities do you usually do during this pandemic?", 22 answers)
4.2 Result of Research Question In this section, we explore the results of the experiments. In the first part, we answer the effect of the RuangCerita mental health application on students' mental health problems. The RuangCerita mental health application can reduce students' mental health problems because many of the respondents lack a supportive environment or cannot talk about the mental health problems they are experiencing, even with their own parents. To manage this problem, the RuangCerita application provides features through which users can share their mental health problems with professionals.
Fig. 6 Loneliness questionnaire result ("During this pandemic, do you often feel lonely?", 22 answers)
Fig. 7 Feeling questionnaire result ("What do you currently feel is interfering with your daily activities?", 22 answers)
In this part, we answer how a student's mental health affects their performance as a student. Students' mental health issues can affect their performance because mental illness can affect students' motivation, concentration, and social interactions, which are crucial factors for succeeding in higher education. This statement is in line with the results of our questionnaire survey, where most of the respondents stated that when their mental health is unstable, the work they do does not produce maximum results. Finally, we answer the impact of the COVID-19 pandemic on students' mental health. The COVID-19 pandemic can affect students' mental health: many private university students experience feelings of loneliness. One reason for this loneliness is online learning, in which students cannot meet and interact with each other due to the COVID-19 pandemic. This loneliness can peak and increase the students' stress levels.
Table 1 Questionnaire results

1. How do you deal with boredom during a pandemic like now?
• Playing games, watching Netflix and calling a friend
• Cooking or strolling around the town once or twice a week
• Sleep
• I usually do things that I enjoy, like exercise, watch movies, and also hang out with my friends
• Take a nap/watch social media content
• Watching movies, writing more, talking with my family
• Singing, writing lyrics, and dancing
• Watching series, reading some novels
• Healing
• Catching up with my friends

2. When do you feel these things like excessive worry and the like interfere with your daily activity?
• Almost every day
• When I do things and suddenly random thoughts and anxiety strike
• Midnight
• In the morning or sometimes at midnight
• When things get hectic and there's a lot of pressure, sometimes excessive worry and anxiety interfere with my daily activity
• When I have a lot of assignments in college and organization
• When a workload such as an organization task or assignment awaits, my anxiety goes up
• When I'm alone or overthinking
• At night before sleep
• Usually when I'm alone and thinking about the problems that I have

3. In your opinion, what impact will students have if mental health is impaired on performance in college, both academically and in an organizational setting?
• Burnout
• The results of their work are not optimal because of the mental health issue
• I can't focus on my job
• Sometimes workers/students need time to deal with their mental health, so students will need a lot of time to take care of their mental health
• Really bad, because it can affect the surrounding environment
• Producing something unsatisfactory
• They will not work optimally
• Students or workers won't be able to do their best in performing work if their well-being is not healthy. A person's performance can be affected even by mood changes, so having mental health issues would most likely lower people's performance
• Decreasing scores and understanding in courses
• No motivation and feeling gloomy, which makes productivity low

4. How do your parents support you with your personal problems?
• Actually, I don't really like to talk about the problems that I have, but usually my parents often ask questions and encourage me to stay motivated
• By listening to my problems and giving advice
• I deal with it myself
• I'm not the kind of person who likes to tell such things to my parents
• Hugs and some advice
• They support me both materially and morally
• They provide necessities for me to do self-care
• They always encourage me to do whatever I want
• Giving opinions or taking me out on vacation
5 Discussion From the results of the survey and research, we discuss whether the results obtained are in accordance with the problem formulation. The results of our discussion show that each of the results obtained is in accordance with our hypothesis: many people feel lonely, and mental health is very important, especially in this
COVID-19 pandemic situation. The survey results show that 77.3% of respondents feel lonely in this pandemic situation. In addition, 50% of respondents said that mental health disorders can interfere with daily activities. In a previous study, only 12% of 89 respondents said that mental health disorders could interfere with daily activities [23]. With these results, it can be said that the COVID-19 pandemic can cause mental health problems and that mental health is very important. Based on these problems, the RuangCerita application has the impact of making it easier for the public to consult professionals or psychiatrists regarding mental health, which can help the community reduce their mental health problems.
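The percentages quoted above follow directly from the respondent counts. In the sketch below, the raw counts of 17 and 11 out of 22 are inferred from the reported 77.3% and 50% figures rather than stated explicitly in the paper:

```python
def percentage(count: int, total: int) -> float:
    """Share of respondents, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

TOTAL = 22  # questionnaire respondents
print(percentage(17, TOTAL))  # 77.3 -> respondents who felt lonely
print(percentage(11, TOTAL))  # 50.0 -> respondents whose daily activities were disturbed
```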
6 Conclusion The RuangCerita application is expected to be a solution that helps students overcome problems related to their mental health through the features available in the application. With the consultation feature, patients, or in this context students, can share the problems or complaints they face with the right people, so that they receive the right treatment and avoid dangerous actions ranging from self-harm to attempted suicide. This is in line with the survey responses we collected, where more than 80% of respondents agree with the creation of the RuangCerita application. Previous research concluded that a solution is needed to explore the role of local demographic and cultural factors in interpreting the recognition, causes, and help-seeking models related to mental disorders [23]. With this data, it is hoped that RuangCerita can be used as a model for seeking help related to mental disorders. In addition, mental health problems, especially depression and anxiety, can have an impact on performance in college both academically and organizationally. This is evident from the results of the questionnaire, which show that mental health problems can cause burnout, loss of focus, suboptimal work, lack of motivation, etc. The COVID-19 pandemic requires us to hold online classes, practice social distancing, and go into lockdown. This can affect students' mental health because they cannot interact with friends face to face and feel lonely, which has the potential to cause mental health problems. Therefore, RuangCerita is considered able to help students overcome their mental health problems. Acknowledgements The research is funded by International Research Grant No. 017/VR.RTT/III/2021 from Binus University, Jakarta, Indonesia.
References
1. J.S. Abramowitz, What is OCD? (2019). Retrieved from https://iocdf.org/about-ocd/
2. I.A. Aisyah Putri Rahvy, Actual Challenges of Mental Health in Indonesia: Urgency, UHC, Humanity, and Government Commitment (2020, September 22). Retrieved from https://acehap.org/2020/09/22/actual-challenges-of-mental-health-in-indonesia-urgency-uhc-humanity-and-government-commitment/
3. R. Aryanto, Susanto, L.S. Stefenny, Dampak Loyalitas dari Keputusan Konsumen Disebabkan Pelayanan dan Promosi pada Usaha Gimnastik (2009)
4. World Health Organization, Mental Health ATLAS 2017 Member State Profile (2017). Retrieved from https://www.who.int/mental_health/evidence/atlas/profiles-2017/IDN.pdf
5. P. Bhandari, What Is Quantitative Research? Definition, Uses and Methods (2021, December 8). Retrieved from https://www.scribbr.com/methodology/quantitative-research/
6. Better Health Channel, Types of mental health issues and illnesses (2020). Retrieved from https://www.betterhealth.vic.gov.au/health/servicesandsupport/types-of-mental-health-issues-and-illnesses
7. iEduNote, Interview: Definition, Types of Interview (2021). Retrieved from https://www.iedunote.com/interview
8. Kesehatan Mental di Indonesia: Kini dan Nanti (2020, November 10). Retrieved from https://buletin.jagaddhita.org/es/publications/276147/kesehatan-mental-di-indonesia-kini-dan-nanti#id-section-content
9. P. Kotler, Marketing Management, 11th edn. (Prentice International Inc, New Jersey, 2003)
10. P. Kotler, Marketing Management, 11th edn. (Pearson Education Inc, New Jersey, 2005)
11. MedlinePlus, Anxiety (2021). Retrieved from https://medlineplus.gov/anxiety.html
12. Centers for Disease Control and Prevention, Mental Health (2021, July 20). Retrieved from https://www.cdc.gov/mentalhealth/index.htm
13. QuestionPro, Questionnaires: The ultimate guide, samples & examples (2020). Retrieved from https://www.questionpro.com/blog/what-is-a-questionnaire/
14. Riduwan, Belajar Mudah Penelitian untuk Guru, Karyawan dan Peneliti Pemula (Alfabeta, Bandung, 2005)
15. W.J. Santrock, Life-Span Development, 5th edn. (Times Mirror International Publisher Ltd, Brown and Benchmark, 1995)
16. Sugiyono, Metode Penelitian Pendidikan: (Pendekatan Kuantitatif, Kualitatif dan R & D) (Bandung, 2010), pp. 447–450
17. F. Torres, What Is Depression? (2020, October). Retrieved from https://www.psychiatry.org/patients-families/depression/what-is-depression
18. Y.F. Wardhani, A. Paramita, Pelayanan Kesehatan Mental dalam Hubungannya dengan (2015)
19. A. Idris, Overcoming stigma and getting social support: what can we learn from people with lived experience of MDD? (2016)
20. D. Suhartanto, Metode Riset Pemasaran (Alfabeta, Bandung, 2014)
21. Mayo Clinic, Mental illness (2020). Retrieved from https://www.mayoclinic.org/diseases-conditions/mental-illness/symptoms-causes/syc-20374968
22. M.P. Mutiara, Quantitative Research (2021, September 03). Retrieved from https://sis.binus.ac.id/2021/09/03/quantitative-research/
23. A. Novianty, Literacy of mental health: knowledge and public perception of mental disorders. 9(2) (2017)
JobSeek Mobile Application: Helps Reduce Unemployment on the Agribusiness Sectors During New Normal

Tsabit Danendra Fatah, William Widjaya, Hendik Darmawan, Muhammad Haekal Rachman, Ford Lumban Gaol, Tokuro Matsuo, and Natalia Filimonova

Abstract The increasing number of people being laid off or fired during the New Normal era in Indonesia has left workers in several employment sectors unemployed. In some cases, the parents of university students lose their jobs and the family runs into economic difficulties, which drives the students to look for work to help stabilize the family economy and cover their college tuition fees. According to data from Kemnaker (the Ministry of Manpower), as of May 27, 2020, 1,058,284 formal-sector workers had been furloughed, while another 380,221 formal-sector workers had been laid off. Meanwhile, 318,959 jobs in the informal sector were impacted. There were also 34,179 prospective migrant workers who were not dispatched and 465 interns who were sent back. A total of 1,792,108 workers have been impacted by the Covid-19 pandemic. Kemnaker data also shows that the number of layoffs in Indonesia had continued to decline since 2014: in 2018, the number of layoffs fell to 3,400 workers, down 95.67% from 2014. However, in 2019, the figure rose again to 45,000. In the

T. D. Fatah (B) · W. Widjaya · H. Darmawan · M. H. Rachman School of Information System, Bina Nusantara University, Alam Sutera, Indonesia e-mail: [email protected] W. Widjaya e-mail: [email protected] H. Darmawan e-mail: [email protected] M. H. Rachman e-mail: [email protected] F. L. Gaol Binus Graduate Program—Doctor of Computer Science, Jakarta, Indonesia e-mail: [email protected] T. Matsuo Advanced Institute of Industrial Technology, Tokyo, Japan e-mail: [email protected] N.
Filimonova Vladimir Branch of Russian Academy of National Economy and Public Administration, Vladimir, Russia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karuppusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_58
T. D. Fatah et al.
midst of the pandemic, the number of layoffs has increased again. According to a number of statements from the Minister of Manpower Ida Fauziyah quoted by several media outlets, the number of layoffs through August 2020 reached 3.6 million people. Based on data from the Central Statistics Agency on February 26, 2021, almost all areas in Jakarta have more job seekers than job vacancies. The goal of this study is to see how the pandemic has affected the difficulty of getting a job, and what additional factors come into play when it comes to finding work. The study is carried out with qualitative methodologies and a descriptive approach, and the information was compiled from primary sources. The data was collected via a Google Forms survey that included questions regarding the difficulties of getting work. The study's findings show that the current pandemic has had a substantial impact on the difficulty of obtaining work, and that a job-finding app is quite valuable for people seeking work during the pandemic.
1 Introduction A job is the most significant means of satisfying the necessities of human life in our modern day [1]. Many people have trouble obtaining work, and the information supplied to the person in question is frequently disappointing, since it arrives late and does not fulfil many essential standards. Temporary labor limits workers' essential employment rights and benefits while allowing businesses to fire them easily whenever they want [2]. Most of the time, seeking employment and recruiting workers is still done by hand. As a result, job seekers and employers who need workers face several challenges in the job search and recruiting processes. According to BPS (Badan Pusat Statistik) data, the number of jobseekers greatly outnumbers the number of available positions in each of Jakarta's major cities [3]. The number of people laid off has skyrocketed again in the midst of the pandemic. According to numerous news reports quoting Minister of Manpower Ida Fauziyah, the number of employees laid off through August 2020 reached 3.6 million [4]. Changes in cultural behavior and social alienation, as well as a downturn in economic development, have all contributed to the considerable increase in unemployment [5]. People have been forced to remain at home to reduce transmission of the virus, which causes the economy to contract as people spend less money than normal, culminating in recession across several countries [6]. Urban labor markets in developing nations are characterized by high unemployment and underemployment, as well as significant job inadequacy [7]. The emergence of the Covid-19 pandemic has changed the game in many ways [8]. The pandemic causes problems in a wide range of businesses, and has had a profound influence on labor and employment [9].
Governments have enacted social distancing measures to halt the spread of the illness, resulting in the closure of enterprises and the layoff of people in non-essential vocations and
sectors owing to the disease's low demand [10]. The pandemic not only influences the health sector, but also has an economic impact [11]. Many commercial fields have been hampered in their development during the epidemic. The business areas affected by Covid-19 include general transportation, tourism, hospitality, shopping centers, and offline trade that relies exclusively on direct client visits [12]. Numerous business owners lay off some of their employees in order to stay in business. These layoffs result in unemployment, which makes it much more difficult to obtain a job [13]. The pandemic makes it difficult for jobseekers to find available positions [14], and many jobseekers have trouble getting information about vacant positions. In the post-pandemic period, technological and communication innovation is frequently dismissed [15]. Despite this, the growth in unemployment figures highlights the importance of having information in the job recruiting process [16]. As a result, in this pandemic era, where job seekers look for every piece of information about job recruiting, the job-searching application becomes critical. Because of these challenges, we would like to know how the pandemic has affected the difficulty of finding work. We will also look at how crucial other factors are in getting a job. The purpose of this research is to determine the causes of the difficulties in finding employment during the pandemic, the number of people seeking work and the number of available positions, as well as the level of competition and the possibilities of finding work during the pandemic. The study is timely, since many applicants are presently looking for information regarding job recruiting during the pandemic. Several research questions relate to this research:
1. Does the pandemic have an effect on the difficulty of finding a job?
2. What are the most desired jobs during a pandemic?
3. Does a job-seeking application help applicants find a job during a pandemic?
4. Are job-finding applications widely used?
1.1 Mobile Application A mobile application is a program or piece of software that runs on a mobile device, such as a phone or tablet, and performs certain tasks for the user. A mobile app can be easily downloaded from the App Store or Play Store, which are digital distribution platforms. Mobile apps are simple, easy to use, inexpensive, downloadable, and compatible with most mobile phones, including budget and entry-level phones [17].
1.2 Job Finding Application A job-finding application is an application installed on mobile phones that allows users to create personal profiles and engage other users to search for a job or recruit a potential employee [18]. Job-finding apps allow jobseekers to apply for a job through the app, rather than applying through websites or a traditional job search, and they also help recruiters find potential candidates more easily. A job-finding application is very useful: it makes it easier for jobseekers to get information about job vacancies both inside and outside the country. Just by using the smartphone you have, you can find a job online, along with information about internships, entrepreneurship, and other interesting information such as scholarships. By displaying the information of the company that needs labor and the positions sought, the app lets you choose the type of job whose qualifications match your skills [19]. The effects of the pandemic on several nations throughout the world included people who were forced to abandon their employment owing to the situation's demands. People impacted by the Corona pandemic who lose their jobs do not have to worry: job-finding apps may be the solution, since this technology allows them to easily find a job [19]. Job-finding applications have advantages that benefit both job seekers and recruiters. The benefits of job-finding applications include:
• Mobile recruiting. Jobseekers can apply for new jobs anytime and can get their desired jobs with just a few touches; it's incredibly practical. Recruiters, on the other hand, can search for applicants and analyze applications at any time [20].
• Smart notifications. Smart notifications can notify jobseekers when they receive an offer, or recruiters when a new candidate applies for a position. As a result, recruiters can respond more quickly and solve difficulties anytime.
• Access to information. Recruiters and job seekers benefit from having access to information 24 hours a day, seven days a week. Job seekers can look for and apply for their ideal position from anywhere and at any time. Recruiters can also add new job openings, review candidates, and send feedback as soon as possible.
• Improved applicant precision and accuracy. On conventional paper-based forms, it is easy to make inadvertent mistakes and leave questions unanswered, which might result in candidates failing to receive the job. Online application forms, on the other hand, may be designed to alert applicants to any fields they need to complete before submitting their application. As a consequence, candidates' overall accuracy improves, increasing their chances of progressing to the interview round.
• Convenience. Online job application forms can be filled in from any location and from any device, making the digital application far more convenient for applicants. This boosts the number of applicants received for each position.
While they come with many benefits, job-finding applications also have some disadvantages. These disadvantages include:
• Lack of personal interaction. For the aim of job networking, applying for jobs online does not provide the same level of engagement as meeting people one-on-one at a job fair.
• Credibility considerations. Jobseekers must make sure the company applied to is a credible and legitimate one.
• Lower pay. Jobs advertised on the internet or through applications often pay less; entry-level jobs advertised this way pay less than those that are not [21].
1.3 Unemployment Unemployment is the term for someone who is categorized as a worker, is currently unemployed and seeking a job, yet is unable to find one [22]. What about the employees in Indonesia, which is in the midst of a major crisis? Is unemployment becoming more frequent or, on the contrary, less frequent [23]? Numerous businesses in Indonesia have discontinued operations as a result of the present prevalence of the Coronavirus, resulting in the loss of many jobs. One example is the Covid-19 epidemic in Jakarta, which caused the dismissal of several workers [23]. According to sources, over 11,000 enterprises are experiencing difficulties; 10,000 individuals have been instructed to work from home without pay, while over 16,000 people have had their employment terminated [23]. In line with the possibility of sluggish economic development in 2020, the Finance Ministry forecast that Indonesia's jobless rate would rise dramatically. Economic sluggishness caused by the Covid-19 outbreak is expected to result in the unemployment of 5 million people, according to the government's worst-case scenario [23].
1.4 Overcoming Unemployment Covid-19 is the world's most serious health catastrophe in a century, and it has the potential to be one of the most significant job destroyers in human history. When individuals are deprived of their jobs, they lose not only their income but also their dignity, significance, and hope, which matters greatly [24]. The pandemic, according to the International Labour Organization, may cut world working time by almost 7% in the second half of 2020, resulting in the loss of 195 million full-time jobs [24]. Many jobless people will fall through the cracks; even in countries where laid-off employees are covered by unemployment insurance or compensation subsidies, job loss takes a social-psychological toll. In fact, those who can barely afford living expenses, such as low-salary workers and small businesses, are likely to be significantly affected by job losses [24]. This situation also occurred in Indonesia, where several companies laid off workers due to the falling value of the Rupiah and the scarcity of raw supplies. During the pandemic, total unemployment in Indonesia is expected to grow by about 200%. The government already provides assistance to the general public; however, because it was not on target, it left some people defenseless. Regional policy can be utilized to remedy this employment challenge, because Indonesia is a huge country and each region has its own distinct traits [6]. During the Covid-19 outbreak in Indonesia, the Indonesian government adopted anti-unemployment efforts, including changes to the Pre-Work Card to make it more useful for employees whose jobs were terminated or who had recently finished their studies. To widen the scope of the government's role, the budget was increased to Rp 20 trillion.
Furthermore, the overall aid received was IDR 3.55 million, which comprised IDR 600,000 per month for four months (IDR 2.4 million in total), IDR 1 million as a training cost incentive, and IDR 50,000 for each of three surveys (IDR 150,000 in total). The training, which had previously required onsite interaction, was conducted and completed entirely online [23]. Increased financing for Pre-Work Cards is expected to raise the number of people receiving assistance from 2 to 5.6 million [23]. Considering that workers were similarly impacted by the Coronavirus epidemic, the government had high expectations that this card, distributed earlier in April 2020, would bring convenience and solutions [23]. It is not easy to solve the problem of unemployment, especially since Indonesia is the fourth most populous country in the world, but that does not rule out resolving the issue. The following solutions can be utilized [25]:
• Increase the quality of growth through a hands-on approach.
• Accelerate capital accumulation through banking policy and capital markets.
• Improve the quality of monetary policy so that it is more focused on inflation objectives.
JobSeek Mobile Application: Helps Reduce Unemployment …
• Increase total factor productivity.
• Provide specific incentives to labor-intensive industries.
• Strengthen vocational education.
• Implement regional unemployment-reduction schemes.
2 Research Technique

We used an online survey created with Google Forms, distributed to random respondents drawn from our friends at campus, at home, and in the neighborhood. The survey's target was 15 respondents; 17 people ultimately responded. According to 15 of the respondents, Covid-19 increases the difficulty for job seekers looking for work. Many also agree that job search applications make it easier for job seekers to find work. In this survey, we also discovered that 9 out of 17 respondents have never used a job search application, which is about half of the total respondents (Flowchart 1).
3 Research Method

The research was carried out in a qualitative manner. The information was gathered from primary sources, and the descriptive method was used. The data was gathered through a survey with a Google Form that included questions about job search difficulties. This study focuses on students who graduated during the Covid-19 pandemic. Several considerations and obstacles should be kept in mind, including a possible lack of data gathered via Google Forms if the form is not distributed widely enough. However, this issue can be mitigated by redistributing the Google Form when there are insufficient respondents. In calculating the average answer, the mean formula is used. The mean finds the average value of a group of numbers:

Mean (Average): X̄ = ΣX ÷ n

where n = total sample and ΣX = total of the answers.

The questions contained in the Google Form questionnaire include:

1. Age
2. City/Province location
3. Current education status
4. Do you think Covid makes finding a job harder?
5. What type of job are you currently looking for?
T. D. Fatah et al.
Flowchart 1 Research technique
6. Have you ever tried a job finding application? (LinkedIn, Kalibrr, etc.)
7. Do you agree that job finding applications make it easier to find a job for job seekers?
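As a minimal sketch of the mean formula used in the research method (X̄ = ΣX ÷ n), the answer codes below are purely hypothetical, for illustration only:

```python
# Minimal sketch of the mean (average) used in the research method:
# X-bar = (sum of answers) / (total sample).
answers = [4, 3, 3, 4, 2]   # coded questionnaire answers (hypothetical)
n = len(answers)            # n = total sample
mean = sum(answers) / n     # total of the answers divided by total sample
print(mean)                 # 3.2
```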
4 Research Result

From our survey results, we learned that most of our respondents are 20 years old, live in Jakarta, and are currently attending college. Most of them agree
that the current pandemic affects them and makes finding a job a lot harder. Most of our respondents are interested in design-related jobs such as photography, video editing, and graphic design. Most of them have never tried a job finding application before, which makes job finding more difficult and time consuming; accordingly, they agreed that using a job finding application makes job finding a lot easier. Based on Fig. 1, most of our respondents were 20 years old (41.2%), while the fewest respondents were 17 years old (5.9%). The rest of our respondents were 18 years old (11.8%), 19 years old (29.4%), and 21 years old (11.8%). From these results we infer that people around the age of 20 tend to start looking for a job, either full-time or part-time. Based on Fig. 2, most of our respondents (about 64.7%) are located in Jakarta, meaning the majority live in the capital city. Based on Fig. 3, 70.6% of our respondents are currently studying for a bachelor's degree, followed by high school students at 17.6%; however, we have no respondents at the doctoral (S3) level. From Fig. 4, about 88.2% of the respondents agree that Covid makes finding a job harder, while 11.8% disagree. From this result we conclude that the Covid-19 pandemic increases the difficulty of finding a job. Based on Fig. 5, the most wanted jobs are design jobs (photographer, video editor, graphic designer, etc.), chosen by 41.2% of respondents; the second most wanted job is barista, chosen by 11.8%, with the other answers variously distributed. Based on Fig. 6, about 52.9% of the respondents have never tried a job finding application when applying or looking for a job, while the other 47.1% have tried one before.
Fig. 1 Respondent age
Fig. 2 Does Covid make finding a job harder? Response
Fig. 3 Respondent current education status
Based on Fig. 7, most of the respondents (88.2%) agree that job finding applications make it easier for job seekers to find a job. From this result, we conclude that job finding applications are effective in helping job seekers find a job (Table 1).
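The reported percentages are consistent with the sample of 17 respondents; a quick hedged check (the counts of 15 "Covid makes it harder" and 9 "never tried an application" are stated in the survey; the count of 7 design-job answers is inferred from the 41.2% figure):

```python
# Verify the survey percentages against n = 17 respondents.
def pct(count, n):
    """Percentage rounded to one decimal place, as reported in the paper."""
    return round(100 * count / n, 1)

n = 17
print(pct(15, n))     # 88.2 - agree that Covid makes finding a job harder
print(pct(9, n))      # 52.9 - never tried a job finding application
print(pct(n - 9, n))  # 47.1 - have tried a job finding application
print(pct(7, n))      # 41.2 - want a design-related job (count inferred)
```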
Fig. 4 Does job finding app make it easier? Response
Fig. 5 Most wanted job response
5 Discussion

From the survey and research results, we discuss whether the findings accord with the problem formulation. Our discussion shows that each result is consistent with our hypothesis that job-seeking mobile applications help reduce unemployment. The study also showed that the respondents' "agree" answers were above 50% on all 3 questions. As the survey results show, most of our respondents agree on 3 statements. First, they agree that during the pandemic it is much harder to find and apply for jobs, because many businesses that were open before the pandemic closed to prevent the disease's spread, putting them on hiatus or out of business and therefore not accepting any
Fig. 6 Ever tried finding job application? Response
Fig. 7 Does job finding application help? Response

Table 1 Percentage of the results from the questionnaire
No  Question                                                                                  Agree (%)  Disagree (%)
1   Do you think Covid makes finding a job harder?                                            88.2       11.8
2   Have you ever tried a job finding application?                                            52.9       47.1
3   Do you agree that job finding applications make it easier to find a job for job seekers?  88.2       11.8
new employees at the moment. Second, they agree that they have tried using a job finding application before to look for a job, either a mobile application or an online website specializing in job finding; some of them successfully found the job they desired, and some did not. Third, they agree that using a job finding application makes it much easier to look for and apply for their desired job, even during the pandemic. With this result, it can be said that job finding applications greatly affected unemployment during the Covid-19 pandemic. This differs from previous studies, which mostly examined job seeker applications before the pandemic: previously, many people still looked for work by visiting offices, but because of the pandemic they had to look for work online and use job seeker applications. We also found several factors that can impact the engagement of job ads, such as authentic pictures [26]. Moreover, the number of job ads tends to change with the severity of Covid in the area [27]. Our discussion also found that the Covid-19 pandemic continued to make finding a job more difficult up to the completion of this research. Thus, the presence of job-finding applications is crucial in this pandemic era, as job seekers need as many job vacancies as possible.
6 Conclusion

Economic growth and the unemployment rate are important indicators of a country's development success [22]. A country's economic growth can be demonstrated by a large number of job openings. A large number of openings does not always signal healthy economic activity; nonetheless, job openings are an early indicator of economic activity [28]. In this modern era, humans meet their needs and desires by working, which is why a job is one of the most important aspects of human life. Many people have difficulty finding jobs, and the information available to them is often disappointing, since delayed information may not carry the full set of requirements. Governments and organizations should therefore begin to consider how to deal with the immediate and long-term consequences of the Covid-19 economic crisis, particularly in the area of unemployment [29]. According to the research questions, it can be concluded that:

1. The pandemic has made finding a job more difficult.
2. Photographer, video editor, graphic designer, and similar design jobs are the most desired jobs during the pandemic.
3. Job seeking applications do help applicants find a job during the pandemic.
4. Job finding applications have not been widely used.
Acknowledgements This research is funded by the University of Bina Nusantara under International Grant No. 017/VR.RTT/III/2021.
References 1. F.S. Nugroho, Aplikasi Pencarian Lowongan Pekerjaan Berbasis Android, Perpustakaan Pusat Unikom, Unikom.ac.id. (2014). https://elib.unikom.ac.id/gdl.php?mod=browse&op= read&id=jbptunikompp-gdl-franssayut-34681 2. E. Littleton, J. Stanford, An avoidable catastrophe: pandemic job losses in higher education and their consequences (2021). https://apo.org.au/sites/default/files/resource-files/2021-09/apo-nid 314011.pdf 3. Jumlah Lowongan Kerja di Jakarta Lebih Sedikit Dari pelamarnya (2021, March 6). Retrieved 12 Jan 2022, from https://databoks.katadata.co.id/datapublish/2021/03/06/jumlah-lowongankerja-di-jakarta-lebih-sedikit-dari-pelamarnya 4. Memasuki New Normal, Ibu Ida Minta Perusahaan Rekrut Lagi Pekerja Yang Ter-PHK: Berita: Kementerian Ketenagakerjaan RI (n.d.). Retrieved 12 Jan 2022, from https://kemnaker.go. id/news/detail/memasuki-new-normal-ibu-ida-minta-perusahaan-rekrut-lagi-pekerja-yangter-phk 5. A. Jalil, Meningkatnya Angka Pengangguran Ditengah Pandemi (Covid-19). Al-Mizan: Jurnal Ekonomi Syariah 3(1), 45–60 (2020). http://www.ejournal.an-nadwah.ac.id/index.php/alm izan/article/view/142 6. Y.A. Rahman, How to solve unemployment problem after pandemic. Padjajaran University (2020, October 23). https://www.researchgate.net/publication/344842950_How_to_Solve_ Unemployment_Problem_after_Pandemic 7. S. Godlonton, Employment risk and job-seeker performance. IFPRI Discussion Paper 01332, International Food Policy Research Institute (IFPRI) (2014). https://papers.ssrn.com/sol3/pap ers.cfm?abstract_id=2414037 8. D.L. Blustein, P.A. Guarino, Work and unemployment in the time of COVID-19: the existential experience of loss and fear. J. Humanist. Psychol. 60(5), 702–709 (2020). https://doi.org/10. 1177/0022167820934229 9. A. Hodder, New technology, work and employment in the era of COVID 19: reflecting on legacies of research. N. Technol. Work Employ. 35(3), 262–275 (2020). https://doi.org/10. 1111/ntwe.12173 10. K.A. Couch, R.W. Fairlie, H. 
Xu, Early evidence of the impacts of COVID-19 on minority unemployment. J. Public Econ. 192, 104287 (2020). https://doi.org/10.1016/j.jpubeco.2020. 104287 11. M. Anwar, Dilema PHK dan Potong Gaji Pekerja Di Tengah Covid-19. Adalah 4(1), 173–178 (2020). http://journal.uinjkt.ac.id/index.php/adalah/article/view/15752/7347 12. T. Taufik, E.A. Ayuningtyas, Dampak Pandemi Covid 19 Terhadap Bisnis Dan Eksistensi Platform Online. Jurnal Pengembangan Wiraswasta 22(01), 21 (2020). https://doi.org/10.33370/ jpw.v22i01.389 13. Y. Sumarni, Pandemi Covid-19: Tantangan Ekonomi dan Bisnis. Al-Intaj: Jurnal Ekonomi Dan Perbankan Syariah 6(2), 46–58 (2020). https://ejournal.iainbengkulu.ac.id/index.php/Al-Intaj/ article/view/3358 14. I.N. Juaningsih, Analisis Kebijakan PHK Bagi Para Pekerja Pada Masa Pandemi Covid-19 di Indonesia. Adalah 4(1), 189–196 (2020). http://journal.uinjkt.ac.id/index.php/adalah/article/ view/15764/7350 15. R. Komalasari, Manfaat Teknologi Informasi dan Komunikasi di Masa Pandemi Covid 19. Tematik 7(1), 38–50 (2020). https://doi.org/10.38204/tematik.v7i1.369
16. A. Kramer, K.Z. Kramer, The potential impact of the Covid 19 pandemic on occupational status, work from home, and occupational mobility. J. Vocat. Behav. 119, 103442 (2020). https://doi. org/10.1016/j.jvb.2020.103442 17. R. Islam, R. Islam, T. Mazumder, Mobile application and its global impact. Int. J. Eng. Technol. (IJEST) 10(6), 72–78 (2010). https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.657. 9773&rep=rep1&type=pdf 18. M.M. Plummer, Job seeking and job application in social networking sites: predicting job seekers’ behavioral intentions. Digital Commons @ NJIT (2018). https://digitalcommons.njit. edu/dissertations/195/ 19. E. Latifah, (2020, June 13). Aplikasi Pencari Kerja Terbaik, Melamar Pekerjaan Secara Online—Harapan Rakyat Online. Harapan Rakyat Online. https://www.harapanrakyat.com/ 2020/06/aplikasi-pencari-kerja-terbaik-melamar-pekerjaan-secara-online/ 20. S.J. Niklas, B.Ö.H.M. Stephan, Applying mobile technologies for personnel recruiting–an analysis of user-sided acceptance factors. Int. J. eBus. eGov. Stud. 3(1), 169–178 (2011). https://dergipark.org.tr/en/pub/ijebeg/issue/26200/275874 21. T.T. English, (2019, September 3). Disadvantages of online job applications. Career Trend, Retrieved 31 Jan 2022, from https://careertrend.com/facts-5709745-disadvantages-online-jobapplications.html 22. S. Indayani, B. Hartono, Analisis Pengangguran dan Pertumbuhan Ekonomi sebagai Akibat Pandemi Covid-19. Jurnal Perspektif 18(2), 201–208 (2020). https://ejournal.bsi.ac.id/ejurnal/ index.php/perspektif/article/view/8581/4408 23. E. Rajagukguk, Covid 19 Increases Unemployment (Fakultas Hukum, Universitas Indonesia, (2021, September 28). https://law.ui.ac.id/v3/covid-19-increases-unemployment-by-proferman-rajagukguk/ 24. D. Fine, J. Klier, D. Mahajan, N. Raabe, J. Schubert, N. Singh, S. Ungur, How to Rebuild and Reimagine Jobs Amid the Coronavirus Crisis. (McKinsey & Company, 2020, April 15). 
https://www.mckinsey.com/industries/public-and-social-sector/our-insights/ how-to-rebuild-and-reimagine-jobs-amid-the-coronavirus-crisis 25. M. Soekarni, I. Sugema, P.R. Widodo, Strategy on reducing unemployment persistence: a micro analysis in Indonesia. Buletin Ekonomi Moneter Dan Perbankan 12(2), 151–192 (2010). https:// doi.org/10.21098/bemp.v12i2.370 26. L. Fedor, The effect of the pandemic on engagement levels of LinkedIn posts. (2021). https:// digitalcommons.bryant.edu/cgi/viewcontent.cgi?article=1032&context=honors_marketing 27. R. Arthur, Studying the UK job market during the COVID-19 crisis with online job ads. PloS One 16(5), e0251431 (2021). https://journals.plos.org/plosone/article?id=10.1371/jou rnal.pone.0251431 28. M. Dias, A.N. Keiller, F.P. Vinay, X. Xu, Job vacancies during the Covid 19 pandemic. (n.d.). https://ifs.org.uk/uploads/IFS%20BN%20-%20job%20vacancies%20and%20Covid% 2019.pdf 29. D.L. Blustein, R. Duffy, J.A. Ferreira, V. Cohen-Scali, R.G. Cinamon, B.A. Allan, Unemployment in the time of COVID-19: a research agenda. J. Vocat. Behav. 119, 103436 (2020). https:// doi.org/10.1016/j.jvb.2020.103436
The Effect of Learning Using Videos on Online Learning of Private University During the Covid-19 Pandemic

Angelia Cristine Jiantono, Alif Fauqi Raihandhika, Hadi Handoyo, Ilwa Maulida Anam, Ford Lumban Gaol, Tokuro Matsuo, and Fonny Hutagalung

Abstract In Indonesia, the number of Covid-19 cases has increased over time. Therefore, several sectors, such as business, agriculture, and plantations, have had to adapt to Covid-19. Even in the education sector, there are several regulations that students must comply with, including keeping a distance, wearing masks, and diligently washing hands. In the education sector, students also conduct online learning. There are several ways to do this, such as through video learning. Video learning is a learning method that conveys knowledge or theory in the form of videos. Many private university students felt that video learning affects them in a negative way. Therefore, we conducted this study to determine the effect of video learning on private university students during the Covid-19 pandemic. This research method is quantitative. The population in this study was private university students majoring in information systems, with a sample of 16 students. The technique we use is an online survey using Google Forms, distributed to random respondents

A. C. Jiantono (B) · A. F. Raihandhika · H. Handoyo · I. M. Anam School of Information System, Bina Nusantara University, Alam Sutera, Indonesia e-mail: [email protected] A. F. Raihandhika e-mail: [email protected] H. Handoyo e-mail: [email protected] I. M. Anam e-mail: [email protected] F. L. Gaol Binus Graduate Program—Doctor of Computer Science, Jakarta, Indonesia e-mail: [email protected] T. Matsuo Advanced Institute of Industrial Technology, Tokyo, Japan e-mail: [email protected] F. Hutagalung University of Malaya, Kuala Lumpur, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_60
A. C. Jiantono et al.
consisting of our friends and strangers from a private university. According to the respondents of our survey, the problems that can arise during video learning include difficulty focusing on the learning activity being carried out, frequent internet lag, difficulty asking questions when material is not understood, and discussions with friends becoming more difficult.
1 Introduction

Video learning is a learning method that conveys knowledge or theory in the form of videos. It aims to present information and learning in an interesting, even entertaining way; video learning can thus be an alternative method for conducting training that engages participants [1]. In Indonesia, video learning has been applied since the Covid-19 virus spread to several regions of the country. As a precautionary measure, the government implemented several regulations, such as social distancing, i.e., restrictions on social interaction between communities. This policy means that learning that used to be face-to-face must now be done online, particularly with video learning. The policy was issued by the Ministry of Education and Culture (Kemendikbud) to prevent the ever-wider transmission and spread of Covid-19. A learning video must have several elements, such as visuals and audio: the visuals provide an easily accessible source of information, in harmony with the audio that describes it. Video learning has several advantages, including efficient and easy learning and effective support for active learning. Alongside these advantages, it also has several shortcomings: producing learning videos is not cheap, videos cannot display objects in detail, and not all internet facilities are adequate. In addition, through learning videos, students cannot interact effectively and actively with lecturers. This lack of interaction can reduce the value of the teaching and learning process; as a result, students become less interested in participating in video lessons.
1.1 Video Learning

Video learning is a learning method that uses recorded videos to support the learning process [2]. There are several reasons why video learning is considered suitable for learning in a pandemic situation: it creates a fun learning environment that makes participants more interested in learning, it makes it easier for students to learn and understand the material, and it keeps participants
interested in learning videos that are in accordance with the material they convey (Fig. 1). The advantage of video learning is that visuals can attract participants [3] to concentrate on the material presented through images, sound, and more. In addition, participants can watch a learning video anytime and anywhere. Teaching and learning through video attracts and stimulates our senses so that we quickly understand the material; pictures and movement can make material more interesting than text alone. The disadvantage of video learning is that the teaching is one-directional (Fig. 2).
Fig. 1 Video learning
Fig. 2 Video learning. Source blog.commlabindia.com/elearning-design
1.2 Interest in Studying

Interest is a condition in which a person is drawn to something and has a desire to learn or know it. According to M. Buchori ([4]: 35), interest is a person's awareness that an object, person, problem, or situation has something to do with them; interest must be seen as a conscious response, otherwise it is meaningless. According to Sardiman A.M. ([5]: 76), interest is a desire or a need of the person concerned. According to Cony Semiawan ([6]: 48), interest is a mental state that produces a directed response to a particular situation or object that gives satisfaction to a person [7]. Learning is a process or activity that occurs through active interaction between an individual and the environment, producing relatively lasting changes in cognitive, psychomotor, and affective aspects. According to Slavin, learning is a process of acquiring abilities that come from one's experience. Based on these expert views, we can say that interest in learning is an attraction to a lesson that encourages the individual to study and pursue it. Some experts hold that the way to generate interest is to use existing interests. Tanner (Slameto: 138) also stated that teachers should try to form new interests in students by providing information about the relationship between the lesson to be given and previous lesson material [3]. Several factors can generate interest in learning, including lessons connected to real life, assistance from teachers or lecturers, and opportunities given by lecturers to play an active role. During this pandemic, students' learning is limited to online modes, so students and lecturers employ learning methods that are considered to increase student interest and learning effectiveness.
1.3 Covid-19

Covid-19 is a disease caused by the coronavirus, whose main symptoms take the form of respiratory problems [8]. The disease first appeared in late 2019 in Wuhan, China. The coronavirus is a single-stranded RNA virus that belongs to the Coronaviridae family, a new variant that had never been identified before. Common symptoms include high fever, dry cough, runny nose, shortness of breath, and sore throat. Symptoms can worsen quickly and can cause death, and they may appear two to 14 days after exposure to the virus [8].
1.4 Online Learning

Online learning is learning done from home, without coming directly to a physical place. It is carried out on various platforms such as Google Classroom, Zoom, Microsoft Teams, and many others. Online learning is currently on the rise because it is being applied to everyone during the Covid-19 pandemic, and it is no longer unfamiliar to students and teachers [9]. Teaching is done online over the internet, which requires good access for students to participate fully, and relies on technology that is currently developing very rapidly. In online learning, students "attend" class by visiting the class website, work on assignments according to the class schedule, and communicate with teachers and classmates using email and online discussion forums. In face-to-face learning, one often learns by listening, reading, writing, and doing other activities designed by the teacher. Online learning is different because you are not in the same place as the teacher or other students; in fact, you may never meet them. Online learning has positive and negative impacts for those who run it. A positive impact is that students can become more responsible and respect time, because they must manage their schedules independently; an undisciplined student will be left behind in online lectures [10]. A negative impact is that students are often not serious in learning, may even study while playing games, and many do not understand the material presented online.
1.5 Technology

In this age of technological development, many people use technology in their daily activities, and various sectors rely on it to assist in their work. In general, technology is the discipline that studies the skills of creating tools and processing methods to help complete various human jobs. Experts also have their own definitions: Ellul [11] states that technology is the whole group of methods that lead rationally and are characterized by efficiency in every sphere of human activity. Technology brings many benefits, for example:

1. Helping and facilitating human activities.
2. Lightening very heavy jobs.
3. Increasing employment opportunities.
4. Being easy to operate.
5. Being usable by various groups.
There are different types of technology, namely:

1. Information technology, which helps humans convey information to others quickly and effectively.
2. Communication technology, which helps humans communicate with each other by sending information using certain devices.
3. Educational technology, i.e., technology related to the world of education, where activities utilize certain tools.
4. Transportation technology, which helps humans move from one location to another quickly.
5. Medical technology, which is related to medical science, where medical activities already utilize it.
6. Construction technology, which is related to building structures.
We present this research paper with the aim of determining the effect of learning videos on private university students' interest in studying during the Covid-19 pandemic. Accordingly, we formulate the problems that may arise within the implementation of video learning methods:

1. How does video learning affect what you learn?
2. Is learning using video learning effective?
3. If learning using video learning is not effective, please provide your reasons.
4. How can the effectiveness of learning using video learning be increased?
2 Research Technique

The technique we use is an online survey using Google Forms, distributed to random respondents consisting of private university students majoring in Information Systems who use video learning as their learning method; the respondents consist of our friends and strangers from a private university. Figure 3 is a flowchart of the research process we conducted. In this survey we targeted 15 people, while 17 people responded. Every one of the 17 respondents has studied during this pandemic situation and learned with video learning methods. In this survey we also learned that the respondents found learning with video learning ineffective. Moreover, we also asked, "Is learning with the video learning method effective enough?" The responses showed that most of them said "No". Because many students felt that learning with video learning methods was ineffective, we needed a way to make video learning methods effective again. We also searched various sources on the internet and asked for the
Fig. 3 Flowchart research technique
respondents' opinions through the form we sent, and we found that giving some questions about the provided learning video is the best way to make this learning method effective again. In this way, video learning can be followed by and engage students more. After settling on a research method, we also examined the study results using the quantitative method, based on data obtained from 16 respondents. Each question can be explained in detail as follows:

1. Related to the first question, "How influential is learning using video learning on what you learn?", it was found that 64.7% of people answered influential, 29.4% answered very influential, and 5.9% answered no effect. From this data, it can be concluded that most people are of the opinion that video learning has an effect on learning (Fig. 4).
Fig. 4 Diagram question 1
2. Related to the second question, "Is learning using video learning effective?", it was found that 50% of people answered effective and 50% answered ineffective. It can be concluded that learning using video learning is not effective enough to continue unchanged as a learning method; if improvements are made to the quality of the videos, the effectiveness of video learning may increase (Fig. 5).
3. Related to the third question, "How can the effectiveness of learning using video learning be increased?", it was found that 50% of people answered using interesting videos, 31.3% answered giving practice questions about the video provided, and 18.7% gave other answers. It can be concluded that providing interesting videos can increase the effectiveness of learning using video learning, because technology is continually improving in
Fig. 5 Diagram question 2
Fig. 6 Diagram question 3
this world so that interesting videos can increase the enthusiasm to learn and watch them (Fig. 6).
3 Research Method

See Fig. 7.
3.1 Research Methods

The research method we used is quantitative analysis, which means the data is assessed and presented numerically.
3.2 Research Design

The type of research our group applies in this report is quantitative. We chose this type because we wanted to provide an overview of the level of influence of video learning on private university students' interest in studying during the Covid-19 pandemic, and also because our time frame is short, which does not allow us to apply a qualitative design. The quantitative results are delivered by giving the mean, median, and mode of the responses.
Fig. 7 Flowchart research method
3.3 Methods and Sources
In this study, we used total sampling with a closed-question method, using Google Forms as the intermediary medium between us and the respondents. We asked questions about the effect of learning videos on private university students’ interest in studying during the Covid-19 pandemic. The Google Form was distributed starting on January 14, 2022; the data were then processed and summarised in the form of graphs or charts. Several steps were taken to prevent fake data: we used the help of personal relations to distribute the questionnaires, and gathered the research population into one group before distributing them.
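The processing step described above, tallying closed-question answers into the percentage charts shown in the figures, can be sketched as follows. The answer labels and the 17-response sample are assumptions reconstructed from the percentages reported for question 1 (64.7% influential, 29.4% very influential, 5.9% no effect); they are illustrative, not the raw data.

```python
from collections import Counter

# Hypothetical reconstruction of the 17 question-1 answers; the counts are
# chosen to match the reported percentages, not taken from the raw data.
answers = ["influential"] * 11 + ["very influential"] * 5 + ["no effect"] * 1

# Tally the closed-question responses and convert them to percentages,
# the chart-ready form used for the diagrams.
counts = Counter(answers)
percentages = {a: round(100 * n / len(answers), 1) for a, n in counts.items()}
print(percentages)
# {'influential': 64.7, 'very influential': 29.4, 'no effect': 5.9}
```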
4 Research Result
From our survey results, we learned that most private university students who learned with the video learning method felt that it affected the way they study (62.5/100). Although video learning affects them, only 50% of respondents felt that the method is already effective, while the other 50% felt otherwise. There are also problems with the implementation of the video learning method. According to the respondents, the problems that can arise during video learning are difficulty focusing on the learning activity being carried out, frequent internet lag, difficulty asking questions when material is not understood, and discussion with friends becoming more difficult. Therefore, we also surveyed how to improve the effectiveness of learning with video learning. Most respondents want the videos to be more attractive, and some want the lecturer to give exercises after the video learning class to increase their understanding. Table 1 presents the results of our survey.
From Table 2, we can see that the mean of how influential video learning is on what is learned is 3.24 (video learning has an effect on the learning process), the median is 3 (influential), and the mode is 3 (influential). For the question of whether learning using video learning has been effective, half of the private university students feel the method is already effective, while the other half do not. Based on the survey results, the main reason video learning is less effective is that students find it harder to focus on the learning process, which leads to a lack of understanding of the material being studied.
From our survey results, we can conclude that most respondents want more attractive videos in the learning method, to make them more engaged and focused on the learning process.
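The central-tendency figures reported from Table 2 can be reproduced with Python’s `statistics` module. The numeric coding (no effect = 2, influential = 3, very influential = 4 on the questionnaire’s 1–4 scale) and the response counts are assumptions chosen to match the percentages reported earlier, not the authors’ raw data.

```python
import statistics

# Hypothetical coded responses: 1 "no effect" (2), 11 "influential" (3),
# 5 "very influential" (4) -- matching the 5.9% / 64.7% / 29.4% split.
responses = [2] * 1 + [3] * 11 + [4] * 5

print(round(statistics.mean(responses), 2))  # 3.24
print(statistics.median(responses))          # 3
print(statistics.mode(responses))            # 3
```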
5 Discussion
From the results of the survey conducted, we discussed whether the survey results are in accordance with our hypothesis. They are: lack of interaction can reduce the value of the teaching and learning process, and as a result, students are not interested in participating in video lessons. The survey results also show that most students at the private university want more attractive videos in the
Table 1 Result of the survey

Respondent number | Level of influence of learning with the video learning method (1 = very not influential … 4 = very influential) | Effectiveness of the current video learning method (not effective / already effective) | Way to increase the effectiveness of the current video learning method
1 | Very influential | Already effective | Use more attractive video
2 | Influential | Already effective | Lecturer needs to give exercises after the video learning class to increase understanding
3 | Influential | Not effective | Use more attractive video
4 | Influential | Already effective | Lecturer needs to give exercises after the video learning class to increase understanding
5 | Very influential | Not effective | Do onsite learning instead of video learning
6 | Influential | Already effective | The speaker needs to be more interactive
7 | Not influential | Not effective | Use more attractive video
8 | Influential | Not effective | Lecturer needs to give exercises after the video learning class to increase understanding
9 | Very influential | Already effective | –
10 | Influential | Already effective | Lecturer needs to give exercises after the video learning class to increase understanding
11 | Influential | Not effective | Use more attractive video
12 | Influential | Not effective | Use more attractive video
13 | Influential | Not effective | Lecturer needs to give exercises after the video learning class to increase understanding
14 | Very influential | Already effective | Use more attractive video
15 | Influential | Already effective | Use more attractive video
16 | Influential | Not effective | Use more attractive video
video learning process to increase the effectiveness of their learning with video learning methods, and most private university students want the lecturer to give exercises after class to increase their understanding.
Table 2 Mean, median, and mode of the survey result

Question | Mean | Median | Mode
How influential is learning using video learning on what you learn? | 3.24 (video learning is influential on the learning process) | 3 (influential) | 3 (influential)
Has learning using video learning been effective? | In the middle of the two answers | In the middle of the two answers | The two options received the same number of respondents
If learning using video learning is less effective, give your reasons | – | – | Most respondents answered that they have a hard time focusing with the video learning method
How to increase the effectiveness of learning using video learning | – | – | Most respondents chose “use more attractive video for the video learning method”; the second mode is wanting the lecturer to give exercises after class
6 Conclusion
Our study shows that learning using video learning is quite effective among private university students. However, video learning alone may not always be enough to enhance learning, so other measures must be taken to improve learning during this pandemic. Using videos that are interesting in terms of content or picture quality can produce better learning outcomes and give students a higher level of understanding. Visual displays that please the eye are easier for the mind to grasp than writing alone [12], and learning with interesting audio-visuals and videos can improve learning ability by 50% [13]. This study shows that interesting videos can be a valuable tool to increase the effectiveness of student learning at a private university.
Acknowledgements The research was carried out as a research project at Bina Nusantara University for the Research Methods in Information Systems subject, with the assistance of Dr. Ir. Ford Lumban Gaol, S.Si., M.Kom.
References
1. Video learning: learning methods for the digital generation (2021). digital/#:~:text=Video%20learning%20adalah%20metode%20pembelajaran,yaitu%20emotional%2C%20intellectual%20dan%20psychomotoric. Accessed 28 Jan 2022
2. D. Novita, Video based learning method as a media for delivering material in online learning (2021). https://man1muba.sch.id/metode-video-based-learning-sebagai-media-penyampaianmateri-pada-pembelajaran-daring/#:~:text=Video%20based%20learning%20adalah%20penyampaian,elemen%20yaitu%20visual%20dan%20audio. Accessed 1 Feb 2022
3. A. Mulyana, Understanding student interests and interests in learning (2020). https://ainamulyana.blogspot.com/2012/02/minat-belajar.html#:~:text=Dari%20beberapa%20pengertian%20tersebut%20dapat,mempelajari%20dan%20menekuni%20pelajaran%20tersebut.&text=Minat%20merupakan%20sifat%20yang%20relatif%20menetap%20pada%20diri%20seseorang.,-Minat%20ini%20besar. Accessed 29 Jan 2022
4. M. Buchori, Psikologi Pendidikan. Jakarta: Rineka Cipta (1999)
5. Sardiman, Interaksi dan Motivasi Belajar Mengajar: Pedoman bagi Guru dan Calon Guru. Jakarta: Rajawali Pers (1988)
6. F.B. Paimin & Nazaruddin, Pengembangan Media Pembelajaran. Penebar Swadaya: Jakarta (1998)
7. Tiffany, 10 Definitions of interest according to experts (2021). https://dosenpsikologi.com/pengertian-minat-menurut-para-ahli. Accessed 29 Jan 2022
8. R. Fadli, Coronavirus (2020). https://www.halodoc.com/kesehatan/coronavirus. Accessed 2 Feb 2022
9. M. David, What is online learning? This is the meaning (14 September 2021). https://artikelsiana.com/apa-itu-pembelajaran-daring-adalah-ini-pengertiannya/. Accessed 4 Feb 2022
10. Citrayuda, Positive and negative impacts of online lectures (20 November 2021). https://www.bhinneka.com/blog/dampak-positif-dan-negatif-perkuliahan-online/. Accessed 4 Feb 2022
11. Y. Miarso, Menyemai Benih Teknologi Pendidikan. Jakarta: Kencana (2007)
12. P. Irna, Criteria for rating a video (1 March 2019). https://www.kompasiana.com/giettairna3802/5c782cfaaeebe1471c147ec6/kriteria-dalam-penilaian-sebuah-video. Accessed 3 Feb 2022
13. S. Umrah, The Effect of Multimedia Video Learning Media on Menarche on Knowledge and Attitude of V-Grade Students in Ready to Face Menarche (Hasanuddin University, Makassar, 2020). http://repository.unhas.ac.id/id/eprint/1371/2/P102181029_tesis%201-2.pdf. Accessed 4 Feb 2022
Implementation of Artificial Intelligence and Robotics that Replace Employees in Indonesia
Hendy Wijaya, Hubert Kevin, Jaya Hikmat, S. Brian Vincent, Ford Lumban Gaol, Tokuro Matsuo, and Fonny Hutagalung
Abstract Artificial intelligence is a technology widely adopted in the Industry 4.0 era. It is capable of connecting devices so that they can be automated without anyone having to be on site, and many machines can now interpret certain conditions or events with artificial intelligence. This project aims to understand the causes and effects of artificial intelligence on human resources in Indonesia. A survey method is used to gather information; through the surveys, it is determined that people think AI is a good influence that opens new jobs.
1 Introduction
Artificial intelligence is no longer a foreign concept to most people. In fact, practically everyone utilizes artificial intelligence-based technologies, such as smartphone cameras [1]. AI is the capacity of computer systems or machine robots to accomplish a variety of intelligence-related tasks. The term “intelligence” is used to describe the features of human conduct in this context. AI is the study of creating machines with human-like abilities such as language comprehension, picture recognition, problem solving, and learning [2]. AI technology is a significant technological development. In Industry 4.0, AI technology is reliable enough that it leads to the dismissal of employees, who are replaced by AI or robots; more educated human resources are needed to help in Industry 4.0 [3].
H. Wijaya (B) · H. Kevin · J. Hikmat · S. Brian Vincent, School of Information System, Bina Nusantara University, Alam Sutera, Indonesia. e-mail: [email protected]; H. Kevin: [email protected]; J. Hikmat: [email protected]
F. L. Gaol, Binus Graduate Program—Doctor of Computer Science, Jakarta, Indonesia. e-mail: [email protected]
T. Matsuo, Advanced Institute of Industrial Technology, Tokyo, Japan. e-mail: [email protected]
F. Hutagalung, University of Malaya, Kuala Lumpur, Malaysia. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2_61
1.1 History of Artificial Intelligence
Alan Turing, a British mathematician, presented a paper titled “Computing Machinery and Intelligence” in 1950, which paved the way for the field of artificial intelligence. The paper starts with a straightforward question: “Can machines think?” To answer it, Turing proposed the Turing test, a method for validating a machine’s capacity to demonstrate intelligent behavior comparable to human intelligence [4]. The test, called the “Imitation Game” in the paper, is intended to be a simple assessment that can be used to judge whether machines can think [5–9]. The first AI computer program, called Logic Theorist, was written in 1955 by Cliff Shaw, Herbert Simon, and Allen Newell, and it proved 38 of the first 52 theorems. John McCarthy, an American computer scientist, and his colleagues proposed a workshop on “artificial intelligence”; following the workshop at Dartmouth College in 1956, the phrase artificial intelligence was officially born. McCarthy created Lisp, a high-level programming language for AI research, in 1958; it is still used today at MIT. While working in 1959 on a checkers-playing computer that might compete with human players, Arthur Samuel coined the phrase “machine learning” [10]. Unimate, an industrial robot designed by George Devol in the 1950s, was the first to appear on the General Motors assembly line in New Jersey in 1961; it performed tasks hazardous to people, such as carrying die castings from an assembly line. In 1961, computer scientist and professor James Slagle invented SAINT (Symbolic Automatic Integrator). In 1964, an American computer engineer named Daniel G. Bobrow created STUDENT, an early AI program written in Lisp that could read and answer algebraic word problems.
ELIZA, the first chatbot, was a language-processing computer program created by MIT professor Joseph Weizenbaum in 1966; written partly in a humorous spirit, it became something people could genuinely connect to, and people began to form real emotional attachments to it. Although ELIZA interacts only via text and is unable to learn from human interaction, it serves a vital role in bridging the gap between humans and technology. Also in 1966, the first mobile robot project, “Shakey”, was launched; the project lasted from 1966 to 1972 and is viewed as an attempt to connect several aspects of AI to navigation and computer vision. The robot is now on display at the Computer History Museum. 2001: A Space Odyssey, the critically acclaimed science
Implementation of Artificial Intelligence and Robotics …
fiction film directed by Stanley Kubrick, premiered in 1968. The film depicts HAL (a heuristically programmed algorithmic computer), the AI controlling the nuclear-powered Discovery One spacecraft [11]. Waseda University unveiled WABOT-1, Japan’s first humanoid robot, in 1970; it could move its limbs, see, and converse. When James Lighthill, an applied mathematician at the British Science Council, reported on the status of AI research and indicated that no findings had matched expectations, the development of AI came to a halt, and the UK government’s support for AI research was drastically reduced. In 1977, Star Wars premiered. The film was directed by George Lucas and stars C-3PO, a humanoid robot versed in almost seven million communication modes, and R2-D2, a miniature astromech droid that communicates through electronic beeps rather than human speech. The Stanford Cart, a remote-controlled robot with a television camera built in 1961, crossed a room full of chairs in around 5 h without human intervention in 1979, making it one of the first autonomous robots [12]. Waseda University developed the WABOT-2 in 1980, a humanoid able to communicate with humans, read musical scores, and play an electronic organ. On behalf of the Japanese Ministry of International Trade and Industry, $850 million was set aside in 1981 for the Fifth Generation Computer project, which aimed to produce computers that could converse, translate, comprehend images, and reason in the same way that people can. The first self-driving automobile, a Mercedes-Benz van equipped with sensors and cameras, was created in 1986 under the supervision of Ernst Dickmanns and was capable of reaching 55 mph on empty highways. “Probabilistic Reasoning in Intelligent Systems”, by the computer scientist Judea Pearl, was published in 1988.
Jabberwacky was created by the programmer Rollo Carpenter in 1988 to entertainingly replicate spontaneous human conversation; it was one of the first attempts at generating artificial intelligence through human contact [13]. In 1990, Rodney Brooks’s paper “Elephants Don’t Play Chess” suggested a new approach to AI, based on continual physical interaction with the environment and building intelligent systems from the ground up. Richard Wallace, a computer engineer, established A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) in 1995, inspired by Weizenbaum’s ELIZA but extended with natural-language sample data collection. Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN) architecture now used for voice and handwriting recognition, was introduced by Jürgen Schmidhuber and Sepp Hochreiter in 1997. That same year, the IBM Deep Blue chess computer beat the reigning world chess champion for the first time. Dave Hampton and Caleb Chung created Furby, the world’s first household robot or pet toy, in 1998. Cynthia Breazeal at MIT developed the expressive humanoid robot “Kismet” in 1998, a face-reading robot that can detect and reproduce emotions; like a human face, it has eyes, lips, eyelids, and brows. Sony released AIBO (Artificial Intelligence Robot) in 1999, following in the footsteps of Furby. AIBO was designed to learn through contact with its surroundings
and users. The robot recognized and responded to more than 100 vocal commands [14, 15].
1.2 Industry 4.0
Industry 4.0 is portrayed as a comprehensive breakthrough that digitizes and automates every aspect of business. This industrial revolution matters for Indonesia because it increases interaction with robots and their use. The development of Industry 4.0 in Indonesia is very fast, and the era of globalization further accelerates the development of technology in the country. Industry 4.0 also forces employees in Indonesia to improve themselves, because their competitors are no longer only humans but also robots that can be programmed to work at any time. The development of this industry is very profitable for companies, because implementing artificial intelligence technology in robots can cut costs [3]. Industry 4.0 also has a number of weaknesses, including the difficulty of preserving the integrity of the manufacturing process in the absence of human oversight, that is, of sustaining quality without human monitoring or intervention. In addition, some manual labor has already been substituted by automated technology.
1.3 Relevance and Importance of Research
This research uses targeted data retrieval techniques, natural language processing technology, and a proprietary mathematical framework to understand how people actually feel about the impact of AI on human resources in Indonesia. The research is feasible because the underlying issue, the impact of AI on human resources and how to respond to it, is an important one.
Research Questions
1. What is the impact of AI technology in manufacturing on employees in Indonesia?
2. Will AI-powered robotics replace human jobs in Indonesia?
2 Research Technique
The technique used is an online survey via Google Forms. The survey asks whether people think AI technology will replace employees and take their jobs. Six questions about AI technology and how it affects their jobs were provided. The survey form was distributed to Binus University students, friends at other universities,
and people around the age of 20 who work in companies that are starting to switch to Industry 4.0. Of the 25 responses, most are from fresh graduates seeking a job or who have just taken one up. Many respondents feel quite worried about losing their jobs to AI/robots. When asked whether AI technology is important, many responded that it is very important even though they might lose their jobs. Some employees may lose their jobs, but companies need AI technology because it is better to automate some processes, especially in production.
3 Research Method
This research aims to understand the causes and effects of artificial intelligence on human resources in Indonesia. The research is conducted using a quantitative method with a descriptive research design, using primary data taken from surveys conducted on selected samples. Participants are compensated with money for filling in the survey. The survey runs for 5 days and is conducted during workers’ rest hours and after-work hours. Before the analysis, the survey data are prepared: they undergo validation, checks for missing data and outliers, and data coding. The data are then entered into the computer and analyzed with the help of statistical software (Fig. 1).
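The preparation steps named above (validation, missing-data checks, and coding) could look roughly like this. The field names, answer labels, and 0/1 coding scheme are illustrative assumptions, not the authors’ actual questionnaire schema.

```python
# Sketch of the data-preparation step: validate answers, drop records with
# missing data, and code categorical answers into numbers for analysis.
VALID = {"yes", "no"}
CODE = {"no": 0, "yes": 1}

def prepare(raw_rows):
    """Return validated, numerically coded records; invalid rows are dropped."""
    coded = []
    for row in raw_rows:
        answer = str(row.get("ai_can_replace_my_job", "")).strip().lower()
        if answer not in VALID:  # missing or invalid answer -> excluded
            continue
        coded.append({"age": row.get("age"),
                      "ai_can_replace_my_job": CODE[answer]})
    return coded

rows = [
    {"age": 21, "ai_can_replace_my_job": "Yes"},
    {"age": 23, "ai_can_replace_my_job": "no"},
    {"age": 22, "ai_can_replace_my_job": ""},  # missing answer, dropped
]
print(prepare(rows))
# [{'age': 21, 'ai_can_replace_my_job': 1}, {'age': 23, 'ai_can_replace_my_job': 0}]
```

The cleaned records can then be loaded into statistical software for the descriptive analysis described in the text.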
4 Research Result
The survey results show that people in this era think that AI technology in manufacturing and automation robotics can create new types of jobs, i.e., jobs that help companies develop robotic AI technologies. The majority of survey respondents are over the age of 20 and have worked for companies. From the questionnaire results, it is known that those who filled out this form are aged 20 years and over, and the majority have worked with artificial intelligence technology (Fig. 2). The results for the second question show that 87.5% of the respondents work in a company (Fig. 3). From 22 responses, it is observed that the majority believe that artificial intelligence technology can replace their position in the company (Fig. 4). From Fig. 5, it is known that 45.5% of people answered that artificial intelligence technology is necessary for companies in Industry 4.0. Figure 6 shows that 54.5% of the respondents think AI technology will open up the potential for new types of jobs. The responses to the last question reveal that 72.7% of the people who answered this survey have lost their jobs due to the implementation of AI technology. Thus, this suggests that AI technology can replace employees in Indonesia (Fig. 7).
Fig. 1 Flowchart of research method
Fig. 2 Response of survey question 1
Fig. 3 Response of survey question 2
Fig. 4 Response of survey question 3
Fig. 5 Response of survey question 4
H. Wijaya et al.
Fig. 6 Response of survey question 5
Fig. 7 Response of survey question 6
5 Discussion
From the results of the questionnaire, we discussed whether the outcomes were as expected. For the question of whether people think their jobs could be replaced by AI/robots, we anticipated that most would respond positively. However, 12 individuals out of 22 are convinced that their job could not be replaced by AI/robots. This result is lower than expected, which means people think that some jobs cannot be replaced. For the fourth question, i.e., whether people think robots are needed by companies, the answers are as predicted: despite the fact that some people might lose their jobs to AI/robots, 45.5% of the respondents still believe that AI/robots are required by a company.
6 Conclusion
Artificial intelligence (AI) is a system that learns from experience, adapts to new inputs, and performs jobs in much the way that people do. Evolutionary computation and natural language processing are used prominently in most AI instances today, from computers playing checkers to driving automobiles. With this technology, computers can be trained to do certain jobs by analyzing massive volumes of data and identifying patterns in the data. There are many opinions about artificial intelligence technology, with arguments about both its advantages and its disadvantages. Through this research, it is known that 54.5% of people state that artificial intelligence technology can create new jobs and improve human resources in Indonesia. In addition, 45.5% of people believe that artificial intelligence technology can cause employees to lose their jobs, especially in manufacturing companies.
Acknowledgements This research was conducted as an assignment for the Research Methods course in Information Systems and to find out public opinion about AI technology.
References
1. Y. Devianto, S. Dwiasnati, Kerangka Kerja Sistem Kecerdasan Buatan Dalam Meningkatkan Kompetensi Sumber Daya Manusia Indonesia. Jurnal Telekomunikasi Dan Komputer 10(1), 19 (2020)
2. R. Verma, S. Badni, Challenges of artificial intelligence in human resource management in Indian IT sector. Thesis, Mysore University (2019)
3. S. Vaidya, P. Ambad, S. Bhosle, Industry 4.0—a glimpse. Procedia Manuf. (21 February 2018). Retrieved 30 Jan 2022
4. A. Van Wynsberghe, Sustainable AI: AI for sustainability and the sustainability of AI. AI Ethics 1(3), 213–218 (2021). https://doi.org/10.1007/s43681-021-00043-6
5. Q.D. Kusumawardani, Hukum Progresif Dan Perkembangan Teknologi Kecerdasan Buatan. Veritas Et Justitia 5(1), 166–190 (2019). https://doi.org/10.25123/vej.3270
6. I.D. Wahyono, Personalisasi Virtual Laboratory Menggunakan Kecerdasan Buatan. TEKNO 29(1), 86 (2019). https://doi.org/10.17977/um034v29i1p86-96
7. R. Kusumawati, Kecerdasan Buatan Manusia (Artificial Intelligence): Teknologi Impian Masa Depan. ULUL ALBAB Jurnal Studi Islam 9(2), 257–274 (2018). https://doi.org/10.18860/ua.v9i2.6218
8. H. Riza, A.S. Nugroho, Gunarso, Kaji Terap Kecerdasan Buatan di Badan Pengkajian Dan Penerapan Teknologi. Jurnal Sistem Cerdas 3(1), 1–24 (2020). https://doi.org/10.37396/jsc.v3i1.60
9. M. Rangaiah, History of artificial intelligence with timeline. Analytics Steps (n.d.). Retrieved 6 Feb 2022
10. P.A. Zumarsyah, Sejarah Kecerdasan Buatan atau Artificial Intelligence (AI). Warung Sains Teknologi (4 July 2021)
11. M. Zahid, Akankah Teknologi dapat Menggantikan Pekerjaan Manusia di Masa Yang Akan Datang? Fakultas Teknik (n.d.)
12. T. Pospisil, Council Post: Robots Aren’t Taking over the World (yet)—Artificial Plus Human Intelligence is Still the Best Combo. Forbes (10 December 2021). Retrieved 10 Feb 2022
13. M. Zidane, Mengapa Coca Cola Menggunakan AI Untuk Membuat Mesin Penjual Otomatis Yang Cerdas. ICHI.PRO (20 December 2020). Retrieved 10 Feb 2022
14. A. Mardatila, 7 Macam Pekerjaan Masa Depan Yang Akan Digantikan Oleh Kecerdasan Buatan. merdeka.com (28 July 2020). Retrieved 10 Feb 2022
15. S.C. Shapiro, Artif. Intell. 7(2), 199–201 (1976). https://doi.org/10.1016/0004-3702(76)90004-7
Author Index
A Aarthi, J., 91 Aarthi, M., 135 Abdelhamid, Esraa, 559 Abdul Rahaman, Sd., 389 Ahmed, Karim Ishtiaque, 35 Aithal, Sudhanva Suresh, 211 Akter, Flora, 291 Anam, Ilwa Maulida, 719 Anitha, T., 15 Anuradha, T., 35 Aref, Mostafa, 559 Arivalagan, M., 479 Arivazhagan, B., 233 Athilingam, R., 479
B Badry, Rasha M., 579 Bajla, Anish, 619 Balne, Sangetha, 221 Basavaraddi, Sushma, 55 Bedekar, Mangesh, 115 Beulah, Jasmine, 147 Bhan, Anupama, 527 Bhatlawande, Shripad, 373 Bhavatharini, S., 91 Bose, S., 15 Brian Vincent, S., 733 Busygin, Volodymyr, 1
C Chandra Bonik, Choyon, 291 Chandrashekar, N. S., 467 Christopher, Ignasius, 689
Cristin, R., 513
D Danh, Luong Vinh Quoc, 607 Daniya, T., 513 Darmawan, Hendik, 703 Dehraj, Pooja, 177 Desai, Sharmishta, 115 Devona Algista, Satria, 689 Dhiya ‘Ulhaq, Fakhri, 689 Dhorajiya, Arya P., 199 Dilliraj, E., 307 Divij, R., 27 Doraipandian, Manivannan, 265 Dung, Tu Thanh, 607 Durand, Lilian, 279 Dzhus, Oleksandr, 1
E Encheva, Sylvia, 643
F Fatah, Tsabit Danendra, 703 Feel, Haytham Al, 579 Filimonova, Natalia, 703 Fong Peng, Chew, 689
G Gadyal, Vijaylaxmi, 55 Gajjar, Pranshav, 343 Gaol, Ford Lumban, 703, 719, 733 García Márquez, Fausto Pedro, 631
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 P. Karrupusamy et al. (eds.), Ubiquitous Intelligent Systems, Smart Innovation, Systems and Technologies 302, https://doi.org/10.1007/978-981-19-2541-2
Garg, Aakash, 329 Gautam, Priya, 177 Gayal, Baliram Sambhaji, 249 Gayatri, K. R., 447 Geetha Devi, A., 389 George, Augustine, 147 Gopalakrishnan, K., 101 Gour, G. B., 55 Govindaraj, Shashank, 279 Gowri, T., 221 Grechaninov, Viktor, 543 Guna Nandhini, N., 91 Gupta, Govind P., 423
H Halder, Anoushka, 435 Handoyo, Hadi, 719 Hemanth, K. S., 123 Hema Priya, K., 455 Hikmat, Jaya, 733 Hishore, N., 27 Hutagalung, Fonny, 719, 733
I Ilampiray, P., 455 Islam Khan, Nazmul, 291 Islam, Taminul, 291 Ismail, Sally, 559
J Jaber, Azeez Lafta, 651 Jiantono, Angelia Cristine, 719 Jihadul Islam, Md, 291 Joshi, Aditya, 373 Joshi, Kasturi, 373 Joshi, Riya, 373 Jyothi, Saddi, 351
K Kanakaprabha, S., 75 Kavyadharsini, P., 135 Keerthika, S., 135 Kevin, Hubert, 733 Khalil, Shuker Mahmood, 651 Khandare, Hrishikesh, 423 Khan, Imran, 35 Khoshaba, Oleksandr, 543 Khylko, Olena, 1 Krithivasan, Kannan, 265 Kundu, Arindom, 291
Kushal, B. H., 211 Kyalkond, Sameer A., 211
L Lavanya, K., 351 Logeswari, G., 15 Lopushanskyi, Anatoliy, 543 Lumban Gaol, Ford, 689 Lyudmila Petrovna, Varlamova, 675
M Madake, Jyoti, 373 Mahmoud, Rehab, 579 Makanyadevi, K., 135 Malagi, Kiran B., 399 Malini, M., 123 Manikandan, B., 479 Manikanta Sanjay, V., 211 Manoj Athreya, H., 211 Marini, Simone, 279 Márquez, Fausto Pedro García, 279 Matsuo, Tokuro, 689, 703, 719, 733 Mohabuth, Abdool Qaiyum, 45 Molodetska, Tetiana, 543 Mozumder, Samsil Arefin, 163 Muñoz del Río, Alba, 631 Murekachiro, Dennis, 665
N Nagaraj, Bedre, 399 Nam, Vo Hoai, 607 Nandhini, C., 479 Nand, Parmd, 493 Natalia, Bella, 689 Naveen, G., 27 Nitin Kumar, R., 447
P Papaelias, Mayorkinos, 279 Patel, Syed Imran, 35 Patil, Rajashekhargouda C., 467 Patil, Sandip Raosaheb, 249 Pham, Duc-Hong, 595 Phong, Le Hong, 607 Pobochii, Ivan, 1 Pooja, 493 Prabhu, D., 15 Prashanthi, Venigalla, 447 Prathap, G., 147 Preethicaa, R., 479
Preethi, N., 415 Priya, S., 435 Proshad, Pratoy Kumar, 619 Puri, Sudaksh, 527
R Rachman, Muhammad Haekal, 703 Radha, D., 75 Rahman, Md. Sadekur, 619 Raihandhika, Alif Fauqi, 719 Raja Kumar, V., 35 Rajashekar, Vishal, 211 Rajendran, Sujarani, 265 Rajput, Abhishek, 329 Rakhi, Anusree Mondal, 199 Rudhramoorthy, D., 27
S Sabapathi, Ramya, 265 Sabitha, M., 135 Sánchez, Pedro José Bernalte, 279 Sanghvi, Harshil, 343 Santhanalakshmi, S., 75 Santhi, P., 91 Saranya, P., 199 Sashchuk, Hanna, 1 Sasikumar, P., 359 Sathiamoorthy, S., 359 Savantanavar, Vandana S., 55 Saxena, Aayush, 435 Sebhatu, Sıraj, 493 Segovia Ramirez, Isaac, 631 Shah, Pooja, 343 Sharifuzzaman Sagar, A. S. M., 163 Sharma, Abhilasha, 329 Sherin Beevi, L., 455 Shilaskar, Swati, 373 Shunmuganathan, N., 233 Shvachych, Gennady, 1 Singh, Rajendra Bahadur, 35 Singh, Simone, 527
Snegha, R., 91 Sowmya Reddy, Y., 351 Sridhar, K., 569 Srijon, Adib Hossin, 619 Srinivasan, Palanivel, 265 Srinivasan, R., 569 Sri Sai Akhileswar, V., 389 Supriya, M., 447 Surya Prasada Rao, B., 389
T Talukder, Rituparna, 619 Tamilselvi, T., 479 Thapliyal, Akshat, 329 Timur Erikovich, Nabiev, 675 Truong, Nguyen Phuc, 607 Tung, Nguyen Thanh, 607
V Vadamodula, Prasad, 513 Vaishnavi, Makkena, 447 Vanathi, P. T., 101 Vasudevan, Aravind, 415 Vijaiprabhu, G., 233 Vijayakumar, P., 307 Vijayalakshmi, R., 455 Vikram, R., 27 Vinothina, V., 147
W Widjaya, William, 703 Wijaya, Hendy, 733
Y Yashoda, 55
Z Zavertailo, Kostiantyn, 543