Arti Noor · Kriti Saroha · Emil Pricop · Abhijit Sen · Gaurav Trivedi Editors
Proceedings of Emerging Trends and Technologies on Intelligent Systems ETTIS 2022
Advances in Intelligent Systems and Computing Volume 1414
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Arti Noor Education and Training Division CDAC Noida, Uttar Pradesh, India
Kriti Saroha Education and Training Division CDAC Noida, Uttar Pradesh, India
Emil Pricop Department of Automatic Control, Computers and Electronics Petroleum-Gas University of Ploiesti Ploiesti, Romania
Abhijit Sen Computer Science and Information Technology Kwantlen Polytechnic University Surrey, BC, Canada
Gaurav Trivedi Department of Electronics and Electrical Engineering IIT Guwahati Guwahati, Assam, India
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-19-4181-8 ISBN 978-981-19-4182-5 (eBook) https://doi.org/10.1007/978-981-19-4182-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
We are pleased to present the proceedings of the 2nd International Conference on “Emerging Trends and Technologies on Intelligent Systems” (ETTIS-2022), held during March 22–23, 2022 at the Centre for Development of Advanced Computing (CDAC) Noida in association with the Automatic Control, Computers & Electronics Department, Faculty of Mechanical and Electrical Engineering, Petroleum-Gas University of Ploiesti, Romania.

With the proliferation of Artificial Intelligence, Intelligent Systems have permeated every aspect of our lives. Intelligent systems solutions are now being deployed in almost all spheres to solve a variety of problems and enhance the productivity of systems. Sectors such as Agriculture, Education, Smart Cities, and Healthcare, to name a few, are reaping the benefits of Intelligent Systems. ETTIS-2022 is the second edition of the conference series targeting research in the area of Intelligent Systems. The conference attracted academicians, scientists, researchers, and experts from different domains to showcase their research ideas and share information about cutting-edge developments in the field on a common platform. Besides the regular paper presentations by authors, the conference featured invited talks from eminent experts from different countries. Although the conference was conducted in virtual mode due to the COVID-19 pandemic, the response was overwhelming. A total of 95 submissions were received through EasyChair; after a double-blind review process by experts, 45 papers were accepted and 33 were presented under six different tracks. These papers represent recent developments in the subfields of Intelligent Systems. Selected papers, after further revision, are now being published in the book series “Advances in Intelligent Systems and Computing” (AISC) by our publishing partner Springer.

At this point, we take this opportunity to thank the Ministry of Electronics & Information Technology, Govt. of India and the Director General, C-DAC for their valuable support. We also thank Sh. Vivek Khaneja, Executive Director, C-DAC Noida and Patron of ETTIS-2022, for supporting the conference in all aspects to make it a successful event. We would also like to thank Sh. V. K. Sharma, Senior Director &
Group Coordinator (Education and Training) and Co-Patron of the Conference, for guiding and motivating us at every step so that the conference could maintain a high standard of quality. We also thank Dr. Arti Noor, Senior Director and Principal General Chair of the conference, for being actively involved and supporting us in completing this daunting task. The conference program represents the efforts of many people, and we are thankful to all who have been a part of the conference and have contributed to successfully organizing this event. In this line, we would like to thank Dr. Kriti Saroha, Program Chair; Mr. Sanjay Ojha, Finance and Purchase Chair; and all members of the Organizing Committee for coming forward, taking responsibility, and contributing to making the conference successful. Our heartiest thanks to all the eminent experts who took time from their busy schedules to deliver invited talks for different tracks, highlighting new developments in the field and adding value to the conference. Special thanks to Prof. Keshab Parhi, University of Minnesota; Prof. Arpan Kar, IIT Delhi; Prof. Gaurav Trivedi, IIT Guwahati; Prof. Abhijit Sen, Kwantlen Polytechnic University, Canada; Prof. Emil Pricop, Petroleum-Gas University of Ploiesti, Romania; Prof. Arvinder Kaur, GGSIP University, Delhi; and Sh. Aninda Bose, Springer. Their talks truly energized the young researchers, many of whom will surely enrich their research with these ideas. Our special thanks to all the members of the Advisory Committee and Technical Program Committee for collaborating with us and guiding us whenever required so as to maintain the high quality of the conference. We also thank all the authors for their contribution and participation in ETTIS-2022. Last but not least, we thank all the members and the administrative staff who put in tremendous effort to ensure that the virtual/online conference ran glitch-free, thus making it a great success. We hope that this conference has stimulated the minds of researchers in the field of Intelligent Systems and that they will, in future, come up with innovative solutions for applications in terms of effective techniques and algorithms. We feel honored and privileged to be a part of this journey and hope that this conference will continue in the coming years, thereby serving its stipulated purpose in the future as well.

Arti Noor, Noida, India
Kriti Saroha, Noida, India
Emil Pricop, Ploiesti, Romania
Abhijit Sen, Surrey, Canada
Gaurav Trivedi, Guwahati, India
Contents
EmotiSync: Music Recommendation System Using Facial Expressions . . . 1
Selin Sara Varghese, Manjiri Kherdekar, Benitta Mariam Babu, and Archana Shirke
Retrospective Review on Object Detection Approaches Using Boundary Information . . . 17
Vandana Jhala and Nidhi Gupta
Question Classification Based on Cognitive Skills of Bloom’s Taxonomy Using TFPOS-IDF and GloVe . . . 25
Rahil N. Modi, Kavya P. K., Roshni Poddar, and S. Natarajan
Sign2Sign: A Novel Approach Towards Real-Time ASL to ISL Translation . . . 39
Sudhanva Rajesh, Ashwath Krishnan, and S. Natarajan
Analysis of Patient Tuberculosis Tenet Death Reason and Prediction in Bangladesh Using Machine Learning . . . 53
Md. Imtiaz Ahmed, Rezoana Akter, and Fatima Shefaq
Portable Electronic Tongue for Characterisation of Tea Taste . . . 69
Alokesh Ghosh, Hena Ray, Tarun Kanti Ghosh, Ravi Sankar, Nabarun Bhattacharyya, and Rajib Bandyopadhyay
e-Visit Using Dynamic QR Code with Application Deep Linking Capability: Mobile-App-Based Solution for Reducing Patient’s Waiting Time . . . 85
Sudeep Rai, Amit Kumar Ateria, Ashutosh Kumar, Priyesh Ranjan, and Amarjeet Singh Cheema
Gunshot Detection and Classification Using a Convolution-GRU Based Approach . . . 95
Tanav Aggarwal, Nonita Sharma, and Naveen Aggarwal
Different Skin Tone Segmentation from an Image Using KNN for Sign Language Recognition . . . 109
Rakesh R. Savant, Jitendra V. Nasriwala, and Preeti P. Bhatt
MuteMe—An Automatic Audio Playback Controller During Emergencies . . . 119
Jeremy Dylan D’Souza, Venkitesh S. Anand, Akhil Madhu, and Shini Renjith
Chi-Square Top-K Based Incremental Feature Selection Model for BigData Analytics . . . 127
Subhash Kamble, J. S. Arunalatha, K. Venkataravana Nayak, and K. R. Venugopal
M-Vahitaram: AI-Based Android Application for Automated Crowd Control Management in Bus Transport Service . . . 141
Prathamesh Jadhav, Sakshee Sawant, Jayesh Shadi, Trupti Sonawane, Nadir Charniya, and Anjali Yeole
Automatic Enhancement of Deep Neural Networks for Diagnosis of COVID-19 Cases with X-ray Images Using MLOps . . . 155
Avik Kundu and Saurabh Bilgaiyan
Big Data Disease Prediction System Using Vanilla LSTM: A Deep Learning Breakthrough . . . 167
Natasha Sharma and Priya
Non-destructive Quality Evaluation of Litchi Fruit Using e-Nose System . . . 177
Suparna Parua Biswas, Soumojit Roy, and Nabarun Bhattacharyya
A Survey of Learning Methods in Deep Neural Networks (DDN) . . . 189
Hibah Ihsan Muhammad, Ankita Tiwari, and Gaurav Trivedi
The Implementation of Object Detection Using Deep Learning for Mobility Impaired People . . . 205
Pashmeen Singh and Senthil Arumugam Muthukumarswamy
A Study on Deep Learning Frameworks for Opinion Summarization . . . 217
Sandhya Ramakrishnan and L. D. Dhinesh Babu
Improvisation of Information System Security Posture Through Continuous Vulnerability Assessment . . . 231
Navdeep S. Chahal, Preeti Abrol, and P. K. Khosla
Design and Development of Micro-grid Networks for Demand Management System Using Fuzzy Logic . . . 251
L. Senthil, Ashok Kumar Sharma, and Piyush Sharma
Brain Tumor Detection Using Improved Otsu’s Thresholding Method and Supervised Learning Techniques at Early Stage . . . 271
Madhuri Gupta, Divya Srivastava, Deepika Pantola, and Umesh Gupta
Hyperspectral Image Prediction Using Logistic Regression Model . . . 283
Rajneesh Kumar Gautam and Sudhir Nadda
Extractive Long-Form Question Answering for Annual Reports Using BERT . . . 295
Anusha Kabber, V. M. Dhruthi, Raghav Pandit, and S. Natarajan
Endpoint Network Behavior Analysis and Anomaly Detection Using Unsupervised Machine Learning . . . 305
Ajay Kumar, C. S. Sajeesh, Vineet Sharma, Vinod K. Boppanna, Ajay S. Chouhan, and Gigi Joseph
Handling Cold-Start Problem in Restaurant Recommender System Using Ontology . . . 319
Saravanakeerthana Perumal, Siddhi Rawal, and Richa
An SVM-Based Approach for the Quality Estimation of Udupi Jasmine . . . 331
Sachin S. Bhat, Nagaraja, Suraj Revankar, B. Chethan Kumar, and Dinesha
Routing-Based Restricted Boltzmann Machine Learning and Clustering Algorithm in Wireless Sensor Network . . . 341
A. Revathi and S. G. Santhi
A Systematic Review on Underwater Image Enhancement and Object Detection Methods . . . 359
Chandni, Akanksha Vats, and Tushar Patnaik
IoT-based Precision Agriculture: A Review . . . 373
V. A. Diya, Pradeep Nandan, and Ritesh R. Dhote
Enhancing the Security of JSON Web Token Using Signal Protocol and Ratchet System . . . 387
Pragya Singh, Gaurav Choudhary, Shishir Kumar Shandilya, and Vikas Sihag
Price Prediction of Ethereum Using Time Series and Deep Learning Techniques . . . 401
Preeti Sharma and R. M. Pramila
Light Weight Approach for Agnostic Optimal Route Selection . . . 415
Nagendra Singh, Chintala Srujan, Dhruva J. Baruah, Divya Sharma, and Rajesh Kushwaha
Index . . . 429
About the Editors
Dr. Arti Noor is presently working as Senior Director at CDAC, Noida. She completed her Ph.D. from IIT, BHU, in 1990. She has more than 30 years of R&D experience in the fields of VLSI Design & Technology, Hardware Design of Electronic Systems, and Cyber Security, and more than 20 years of experience teaching VLSI design related courses to M.E. students of BITS, Pilani, and CDAC Noida. She has executed 15 research and development projects and 20 commercial projects of various complexities in the areas of VLSI Design, Cyber Security, online examination, and educational board development. She has guided six Ph.D. scholars and 200 B.Tech./M.Tech./M.E. student projects, and has examined 100 M.Tech. dissertations. She has published 81 research papers in journals and conferences, including one monograph. She has one patent and one Transfer of Technology of the CDAC FPGA Board to her credit. She is a recipient of the ASSOCHEM Women Cyber Influencer 2021 award. She is a Life Member of the Semiconductor Society and Broadcast Engineering Services, and an IEEE Member.

Dr. Kriti Saroha is presently working as Joint Director at C-DAC, Noida. She received her Ph.D. from GGSIPU, Delhi. She has 22 years of teaching and research experience. She has guided M.Tech./MCA student projects and examined over 50 M.Tech. theses. She has published a book on Computer Organization as per the UP Tech University syllabus for MCA and B.Tech (CS) with Wiley India Pvt. Ltd., as well as several research papers in national/international conferences and journals. Her research interests include Data Warehousing and Data Mining, AI and Machine Learning, and Computer Architecture.

Dr. Emil Pricop is currently an Associate Professor and the Head of the Automatic Control, Computers and Electronics Department of the Petroleum-Gas University of Ploiesti, Romania. He is also an invited professor at the Computer Engineering Department of the Faculty of Engineering (FoE), Marwadi University, Rajkot, Gujarat, India. He has held the position of Senior Lecturer since 2018. Dr. Pricop teaches computer networking, software engineering, human-computer interaction, and critical infrastructure protection courses. He received his Ph.D. in Systems Engineering from the Petroleum-Gas University of Ploiesti by defending in May 2017 the thesis
“Research regarding the security of control systems.” His research interest is cybersecurity, focusing primarily on industrial control systems security. Dr. Emil Pricop is co-editor of two books published by Springer, namely Recent Advances in Systems Safety & Security (Springer, 2016) and Recent Developments on Industrial Control Systems Resilience (Springer, 2020). Dr. Pricop is also the author or co-author of 2 national (Romanian) patents, six (6) book chapters published in books edited by Springer, and over 30 papers in journals or international conferences. Since 2013, Dr. Pricop has been the initiator and chairman of the International Workshop on Systems Safety and Security (IWSSS), a prestigious scientific event organized annually. Dr. Pricop has participated in more than 100 technical program committees of prestigious international conferences organized under the auspices of IEEE. He held the vice-chair position of the IEEE Young Professionals Affinity Group-Romania Section from 2017 to 2019.

Dr. Abhijit Sen is currently Professor of Computing Science and Information Technology at Kwantlen Polytechnic University, BC, Canada. He holds a Ph.D. from McMaster University, Hamilton, Ontario, Canada, a Master of Science degree from the University of California, Berkeley, USA, and a B.Tech in Electrical Engineering from the Indian Institute of Technology, Kharagpur, India. He has over 30 years of academic and administrative experience, having worked in organizations such as Canadian Aviation Electronics, Montreal, Canada, and Microtel Pacific Research, Burnaby, Canada. He also worked as a consultant to Canada Post, Montreal, and InfoElectonics, Montréal, Canada. He served as chair of the department for over 14 years. He has also been a visiting professor at Waikato University, Hamilton, New Zealand; University of Applied Sciences, Munich, Germany; Centre for Development of Advanced Computing (CDAC), India; North China Institute of Aerospace Engineering, China; and Technical University of Applied Sciences, Regensburg, Germany. He has been a keynote speaker at several international conferences and has served as a reviewer and technical committee member for a number of international conferences. He has also served as an external examiner for Ph.D. theses for several universities. His current research interests are in the areas of Wireless Networking and Security, Radio Frequency Identification (RFID), Computing Education and Teaching Methodologies, Distributed Systems and Databases, DevOps, and Artificial Intelligence. He is the recipient of the Distinguished Teaching Award of Kwantlen Polytechnic University, BC, Canada. He is a Life Member of the Institute of Electrical and Electronics Engineers (IEEE). He served on the Executive Committee of IEEE, Vancouver Chapter.
Dr. Gaurav Trivedi is currently an Associate Professor at IIT Guwahati. He holds a Ph.D. and M.Tech. from IIT Bombay, and a Bachelor of Engineering degree from Shri Govindram Seksaria Institute of Technology and Science (SGSITS), Devi Ahilya University, Indore, Madhya Pradesh. His research interests include Circuit Simulation and VLSI CAD, Electronics System Design, Computer Architecture, Semiconductor Devices, Hardware Security, Embedded Systems and IoT, High Performance Computing, Large Scale Optimization, and Machine Learning. He has published several papers in national/international conferences and journals.
EmotiSync: Music Recommendation System Using Facial Expressions

Selin Sara Varghese, Manjiri Kherdekar, Benitta Mariam Babu, and Archana Shirke
Abstract Emotion is intrinsically associated with the way individuals interact with one another, and human beings have the natural ability to look at a person’s face and predict their mood. However, machines lack a complex brain like that of a human in order to sense and distinctly recognize different emotions correctly. This ability, if learned by a computer or a mobile device, can have remarkable applications in the real world. The ability to respond instantly to changes in the user’s preference, as well as to complement the user’s mood, is a valuable asset for such systems. Recent research has shown that listeners experience a clear emotional response to music and that one’s taste in music is strongly driven by personality and mood. Without a doubt, a listener’s response to a music piece depends on a variety of factors like age, culture, personal preferences, gender, and context. Nevertheless, setting this aside, we can also classify songs based on our emotions. Facial expressions are a natural way to express and convey one’s emotions, and these emotions, in turn, affect the mood of the person. So, based on an accurate emotion recognition model, recommending music that anyone can relate to and that creates interest in users will be very useful, especially for monitoring patients with mood disorders to provide them with music therapy or mood enhancement and thereby reduce their anxiety/stress levels. Artificial intelligence is a large-scale, leading, and vital domain that has drawn a lot of researchers in the current age. This domain has spread across the world in a very short span of time. We use it in our everyday life in the form of self-driving cars, digital assistants like Amazon Alexa, chatbots, and many more. Developing a system that will recommend music by recognizing the mood of a person based on their facial expressions is the core idea of this project.

Keywords Face detection · Emotion recognition · Music recommendation · Artificial intelligence

S. S. Varghese (B) · M. Kherdekar · B. M. Babu · A. Shirke
Department of Information Technology, Fr. Conceicao Rodrigues Institute of Technology, Navi Mumbai, India
e-mail: [email protected]
M. Kherdekar
e-mail: [email protected]
B. M. Babu
e-mail: [email protected]
A. Shirke
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_1
1 Introduction

Feelings or emotions of human beings can be broadly categorized as happy, sad, anger, fear, and surprise. This list remains non-exhaustive and can be expanded with variations of these basic emotions which are rather subtle, like embarrassment, cheerfulness, excitement, and contempt. The contortions around the facial muscle region are very minimal, so even a minute variation will affect the resulting expression, which makes it difficult to comprehend the subtle emotions. Different people, or even the same person, might emote differently each time for the same emotion. In order to gauge the expression on an individual’s face, facial areas like the mouth and eyes remain the key regions of interest, as they tend to exhibit emotions maximally. Machine learning and neural networks are widely used to detect and draw out gestures from a human’s body. These algorithms are proven to work with pattern recognition and classification-based problems and hence work efficiently for mood detection.

Music has always played an integral role in changing the mood of the listener. The music streaming and recommendation applications that prevail currently require a tedious and cumbersome effort to manually segregate songs into various playlists, which are then used to produce a relevant playlist predicated on the listener’s mood. This process can be eliminated now that computers can automatically understand and recognize the mood of a person through their facial expressions and recommend songs efficiently, hence, on a larger scale, saving the time and human labor invested in doing this process manually.

People often tend to use their facial expressions to express their emotions. Capturing and identifying the emotions depicted by a user and playing related music that accompanies the mood of the user can progressively bring peace to the mind of the user, hence giving a calming effect. The project focuses on comprehending emotions expressed by an individual through facial expressions. An emotion-based music player system is outlined to recognize emotions through a webcam. The system records a real-time video stream of the listener, captures the facial features of the listener, and detects the emotion the individual expresses using image processing. This project aims to calm and relax the targeted individual and enhance their listening experience by creating a playlist that resonates with the detected emotions of the user.
2 Literature Survey

In recent years, multiple researchers have worked on music players based on mood extraction due to the increasing demand for automated emotion comprehension. Lehtiniemi and Holm [1] proposed the concept of using animated mood pictures, where users interact with a collection of these animated images and receive song recommendations from related genres based on their selection. After the initial design model was released, it was assessed by 40 Finnish participants, and the idea of selecting music based on mood pictures was found to be effective by 85% of them. However, it lacked the ability to personalize the picture and music associations, making the selection and recommendation quite static.

Chuang [2] worked on an interactive sensor-based musical system which produces music based on a user’s current emotional state. It uses Arduino, Processing, and Pure Data code to read in sensor input, determine an emotional profile, and compose music accordingly. It offers a unique way to understand the relationship between emotions and music, and is aimed at users who want to create music and express themselves but do not necessarily possess skills in music composition.

Users select music according to their moods and emotions; hence, there is a growing demand for classifying music according to mood. Since every individual will have a different perspective on how music should be classified according to their mood, it becomes a difficult task to determine the appropriate method for classification. So, Bhat et al. [3] worked on an effective mood classification method to discern the mood of any piece of music, by drawing parallels between the spectral and harmonic features of the music and the human perception of music and moods.

Patil and Bailke [4] worked on recognizing human emotions through facial expressions by capturing images using Intel’s RealSense SR300 camera and processing them with an artificial neural network. Using the Software Development Kit (SDK) of the RealSense camera, it automatically detects landmarks on the depth image of a face, and then a geometric feature-based approach is used for the extraction of features.

Mohana Priya et al. [5] built a real-time CNN-based Mini-Xception network for emotion detection using frontal face images. The facial regions in the images were extracted using the Vector Feature Model and Histogram Analysis method by dividing the whole image into three feature regions: eye, mouth, and auxiliary region, retrieving the geometric shape information of those areas. However, images in the dataset containing faces with glasses or spectacles interfered with the features learned by the model. Also, for song recommendation, only a dataset of a few songs was considered, pre-classified into different emotion folders manually just for testing purposes. Their proposed system achieved an overall accuracy of 90.23%.

Usually, users have to manually browse through a playlist of songs to select appropriate music. Here, K. S. Nathan et al. [6] proposed a system, Emosic, that uses an efficient model to create a playlist based on the current behavior and emotional state of an individual. The authors suggest that the existing methods are computationally
less accurate, slow, and sometimes even require additional hardware like EEG or other sensors. Their proposed system is based on real-time extraction of facial expressions as well as audio features to distinguish specific emotions that will generate a playlist, thus entailing a relatively lower computation cost than speech emotion processing.

Vyas [7] proposed a real-time facial emotion recognition implementation using SVM classification and OpenCV through the MoodSound music player. A limited dataset of 400–450 images was used as input in order to minimize storage in memory, and the classified emotion labels were restricted to Happy, Sad, and Neutral moods only. The emotion predicted by the classifier is then passed to a music player which has a music list already sorted according to the three moods; this list was classified based on the tempo of the songs chosen. So, on passing the mood, the system would play appropriate songs from that list.
3 Existing System

Almost all the prevalent music streaming platforms like Ganaa, Amazon Music, and iTunes require users to look for songs from their extensive playlists based on artists and genres, and to play songs or create playlists manually according to their mood or liking. Users of these applications themselves search through various playlists to find songs that would calm or pump up their mood. Current music players have features like fast forwarding, music categorization, user-behavior and collaborative-filtering based music recommendations, and multicast streams. But these factors are now considered the basic and common requisites of every music streaming application. In order to elevate the listening experience, there needs to be a technique to interpret users’ emotions and play music accordingly by generating a sentiment-aware playlist, instead of limiting the scope to manual browsing or shuffled playlists.
4 Methodology

EmotiSync focuses on implementing real-time mood detection and music classification. The prototype design of our project is divided into three major modules: Face Detection, Emotion Recognition, and Music Recommendation, as seen in the architecture design in Fig. 1.
Fig. 1 Architecture Design
4.1 Face Detection

The proposed framework first captures a real-time video stream with the help of an inbuilt webcam. As each frame is captured, it is converted from RGB to grayscale for the classifier to identify faces present in the frame. This image is then sent to the classifier, where the region of interest in the image is preprocessed and converted into 48 × 48 pixelated data, and by a feature extraction technique the coordinates of the face are extracted from the preprocessed image. A classifier basically works as a program which traces whether an image is a positive image containing a face or a negative image that does not have a face in it. Thousands of sample images with and without faces are used to train classifiers in order to learn how a new input image can be classified correctly. OpenCV has many pre-trained classifiers to detect faces, eyes, and other objects within an image or video stream. The XML files of the pre-trained classifiers are stored in the installed cv2 package folder. In OpenCV, there are two pre-trained classifiers especially for face detection: the Local Binary Pattern (LBP) classifier and the Haar Cascade Classifier.
● Local Binary Pattern (LBP) divides the face image into micro-texture patterns and labels the pixels of an image by thresholding the neighborhood of each pixel and obtains a resultant binary number to classify a face from a non-face.
● Machine learning-based Haar Cascade Classifier uses a cascade function which is trained on Haar-like features (which are eyes, nose, and mouth coordinates) from many images with and without faces in them (Table 1).
Table 1 Haar cascade and LBP algorithm comparison

Algorithm | Advantages | Disadvantages
Haar | 1. High detection accuracy 2. Low false positive rate | 1. Computationally slow and complex 2. Requires longer time for training 3. Less accurate on black faces 4. Limitations in poor lighting conditions
LBP | 1. Computationally fast and simple 2. Requires less time for training 3. Robust to local illumination changes | 1. Less accurate 2. High false positive rate
Both these OpenCV-based face detection classifiers have their own advantages and disadvantages but the key difference to note here is of accuracy and speed. Since, our use case deals with emotion recognition which involves facial feature extraction for classifying emotions, more accurate detections are of a major concern than speed; hence, Haar cascade-based classifier is more suitable for face detection and emotion recognition-based projects. OpenCV’s Haar feature-based cascade classifier for face detection is first trained with positive images containing faces which are scaled to the same size resolution and then with arbitrary negative images which do not contain any face in particular. After the classifier is trained, it captures the frequently occurring features or patterns through the entire training set images. Classifier is then applied to the region of interest in the image, i.e., the face. The classifier finds those features throughout the region of interest and returns the coordinates of the face detected in the input image. If it doesn’t recognize those features in the desired region input image, the classifier does not return any coordinate. The process of selecting features or pixelbased rectangular regions representing parts of a human face is called HAAR feature selection which is used in the detection process of cascade classifiers by OpenCV for face detection.
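A minimal OpenCV sketch of this detection step is shown below, assuming the bundled frontal-face Haar cascade is used; the scaleFactor and minNeighbors values are illustrative choices, not parameters reported by the authors.

```python
import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade shipped with the cv2 package.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)              # inbuilt webcam
ret, frame = cap.read()                # grab one frame of the stream
if ret:
    # Haar cascades work on single-channel images, so convert the frame to grayscale.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))   # 48 x 48 region of interest
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
cap.release()
```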
4.2 Emotion Recognition

For emotion recognition, a CNN architecture is trained on the FER-2013 dataset, which contains grayscale images of human facial expressions. In order to identify the emotion being expressed in these images, individual feature coordinates from the extracted region of interest (face area) are given as input to train the CNN network. Then the real-time webcam feed frames, which act as an unknown set to the classifier, are used to test whether the trained network can extract and return the facial landmark coordinates present in this new set correctly based on what it learned from the training set.
Fig. 2 FER-2013 dataset sample raw images
1. FER-2013 Dataset

Kaggle’s Facial Emotion Recognition Challenge dataset is used to train the emotion recognition model. This dataset (Fig. 2) was compiled by the organizers of the “Challenges in representation learning” competition in 2013. It has 48 × 48 pixel grayscale images of about 35887 faces, which are divided by usage into a training subset of 28709 images and a testing subset of 7178 images. Each image is labeled with one of these seven emotions: anger, disgust, fear, happy, sad, surprise, and neutral. The CK+ dataset, which has seven emotion labels (happy, sad, angry, afraid, surprise, disgust, and contempt), is also commonly considered by researchers for emotion recognition projects, but it has only 5876 grayscale, labeled images of 123 different people. These images are extremely clean, with posed expressions of individuals against look-alike photo backdrops, which tends to give a higher accuracy while training the models; moreover, it has comparatively very few images of individuals compared to the huge and complex FER-2013 dataset. So, we chose to work on various models to get a fairly good performance on the FER-2013 dataset.
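As an illustration of how such a split can be prepared, the sketch below parses the CSV release of FER-2013 (one row per image with columns emotion, pixels, and Usage) into 48 × 48 arrays; the file path is an assumption, and the exact preprocessing pipeline used here is not described in the paper.

```python
import numpy as np
import pandas as pd

data = pd.read_csv("fer2013.csv")     # path is an assumption

def to_arrays(frame):
    # Each 'pixels' cell holds 48*48 space-separated grayscale values.
    x = np.stack([np.array(p.split(), dtype="float32").reshape(48, 48, 1)
                  for p in frame["pixels"]])
    y = frame["emotion"].to_numpy()   # integer labels for the seven emotions
    return x / 255.0, y               # scale pixel values to [0, 1]

train = data[data["Usage"] == "Training"]   # 28709 images
test = data[data["Usage"] != "Training"]    # 7178 images (public + private test)
x_train, y_train = to_arrays(train)
x_test, y_test = to_arrays(test)
print(x_train.shape, x_test.shape)
```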
2. Convolutional Architectures Implemented

(a) Simple Sequential Convolutional Model

The first model considered for emotion recognition is a simple sequential CNN network. It consists of an input layer containing the preprocessed grayscale 48 × 48 dataset images, followed by four consecutive 2D convolution blocks, each made of a 2D convolutional layer with a ReLU activation function, a batch normalization layer, a maximum pooling layer with a 2 by 2 filter, and a dropout regularization layer with a rate of 0.25, as suggested by George-Cosmin Porusniuc et al. [8] in their study on CNN architectures for facial expression recognition. Further, two consecutive fully connected dense layers are added, with a Softmax activation function at the end of the network. All these layers are stacked linearly in sequence, so it is called a sequential model. The dense layer at the end limits the number of neurons to 7, which is the number of labels in the dataset, and the output of the model gives the predicted emotion probabilities (a minimal sketch of this baseline is given after the description of both architectures).

(b) Mini-Xception Convolutional Network

The second CNN model considered for implementation is called “mini-Xception”, which is inspired by Google’s XCEPTION. According to George-Cosmin Porusniuc et al. [8], this architecture is lightweight and extremely portable. It combines the residual blocks of CNNs with the XCEPTION model’s depthwise separable convolutions, and an additional global average pooling layer is added, which decreases the chance of overfitting and reduces the overall trainable parameters in the network. The network contains four residual blocks, of which one branch implements depthwise separable convolutions, and then the usual 2D convolutional block with global average pooling and a softmax activation function is added instead of the parameter-rich dense layers seen in the earlier sequential model.
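The following is a minimal Keras sketch of the sequential baseline described in (a), assuming TensorFlow/Keras; the number of filters per block (32, 64, 128, 256) and the width of the first dense layer are illustrative assumptions, while the block structure and the 0.25 dropout rate follow the description above.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(48, 48, 1))            # 48 x 48 grayscale faces
x = inputs
for filters in (32, 64, 128, 256):                  # filter counts are assumptions
    # One block: 2D convolution + ReLU, batch normalization, 2x2 max pooling, dropout 0.25.
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Dropout(0.25)(x)
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)         # first fully connected layer
outputs = layers.Dense(7, activation="softmax")(x)  # one neuron per emotion label
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```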
4.3 Music Recommendation

For music recommendation, Spotify playlists of songs will be generated based on the seven classes of facial emotions using the Spotify Web API, and music will be played directly on the application website through the Web Playback SDK. Publicly available mood playlist tags on Spotify will be mapped to these seven emotion classes, and based on the predicted emotion, a list of songs pertaining to the mood will be used to generate a unique playlist. Other than using the available mood playlists, recommendations will also be suggested by tuning the audio features of Spotify tracks, like valence, tempo, energy, danceability, loudness, instrumentalness, and acousticness, to suggest songs for the desired and predicted mood. Then the user can
listen to any song he/she would like from the recommended playlist. In parallel, recommendations can also be influenced by the user’s listening history, such as their liked albums and artists. The mood of the song is identified and mapped to the user’s mood for music recommendation.
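A sketch of how the audio-feature route could be queried with the spotipy client is given below; the emotion-to-feature mapping, the seed genre, and the credential handling are illustrative assumptions, not the exact configuration used in this project.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Hypothetical mapping from detected emotion to Spotify audio-feature targets.
MOOD_TARGETS = {
    "happy":    {"target_valence": 0.9, "target_energy": 0.8},
    "sad":      {"target_valence": 0.2, "target_energy": 0.3},
    "angry":    {"target_valence": 0.3, "target_energy": 0.9},
    "fear":     {"target_valence": 0.3, "target_energy": 0.4},
    "disgust":  {"target_valence": 0.2, "target_energy": 0.5},
    "surprise": {"target_valence": 0.7, "target_energy": 0.7},
    "neutral":  {"target_valence": 0.5, "target_energy": 0.5},
}

# Credentials are read from the SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET environment variables.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

def recommend_tracks(emotion, limit=15):
    # Query the recommendations endpoint with mood-tuned valence and energy targets.
    result = sp.recommendations(seed_genres=["pop"], limit=limit, **MOOD_TARGETS[emotion])
    return [(t["name"], t["artists"][0]["name"]) for t in result["tracks"]]

for name, artist in recommend_tracks("happy"):
    print(name, "-", artist)
```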
5 Experimental Analysis

The four-layered simple sequential CNN model, having 4,22,087 trainable parameters, was trained to learn the emotion classes on a training sample comprising a 0.9 fraction of the overall FER-2013 dataset. A prediction accuracy of 57% was obtained after training it for 50 epochs, as can be seen from the classification report in Fig. 3 with the precision, recall, F1-score, and support values for each trained emotion class.
Fig. 3 Classification Report of trained Simple CNN model
Figure 4 shows the confusion matrix for the simple convolutional model architecture based on the predictions made over test data of 7178 images. It can be observed that the trained model performed poorly when it comes to predicting the actual emotions accurately. We can observe that the top two emotions which were predicted fairly well are happy and neutral, but from them, only 19.14% of actual happy images were predicted as happy, and 8.99% of actual neutral images were predicted as neutral. Moreover, this simple CNN model includes a set of fully connected (dense) layers which tend to retain most of the training parameters in the network, including the number of layers in the network, the number of neurons per layer, and the number of training iterations, resulting in an increase in the total trainable parameters, which affects the model training, its inference time, and accuracy.
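For reference, a report and confusion matrix of this kind can be produced with scikit-learn as sketched below, assuming the trained model and test split from the previous subsections; the emotion names follow the usual FER-2013 label order and are an assumption here.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# model, x_test and y_test are assumed to come from the training step above.
probs = model.predict(x_test)            # per-class probabilities from the softmax layer
y_pred = np.argmax(probs, axis=1)        # most probable emotion per test image

print(classification_report(y_test, y_pred, target_names=EMOTIONS))
print(confusion_matrix(y_test, y_pred))  # rows: actual emotion, columns: predicted emotion
```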
Fig. 4 Confusion Matrix for trained Simple CNN model
Fig. 5 Classification Report of trained Mini-Xception model
Similarly, the Mini-Xception model, having 56,951 trainable parameters, was trained on a sample comprising a 0.8 fraction of the overall FER-2013 dataset. A prediction accuracy of 64% was obtained after training it for 110 epochs, as can be seen from the classification report in Fig. 5 with the precision, recall, F1-score, and support measures for each trained emotion class. In this model, there are no dense layers; instead, a global average pooling layer is added, which reduces each feature map to a scalar value by taking the average over all elements in the feature map. Because of this layer, the last convolution layer has as many filters as there are classes in the dataset, i.e., 7.
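A compact Keras sketch of a Mini-Xception-style network of this kind is given below; the residual modules with depthwise separable convolutions and the final 7-filter convolution with global average pooling follow the description above, while the stem layers and the exact filter widths are illustrative assumptions.

```python
from tensorflow.keras import layers, models

def residual_module(x, filters):
    # Residual branch: two depthwise separable convolutions followed by max pooling;
    # the shortcut is a strided 1x1 convolution so both branches keep matching shapes.
    shortcut = layers.Conv2D(filters, (1, 1), strides=(2, 2), padding="same")(x)
    shortcut = layers.BatchNormalization()(shortcut)
    y = layers.SeparableConv2D(filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.BatchNormalization()(y)
    y = layers.SeparableConv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(y)
    return layers.Add()([y, shortcut])

inputs = layers.Input(shape=(48, 48, 1))
x = layers.Conv2D(8, (3, 3), activation="relu")(inputs)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(8, (3, 3), activation="relu")(x)
x = layers.BatchNormalization()(x)
for filters in (16, 32, 64, 128):          # module widths are assumptions
    x = residual_module(x, filters)
# The final convolution maps to 7 feature maps (one per emotion class); global average
# pooling replaces the parameter-rich dense layers of the sequential baseline.
x = layers.Conv2D(7, (3, 3), padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Activation("softmax")(x)
mini_xception = models.Model(inputs, outputs)
mini_xception.summary()
```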
Fig. 6 Confusion Matrix for trained Mini-Xception model
Figure 6 shows the confusion matrix for the Mini-Xception model architecture based on the predictions made over a similar test sample of 7178 images. Here, it can be evidently inferred that the trained Mini-Xception model performed significantly better than the previous model. We can observe that the model made its most accurate predictions for the happy (86%), anger (78%), and surprise (76%) emotions. Fear was the least correctly predicted emotion, with a 35% recall rate.
6 Result Analysis

Table 2 represents an overview of all the trained models on the FER-2013 dataset.
Table 2 Trained models overview (training dataset: raw images from the FER-2013 dataset)

Architecture | No. of trainable parameters | Batch size | Training epochs | Prediction accuracy
Simple Convolution | 4,22,599 | 64 | 50 | 56%
Mini-Xception | 56,951 | 64 | 110 | 64%
In our developed system, the first step is face detection. An image is captured through the webcam. In order to achieve accurate detection, the captured image that is fed to the classifier needs to be well illuminated. In real-time feed, the face is detected in a rectangular box with each frame captured as seen in the screenshots. User should ensure that their face should not be partially obscured or blocked as it would not be detected in the frame. As we have used a frontal face cascade classifier, the user is expected to look right into the webcam maintaining a frontal pose for the emotions to be captured accurately based on the features detected. The facial expressions will differ from person to person. Hence, the expressions will be then categorized based on the different coordinates present in the dataset. The detection of emotions will be done in our system after this step using our Mini-Xception model. After extracting and mapping the coordinates from the face detected, respective emotions will be displayed. As seen in the pictures below of different users, our system efficiently recognizes different emotions like Happy, Sad, Fear, Surprise, Disgust, etc., with its confidence probabilities. Analyzing the different emotions, the system will further classify and display song playlists based on these emotions to the user.
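A sketch of one iteration of such a live loop is shown below, combining the Haar cascade from Sect. 4.1 with a trained emotion model; the saved model file name and the label order are assumptions for illustration.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
model = load_model("mini_xception.h5")   # trained model; file name is an assumption
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(roi.reshape(1, 48, 48, 1), verbose=0)[0]
        label = "%s (%.3f)" % (EMOTIONS[int(np.argmax(probs))], probs.max())
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)
    cv2.imshow("EmotiSync", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to exit
        break
cap.release()
cv2.destroyAllWindows()
```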
7 Conclusion and Future Scope

In this paper, we have proposed a system for music recommendation using facial expressions, with the purpose of automating the song selection process in existing music streaming platforms and playing songs according to an individual’s emotional state of mind and their rightful preferences, thereby eliminating the manual effort of creating and managing mood-based playlists. We focused on developing an efficient emotion recognizer and worked on two models, Simple CNN and Mini-Xception, for recognizing emotions through users’ facial expressions, and inferred that the Mini-Xception model performed fairly well with the FER-2013 dataset for almost all seven emotion labels. The notable contributions of researchers in the same field were discussed and analyzed in this research. Various aspects of previous implementations, their pros and cons, were studied, and the preferred Mini-Xception model algorithm was narrowed down and worked upon (Figs. 7, 8, 9, 10, 11, and 12).
Fig. 7 Happy emotion detected with 0.998 confidence
Fig. 8 Sad emotion detected with 0.853 confidence
One factor other than emotion that has a significant effect on a person’s mood is the weather. So, a weather-based music player, which would fetch the weather of the desired city and recommend music based on the weather conditions, could also be developed and integrated with this implementation. In future, to enhance the accuracy of detecting the user’s mood perfectly, emotions could be detected through speech (using voice), body language, and facial expressions. The scope of this music player can be expanded to include music therapy. Music therapy is a revolutionary methodology for treating people with mental disorders using the power of music.
Fig. 9 Neutral emotion detected with 0.898 confidence
Fig. 10 Angry emotion detected with 0.808 confidence
Fig. 11 Fear emotion detected with 0.505 confidence
Fig. 12 Surprise emotion detected with 0.760 confidence
References

1. Lehtiniemi, A., & Holm, J. (2012). Using animated mood pictures in music recommendation. In 16th International Conference on Information Visualisation (pp. 143–150). https://doi.org/10.1109/IV.2012.34
2. Chuang, G. (2015). EmotiSphere: From emotion to music. In TEI15 Proceedings of the Ninth International Conference on Tangible, Embedded and Embodied Interaction (pp. 599–602). ACM.
3. Bhat, A. S., Amith, V., Prasad, N. S., & Mohan, D. M. (2014). An efficient classification algorithm for music mood detection in Western and Hindi music using audio feature extraction. In 5th International Conference on Signal and Image Processing (ICSIP), Jeju Island (pp. 359–364). https://doi.org/10.1109/icsip.2014.63
4. Patil, J. V., & Bailke, P. (2016). Real time facial expression recognition using RealSense camera and ANN. In International Conference on Inventive Computation Technologies (ICICT) (pp. 1–6). https://doi.org/10.1109/INVENTIVE.2016.7824820
5. Priya, M., Haritha, M., Jayashree, S., & Sathyakala, M. (2018). Smart music player integrating facial emotion recognition. International Science and Technology Journal, 7(4).
6. Nathan, K. S., Arun, M., & Kannan, M. S. (2017). EMOSIC—An emotion based music player for Android. In IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) (pp. 371–376). https://doi.org/10.1109/ISSPIT.2017.8388671
7. Vyas, M. (2021). MoodSound: A Emotion based Music Player. International Research Journal of Engineering and Technology (IRJET), 08(04).
8. Porusniuc, G., Leon, F., Timofte, R., & Miron, C. (2019). Convolutional neural networks architectures for facial expression recognition (pp. 1–6). https://doi.org/10.1109/EHB47216.2019.8969930
Retrospective Review on Object Detection Approaches Using Boundary Information

Vandana Jhala and Nidhi Gupta
Abstract In computer vision, object detection is an approach for identifying and locating objects in images and videos. It can also be used to count the number of objects present in a scene while determining their precise locations. A technique for detecting boundaries between two objects comes under semantic ontology. Semantic boundary and edge detection is a difficult task, as the assessment of an edge cannot be grounded purely on low-level features like gradient. Semantic learning of classifiers involves knowledge of edge labels, which is a complex problem in image processing. Here we examine several levels of information in order to adopt a feasible method where all edges require pixels for the continuous detection of the objects. In this paper, we study the applications of object detection and several recent approaches developed using boundary information in the past decades. The associated drawbacks are also highlighted in this work to provide the scope for improvement in this research field.

Keywords Object detection · Boundary information · Machine learning · Edges · Feature extraction

V. Jhala · N. Gupta (B)
Department of Mathematics and Scientific Computing, National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India
e-mail: [email protected]
V. Jhala
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_2
1 Introduction

Object detection is a computer vision approach for identifying and positioning objects in images. Object detection, in particular, collects boundary information around identified items, allowing us to see where they are in the image [1]. This technique recognizes and defines items like people, automobiles, and animals in digital photos and videos. It is closely related to computer vision and image processing applications. Such a technique may categorize single or multiple items in a digital
image at the same time. An object is defined as an image element which carries significant information in the process of edge and boundary detection. This holds for all techniques of edge and boundary detection, where object information is used to obtain an initial segmentation of the boundary. Further, object recognition is a technique for identifying the objects present in an image. All object classes have their own specific descriptions to distinguish them from others, which helps in recognizing similarities or dissimilarities among objects. Object detection provides the capability to identify various types of objects in computer vision; it is concerned with the identification of an object in an image. Object detection is a classical visual task and one of the most important areas in visual applications. It is mostly concerned with the location and detection of certain things within an image and is considered a fundamental job in the object recognition process [2].

The motive of object detection is to draw contours between dissimilar objects in the image while discarding internal features [3, 4]. Existing works have therefore required a higher degree of semantic information for solving this more efficiently. The solution is to simulate the complete set of classes which are uniquely labeled. However, the hypothesis is that high-level information can identify boundaries without attempting labeling of pixels. The semantic distance between two region boundaries in an image is determined by intensity and color information [5].

In this paper, we discuss the technique of object detection using edge and boundary information. Edge detection is an essential method and has been extensively used in vision and edge detection algorithms. The low-level boundary features are extracted in a pre-processing step for effective object detection. The likelihood of determining the whole geometry of the significant boundary information of the image is estimated based on the identified edges, and the efficiency of several edge detection algorithms is based on this measured likelihood of extracting boundary information [6]. The boundary grouping method aims at detecting the significant closed boundary from the fragments. This formulation can be generally applied to detect paths in satellite images collected from Google Maps, and the results of such existing algorithms are highly adaptive. The actual performance of these existing methods in vision applications is particularly important in embedded applications, because they often interact with the real world. As a consequence, the computations of boundary detection techniques must be enhanced, being an essential step for implementing advanced methods in real-world applications.

The paper is organized into six sections. An introduction to object detection is described in detail in Sect. 2. Applications of object detection in various fields are discussed in Sect. 3. Section 4 includes a thorough review of several approaches for object detection. Lastly, Sects. 5 and 6 include conclusive remarks and future scope, respectively.
2 Object Detection

Object detection has been around for a while, but it is seeing increasing demand in different types of applications in industries and organizations. Boundary detection, edge detection, and line approximation methods are the main techniques which help in object detection effectively [7, 8]. This section briefly introduces each technique.
2.1 Boundary Detection Boundary detection algorithms fall into three basic types: approaches based on gradient information, on machine learning methodology, and on salient information. The Canny edge operator is one of the conventional and well-known edge detectors belonging to the gradient-based methods. It is designed to detect local edges, but this also poses a challenge to the system because it finds both true boundaries and false-positive edges at the same time [9].
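As a concrete illustration of the gradient-based family discussed above, the following minimal sketch applies the Canny operator with OpenCV; the image path and the two hysteresis thresholds are assumptions chosen for illustration, not values prescribed by the reviewed works.

```python
import cv2

# Load a grayscale image (hypothetical path) and smooth it to suppress noise
# before gradient computation, as Canny is sensitive to high-frequency noise.
image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), 0)

# Canny hysteresis thresholds (assumed values): gradients above 150 are strong
# edges, those between 50 and 150 are kept only if connected to strong edges.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# The binary edge map can then be passed to a boundary grouping stage.
cv2.imwrite("edges.png", edges)
```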
2.2 Edge Detection Human visual processing begins with edge detection, which is also a critical and challenging step for machines. In addition, filters have been utilized and studied in several recent works [10]. Edge detection is well known to be extremely important in computer vision. A number of researchers have used various software packages to run simple edge detection algorithms. However, such purely programmed techniques have been found to be ineffective for real-time applications. By comparison, edge detection algorithms implemented on hardware platforms have proved more effective for real-time applications. This is largely due to the viability of very large-scale integration in the hardware execution process [8].
2.3 Line Approximation Algorithm Line approximation approaches have been widely used to obtain information on the directions or inclinations of edges in an image. The fragments are passed to the boundary grouping stage in the form of straight line segments. Line fitting is a well-studied problem with numerous efficient solutions available. Although the mathematical formulations and algorithmic solutions of individual methods vary, the underlying basic concept remains the same: finding a collection of line
segments that are well matched with the discovered edge pixels. The line approximation method finds the smallest number of line fragments. The results of the empirical study identify the most efficient boundary detection method regardless of the edge detector parameters used in the approach [6].
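To make the idea of approximating edge pixels with straight line segments concrete, the sketch below uses OpenCV's probabilistic Hough transform as one common line-fitting approach; the parameter values are illustrative assumptions rather than settings taken from the reviewed methods.

```python
import cv2
import numpy as np

# Start from a binary edge map (e.g., the output of a Canny detector).
edges = cv2.imread("edges.png", cv2.IMREAD_GRAYSCALE)

# Probabilistic Hough transform: fit straight line segments to edge pixels.
# rho/theta set the accumulator resolution; the remaining parameters
# (assumed values) control how many supporting pixels a segment needs,
# its minimum length, and the largest gap it may bridge.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                           minLineLength=30, maxLineGap=5)

# Draw the fitted segments so the approximation can be inspected visually.
canvas = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        cv2.line(canvas, (x1, y1), (x2, y2), (0, 0, 255), 1)
cv2.imwrite("line_segments.png", canvas)
```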
3 Applications Object detection has been utilized as the backbone of several industries for personal security and viable office productivity. Many aspects of computer vision have been implemented for object detection and recognition in several machine-learning-based surveillance systems for automatic electric vehicles, for security and inspection. Surveillance procedures here may include image retrieval of the path and observation of obstacles until the destination is reached. Object recognition will open up extended possibilities for several other uses involving continuous movement by removing substantial hurdles [7]. From the state of the art, we briefly discuss several important applications of object detection. Automatic disease detection from breast images is performed using Computer-Aided Diagnosis (CAD) devices through mammography. The foremost goal of utilizing mammography in CAD systems is to extract features and assist in estimating contours that illustrate and define the boundaries of the breast effectively. Medical imaging and perceptive analysis of different organs and body tissues have recently made use of contours extracted from medical images [10, 11]. This area has been broadly explored by several researchers during the last decade. However, the availability of large datasets is still an issue that has to be addressed in the coming years [12]. The development of new algorithms capable of segmenting large medical images is a critical first step. The detection of shadows in an image is another important research area in image processing, especially when images contain shadows, as satellite images often do. However, shadows can be helpful at times and their presence may even be beneficial. The shadows cast by buildings in satellite images are useful for measuring the structure of a building from its dimensions, which may be beneficial in urban planning and 3D scene construction. On the other hand, shadows obscure important details of objects by introducing false color tones and variations along the underlying edges. Boundary detection algorithms and nonlinear filtering have been employed on echo-cardiography images to find cardiac quiescence signals. The quiescent signal is utilized for investigating cardiac gating to improve motion-free images in cardiac computed tomography. Furthermore, thresholding has been widely used in echo-cardiography to produce binary images on which boundary detection approaches are applied. The goal of tumor detection from brain images is to use boundary information to delineate the tumor periphery. All image segments are subjected to a boundary detection technique, which is then used to distinguish tumorous from normal images [12]. In this procedure, the tumors present in every slice are processed to determine the volume for 3D
rendering.
Fig. 1 Corner detection [13]
Corner detection is again an important application in image interpretation, as corners are among the most visible feature points in an image compared with other features [11, 13]. A corner is a distinct feature point, as depicted in Fig. 1; it is a position where the gradient variations are high.
4 Literature Review Edge detection is fundamental to image processing applications and is especially useful in motion detection, pattern recognition, image segmentation, remote sensing, and medical applications. Many conventional methods, such as pixel labeling algorithms, have emerged in this research field of image processing. There are several different types of prior edge detection assessment methods. They can be broadly divided into two categories: subjective and objective approaches. The former evaluate edge detection efficiency using human visual perception and judgment [2]. Even though many edge detection assessment methods have been introduced, assessment has remained a difficult job; the difficulty of selecting an acceptable performance measure for edge detection results is a major challenge. Edge detection is used as a pre-processing phase in most applications to remove insignificant boundary features before the results are fed into subsequent steps. A comprehensive review of several methods in this field has been carried out and is summarized in Table 1. In 2006, Hoogs and Collins [5] presented a technique to detect the boundary information of objects in images and calculated semantic distance using the semantic ontology WordNet. In the same year, Liu et al. [6] presented a novel strategy for image segmentation using a ratio contour algorithm. The boundary grouping part was implemented using the ratio contour algorithm, applying an unbiased combination of three essential Gestalt rules: proximity, closure, and consistency. All these rules have been widely used in existing psychological and psychophysical experimental studies for determining boundary saliency. In 2010, Kanade et al. [9] presented a novel strategy in which boundary detection problems are resolved in the context of categorization. A line-scanning
Table 1 Literature review on object detection methods using boundary information

| Authors (year of publication) | Dataset | Proposed method | Classifier | Accuracy (%) |
| Hoogs and Collins [5] | UC Berkeley | Semantic ontology using WordNet | – | 84 |
| Liu et al. [6] | Natural images | Ratio contour algorithm | – | F-score 75 |
| Kanade et al. [9] | LIDAR | Three-dimensional sensor used to directly estimate the distance to the surface | SVM | 78.2 and 74.6 |
| Xie and Tu [14] | BSD500, NYU Depth | Holistically nested edge detection | – | F-score |
| Winder et al. [12] | MIAS, INBreast, and BCDR | Active contour model and Canny edge detection | Convolutional neural net | 98.4, 94.3, and 95.6 |
| Ju et al. [8] | LabelMe | Adaptive detection tracking filter | MLP | 84.63 |
LIDAR generated a series of range measurements in polar coordinates at set angular increments. The true location of an object's border is likely to reside anywhere between the measured rays, since the border can be found anywhere in this continuum. The goal of the classification challenge was to check the two successive range measurements between the objects. In 2015, Xie and Tu [14] presented a novel approach by examining related neural-network-based techniques, focusing on multiscale and multilevel feature extraction; the challenge of identifying edge and object boundary information is inherently difficult. In 2017, Winder et al. [12] presented a method to generate pixel-level scores without using pixel labels in the training phase. Both of these methods share the characteristic of predicting pixel scores over an image. In 2020, Zakariah and AlShalfan [13] presented a model using VLSI technology to accurately find corner points of an object, which is a difficult task in pattern recognition and 3D image reconstruction. Several corner detection algorithms have been developed over time for computer vision applications such as object recognition, motion tracing, stereo matching, image registration, and camera calibration. In 2020, Ju et al. [8] presented an approach for an object boundary detection system, established on the ability to distinguish between several activity patterns of the brain, each of which is associated with a certain mental task.
In 2021, Yao and Wang [1] proposed a method based on regional and boundary information along with an attentional feature enhancement module for object detection. The results outperformed existing techniques on several datasets.
5 Conclusion We have studied several innovative methods for object detection and collected information on object boundary detection approaches that use image processing techniques. An overall framework is presented for incorporating ontological information into boundary and edge detection. The associated results have been elaborated in detail along with proposed further work. The results reveal that semantic distance information is supplementary to the visual distance. In this study, we examined several supervised learning algorithms that use boundary information for boundary and edge detection and developed an understanding of how these methods actually work.
6 Future Scope Individuals are endlessly inventing new objects; therefore, new classes could be added to existing systems to detect their boundaries and edges. Future work can be extended to the medical field, where systems could detect micro-thin layers in lung images and easily distinguish abnormal tissues.
References 1. Yao, Z., & Wang, L. (2021). ERBANet: Enhancing region and boundary awareness for salient object detection. Neurocomputing, 448, 152–167. 2. Sun, Y., & Fisher, R. (2003). Object-based visual attention for computer vision. Artificial intelligence, 146(1), 77–123. 3. Westenberg, M. A., et al. (2004). Contour and boundary detection improved by surround suppression of texture edges. Image and vision computing, 22(8), 609–622. 4. Adelson, E. H., et al. (2014). Crisp boundary detection using pointwise mutual information. In ECCV 2014: Computer Vision—ECCV 2014 (Vol. 8691, pp. 799–814). 5. Hoogs, A., & Collins, R. (2006). Object boundary detection in images using a semantic ontology. In Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06). IEEE. 6. Liu, T., et al. (2006). Evaluating edge detection through boundary detection. EURASIP Journal on Advances in Signal Processing, 1–15. 7. Torresani, L., et al. (2015). High-for-low and low-for-high: Efficient boundary detection from deep object features and its applications to high-level vision. In IEEE International Conference on Computer Vision (pp. 504–512).
8. Ju, Z., et al. (2020). A novel approach to shadow boundary detection based on an adaptive direction-tracking filter for brain-machine interface applications. Applied Sciences, 10(19), 6761. 9. Kanade, T., et al. (2010). Boundary detection based on supervised learning. In IEEE International Conference on Robotics and Automation (pp. 3939–3945). IEEE. 10. Papari, G., & Petkov, N. (2011). An improved model for surround suppression by steerable filters and multilevel inhibition with application to contour detection. Pattern Recognition, 44(9), 1999–2007. 11. Fraser, C. S., et al. (2012). Performance comparisons of contour-based corner detectors. IEEE Transactions on Image Processing, 21(9), 4167–4179. 12. Winder, J., et al. (2017). Fully automated breast boundary and pectoral muscle segmentation in mammograms. Artificial Intelligence in Medicine, 79, 28–41. 13. Zakariah, M., & AlShalfan, K. (2020). Image boundary, corner, and edge detection: Past, present, and future. International Journal of Computer Electrical Engineering, 12(2), 39–57. 14. Xie S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1395–1403). IEEE.
Question Classification Based on Cognitive Skills of Bloom’s Taxonomy Using TFPOS-IDF and GloVe Rahil N. Modi, Kavya P. K., Roshni Poddar, and S. Natarajan
Abstract Question classification has been a widely researched domain in Natural Language Processing. In the context of learning, it is the task of associating each question with its corresponding skill. Bloom's Taxonomy is a widely accepted set of hierarchical models used to classify educational content into different levels of complexity and has been used as a guideline in designing questions of various cognitive levels. The cognitive domain list has been the primary focus of most traditional education and is frequently used to structure curricula and assessments. Knowledge, Remember, and Apply are the most common classes used for classifying questions. This paper compares different approaches that encode questions into embeddings and classify them using several machine learning models. The feature extraction techniques used include combinations of TF-IDF, POS, and GloVe. The classification models include seven machine learning approaches: K-Nearest Neighbor, Logistic Regression, Support Vector Machines, Random Forest, AdaBoost, Gradient Boost, and XGBoost classifiers. These models were then evaluated with three different feature extraction approaches to infer the best combination. The results of this paper conclude that the novel approach combining TFPOS-IDF and GloVe outperforms the other approaches and efficiently classifies questions. Keywords Question classification · Bloom's taxonomy · Cognitive skills · TF-IDF · GloVe · POS · TFPOS-IDF
R. N. Modi (B) · Kavya P.K. · R. Poddar · S. Natarajan
PES University, Bengaluru, Karnataka, India
e-mail: [email protected]
URL: https://www.pes.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_3

1 Introduction Assessing students in a comprehensive, effective manner is an important task. Bloom's taxonomy was developed to improve communication between educators on the design of curricula and to create standard assessments that include all types of
questions. The six cognitive skills under Bloom's taxonomy are Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. This paper focuses on the lower-order skills and describes the process of building a model that classifies questions into one of the three types. The model is then applied to build a novel database of questions for high school students. It is necessary to have an appropriate balance of questions in terms of cognitive skills in examinations. To facilitate this, we associate each question with its corresponding skill by means of question classification. Most of the research in this area involves domain-specific constraints. We present a model that classifies questions based on the cognitive skills of Bloom's taxonomy without any domain-specific constraints. The dataset used is subject agnostic, creating a generalized classifier. Questions are short and generally do not contain a lot of information. Thus, extracting features that can classify these questions accurately is challenging. We demonstrate a detailed comparison of three different feature extraction techniques: a standard TF-IDF approach, TF-IDF + GloVe, and a novel TFPOS-IDF + GloVe. This is detailed in the next sections of the paper.
2 Literature Review Automatic Feedback Generation and Text Classification have received significant attention due to the recent advances in Natural Language Processing. Specifically, the classification of questions based on Bloom's taxonomy for education has been researched. This section details relevant and related work of previous authors. Patil et al. classify questions based on the revised Bloom's Taxonomy introduced in 2001 by Lorin Anderson and David Krathwohl [1]. This paper classifies questions using machine learning classifiers such as K-Nearest Neighbor (KNN) and Support Vector Machines (SVMs) [1]. It does not describe the use of any specialized feature selection methods and is a highly domain-specific approach. SVM outperforms KNN but does not give an overall stable result on the chosen test set. Waheed et al. developed a transformer-based model named BloomNet [2] that explicitly encapsulates linguistic information along with semantic information to classify course learning outcomes, thereby improving the model's distributed performance as well as its generalization capability. Ghalib et al. also addressed question classification with labels covering the cognitive skills of Bloom's taxonomy [3]. This paper uses multiple machine learning algorithms such as Naive Bayes (NB) and KNN, with and without feature selection methods including chi-square, mutual information, and odds ratio, to classify questions on the basis of the six cognitive skills. The best result achieved was the macro F1-measure of the KNN classifier with mutual information feature selection [3]. This indicates that applying these feature selection methods positively impacts question classification and results in good performance. Alammary et al. introduced a new feature extraction method, a modified TF-IDF, that better suits Arabic questions by applying higher weights to verbs
and interrogative words, and thereby classifies them appropriately [4]. The proposed method performed best when combined with the NB algorithm, but the difference in performance between NB, SVM, and logistic regression was not found to be statistically significant [4]. The three feature extraction methods discussed do not differ in terms of complexity, and the proposed method outperforms the traditional TF-IDF method. Classification is usually highly dependent on the domain from which questions are taken, but Mohammed et al. present a classification model to classify exam questions that belong to several areas [5]. The paper decouples the domain from the classification model by generalizing it. Questions are classified by extracting features using TFPOS-IDF and word2vec. These features are further used as input to different machine learning classification models. The results obtained from the SVM outperformed all other classical models.
3 Methodology Question classification models take the basic questions and their labels as input, extract features, feed them to a classification model, and get a trained model which can classify questions of different domains to the appropriate cognitive skill. Figure 1 depicts the proposed methodology for question classification which involves five stages.
Fig. 1 Proposed methodology for question classification
3.1 Question Dataset This paper aims to classify questions based on the first three cognitive skills of Bloom's taxonomy: Knowledge, Remember, and Apply [6]. Two open-ended datasets are used to obtain the training data. This data contains questions that are spread across multiple domains, which brings a sense of generality to the model. The first dataset [7] contains questions collected from multiple websites, books, and articles, amounting to a total of 141 questions from 6 different cognitive skills. Since the scope of this paper is limited to the first three cognitive skills, only 64 questions out of 141 are taken for training. The second dataset was introduced by Yahya et al. (2012) [8]
which is used as a standard dataset for question classification based on cognitive skills.
3.2 Natural Language Processing Natural Language Processing involves four sub-phases as part of preprocessing: Tokenization, Normalization, Parts of Speech (POS) Tagging, and Lemmatization.
3.3 Feature Extraction In this paper, we have combined multiple feature extraction techniques to create an aggregated feature vector to increase the performance of the classifier.
3.4 Classification The next step involves choosing a supervised machine learning classifier to train a model which can be used to predict cognitive skills for any new question.
3.5 Evaluation The data was split into three sections: train, validate, and test data. After each epoch, the validation data was used for fine-tuning and preventing models from underfitting. Overfitting was controlled by restricting the number of epochs to an appropriate number. The metrics used to evaluate the model include accuracy, precision, recall, and F1-score.
4 Implementation The first phase is the question dataset, which combines questions from two standard datasets [7, 8]. It has a total of 364 questions from the three lower cognitive skills. The next phase is natural language processing, which involves processing these 364 questions and extracting features from them. It involves four steps: Tokenization, Normalization, POS-Tagging, and Lemmatization using spaCy.
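The preprocessing steps above can be sketched with spaCy as follows; this is an illustrative snippet rather than the authors' exact pipeline, and the lower-casing used for normalization is an assumption.

```python
import spacy

# Load a small English pipeline (assumed model name); it provides the
# tokenizer, POS tagger, and lemmatizer used below.
nlp = spacy.load("en_core_web_sm")

def preprocess(question: str):
    """Tokenize, normalize, POS-tag, and lemmatize a single question."""
    doc = nlp(question)
    processed = []
    for token in doc:
        if token.is_punct or token.is_space:
            continue  # normalization: drop punctuation and whitespace tokens
        processed.append((token.lemma_.lower(), token.pos_))
    return processed

print(preprocess("Explain how photosynthesis converts light into energy."))
# e.g. [('explain', 'VERB'), ('how', 'SCONJ'), ('photosynthesis', 'NOUN'), ...]
```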
The third phase is the feature extraction phase that compares three approaches and concludes that combining feature extraction techniques outperforms other approaches in domain-independent classification.
4.1 TF-IDF Approach In this approach, Term Frequency (TF) and Inverse Document Frequency (IDF) are used to obtain a value for each word in the question. This method first fixes a corpus in which all documents are separately distinguished. We have a corpus with 364 question documents. The TF-IDF of each word in each document is calculated with the following formula:

$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t), \qquad (1)$$

where $\mathrm{TF\text{-}IDF}(t, d)$ represents the TF-IDF embedding of word $t$ in document (question) $d$, $\mathrm{TF}(t, d)$ represents the TF of term $t$ in document $d$, and $\mathrm{IDF}(t)$ represents the IDF of term $t$ in the corpus. The two arguments $t$ and $d$ in TF-IDF indicate that this feature extraction approach is highly dependent on the dataset taken for training.
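As a minimal illustration of Eq. (1), the snippet below computes TF-IDF features for a toy corpus with scikit-learn; the example questions and the vectorizer's default settings are assumptions made for demonstration, not the paper's exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# A toy corpus standing in for the 364 preprocessed questions.
questions = [
    "define the term photosynthesis",
    "list the planets of the solar system",
    "apply newton's second law to this problem",
]

# TfidfVectorizer computes TF(t, d) * IDF(t) for every term in every document.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(questions)

print(tfidf_matrix.shape)                      # (n_questions, vocabulary_size)
print(vectorizer.get_feature_names_out()[:5])  # first few vocabulary terms
```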
4.2 TF-IDF + GloVe Approach The second approach combines TF-IDF with GloVe embeddings. We find the TF-IDF and GloVe embeddings (of dimension 100) for all the words in a question. The following computation is performed to get the final TF-IDF + GloVe embedding:

$$(\mathrm{TF\text{-}IDF{+}GloVe})(d) = \sum_{i=1}^{n} [\mathrm{GloVe}(t_i)]_{100 \times 1} \cdot [\mathrm{TF\text{-}IDF}(t_i, d)]_{1 \times 1}, \qquad (2)$$

where $(\mathrm{TF\text{-}IDF{+}GloVe})(d)$ is the TF-IDF with GloVe embedding for document (question) $d$, $\mathrm{GloVe}(t_i)$ is the GloVe embedding of the $i$-th term (word) in the question, and $\mathrm{TF\text{-}IDF}(t_i, d)$ is the TF-IDF of the $i$-th term in document $d$. The final $(\mathrm{TF\text{-}IDF{+}GloVe})(d)$ is a vector of dimension $100 \times 1$ and represents the whole question as a single vector. By adding the GloVe vector, we generalize the embedding to multiple domains and decouple it from our current dataset. However, this approach treats all words with the same priority, whereas in a question a few words carry higher priority for prediction.
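The sketch below illustrates Eq. (2): each word's 100-dimensional GloVe vector is scaled by its TF-IDF weight and the results are summed into one question vector. The tiny `glove` dictionary and `tfidf` weights are made-up stand-ins for a real embedding file and a fitted TF-IDF model.

```python
import numpy as np

EMBED_DIM = 100

# Hypothetical lookups: in practice GloVe vectors come from a pretrained file
# (e.g. glove.6B.100d.txt) and TF-IDF weights from a fitted vectorizer.
glove = {
    "define": np.random.rand(EMBED_DIM),
    "photosynthesis": np.random.rand(EMBED_DIM),
}
tfidf = {"define": 0.42, "photosynthesis": 0.91}

def question_embedding(tokens):
    """Sum of GloVe(t) * TF-IDF(t, d) over the words of one question (Eq. 2)."""
    vec = np.zeros(EMBED_DIM)
    for t in tokens:
        if t in glove and t in tfidf:   # skip out-of-vocabulary words
            vec += glove[t] * tfidf[t]
    return vec

print(question_embedding(["define", "photosynthesis"]).shape)  # (100,)
```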
4.3 TFPOS-IDF + GloVe Approach This approach uses the part of speech (POS) to weigh the words in the question. For our cognitive skill classification, verbs contribute the most, followed by nouns and adjectives. Hence, the following weights are used while finding TF:

$$w_{pos}(t) = \begin{cases} w_1 & \text{if } t \text{ is a verb} \\ w_2 & \text{if } t \text{ is a noun or adjective} \\ w_3 & \text{otherwise,} \end{cases} \qquad (3)$$

where $w_{pos}(t)$ is the weight assigned based on POS, and $w_1 = 5$, $w_2 = 3$, and $w_3 = 1$ are chosen so that verbs, nouns, and adjectives have more weightage in classification than other parts of speech. This weighting significantly affects the classification. Based on this weighting, $\mathrm{TFPOS}(t, d)$, $\mathrm{TFPOS\text{-}IDF}(t, d)$, and $(\mathrm{TFPOS\text{-}IDF{+}GloVe})(d)$ are calculated as follows:

$$\mathrm{TFPOS}(t, d) = \frac{c(t, d) \cdot w_{pos}(t)}{\sum_{i} c(t_i, d) \cdot w_{pos}(t_i)}, \qquad (4)$$

$$\mathrm{TFPOS\text{-}IDF}(t, d) = \mathrm{TFPOS}(t, d) \cdot \mathrm{IDF}(t), \qquad (5)$$

$$(\mathrm{TFPOS\text{-}IDF{+}GloVe})(d) = \sum_{i=1}^{n} [\mathrm{GloVe}(t_i)]_{100 \times 1} \cdot [\mathrm{TFPOS\text{-}IDF}(t_i, d)]_{1 \times 1}. \qquad (6)$$

In the above three equations, $\mathrm{TFPOS}(t, d)$ represents the POS-weighted TF for term $t$ in document $d$, $\mathrm{TFPOS\text{-}IDF}(t, d)$ represents the weighted TF-IDF for term $t$ in document $d$, and $(\mathrm{TFPOS\text{-}IDF{+}GloVe})(d)$ is a vector of dimension $100 \times 1$ that combines the weighted TF-IDF with GloVe. It is domain-independent due to GloVe and also weights the important terms. Thus, approach 3 is better than the other two approaches. The three approaches show different ways of extracting features from the processed data, and each one addresses the limitations of the others. The labels need to be encoded, either using One-Hot Encoding or Label Encoding, so that they can be used when computing the evaluation metrics of the classification models. The encoding used here is Label Encoding, which encodes the labels "Apply" as 0, "Knowledge" as 1, and "Remember" as 2. Label encoding lets the classification model deal with numbers, which is far easier than dealing with words or strings. This is done as part of the feature extraction step. The next step is classification, where different models are used to classify the questions. Initially, we split the set of questions into train, validation, and test sets. The feature matrix has one vector of dimension 100 for each of the 364 questions, giving an overall matrix of dimension 364 × 100. This is split into 75% train and 25% test.
The 75% train portion is further split into 90% train and 10% validation. The train split is given to the model along with the labels, and the validation split helps the model validate its training at the end of each epoch. Finally, after all the epochs are completed, the training process ends and the model is ready to test. The model is then given the test questions and asked to predict their labels (cognitive skills). These predicted labels are compared with the actual test labels to find the accuracy, precision, recall, and F1-score of the model. A few models use only the train (75%) and test (25%) splits, while others use all three: train, validation, and test. The models used include KNN, Logistic Regression, SVM, Random Forest, AdaBoost, Gradient Boost Classifier, and XGBoost Classifier.
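A compact sketch of this stage, under the assumption of spaCy POS tags, pretrained 100-d GloVe vectors, and scikit-learn utilities, might look as follows; the weight values mirror Eq. (3), but the helper names and split seed are illustrative, not the authors' code.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

POS_WEIGHTS = {"VERB": 5, "NOUN": 3, "ADJ": 3}   # w1, w2; everything else gets w3 = 1

def tfpos(tokens):
    """Eq. (4): POS-weighted term frequency for one question.
    `tokens` is a list of (lemma, pos) pairs from the preprocessing step."""
    weighted = Counter()
    for lemma, pos in tokens:
        weighted[lemma] += POS_WEIGHTS.get(pos, 1)
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}

def question_vector(tokens, idf, glove, dim=100):
    """Eqs. (5)-(6): sum over words of GloVe(t) * TFPOS(t, d) * IDF(t)."""
    vec = np.zeros(dim)
    for t, tf in tfpos(tokens).items():
        if t in glove and t in idf:
            vec += glove[t] * tf * idf[t]
    return vec

# X: (364, 100) matrix of question vectors, y: label-encoded skills (0/1/2).
# 75/25 train-test split, then 90/10 train-validation split, as described above.
def make_splits(X, y, seed=42):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.10, random_state=seed)
    return X_tr, X_val, X_te, y_tr, y_val, y_te
```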
5 Results The evaluation of the classification models involves computing the following metrics: weighted precision, weighted recall, and weighted F1-score.

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (7)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad (8)$$

$$\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}. \qquad (9)$$

The equations for the weighted metrics are

$$\mathrm{Weighted\ Precision} = \frac{\sum_{i=1}^{n} \mathrm{precision}_i \cdot \mathrm{support}_i}{\sum_{i=1}^{n} \mathrm{support}_i}, \qquad (10)$$

$$\mathrm{Weighted\ Recall} = \frac{\sum_{i=1}^{n} \mathrm{recall}_i \cdot \mathrm{support}_i}{\sum_{i=1}^{n} \mathrm{support}_i}, \qquad (11)$$

$$\mathrm{Weighted\ F1\text{-}score} = \frac{\sum_{i=1}^{n} \mathrm{F1\text{-}score}_i \cdot \mathrm{support}_i}{\sum_{i=1}^{n} \mathrm{support}_i}. \qquad (12)$$
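In practice, Eqs. (10)-(12) correspond to scikit-learn's support-weighted averaging, as the hedged snippet below shows; the toy label arrays are only for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy ground-truth and predicted labels (0 = Apply, 1 = Knowledge, 2 = Remember).
y_true = [0, 1, 2, 2, 1, 0, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2]

# average="weighted" weights each class's metric by its support, i.e. Eqs. (10)-(12).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"weighted precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```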
5.1 Results of K-Nearest Neighbor (KNN, where K=3) Figure 2 shows the weighted recall, precision, and F1-score for the different feature extraction approaches when fed into the KNN classification model with K=3. The TFPOS-IDF + GloVe approach has the best results among the three: the scores increase from the TF-IDF approach to the TF-IDF + GloVe approach and are highest for the TFPOS-IDF + GloVe approach.
5.2 Results of Logistic Regression The results of logistic regression can be seen in Fig. 3. These results are not better than those obtained using KNN but follow the same pattern: the TF-IDF approach has the lowest efficiency, TF-IDF + GloVe has moderate efficiency, and the TFPOS-IDF + GloVe approach has the highest efficiency. This indicates that the proposed feature extraction approach (TFPOS-IDF + GloVe) improves efficiency across all types of models in comparison to the standard TF-IDF approach.
5.3 Results of Random Forest Classifier The random forest classifier is a bagging technique that uses randomized full-strength decision trees to generate predictive results. Its results are better than those of standard KNN and logistic regression. It also favors our TF-IDF + GloVe and TFPOS-IDF + GloVe approaches, as there is a large increase in the metrics from one approach to the next. The results can be seen in Fig. 4.
Fig. 2 Results of KNN based on three feature extraction approaches
Fig. 3 Results of Logistic Regression based on three feature extraction approaches
Fig. 4 Results of Random Forest Classifier based on three feature extraction approaches
5.4 Results of AdaBoost Classifier AdaBoost is one of the basic boosting techniques, in which decision stumps are used as weak learners. Multiple stumps are tuned to get appropriate results. Its results are very similar to those of the random forest, with precision a little higher than the latter, while recall and F1-score are lower. It also shows a marginal increase across all three approaches, from TF-IDF to TF-IDF + GloVe and finally to TFPOS-IDF + GloVe. The results can be seen in Fig. 5.
5.5 Results of Gradient Boosting Classifier The gradient boosting classifier is a boosting method that uses gradient descent to minimize the loss function during classification. It uses decision stumps as weak learners. The results, as seen in Fig. 6, are better than those of random forest and AdaBoost, since gradient descent helps minimize the loss. Each approach improves the metrics by almost 0.1, from TF-IDF to TF-IDF + GloVe to TFPOS-IDF + GloVe. This indicates the importance of the feature extraction approaches and their ability to yield higher efficiency with all classification models.
Fig. 5 Results of AdaBoost Classifier based on three feature extraction approaches
Fig. 6 Results of Gradient Boost Classifier based on three feature extraction approaches
5.6 Results of XGBoost Classifier XGBoost stands for eXtreme Gradient Boosting. It is a gradient boosting classifier with more accurate approximations for finding the best tree model. In our case, however, XGBoost's approximations do not help: the results show that the efficiency is not greater when TF-IDF + GloVe is weighted using POS, that is, in the TFPOS-IDF + GloVe approach. In the second approach (TF-IDF + GloVe), its approximations worked well and gave results that were better than Gradient Boost. Thus, this model does not benefit from the weighted feature extraction. The results can be seen in Fig. 7.
5.7 Results of Support Vector Machine (SVM) SVM is one of the best and most commonly used text classification models. Its main idea is to find a hyperplane that separates the different classes while maximizing the margin of the hyperplane, so that the classification is efficient. It uses a kernel to generalize and make classification possible in higher dimensions. The Radial Basis Function (RBF) kernel is used here to make computation possible for infinite dimensions. The results obtained using this kernel outperformed all other classification models. Figure 8 shows that the results of SVM obtained using the TF-IDF approach were the lowest among all models. But when we shifted to TF-IDF + GloVe, the results increased drastically by a margin of 0.15, outperforming many models. Finally, when the novel TFPOS-IDF + GloVe approach was used, the results increased by a further margin of 0.1 over the second approach and outperformed all other classification models. This indicates the importance of the feature extraction techniques and how they increase the efficiency of the models.
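A minimal sketch of the SVM setup described here, assuming scikit-learn, is shown below; the random stand-in data plays the role of the TFPOS-IDF + GloVe features, and hyperparameters beyond the RBF kernel are left at library defaults since the paper does not state them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support

# Stand-in data: 364 question vectors of dimension 100 with labels 0/1/2,
# playing the role of the TFPOS-IDF + GloVe features and encoded skills.
rng = np.random.default_rng(0)
X = rng.normal(size=(364, 100))
y = rng.integers(0, 3, size=364)
X_tr, X_te, y_tr, y_te = X[:273], X[273:], y[:273], y[273:]

svm = SVC(kernel="rbf")          # RBF kernel, as described in Sect. 5.7
svm.fit(X_tr, y_tr)

y_pred = svm.predict(X_te)
p, r, f1, _ = precision_recall_fscore_support(
    y_te, y_pred, average="weighted", zero_division=0
)
print(f"SVM (RBF): precision={p:.3f}, recall={r:.3f}, f1={f1:.3f}")
```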
Fig. 7 Results of XGBoost Classifier based on three feature extraction approaches
Fig. 8 Results of SVM based on three feature extraction approaches
Fig. 9 Confusion Matrix and Distribution of labels using SVM Model on all three feature extraction approaches
5.8 Comparison of Different Approaches Using the SVM Model Figure 9 depicts the confusion matrix and the distribution of labels predicted by the SVM model for the three feature extraction approaches. Using the TF-IDF approach, there were 46 misclassifications out of 91 test samples; misclassifications (false positives and false negatives) exceeded correct classifications in this case, and the distribution was highly skewed towards Knowledge. The second approach (TF-IDF + GloVe) fared much better than TF-IDF, with 31 misclassifications out of 91 test samples. Adding GloVe embeddings to TF-IDF adjusted a few parameters and made the classification better; the distribution was also more stable than with TF-IDF, but not accurate enough. The TFPOS-IDF + GloVe approach outperformed the other two approaches with a minimum of 21 misclassifications out of 91 test samples, and its distribution was fairly stable in comparison to the other two. The weights assigned based on POS were effective in reducing misclassifications. In this way, the novel TFPOS-IDF + GloVe approach performed better on a generalized dataset without any specific domain.
5.9 Overall Results In terms of feature extraction, the proposed TFPOS-IDF + GloVe approach outperformed the other approaches. It is also important to know which classification model performs best with this approach. From Fig. 10, it is clear that SVM with the RBF kernel outperforms all the other classification models considered in this paper when using the TFPOS-IDF + GloVe feature extraction technique: it has the highest recall, precision, and F1-score. Figure 11 depicts the F1-scores of the different models and compares them across the feature extraction techniques, along with Precision vs Recall and ROC (Receiver Operating Characteristic) curves. Each model's improvement across the different approaches can be compared using these line plots, which are useful for identifying the marginal increase in metrics between feature extraction approaches for each model.
Fig. 10 Results comparing different models using the TFPOS-IDF + GloVe approach
Fig. 11 Line graph denoting weighted F1-score values of different models following the three feature extraction approaches, along with Precision vs Recall and ROC curves
6 Future Work and Conclusion This paper generalizes the model by combining two domain-agnostic datasets. It uses the lower three cognitive skills for classification. The final approach combines GloVe and TFPOS-IDF for feature extraction, followed by an SVM classifier. This approach showed a large increase in efficiency compared to the previous approaches. Future work involves the inclusion of all six cognitive skills for classification [5], performance improvement of the SVM using kernel optimization [9, 10], and support for other languages [4].
References 1. Patil, S. K., & Shreyas, M. M. (2017). A comparative study of Question Bank classification based on revised Bloom’s taxonomy using SVM and K-NN. In 2nd International Conference On Emerging Computation and Information Technologies (ICECIT) (pp. 1–7). https://doi.org/ 10.1109/icecit.2017.8453305 2. Waheed, A., Goyal, M., Mittal, N., Gupta, D., Khanna, A., & Sharma, M. (2021). BloomNet: A robust transformer based model for Bloom’s learning outcome classification. arXiv preprint 2108. https://doi.org/10.48550/arXiv.2108.07249 3. Ghalib, N., & Hammad, D. S. (2020). Classifying exam questions based on Bloom’s taxonomy using machine learning approach. In Technologies for the Development of Information Systems
4. Alammary, A. S. (2021). Arabic questions classification using modified TF-IDF. IEEE Access, 9, 95109–95122. https://doi.org/10.1109/access.2021.3094115 5. Mohammed, M., & Omar, N. (2020). Question classification based on Bloom’s taxonomy cognitive domain using modified TF-IDF and word2vec. PLOS ONE. https://doi.org/10.1371/ journal.pone.0230442 6. Zhang, J., Wong, C., Giacaman, N., & Luxton-Reilly, A. (2021). Automated classification of computing education questions using Bloom’s taxonomy. In Australasian Computing Education Conference (pp. 58–65). https://doi.org/10.1145/3441636.3442305 7. Haris, S. S., & Omar N. (2015). Bloom’s taxonomy question categorization using rules and N-gram approach. Journal of Theoretical & Applied Information Technology, 76, 401–407 8. Yahya, A. A., Osman, A., & Alattab, A. A. (2013). Educational data mining: A case study of teacher’s classroom questions. In 13th International Conference on Intellient Systems Design and Applications (pp. 92–97). https://doi.org/10.1109/isda.2013.6920714 9. Gupta, U., & Gupta, D. (2021). Kernel-target alignment based fuzzy lagrangian twin bounded support vector machine. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 29, 677–707. https://doi.org/10.1142/s021848852150029x 10. Gupta, U., & Gupta, D. (2021). Regularized based implicit lagrangian twin extreme learning machine in primal for Pattern Classification. International Journal of Machine Learning and Cybernetics, 12, 1311–1342. https://doi.org/10.1007/s13042-020-01235-y
Sign2Sign: A Novel Approach Towards Real-Time ASL to ISL Translation Sudhanva Rajesh, Ashwath Krishnan, and S. Natarajan
Abstract Sign language is the main form of communication for people with speech and hearing impairments. There are over 300 types of sign languages across the world, with American Sign Language (ASL) and Indian Sign Language (ISL) most popularly used in the United States of America and India, respectively. When a person familiar with ASL has to communicate with a person familiar with ISL, there is bound to be a communication gap, which calls for skilled translators. We aim to automate this process by proposing a novel solution that can translate real-time ASL input into ISL. The proposed methodology uses a CNN trained to recognize 36 different classes of ASL signs. The recognized signs are mapped to the respective ISL signs and joined together to form a video. The CNN model for recognizing ASL achieved a testing accuracy of 96.43%. Keywords Indian Sign Language · American Sign Language · Sign language translation · Sign language generation
S. Rajesh (B) · A. Krishnan · S. Natarajan
Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
e-mail: [email protected]
S. Natarajan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_4

1 Introduction In the hearing and speech disabled community, sign language is the major means of communication. There has always been a communication gap between this section of society and the rest of the world, with the impaired often relying on translators to act as intermediaries while communicating. There are over 300 different forms of sign language used around the world. This is due to the fact that sign languages evolved naturally among people of many ethnic backgrounds. Indian Sign Language (ISL) is most commonly
used in India. Different dialects of ISL exist, but efforts have been made in the recent past to standardize ISL. In the United States of America, the most extensively used sign language is American Sign Language (ASL). It is a known fact that the ratio of the number of translators to the number of hearing and speech impaired people is very low. This adds to the isolation of such people, as they face difficulty in communication, which has a major impact on their day-to-day life. Furthermore, enabling communication between users of ASL and ISL would require the presence of multiple skilled translators, which is often not possible. The difference between the ASL and ISL signs for the same letter can be visualized in Fig. 1.
Fig. 1 ASL and ISL Hand Gestures
This indicates the need for an automated system that can translate signs from one form to another. In this paper, we propose a solution where real-time input of ASL is converted to ISL. The proposed methodology aims to detect and classify signs in ASL and provide a video output of the respective signs in ISL. The system was trained to identify 36 classes, which include 26 alphabets and 10 numbers. The detected signs are then put together as text and separated into words. The corresponding signs for the words in ISL are joined into a video, which is given as the output. Words not having a sign representation in ISL are fingerspelt. For joining the various ISL videos, MoviePy has been used. MoviePy is a video-editing Python package that is widely used for basic video operations (cuts,
concatenations, title insertions), video compositing, and video processing. The library supports reading and writing videos in the most popular video formats. Input images are preprocessed and passed into the model for prediction. We perform histogram segmentation to detect the hand in the ROI, along with various other preprocessing steps.
2 Related Work This section discusses the relevant background work on Sign language recognition that has been done. Revanth et al. (2019) [1] proposed a technique for developing a sign language recognition system where the signs presented by a voice impaired person are converted to text. The authors propose using a Support Vector Machine (SVM) for classifying the presented signs. The paper describes how gesture recognition of human hands is a very tedious task due to the presence of multiple joints in the human skeletal structure. The skeletal structure of a human hand has 27 degrees of freedom. The proposed methodology takes the input frame and uses only the region of interest (ROI). The authors follow a number of preprocessing tasks on this ROI. The ROI first undergoes skin masking, after which Canny Edge Detection is performed. Feature detection on this processed ROI is done by Oriented Fast and Rotated Brief (ORB), which is an efficient and viable alternative to SIFT and SURF. ORB is not only more computationally efficient but also faster than SIFT. Key Points in the ROI were detected using FAST keypoint detectors, and the Harris corner measure was used to get the prominent key points. Finally, the BRIEF descriptor was used to provide descriptors to these key points. The suggested approach received a 0.90 F1-score. Hatibaruah et al. (2020) [2] proposed the use of convolutional neural networks to recognize ISL signs. An ISL database with 26 alphabets and 10 numbers was used to train the model. For picture segmentation, the approach employs the histogram back projection technique. The model was trained for 5 epochs and obtained 99.89 per cent testing accuracy and 99.85 per cent validation accuracy. The system was tested with data fed in real time. Once the input is fed in, the histogram is calculated and saved. The images are converted from RGB to HSV to binary. Morphological and noise removal filters were added as well. The best results were observed with a light background with gloves on. Yadav et al. (2015) [3] used a unique method for feature extraction. The RGB components of each image corresponding to the signs that have to be stored are extracted. The authors then proceed to apply cosine, wavelet, and Haar Transforms on every individual plane of the signs. The feature vectors of the factional energy coefficients were then prepared. A database containing the feature vectors of the training images was created. The methodology uses a KNN classifier which was trained on this database. By varying the value of ‘k’-nearest neighbors, different models were built and the performance of these models was evaluated using different
similarity measures such as Cosine Similarity, Euclidean distance, Correlation, and City Block Metric. Different rules were employed to select the optimal KNN model for classifying the data, such as nearest, random, and consensus. With 1.5625 per cent fractional energy, the Hybrid Wavelet Transform outperforms the Haar and Cosine Transforms, according to the study. Gupta et al. [4] suggested a method that begins by distinguishing between movements that make use of both hands and those that make use of only a single hand. This task is accomplished by employing an algorithm that classifies images based on HOG characteristics using an SVM classifier. The SIFT and HOG features are combined, and classification of the signs is then done using a KNN classifier. Rekha et al. [5] proposed an approach where the YCbCr skin color model is used to segment and detect the hand area of the input picture. For identifying hand postures, complexity defects methods, wavelet packet decomposition (WPD-2), and the Principal Curvature-Based Region (PCBR) detector are employed. Multi-class nonlinear SVMs were used to classify the recognized postures. The trajectory feature vector was used to classify dynamic motions using Dynamic Temporal Warping (DTW). Sharma et al. [6] analyzed three models: a hierarchical neural network, a pretrained VGG16 with transfer learning, and VGG16 with fine-tuning, trained on about 149,000 images covering all 26 alphabets of ISL. The hierarchical model outperformed the other two, with an accuracy of 97% for two-handed gestures and 96.52% for single-handed gestures. Shamrat et al. [7] proposed a CNN-based approach to identify numbers of Bangladeshi Sign Language. The dataset consists of 310 images, with 31 images for each class. Skin segmentation is carried out to separate the signs from the input image, and the D-LBP technique is used to fix low-resolution images. The proposed approach achieves an accuracy of 99.8%. Gupta et al. [8] suggested a hybrid approach to reduce the effects of noise and outliers by using TBSVM with the method of least squares and by assigning different fuzzy values. The proposed KTA-FLSTBSVM outperforms existing approaches such as LSTSVM and TBSVM in terms of classification accuracy and CPU training time.
3 Methodology The methodology can be split into six parts: (1) labeling and collection of the dataset, (2) creating the hand histogram, (3) preprocessing the input hand postures, (4) passing pre-processed images to the CNN, (5) predicting ASL signs, and (6) stringing together corresponding ISL signs to form meaningful sentences. The subsections that follow discuss each step in further depth. The method proposed in this paper can be visualized in Fig. 2.
Fig. 2 Workflow of proposed methodology
3.1 Labeling and Collection of the Dataset The dataset used to train the CNN model consisted of 10500 images. These images were collected and labelled manually and consist of 36 classes which include 26 alphabets belonging to the English lexicon and 10 numbers. These images underwent a number of preprocessing steps including converting the RGB image to the HSV colorspace, Gaussian and median blurring, and finally binary thresholding.
3.2 Creating the Hand Histogram In order for the classification model to recognize hands in real-time, image histograms are used. The main aim of this step is to differentiate the skin from its surroundings. The image histogram is a graph that depicts the distribution of pixel intensities in a
digital image. Therefore, hand postures are placed in a region of interest from which the pixel values are extracted and stored as threshold values which are later used to segment images.
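One common way to realize this step with OpenCV is to build an HSV histogram of the hand region and back-project it onto new frames; the sketch below is illustrative (the ROI coordinates and histogram bin counts are assumptions), not the authors' exact code.

```python
import cv2

def build_hand_histogram(frame, roi=(100, 100, 200, 200)):
    """Sample the hand placed inside the ROI and build an HSV color histogram."""
    x, y, w, h = roi
    hand = frame[y:y + h, x:x + w]
    hsv = cv2.cvtColor(hand, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [12, 15], [0, 180, 0, 256])
    return cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

def backproject(frame, hand_hist):
    """Highlight skin-colored pixels in a new frame using the stored histogram."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    return cv2.calcBackProject([hsv], [0, 1], hand_hist, [0, 180, 0, 256], 1)
```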
3.3 Preprocessing the Input Hand Postures The preprocessing of the images begins with extracting hand postures from the region of interest and converting them into the HSV colorspace. This is then followed by Gaussian blurring to remove noise from the surroundings. The blurred image is obtained by convolving with an 11 × 11 filter. Median blurring is then performed to remove salt-and-pepper noise, which is characterized by randomly occurring white and black pixels. Binary thresholding, a segmentation technique used for separating objects from their background, is performed on the blurred image. The threshold obtained from the hand histogram is compared with each pixel of the image. This divides the input image into two groups: the first group has pixel values less than the threshold, whose pixel values are set to 0, and the second group has pixel intensities greater than the threshold, whose pixel values are set to 255. Contours are then found on the thresholded image, and the contour with the largest area is passed on to the convolutional neural network. The ASL signs after preprocessing can be visualized in Fig. 3.
Fig. 3 ASL signs after preprocessing
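A hedged OpenCV sketch of this preprocessing chain is given below; thresholding the brightness channel with a fixed value is an assumption standing in for the histogram-derived threshold described above.

```python
import cv2

def preprocess_roi(roi_bgr, threshold=127):
    """HSV conversion, Gaussian + median blur, binary threshold, largest contour."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    blurred = cv2.GaussianBlur(hsv, (11, 11), 0)      # 11 x 11 Gaussian filter
    blurred = cv2.medianBlur(blurred, 5)               # remove salt-and-pepper noise
    value = blurred[:, :, 2]                            # brightness channel (assumed) for thresholding
    _, thresh = cv2.threshold(value, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea) if contours else None
    return thresh, largest                              # mask and hand contour for the CNN
```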
3.4 Passing Pre-processed Images to the CNN The preprocessed images from the previous step are passed into the CNN. The CNN model consists of three convolutional layers with a ReLu activation function, having 16, 32, and 64 filters each. A max-pooling layer follows each of the first two convolutional layers. A flattening layer, a dense layer with 128 units having the ReLu activation function, a dropout layer, and a final dense layer follow the convolutional and pooling layers. The optimizer used is the stochastic gradient descent optimizer with the learning rate set to 0.01. Figure 4 summarizes the CNN model architecture.
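Under the assumption of Keras and a 64 × 64 single-channel input (the paper does not state the input size), the described architecture might be sketched as follows; the kernel sizes and dropout rate are likewise illustrative assumptions.

```python
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 36  # 26 alphabets + 10 numbers

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                      # assumed input size
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                                   # assumed dropout rate
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Stochastic gradient descent with learning rate 0.01, as stated in the text.
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```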
3.5 Predicting ASL Signs When the images have been preprocessed, they are passed as the input to the classification model. The model predicts the hand postures frame by frame. In order to finalize the letter/word represented within the region of interest, the number of continuous frames for that particular hand posture is counted, and if it is above a threshold value, say 50 frames, we predict the hand posture with certainty and add the recognized posture to our list of hand postures. In order to separate words, if no
hand posture is shown in the region of interest for a given number of frames, a space is added between the preceding and following word/letter and the upcoming hand posture is considered to be a part of a new word. Hence, the real-time input of hand postures has now been converted into a list of postures to form a sentence.
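The frame-counting logic described in this subsection can be sketched as follows; the threshold of 50 frames comes from the text, while the gap threshold used to insert spaces is an assumed value.

```python
SIGN_FRAMES = 50     # consecutive frames required to accept a sign (from the text)
GAP_FRAMES = 40      # assumed number of empty frames that ends a word

def accumulate(predictions):
    """Turn per-frame predictions (None = no hand in ROI) into a sentence string."""
    sentence, last, run, gap = [], None, 0, 0
    for p in predictions:
        if p is None:
            gap += 1
            run, last = 0, None
            if gap == GAP_FRAMES and sentence and sentence[-1] != " ":
                sentence.append(" ")      # word boundary after a long enough pause
            continue
        gap = 0
        run = run + 1 if p == last else 1
        last = p
        if run == SIGN_FRAMES:            # accept the sign once, after enough frames
            sentence.append(p)
    return "".join(sentence).strip()
```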
3.6 Stringing Together Corresponding ISL Signs to Form Meaningful Videos In order to join the recognized hand postures into a meaningful sentence and create a translation of the same in ISL, Python's MoviePy library has been used. To achieve a translation in the form of a video, each word in the sentence is cross-checked against the vocabulary of ISL. If the provided hand pose in ASL is included in the ISL vocabulary, the video corresponding to the word is fetched and added to the list of ISL videos. If a word does not appear in the ISL lexicon, it is fingerspelled letter by letter and added to the list of ISL clips. The videos present in the list of ISL videos are concatenated together with the help of MoviePy to form a continuous stream of ISL hand postures, translated from ASL.
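A minimal sketch of this step with MoviePy is shown below; the directory layout and file-naming scheme for the ISL clips are assumptions made for illustration.

```python
import os
from moviepy.editor import VideoFileClip, concatenate_videoclips

ISL_DIR = "isl_clips"   # assumed folder: one video per known word or letter

def clip_for(token):
    return VideoFileClip(os.path.join(ISL_DIR, f"{token}.mp4"))

def translate_to_isl(sentence, output="translation.mp4"):
    """Concatenate ISL word clips; fingerspell words missing from the vocabulary."""
    clips = []
    for word in sentence.lower().split():
        if os.path.exists(os.path.join(ISL_DIR, f"{word}.mp4")):
            clips.append(clip_for(word))               # word exists in ISL vocabulary
        else:
            clips.extend(clip_for(ch) for ch in word)  # fingerspell letter by letter
    concatenate_videoclips(clips).write_videofile(output)
```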
Fig. 4 CNN model architecture
4 Results The convolutional neural network was trained on 36 classes, with a total of 10,500 images. The model achieved an average test accuracy of 96.43%. Real-time recognition of ASL signs can be visualized in Fig. 5. The accuracy on the training and validation sets is shown in Fig. 6. The recognized ASL signs are first converted into text and split into words. ISL signs corresponding to the words are appended to a list of ISL videos based on the presence of the ASL sign in the ISL vocabulary. If a word does not appear in the ISL lexicon, it is fingerspelled letter by letter and added to the list of ISL clips. MoviePy is then used to concatenate these individual ISL videos to form a meaningful translation from ASL to ISL. Figure 7 shows the average time taken to generate a video for ASL signs in real time, based on the length of the text corresponding to the sign. As can be seen from the graph, it takes longer to construct a video translation into ISL if a given word is not present in the vocabulary. Table 1 shows the Precision, Recall, and F1-score for each of the 36 classes that the model was trained to identify. The classes include the 26 alphabets and 10 numbers of the English lexicon. Table 2 compares the proposed workflow with other models.
Fig. 5 Real-time detection of ASL
Fig. 6 Accuracy on the training and validation sets per Epoch
Fig. 7 Average time taken to generate a video
5 Conclusions This paper proposed an approach for translation between two of the most popularly used sign languages in the world. ASL to ISL translation has been done in three steps. (1) Preprocessing of each input frame containing an ASL sign: preprocessing includes converting the frame to an HSV colorspace, Gaussian blurring and median blurring to remove noise, binary thresholding, and finding contours. (2) The preprocessed images are passed through the CNN, which recognizes the ASL sign. In order to finalize the predicted sign, the number of continuous frames for that particular hand posture is counted, and if it is above a threshold value, the sign is predicted with certainty and added to the recognized list of postures. (3) The recognized ASL signs are split into words, and the ISL signs for each of the words are joined together to form a video. If a word does not appear in the ISL lexicon, it is fingerspelled letter by letter and the corresponding ISL postures are added to the video. Thus, this paper presents an approach towards translating ASL signs into ISL signs, thereby automating the process of translation and removing the need for a human intermediary.
6 Future Directions The proposed method uses a CNN for recognizing ASL signs. This could be improved by using transfer learning with larger models, thereby increasing the accuracy of the model. The current model has been trained to recognize 36 different ASL signs; this could be extended to recognize a larger number of classes. In the proposed methodology, the ISL gestures are joined to form a video as the output; future work could include using an animation tool to generate animations for the output.
Table 1 Precision, Recall, and F1-score of the individual classes

| Class | Precision | Recall | F1-score |
| A | 1.00 | 0.96 | 0.98 |
| B | 1.00 | 1.00 | 1.00 |
| C | 0.90 | 0.95 | 0.92 |
| D | 0.98 | 0.99 | 0.99 |
| E | 0.98 | 1.00 | 0.99 |
| F | 0.97 | 0.64 | 0.78 |
| G | 1.00 | 0.92 | 0.96 |
| H | 0.93 | 1.00 | 0.96 |
| I | 0.99 | 1.00 | 1.00 |
| J | 1.00 | 1.00 | 1.00 |
| K | 1.00 | 1.00 | 1.00 |
| L | 0.99 | 1.00 | 0.99 |
| M | 0.99 | 0.95 | 0.97 |
| N | 0.99 | 0.58 | 0.73 |
| O | 0.99 | 0.82 | 0.90 |
| P | 1.00 | 1.00 | 1.00 |
| Q | 0.99 | 1.00 | 0.99 |
| R | 1.00 | 0.99 | 0.99 |
| S | 0.97 | 0.98 | 0.97 |
| T | 0.67 | 0.97 | 0.79 |
| U | 0.97 | 1.00 | 0.98 |
| V | 0.99 | 0.96 | 0.98 |
| W | 0.98 | 0.98 | 0.98 |
| X | 1.00 | 1.00 | 1.00 |
| Y | 0.98 | 0.99 | 0.98 |
| Z | 0.97 | 0.97 | 0.97 |
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 1.00 | 0.99 | 0.99 |
| 2 | 0.92 | 0.98 | 0.95 |
| 3 | 0.97 | 1.00 | 0.98 |
| 4 | 0.95 | 0.99 | 0.97 |
| 5 | 0.99 | 0.96 | 0.98 |
| 6 | 0.95 | 1.00 | 0.97 |
| 7 | 0.98 | 0.99 | 0.99 |
| 8 | 1.00 | 0.99 | 0.99 |
| 9 | 0.71 | 0.98 | 0.83 |
Table 2 Comparison of proposed workflow with other models

References            No. of classes  Language   Model      Avg accuracy (%)
Beena et al. [9]      35              American   CNN, SVM   92
Quesada et al. [10]   26              American   SVM        86.75
Raheja et al. [11]    4               Indian     SVM        97.5
Hore et al. [12]      22              Indian     PSO        99.96
Agrawal et al. [13]   36              Indian     SVM        93
Proposed              36              Indian     CNN        96.43
References 1. Revanth, K., & Raja, N. S. M. (2019). Comprehensive svm based indian sign language recognition. In 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN) (pp. 1–4). IEEE. 2. Hatibaruah, D., Talukdar, A. K., & Sarma, K. K. (2020). A static hand gesture based sign language recognition system using convolutional neural networks. In 2020 IEEE 17th India Council International Conference (INDICON) (pp. 1–6). IEEE. 3. Yadav, N., Thepade, S., & Patil, P. H. (2015). Noval approach of classification based indian sign language recognition using transform features. In 2015 International Conference on Information Processing (ICIP) (pp. 64–69). IEEE. 4. Gupta, B., Shukla, P., & Mittal, A. (2016). K-nearest correlated neighbor classification for indian sign language gesture recognition using feature fusion. In 2016 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1–5). IEEE. 5. Rekha, J., Bhattacharya, J., & Majumder, S. (2011). Shape, texture and local movement hand gesture features for Indian sign language recognition. In 3rd International Conference on Trendz in Information Sciences & Computing (TISC2011) (pp. 30–35). IEEE. 6. Sharma, A., Sharma, N., Saxena, Y., Singh, A., & Sadhya, D. (2021). Benchmarking deep neural network approaches for Indian sign language recognition. Neural Computing and Applications, 33(12), 6685–6696. 7. Shamrat, F. J. M., Chakraborty, S., Billah, M. M., Kabir, M., Shadin, N. S., & Sanjana, S. (2021). Bangla numerical sign language recognition using convolutional neural networks. Indonesian Journal of Electrical Engineering and Computer Science, 23(1), 405–413. 8. Gupta, U., & Gupta, D. (2021). Kernel-target alignment based fuzzy Lagrangian twin bounded support vector machine. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 29(05), 677–707. 9. Beena, M., Namboodiri, A., & Thottungal, R. (2020). Hybrid approaches of convolutional network and support vector machine for American sign language prediction. Multimedia Tools and Applications, 79(5), 4027–4040. 10. Quesada, L., López, G., & Guerrero, L. (2017). Automatic recognition of the American sign language fingerspelling alphabet to assist people living with speech or hearing impairments. Journal of Ambient Intelligence and Humanized Computing, 8(4), 625–635. 11. Raheja, J., Mishra, A., & Chaudhary, A. (2016). Indian sign language recognition using Svm. Pattern Recognition and Image Analysis, 26(2), 434–441.
12. Hore, S., Chatterjee, S., Santhi, V., Dey, N., Ashour, A. S., Balas, V. E., & Shi, F. (2017). Indian sign language recognition using optimized neural networks. In Information technology and intelligent transportation systems (pp. 553–563). Springer. 13. Agrawal, S. C., Jalal, A. S., & Bhatnagar, C. (2012). Recognition of Indian sign language using feature fusion. In 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI) (pp. 1–5). IEEE.
Analysis of Patient Tuberculosis Tenet Death Reason and Prediction in Bangladesh Using Machine Learning Md. Imtiaz Ahmed, Rezoana Akter, and Fatima Shefaq
Abstract Tuberculosis is one of the world's deadliest infectious diseases and is responsible for a large number of patient deaths worldwide. The disease gradually attacks the lungs and is caused by the bacterium Mycobacterium tuberculosis. Machine learning has become a popular technique with which a person or an organization can build models to evaluate data, generate insights, and predict values, and its use in the health sector is increasing rapidly; health professionals can predict or monitor a patient's disease using the same patient's previous history or the histories of similar patients. In this paper, the reasons behind tuberculosis deaths are analysed using the World Health Organization's tuberculosis "cases and deaths" dataset for Bangladesh. Identifying which features of the dataset are most strongly associated with patient deaths is one of the main concerns, and this is done using machine learning regression and classification algorithms. Linear Regression, Logistic Regression, Decision Tree, Random Forest, KNN, XGB and AdaBoost algorithms are used to build models that identify the best features, and Random Forest is found to provide the best results. A prediction model for the number of patient deaths is built using the machine learning regression algorithms: linear regression achieves a prediction accuracy of 0.99943, although its feature selection is not the most informative, while the Random Forest prediction accuracy of 0.97820 is closest to that of linear regression. In one sentence, Random Forest is the best-observed algorithm with respect to both prediction accuracy and feature-importance detection.
Md. I. Ahmed (B) · R. Akter Prime University, Dhaka, Bangladesh e-mail: [email protected] F. Shefaq North South University, Dhaka, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_5
Keywords Tuberculosis · Machine learning · Prediction of death per year · Feature importance
1 Introduction Tuberculosis is a disease that must be treated properly; otherwise it may cause a patient's death. Infection with Mycobacterium tuberculosis is one of the deadliest forms of the disease, and affected patients need to be cared for in an appropriate way [1]. For tuberculosis treatment, the World Health Organization (WHO) introduced DOTS (Directly Observed Treatment, Short-course) [2]. DOTS is costly to implement, and a full course of treatment takes a long time; in many cases patients abandon the course because of its length. Health authorities worldwide remain highly concerned about tuberculosis because of its death rate. It is regarded as the world's top infectious disease: although the death rate is falling, it remains high enough to be a major concern for any country [3]. Tuberculosis is mainly caused by bacteria and affects the lungs, and the World Health Organization publishes reports on tuberculosis deaths every year [4]. It is therefore still a highly contagious disease that needs to be managed carefully. Machine learning is a technique with which a large dataset can be processed properly using classification and regression [5]. Machine learning can predict future values of specific variables or labels, and it is now widely used to organise and analyse data. Feature importance is another key capability of machine learning, by which the important features or variables of a dataset can easily be identified [6]. The uses of machine learning in medical science are numerous, as medical science constantly works with identification, experimentation, and implementation; machine learning can help health experts assess a case more systematically rather than doing it manually. Prediction and feature-importance models can help experts identify the reasons for, or the importance of, any variable related to a patient's health, and hence provide accurate treatment. Tuberculosis remains a vital issue today, and its importance has increased further with COVID-19, since COVID-19 worsens pneumonia in patients and tuberculosis is closely related to pneumonia [7]. In this paper, machine learning regression algorithms are employed to predict the future number of tuberculosis deaths in Bangladesh by building models on World Health Organization datasets. At the same time, the factors associated with tuberculosis deaths are examined using the feature-importance results of the different algorithms. The results are as expected, and with the resulting models the reasons behind tuberculosis deaths can be readily identified.
2 Related Work The World Health Organization updates and curates tuberculosis data worldwide, as it is still a vital infectious disease. The work of other researchers on tuberculosis should not be ignored, since their research continues to have a large impact worldwide. In the modern era, technology is advancing rapidly, and no one can ignore the use of new ideas and techniques. The impact of machine learning and data science on technological innovation is growing quickly because of their potential to identify, classify, and solve problems. Tuberculosis (TB), caused by Mycobacterium tuberculosis (M.tb), causes the highest number of deaths globally of any bacterial disease, necessitating novel diagnosis and treatment strategies. In that regard, Naïve Bayes, k-nearest neighbour, support vector machine, and artificial neural network models have been used to build prediction models, with an average prediction accuracy of 85% [8]. Machine learning methods have also been widely applied to predict the resistance of M. tuberculosis to a given drug and to identify resistance markers; Area Under the Curve (AUC), Precision, Recall and F1-score were used to find the best-fitted model indicating which drug could successfully treat MTB [9]. Treatment courses are a major challenge in protecting patients from tuberculosis, with DOTS (directly observed treatment, short-course) therapy being a key element; six machine learning algorithms were compared on prediction accuracy, precision and recall, and the decision tree achieved the best prediction accuracy for the treatment course [10]. Because the DOTS course is somewhat expensive and lengthy, a machine learning model was developed on 4213 patients, of whom 64.37% completed their treatment; results were evaluated using four performance measures: accuracy, precision, sensitivity, and specificity [11]. Drug-based methods are widely used to treat disease in medical science, as they are usually a patient's first course of treatment. Machine learning models have been used to predict tuberculosis drug resistance from whole-genome sequencing, where gradient-boosted trees were found effective in terms of AUC and prediction scores [12]. Drug protection rates against tuberculosis have also been modelled with machine learning, where artificial neural networks provided the best accuracy [13]. Predicting tuberculosis deaths is a key task given the worldwide death rate, and machine learning can help healthcare representatives provide their best efforts to patients in poor condition; on a Brazilian health database, a gradient boosting algorithm provided the best accuracy [14].
Machine learning prediction has also been used to identify features associated with treatment failure and to predict which patients are at the highest risk of treatment failure, using data collected from the National Institute of Allergy and Infectious Diseases and evaluated with AUC scores [15]. Hospital medical records have been used with machine learning to predict and determine the risk factors associated with tuberculosis, where the classification methods performed well and achieved an accuracy of 67.5–73.4% [16]. Tuberculosis also spreads rapidly among animals, and machine learning applied to farm data has been used to build a model for animal tuberculosis death prediction, where gradient-boosted trees achieved an accuracy of 88.07% [17].
3 Dataset 3.1 Dataset Collection A dataset is always required to build a machine learning model and to assess how well a model variable can be predicted. The dataset for this research is collected from the World Health Organization (WHO) for Bangladesh only, although the repository contains documentation for many countries [4]. The dataset has four categories, each with multiple variables: Cases and deaths, Drug-resistant TB, Co-epidemics of TB and HIV, and Treatment success [18]. The Cases and deaths category is taken into consideration here, as this paper focuses on the causes of death and on predicting deaths in the near future. The Cases and deaths category has five sub-categories, each of which also has multiple variables [19]. The Mortality data has two variables containing the number of deaths in each country [20], the Incidence data has five variables [21], the Treatment coverage data has three variables [22], the New case notifications data has six variables [23], and the Previously treated case notifications data has five variables [24]. All of the data cover the years 2000–2019, i.e., one record per year for each country, so the year variable is also taken into account in the simulation.
3.2 Dataset Elaboration All the variables of the tuberculosis Cases and deaths category have been merged into one dataset, with the year taken as the first attribute. As already stated above, the Cases and deaths class of tuberculosis has five sub-categories, each with multiple variables. The variables taken into the implementation are listed below.
Year: Year (2000–2019).
I1: Number of incident tuberculosis cases.
I2: Incidence of tuberculosis (per 100 000 population per year).
I3: Number of incident tuberculosis cases in children aged 0–14.
I4: Number of incident tuberculosis cases (HIV-positive cases).
I5: Incidence of tuberculosis (per 100 000 population) (HIV-positive cases).
N1: New or unknown treatment history cases: Pulmonary, bacteriologically confirmed.
N2: New or unknown treatment history cases: Pulmonary, clinically diagnosed.
N3: New cases: Pulmonary, smear-positive.
N4: New cases: Pulmonary, smear-negative/unknown/not done, and other new cases.
N5: New cases: extrapulmonary.
N6: Treatment history unknown.
R1: Relapse cases: Pulmonary, bacteriologically confirmed.
R2: Relapse cases: Pulmonary, clinically diagnosed.
R3: Relapse cases: extrapulmonary.
R4: Relapse cases (pre-2013 definition).
R5: Previously treated cases, excluding relapse.
T1: Tuberculosis treatment coverage.
T2: Number of incident tuberculosis cases.
T3: Tuberculosis—new and relapse cases.
M1: Deaths due to tuberculosis among HIV-negative people (per 100 000 population).
M2: Number of deaths due to tuberculosis, excluding HIV.
4 Methods and Process The merged dataset was loaded into Google Colaboratory, and the variables containing null values were identified. Cells with null values were replaced with zero (0). The dataset was then split into features and labels, where the M2 column is used as the label and the remaining 21 columns are used as features. Once the labels and features were prepared, they were fed to the machine learning regression and classification algorithms to determine the best features explaining the deaths in different years in Bangladesh. After that, the dataset was divided into training and test sets to measure the prediction accuracy for the number of deaths using different regression algorithms of machine learning. Precision, Recall and F1-scores were also considered in the process; however, they are not applicable, because the labels are continuous (regression) rather than categorical (classification) data.
4.1 Dataset Scrubbing The dataset has null values in the I3, N1, N2, N3, N4, N6, R1, R2, R3, R4 and R5 columns, and those null values have been replaced with 0 using the fillna() function in Python. Using the correlation function, the dependency of each variable on the others was examined, and it was found that 'Number of deaths due to tuberculosis, excluding HIV (M2)' depends mostly on 'Deaths due to tuberculosis among HIV-negative people (per 100 000 population) (M1)', 'New cases: Pulmonary, smear-negative/unknown/not done, and other new cases (N4)', 'Relapse cases (pre-2013 definition) (R4)', and 'New cases: Pulmonary, smear-positive (N3)'. M1 carries essentially the same information as M2, since it gives the deaths per hundred thousand people while M2 contains the total number of deaths per year. N4 and N3 denote the smear-negative and smear-positive cases of tuberculosis, which can be taken as a vital issue, and R4 relates to relapse cases, another important factor when considering the reasons for death. The correlations of each variable with the death count (M2) are given below: Year: −0.926302, I1: −0.921119, I2: NaN, I3: −0.561074, I4: −0.818953, I5: −0.785236, N1: −0.870487, N2: −0.865307, N3: 0.505247, N4: 0.823851, N5: −0.939251, N6: 0.156880, R1: −0.902272, R2: −0.877094, R3: −0.872054, R4: 0.614663, R5: −0.081896, T1: −0.986672, T2: −0.921119, T3: −0.987304, M1: 0.985. In Fig. 1, the total number of tuberculosis deaths is plotted against M1, N4, R4 and N3, as these four features were found to be the main factors using the correlation function. For a better overview, a pie chart of the death numbers per year is shown in Fig. 2.
Fig. 1 Dependency of the M2 variable on the other variables of the dataset
Fig. 2 A visual pie chart of the death numbers per year in Bangladesh
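A minimal pandas sketch of the scrubbing and correlation step described in Sect. 4.1 is shown below; the file name is a hypothetical placeholder for the merged WHO data, not the authors' actual script.

```python
# Hedged sketch of the scrubbing step; the file name is a hypothetical placeholder
# for the merged WHO "cases and deaths" data with the column codes of Sect. 3.2.
import pandas as pd

df = pd.read_csv("tb_bangladesh_merged.csv")
df = df.fillna(0)                                  # replace null cells with 0

corr_with_m2 = df.corr()["M2"]                     # correlation of every variable with M2
print(corr_with_m2.sort_values(ascending=False))
```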
4.2 Feature Importance Techniques Using Machine Learning Algorithms The dependency of the M2 column on the other columns has been identified using the correlation function. In the next step, how the variables actually relate to the M2 variable (Number of deaths due to tuberculosis, excluding HIV) is examined by training the dataset on different machine learning algorithms. Before implementation, the dataset is sorted into features and labels, where x denotes the features and y denotes the labels: y contains M2 and the remaining variables form the features. The machine learning regression and classification algorithms are employed to find the best features by using the model coefficients, and the enumerate() function is used to obtain the score of each feature with respect to the M2 column. For regression feature importance, the Linear Regression, Logistic Regression, Decision Tree regression, Random Forest regression, KNN permutation regression, Gradient Boosting regression (XGB) and AdaBoost regression models are used to identify the best features related to the M2 variable. For classification of the data, the Decision Tree classifier, Random Forest classifier, KNN permutation classifier, Gradient Boosting classifier (XGB) and AdaBoost classifier are used. The scores of each of the models are listed in Table 1.
Table 1 Feature importance scores of the regression and classification algorithms. For each of the input variables (Year, I1–I5, N1–N6, R1–R5, T1–T3, M1), the table lists the importance score with respect to the M2 label under the regression models (LR, LoR, DT, RF, KNN, XGB, ADA) and the classification models (DT, RF, KNN, XGB, ADA); the highest-scoring features of each model are summarised in Sects. 4.4 and 4.5.
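As a hedged illustration of how such per-feature scores can be obtained (the paper names the models but not the exact calls, so the scikit-learn functions and the file name below are assumptions):

```python
# Hedged sketch: coefficients are used as scores for the linear model, impurity-based
# importances for the random forest, and permutation importance for KNN, which has no
# built-in scores. File name and exact calls are assumptions for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.inspection import permutation_importance

df = pd.read_csv("tb_bangladesh_merged.csv").fillna(0)     # hypothetical merged WHO file
X, y = df.drop(columns=["M2"]), df["M2"]

lin = LinearRegression().fit(X, y)                         # coefficients as scores
for i, score in enumerate(lin.coef_):
    print("LR ", X.columns[i], round(score, 5))

rf = RandomForestRegressor(random_state=0).fit(X, y)       # impurity-based importances
for i, score in enumerate(rf.feature_importances_):
    print("RF ", X.columns[i], round(score, 5))

knn = KNeighborsRegressor().fit(X, y)
perm = permutation_importance(knn, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(perm.importances_mean):
    print("KNN", X.columns[i], round(score, 5))
```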
4.3 Visualization of Algorithm Feature Selection The regression feature-importance results and the classification feature-importance results differ, as noted in Table 1 above. For a better understanding of the process, the results of the different algorithms are therefore also presented as figures.
4.4 Visualization and Analysis of Regression Algorithms In linear regression, the 'Year' and 'Deaths due to tuberculosis among HIV-negative people (per 100 000 population) (M1)' variables have the most positive coefficients, 2388.60017 and 94.74634 respectively, while 'Incidence of tuberculosis (per 100 000 population) (HIV-positive cases) (I5)' has the most negative coefficient, −23,477.60713; these three features are therefore the most influential with respect to the 'Number of deaths due to tuberculosis, excluding HIV (M2)' label. In logistic regression, the features 'New or unknown treatment history cases: Pulmonary, bacteriologically confirmed (N1)' and 'Tuberculosis—new and relapse cases (T3)' are the best choices, with scores against the M2 label of 0.06310 and 0.05997, respectively (Fig. 3). Decision tree regression finds the best features to be 'New or unknown treatment history cases: Pulmonary, clinically diagnosed (N2)' with a score of 0.66760 and 'Number of incident tuberculosis cases (I1)' with 0.28638. Random forest regression identifies 'New cases: extrapulmonary (N5)', 'Relapse cases: extrapulmonary (R3)' and 'Tuberculosis—new and relapse cases (T3)' as the best features, scoring 0.10551, 0.10253 and 0.09795, respectively. The KNN permutation regressor identifies 'Tuberculosis—new and relapse cases (T3)', 'New or unknown treatment history cases: Pulmonary, bacteriologically confirmed (N1)' and 'New cases: Pulmonary, smear-positive (N3)', with scores of 68,456,400.00, 20,899,600.00 and 18,858,400.00, respectively. The gradient boosting regressor (XGB) finds that 'Tuberculosis treatment coverage (T1)' and 'Year' score highest, with 0.52144 and 0.43900, respectively. The AdaBoost regressor finds that 'Number of incident tuberculosis cases (I1)', 'Deaths due to tuberculosis among HIV-negative people (per 100 000 population) (M1)' and 'Relapse cases: Pulmonary, bacteriologically confirmed (R1)' score highest among the features, with 0.12474, 0.11044 and 0.08775, respectively; the 'Year' feature is also noticeable in this process.
Fig. 3 Regression analysis of feature importance through multiple algorithms
4.5 Visualization and Analysis of Classification Algorithms Linear regression and logistic regression are not classifiers; therefore, the Decision Tree, Random Forest, KNN permutation, Gradient Boosting (XGB) and AdaBoost classifiers are examined in this classification process, and each of them predicts differently (Fig. 4). The decision tree classifier selects 'Previously treated cases, excluding relapse (R5)', 'New cases: extrapulmonary (N5)' and 'New or unknown treatment history cases: Pulmonary, bacteriologically confirmed (N1)' as the best features, with importance values of 0.16667, 0.14493 and 0.10870, respectively.
Fig. 4 Classification analysis of Feature importance through multiple algorithms
The Random Forest classifier gives its best scores of 0.08780, 0.08693 and 0.08063 to the features 'Tuberculosis—new and relapse cases (T3)', 'Number of incident tuberculosis cases (T2)' and 'New cases: extrapulmonary (N5)', respectively. The K-nearest neighbour permutation classifier finds 'Tuberculosis—new and relapse cases (T3)' and 'New cases: Pulmonary, smear-positive (N3)' to have the best positive values, 0.10000 and 0.01000 respectively. The gradient boosting classifier (XGB) gives the highest score, 0.17606, to 'New cases: Pulmonary, smear-negative/unknown/not done, and other new cases (N4)'; the next best features are 'Relapse cases (pre-2013 definition) (R4)' and 'Year', scoring 0.16466 and 0.13466, respectively. The AdaBoost classifier selects 'Previously treated cases, excluding relapse (R5)', 'New cases: Pulmonary, smear-positive (N3)' and 'Number of incident tuberculosis cases, (HIV-positive cases) (I4)' as the best features, with scores of 0.22000, 0.20000 and 0.08000, respectively.
4.6 Prediction Accuracy of Algorithms The dataset used in this process is regression-based: as is easily noticeable from Fig. 1, the label M2 decreases over time, since the number of deaths has fallen each year. Thus, classification accuracy is not acceptable in this paper, and each of the classification algorithms scores only 0.16666. The regression models, however, perform well. For the regression models, Linear Regression, Logistic Regression, Decision Tree, Random Forest, Support Vector regression, KNN, Gradient Boosting (XGB) and AdaBoost have been used, and each algorithm behaves differently; the Support Vector regressor yields a negative score, whereas all the other algorithms yield positive scores. The scores of each algorithm are stated in Table 2. An artificial neural network and impact learning were also tried for the prediction; however, they were not successful, as the dataset is not of a classification type.

Table 2 Prediction accuracy of the regression algorithms

Algorithm                         Prediction accuracy
Linear regression                 0.99943
Logistic regression               0.16666
Support vector regression         −0.01888
Decision tree regression          0.92247
Random forest regression          0.97820
K-nearest neighbor regression     0.77539
Gradient boost regression (XGB)   0.89486
Adaboost regression               0.93728
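A minimal sketch of how such prediction accuracies can be obtained is given below; the reported values are consistent with the default score returned by scikit-learn regressors, although the paper does not state the metric explicitly, and the file name is a hypothetical placeholder.

```python
# Hedged sketch of the prediction step; .score() returns R^2 by default, which is
# assumed here to correspond to the paper's "prediction accuracy".
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor

df = pd.read_csv("tb_bangladesh_merged.csv").fillna(0)     # hypothetical merged WHO file
X, y = df.drop(columns=["M2"]), df["M2"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("Linear regression", LinearRegression()),
                    ("Random forest", RandomForestRegressor(random_state=0)),
                    ("AdaBoost", AdaBoostRegressor(random_state=0))]:
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 5))
```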
5 Discussion The implementation focuses on the reasons for patient deaths due to the infectious disease tuberculosis, which normally attacks the lungs. Several parameters can contribute to these deaths; the dataset already contains the relevant candidate causes, and the implementation focuses on identifying the prime causes of patient death. At the same time, the paper focuses on a prediction model with which the number of deaths per year can be predicted for the future. In the prediction model, not only the number-of-deaths feature (M2) can be predicted; other features of the dataset can be predicted as well. The dataset contains data up to 2019, while 2020 data is not yet available; researchers or health experts could predict the 2020 values by treating each of the features in turn as the label. There may be contradictions, but with a proper implementation and methodology this will work. Feature importance is one of the key concerns of this research, for which regression and classification models of machine learning algorithms have been implemented.
Each algorithm's choice of features differs, with some overlap, and from these it is observed which features matter most for a patient's death from tuberculosis. From the regression analysis, the variable importance shows that I1 and M1 carry the highest weight for patient deaths, where I1 denotes the 'Number of incident tuberculosis cases' and M1 the 'Deaths due to tuberculosis among HIV-negative people (per 100 000 population)'. Both I1 and M1 are regular, known factors for estimating the yearly death count and can be collected from medical records. T3, 'Tuberculosis—new and relapse cases', is another important feature, in which new and relapse cases move together. The R1 factor, 'Relapse cases: Pulmonary, bacteriologically confirmed', is also noteworthy, as two algorithms rank it highly; relapse cases must be handled intensively, because many tuberculosis patients die for this reason. Other important variables related to deaths are Year, N1, N2, N3, N5, T1, R3, etc. Based on the regression analysis, it can be stated that healthcare professionals must pay particular attention to patients with relapse cases, since patient deaths depend heavily on them. From the classification algorithms, the R5, T3, N5 and N3 features have the largest impact, as different algorithms selected them twice in the process; they denote 'Previously treated cases, excluding relapse (R5)', 'Tuberculosis—new and relapse cases (T3)', 'New cases: extrapulmonary (N5)' and 'New cases: Pulmonary, smear-positive (N3)', respectively. It can be noted that persons with relapses or previous tuberculosis treatment need extra care, which was also found in the regression analysis. The importance of T3 was likewise found in the regression analysis, which is expected since it combines both new and relapse cases. The extrapulmonary new cases (N5) are a finding very relevant to this research, as extrapulmonary disease is important in rural and local life. It can also be noted, though it is not a vital issue, that roughly one person in three within a family can contract the disease (N3). The other important features identified by the classification algorithms are T2, R4, N1, Year, I4, etc. The prediction model was used to predict the yearly number of patient deaths in the near future from the model trained on the historical dataset. The linear regression model gives the best prediction accuracy, with a score of 0.99943; the next best is Random Forest with 0.97820, and the third best is AdaBoost with 0.93728. The Decision Tree, KNN and XGB predictions are also good, but not as good as those of the above three algorithms. The results may fluctuate if the labels are changed, but this will help or guide researchers in finding the best model for a similar kind of problem.
6 Conclusion The methodology of this research is mainly focused on the reasons why tuberculosis death cases rise or fall in Bangladesh.
Thus, it focuses on the feature selection techniques of different machine learning algorithms, which identify the most important features behind patient deaths from tuberculosis. In this essence, the dataset has been trained on different machine learning regression and classification algorithms, each of which selects different features in various processes. The Random Forest regression and classification models are the best ones, in that they take all of the features into account and thereby find the best features of the dataset. It is also clear from the prediction accuracies that Random Forest provides the second-best accuracy, 0.97820, very close to the best one. The linear regression model, however, cannot make use of all the features as effectively as the random forest. From these findings, it can be stated that the Random Forest algorithm provides the best accuracy with regard to feature importance.
References 1. Heifets, L. (2000). Conventional methods for antimicrobial susceptibility testing of Mycobacterium tuberculosis. In: I. Bastian & F. Portaels (Eds.), Multidrug-Resistant Tuberculosis. Resurgent and Emerging Infectious Diseases (Vol. 1). Dordrecht: Springer. https://doi.org/ 10.1007/978-94-011-4084-3_8. 2. WHO Working Group on DOTS-Plus for MDR-TB. Scientific Panel & World Health Organization. Communicable Diseases Cluster. Guidelines for establishing DOTS-Plus pilot projects for the management of multidrug-resistant tuberculosis (MDR-TB)/writing committee: Scientific Panel of the WHO Working Group on DOTS-Plus for MDR-TB. World Health Organization. https://apps.who.int/iris/handle/10665/66368. 3. Tuberculosis, TB is caused by bacteria (Mycobacterium tuberculosis) and it most often affects the lungs, World Health Organisation. https://www.who.int/health-topics/tuberculosis#tab= tab_1. 4. Global Health Observatory data repository, World Health Organization. https://apps.who.int/ gho/data/node.main. 5. Putra, I. P. E. S., Brusey, J., Gaura, E., & Vesilo, R. (2018). An event-triggered machine learning approach for accelerometer-based fall detection. Sensors, 18, 20. https://doi.org/10.3390/s18 010020 6. Altmann, A., Tolo¸si, L., Sander, O., & Lengauer, T. (2010). Permutation importance: A corrected feature importance measure. Bioinformatics, 26(10), 1340–1347. https://doi.org/10. 1093/bioinformatics/btq134 7. Oliwa, J. N., Karumbi, J. M., Marais, B. J., Madhi, S. A., & Graham, S. M. (2015). Tuberculosis as a cause or comorbidity of childhood pneumonia in tuberculosis-endemic areas: a systematic review. The Lancet Respiratory Medicine, 3(3), 235–243. ISSN 2213–2600, https://doi.org/10. 1016/S2213-2600(15)00028-4. 8. Jamal, S., Khubaib, M., Gangwar, R., et al. (2020). Artificial Intelligence and machine learning based prediction of resistant and susceptible mutations in Mycobacterium tuberculosis. Science and Reports, 10, 5487. https://doi.org/10.1038/s41598-020-62368-2 9. Kouchaki, S., Yang, Y., Walker, T. M., Sarah Walker, A., Wilson, D. J., Peto, T. E. A., Crook, D. W., & Clifton, D. A. (2019). CRyPTIC Consortium, Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinformatics, 35(13), 2276–2282. https://doi.org/10. 1093/bioinformatics/bty949 10. Kalhori, S., & Zeng, X. (2013). Evaluation and comparison of different machine learning methods to predict outcome of tuberculosis treatment course. Journal of Intelligent Learning Systems and Applications, 5(3), 184–193. https://doi.org/10.4236/jilsa.2013.53020
11. Hussain, O. A., & Junejo, K. N. Predicting treatment outcome of drug-susceptible tuberculosis patients using machine-learning models. Informatics for Health and Social Care, 44(2), 135– 151. https://doi.org/10.1080/17538157.2018.1433676. 12. Wouter, D., Sofia, C., Jody, P., Diez, B. E., Susana, C., Ruth, M., Luigi, P., & Clark Taane, G. (2019). Machine learning predicts accurately mycobacterium tuberculosis drug resistance from whole genome sequencing data. Frontiers in Genetics, 10, 922. https://doi.org/10.3389/ fgene.2019.00922. 13. Lai, N.-H., Shen, W.-C., Lee, C.-N., Chang, J.-C., Hsu, M.-C., Kuo, L.-N., Yu, M.-C., Chen, H.-Y. (2020). Comparison of the predictive outcomes for anti-tuberculosis drug-induced hepatotoxicity by different machine learning techniques. Computer Methods and Programs in Biomedicine, 188, 105307. ISSN 0169-2607. https://doi.org/10.1016/j.cmpb.2019.105307. 14. Lino Ferreira da Silva Barros, M. H., Oliveira Alves, G., Morais Florêncio Souza, L., da Silva Rocha, E., Lorenzato de Oliveira, J.F., Lynn, T., Sampaio, V., & Endo, P.T. (2021). Benchmarking machine learning models to assist in the prognosis of tuberculosis. Informatics, 8, 27. https://doi.org/10.3390/informatics8020027. 15. Sauer, C. M., Sasson, D., Paik, K. E., McCague, N., Celi, L. A., et al. (2018, November 20). Feature selection and prediction of treatment failure in tuberculosis. PLOS ONE, 13(11), e0207491. https://doi.org/10.1371/journal.pone.0207491. 16. Balogun, O. S., Olaleye, S. A., Mohsin, M., & Toivanen, P. (2021). Investigating machine learning methods for tuberculosis risk factors prediction—A comparative analysis and evaluation. In Proceedings of the 37th International Business Information Management Association (IBIMA) (pp. 1056–1070). 17. Tuberculosis, Global Health Observatory data repository, World Health Organization, Link: https://apps.who.int/gho/data/node.main.1315?lang=en. 18. Pereira, L. E. C., Ferraudo, A. S., Panosso, A. R., Carvalho, A. A. B., Mathias, L. A., Saches, A. C., Hellwig, K. S., & Ancêncio, R. A. (2020, September). Machine learning to predict tuberculosis in cattle from the state of Sao Paulo, Brazil. European Journal of Public Health, 30(Supplement_5), ckaa166.849. https://doi.org/10.1093/eurpub/ckaa166.849. 19. Cases and deaths, Tuberculosis, Global Health Observatory data repository, World Health Organization. https://apps.who.int/gho/data/node.main.1316?lang=en. 20. Mortality Data by country, Cases and deaths, Tuberculosis, World Health Organization. https:// apps.who.int/gho/data/node.main.1317?lang=en. 21. Incidence Data by country, Cases and deaths, Tuberculosis, World Health Organization. https:// apps.who.int/gho/data/node.main.1320?lang=en. 22. Treatment coverage Data by country, Cases and deaths, Tuberculosis, World Health Organization, Link: https://apps.who.int/gho/data/node.main.1323?lang=en. 23. New case notifications Data by country, Cases and deaths, Tuberculosis, World Health Organization. https://apps.who.int/gho/data/node.main.1326?lang=en. 24. Previously treated case notifications Data by country, Cases and deaths, Tuberculosis, World Health Organization. https://apps.who.int/gho/data/node.main.1327?lang=en.
Portable Electronic Tongue for Characterisation of Tea Taste Alokesh Ghosh, Hena Ray, Tarun Kanti Ghosh, Ravi Sankar, Nabarun Bhattacharyya, and Rajib Bandyopadhyay
Abstract Tea, one of the most widely consumed beverages in the world, is exported from India to many countries. This calls for a rapid and effective method for tea quality assessment. The present practice of quality evaluation involves human tasters who assign scores to tea samples based on taste, smell and visual appearance. These scores are therefore subject to human bias, non-repeatability and error. In the last few years, attempts have been made to evolve efficient and objective techniques for tea quality assessment using biomimetic measurement systems such as the electronic nose, electronic tongue and electronic vision. In this study, we have developed a portable Electronic Tongue (e-Tongue) for the characterisation of tea. We have taken a novel approach to develop and fabricate polymer membrane sensors that operate on the potentiometric principle. We have trained the e-Tongue device with various statistical and neural network-based classification algorithms. The accuracies obtained employing Back Propagation-Multi Layer Perceptron, Probabilistic Neural Network, and Multiple Discriminant Analysis are 90%, 92% and 96%, respectively. The potential of a potentiometric e-Tongue in the evaluation of tea quality, as found in this study, can be explored further to make it suitable for commercial use. Keywords e-Tongue · Ion-selective polymer membrane · Principal Component Analysis (PCA) · Probabilistic Neural Network (PNN) · Multiple Discriminant Analysis (MDA) · Back Propagation-Multi Layer Perceptron (BP-MLP)
A. Ghosh (B) · H. Ray · T. K. Ghosh · R. Sankar · N. Bhattacharyya Centre for Development of Advanced Computing, Kolkata 700091, India e-mail: [email protected] R. Bandyopadhyay Jadavpur University, Salt Lake, Kolkata 700091, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_6
1 Introduction The age-old technique of tea quality estimation, and thereby determination of price, is based on expert human tea tasters who judge tea quality by employing their sensory organs such as the nose, tongue, mouth and eyes. However, this method is associated with problems of subjectivity and non-repeatability. Attempts have been made in the last few years to evolve an efficient and objective technique for quality assessment of tea using biomimetic measurement systems such as the electronic nose, electronic tongue and electronic vision [1–4]. This study deals with one such assessment technique for tea liquors utilising an electronic tongue and a set of ion-selective polymer membrane sensors. The sensation of taste can be categorised into five basic tastes: sweetness, sourness, saltiness, bitterness, and umami. Taste buds are able to differentiate among different tastes through interaction with different molecules or ions. The taste of food or beverages is a combination of the basic tastes, along with smell (olfaction) and trigeminal nerve stimulation, which also handles touch, texture, and temperature. A survey of studies of the e-Tongue on beverages is presented in this section. Gutiérrez et al. [5], in 2013, presented their work on the classification of beer using PCA and LDA. Wu et al. [6] worked on the evaluation of the bitterness of coffee in 2020. Wei et al. [7], in 2013, worked on pasteurised milk quality and storage time with PCA, SVM and PLS models. Liu et al. [8], in 2020, studied PCA and a fuzzy evaluation model for sensory attributes of liquors. Tea (Camellia sinensis), the 'magic' herb, is among the most consumed beverages in the world. Quality estimation of tea via taste is a difficult task because of the presence of more than 500 different biochemical compounds, organic and inorganic. The chemical composition of tea varies with varietal differences, environmental effects, methods of processing and mode of preparation; it also depends on the time of year and which leaves are picked, on the processing, and on the growing region. The major biochemical compounds responsible for the taste of tea are summarised in Table 1. The alternative approach is to determine the percentage of the major quality-determining compounds present in a tea sample using analytical measuring instruments such as capillary electrophoresis [9], GC–MS [10], High-Performance Liquid Chromatography, etc., for tea quality estimation.

Table 1 Biochemical compounds in tea responsible for taste

Compounds      Taste
Polyphenol     Astringent
Amino acids    Brothy
Caffeine       Bitter
Theaflavins    Astringent
Thearubigins   Ashy and slight astringent
However, these techniques are not widely used commercially due to their high cost; moreover, they are time-consuming and require skilled personnel and a laboratory setup. The above problems led us to develop a portable electronic tongue [11–13], a fast, easy-to-use and field-usable instrument that classifies different grades of black tea based on taste and produces a taste signature identical to human perception. Some previous works on this are reported in Adhikari et al. [14] and Sinha et al. [15]. Our approach here is not to develop a generic standard or model for tea taste assessment in which taste is related to the biochemical compounds present in the tea. Rather, it provides a platform in which the human perception of tea taste can be translated into the device through machine training, so that the system can cater for region-specific or group-specific perceptions of taste. The work is divided into the following three parts: (i) development of the membrane sensor array, (ii) development of the embedded electronics for acquisition of sensor responses and for storage and display of results, and (iii) development of a suitable pattern recognition engine to identify and classify the "taste" quality of a given sample.
2 Description of Membrane Sensor Electrode and Array Three membrane sensing materials, suitable for this application, are selected from the membrane sensors [11, 14, 15] developed and fabricated by the Indian Institute of Technology, Kharagpur, for human taste characterisation. The said three membrane sensing materials are HDTC modified PVA-PAA membrane, modified PVA-coethylene (EVOH) membrane and Phosphorylated and crosslinked PVA membrane. The criteria for this selection is made through a study of their responses in tea liquor along with a thorough check of stability and repeatability. Each sensor probe is constructed [16] using non-reactive materials like Teflon and glass in an identical way as described here and is shown in Fig. 1. The upper portion of the Teflon tube is threaded to fix it to the base of the electrode assembly, from where electrical connections are taken and fed to the electronic circuitry for processing. A cylindrical glass tube, open at both ends, joins this Teflon tube with another Teflon tube with the same cross-sectional area. A thin circular-cut membrane is placed with a washer so that it sits firmly at the bottom of the Teflon tube. A circular cap is fixed to the Teflon tube using a thread so that the membrane is firmly placed in a watertight manner. The only exposure of the membrane to the sample analyte is through a small hole at the centre of the circular cap. A silver wire is inserted into the entire assembly so that it provides electrical connectivity for measurement. This membrane electrode is filled with 1 mM KCl solution. Three such electrodes are made for the three types of selected membranes to form the sensor array. Ag/AgCl reference electrode from HCH Instruments Inc., USA, is used as the reference electrode for this membrane array assembly. The Membrane array is formed by placing the Reference electrode at the centre and the three membrane electrodes equidistant from the Reference.
Fig. 1 Membrane sensor electrode
The entire assembly is fitted in a round-shaped Teflon sheet. A temperature probe is also inserted to measure the temperature of the sample under test. The sample is taken in a glass beaker, and the entire assembly is placed on the beaker so that the membrane electrodes, the reference electrode and the temperature sensor are dipped into the sample. The three-electrode member sensor array assembly with the tea liquor sample is shown in Fig. 2.
3 Description of the Embedded Electronics When an analyte, e.g., tea liquor, is placed into the test beaker for analysis and the membrane electrode assembly is dipped into the analyte, a potential of very low magnitude (~mV) is developed across the membrane electrodes and the reference electrode. For the same analyte sample, we get different potential values for the three different membrane types that form the “taste-print” of that sample. The challenge of capturing these very low potential values correctly is overcome by employing a proper sensing circuit, as shown in Fig. 3. The current drive capabilities of the developed potentials across membrane electrodes are very low, so normal amplifiers are found unsuitable for capturing these signals. A very precise and ultra-low input current (25 fA) Electrometer amplifier [17] may be able to capture such signals. The sensing electronics are built on LMC6001 in unity gain buffer configuration. Signals from the three membrane sensors and the temperature sensor are fed to the four amplifier channels. These analog signals are fed to the four channels of a multiplexed 16bit ADC [18] (ADS1115) to convert them to digital with programmable data rates
Fig. 2 Three-electrode membrane sensor assembly with tea liquor sample
Fig. 3 Sensing electronics block diagram
ranging from 8 SPS to 860 SPS. Digital outputs are finally provided to the 16-bit PIC24 series microprocessor for processing employing the I2C protocol. The features of the portable Electronic Tongue are simple operation, low power, low cost, battery operation and an integrated sensor and sensing system. It includes
membrane electrode array as sensor elements, rugged for field usage, touch screen based user interface, data storage on micro SD card, PC interface through USB, and results in graphics display (240 × 320, 3.2 in.). Block diagram [16] of the portable e-Tongue (PET) in its entirety is shown in Fig. 4. The heart of the hardware is a 16-bit PIC 24 series microprocessor with a touch screen Graphics display [19] for user interactions. The microprocessor has an inbuilt internal FLASH to store both programs and data. It also has a Graphics Controller with a 16-bit RGB565 interface as a support for graphics display. It contains an internal RAM of 96 kB. As the internal RAM capacity is not sufficient as a frame buffer for the 240 × 320 QVGA Graphics Display, an external 16-bit RAM of 256kB is interfaced. Another 16-Mbit serial FLASH is connected to the processor through an SPI link to store calibration and reference data. A real-time clock (RTC) circuit with a 3.3 V dedicated battery is provided to the processor by a I2C link so that time stamping of captured data can be made. A 4 GB SD memory card is interfaced through an SPI link for the storage of training and test data. The file system employed here is the simple FAT file system. The entire system is operated with a single cell 3.7-V 1000 mAh Li-ion rechargeable battery as Power Supply. BQ24040 charger IC is employed to construct the battery charging circuit. A dc-dc boost circuit steps up the voltage to 5 V, required at different portions of the developed board. A linear 3.3 V regulator provides power to the processor circuitry. The algorithm embedded into the device is able to classify tea samples based on the liquor characteristics in different known classes. Figure 5 shows the developed portable e-Tongue (PET) device in operation.
Fig. 4 Block diagram of the entire Portable e-Tongue (PET) device
Fig. 5 Portable e-Tongue (PET) device in operation
4 Experimentation 4.1 Stability and Repeatability Check of the Membrane Sensors Temporal stability of the three membranes, i.e., HDTC modified PVA-PAA (HDTCPVA-PAA) PVA–Co-Ethylene (Phos. PVA), and Phosphorylated PVA-PAA (Phos. PVA-PAA) in 1 mM KCl is shown in Fig. 6. The potentials are observed at regular intervals of 1 s and are quite stable. The membrane electrode device is tested in three consecutive measurements to tea liquors of different types, and different response patterns are observed. Response in one type of tea liquor is shown in Fig. 7. The above results confirm the stability and repeatability of the selected membranes in measuring tea liquor characteristics. The temperature has a great role to play in the taste-generating chemicals of the tea liquor, and they are known to be optimum in the range of 45–60 °C. Our experimentation to check responses at different temperatures shows that the maximum potential from tea liquor is obtained around 50 °C, which is evident from Fig. 8. Hence, the entire study on tea characterisation is carried out in a constant temperature bath of 50 °C. Fig. 6 Temporal stability of membranes in 1 mM KC
Fig. 7 Membrane responses for a single tea type
Fig. 8 Variation of membrane potential with temperature a 1st iteration b 2nd iteration
4.2 Experimentation with Tea Samples Samples are collected from various tea gardens like Danguaghar B P C 2/14, Simulbarie B P ex-19/14, Mazhabari B P C-42/14, Karbala Bolson c-66/14, PaharGoomiah B P C-1/14, Mathura B P, Halmari B P-1, Antuvally TGBOP etc. Tea samples are analysed by expert Tea Tasters and are generally assigned scores in the range of 1–10 based on colour, aroma and various other taste parameters. Since so many parameters are involved in tea quality assessment, an objective and correct machine-based assessment are only possible through the fusion of multiple technologies like electronic vision, electronic nose and electronic tongues. For taste characterisation, an array containing a large number of sensors is required for the proper evaluation of taste. Instead of trying to generate a taster like taste score for tea liquors, we limit ourselves in this study to check the efficacy of the three
polymer membrane sensors in tea characterisation and to develop a suitable portable electronic device for the same purpose. Additionally, we have studied the suitability of the developed e-Tongue with a limited number of membrane sensors on its ability to classify tea samples into a few known classes of tea based on taste.
5 Analysis of Acquired Data The e-Tongue device is trained and tested at IIT, Kharagpur and the Nagrakata Tea Research Association for five different tea samples, namely Danguaghar B P, Simulbarie B P, Mazhabari B P, MIN FOP and Okayti FOP. A total of 200 samples (40 samples of each tea category) are subjected to the test. Table 2 shows the total number of samples collected from each of the testing centres. Initially, Principal Component Analysis (PCA) [20] is performed to check whether the data samples can be segregated into distinct clusters based on their e-Tongue signatures. The results are satisfactory; hence, the device is trained with various statistical and neural network-based classification models to efficiently distinguish the samples into discrete categories. In view of the capability of neural networks to learn input–output relations from a training data set, neural networks have been the favoured choice for researchers in electronic olfaction [9].
5.1 Principal Component Analysis (PCA) In the first phase of the study, the sensor data is visualised in 2D plots to check whether the samples form distinct clusters based on their e-Tongue signatures with no overlapping. Principal Component Analysis (PCA) is chosen for dimensionality reduction to 2D. The algorithm is demonstrated in Fig. 9. Figure 10 shows that around 99.9% of the data variance is retained after dimension reduction. It is evident from Fig. 11 that the algorithm is able to differentiate the e-Tongue signatures of the samples and plot them into five distinct clusters. This suggests that classification algorithms fed with this dataset may give the desired results.

Table 2 Samples collected for testing

Testing centre                       Total number of samples
IIT, KGP                             120
Nagrakata Tea Research Association   80
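A minimal sketch of the dimensionality-reduction step follows; the input file is a hypothetical matrix of membrane-sensor potentials, one row per tea sample, and is an assumption for illustration.

```python
# Hedged sketch of the PCA step: sensor readings are projected onto two principal
# components, and the retained variance is checked (around 99.9% in the paper).
import numpy as np
from sklearn.decomposition import PCA

X = np.loadtxt("etongue_readings.csv", delimiter=",")    # hypothetical sensor matrix
pca = PCA(n_components=2)
scores = pca.fit_transform(X)                            # 2D coordinates for the cluster plot

print("variance retained:", pca.explained_variance_ratio_.sum())
```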
Fig. 9 Algorithm for PCA
Fig. 10 Variance ratio of principal components
5.2 Multiple Discriminant Analysis (MDA) Multiple Discriminant Analysis (MDA) is a statistical classification algorithm used to classify a dataset having more than two categories. It is a suitable algorithm if the number of outliers is negligible, the sample size is adequate, and collinearity among the sensor values is low. The discriminant analysis model is given by: Y = b0 + b1 X 1 + b2 X 2 + b3 X 3 + · · · + bk X k where Y = Dependent Variable (codes as 1, 2, 3, …).
Fig. 11 PCA plot of five tea samples
$b_s$ = coefficients of the independent variables (should maximise the separation between the groups of the dependent variable).
$X_s$ = predictor or independent variables (should be continuous).
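Discriminant analysis of this form can be approximated with scikit-learn's LinearDiscriminantAnalysis, which likewise builds linear combinations of the predictors that separate the groups. The sketch below is illustrative only: the data files and the 80–20 split are our assumptions, not the routine used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical inputs: X = e-Tongue sensor readings, y = tea-category codes (1, 2, 3, ...)
X = np.load("etongue_features.npy")
y = np.load("tea_labels.npy")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fits discriminant functions of the form Y = b0 + b1*X1 + ... + bk*Xk
mda = LinearDiscriminantAnalysis()
mda.fit(X_train, y_train)
print("test accuracy:", mda.score(X_test, y_test))
```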
5.3 Probabilistic Neural Network (PNN)
The Probabilistic Neural Network (PNN) [21] classifier is guaranteed to converge to an optimal classification when the dataset is sufficiently large. It applies conventional probability theory to construct a neural network for classification. In this study, we have trained the device using a PNN to classify the tea samples into distinct groups based on their liquor characteristics. Figure 12 shows the algorithm for the PNN classifier.
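Conceptually, a PNN places a Gaussian (Parzen-window) kernel on every training pattern and assigns a test pattern to the class with the largest average kernel response. The sketch below illustrates this idea in NumPy; the smoothing parameter sigma and the array names are ours, and the study's own implementation may differ.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.1):
    """Parzen-window PNN: average Gaussian kernel response per class, pick the maximum."""
    classes = np.unique(y_train)
    predictions = []
    for x in X_test:
        sq_dist = np.sum((X_train - x) ** 2, axis=1)                    # pattern layer
        kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
        class_scores = [kernel[y_train == c].mean() for c in classes]   # summation layer
        predictions.append(classes[np.argmax(class_scores)])            # decision layer
    return np.array(predictions)
```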
5.4 Back Propagation—Multi Layer Perceptron (BP-MLP)
The Multi Layer Perceptron (MLP) is a feed-forward Artificial Neural Network (ANN) that uses backpropagation (BP) for training. Backpropagation refers to the process of readjusting the weights of the connections of the ANN through feedback on the errors in the model. A three-layer BP-MLP [22] model with one input layer, one hidden layer, and one output layer is considered for training the developed
Fig. 12 PNN classifier algorithm
e-Tongue device with tea samples. Convergence during the training process has been obtained with acceptable accuracy for eight nodes in the hidden layer.
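Such a three-layer BP-MLP with eight hidden nodes can be sketched as follows; the activation function, solver, and data files are our assumptions, since only the layer structure is stated above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = np.load("etongue_features.npy"), np.load("tea_labels.npy")   # hypothetical files
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer with eight nodes, trained with backpropagation (solver is an assumed choice)
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    solver="adam", max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```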
6 Results and Discussion
The samples are tested using ten-fold cross-validation. In this method, the dataset is divided randomly into 10 equal parts; nine of these parts are used for training, and the remaining 10% is reserved for testing. The process is repeated 10 times, each time reserving a different 10% for testing. The results of ten-fold cross-validation for the different classification algorithms are listed in Table 3. Figure 13 shows the accuracy of the ML models on the tea samples collected from IIT, KGP and the Nagrakata Tea Research Association. For training and testing, the dataset is split in an 80–20 proportion. It is observed that the accuracies of these algorithms vary from 90 to 96%. MDA is found to be the best-suited algorithm for this use case.
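The ten-fold cross-validation procedure just described can be sketched as below; the classifier and data files are placeholders for the models and measurements used in the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = np.load("etongue_features.npy"), np.load("tea_labels.npy")   # hypothetical files

# Ten folds: nine folds train the model, the held-out fold is scored, repeated ten times
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=cv)
print("per-fold accuracy:", scores)
print("mean accuracy    :", scores.mean())
```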
7 Conclusion
The present study of the developed e-Tongue sensing system shows very good accuracy (96%) in classifying tea samples through tea liquor analysis. This promises an improvement of the device to predict the quality of unknown tea samples and publish Taster-like quality scores in the range of 1–10, which requires training the e-Tongue device with a large number of samples with the help of an experienced tea-taster. Since
Table 3 Results of the ten-fold cross-validation method (number of correctly identified data patterns per fold using the ML models)

Cross-validation fold            MDA    PNN    BP-MLP
1                                 20     20     18
2                                 18     17     17
3                                 20     16     18
4                                 17     18     18
5                                 19     19     16
6                                 20     18     18
7                                 20     18     18
8                                 18     19     19
9                                 20     19     19
10                                20     20     19
Total misclassified patterns       8     16     20
Classification rate              96%    92%    90%
Fig. 13 Accuracy of ML algorithms
the performance of this approach is human-Taster specific, the device may then be trained to cater for region-specific or group-specific requirements as well. Although the preliminary results are quite encouraging, there is ample scope for improvement. Firstly, the number of membrane sensors needs to be increased for proper characterisation of tea taste. For this, we need to develop various types of polymer membranes and identify the ones that respond well to tea liquors. Another lacuna of the system observed is the reproducibility of membrane characteristics, i.e., the responses of the same type of membrane from different batches are not
identical. So, the membrane production methodology needs to be improved to obtain reproducible characteristics. Thirdly, a proper statistical or neural network model has to be found to predict tea quality by correlating the multisensory data with human Tasters' scores. The results have opened up a new horizon for the characterisation of tea liquor and promise to extend to various other foods and beverages. Acknowledgements The authors are thankful to Prof Basudam Adhikari (Retd.), IIT, Kharagpur, and Dr. Tridib Sinha and Dr. Manmata Mahato for their contribution in the preparation of tea-specific ion-selective polymer membranes for this study.
References
1. Toko, K. Book on Biomimetic Sensor Technology (pp. 113–117).
2. Akuli, A., Joshi, R., Dey, T., Pal, A., Gulati, A., & Bhattacharyya, N. (2012). A new method for rapid detection of total colour (TC), theaflavins (TF), thearubigins (TR) and brightness (TB) in orthodox teas. In ICST 2012.
3. Sharma, J., Panchariya, P. C., & Purohit, G. N. (2013). Clustering algorithm based on K-means and fuzzy entropy for E-nose applications. In ICAES 2013.
4. Sarkar, S., Bhattacharyya, N., & Palakurthi, V. K. (2011). Taste recognizer by multi sensor electronic tongue: A case study with tea quality classification. In International Conference on Emerging Applications of Information Technology 2011.
5. Gutiérrez, J. M., Haddi, Z., Amari, A., Bouchikhi, B., Mimendia, A., Cetó, X., & Del Valle, M. (2013). Hybrid electronic tongue based on multisensor data fusion for discrimination of beers. Sensors and Actuators B: Chemical, 177, 989–996. https://doi.org/10.1016/j.snb.2012.11.110
6. Wu, X., Miyake, K., Tahara, Y., Fujimoto, H., Iwai, K., Narita, Y., Hanzawa, T., Kobayashi, T., Kakiuchi, M., Ariki, S., Fukunaga, T., Ikezaki, H., & Toko, K. (2020). Quantification of bitterness of coffee in the presence of high-potency sweeteners using taste sensors. Sensors and Actuators B: Chemical, 309, 127784. https://doi.org/10.1016/j.snb.2020.127784
7. Wei, Z., Wang, J., & Zhang, X. (2013). Monitoring of quality and storage time of unsealed pasteurised milk by voltammetric electronic tongue. Electrochimica Acta, 88, 231–239. https://doi.org/10.1016/j.electacta.2012.10.042
8. Liu, J., Zuo, M., Low, S. S., Xu, N., Chen, Z., Lv, C., Cui, Y., Shi, Y., & Men, H. (2020). Fuzzy evaluation output of taste information for liquor using electronic tongue based on cloud model. Sensors (Switzerland), 20(3). https://doi.org/10.3390/s20030686
9. Zuo, Y., Chen, H., & Deng, Y. (2002). Simultaneous determination of catechins, caffeine and gallic acids in green, Oolong, black and pu-erh teas using HPLC with a photodiode array detector. Talanta, 57(2), 307–316.
10. Palit, M., Tudu, B., Dutta, P. K., Dutta, A., Jana, A., Roy, J. K., Bhattacharyya, N., Bandyopadhyay, R., & Chatterjee, A. (2010). Classification of black tea taste and correlation with tea taster's mark using voltammetric electronic tongue. IEEE Transactions on Instrumentation and Measurement.
11. Halder, A., Mahato, M., Sinha, T., Adhikari, B., Mukherjee, S., & Bhattacharyya, N. (2012). Polymer membrane electrode based potentiometric taste sensor: A new sensor to distinguish five basic tastes. In ICST 2012.
12. Winquist, F., Krantz-Rülcker, C., Lundström, I., Ivarsson, P., Holmin, S., & Hojer, N. E. (2003). Electronic Tongues and Combination of Artificial Senses. Weinheim, Germany: Wiley-VCH.
13. Maxwell, C. (2001). Discrimination of tea by means of a voltammetric electronic tongue and different applied waveforms. Sensors and Actuators B: Chemical, 76(1–3), 449–454.
14. Adhikari, B., Mahato, M., Sinha, T., Halder, A., & Bhattacharya, N. (2013). Development of novel polymeric sensors for taste sensing: Electronic tongue. SENSORS, 1–4. IEEE.
15. Sinha, T., Halder, A., Mahato, M., Adhikari, B., Sarkar, S., & Bhattacharyya, N. (2012). Discrimination of tea quality by polymer membrane electrode based potentiometric taste sensor. In 2012 Sixth International Conference on Sensing Technology (ICST) (pp. 781–784).
16. Ray, H., Ghosh, A., Das, A., Ghosh, T. K., Kanjilal, R., & Bhattacharyya, N. (2020). Apparatus for estimation of quality of beverages through electrochemical sensing technology. Indian Patent No. 336533, granted on 06/05/2020.
17. Texas Instruments. Retrieved December 29, 2021, from http://www.ti.com/product/lmc6001
18. Texas Instruments. Retrieved December 29, 2021, from http://www.ti.com/product/ads1115
19. Microchip. Retrieved December 29, 2021, from https://www.microchip.com/en-us/solutions/displays/embedded-graphics-solutions/embedded-graphics-products
20. Wall, M. E., Rechtsteiner, A., & Rocha, L. M. (2003). Singular value decomposition and principal component analysis. In D. P. Berrar, W. Dubitzky & M. Granzow (Eds.), A Practical Approach to Microarray Data Analysis (pp. 91–110). Norwell, MA: Kluwer.
21. Bhattacharyya, N., Bandyopadhyay, R., Bhuyan, M., Tudu, B., Ghosh, D., & Jana, A. (2008). Electronic nose for black tea classification and correlation of measurements with "Tea Taster" marks. IEEE Transactions on Instrumentation and Measurement, 57(7), 1313.
22. Banerjee (Roy), R., Chattopadhyay, P., Rania, R., Tudu, B., Bandyopadhyay, R., & Bhattacharyya, N. (2011). Discrimination of black tea using electronic nose and electronic tongue: A Bayesian classifier approach. In International Conference on Recent Trends in Information Systems 2011.
e-Visit Using Dynamic QR Code with Application Deep Linking Capability: Mobile-App-Based Solution for Reducing Patient's Waiting Time
Sudeep Rai, Amit Kumar Ateria, Ashutosh Kumar, Priyesh Ranjan, and Amarjeet Singh Cheema
Abstract Reducing patient waiting time in hospitals for effective clinical care is always a challenging task for hospital administrators and Hospital Management Information Systems (HMIS). With increasing patient loads at registration counters and limited health infrastructure, minimizing patient waiting time for efficient service delivery is of paramount importance. Our solution addresses this problem by reducing the turnaround time through minimizing the load at registration counters. By effective use of mobile technology and quick response codes, patients can self-register their visits, avoid long queues at registration counters, and go directly to the doctor's room. This helps hospital management provide better patient care, serve more patients, and plan hospital manpower effectively. This paper presents a pilot implementation of this solution in tertiary-level hospitals in India. We also present the results and statistics available after implementation of this solution and demonstrate how it affects turnaround time for patients.
Keywords Health informatics · Queue management · Quick response codes · Hospital resource management · Waiting time · Mobile technology · Deep linking
S. Rai · A. K. Ateria (B) · A. Kumar · P. Ranjan · A. S. Cheema Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India e-mail: [email protected] URL: https://cdac.in/ S. Rai e-mail: [email protected] A. Kumar e-mail: [email protected] P. Ranjan e-mail: [email protected] A. S. Cheema e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_7
1 Introduction
Hospital Management Information Systems (HMIS) are software systems comprising clinical and non-clinical modules that cater to various workflows in a hospital. The registration module of the HMIS is the first interaction point for patients: all patients visit the registration counter to get treated in the hospital. This module captures patient demographics and assigns department-wise and unit-wise queue numbers to patients, and all patients and doctors follow this queue number system. The module also takes care of doctors' rosters and schedules, patient referrals, and appointments. It operates in close conjunction with the billing and hospital reporting modules, where the data from patient registration is used for patient billing and hospital resource planning. Our HMIS [9] has been deployed and is operational at various public sector hospitals, including tertiary hospitals. The HMIS is a comprehensive solution that caters to multiple hospital workflows [6] and is also compliant with Electronic Health Record (EHR) standards [10, 11].
2 Related Work
A lot of work has been done on various approaches to queue management at hospitals and improvement of patient care delivery. The QR-based solution by Perdana et al. [5] focuses on embedding all clinical records into a QR code; on arrival at the hospital, the patient scans the QR code and this data is uploaded to their website along with a dynamic queue number assigned to the patient. This clinical information is also made available to the doctor. The solution provided by Hedau et al. [3] is a queue management system where patients can book appointments via a mobile app and real-time notifications are sent to patients to reduce waiting time at the OPD. A blockchain-based solution using QR codes by Manuel et al. [4] is also a point of study: the system encapsulates the patient's clinical data in QR codes stored in block chains. This system has significant importance as it has been implemented in one of the hospitals in India and was developed with a focus on the practical challenges in the Indian healthcare delivery system. Aizan et al. [1] present a token-based queue management system and a supporting Android-based system for the display of token numbers. Sahney [7] presents a queue management model implemented at a hospital in New Delhi, India, also based on Android apps and a web application for report generation and data analysis for better queue management. Some of the proposed solutions also introduced the Internet of Things with real-time queue tracking. To reduce waiting time for patients in specific service areas, some studies proposed solutions specific to waiting-time reduction for IPD surgeries [2], triage, or the observation period in emergency departments. A close relative of our solution is presented by Soman et al. [8], a mobile-augmented smart queue management system for hospitals. It proposes mobile-based token generation
and displays the same after fetching data from the HMIS web application through REST APIs. Doctors have the facility of calling the next queue number, recalling the previous queue number, or jumping to any specific queue number. Most of the existing solutions focus either on using QR codes for encapsulating and sharing the patients' clinical data or on assigning dynamic token numbers for queue management. Our solution focuses on improving the patient's waiting time for consultation by letting patients confirm their visit themselves at the doctor's room, with no or minimal intervention of the counter clerks in assigning queue numbers to patients.
3 Approach and Workflow
As per the existing workflow of the hospitals, all patients visit registration counters to get registered or to get a visit stamp for a department. Fetching the details of the patients, entering the visit details, and processing payments, if any, take a lot of time, so crucial time of the patients gets wasted at registration counters. In the proposed approach, whenever a patient revisits the hospital he goes directly to the department, where a dynamic QR code is available for scanning. The dynamic QR code has a validity of 30 min; after 30 min the code expires and a new code is generated. The patient then logs into the HMIS mobile app with his mobile number and scans the QR code under the Self-Registration option. After scanning, the system checks whether the code is valid. The QR code contains the following information:
1. Dynamic Token No.
2. Department.
3. Unit.
4. Geo Location.
5. Hospital Code.
If the code is valid, the system checks whether the department/unit is working. If the department and unit are working, the system checks the payment or renewal logic. If no renewal is due and no payment is pending, the system directly visit-stamps the patient in HMIS and the queue number is shown on the screen along with a message. A queue number slip download feature is also available in the mobile app. The patient is then consulted by the doctor in accordance with his queue number. To stop misuse of the QR stamping, such as a visit stamp by a patient who is not actually visiting the hospital, we have introduced a geo-location check: only if the patient is within 100 metres of the hospital boundary does the system allow the stamping. The system also shows the live queue number status and the Expected Waiting Time (ETA) so that the patient has real-time information about his turn.
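The checks above (code validity, the 30-minute expiry, and the 100-metre geo-fence) can be summarised in a short routine. The sketch below is purely illustrative: the payload field names and the haversine-based distance computation are our assumptions, not the actual HMIS implementation.

```python
import math
import time

# Illustrative QR payload; field names follow the list above but are assumptions.
qr_payload = {
    "token": 17, "department": "MEDICINE", "unit": "UNIT-1",
    "lat": 28.5821, "lon": 77.3266, "hospital_code": "HOSP01",
    "issued_at": time.time(),
}

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def validate_scan(payload, patient_lat, patient_lon, now=None):
    now = now or time.time()
    if now - payload["issued_at"] > 30 * 60:          # dynamic QR expires after 30 min
        return False, "QR code expired"
    if distance_m(payload["lat"], payload["lon"], patient_lat, patient_lon) > 100:
        return False, "Patient not within 100 m of the hospital"
    return True, "Visit stamped; queue number assigned"
```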
Fig. 1 QR-based e-visit process architecture
4 System Architecture
The system architecture of the proposed solution is presented in Fig. 1. There are three main components. The first component is web based and generates the QR code either department-wise or unit-wise along with the geo-location codes of the hospital. This QR code expires within a particular time frame, set at 30 min, after which a new QR code is generated. The second component is the mobile-app-based scanning of the QR code; the mobile app's location feature verifies that the patient is scanning the QR code within the hospital premises (within 100 m). The third component specifies that after scanning the QR code, the data goes to the HMIS application server through REST APIs; the server processes the data and verifies that the code is valid and not expired, along with geo-location verification. If the geo-location verification is successful, the patient's visit is confirmed in the hospital.
4.1 Features of QR-Based E-Visit
The key features of QR-based e-visit are as follows:
1. Minimize the queue at registration counters—The most useful feature of this solution is that the queue at registration counters is reduced significantly, as patients can go directly to departments and get their visit stamped.
2. Single-click visit stamp—In earlier systems, the patient interacts directly with registration clerks, who enquire about the patient, fetch the patient's data, enter the visit details into the system, and process the payment. This is a time-consuming and hectic process. Now, after logging into the mobile app, the patient can directly scan the QR code and, as soon as the scan completes, the visit is saved in the HMIS system. It takes only a few seconds and is a quick and easy process.
3. Dynamic QR generation and QR expiry—For security purposes, codes are made dynamic and expire after some time. New codes are generated automatically and displayed on screen.
4. Verification of patient geo-location—This feature enables the system to verify that the patient is within the hospital premises. If the patient is not within the hospital premises, the system does not allow the visit.
5. Live queue number status/Expected Waiting Time—Through the mobile app, the patient can view the last queue number attended by the doctor along with the expected waiting time. This helps the patient manage his crucial activities and plan other hospital activities like sample collection, billing, etc.
6. Application deep linking—With this feature, application URLs and other information can be deep linked into QR codes, enhancing their functionality. We use it to automatically open the mobile app or prompt for its installation from Google PlayStore.
7. Paperless visit—As soon as the visit is confirmed, the patient is provided a queue number slip in the mobile app. By showing this slip, the patient can enter the doctor's room as per his turn. No bar code or hard copy of the queue number is generated from the system, enabling a paperless visit.
8. Currently, this solution is primarily for consultation visits, which are the first interaction point for further treatment in the hospital and cover the majority of patients. The solution can be extended to other counters with high patient footfall, such as sample collection and billing counters, for assigning dynamic queue numbers to patients. This would allow more complex data to be analysed and better patient care services to be provided at all places where a similar type of functionality could co-exist.
9. As per the existing data available in the system, around 700 patients per day are being served with this solution. If we implement this solution at other hospital service points, this figure may increase by around 80%, i.e., up to about 1200 patients. We may also implement this solution where patient numbers are high and there are numerous mobile users using the HMIS mobile app.
4.2 Department-wise QR Generation Process
This section elaborates how a user can generate a department-specific QR code in the HMIS web application. After successfully logging into the HMIS application, the user performs the following activities to generate a QR code. The dynamic QR generation process is available in the registration module under the reports section.
1. Log into the HMIS application hmis.rcil.gov.in in a web browser.
Fig. 2 Department-wise QR generation: (a) generation process, (b) quick response code
2. Go to the Registration Module, Reports Section, and find the menu "Department-wise QR Generation".
3. Open the process and select the department and unit.
4. Click on the generate QR button.
5. The browser will now ask for geo-location access in the top left corner, Fig. 2a.
6. Allow access by clicking the allow button, then click on the generate QR button again.
7. The QR code is now displayed on the screen along with the latitude and longitude, Fig. 2b.
8. After 30 min this QR code expires and a new QR code is generated.
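A rough sketch of how such a department-wise dynamic QR code could be produced is given below; the qrcode Python package and the JSON payload layout are illustrative choices and not the components used by the HMIS web module.

```python
import json
import time
import uuid
import qrcode  # third-party package: pip install "qrcode[pil]"

def generate_department_qr(department, unit, lat, lon, hospital_code):
    """Build a payload for the current 30-minute window and render it as a QR image."""
    payload = {
        "token": uuid.uuid4().hex[:8],        # fresh dynamic token per window (assumed scheme)
        "department": department, "unit": unit,
        "lat": lat, "lon": lon,
        "hospital_code": hospital_code,
        "issued_at": int(time.time()),
    }
    img = qrcode.make(json.dumps(payload))    # returns a PIL image
    img.save(f"{department}_{unit}_qr.png")
    return payload

generate_department_qr("MEDICINE", "UNIT-1", 28.5821, 77.3266, "HOSP01")
```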
4.3 Mobile App Visit Process
This section focuses on visit stamping by the patient through the mobile app and elaborates the step-by-step process on the Android mobile app.
1. The patient logs into the HMIS Android mobile app with his mobile number.
2. If the patient is registered in HMIS, the patient's CR number and other demographic details are shown in the mobile app.
3. If the patient is not registered in HMIS, he has the option to register himself. After entering his demographics, he is registered in HMIS and a CR number is allotted to him. The CR number is a unique number across HMIS given to the patient for future follow-up in hospitals.
4. The patient can also register his family members with the same mobile number in the manner discussed above.
5. Now comes the main part, i.e., visit generation. For this, the patient taps on the Self-Registration icon. On tapping it, the mobile camera is turned on and the app asks for geo-location access, Fig. 3.
6. Scan the QR code with the help of the mobile camera.
7. The system now checks the QR code's validity and expiry. If the QR code is valid, the system verifies the patient's renewal and payment details. If no payment is required, the system directly stamps the patient's visit and generates the queue number slip, Fig. 3.
Fig. 3 Mobile app wire frames
Fig. 4 Real-time queue status
8. If payment is required, the system redirects to the Payment Gateway. After successful payment, the system saves the visit details of the patient.
9. The system also has a provision to show the current queue number status and ETA in the patient's mobile app, Fig. 4.
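The paper does not specify how the ETA is computed; one simple illustrative rule is to multiply the number of patients still ahead in the queue by an assumed average consultation time, as sketched below.

```python
def expected_waiting_time(my_queue_no, current_queue_no, avg_consult_minutes=5.0):
    """Rough ETA: patients still ahead in the queue times an assumed average consultation time."""
    ahead = max(my_queue_no - current_queue_no, 0)
    return ahead * avg_consult_minutes

print(expected_waiting_time(my_queue_no=24, current_queue_no=17))  # 35.0 minutes
```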
Fig. 5 Hospital statistics
Fig. 6 Time comparison : mobile and web stamping (Dataset: 338 patients)
4.4 Hospital KPIs and Statistics
This solution also comes with a live dashboard view of the e-visits being done in the hospitals, Fig. 5. It shows that patients are using this solution and that the count is increasing day by day. Hospital administration can use it for decision-making to improve patient care services, for example by deploying the staff involved in registration at other places for a better healthcare delivery platform. The dashboard also shows a comparison graph of the time taken by mobile-app-based visits and web-based visits from registration counters. The comparison dataset is for one day and around 338 patients (Fig. 6).
5 Conclusion and Future Work
In this paper, we presented a solution that can be very helpful in reducing patient waiting time. The solution is currently implemented in the PAN-India chain of Indian Railway hospitals and at AIIMS Bhopal. It is an effective use case of mobile technologies and shows how they can transform people's lives. With the recent advances in mobile communication technologies and high-speed network connectivity, mobile interfaces for HMIS workflows have started to evolve. This was an initial effort toward better patient care. Some future efforts that can make this solution more effective are as follows:
1. A similar solution can also be made functional at other counters like sample collection, billing, etc.
2. In addition, it can be used in other HMIS applications, as clinical data workflows can be configured for other hospitals.
References
1. Aizan, A. L., Mukhtar, A. Z., Bashah, K. A. A., Ahmad, N. L., & Mohd Ali, M. K. A. (2019). 'Walk-away' queue management system using MySQL and secure mobile application. Journal of Electrical Power and Electronic Systems, 1.
2. Cui-zhi, L. (2009). The application of electronic queue management system in the triage of the emergency department. Clinical Medical Engineering, 10.
3. Hedau, K., Dhakare, N., Bhongle, S., Hedau, S., Gadigone, V., & Titarmare, N. (2018). Patient queue management system. International Journal of Scientific Research in Computer Science, Engineering and Information Technology.
4. Manuel, A. C., Elias, B., Jose, C., Sivankutty, G., Satheesh, A. P., & Sivankutty, S. (2021). Patient care management using QR code: Embracing blockchain technology. International Journal of Engineering Research & Technology (IJERT), 09.
5. Perdana, R. H. Y., Taufik, M., Rakhmania, A. E., Akbar, R. M., & Arifin, Z. (2019). Hospital queue control system using quick response code (QR code) as verification of patient's arrival. International Journal of Advanced Computer Science and Applications, 10(8).
6. Ranjan, P., Soman, S., Ateria, A. K., & Srivastava, P. K. (2018). Streamlining payment workflows using a patient wallet for hospital information systems. In 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS) (pp. 339–344). IEEE.
7. Sahney, R. (2016). Smart OPD framework—new era in the digital healthcare initiative of Sir Ganga Ram Hospital. Current Medicine Research and Practice, 6(5), 204–207.
8. Soman, S., Rai, S., Ranjan, P., Cheema, A. S., & Srivastava, P. K. (2020). Mobile-augmented smart queue management system for hospitals. In 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS) (pp. 421–426). IEEE.
9. Soman, S., Ranjan, P., & Srivastava, P. (2020). A distributed architecture for hospital management systems with synchronized EHR. CSI Transactions on ICT, 8(3), 355–365.
10. Srivastava, P. K., Soman, S., & Sharma, P. (2017). Perspectives on SNOMED CT implementation in Indian HMIS. In Proceedings of the SNOMED CT Expo 2017. SNOMED International.
11. Srivastava, S., Soman, S., Rai, A., Cheema, A., & Srivastava, P. K. (2017). Continuity of care document for hospital management systems: An implementation perspective. In Proceedings of the 10th International Conference on Theory and Practice of Electronic Governance (pp. 339–345). ACM.
Gunshot Detection and Classification Using a Convolution-GRU Based Approach
Tanav Aggarwal, Nonita Sharma, and Naveen Aggarwal
Abstract The rising demand for firearms all over the globe has created a strong need for uninterrupted surveillance systems to monitor different regions. Personal firearm availability inciting civil violence and poaching, which has been a major contributor to the ever-increasing number of animal species facing the threat of extinction, are a few areas demanding immediate attention. Different infrastructure-based solutions like video surveillance and infrared thermography have been proposed to monitor and investigate illicit activity, but these are not cost-effective, especially in large dense areas with compact viewing angles. We propose an acoustic-based model to detect gunshots and further classify them based on type/caliber. The proposed solution uses a deep learning convolutional-GRU model to detect a gunshot in an audio stream and classify it by type/caliber. The gunshot data used contains 830 instances of eight different types of guns, which are overlapped on noisy backgrounds to create genuine instances. The audio, either real-time or pre-recorded, after computing Mel-Frequency Cepstral Coefficients (MFCCs), is converted into 2D images that are fed to the model. If the model recognizes a gunshot, the timestamp of the gunshot and the weapon caliber are returned, alerting the concerned authorities. Gunshots were detected correctly with an average classification accuracy of over 80%.
Keywords Gunshot classification · Audio signal processing · Acoustic event detection · Deep learning · CNN · GRU
1 Introduction
The civilian firearms industry has been growing substantially and is projected to grow twofold in the coming decade [1]. The rise in civilian-held or illegally acquired
95
96
T. Aggarwal et al.
firearms has made the need of continuous monitoring ineluctable. Residential areas and Protected Areas (PA) are two major areas of interest. Also, the number of species of animals in the endangered category continues to increase over the years, poaching has remained a major contributing factor as the gap between human settlements and protected areas keeps getting thinner, owing to the rapid urbanization. Detecting gunshots and predicting gun type can help reduce the response time of concerned security authorities and help in effective investigation. Depending on the conditions and the type of gun used, a typical gunshot can be heard within the range of a few kilometers. Methods like real-time video monitoring through wireless image sensor networks (WSNs) [2] and infrared thermography [3] have been adopted in the past to monitor and investigate any criminal activity, but feasibility issues like lack of infrastructure, dense dynamic environments, and human errors cannot be ignored. We propose an audio-based approach that can use both real-time audio capturing for monitoring purpose and pre-recorded media for investigating purposes, saving a lot of time and man power in terms of managing the large yet growing amount of area under surveillance. Thus, using audio sensors to observe all the acoustic activity with a gunshot acting as a trigger to alarm the concerned authorities significantly reduces the need of human monitoring and is thus more scalable. By using machine learning, we can design a model to process an audio segment to detect any gunshots, after selecting the acoustic features to identify gunshots. We can also use deep learning networks to build an end-to-end solution, without the need of feature selection. Convolutional Neural Networks (CNN) use convolution operation between layers allowing networks to be deeper with lesser parameters, making them suitable for tasks like image classification and signal processing. Recurrent Neural Networks (RNN) use internal memory to remember some information that it may need to use later, making it useful for problems involving sequential data. Previous works in this area include a feature extraction approach to extract magnitude corresponding to different frequency bands and feeding it to a neural network for gunshot classification [4] using smartphone device. Low-power consumption-based binary gunshot detector for forest areas [5], gunshot detection, and localization system using muzzle blast and shockwave strength to train neural network [6] has also been proposed, but without further classification. Fourier cosine coefficients have also been used to extract features of a gunshot sound and identify caliber for 80 gunshot sounds [7]. A Support Vector Machine (SVM) for gunshot detection and convolution neural network for classification [8] was also used. In this paper, we propose a single CNN-GRU deep learning model for the purpose of both gunshot detection and classification. Dataset used consisted of 830 gunshot sounds of eight different types/caliber. The model returns the timestamp of detected gunshot and its type, useful for both real-time and recorded audio streams. Keeping in mind that in actual application, the model might run on edge devices with limited computational power, we also compare the performance of multiple architectures for the CNN-GRU model. As the use-case lies in a dynamic environment, the solution needs to be robust. 
The audio file used for training and development dataset is synthesized artificially using audio overlap technique, where different gunshot sounds recorded from varying
positions with respect to the recorder are overlaid on different background sounds, trying to simulate real-world conditions. Time-domain audio signals are converted into the frequency domain to visualize them as heatmaps, converting the problem of acoustic event detection into an image object-detection task. The main contributions of this paper are as follows:
• Implementation and tuning of a single deep learning neural network for both gunshot detection and classification.
• One-vs-all analysis for classification of different types/calibers of gunshots.
• Comparison and analysis of different CNN-GRU architectures on the gunshot dataset.
The organization of the rest of the paper is as follows: Sect. 2 discusses related work in acoustics for gunshot detection and classification. Section 3 describes the relevant theory, the methodology followed, and the features used. The proposed model and system design are described in Sect. 4. Section 5 discusses the results of the classification experiments and compares the performance of different model architectures, followed by Sect. 6, which presents the conclusion.
2 Related Work
Sound is an important part of our lives and it contains a lot of information. Not only is it easier to collect and process in terms of the infrastructure required, it can also contain information that unveils patterns that would otherwise go unrecognized. A recent surge in available audio data has stimulated research in acoustic event detection and classification, to extract more information about an environment from audio [9]. Acoustic event detection aims to detect and label certain events in an audio stream, possibly with associated timestamps. Acoustic event classification aims to label an audio stream based on a particular set of characteristics. One approach is to use "bag-of-frames" [10], which represents a scene as a set of low-level local spectral features such as Mel-frequency Cepstral Coefficients (MFCCs). Another approach is to use a set of high-level features usually captured as "acoustic atoms", which represent an acoustic event and are usually learned in an unsupervised manner. Gunshot detection systems such as ShotSpotter have been used as part of audio-surveillance systems to assist law enforcement in many cities, but these can be expensive and require human intervention. In 2007, a system to detect anomalous acoustic events in public places, such as screams or gunshots, along with localization using two parallel GMM (Gaussian mixture model) classifiers was described [11]. A gunshot detection and localization system using muzzle blast and shockwave to train a neural network has also been presented [6]. For a system to be scalable, it needs to be cost- and power-effective while maintaining accuracy at the desired task. For surveillance in forest areas to discourage poaching activities, or in cities with lagging infrastructure, efficient systems are the way forward. Many algorithms
have been proposed, including short-term Fourier transform (STFT), Gaussian mixtures, Markov models, etc., for gunshot detection, classification, or localization. Their statistical evaluation in terms of accuracy and power requirements was done by Chacon-Rodriguez et al. [12]. A low-power binary gunshot detector for forest areas was proposed in 2015 [5]. Morehead et al. [13] proposed a low-cost gunshot detection system using deep learning on a Raspberry Pi. Further forensic analysis includes classification of the gunshot based on the type/caliber of gun, which can help investigations identify suspects. Different methods have been proposed for this, including using Fourier cosine coefficients to extract features and identify the caliber of a gunshot [7]. Kiktova et al. [14] used a two-phased selection process and Hidden Markov Model (HMM) classification to recognize the gun type. Frequency-band feature extraction to train a neural network [4], and a Support Vector Machine (SVM) for gunshot detection with a Convolutional Neural Network (CNN) for classification [8], were also presented. Dogan [15] proposed a fractal H-tree pattern and statistical feature-extraction-based classification model using Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers. In our work, we use a single deep learning neural network (CNN-GRU) model for both gunshot detection and classification. We use the spectral-temporal domain feature MFCCs (Mel-frequency cepstral coefficients) to train the neural network for classification of 830 gunshot recordings. Our objective is to analyze the model performance on the dataset and present a one-vs-all analysis for classification into different types/calibers of gunshot. We also compare multiple model architectures based on accuracy and trainable parameters, as lightweight models offer more utility. Section 5 contains the details of the experiments described.
3 Methodology
3.1 Converting Audio to MFCCs
Audio features can be categorized, based on the information used for their representation, as time-domain, frequency-domain, and time-frequency-domain features. Time-domain features include the amplitude envelope, RMS energy, zero crossing rate, etc., representing the respective characteristics of the signal with respect to time. Frequency-domain features of a signal tell us how much of the signal lies in a frequency band and include the band energy ratio, spectral centroid, spectral flux, etc. In the frequency domain we lose information about time, and in the time domain we lose information about frequency. Time-frequency-domain features of a signal, like spectrograms, the constant-Q transform, and MFCCs, help us preserve features of both the time and frequency domains. Time-frequency analysis is nothing but computing frequency-domain features on smaller chunks of the signal, which are then combined to obtain a collective feature. Figure 1 shows a typical frequency-domain feature pipeline showing
Fig. 1 Frequency-domain feature pipeline
the Analog-to-Digital Converter (ADC), framing, windowing, Fourier transform, and feature computation steps. To convert a signal from the time domain to the frequency domain, we use the Fourier Transform (FT). The Fourier transform represents a signal (audio, image, video) as an infinite sum of orthogonal signals, converting it into individual spectral components and providing frequency information about the signal. To store audio data digitally, we convert the continuous audio signal to discrete samples (continuous signal $g(t) \rightarrow x_n$, total samples $N$), so we have to use the Discrete Fourier Transform (DFT), which uses a summation instead of the integral in the Fourier transform:

$\hat{x}(f) = \sum_{n=0}^{N-1} x(n)\, e^{-i 2\pi f n}$    (1)
In practice, we use the Fast Fourier Transform (FFT) [16], an optimized algorithm to implement the DFT. It is faster as it exploits redundancies and keeps N (the number of samples) a power of 2. But as we want to retain time information, we cannot perform the FFT on the whole signal at once; instead we use the Short-Term Fourier Transform (STFT). In the STFT, as the name suggests, we use small overlapping frames, each representing a small chunk of the audio signal, and apply the FFT on each frame while sliding through the complete signal, so that each frame acts as a timestamp. For our experiment, we keep the frame size as 256 and the hop length as 128, where the total number of samples is 441000 (44100 Hz × 10 s) for our data. As a DFT yields a spectral vector, after applying the STFT on the signal we get a spectral matrix, whose values represent the angle spectrum in radians, with dimensions (frequency bins, frames).
Fig. 2 MFCCs representation of a background audio sample with dimensions (51, 8) no. of frames in X-axis act as a proxy of time
$\text{Frequency Bins} = \dfrac{\text{framesize}}{2} + 1$    (2)

$\text{Frames} = \dfrac{\text{samples} - \text{framesize} + 1}{\text{hoplength}}$    (3)
As humans perceive frequency on a logarithmic rather than a linear scale, the difference between 1000 and 2000 Hz sounds is more evident than the difference between 10,000 and 11,000 Hz, even though they are equal units apart. So the Mel scale is used, which is based on a logarithmic transformation of a signal's frequency (above 1000 Hz), where equally spaced sounds are perceived as equally distant. A Mel filter bank has dimensions (no. of Mel bands, frequency bins). A filter bank separates the input signal into multiple components. Typically we apply 40 triangular filters on the Mel scale to Y to extract frequency bands. The idea of a cepstrum (the inverse DFT of the log spectrum) was developed in the 1960s:

$C(x(t)) = F^{-1}[\log F[x(t)]]$    (4)
Applying the DCT (discrete cosine transform) on the 40 filter-bank coefficients, we can compress and decorrelate them. Mostly, cepstral coefficients 2–13 contain the most information. MFCCs describe the large-scale structure of the spectrum, ignoring fine structural details that might not be relevant. Figure 2 shows the MFCCs representation of a background audio file before adding a gunshot sound over it.
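Using the parameters given above (44.1 kHz audio, frame size 256, hop length 128, 40 Mel filters, eight coefficients), the feature extraction can be sketched with librosa; the file name is a placeholder and the library choice is ours, not the paper's.

```python
import librosa

# MFCC extraction sketch following the parameters stated above; "sample_clip.wav" is illustrative.
y, sr = librosa.load("sample_clip.wav", sr=44100)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=8,
                            n_fft=256, hop_length=128, n_mels=40)
features = mfcc.T          # shape: (frames, 8) — the frame axis acts as a proxy for time
print(features.shape)
```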
Fig. 3 MFCCs spectrum of the forest background audio sample with a 380 (pistol) gunshot audio file overlay on it
3.2 Constructing the Audio Dataset
To construct the audio dataset, we use two categories of audio files, stored in two directories: "backgrounds", containing different background sounds, and "gunshots", containing different gunshot sounds. For the background sound, we use different noisy scenarios, like windy or rainy settings and insect buzzing, to simulate the forest environment. For the gunshot sounds, we use the Gunshot Audio Forensics Dataset provided by the National Institute of Justice (NIJ) [17]. The length of the background audio clips is 5 s, and for gunshots it is between 1 and 3 s. The sampling rate of all audio files used is 44.1 kHz. To fabricate a training example, we select a random file from the "gunshots" directory and a random audio file from the "backgrounds" directory, and overlap the gunshot audio over the background audio at a random position, to prevent any fallacious pattern in the training data. The length of the fabricated audio file is the same as the length of the background audio file, i.e., 5 s. After overlapping, we use the eight-MFCC representation described in the previous subsection. As an example, a 380 (pistol) gunshot audio and a 0.308W (rifle) gunshot audio overlapped on forest background audio are shown in Figs. 3 and 4, respectively. Repeating these steps to generate multiple audio samples and convert them into MFCCs, we obtain our training and testing (development) datasets. As the gunshots in the data are recorded from varying positions, after adding the background noise the Signal-to-Noise Ratio (SNR) varies in the range 5–20 decibels (dB).
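The fabrication step just described — a random gunshot clip overlaid at a random offset on a random background clip — can be sketched as follows; the directory names follow the text, while the use of librosa and the returned onset time are illustrative choices.

```python
import os
import random
import librosa

def make_training_example(bg_dir="backgrounds", gun_dir="gunshots", sr=44100):
    """Overlay a random gunshot clip on a random background clip at a random offset;
    the result keeps the background clip's 5-second length."""
    bg, _ = librosa.load(os.path.join(bg_dir, random.choice(os.listdir(bg_dir))), sr=sr)
    gun, _ = librosa.load(os.path.join(gun_dir, random.choice(os.listdir(gun_dir))), sr=sr)

    start = random.randint(0, max(len(bg) - len(gun), 0))
    mixed = bg.copy()
    mixed[start:start + len(gun)] += gun[: len(bg) - start]   # additive overlap
    return mixed, start / sr    # audio plus gunshot onset time (used to label the timestamp)
```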
Fig. 4 MFCCs spectrum of the forest background audio sample with a 0.308W (rifle) gunshot audio file overlay on it
3.3 Training a Deep Learning Model
Using artificial neural networks, we can build an end-to-end model that lets us bypass steps like feature engineering, which would have been demanding for the gunshot classification task. Deep neural networks further allow each layer to specialize in a particular task. We use a series of CNN and RNN layers as discussed in the next section.
4 Model
For our task, as the input data is an image (shown in Figs. 3 and 4), we can use a Convolutional Neural Network (CNN), which uses a series of convolution and pooling layers to extract features while reducing the data dimensions until it is feasible to feed them to flat layers. Also, for detecting the exact timestamp of a gunshot and classifying it by type, the gunshot echo or the shockwave disturbance may play a major role, not only for classification but also for discarding false positives (gunshots have a longer echo than firecrackers). So we cannot ignore the sequentiality of the data, as past events may have a definite effect on future events, which makes it important to remember them. Recurrent Neural Networks (RNN) use memory to process recurring inputs, also providing connections between nodes of the same or previous layer. Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTMs) are specialized RNNs that solve the problem of vanishing gradients using gates to control the flow of information.
Fig. 5 Block diagram of system design
Fig. 6 Model structure of the convolutional-GRU network
As we have already converted the audio data to 2D MFCCs, we feed these images to a CNN-GRU model (architecture discussed in the next section) along with the output, which is a type-labeled timestamp vector containing the gun-type number at the timestamp of the detected gunshot and zeros elsewhere. A total of 830 instances of MFCCs were generated and used for training. Figure 5 shows the block diagram of the system design, where the frequency-domain feature extraction step was shown in detail in Fig. 1. Though LSTMs achieve slightly better accuracy on speech data than GRUs, they are more computationally expensive, and for smaller networks the difference in their performance is insignificant, with GRUs taking less time [18]. So we can safely assume that choosing GRU over LSTM will not affect the model accuracy and will be less computationally expensive. After experimenting with multiple neural network architectures, including varied numbers of stacked convolutional layers followed by varied numbers of stacked GRUs, the architecture with two 1D convolutional layers followed by two GRU layers showed better performance (discussed in detail in the next section). The model architecture diagram is shown in Fig. 6. We feed the input to a 1D convolutional layer having 196 units. The 216-timestamp MFCCs representation is converted into a 51-step output using a window size of 15 and a stride of 4 (⌈(216 − 15)/4⌉ = 51).
Table 1 Model layer-wise description with number of parameters

Layer type        Output shape      No. of parameters
Input             (None, 216, 8)    0
Conv1D_1          (None, 51, 196)   23,716
Activation        (None, 51, 196)   0
Conv1D_2          (None, 51, 196)   576,436
Activation        (None, 51, 196)   0
GRU_1             (None, 51, 128)   125,184
GRU_2             (None, 51, 128)   99,072
Time Distributed  (None, 51, 1)     129
Total parameters: 824,537
The convolutional layers reduce the dimensions of the input while preserving the important information, which speeds up the upcoming GRU (RNN) layers. The first convolutional layer is followed by a second 1D convolutional layer, which uses a stride of 1 with padding to keep the output shape constant. The output of both convolutional layers passes through an activation layer that uses "ReLU" activation. Following these layers are two GRU layers with 128 units each, which iterate over the input sequence from left to right and learn the sequentiality of the data. The final layer of the model is a time-distributed dense layer with a single unit for each of the 51 output steps. With sigmoid activation, it outputs a 1D vector of length 51 containing the gun-type number at the timestamps where a gunshot was detected, and 0 elsewhere. To optimize the gradient descent, the Adam optimizer is used. The layer-wise description of the model with the number of parameters associated with each layer is listed in Table 1.
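The architecture of Fig. 6 and Table 1 can be reproduced approximately in Keras, as sketched below. The layer sizes follow the text and Table 1; the loss function is our assumption, since the paper reports a cross-entropy loss curve but does not name the exact variant.

```python
from tensorflow.keras import layers, models

def build_model(time_steps=216, n_mfcc=8):
    """Conv-GRU sketch following the description above (input: 216 frames x 8 MFCCs)."""
    model = models.Sequential([
        layers.Conv1D(196, kernel_size=15, strides=4,
                      input_shape=(time_steps, n_mfcc)),        # 216 steps -> 51 steps
        layers.Activation("relu"),
        layers.Conv1D(196, kernel_size=15, strides=1, padding="same"),
        layers.Activation("relu"),
        layers.GRU(128, return_sequences=True),
        layers.GRU(128, return_sequences=True),
        layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

build_model().summary()
```

With these settings the summary reports roughly 824k trainable parameters, consistent with Table 1.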
5 Results
We fabricated a total of 830 recordings, 60 of which contain the 0.308W (rifle) gunshot, with 110 instances of each of the other 7 types of gunshots (listed in Table 2). Each recording was 5 s long with a 44.1 kHz sampling rate. Experiments were done on the dataset after dividing it into 75% for training and 25% for testing. Table 2 lists the types into which the gunshots are classified along with their detection and classification accuracy, on a one-vs-all basis. The model was trained for each of the listed gun types individually. It is seen that the model is able to accurately detect the gunshot each time, with the classification accuracy as shown; the 9mm pistol is classified more accurately by the model on a one-vs-all basis than the other gunshots in the dataset. As our output is a vector of length 51, along with classification of the gunshot into different types, the model also predicts the timesteps of the gunshot (each timestep corresponds to around 0.1 s), which can be useful for investigation purposes, saving resources.
Table 2 Accuracy of gunshot detection and classification for different gun types (one versus all)

Gun type                Detection accuracy (%)   Classification accuracy (%)
0.22LR (carbine)        100                      78
0.22LR (pistol)         100                      80
0.22LR (rifle)          100                      80
0.223R (rifle)          100                      80
0.308W (rifle)          100                      83
0.380 (pistol)          100                      81
7.62 × 39mm (carbine)   100                      81
9mm (pistol)            100                      85
Fig. 7 Model cross-entropy loss curve shown on Y-axis with number of epochs on X-axis, for both training and testing (development) dataset
For a look at the performance on the complete dataset, Fig. 7 shows the loss curves for the "train" and "development" sets of the selected model, with cross-entropy loss on the y-axis and epoch number on the x-axis. Classification of gunshots based on their type was done with an accuracy of 84% on the dataset. Architectures with fewer layers result in lower accuracy, but increasing the number of convolutional layers may not be computationally effective. The numbers of CNN and RNN layers are varied to understand their effect on model accuracy. Table 3 shows the performance of the different network architectures used in the experiments. Increasing the number of convolutional layers beyond 2 is not effective. Though model architecture 1 provides better accuracy than the others, it is worth mentioning that architecture 3 reduces the trainable parameters to 30% of architecture 1 without any significant loss in accuracy. Lightweight models (with fewer trainable parameters) help
Table 3 Accuracy of different model architectures used for the experiment

S.no.   Conv1D layers   GRU layers   Trainable parameters   Accuracy (%)
1       2               2            824,537                84
2       3               2            1,400,909              84
3       1               2            248,101                80
4       2               1            725,465                79
5       1               1            149,029                72
deploy deep learning solutions on the edge, saving power and computational cost incurred per device.
6 Conclusion
A gunshot detection and classification model using a single convolutional-GRU deep learning network has been built. The data used comprised different types of gunshot sounds overlapped on a noisy background to mimic real-case scenarios. Using MFCC features, the audio data was converted to 2D images to be fed to the neural network. Different architectural structures were experimented with; gunshots were classified into a total of eight different gun types with an accuracy of over 80% and were detected accurately almost every time. Rather than manually inspecting the usually large amount of pre-recorded audio/video data available when investigating the site of a gunshot, the deep learning neural network can help detect and classify the gunshot and pinpoint its timestamp rather speedily. This finds application in the security field for both monitoring and investigating purposes.
References
1. Karp, A. (2018). Estimating global civilian-held firearms numbers. JSTOR.
2. Zhang, J., Luo, X., Chen, C., Liu, Z., & Cao, S. (2014). A wildlife monitoring system based on wireless image sensor networks. Sensors & Transducers, 180(10), 104.
3. Mulero-Pázmány, M., Stolper, R., Van Essen, L., Negro, J. J., & Sassen, T. (2014). Remotely piloted aircraft systems as a rhinoceros anti-poaching tool in Africa. PloS one, 9(1), e83873.
4. Tangkawanit, S., Pinthong, C., & Kanprachar, S. (2018). Development of gunfire sound classification system with a smartphone using ANN. In 2018 International Conference on Digital Arts, Media and Technology (ICDAMT) (pp. 168–172). IEEE.
5. Hrabina, M., & Sigmund, M. (2015). Acoustical detection of gunshots. In 2015 25th International Conference Radioelektronika (RADIOELEKTRONIKA) (pp. 150–153). IEEE.
6. Galangque, C. M. J., & Guirnaldo, S. A. (2019). Gunshot classification and localization system using artificial neural network (ANN). In 2019 12th International Conference on Information & Communication Technology and System (ICTS) (pp. 1–5). IEEE.
7. Thumwarin, P., Wakayaphattaramanus, N., Matsuura, T., & Yakoompai, K. (2014). Audio forensics from gunshot for firearm identification. In The 4th Joint International Conference on Information and Communication Technology, Electronic and Electrical Engineering (JICTEE) (pp. 1–4). IEEE.
8. Raponi, S., Ali, I., & Oligeri, G. (2020). Sound of guns: Digital forensics of gun audio samples meets artificial intelligence. arXiv preprint arXiv:2004.07948.
9. Stowell, D., Giannoulis, D., Benetos, E., Lagrange, M., & Plumbley, M. D. (2015). Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10), 1733–1746.
10. Aucouturier, J. J., Defreville, B., & Pachet, F. (2007). The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music. The Journal of the Acoustical Society of America, 122(2), 881–891.
11. Valenzise, G., Gerosa, L., Tagliasacchi, M., Antonacci, F., & Sarti, A. (2007). Scream and gunshot detection and localization for audio-surveillance systems. In 2007 IEEE Conference on Advanced Video and Signal Based Surveillance (pp. 21–26). IEEE.
12. Chacon-Rodriguez, A., Julian, P., Castro, L., Alvarado, P., & Hernández, N. (2010). Evaluation of gunshot detection algorithms. IEEE Transactions on Circuits and Systems I: Regular Papers, 58(2), 363–373.
13. Morehead, A., Ogden, L., Magee, G., Hosler, R., White, B., & Mohler, G. (2019). Low cost gunshot detection using deep learning on the Raspberry Pi. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 3038–3044). IEEE.
14. Kiktova, E., Lojka, M., Pleva, M., Juhar, J., & Cizmar, A. (2015). Gun type recognition from gunshot audio recordings. In 3rd International Workshop on Biometrics and Forensics (IWBF 2015) (pp. 1–6). IEEE.
15. Dogan, S. (2021). A new fractal H-tree pattern based gun model identification method using gunshot audios. Applied Acoustics, 177, 107916.
16. Rao, K. R., Kim, D. N., & Hwang, J. J. (2010). Fast Fourier transform: Algorithms and applications.
17. Lilien, R. (2018). Development of Computational Methods for the Audio Analysis of Gunshots. Cadre Research Lab.
18. Shewalkar, A. N. (2018). Comparison of RNN, LSTM and GRU on speech recognition data.
Different Skin Tone Segmentation from an Image Using KNN for Sign Language Recognition
Rakesh R. Savant, Jitendra V. Nasriwala, and Preeti P. Bhatt
Abstract Color as a feature has the advantage of being invariant to scaling, rotation, and partial occlusion. Skin color segmentation has many applications such as sign language recognition, hand and face gesture recognition, biometric applications, face detection, and analysis of facial expressions. Given the importance of an effective skin segmentation method, Machine Learning (ML)-based skin segmentation approaches are studied in this paper. The objective of the work is to segment different human skin tones for sign language recognition. The skin segmentation dataset from the UCI machine learning repository is used to evaluate the effect of various supervised learning algorithms. Based on the comparison criteria, KNN is found to be the preferable classifier. Two color spaces, RGB and HSV, are considered in the experiments, and the HSV representation gives better performance in the segmentation of various skin tones than the RGB color space.
Keywords Skin segmentation · RGB and HSV color space · KNN · Various skin tones · Sign language recognition
1 Introduction In sign language recognition, the fingertips, finger folds, and hand orientations are significant features for recognizing signs, for example, to distinguish the similar-looking signs of the alphabet "V" and the number "2" in Indian Sign Language [1]. Color as a feature has the advantage of being invariant to scaling, rotation, and partial occlusion. Color images provide more information than gray images, and color information is faster to process than other features [2]. The color image is also more convenient than a gray image when extracting features [2]. Skin color-based segmentation is essential in many applications such as sign language recognition, hand and face gesture recognition, biometric applications, face detection, and analysis of facial expressions [3–5].
The pixel classification of skin and non-skin regions is considered an essential technique to segment the face and hands [6]. It is vital to keep relevant information by eliminating unwanted background details for further processing. Our study covers skin and non-skin pixel segmentation using machine learning approaches. There are two skin color segmentation techniques, i.e., pixel-based and region-based. In a pixel-based technique, each pixel in a given image is evaluated against some criteria to determine whether it is a skin or non-skin pixel. In a region-based technique, the spatial relationship of a pixel is considered to identify the skin region in the image [4]. For pixel-based segmentation, the availability of standard skin and non-skin pixel datasets allows a segmentation approach based on machine learning techniques to separate skin pixels from the image [2]. In this study, the skin segmentation dataset from the UCI Machine Learning Repository is used for the machine learning classifiers [7].
2 Related Work Much work has been reported on skin segmentation using different approaches, with each researcher providing different algorithms and perspectives. Some focused on color models to improve skin detection accuracy [2, 8, 9]. RGB is the default color space used to store digital images [10]. Different color spaces are used for applications such as image processing and computer graphics. The RGB, YUV, and CMY color models are suitable from a technical point of view but do not correspond to how humans perceive or describe colors. Much work has also been reported on skin and non-skin pixel classification using Machine Learning (ML)-based approaches [2, 4, 11–13]. In such methods, a model is trained on skin and non-skin pixel datasets and then classifies new pixels into skin and non-skin classes. Our study uses supervised machine learning techniques on the RGB and HSV color spaces for skin and non-skin pixel segmentation in an image.
2.1 Color Space Skin color is considered a significant feature to distinguish between skin and non-skin pixels in an image. Many color models have been invented to quantify color with numerical values. Each of them describes colors differently and suits different applications. The two well-known color spaces RGB and HSV are considered in our study. RGB. This is the common color space used by computers, graphics, and digital display systems. It is an additive color model with three channels, each representing one of the primary colors Red (R), Green (G), and Blue (B). Arbitrary intensities of the three channels (R, G, and B) are mixed to produce the desired color.
Linear and non-linear transformations convert the RGB color space to other color spaces [10]. HSV. The HSV color space is closer to how people perceive natural colors. The HSV color model consists of three components: Hue (H), Saturation (S), and Value (V). The Hue channel represents the specific color that is desired. The Saturation channel describes the amount of gray in a particular color. The Value channel works in combination with saturation and describes the brightness of the color from 0 to 100 percent, where 0 is completely black and 100 is the brightest [10].
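As a small illustration of the two representations, the snippet below converts a single RGB pixel to HSV with Python's standard colorsys module; the sample pixel value is arbitrary and only serves as an example.

```python
# Convert one RGB pixel to HSV; colorsys expects channel values scaled to [0, 1].
import colorsys

r, g, b = 205, 133, 63                                   # arbitrary skin-like RGB triple
h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
print(round(h * 360), round(s * 100), round(v * 100))    # hue in degrees, S and V in percent
```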
3 Experiments This section covers the analysis of the skin and non-skin classification dataset and a comparative study of the performance and accuracy of four supervised learning algorithms for skin segmentation. It also evaluates the effect of the RGB and HSV color spaces on skin and non-skin classification using the KNN classifier.
3.1 Dataset Description The Skin Segmentation dataset available in the UCI machine learning repository is used in this study for experiments and analysis [7]. The dataset has a total of 245,057 samples, of which 50,859 (21%) are skin samples and 194,198 (79%) are non-skin samples (Fig. 1). Each sample consists of the Red (R) (0–255), Green (G) (0–255), and Blue (B) (0–255) values of a pixel taken randomly from human face images, labeled 1 for skin and 2 for non-skin. The face images cover different age groups (young, middle, old), race groups (white, black, Asian), and genders and are obtained from the FERET and PAL databases [7]. The dataset has 51,444 unique samples (28% skin and 72% non-skin), and 11 distinct RGB samples out of 51,444 belong to both the skin and non-skin classes.
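A minimal loading sketch is given below; the file name, URL, and the B, G, R column order follow the usual layout of this UCI dataset but should be treated as assumptions and verified against the repository page.

```python
# Load the UCI Skin Segmentation data and check the class balance reported above.
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00229/Skin_NonSkin.txt"
data = pd.read_csv(URL, sep=r"\s+", names=["B", "G", "R", "label"])  # label: 1 = skin, 2 = non-skin

print(data.shape)                                    # expected: (245057, 4)
print(data["label"].value_counts(normalize=True))    # roughly 21% skin, 79% non-skin
print(len(data.drop_duplicates()))                   # number of unique samples
```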
3.2 Supervised Learning Algorithms Comparative Study In this study, we experimented with four different supervised learning methods (K-Nearest Neighbor, Decision Tree, Naïve Bayes, and Logistic Regression) for comparison and performance analysis. The confusion matrix is used as the performance evaluation tool. Table 1 explains the confusion matrix used to evaluate the performance of the supervised learning algorithms. For binary classification, the confusion matrix has four values, as described in Table 1. The diagonal values, namely true positives (TP) and true negatives (TN), hold the correct predictions.
Fig. 1 Skin and Non-skin class distribution in the dataset
Table 1 Confusion Matrix

Predicted \ Actual | True (1)             | False (0)
True (1)           | True Positives (TP)  | False Negatives (FN)
False (0)          | False Positives (FP) | True Negatives (TN)
False positives (FP) and false negatives (FN) are the wrong predictions made by the classifier. The accuracy of the classifier is calculated from the values in the confusion matrix using Eq. (1).

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)
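A short sketch of this evaluation step with scikit-learn is shown below; the label vectors are dummy placeholders for the 20% held-out split, and the cell names follow scikit-learn's row-is-actual convention with skin (label 1) treated as the positive class.

```python
# Compute the confusion matrix and the accuracy of Eq. (1) for a binary classifier.
from sklearn.metrics import confusion_matrix, accuracy_score

y_test = [1, 1, 2, 2, 1, 2]   # dummy ground truth (1 = skin, 2 = non-skin)
y_pred = [1, 2, 2, 2, 1, 2]   # dummy predictions from a fitted classifier

cm = confusion_matrix(y_test, y_pred, labels=[1, 2])  # rows: actual, columns: predicted
tp, fn, fp, tn = cm.ravel()                           # positive class = skin (label 1)
accuracy = (tp + tn) / (tp + fp + fn + tn)            # Eq. (1)
assert abs(accuracy - accuracy_score(y_test, y_pred)) < 1e-12
```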
Table 2 covers the comparative analysis of the four supervised learning methods with their accuracy and confusion matrix values. For this analysis, the dataset is divided in an 80:20 ratio, where 80% of the samples form the training set and 20% form the testing set, i.e., out of the 245,057 samples in the dataset, 196,045 samples are used for training and 49,012 samples are used for testing the classifier. The confusion matrix values in Table 2 are derived as follows: TP—skin samples predicted as the skin class; TN—non-skin samples predicted as the non-skin class; FN—non-skin samples predicted as the skin class; FP—skin samples predicted as the non-skin class. From Table 2, we can conclude that the accuracy of KNN is the highest among the supervised learning methods. The accuracies of KNN and the Decision Tree are close, differing by only 0.04%, but the Decision Tree has more false negative (FN) predictions than KNN.
Table 2 Classifier accuracy and confusion matrix entries

Classifier | Accuracy (%) | TP     | TN     | FN   | FP
KNN        | 99.95        | 10,346 | 38,641 | 2    | 23
DT         | 99.91        | 10,325 | 38,643 | 23   | 21
NB         | 92.26        | 7553   | 37,667 | 2795 | 997
LR         | 91.84        | 8484   | 36,530 | 1864 | 2134
Fig. 2 Real image segmentation results; A Original image, B KNN, C DT, D NB, and E LR
The skin segmentation dataset used in the study is highly imbalanced; decision trees are biased on imbalanced datasets, whereas the same dataset gives good predictions with KNN. Due to this bias, the decision tree's predictions on a real image are not as good as those of KNN. The visual analysis of the various methods is given in Fig. 2. The experimental results on the skin dataset in Fig. 2 reveal that, in most cases and regardless of parameter settings, the result of KNN is outstanding compared to the other supervised learning methods. The different skin tones can also be classified using machine learning-based skin segmentation. In KNN, the value of K is a hyperparameter used to tune the model. If the value of K is too low, the model overfits: the accuracy on the dataset is very high, but it drops when the same model is applied to real data. Conversely, a high K value causes underfitting and is computationally expensive. There is no concrete method to decide the value of K; selecting it is a trial-and-error process [10]. One common heuristic is to take K as the square root of the total number of samples in the training set, i.e., sqrt(number of training samples). The distance measure is another hyperparameter in KNN; our study uses the widely used Euclidean distance to compute the distance between neighbors in the classification process [14]. The experiments in Table 3 show the effect of K in KNN with the corresponding confusion matrix values. From Table 3, we can conclude that the highest accuracy is obtained when the value of K is smallest. Increasing K decreases the accuracy, increases the computation time, and affects the false positives (FP) and false negatives (FN) on the test samples. While training and testing on the dataset, the false positives (FP) increase as K increases; as discussed previously, overfitting can occur when the training accuracy is very high.
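The sketch below illustrates this setup: the square-root heuristic for K, Euclidean distance, and an 80:20 split. It reuses the assumed URL and column order from the loading sketch in Sect. 3.1.

```python
# KNN on the UCI skin data with K = sqrt(number of training samples).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00229/Skin_NonSkin.txt"
data = pd.read_csv(URL, sep=r"\s+", names=["B", "G", "R", "label"])

X = data[["R", "G", "B"]].to_numpy()
y = data["label"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

k = int(round(np.sqrt(len(X_train))))                  # sqrt(196,045) is approximately 443
knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
knn.fit(X_train, y_train)
print(accuracy_score(y_test, knn.predict(X_test)))
```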
Table 3 Summary of experimental results of the KNN algorithm with different K values

Value of K | Accuracy (%) | TP     | TN     | FN | FP  | Correct Classified | False Classified
K = 3      | 99.957       | 10,345 | 38,646 | 3  | 18  | 48,991             | 21
K = 5      | 99.948       | 10,346 | 38,641 | 2  | 23  | 48,987             | 25
K = 10     | 99.942       | 10,347 | 38,637 | 1  | 27  | 48,984             | 28
K = 20     | 99.912       | 10,347 | 38,622 | 1  | 42  | 48,969             | 43
K = 40     | 99.895       | 10,347 | 38,614 | 1  | 50  | 48,961             | 51
K = 80     | 99.857       | 10,345 | 38,597 | 3  | 67  | 48,942             | 70
K = 100    | 99.808       | 10,345 | 38,573 | 3  | 91  | 48,918             | 94
K = 200    | 99.726       | 10,345 | 38,533 | 3  | 131 | 48,878             | 134
K = 300    | 99.702       | 10,344 | 38,522 | 4  | 142 | 48,866             | 146
K = 400    | 99.673       | 10,340 | 38,512 | 8  | 152 | 48,852             | 160
K = 450    | 99.602       | 10,339 | 38,478 | 9  | 186 | 48,817             | 195
K = 500    | 99.585       | 10,338 | 38,471 | 10 | 193 | 48,809             | 203
K = 700    | 99.412       | 10,335 | 38,389 | 13 | 275 | 48,724             | 288
K = 800    | 99.236       | 10,318 | 38,320 | 30 | 344 | 48,638             | 374
K = 900    | 99.175       | 10,312 | 38,296 | 36 | 368 | 48,608             | 404
Increasing the value of K causes underfitting, which affects the prediction on real data. The effect of K is summarized in Fig. 3 and Table 3. The lowest K values give high accuracy on the dataset, but the prediction on real data is not accurate because the model is overfitted. As K increases, some skin regions in the image are wrongly predicted by the model. Since our dataset has a large number of samples, the results show that K values in the range of 400 to 500 give good results on real data. The heuristic value of K, i.e., the square root of the total number of training samples, is 443 (sqrt(196,045)), and in our experiments K = 450 gives comparatively good results. With this setting, the model also predicts the different skin tones in an image and gives good segmentation accuracy on images with varying skin tones.

Fig. 3 Summary of results of different K values in segmentation of varying skin tone pixels in an image

Further experiments are performed in the HSV color space to classify the skin and non-skin pixels of an image. The RGB values from the dataset are converted into HSV values before training and testing the classifier. The algorithm below describes skin segmentation using the KNN classifier in the HSV color space; a code sketch of these steps is given after the list.

Step 1: Load the skin segmentation dataset.
Step 2: Convert the RGB values to their equivalent HSV values.
Step 3: Split the dataset into training and testing sets (80:20).
Step 4: Train and test the classifier on the dataset.
Step 5: Input an image.
Step 6: Convert the RGB image to HSV.
Step 7: Resize the image (128 × 128).
Step 8: Scan each pixel of the image and predict its class (skin or non-skin): if the prediction is 1 (skin), keep the pixel color values as they are; if the prediction is 2 (non-skin), set the pixel to black (0).
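A sketch of Steps 5–8 with OpenCV is given below. The image path and the tiny training rows are placeholders (in practice the classifier would be trained on the HSV-converted UCI data from Steps 1–4), and note that OpenCV scales the H channel to 0–179, so the same conversion must be used for both training and prediction.

```python
# Per-pixel skin segmentation of an input image in HSV space (Steps 5-8).
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder classifier: replace with a model trained on the full HSV dataset.
hsv_train = np.array([[10, 150, 200], [20, 120, 180], [100, 40, 40], [60, 30, 90]], dtype=np.float32)
labels_train = np.array([1, 1, 2, 2])               # 1 = skin, 2 = non-skin
knn = KNeighborsClassifier(n_neighbors=1).fit(hsv_train, labels_train)

img = cv2.imread("hand.jpg")                        # placeholder input image (BGR)
img = cv2.resize(img, (128, 128))                   # Step 7
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)          # Step 6 (OpenCV H channel is 0-179)

pixels = hsv.reshape(-1, 3).astype(np.float32)      # one row per pixel
pred = knn.predict(pixels)                          # Step 8: classify every pixel

mask = (pred == 1).reshape(128, 128)
segmented = img.copy()
segmented[~mask] = 0                                # non-skin pixels set to black
cv2.imwrite("segmented.jpg", segmented)
```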
From the comparison given in Table 4, the accuracy on the dataset is almost the same for the RGB and HSV color spaces, but for pixel classification on a real image, the accuracy of the HSV color space is noticeably better than that of RGB. We tested various K values for both color spaces and found that HSV with K = 150 gives results equivalent to RGB with K = 450; Table 5 shows this comparison. Figure 4 indicates that, for predictions on a real image, the HSV color space gives impressive results compared to RGB while requiring a smaller K value. We conclude that skin tone segmentation in the HSV color space achieves good accuracy with a smaller K value than the RGB color space.
Table 4 Comparative summary of experimental results of the KNN algorithm with RGB and HSV color space

Value of K | Color Space | Accuracy (%) | TP     | TN     | FN | FP  | Real image pixel classification accuracy (%)
K = 5      | HSV         | 99.946       | 10,347 | 38,639 | 1  | 25  | 70.92
K = 5      | RGB         | 99.948       | 10,346 | 38,641 | 2  | 23  | 72.49
K = 10     | HSV         | 99.936       | 10,348 | 38,633 | 0  | 31  | 75.24
K = 10     | RGB         | 99.942       | 10,347 | 38,637 | 1  | 27  | 74.59
K = 20     | HSV         | 99.914       | 10,348 | 38,622 | 0  | 42  | 77.46
K = 20     | RGB         | 99.912       | 10,347 | 38,622 | 1  | 42  | 75.75
K = 40     | HSV         | 99.846       | 10,348 | 38,589 | 0  | 75  | 77.80
K = 40     | RGB         | 99.895       | 10,347 | 38,614 | 1  | 50  | 76.56
K = 100    | HSV         | 99.738       | 10,348 | 38,536 | 0  | 128 | 78.94
K = 100    | RGB         | 99.808       | 10,345 | 38,573 | 3  | 91  | 77.30
Table 5 KNN algorithm comparison with RGB and HSV color space

Value of K | Color Space | Accuracy (%) | TP     | TN     | FN | FP
K = 150    | HSV         | 99.708       | 10,348 | 38,521 | 0  | 143
K = 450    | RGB         | 99.602       | 10,339 | 38,478 | 9  | 186
Fig. 4 Comparison of RGB and HSV color space-based skin segmentation using KNN with different K values to segment various skin tones
4 Conclusion This study addressed the skin segmentation problem using supervised learning (SL) methods. Four SL algorithms, namely KNN, DT, NB, and LR, were considered. The experiments and analysis show that KNN performs best, so further experiments were carried out with KNN. Comparative experiments with the KNN classifier studied the effect of the hyperparameter K on both the dataset and real image pixels. Based on these experiments, the HSV color space is found suitable for segmenting various skin tones, and the comparative study concludes that skin segmentation in the HSV color space gives better results than in the RGB color space. For future work, we suggest examining the skin segmentation dataset with neural networks and deep learning techniques and with other color spaces.
References 1. Savant, R., & Ajay, A. (2018). Indian sign language recognition system for deaf and dumb using image processing and fingerspelling: A technical review. National Journal of System and Information Technology, 11(1). 2. Naji, S., Jalab, H. A., & Kareem, S. A. (2019). A survey on skin detection in colored images. Artificial Intelligence Review, 52(2), 1041–1087. 3. Ahmad, T. A. A. N., & Zakarya, F. A. R. O. U. (2020). Supervised learning methods for skin segmentation classification. 4. Dastane, T., Rao, V., Shenoy, K., & Vyavaharkar, D. (2021). An effective pixel-wise approach for skin colour segmentation using pixel neighbourhood technique. arXiv:2108.10971. 5. Sikandar, T., Ghazali, K. H., Mohd, I. I., & Rabbi, M. F. (2017, July). Skin color pixel classification for face detection with hijab and niqab. In Proceedings of the International Conference on Imaging, Signal Processing and Communication (pp. 1–4). 6. Chen, W., Wang, K., Jiang, H., & Li, M. (2016). Skin color modeling for face detection and segmentation: A review and a new approach. Multimedia Tools and Applications, 75(2), 839–862. 7. Bhatt, R., & Dhall, A. Skin segmentation dataset. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation. 8. Loke, P., Paranjpe, J., Bhabal, S., & Kanere, K. (2017, April). Indian sign language converter system using an android app. In 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA) (Vol. 2, pp. 436–439). IEEE. 9. Shaik, K. B., Ganesan, P., Kalist, V., Sathish, B. S., & Jenitha, J. M. M. (2015). Comparative study of skin color detection and segmentation in HSV and YCbCr color space. Procedia Computer Science, 57, 41–48. 10. McBride, T. J., Vandayar, N., & Nixon, K. J. (2019, January). A comparison of skin detection algorithms for hand gesture recognition. In 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA) (pp. 211–216). IEEE. 11. Coogan, T., Awad, G., Han, J., & Sutherland, A. (2006, November). Real time hand gesture recognition including hand segmentation and tracking. In International Symposium on Visual Computing (pp. 495–504). Berlin, Heidelberg: Springer. 12. Barui, S., Latha, S., Samiappan, D., & Muthu, P. (2018). SVM pixel classification on colour image segmentation. Journal of Physics: Conference Series, 1000(1), 012110. IOP Publishing.
13. Monisha, M., Suresh, A., & Rashmi, M. R. (2019). Artificial intelligence based skin classification using GMM. Journal of Medical Systems, 43(1), 1–8. 14. George, Y., Aldeen, M., & Garnavi, R. (2017). A pixel-based skin segmentation in psoriasis images using committee of machine learning classifiers. In 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (pp. 1–8). IEEE.
MuteMe—An Automatic Audio Playback Controller During Emergencies Jeremy Dylan D’Souza, Venkitesh S. Anand, Akhil Madhu, and Shini Renjith
Abstract Earphones and headphones are devices that are very widely used these days for a number of purposes. Whether it is to listen to our favorite songs or to watch a movie, we all end up using one of these devices at some point during the day. With the COVID-19 pandemic that struck us last year, we have been confined to our homes for the most part, which has further increased the usage of such devices; they have become an essential part of our daily lives. Since the start of the pandemic in 2020, there has been a significant rise in the number of people purchasing wearable earphones, headphones, and other audio devices. A study by the International Data Corporation showed a 144.3% YoY growth in earwear devices in 2020, and the team at boAt also reported significant growth in products purchased by consumers during the pandemic period. Earphones and headphones are available from a plethora of brands and in a wide range of prices depending on our requirements. Despite these differences in models and brands, one thing all earphones and headphones have in common is that they are designed to provide a higher level of isolation from the surrounding environment than listening to the same media through basic loudspeakers. It is also known that people prefer higher levels of isolation while listening to media and are willing to pay a higher price to achieve it, and manufacturers capitalize on this knowledge. There are special models of earphones with Noise Canceling Technology, which isolate the user so well that they can barely hear anything going on in their surroundings. There are positives to this, but there are also negatives. Suppose a user listening to audio with noise-canceling headphones is being called out to by somebody in their vicinity in the event of an emergency. This level of isolation makes it extremely difficult to alert the user and can even be a threat to their life. Our proposed system aims to combat this problem by using relatively simple voice recognition software combined with preexisting hardware in the user's device.
Keywords Voice recognition · Playback · Headphone safety · Machine learning · Mobile application
1 Introduction We are currently living in, and heading further toward, the age of automation, where pretty much everything we do is either voice-activated or automated in some form. These innovations are designed to make human lives easier, and even though they may be small things, automation really does help us out for the better. Our idea is one such innovation that we call MuteMe, a safety software feature that can be implemented in mobile devices to ensure the safety of the user and their surroundings in the event of an emergency. The basic idea was derived from a simple notification behavior that all of us see on our mobile phones: suppose we are listening to a song or watching a video on our device and a notification pops up. As long as the phone is not in silent mode, the OS is designed to mute or temporarily soften the audio playback in the background to let the notification tone ring out loudly enough, a simple but effective feature to make sure that the user does not miss any important notifications. We have taken this very idea and applied it to the "notifications" of our surroundings. An example would be a person with their earphones plugged in, gaming away on their phone. If a family member calls out to this person in the event of an emergency, it is highly likely that they will not hear it due to the increased isolation technology of modern earphones and headphones. Our software proposes to avoid such a situation by making use of existing hardware combined with our software to provide a neat and simple solution. The rest of this paper is organized as follows. Section 2 summarizes the related literature and Sect. 3 briefs the proposed architecture. Section 4 explains the implementation details, Sect. 5 performs a comparative analysis, and Sect. 6 concludes the paper along with the details of our next steps.
2 Related Works There is a decent amount of literature on voice recognition technology used with earphones and headphones to improve the safety of the user wearing them. Ghosh et al. [1] proposed a simple design that aims to detect vehicular threats to the user in their immediate environment. It works by detecting the sound of a horn, using the preexisting microphone on the mobile device to capture vehicle horns and alert the user in real time. Multiple horn samples were collected in the testing phase and reduced to the most common horn frequency, and two clips of different lengths were used for evaluation.
One clip of 30 min duration was fed to the algorithm, which achieved 80% accuracy and close to 50 ms response time in alerting the user; the second clip of 1 h duration yielded an accuracy of 75% with a reduced response time of around 40–45 ms. Xia et al. [2] came up with an intelligent wearable system called PAWS. The Pedestrian Audio Wearable System, or PAWS, is a low-cost headset-based wearable device that aims at improving urban safety. It combines four highly sensitive MEMS microphones, signal processing, and machine learning to detect and locate imminent dangers such as approaching vehicles and give real-time alerts to the user. An improved system called PAWS Low-Cost is also implemented, which builds on the basic PAWS and improves the power consumption of commercial off-the-shelf parts by offloading critical and computationally expensive features. Basu et al. [3] proposed a "smart headphone" device that is designed to detect and relay speech sounds in a user's environment through their headphones or earphones. This is aimed mostly at a home or social setting, where a user might miss their name being called out while plugged into their mobile device. The idea relies on two pieces of signal processing: the first is a speech recognition algorithm designed to robustly differentiate speech from other sounds in an environment, and the second is a method to identify the direction from which the speech originates, implemented using a body-based array. Das et al. [4] developed a software-based application that can be integrated into modern smartphones as a stock app right out of the box. It is designed to utilize the existing hardware inside the mobile device to cut costs and make the implementation as neat as possible with no external hardware requirements. It is non-maskable software, which means that, short of shutting off the device, nothing can be done to prevent this application from working or its notifications from coming through; it always remains in working mode whether the user is using the device or not, to ensure the maximum safety of the user. Sounds are detected only within a fixed distance of 50 m from the user, which gives enough time for the user to be alerted to any imminent danger. When a vehicle approaches the user within 50 m, the antenna in the car transmits a prerecorded message through narrowband FM modulation to the user's device, which then alerts the user through their earphones or headphones that a vehicle is approaching. Chen et al. [5] published a review on how IBM Watson, a cognitive computing technology, has been configured to support life science research; cognitive technologies offer solutions for integrating and analyzing big datasets, which is what IBM Watson excels in as one of the most widely used cognitive technologies. Anggraini et al. [6] created a speech recognition application for the speech impaired using the Android-based Google Cloud Speech API. In this research, the authors developed a speech recognition application that can recognize the speech of the speech impaired and translate it into text, with input in the form of sound detected on a smartphone. The Google Cloud Speech API converts the captured audio to text and such APIs are straightforward to use.
The API integrates Google Cloud Storage for data storage. Jadhav et al. [7] published work on sound classification using Python. Their research is based on automatic sound classification and its various real-life applications, using the Python language and deep learning techniques.
3 Proposed Architecture MuteMe is a proposed system that consists of three subsystems, as shown in Fig. 1; it is the collective operation of these that makes MuteMe possible. The keyword database system is responsible for storing and maintaining the collection of keywords, keywords being the words used to trigger the software to stop audio playback from the device. It is essentially a database management system and interfaces with the device to let the user add, change, or delete keywords. An additional responsibility is to serve the keyword list to the Recognition Engine in order to check for matches between the translated audio and the keywords. The Recognition Engine is ideally a process that runs in the background; using the Device Interface, it gets audio input from the microphone(s) and also works with the Keyword Database to check for matches. The Device Interface takes care of the user interface and also interfaces with the device to get data from the input audio stream and issue commands to the output audio stream. The most fundamental job of MuteMe is to pause all audio playback from the mobile device upon the detection of a keyword, i.e., a name. The application begins operation like any other application; the user has to open it and start the process. After it starts, it runs in the background while the user goes about their tasks. In the background, the application continuously receives an input audio feed from the microphone(s) of the device or earphones. This feed is analyzed and translated by the Recognition Engine to produce corresponding strings. These strings are then checked against a set of user-defined keywords stored in the application.

Fig. 1 Architecture
Once a string matches, which means that the user is being called but is not aware of it because they are using earphones, the application pauses all audio output from the device, enabling the user to hear the sound from their environment and whoever is calling them.
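A minimal desktop prototype of this matching loop is sketched below. It assumes the third-party Python speech_recognition package with a Google speech backend; the keyword set and the pause_playback() call are hypothetical placeholders for the Keyword Database and the platform-specific Device Interface described above.

```python
# Prototype of the MuteMe loop: listen, transcribe, match keywords, pause playback.
import speech_recognition as sr

KEYWORDS = {"alex", "fire", "help"}          # user-defined trigger words (illustrative)

def pause_playback():
    # Placeholder: on a real device this would call the OS media-session API.
    print("Audio playback paused")

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    while True:
        audio = recognizer.listen(source, phrase_time_limit=3)
        try:
            text = recognizer.recognize_google(audio).lower()
        except sr.UnknownValueError:
            continue                         # nothing intelligible was heard
        if any(word in text.split() for word in KEYWORDS):
            pause_playback()
```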
4 Implementation This section gives a detailed explanation of how to implement the proposed project. The aim is to give an idea about the feasibility and complexity of the implementation.
4.1 Platform MuteMe is proposed as a built-in mobile feature, delivered as a software update injected into the OS from the manufacturer's side. Initial development is confined to Android and iOS implementations. Devices as far back as Android Froyo (API level 8) and iOS 6 are supported, considering the speech recognition capabilities mentioned above (Fig. 2).
4.2 Software Development Like most smartphone OS components, these features need to be coded in C/C++, C#, or Kotlin and packaged as libraries of modules for unit-wise operation. APIs for each capability, such as voice recognition and frequency monitoring, are also used. We calibrate these APIs using machine learning to understand the sound intensity range that should be considered for active listening.
4.3 Machine Learning Machine learning is a major part of the system, as it helps to continuously improve the accuracy of voice recognition and eliminates the possibility of false alarms. It also helps to isolate sounds from other sources that might be around the same frequency or intensity as the alarming sounds this application listens for. It could additionally be used to learn the user's regular environmental sound exposure in order to calculate the range of expected sounds to scan for. We have considered a reference range of 70–129 dB (the world record sound intensity produced by a human voice) to isolate alarming cries from normal speech, but this range could be calibrated using machine learning.
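The sketch below shows one way such an intensity gate could be realized: estimate the level of a microphone frame and compare it with a threshold. The dBFS threshold and the dummy frame are assumptions; a real device would need calibration of the microphone against absolute dB SPL values such as the 70 dB bound mentioned above.

```python
# Estimate the RMS level of an audio frame in dBFS and gate further analysis on it.
import numpy as np

def frame_level_db(frame: np.ndarray) -> float:
    """Return the RMS level of a float frame (values in -1..1) in dBFS."""
    rms = np.sqrt(np.mean(np.square(frame))) + 1e-12   # avoid log(0) on silence
    return 20.0 * np.log10(rms)

THRESHOLD_DBFS = -20.0                                  # placeholder; calibrate to ~70 dB SPL
frame = np.random.uniform(-0.5, 0.5, 16000)             # one second of dummy audio at 16 kHz
if frame_level_db(frame) > THRESHOLD_DBFS:
    print("loud enough to pass to the recognition engine")
```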
Fig. 2 Flowchart of MuteMe functionality
5 Comparative Analysis The proposed MuteMe solution outperforms traditional systems in multiple areas. The key differentiators include the following: ● Ability to deal with multiple scenarios: The proposed model considers audio recognition in different scenarios, as opposed to a single situation such as traffic or a crowded place, meaning that a wide range of acoustic environments is covered by this proposal. ● No extra devices required: The proposed model works in the background of the user's mobile phone and hence does not require any wearable device, which means less expense to adopt this technology. ● Easy to set up and use: Only a one-time setup is required unless recalibration is needed, which can be done manually by the user. The procedure is very simple and the UI is designed to cause little to no confusion.
6 Conclusion The idea of MuteMe was conceived as a result of the new lifestyle we adopted during the pandemic. Staying, working, and spending more time at home only increases the time we spend with our devices, and for headphone users it leads to even longer headphone usage. In view of this situation, our idea could help in emergency situations and reduce the response time of individuals who would otherwise need some way other than audio to notice the emergency. The next step with this idea is to gather the specific technologies mentioned above and integrate them into a fully functional mobile application. The specified technologies require only simple, fundamental functions of a mobile device and should not require any special or new technical innovations.
References 1. Ghosh, D., Balaji, K., & Paramasivam, P. (2015). A method to alert user during headset playback by detection of horn in real time. In 2015 IEEE Recent Advances in Intelligent Computational Systems (RAICS) (pp. 50–54). https://doi.org/10.1109/RAICS.2015.7488387. 2. Xia, S., et al. (2019). Improving pedestrian safety in cities using intelligent wearable systems. IEEE Internet of Things Journal, 6(5), 7497–7514. https://doi.org/10.1109/JIOT.2019.2903519 3. Basu, S., Clarkson, B., & Pentland, A. (2001). Smart headphones: enhancing auditory awareness through robust speech detection and source localization. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) (Vol. 5, pp. 3361–3364). https://doi.org/10.1109/ICASSP.2001.940379.
4. Das, S., Banerjee, D., Kundu, S., Das, S., Kumar, S., & Ghosh, G. (2017). A new idea on road safety using smartphone. In 2017 1st International Conference on Electronics, Materials Engineering and Nano-Technology (IEMENTech) (pp. 1–4). https://doi.org/10.1109/IEMENTECH.2017.8077002. 5. Chen, Y., Argentinis, J. D. E., & Weber, G. (2016). IBM Watson: how cognitive computing can be applied to big data challenges in life sciences research. Clinical Therapeutics, 38(4), 688–701. https://doi.org/10.1016/j.clinthera.2015.12.001. 6. Anggraini, N., et al. (2018). Speech recognition application for the speech impaired using the android-based google cloud speech API. Telkomnika, 16(6), 2733–2739. https://doi.org/10.12928/telkomnika.v16i6.9638. 7. Jadhav, S., Karpe, S., & Das, S. (2021). Sound classification using python. In ITM Web of Conferences (Vol. 40). EDP Sciences. https://doi.org/10.1051/itmconf/20214003024.
Chi-Square Top-K Based Incremental Feature Selection Model for BigData Analytics Subhash Kamble, J. S. Arunalatha, K. Venkataravana Nayak, and K. R. Venugopal
Abstract The exponential rise in advanced software computing, internet technologies, and humongous data has given rise to a new paradigm called BigData, which requires an allied computing environment to satisfy the 4V aspects, often characterized as variety, volume, velocity, and veracity. Most classical computing models fail to meet these demands, especially due to the large, unstructured features of gigantically huge volumes. To alleviate this problem, feature selection can be a viable solution, provided it guarantees minimum features with optimal accuracy. In this context, the proposed work contributes a first-of-its-kind solution that ensures a minimum number of features while maintaining the expected higher accuracy to meet 4V demands. To achieve this, a robust Chi-Squared Select-K-Best Incremental Feature Selection (CS-SKB-IFS) model is developed that obtains a minimum set of features yielding the expected accuracy. Subsequently, the selected features are classified using the Extra Tree classifier. The strategic amalgamation within the CS-SKB-IFS model achieved an accuracy of 91.02%, F-Measure of 91.20%, and AUC of 83.06%, higher than the other state-of-the-art methods. In addition to the statistical performance, CS-SKB-IFS exhibited significantly smaller computational time (1.01 s) than the state-of-the-art method (6.74 s). Keywords Bigdata analytics · Incremental feature selection · Chi-Square · Extra tree classifier
1 Introduction In the last few years, the exponential rise in advanced computing software, the internet, and integrated services has given rise to a new paradigm called BigData analytics. Functionally, BigData analytics processes humongous data collected from different sources to provide users with the information needed to make optimal decisions. Key enabling technologies such as cloud computing, data mining, and now the Internet of Things have broadened the horizon of BigData analytics, serving purposes including business communication, healthcare, social media, web services, finance, and industrial communication [1]. Although BigData plays a central role in contemporary decentralized computing and decision systems, the highly complex nature of the data makes the overall process challenging [1, 2]. In the 4V paradigm, velocity expects the computing model to yield a very swift response, variety refers to the multi-dimensional, heterogeneous nature of the data, volume refers to the gigantic data size, and veracity stands for high accuracy. To cope with modern analytics problems, a BigData analytics model is expected to be robust enough to process humongous, heterogeneous, unstructured multi-dimensional data and yield highly accurate results within a small response time [2]. Achieving such performance is often very difficult, especially over large, multi-dimensional, unstructured inputs. Conventional analytics models, which often apply full-batch-mode learning, fail to meet the aforesaid demands and are therefore limited for contemporary analytics systems. An in-depth assessment of the related literature reveals that merely improving classification models or resampling the inputs cannot yield superior performance; rather, retaining a suitable set of features of minimum possible size can improve both computational efficiency and accuracy. Random reduction of features or samples, however, can adversely affect accuracy, and therefore designing a robust, performance-sensitive feature selection method is of great significance. Feature selection methods play a decisive role in improving BigData analytics performance, but retaining a suitable set of features is a challenge [2–4]. Although a number of feature selection methods employing information or entropy values have been developed to retain suitable feature sets, they are limited, especially under real-time computation with huge datasets and non-linear feature density [5]. Additionally, such classical approaches use the features of training instances as a priori knowledge, which is not generalizable to all real-time BigData environments, because inputs can be fed dynamically and the mining models must then learn over the data dynamically to yield outputs. These approaches should also perform optimally even with smaller data under class-imbalance conditions. Considering all these computational complexities, authors have suggested rough set methods for feature selection; however, they fail to address the dynamic nature of data, especially over huge numbers of input instances. Additionally, they merely emphasize dimension reduction and ignore the cost of feature insensitiveness and uncertainty [5, 6].
A few recent works like [3] improved the conventional method with a fuzzy-based rough set concept to enhance feature selection in an online model; however, they failed to address the redundant instance problem, which can reduce both computational performance and accuracy. Recent literature reveals that the concept of incremental feature selection can improve overall performance because of its ability to learn from data online and evaluate the efficacy of each feature dynamically so as to retain the most significant ones for prediction [5, 6]. Unlike classical offline feature selection methods, the proposed work contributes an online incremental feature selection method that considers both feature-volume optimality and the corresponding accuracy optimization for 4V-oriented BigData analytics. The proposed model retains a low feature count to achieve superior efficiency. It uses a Chi-Square-based Select-K-Best model to perform incremental feature selection, where the Chi-Square statistic is employed within a fitness function possessing dual objectives: the minimum number of features and the maximum accuracy. In this manner, the model ensures that the selected features are sufficient (i.e., Volume) to meet the Veracity (i.e., accuracy) demands, improving the computational performance of BigData analytics solutions. To ensure higher accuracy, the proposed model uses an Extra Tree Classifier so that the obtained accuracy is more realistic for BigData analytics.
1.1 Motivation Considering the research gaps and allied scopes, in this work, the key motive is to design a robust and first of its kind solution that could address incremental feature selection so as to maintain higher accuracy even at the reduced computational cost.
1.2 Contributions The main contributions of the proposed work include: 1. A Chi-Square-based Incremental Feature Selection method is proposed to identify the best features of the data. 2. The performance of the proposed work is analyzed on five different benchmark datasets.
1.3 Organization The other sections of the manuscript encompass related works in Sect. 2, which is followed by the proposed system in Sect. 3. Section 4 presents the results and discussion, which is followed by conclusions in Sect. 5.
2 Related Works As stated in the previous section, rough set methods have been applied extensively in the literature to perform feature selection. For instance, Jing et al. [7] examined the use of knowledge granularity information, first estimating a granular feature matrix that was then used in an incremental manner to select a suitable set of features. For dynamic data, authors have proposed the streaming feature selection concept. Javidi et al. [8] first calculated the level of significance of each feature, which was then used to perform stream-wise feature selection using a rough set algorithm. Liu et al. [9] designed an online multi-label streaming feature selection method, while the use of a neighborhood rough set model helped achieve better incremental feature selection [10]. For non-linear features, the authors used rough set-based incremental feature selection. Jing et al. [11] estimated the efficacy of knowledge granularity followed by a group incremental reduction concept to perform incremental feature selection. Chen et al. [12], on the other hand, applied variable precision rough sets for incremental feature selection. However, these methods did not address the major BigData analytics problems (i.e., the 4Vs as a cumulative goal). Yang et al. [13] proposed two fuzzy rough set-based feature selection models, where the input data was first split into multiple parts, which were later used to estimate the relative discernibility relationship so as to update the feature subsets. However, estimating the relative discernibility relationship is exhaustive, as it requires an n × n relative discernibility relation matrix for each feature, where n is the number of instances; this makes such approaches limited for large feature counts and data sizes. Though entropy information can help achieve feature selection [10], it does not consider both feature and instance aspects together, which could help in achieving superior performance. The use of active incremental feature selection [14] seems viable, where representative instances can be applied to update features dynamically.
3 Proposed System 3.1 Problem Statement To develop an optimized feature selection mechanism for BigData analytics that improves model efficiency and is cost-effective.
3.2 Objectives 1. Obtain an optimal, suitable set of features. 2. Maximize the model output accuracy. 3. Reduce the computation time. The proposed model is presented in Fig. 1; it encompasses three key processes, explained in the following sections. Data Acquisition: In view of data diversity and feature heterogeneity, and in order to assess the efficacy of the proposed model, different benchmark datasets, including Sonar, Ionosphere, KC1, Page Blocks, and Scene, were taken from the University of California, Irvine (UCI) repository [15]. These datasets encompass different features as well as varied sizes, and hence affirmative performance over all these data helps generalize the proposed model for BigData analytics tasks. Moreover, these data avoid any sophisticated pre-processing or feature extraction tasks, which allows us to focus on feature selection and the corresponding instance selection to achieve better performance at low computational cost. A snapshot of these datasets and their corresponding features is given in Table 1. Chi-Squared Select-K-Best Incremental Feature Selection: As stated in the previous section, the proposed model performs incremental feature selection; it first performs IFS over the input data so as to retain an optimally suitable set of features for further classification. To achieve IFS, the proposed model applies Chi-Squared Select-K-Best Incremental Feature Selection (CS-SKB-IFS). The CS-SKB-IFS model employs the Select-K-Best algorithm to estimate the level of significance of each feature, using a dual objective-driven Chi-Square estimation method to approximate the set of most significant (top-k) features. Chi-Squared Test: This method applies the level of significance of each feature to perform feature selection. Here, the level of significance is estimated by calculating the Chi-Square statistic between each feature and the target class, which characterizes the relationship between the features and the target variable. It functions as a non-parametric assessment approach designed to compare multiple variables on randomly selected data. It is competent at selecting features highly relevant to the classes and effectively reduces the feature space.
Fig. 1 Proposed incremental feature selection-based BigData analytics model
Table 1 Dataset description

Datasets    | Instances | Features | Classes
Sonar       | 208       | 60       | 2
Ionosphere  | 351       | 34       | 2
KC1         | 2110      | 21       | 2
Page Blocks | 5473      | 10       | 2
Scene       | 2407      | 299      | 2
In CS-SKB-IFS, the Chi-Square algorithm serves as the initial feature estimator; from the feature space it selects the k highest-scoring features. Mathematically, the Chi-Square statistic is estimated using (1).

Chi-Square(tk, ci) = M (PS − RQ)² / [(P + Q)(R + S)(P + R)(Q + S)]    (1)
In (1), Chi-Square(tk, ci) represents the Chi-Square value and M the number of documents in the corpus. P is the number of documents of class ci containing the term tk, Q the number of documents containing tk in other classes, R the number of documents in class ci that do not contain tk, and S the number of documents in other classes that do not contain tk. Hence, CS-SKB-IFS assigns a score to every feature for every class and finally combines the per-class scores into a single final score (2).

max(Chi-Square(tk, ci))    (2)
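A small scoring sketch with scikit-learn's chi2 and SelectKBest, which play the role of Eqs. (1)–(2), is shown below. chi2 requires non-negative inputs, so the features are first scaled to [0, 1]; the random data stands in for one of the benchmark datasets of Table 1.

```python
# Rank features by their Chi-Square score and keep the top 20% as a starting subset.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import chi2, SelectKBest

X = np.random.rand(200, 60)                    # stand-in for e.g. the Sonar features
y = np.random.randint(0, 2, 200)               # binary class labels
X_scaled = MinMaxScaler().fit_transform(X)     # chi2 needs non-negative values

scores, p_values = chi2(X_scaled, y)           # one score per feature
ranking = np.argsort(scores)[::-1]             # features sorted by decreasing score

k = int(0.2 * X.shape[1])                      # lower bound: top 20% of features
X_topk = SelectKBest(chi2, k=k).fit_transform(X_scaled, y)
```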
Unlike the conventional Chi-Squared algorithm [16], where the features with the highest scores are retained for further computing, CS-SKB-IFS defines a dual objective-driven score for feature selection. CS-SKB-IFS intends to retain a set of features that yields high accuracy with the minimum possible feature count. To achieve this, CS-SKB-IFS defines a fitness model (3), which is estimated for each candidate feature subset; thus, instead of merely the Chi-Square value, the CS-SKB-IFS model takes the fitness value into consideration.

Fitness = α ∗ P + β ∗ Q    (3)
In (3), P and Q represent the normalized feature term and the minimum expected accuracy, respectively, while α and β are the weight coefficients for the features and the accuracy. Being an incremental feature method, CS-SKB-IFS keeps the values of α and β dynamic; they are tuned adaptively to meet the expected fitness. CS-SKB-IFS applies a lower threshold for α and a higher one for β. The model applies the dual-objective fitness function to maintain a lower value of P while ensuring higher accuracy (i.e., Q). To cope with the 4V veracity demand, CS-SKB-IFS assigns α a 20% weight for the features and β an 80% weight for the accuracy. The feature term P is updated as per (4).

P = 1 − (Number of Selected Features) / (Total Number of Features)    (4)
Updating the fitness function with the tuned feature term (4) gives (5).

Fitness = [1 − (Number of Selected Features) / (Total Number of Features)] + 0.8 ∗ Accuracy    (5)

Thus, employing (5), CS-SKB-IFS selects features between 20% and 80% of the total, where 20% is the lower bound and 80% the upper bound. In other words, the proposed CS-SKB-IFS model first takes the top 20% of the best-ranked features and then increments the feature volume while assessing the corresponding accuracy. To ensure high accuracy with a minimum number of features, CS-SKB-IFS increases the number of features in steps of 5%; the new features are added to the existing feature set until the expected accuracy level is reached. Applying this approach, a suitable set of features is obtained that delivers the higher accuracy, as shown in Algorithm 1.
Algorithm 1: Chi-Squared Top-K based IFS for BigData Analytics
Input: Dataset Ds with n features
Output: Optimal suitable set of features
1: Execute the Select-K-Best method on the input data
2: Estimate the significance of each feature
3: Select the k highest-scoring features by max(Chi-Square(tk, ci))
4: Calculate the fitness of the model: Fitness = α ∗ P + β ∗ Q
5: Assign the low threshold α = 0.2 and the high threshold β = 0.8
6: Increment the feature set by adding 5% of the top-ranked features until the expected accuracy is reached
7: return the suitable set of features with the best accuracy
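A minimal sketch of Algorithm 1 is given below, assuming scikit-learn: features are ranked by chi2, the subset starts at the top 20% and grows in 5% steps, the Extra Trees classifier measures accuracy, and the fitness follows Eq. (5). The target accuracy and the random data are illustrative assumptions.

```python
# Incremental Chi-Square Select-K-Best feature selection (sketch of Algorithm 1).
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def cs_skb_ifs(X, y, target_acc=0.90, lower=0.20, upper=0.80, step=0.05):
    X = MinMaxScaler().fit_transform(X)                 # chi2 needs non-negative values
    ranking = np.argsort(chi2(X, y)[0])[::-1]           # features by decreasing score
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    n, frac, best = X.shape[1], lower, None
    while frac <= upper:
        k = max(1, int(round(frac * n)))
        cols = ranking[:k]                              # current top-k feature subset
        clf = ExtraTreesClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te[:, cols]))
        fitness = (1 - k / n) + 0.8 * acc               # Eq. (5)
        if best is None or fitness > best[0]:
            best = (fitness, cols, acc)
        if acc >= target_acc:                           # expected accuracy reached
            break
        frac += step
    return best                                         # (fitness, feature indices, accuracy)

# Illustrative call on random stand-in data.
X_demo, y_demo = np.random.rand(300, 60), np.random.randint(0, 2, 300)
print(cs_skb_ifs(X_demo, y_demo))
```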
Extra Tree Classifier-based Classification: The Extra Trees classifier builds an ensemble of unpruned decision trees following the usual top-down methodology, with randomized attribute and cut-point selection when partitioning a node of a tree.
It can likewise construct fully randomized trees whose structures are independent of the resulting values of the training samples. Fundamentally, it distinguishes itself from other tree-based ensemble strategies by two factors: it partitions nodes through fully random selection of cut points, and it uses the whole training sample to grow the trees. The predictions of all the trees are then merged to provide the final prediction. Because of these factors, the method decreases variance and the probability of bias, helping to accomplish accurate and efficient classification compared to weaker randomized methods. The Extra Tree Classifier is therefore employed to classify each data element into two categories, labeled 0 or 1.
4 Experimental Results and Discussion In this paper, a robust Chi-Square Select-K-Best Incremental Feature Selection (CS-SKB-IFS) method was developed. CS-SKB-IFS acts as a significant feature selection model, which can be of great value for BigData analytics. Notably, unlike classical Chi-Square methods, the CS-SKB-IFS model applies a fitness function targeted at achieving a minimum feature set with higher accuracy. First, the original data are processed for Chi-Square estimation, which ranks each feature; the features are then sorted by rank. After sorting, CS-SKB-IFS initiates the incremental feature selection process, in which the number of features is selected in sync with the expected accuracy level. The feature count is increased from a lower limit of 20% toward an upper limit of 80%: the model first feeds the top 20% of the ranked features, assesses the corresponding accuracy, and then grows the subset until it reaches the expected accuracy. The selected data are processed with the Extra Tree Classifier algorithm for classification. The simulation results for both the state-of-the-art method and the proposed CS-SKB-IFS-based model are given in Table 2. Observing the results (Table 2), it can easily be seen that the number of features selected by the proposed model is significantly smaller than in the input data. The results reveal that with the full input data the average accuracy obtained is 87.01%, while the proposed CS-SKB-IFS model accomplishes a better accuracy of 91.02%. Similar behavior is found for the F-measure and AUC: the input data with 100% of the features show an F-Measure of 87.21%, while the proposed model retains an F-Measure of 91.20%. The AUC performance is also affirmative, with the proposed CS-SKB-IFS model achieving an AUC of 83.06% versus a maximum of 82.69% for the state-of-the-art method. The overall performance reveals that the proposed CS-SKB-IFS model can guarantee higher accuracy even with a much-reduced feature volume. The higher AUC and F-Measure indicate that the proposed model can be suitable even under data imbalance conditions.
Fig. 2 Comparison on existing methods
This, as a result, can help achieve superior performance in BigData analytics. For BigData analytics purposes, the proposed model is also expected to deliver time-efficient computation while retaining the 4V aspects. In this regard, this study examined the time efficiency of the proposed CS-SKB-IFS model. The execution times observed for the proposed CS-SKB-IFS model over the different input data are given in Table 3. Observing the results (Table 3), the proposed CS-SKB-IFS model is significantly more time-efficient than the model without the proposed feature selection. Statistically, the average computation time of the CS-SKB-IFS model is 1.01 s, while with the state-of-the-art method the time consumption was 6.74 s. Numerically, the proposed model is faster and hence can be expected to yield the high-velocity performance needed to meet 4V demands. To examine relative performance, two recent works [17, 18] were taken into consideration. Notably, in [17] the authors contributed a heuristic-driven feature selection model using the Gray Wolf Optimizer (GWO) and Particle Swarm Optimization (PSO) algorithms. These algorithms focus on selecting feature sets irrespective of the sample size, so even reduced feature-vector data with gigantically large data sizes might force the model into local minima and premature convergence, which is not addressed in [17, 18]. In this paper, the proposed model's performance is compared with the existing models [17, 18]. Considering similar data under study, the simulation results obtained are given in Table 4 and depicted in Fig. 2. From Table 4, it can be found that the number of features selected through CS-SKB-IFS is markedly lower than in the state-of-the-art models. The average performance in [17] indicated a minimum of 14.65 vital features for Sonar, while CS-SKB-IFS identifies a total of 11 important features having a decisive impact on classification performance. For the Sonar dataset, [18] could reduce the features to 26, whereas CS-SKB-IFS selected merely 11 features as vital for analytics. With merely 11 features, CS-SKB-IFS exhibited 85.78% accuracy, higher than DFRS [18] (85.58%) and GWO-PSO (85.50%).
Fig. 3 Computational time analysis
The performance on the Ionosphere dataset likewise revealed that CS-SKB-IFS identifies merely four features, while GWO-PSO selected 5.45 feature sets on average for classification. The results confirm that CS-SKB-IFS retains performance close to [17] even with a significantly reduced feature set, and unlike the heuristic searches in [17, 18], it performs feature selection through a simple analytical procedure, which makes it more computationally efficient. For the KC1 dataset, GWO-PSO selected a total of 4.65 feature sets on average, while CS-SKB-IFS selected merely three features to perform classification. Overall, the CS-SKB-IFS model exhibited an accuracy of 91.02%, which is higher than the 89.02% of GWO-PSO. The computational time analysis (in seconds) confirms this observation. Table 5 presents the relative computational time analysis, where the time consumed by the CS-SKB-IFS model is compared with [17] and depicted in Fig. 3. As already stated, [17] proposed heuristic-driven feature selection methods, which are often criticized for their large computational cost. In contrast, the CS-SKB-IFS model applies a simple analytical concept to perform feature selection. The results (Table 5) confirm that the proposed model consumes significantly less time: although GWO-PSO estimated a similar number of features, it incurred a large computational cost in terms of time, with an average of 15.11 s, while CS-SKB-IFS consumed merely 1.064 s to perform the overall task.
Table 2  Proposed model feature selection, accuracy, F-Measure, and AUC assessment

             Input data                              CS-SKB-IFS
Datasets     Features  Accuracy  F-Measure  AUC      Features  Accuracy  F-Measure  AUC
Sonar        060       73.17     74.98      80.93    11        84.27     91.93      81.10
Ionosphere   034       86.86     89.84      94.17    04        91.99     86.29      87.14
KC1          021       87.69     81.10      76.67    03        88.94     91.78      78.02
Page Blocks  010       96.00     95.74      66.80    03        96.10     94.42      61.13
Scene        299       91.35     94.39      94.89    13        92.29     91.59      97.94
Average                87.01     87.21      82.69              91.02     91.20      83.06
Table 3  Computational time assessment

             Input data            CS-SKB-IFS
Datasets     Features  Time (s)    Features  Time (s)
Sonar        060       0.8118      11        0.5804
Ionosphere   034       0.7400      04        0.5654
KC1          021       1.7619      03        0.8796
Page Blocks  010       2.8750      03        1.4241
Scene        299       27.5499     13        1.6327
Table 4  Results comparison with the existing works

             Input data          DFRS [18]           GWO+PSO [17]        CS-SKB-IFS
Datasets     Features  Accuracy  Features  Accuracy  Features  Accuracy  Features  Accuracy
Sonar        060       73.17     26        85.58     014.65    85.50     11        85.78
Ionosphere   034       86.86     32        85.19     003.90    90.10     04        91.99
KC1          021       87.69     –         –         004.65    82.10     03        88.94
Page Blocks  010       96.00     –         –         002.30    95.50     03        96.10
Scene        299       91.35     –         –         101.00    91.90     13        92.29
Average                87.01               85.38               89.02               91.02
Considering the overall performance, it can be stated that the proposed CS-SKB-IFS outperforms the existing approaches [17, 18], requiring the minimum feature sets while guaranteeing higher accuracy, AUC, and F-Measure, and, more importantly, better time-efficiency.
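For concreteness, the incremental Chi-square Select-K-Best loop described above can be sketched in scikit-learn as follows. This is a minimal illustration, not the authors' code: the placeholder dataset, the 20%–80% limits, and the target accuracy are assumptions made only for the example.

```python
# Minimal sketch of incremental Chi-square Select-K-Best feature selection with an
# Extra Trees classifier; dataset, limits, and target accuracy are assumed values.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)           # placeholder dataset
X = MinMaxScaler().fit_transform(X)                   # chi2 requires non-negative features

n_features = X.shape[1]
lower_k = max(1, int(0.20 * n_features))               # lower limit: 20% of features
upper_k = int(0.80 * n_features)                       # upper limit: 80% of features
target_accuracy = 0.91                                 # expected accuracy level (assumed)

best_k, best_acc = None, 0.0
for k in range(lower_k, upper_k + 1):                  # incremental top-k selection
    X_k = SelectKBest(chi2, k=k).fit_transform(X, y)   # rank by chi-square, keep top k
    acc = cross_val_score(ExtraTreesClassifier(random_state=0), X_k, y, cv=5).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
    if acc >= target_accuracy:                          # stop once expected accuracy reached
        break

print(f"selected k={best_k}, accuracy={best_acc:.4f}")
```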
Table 5  Computational time (seconds) analysis

             Input data   GWO+PSO [17]          CS-SKB-IFS
Datasets     Features     Features  Time        Features  Time
Sonar        060          014.65    06.30       11        0.5804
Ionosphere   034          003.90    06.10       04        0.5654
KC1          021          004.65    09.40       03        0.8796
Page Blocks  010          002.30    13.75       03        1.4241
Scene        299          101.00    40.00       13        1.6327
Average                             15.11                 1.064
5 Conclusions

In this paper, a first-of-its-kind incremental feature selection model was developed for BigData analytics: the robust Chi-Square Select-K-Best Incremental Feature Selection (CS-SKB-IFS) model. CS-SKB-IFS focuses on obtaining an optimal set of significant features while retaining higher accuracy. In the proposed model, the Chi-square statistic is used to rank and estimate the top-k features, while the proposed objective function retains the minimum feature set that can still yield the expected higher accuracy. To assess performance, the CS-SKB-IFS model was applied to two-class classification using the Extra Trees Classifier. The statistical performance assessment over the different benchmark datasets revealed that the proposed CS-SKB-IFS model achieves significantly higher accuracy (91.02%), F-Measure (91.20%), and AUC (83.06%) than the state-of-the-art methods [17, 18] (accuracy = 87.01%, F-Measure = 87.21%, AUC = 82.69%). This enabled CS-SKB-IFS to exhibit better performance even with a much reduced feature set and a lower computational time. The overall performance indicates that the proposed work is robust enough to be used for BigData analytics, where it can yield better performance at reduced cost and computational effort.
References 1. Chardonnens, T. (2013). Big data analytics on high velocity streams. Journal of Software Engineering Group, University of Fribourg, 50, 1–96. 2. Alshawish, R. A., Alfagih, S. A., & Musbah, M. S. (2016). Big data applications in smart cities. In IEEE International Conference on Engineering and Management Information Systems (pp. 1–7). IEEE. 3. Zhang, X., Mei, C. L., Chen, D. G., Yang, Y. Y., & Li, J. H. (2019). Active incremental feature selection using a fuzzy rough set based information entropy. IEEE Transactions on Fuzzy Systems, 28(5), 901–915. 4. Wang, C. Z., Huang, Y., Shao, M. W., Hu, Q. H., & Chen, D. G. (2019). Feature selection based on neighborhood self-information. IEEE Transactions on Cybernetics, 50(9), 4031–4042.
5. Saeys, Y., Inza, I., & Larra ñaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, Oxford University Press, 23(19), 2507–2517. 6. Qian, W. B., Shu, W. H., & Zhang, C. S. (2016). Feature selection from the perspective of knowledge granulation in dynamic set-valued information system. Journal of Information Science and Engineering, 32(3), 783–798. 7. Jing, Y. G., Li, T. R., Huang, J. F., & Zhang, Y. Y. (2016). An incremental attribute reduction approach based on knowledge granularity under the attribute generalization. International Journal of Approximate Reasoning, 76, 80–95. 8. Javidi M. M., & Eskandari, S. (2018). Streamwise feature selection: A rough set method. International Journal of Machine Learning and Cybernetics, Elsevier, 9(4), 667–676 9. Liu, J. H., Lin, Y. J., Li, Y. W., Weng, W., & Wu, S, X. (2018). Online multi-label streaming feature selection based on neighborhood rough set. Journal of Pattern Recognition, 84, 273– 287. 10. Zhou, P., Hu, X. G., Li, P. P., & Wu, X. D. (2017). Online feature selection for high dimensional class imbalanced data. Journal of Knowledge Based Systems, 136, 187–199. 11. Jing, Y. G., Li, T. R., Huang, J. F., Chen, H. M., & Horng, S. J. (2017). A group incremental reduction algorithm with varying data values. International Journal of Intelligent Systems, 32(9), 900–925. 12. Chen, D. G., Yang, Y. Y., & Dong, Z. (2016). An incremental algorithm for attribute reduction with variable precision rough sets. Journal of Applied Soft Computing, 45, 129–149. 13. Yang, Y. Y., Chen, D. G., & Wang, H. (2016). Active sample selection based incremental algorithm for attribute reduction with rough sets. IEEE Transactions on Fuzzy Systems, 25(4), 825–838. 14. Wang, F., Liang, J. Y., & Dang, C. Y. (2013). Attribute reduction for dynamic data sets. Journal of Applied Soft Computing, 13(1), 676–689. 15. https://www.openml.org/search 16. Bahassine, S., Madani, A., Al-Serem, M., & Kissi, M. (2020). Feature selection using an improved chi-square for Arabic text classification. Journal of King Saud University Computer and Information Sciences, 32(2), 225–231. 17. El-Hasnony, I. M., Barakat, S. I., Elhoseny, M., & Mostafa, R. R. (2020). Improved feature selection model for big data analytics. IEEE Transactions on Knowledge and Data Engineering, 8, 66989–67004. 18. Kong, L., Qu, W., Yu, J., Zuo, H., Chen, G., Xiong, F., et al. (2019). Distributed feature selection for big data using fuzzy rough sets. IEEE Transactions on Fuzzy Systems, 28(5), 846–857.
M-Vahitaram: AI-Based Android Application for Automated Crowd Control Management in Bus Transport Service Prathamesh Jadhav, Sakshee Sawant, Jayesh Shadi, Trupti Sonawane, Nadir Charniya, and Anjali Yeole Abstract An automated crowd control system is a service that sends real-time crowd density data from inside the bus to a user’s handheld device near the bus stop. It is a cohesive solution when it comes to managing crowds without human intervention. Machine learning is used in the M-Vahitaram app to predict bus crowd density, and a cloud database is used to notify commuters within 200 m of the bus. The choice of whether or not to board the approaching bus can then be made. The suggested approach forecasts crowd density with a 96 percent accuracy. Equipping the commuters or the travelers with the details regarding the present or the current crowd density on a particular bus will benefit them to make educated decisions about which bus to take or whether to seek alternative transportation. As a consequence, there is neither traffic congestion nor unequal crowd distribution among buses, ensuring the most effective use of bus transit. Keywords Bus transport service management · Crowd density prediction · Optimization · Bus service efficiency · CNN · Machine learning · Cloud-based storage · Android application
P. Jadhav · S. Sawant (B) · J. Shadi · T. Sonawane · N. Charniya · A. Yeole VESIT, Mumbai University, Mumbai, India e-mail: [email protected] P. Jadhav e-mail: [email protected] J. Shadi e-mail: [email protected] T. Sonawane e-mail: [email protected] N. Charniya e-mail: [email protected] A. Yeole e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_12
1 Introduction There are over a million passengers that commute by bus every day in the public transportation scenario. They have to deal with a variety of concerns, such as buses running late and buses that are filled. The bus’s crowd density is unfavorable. In several cases, unequal crowd density has resulted in serious health hazards for a large number of individuals. In the present COVID-19 pandemic scenario, the risk of becoming infected is increased in packed buses, hence a system to monitor crowd density inside the vehicle and convey the results to passengers is required. Passengers have the option of boarding or waiting dependent on the density of impending buses. There are now methods for determining crowd density, but no applications or systems exist to display the data or allow for bus and passenger interaction. Google Maps has a feature that asks travelers about their journey’s feedback. Although Users may also ask other users about the bus crowd issue, individuals are not always available to answer inquiries during peak hours. M-Vahitaram is an Android software that provides users with real-time bus crowd density updates at a specific bus stop. The conductor is given an application to update the crowd density of the specific bus, which can be done manually or by using real-time photos captured via CCTV present inside the bus. Following the conductor’s input, the server updates the bus information on the passenger’s device automatically. The conductor detects crowd density inside the bus-based picture and identifies it into five distinct levels, which are as follows. Vacant, Low density, Moderate density, High density, and Extremely high density. The structure of the paper is given in the following order Firstly there’s a literature survey, which tends to give an overview of relevant publications, and it is conferred in Sect. 2. The system which has been put forward or proposed is discussed in Sect. 3. The System’s Methodology which is employed is further discussed in Sect. 4. The result or the outcomes have been further elaborated in Sect. 5.
2 Literature Survey The overbearing crowd in the public transport is not viable for commuters. In [1], the author had proposed a system to determine the crowd management of the bus which consists of a Raspberry pi, a Camera, and an LCD display; it uses a CNN module to count the density of passengers in the bus. The camera takes video of the inside of the bus which is then broken into frames to find out the density of the crowd, and a result of this is put on view on the LCD displays at the nearest bus depot. In [2] with the help/need of an adaptable background proposed model, the researchers of this paper have created a process for counting the number of persons.
The authors of [3] had presented a method for calculating the number of people in a given area. The device employs an efficient and cost-reductive Raspberry Pi which will be used to track humans and a Haar Cascade which is also sometimes presented as OpenCV to catch a sight of how many people are in the room. Considering there are a very substantial number of people, this technique/approach will become highly inefficient since the period required to compute the number of persons using public transit such as bus lines increases, making it a very time-consuming task. The main insight of this paper is where the authors had proposed a method of crowd tallying as well as crowd evaluation using the method of CV (Computer Vision) [4] Their method employs both direct as well as indirect methods. The direct approach focuses more on the constructive learning methods and algorithms which mainly focus on discrete aspects that aid in crowd estimation and counting. However, we found that the indirect approaches take into account the known algorithms which are developing and learning as well as they relate to certain features that aid in crowd estimate and counting. The authors of this paper [5] have presented an image-based method. For the different head sizes, filters were used for the modeling of the density maps. The inputs were given in the form of photos and the sizes were different in each one of them. Android-based application for bus booking service is developed in [6] using login authentication making the system available for worldwide communication. In [7], the author had proposed an android application that tends to keep tabs on the position of the public bus line so that the user/consumer who is using could get the exact location of the bus on the map; here the conductor will update the location so this requires two android applications, and the database is created in SQL server. Authors of [8] have presented an idea of an Intelligent Examination System that makes the use of a random sample consensus method; the main focus is on determining the flow of the pack of the crowd, and then it classifies the individuals specifying into a different bunch of crowd. In this particular scenario, they tend to use the Lucas-Kandel method particularly to compute the flow of optical which is employing a ‘pyramid’. RANSAC is yet simple but has been an effective method for reducing outlier effects in image flow optical and estimating the flow of the mass commuters with the help of inliers. The system is also set to perform crowd group definition. This system is focused on crowd density and abnormality detection. The authors of [9] have proposed a multi-scale deep learning convolutional neural network model for crowd sum computing using a single image, and the network will be capable enough to produce features that are related enough to be able to compute a large crowd which will be presented in a single-based column structure using multiscale blob. The multiple scale network is not needed in this as the pretraining model is working as an end-to-end training process, but it is possible to use this in practical applications if the number of parameters are reduced while robust crowd counting performances. In [3], the authors propose a system to keep track of the count of the people in a scene to manage crowds. Here, OpenCV-Python is used to provide the count, and Raspberry Pi 3 consists of an ARMv8 CPU to detect the human heads. Human head detection is done by training a haar cascade classifier which works on
the concept of the optical flow using which the heads are detected. The insights or the output of this system can be more efficient with an increased number of samples.
3 Proposed System

Figure 1 depicts the fundamental working of the proposed system. The proposed system is specifically designed to take into account the real-time crowd density within the public transport bus and then display the live information on the commuter's device. The application is developed for commuters to get live information on the crowd density of buses from around 200 m from the nearest bus stop. The host login is done by the conductor of the particular bus, and the location of the conductor, which at that time is the location of the bus, is fetched continuously every 4 s. Once the conductor is registered on the server, they have to confirm their contact by logging in to the app. After they are logged in, the conductor can fetch the images from the device and check the density, which is automatically updated on the commuter's device. The density of the crowd/commuters can be organized into the following categories:
● Vacant
● Low
● Moderate
● High
● Very high
Fig. 1 Conceptual diagram
The bus color is modified on the application UI based on the crowd density of the particular bus. The following are the colors assigned to each crowd density:
● Green—Vacant
● Blue—Low
● Yellow—Moderate
● Orange—High
● Red—Very High
The proposed application consists of the following parts (Fig. 2):
1. Registration: The conductor must register in the app with their name, bus number, cell phone number, and administrator name, so that the administrator can check the conductor's credentials and grant them access.
2. Locating: Fetching the coordinates is a vital aspect of the application, performed in order to obtain a valid bus position that can subsequently be shown on the commuter screen.
3. Login: The login activity ensures that the conductor and admin are properly authenticated and verified.
4. Re-new bus number: Using pictures captured by CCTV cameras installed inside the buses, the CNN model is used to forecast crowd density.
5. Prediction: The CNN model is integrated into the app to predict the crowd density using the images taken by conductors, for which the app asks for the device's permission to access media.
6. Commuter display: The commuters' page contains two key details: the data of incoming buses and the degree of crowd density within each bus, with colors representing the different crowd densities.
4 Methodology

4.1 Database

For location tracking of a bus via the conductor's mobile phone, the model uses the Firebase database together with the Google Maps API; the live location coordinates of the buses are stored in Firebase. The real-time location and the crowd-density frames are stored in and retrieved from the Firebase database. The database holds each bus's position coordinates, conductor information, and crowd density. It also includes the commuters' positions, which are used to calculate proximity and alert passengers of buses approaching within 200 m.
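The 200 m proximity check can be illustrated with a small sketch. On Android the built-in Location.distanceBetween() call plays this role; the version below uses the haversine formula in Python purely for illustration, and the coordinates and bus identifiers are made up.

```python
# Sketch of the 200 m proximity check described above; coordinates are illustrative.
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def buses_within_radius(commuter, buses, radius_m=200.0):
    """Return the buses whose last known position lies within radius_m of the commuter."""
    return {bus_id: pos for bus_id, pos in buses.items()
            if distance_m(commuter[0], commuter[1], pos[0], pos[1]) <= radius_m}

commuter_pos = (19.0760, 72.8777)                       # commuter near a bus stop (example)
bus_positions = {"bus_101": (19.0772, 72.8781),         # positions fetched from the database
                 "bus_204": (19.0900, 72.9000)}
print(buses_within_radius(commuter_pos, bus_positions))  # only nearby buses are reported
```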
Fig. 2 Flowchart
4.2 M-Vahitaram Model

Location Tracking and Notification. The location tracking feature of the application is powered by the GPS module embedded in the conductor's phone, which is used to track live bus locations. The GPS module, in conjunction with the broadcast receiver and GPS dependencies included in the application, retrieves the conductor's location coordinates as latitude and longitude, updated every 4 s to provide a continuous live location. This latitude and longitude information is stored in Firebase via the Internet and later retrieved by the commuter page to display live bus locations (Fig. 3). The distanceBetween() function, which computes the distance between two geographic locations, is used to find buses within 200 m of the commuter's position.
Fig. 3 Location tracking methodology
If the calculated distance is less than 200 m, the 200-m check is set to true, and the system's data is updated to show the bus within the 200-m radius of the commuter's current position. If the range of operation were expanded, commuters would see a greater area, which may cause confusion because many buses irrelevant to the commuters' destinations would appear on the screen. Hence, the circle's radius has been set at 200 m to avoid any unwanted misunderstanding among passengers.

CNN Model for Crowd Density Prediction. A CNN was used to classify the crowd density in a bus. The network was trained using a dataset of videos taken from real-time CCTV cameras placed inside buses. The videos were segmented into 6000 frames, categorized into five classes depending on crowd density; 75% of the frames gathered were used for training the CNN model, and more precision can be gained by expanding the dataset. The layers of the trained CNN model are shown in Fig. 4. The initial stage of the suggested method is to acquire real-time images of the commuters inside the bus. The captured image is then sent to the CNN model for crowd density prediction. The convolution layer, the first layer of the CNN model, uses a series of learnable filters to extract low-level features. A Rectified Linear Unit (ReLU) layer is added to introduce non-linearity into the data frames. To extract the dominant features, a Max Pooling layer follows the convolution layers. Before the fully connected layer receives the input, the output of the convolution layers is flattened into a one-dimensional vector by the flattening layer. The main purpose of the fully connected layer is to predict the classification by combining these attributes. The CNN model's output is saved in the cloud database and sent to users as needed.
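The convolution → ReLU → max-pooling → flatten → dense pipeline described above can be sketched in Keras as follows. The exact layer sizes, input resolution, and training split used by the authors are not stated in full, so the values below are assumptions made only to give a runnable illustration of a five-class crowd-density classifier.

```python
# Minimal Keras sketch of a five-class crowd-density CNN; sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # vacant, low, moderate, high, very high

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),                 # frames extracted from CCTV video
    layers.Conv2D(32, (3, 3), activation="relu"),      # low-level feature extraction
    layers.MaxPooling2D((2, 2)),                       # keep dominant features
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                  # convert feature maps to one dimension
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # crowd-density class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_frames, train_labels, validation_split=0.25, epochs=20)
```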
Fig. 4 Flowchart
4.3 Implementation Every bus conductor will have an application on their device with authority login credentials. Once they are logged on to the application as a conductor, the conductor location is updated in the firebase database along with the bus number boarded and the bus location coordinates. The location will be updated after every predetermined interval. Live data frames would be captured in the bus along with the conductor having the rights to capture crowd density via using a handset camera, both of which would get stored in our database. CNN model would be applied for crowd density prediction. Depending on the crowd density identified, the database would be updated in real-time, notifying the users on the app within 200 m proximity. The density of the crowd will be out of the five levels; vacant, low, moderate, high, and very high. For visual ease, the color of the bus symbol on the application map display will be changed depending on the crowd density identified.
4.4 Components

Android Studio. Android is an open-source Linux-based operating system, and Android Studio is Google's official IDE for Android development. Android Studio is an Integrated Development Environment (IDE) that includes an editor for writing Java code, debugging tools for resolving faults in code and offering suggestions to help you work faster, and an emulator for running an Android device on a computer. Android Studio uses a Gradle-based build system for compiling apps, which is far more versatile than traditional compilation because it is done automatically. It also has built-in support for the Google Cloud Platform, making it easy to integrate Google Cloud Messaging and App Engine.

Firebase. Firebase is a Google platform that helps developers to create, manage, and scale their projects. It enables developers to create apps more quickly and securely. On the Firebase side, no programming is necessary, making it simple to take advantage of
its features. It works with Android, iOS, the web, and Unity. It gives you access to cloud storage. It employs NoSQL as a database for data storage. Real-time database, cloud firestore, user authentication, and remote configuration service are some of the features offered by firebase. A real-time database is a cloud-based NoSQL database that handles your data at millisecond speed; it easily and quickly exchanges the data to and from the database and has massive storage size potential to store user data. The data in the real-time database gets stored in JSON format, which means there is no barrier between data and objects.
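As a rough illustration of the conductor-side update loop described above, the sketch below writes a bus's location and density level to a Realtime Database node every 4 seconds over the database's REST interface (paths ending in ".json"). The project URL, node layout, and the get_gps_fix() helper are assumptions for illustration; the actual app performs these updates from Android via the Firebase SDK.

```python
# Sketch of the conductor-side update loop; URL, node layout, and GPS helper are assumed.
import time
import requests

FIREBASE_URL = "https://example-project-default-rtdb.firebaseio.com"  # hypothetical project URL

def get_gps_fix():
    """Placeholder for the handset GPS reading; returns (latitude, longitude)."""
    return 19.0760, 72.8777

def publish_bus_state(bus_id, density_level):
    lat, lon = get_gps_fix()
    payload = {"lat": lat, "lon": lon,
               "density": density_level,               # one of the five density levels
               "updated_at": int(time.time())}
    # Data is stored as JSON under a per-bus node, as in Fig. 13.
    requests.put(f"{FIREBASE_URL}/buses/{bus_id}.json", json=payload, timeout=5)

while True:
    publish_bus_state("bus_101", "moderate")
    time.sleep(4)                                       # live location refreshed every 4 s
```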
5 Results and Analysis

The application requires an Internet connection with a minimum speed of 25 Mbps and a smartphone with a built-in GPS module. The system informs travelers of the current bus crowd density; the crowd density level is classified from the conductor's collected photographs with an accuracy of 96.53%. People who rely on public transportation for their daily necessities will benefit from such a system. Below are the screenshots of the application (Figs. 5, 6, 7, 8, 9, 10, 11, 12, and 13).
Fig. 5 Splash/Home screen of the M-Vahitaram application
Fig. 6 Commuter page with bus markers denoting crowd density of upcoming bus
Fig. 7 Administration page
5.1 Analysis

Overcrowding is a common occurrence in the current system, resulting in multiple accidents; the M-Vahitaram system helps distribute the crowd efficiently across buses. Under the current arrangement, commuters are not notified of the forthcoming bus's arrival, whereas commuters using M-Vahitaram are fully informed of the upcoming bus as well as its crowd density.
Fig. 8 Conductor verification page
Fig. 9 Re-new bus no
Fig. 10 Administrator verification
Fig. 11 Conductor’s manual density update option
Fig. 12 Automatic density prediction by pictures captured from CCTV footage
Fig. 13 Bus location, crowd density, and conductor data stored in Firebase
In the previous system, there was no easy way to retrieve real-time bus positions; M-Vahitaram allows users to track the whereabouts of buses in real time. Likewise, commuters in the current system would not know how crowded a bus was until they boarded, whereas users of M-Vahitaram can learn about the crowd density inside the bus without having to be physically present, simply with the tap of a finger.
6 Conclusion

The system allows for better crowd management among buses, preventing difficulties caused by congestion during peak hours. The application relies on an Internet connection for accurate, real-time authentication, location updates, and crowd-density updates. To acquire the bus's current location, the conductor's location is updated every four seconds, and by using a real-time cloud-hosted database like Firebase, data can be saved and synced, providing users with the newest information. An interactive user interface representing the crowd density inside an approaching bus allows passengers to make informed decisions about whether to board the approaching bus or wait for the next one on the route. Since the system does not require additional manpower, it can also help limit the spread of COVID-19 by limiting the number of individuals on a bus.
References 1. Meghana, A. V., et al. (2020). Automated crowd management in bus transport service. In 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE. 2. Singh, C., & Sohani, M. (2017). Estimation of crowd density by counting objects. In 2017 International Conference on Trends in Electronics and Informatics (ICEI). IEEE. 3. Abbas, S. S. A., et al. (2017). Crowd detection and management using cascade classifier on ARMv8 and OpenCV-Python. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS). IEEE. 4. Saleh, S. A. M., Suandi, S. A., & Ibrahim, H. (2015). Recent survey on crowd density estimation and counting for visual surveillance. Engineering Applications of Artificial Intelligence, 41, 103–114. 5. Zhang, Y., et al. (2016). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6. Duraisamy, S., & Abuhuraira, J. U. (2018). Android mobile application for online bus booking system. International Journal of Information System and Engineering, 6, 34–56. 7. Sarthak, T., Ajinkya, S., Pratik, T., & Rahul, R. (2021). A mobile application for bus tracking systems. In AICTE Sponsored National Conference on Smart Systems and Technologies. IJIRT. 8. Guo, J., et al. (2009). An intelligent surveillance system based on RANSAC algorithm. In 2009 International Conference on Mechatronics and Automation. IEEE. 9. Zeng, L., et al. (2017). Multi-scale convolutional neural networks for crowd counting. In 2017 IEEE International Conference on Image Processing (ICIP). IEEE.
Automatic Enhancement of Deep Neural Networks for Diagnosis of COVID-19 Cases with X-ray Images Using MLOps Avik Kundu and Saurabh Bilgaiyan
Abstract The novel coronavirus (COVID-2019) pandemic has caused the devastating effect on public health and global economy. It has infected about 231 million individuals globally, and approximately 4.7 million have died as a result. Recent findings suggest that X-ray imaging techniques can provide salient information about the COVID-19 virus. Since then, many deep learning models have been developed and open-sourced. During the development of the deep learning models, several hyperparameters need to be tuned. In this paper, the authors have proposed a method through which the process of hyper-parameter tuning of the deep learning models can be automated using the concept of MLOps. The collection of procedures aimed at maintaining deep learning models in a reliable and efficient manner is termed as MLOps. The proposed approach helped in achieving an improved accuracy of 97.03%, in less time without human interface. This will eliminate the requirement of trained personnel during the model re-training stage as the system has the facility to retrain itself continuously, till the permissible accuracy is achieved. Keywords Coronavirus (COVID-19) · DevOps automation tools · Chest X-ray images · Hyperparameter tuning · Deep learning
1 Introduction

The COVID-19 pandemic has put an immense burden on healthcare systems across the globe. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for this disease. In many countries, the healthcare systems have already been overwhelmed due to the limited availability of kits for diagnosis and of hospital beds and ventilators for the admission of such patients [1]. Effective screening of infected patients is the most important step: infected persons should be isolated and diagnosed properly to stop further spread [2]. For clinical detection,
reverse transcriptase-polymerase chain reaction (RT-PCR) testing is commonly used [3]. As an alternative, in radiography examinations, radiologists use chest radiography or computed tomography imaging to detect visual signs of the disease [2]. Many deep learning models for the detection of COVID-19 from radiology images, giving highly accurate predictions, have been proposed. However, during the development of such deep learning models, many hyper-parameters (e.g., batch size, number of hidden units, optimizer functions) have to be tuned to obtain accurate metrics, which becomes a very exhausting process. In this study, a different and efficient approach to automating the hyper-parameter tuning process is proposed, using DevOps tools to automate the workflow. Several hyper-parameter tuning techniques have been developed in recent years; Grid Search and Random Search are two of them. In Grid Search, all combinations of values are explored to yield good accuracy, but in the long run it is very inefficient. Random Search selects combinations randomly from the grid and has proved to give better performance than Grid Search. Bayesian Optimization, a more advanced technique that carries out a guided search for the best combination, has been shown to yield better results still. The paper is arranged in the following sections: Sect. 2 briefly describes the various technologies used during the research, Sect. 3 explains the proposed work, Sect. 4 describes the problem in detail along with information about the benchmark dataset, Sect. 5 covers the implementation, Sect. 6 shows the overall performance evaluation, Sect. 7 presents a comparative analysis with related works, Sect. 8 elaborates on possible potential threats to the model, and the conclusion and future work are described in Sect. 9. The objective of our work is to achieve reasonably acceptable accuracy and other metrics without investing the time of skilled labor for the job, which would, in turn, reduce the cost of implementation.
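To make the difference between the two basic search strategies concrete, the sketch below runs grid search and random search over a small, arbitrary hyper-parameter grid with scikit-learn; it is a generic illustration, not the tuning procedure of this paper.

```python
# Illustration of grid search vs. random search over a toy hyper-parameter grid.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"n_estimators": [50, 100, 200],
              "max_depth": [3, 5, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)                        # exhaustively tries every combination

rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)                        # samples only a few combinations at random

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```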
2 Technologies Used The process of integrating the tools and concepts of DevOps for solving the problems faced while training machine learning Models, by automatic adjustment of the hyper-parameters leading to increased accuracy, is the fundamental concept behind ML-Ops. The most important factor from DevOps, i.e. a focus on Continuous Integration/Continuous Delivery (CI/CD) is applied directly to model generation, while regular deployment, diagnostics and further training can also be done on a frequent process, rather than waiting for one large upload at much slower intervals. In the proposed work, the authors have used some important open-source DevOps tools like Docker and Jenkins for creating the CI Pipeline.
Fig. 1 The initial model diagram
Fig. 2 Datasets of X-ray images for (first row) normal cases and (second row) COVID-19 cases [7]
3 Proposed Work

In this paper, the authors have developed an automated hyper-parameter tuning pipeline with DevOps automation tools, aiming to increase the metrics of the deep learning model with the help of the proposed pipeline. Due to the shortage of proper X-ray samples, neural networks are difficult to train from scratch. To compensate for this, the authors use a model trained previously on datasets containing information about various chest-related infections: the pre-trained CheXNet model [4], trained on ChestX-ray14. Figure 1 portrays the initial model diagram. The input images are resized to 224 × 224 pixels to match the specification of the DenseNet-121 architecture, which comprises various dense, max-pooling, and softmax layers. The model performs a binary classification between the X-ray of a normal person and that of a COVID-19-infected person. In this study, the public dataset of radiology images (Fig. 2) was used, which was provided by Dr. Adrian Rosebrock [5] and Dr. Joseph Cohen [6]; researchers from all around the world frequently update the dataset with the latest images. A pattern of ground-glass opacification with occasional consolidation in peripheral, patchy, and bilateral areas is seen in the X-rays of COVID-19-positive patients. The dataset was balanced and included 200 samples each of COVID and normal chest X-ray scans, equally divided into training and testing sets. For the experimental setting, all photos were scaled to 224 × 224 pixels [7].
The authors focus on using Docker containers to build and run the machine learning models. Different custom Docker containers are built using a DockerFile to support the different architectures of the ML models. Through Jenkins, multiple jobs are created as follows:
● Step 1. When developers publish a repository to GitHub, the repository is automatically pulled.
● Step 2. Jenkins automatically launches the necessary image containers to deploy the code and begin training, based on the code or program file.
● Step 3. Training the model and predicting the accuracy or metrics.
● Step 4. If the accuracy metric is less than 90%, tweaking the machine learning model architecture.
● Step 5. Retraining the model and notifying that the best model has been created.
The proposed approach is depicted as a flowchart in Fig. 3. When the code of a model having low accuracy is downloaded from GitHub, the first step involves classifying the model based on its architecture. This is done by identifying specific keywords in the code-base that help identify whether the model is a convolutional neural network, an artificial neural network, or a regression model.
Fig. 3 Flowchart of the proposed approach
After the classification is completed, the code is copied to the respective container, which is pre-loaded with all the packages and dependencies needed to run the model. After copying the code, the container is run, which internally starts training the model by executing the code copied inside it. Once the training process is completed, the accuracy and other metrics are generated based on the performance of the model. The accuracy is then read and compared with the desired accuracy fixed by the developer. If the generated accuracy is lower than the desired accuracy, the hyper-parameters used to build the model are altered according to guidelines provided by the developer; this is done by automatically editing the code-base. The model is then retrained and its metrics and accuracy are checked again. If the newly generated accuracy meets the desired accuracy, an email confirmation is sent to the developer stating that the model has been trained to reach the desired accuracy; if not, the hyper-parameter tuning process continues following the next set of guidelines provided.
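A keyword-based classification step of this kind could look like the sketch below. The keyword lists, family names, and file names are assumptions used only to illustrate the rule described above (scan the code-base for layer or estimator names and route the code to the matching container).

```python
# Sketch of keyword-based model classification; keyword lists and names are assumed.
import pathlib

KEYWORDS = {
    "cnn": ("Conv2D", "Conv1D", "MaxPooling2D"),
    "ann": ("Dense", "Sequential", "MLPClassifier"),
    "regression": ("LinearRegression", "LogisticRegression"),
}

def classify_model(code_path):
    """Return 'cnn', 'ann', or 'regression' based on keywords found in the source file."""
    source = pathlib.Path(code_path).read_text()
    for family, words in KEYWORDS.items():             # CNN keywords are checked first
        if any(word in source for word in words):
            return family
    return "unknown"

# The file would then be copied into the folder mounted as the matching container's volume.
family = classify_model("model.py")
print(f"model.py classified as: {family}")
```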
4 Method 4.1 Problem Formulation In this paper, the authors have created the initial deep learning model, using pretrained model, CheXNet [4], which has Dense Convolutional Network (DenseNet) [8] backbone. Deep learning models have several hyper-parameters that need to be tuned to get the desired accuracy of the model. The authors have automated the tuning process by creating a Continuous Integration (CI) pipeline with the help of Open-source tools like Docker, Jenkins and Git. By implementing this, a higher accuracy for the model was achieved without human interface during the re-training phase of the deep learning model.
4.2 Training the Model

The authors trained the model using pre-trained weights from the CheXNet implementation by Weng et al. [9], after which the following process is performed. The backbone weights of the DenseNet are frozen such that only the last fully connected layer gets trained. The training is conducted using the Adam optimizer with a learning rate of 10−4, batches of size 32, and about 20 epochs [1]. Early stopping and model checkpointing are implemented in the code, and the model with the lowest validation loss is saved.
In this model, the possible hyper-parameter tunings that could be performed are listed below (a minimal training sketch follows the list):
● Adjusting the number of FC layers
● Adjusting the learning rate
● Choosing an optimizer and a loss function
● Deciding on the batch size and number of epochs
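The sketch below mirrors the training setup described above (a frozen DenseNet-121 backbone, a trainable classification head, Adam at 10−4, batch size 32, early stopping and checkpointing). It is not the authors' implementation: file paths are assumed, and whereas the real pipeline loads the pre-trained CheXNet weights, the sketch simply starts from ImageNet weights as a stand-in.

```python
# Minimal transfer-learning sketch: frozen DenseNet-121 backbone + trainable head.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

backbone = DenseNet121(include_top=False, weights="imagenet",
                       input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False                               # freeze the DenseNet backbone

model = models.Sequential([
    backbone,
    layers.Dense(1, activation="sigmoid"),               # binary: COVID-19 vs. normal
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True,
                                       monitor="val_loss"),   # keep lowest validation loss
]
# model.fit(train_images, train_labels, validation_split=0.2,
#           epochs=20, batch_size=32, callbacks=callbacks)
```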
5 Implementation of the Proposed Pipeline

5.1 Setting up the Docker Containers

Docker containers are created for serving the ML models, using the tensorflow/tensorflow image from DockerHub. After the image is pulled, it is modified so that the CNN and ANN models can run inside the container.
5.2 Building the Jenkins Pipeline Step 1: Automatic Code Download. Using the Jenkins Web GUI, a Freestyle Job is created that performs the action of downloading the code into the local system, when a user adds a new model in the connected GitHub account. This job will be automatically triggered via GitHub Web-hooks. Step 2: Classifying Models Based on Their Architecture. Once the code has been downloaded to the local system, this job would classify the model according to the architecture of the model and add it to the respective folder, which is attached as the volume of the respective Docker Container. This job is made as a Downstream Job of Job-1, which means it would automatically run when Job-1 is successfully built. Step 3: Training the Model. Once the codes are saved in the Docker container environment, the model is executed inside the container to train it initially. This job is made as a Downstream Job of Job-2, which means it would automatically run when Job-2 is successfully built. The main role of this job is to execute the file to start training the model, inside the Docker containers. Thus, several models can be trained simultaneously in different environments. ● By the end of this job, codes have downloaded, classified and trained the model. The initial accuracy of the model after training has also been found out. ● Now, if the accuracy obtained is not more than 97%, hyper-parameter tuning has to be performed. This would start the Step 4. ● Otherwise, a mail would be sent to the user stating the desired accuracy has been reached. This would be done by Step 5.
Fig. 4 Algorithm: Re-training the model to improve accuracy
Step 4: Retraining the Model to Increase the Accuracy. Suppose after training the model, it is found out that the accuracy is below the desired amount. Thus, adjusting the hyper-parameters for increasing the accuracy of the models would be done. This is where DevOps steps in. With the help of Continuous Integration Pipeline (CI Pipeline), one can automate the process of hyper-parameter tuning. Thus, the work that would require a lot of days if done manually can be finished within a few hours without much human intervention. From the similar models published earlier, it was seen that adding extra hidden layers, resulted in the maximum increase of all the model metrics, followed by changing the Optimizers and Batch Size. For this paper, it has been considered to add Dense layers for increasing the accuracy. Following the same, other hyperparameters would be taken into account. Figure 4 displays the algorithm that was used for re-training the model for improving accuracy. Step 5: Notifying that the Best Model Is Being Created. Once the model achieves the desired accuracy, this job gets fired, which sends an e-mail to the Developer, informing about the metrics achieved by the model. Thus, without any human interface, one is able to tune the hyper-parameters automatically, saving human effort. This helps in faster implementation of the models.
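The re-training loop of Fig. 4 could be scripted roughly as below: read the last reported accuracy, add a Dense layer to the code-base if the target is not yet met, re-run training, and e-mail the developer once the target accuracy is reached. The file names, the "TUNE-POINT" marker used to splice in new layers, and the SMTP settings are all illustrative assumptions, not the authors' actual job definitions.

```python
# Sketch of the automated re-training loop; file names, marker, and SMTP are assumed.
import smtplib
import subprocess
from email.message import EmailMessage

TARGET_ACCURACY = 0.97
MAX_ROUNDS = 10

def read_accuracy(path="accuracy.txt"):
    return float(open(path).read().strip())            # training script writes accuracy here

def add_dense_layer(path="model.py"):
    """Naive tweak rule: append one more Dense layer at a marked point in the code-base."""
    src = open(path).read()
    src = src.replace("# TUNE-POINT",
                      "layers.Dense(64, activation='relu'),\n    # TUNE-POINT")
    open(path, "w").write(src)

def notify(accuracy):
    msg = EmailMessage()
    msg["Subject"] = f"Model reached {accuracy:.2%} accuracy"
    msg["From"], msg["To"] = "[email protected]", "[email protected]"
    with smtplib.SMTP("localhost") as s:                 # assumes a local mail relay
        s.send_message(msg)

for _ in range(MAX_ROUNDS):
    subprocess.run(["python", "model.py"], check=True)   # (re)train inside the container
    acc = read_accuracy()
    if acc >= TARGET_ACCURACY:
        notify(acc)
        break
    add_dense_layer()                                    # tweak hyper-parameters and retry
```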
6 Overall Performance Evaluation Experiments were performed for the identification of COVID-19 confirmed cases using X-ray images. In order to train the model, we used two types of X-ray images: non-COVID and COVID-19. It was found that the initial model gave an accuracy of 77.03%. Then the model was uploaded in the CI Pipeline for the automatic tuning of hyper-parameters. The re-training process continued until an accuracy of 97.03% was achieved by the model. The reports of the Automation showed that the number of hidden layers was increased by 6, i.e. 6 dense layers were added.
Table 1  Classification performance of the proposed deep learning model after re-training

Patient status   Precision   Recall   F1-score
COVID-19         1.00        0.94     0.97
Normal           0.96        1.00     0.98
Fig. 5 Training loss and accuracy evaluation of the model
For performance and metrics evaluation of the final model, 80% of X-ray images including non-COVID and COVID-19 cases are randomly chosen for training from the dataset. The classification report of final model is presented in Table 1. Figure 5 plots the performance evaluation of the deep learning model with binary cross-entropy loss and accuracy graphically. The resulting confusion matrix of the deep learning model is depicted in Fig. 6. Furthermore, the corresponding graph of the Receiver Operating Characteristics (ROC) curves is depicted in Fig. 7 [7].
7 Comparative Analysis with Related Works

Several deep learning models have been widely developed in the medical imaging field in recent times, especially for the detection of COVID-19 confirmed cases. Table 2 lists the recent related works in this field. It is observed that the Deep CNN ResNet-50-based approach produces slightly higher (by 0.97%) detection accuracy than the proposed approach, but the traditional process of training models and tuning the hyper-parameters eventually delays the production of an efficient model. The approach described by the authors helps reduce the model training time by automating the hyper-parameter tuning process.
Fig. 6 Confusion matrix
Fig. 7 ROC curve
8 Possible Potential Threats Deep learning models, developed in the recent years, are being trained with high computation power and large available datasets, which enhances their capability of predicting accurate results. They are being extensively used to assist human beings in crucial operations to enhance the success rate. But still they have not reached
Table 2  Comparison of the proposed framework with previous deep learning methods

Medical image type   Method                   Accuracy %
Chest X-ray          COVID-Net                92.40
Chest X-ray          ResNet50 + SVM           95.38
Chest X-ray          Deep CNN ResNet-50       98.00
Chest X-ray          VGG-19                   93.48
Chest CT             COVIDX-Net               90.00
Chest CT             DRE-Net                  86.00
Chest CT             M-Inception              82.90
Chest CT             UNet + 3D Deep Network   90.80
Chest X-ray          ResNet                   86.70
Chest X-ray          DarkCovidNet             87.02
Chest X-ray          Faster R-CNN             97.36
Chest X-ray          (Proposed work)          97.03
the state where they can be authorized to take decisions on their own. Humans and machines must work together to achieve better results. Therefore, in spite of training the proposed model with good amount of data and receiving an appropriate accuracy, precision, F-measure and recall, there are chances where the model can give false positives, which will be very negligible. But after understanding the severity of COVID-19 virus, even the negligible ratio can lead to the propagation and transmission of the virus. Therefore, the authors would like to specify that results obtained from the deep learning models, must be verified and cross-checked with doctors. Thus, the authors believe that humans and machines working together can fight any pandemic in the future.
9 Conclusion and Future Work Despite the safeguards adopted by the public and government restrictions, COVID19 illness is still spreading. In this paper, the authors have proposed a new approach of tuning the hyper-parameters. First, they have created an initial deep learning model for classifying X-ray images. Then, they have demonstrated how the metrics of the model could be increased without manually changing the hyper-parameters, by integrating DevOps automation tools with the machine learning model. The results of the final proposed model show a drastic increase in accuracy, without any human interface during the hyper-parameter tuning process. The model reached a sufficient accuracy in less time when compared to the present manual process. Limited size of publicly available dataset of COVID-19 radiological images is one of the current limitations. Furthermore, the metrics of the model can be improved by considering a wide range of hyper-parameters. The algorithm can be optimized
further by arranging the hyper-parameters to be tuned based on their contribution towards the increase of the overall metrics of the machine learning model. The speed of the model tuning process can be improved by implementing distributed clustering using Jenkins with the help of IP tunneling. The approach discussed in this paper can be extended to similar types of machine learning models in the future [10].
References 1. Mangal, A., Kalia, S., Rajgopal, H., Rangarajan, K., Namboodiri, V., Banerjee, S., & Arora, C. (2020). Covidaid: Covid-19 detection using chest x-ray. arXiv preprint retrieved from arXiv:2004.09803 2. Wang, L., & Wong, A. (2020). Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. arXiv preprint retrieved from arXiv:2003.09871 3. Wang, W., Xu, Y., Gao, R., Lu, R., Han, K., Wu, G., & Tan, W. (2020). Detection of SARSCOV-2 in different types of clinical specimens. Jama, 323(18), 1843–1844. 4. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al. (2017). Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint retrieved from arXiv:1711.05225 5. Rosebrock, A. (2020). Detecting covid-19 in x-ray images with keras, tensorflow, and deep learning. https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-xrayimages-with-keras-tensorflow-and-deep-learning 6. Cohen, J.P., Morrison, P., & Dao, L. (2020). Covid-19 image data collection. arXiv preprint retrieved from arXiv:2003.11597, https://github.com/ieee8023/covid-chestxray-dataset 7. Hemdan, E. E. D., Shouman, M. A., & Karar, M. E. (2020). Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images. arXiv preprint retrieved from arXiv:2003.11055 8. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708). 9. Weng, X., Zhuang, N., Tian, J., & Liu, Y. (2017). Chexnet for classification and localization of thoracic diseases. https://github.com/arnoweng/CheXNet/ 10. Kundu, A., Mishra, C., & Bilgaiyan, S. (2021). Covid-segnet: Diagnosis of covid-19 cases on radiological images using mask r-cnn. In 2021 Seventh International Conference on Bio Signals, Images, and Instrumentation (ICBSII) (pp. 1–5). IEEE
Big Data Disease Prediction System Using Vanilla LSTM: A Deep Learning Breakthrough Natasha Sharma and Priya
Abstract An intelligent disease prediction system using Vanilla LSTM (Long Short Term Memory) is proposed in the paper. The proposed algorithms were tested only on small data due to limited access to data. Later on, a huge dataset of disease prediction was collected from various research institutes and the existing algorithms were not able to bring predictions after training. So, the prediction system with the help of deep learning was designed and vanilla LSTM was used for classification. The algorithm showed significant improvement over the existing algorithm not only in terms of accuracy which is 98.67%, but also in terms of computation complexity. Keywords Deep learning · LSTM · Prediction system · Data mining · Classification algorithm
1 Introduction

Data mining refers to techniques for extracting information from large data collections, through methods such as association and classification. Data mining holds wide significance in healthcare organizations, where it helps describe ideal treatment strategies, foresee risks, and evaluate patient care and costs. Quick clinical identification is essential for persons who are more prone to disease development, while the probabilities of disease growth and reversal are explored side by side. In the initial phase, 40% of cancer cases and 80% of heart- and diabetes-related disease cases can be prevented effectively [1]. Out of the total expenditure spent on patients with these five diseases related to heart, kidney, diabetes, renal, and lung conditions,
76% was made on Medicare. It was only for 17% of population required in Medicare in 2012 [2]. For predicting and diagnosing diseases, the classification algorithm Support Vector Machine is implemented on different datasets for the diagnosis of different diseases [3]. Machine learning and data science have grown quite fast these days, covering every technological area. The focus is on breaking this paradigm by diving into the data mining field and creating a system that accurately diagnoses a person according to the symptoms. Many researches have already been proposed in the area, but got restricted due to missing big data and computing power. Majority of the research works focus on driving solutions from machine learning techniques that are tested on the open-source data available on UCI repository. The data was collected from various research institutes in real-time that were capable of validating all records from the doctors. The total records covered and considered in this research are of around 1 million patients. Deep learning technique has been used for evaluating the algorithms.
2 Dataset

The Cleveland Heart Disease database with medical features (factors) was divided into two groups: training and testing. The attribute "Diagnosis" was shown to be the most predictive, with a value of "1" for patients with heart disease and "0" for those without. There are four databases in the heart-disease directory that deal with heart disease diagnostics, and all of the attributes have numerical values. The information was gathered at the following four locations:
1. Cleveland Clinic Foundation (cleveland.data)
2. Hungarian Institute of Cardiology, Budapest (hungarian.data)
3. V.A. Medical Center, Long Beach, California (long-beach-va.data)
4. University Hospital, Zurich, Switzerland (switzerland.data)
The instance format is the same for all databases. Although each database has 76 raw attributes, only 14 of them are actually used; each database therefore exists in two copies, one with all attributes and one with the 14 attributes used in previous experiments. The Cleveland database, in particular, is the only one that has been used by machine learning researchers so far. The "goal" field indicates whether or not the patient has heart disease; it takes values from 0 (no presence) to 4. The experiments on the Cleveland database have primarily attempted to distinguish between presence (values 1, 2, 3, 4) and absence (value 0).
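A small pandas sketch of the pre-processing implied above is given below: load the processed Cleveland file (14 attributes) and binarize the diagnosis field so that 0 means absence and 1–4 mean presence. The file name and column names follow the UCI "processed.cleveland.data" layout and are assumptions here, not details stated by the authors.

```python
# Sketch: load the processed Cleveland data and binarize the diagnosis target.
import pandas as pd

columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
           "exang", "oldpeak", "slope", "ca", "thal", "num"]

df = pd.read_csv("processed.cleveland.data", names=columns, na_values="?")
df = df.dropna()                                   # a few rows have missing ca/thal values

df["diagnosis"] = (df["num"] > 0).astype(int)      # 1 = heart disease present, 0 = absent
X = df[columns[:-1]]
y = df["diagnosis"]

print(X.shape, y.value_counts().to_dict())
```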
3 Literature Survey

[4] created a system that aided in coronary heart disease diagnosis. [5] used a neuro-fuzzy approach for predicting heart-related illnesses. [6] used neural networks for coronary heart disease prediction. [7] used a multi-classification technique. [8] used a classification technique via clustering with Naïve Bayes. [9] used ECG and myocardial scintigraphy for heart disease prediction. [10] used Naïve Bayes for creating a heart disease prediction system. [11] used a multi-layer algorithm for predicting coronary heart diseases. [12] used the Framingham features for predicting CHD. [13] developed a data mining system that performed association analysis with the Apriori algorithm for assessing coronary heart-related risk factors. [14] provided simulation results for evaluating patients with coronary heart disease, end-stage renal disease, and congestive heart failure in Slovenia. [4] proposed a hybrid technique in which Genetic Algorithms (GAs) and Support Vector Machines (SVMs), two machine learning algorithms, were applied together. [4] proposed the Kernel F-score Feature Selection (KFFS) method for specific types of medical datasets. [15] carried out a method based on data preprocessing using principal component analysis (PCA) and then implemented a differential evolution classifier for diagnosing heart disease. [16] created a classifier ensemble based on rotation forest (RF) to enhance the learning accuracy of the algorithm. [2] proposed the Optimal Decision Path Finder (ODPF) to minimize the money and time spent on diagnostic testing of diseases. [17] proposed a computer-aided coronary heart disease diagnosis system (CDSS) based on weighted fuzzy rules. [2] proposed a novel approach to enhance the performance of back-propagation neural network algorithms. [18] introduced dynamic determination of the number of trees in a random forest algorithm for automatic diagnosis of diseases. [19] proposed a PSO system for CHD diagnosis. [20] studied the cumulative effect of genetic susceptibility variants in type 2 diabetes on CHD risk. [1] applied the J48 method in a heart disease prediction system with Weka 3.6.4. [21] implemented a neural network technique in the Weka data mining tool. [22] used Weka 3.6.6 for the evaluation of a heart disease prediction system. [24] used SAS-based software 9.1.3. [23] used the C4.5 algorithm for a healthcare decision support system. [25] utilized a BNN-based heart disease prediction system. [26] offered an improved method for predicting risk levels from the heart disease database. [27] performed an examination of the heart disease database for heart attack risk level prediction. [28] proposed a way to detect and identify coronary artery disease at an early stage by developing an expert system. [29] provided a comparison between BPSO and GA techniques in feature selection models for determining the type and severity of coronary artery disease. [7] proposed a novel DSS for diagnosing CAD. [30] described a heart disease prediction system using machine learning and data mining, noting that classification trees hold limited accuracy. [31] proposed a decision-making model to obtain accurate heart disease diagnoses in the emergency room. [32] proposed a novel method combining CBR and data mining to diagnose chronic diseases.
170
N. Sharma and Priya
out ANNRST, k-NN, and CMCF imputation techniques to discover coronary heart ailment out of which ANNRST turned into the best approach. [33–35] proposed a version of hybrid forward selection for diagnosing cardiovascular disease. [3] carried out a record mining method to efficaciously examine facts using combined class nice (CCQ) measures.
4 Proposed Vanilla LSTM The LSTM setup implemented in the literature was first presented by Graves and Schmidhuber [36]. This setup is called the Vanilla LSTM and is used as the reference for comparing all variants. Changes made by Gers and Schmidhuber [7] and Gers et al. [37] to the original LSTM [38] were included in the Vanilla LSTM. The revised version of the setup makes use of full gradient training. The major LSTM changes are described in Section III of the reference work.

Let N be the number of LSTM blocks, I the number of inputs, and K_t the input vector at time t. Accordingly, the weights of the LSTM layer are as follows:

Input weights: G_z, G_i, G_f, G_o ∈ R^{N×I}
Recurrent weights: H_z, H_i, H_f, H_o ∈ R^{N×N}
Peephole weights: L_i, L_f, L_o ∈ R^N
Bias weights: m_z, m_i, m_f, m_o ∈ R^N

The forward pass of the Vanilla LSTM layer is written as

Y'_t = G_z K_t + H_z X_{t−1} + m_z,   Y_t = g(Y'_t)   (block input)
P'_t = G_i K_t + H_i X_{t−1} + L_i ⦿ C_{t−1} + m_i,   P_t = σ(P'_t)   (input gate)
S'_t = G_f K_t + H_f X_{t−1} + L_f ⦿ C_{t−1} + m_f,   S_t = σ(S'_t)   (forget gate)
C_t = Y_t ⦿ P_t + C_{t−1} ⦿ S_t   (cell state)
Q'_t = G_o K_t + H_o X_{t−1} + L_o ⦿ C_{t−1} + m_o,   Q_t = σ(Q'_t)   (output gate)
X_t = h(C_t) ⦿ Q_t   (block output)

Backpropagation Through Time. The deltas inside the LSTM block are calculated as:

δX_t = Δ_t + H_z^T δY'_{t+1} + H_i^T δP'_{t+1} + H_f^T δS'_{t+1} + H_o^T δQ'_{t+1}
δQ'_t = δX_t ⦿ h(C_t) ⦿ σ'(Q'_t)
δC_t = δX_t ⦿ Q_t ⦿ h'(C_t) + L_o ⦿ δQ'_t + L_i ⦿ δP'_{t+1} + L_f ⦿ δS'_{t+1} + δC_{t+1} ⦿ S_{t+1}
δS'_t = δC_t ⦿ C_{t−1} ⦿ σ'(S'_t)
δP'_t = δC_t ⦿ Y_t ⦿ σ'(P'_t)
δY'_t = δC_t ⦿ P_t ⦿ g'(Y'_t)

Here, Δ_t is the vector of deltas passed down from the layer above; if E is the loss function, it formally corresponds to ∂E/∂X_t, not including the recurrent dependencies. The delta for the inputs is only needed when there is a trainable layer below, and can be computed as

δK_t = G_z^T δY'_t + G_i^T δP'_t + G_f^T δS'_t + G_o^T δQ'_t

The weight gradients can then be calculated as given below, where ★ stands for any of {Y', P', S', Q'} and ⟨·,·⟩ denotes the outer product of two vectors:

δG_★ = Σ_{t=0}^{T} ⟨δ★_t, K_t⟩        δL_i = Σ_{t=0}^{T−1} C_t ⦿ δP'_{t+1}
δH_★ = Σ_{t=0}^{T−1} ⟨δ★_{t+1}, X_t⟩   δL_f = Σ_{t=0}^{T−1} C_t ⦿ δS'_{t+1}
δm_★ = Σ_{t=0}^{T} δ★_t                δL_o = Σ_{t=0}^{T} C_t ⦿ δQ'_t
The proposed Vanilla LSTM above is predominantly designed for sequence-to-sequence modeling. Further, a Softmax and a classification layer were applied to adapt it for the classification task. The application of the Vanilla LSTM and its performance are described in the following section.
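A minimal NumPy sketch of one forward-pass step of the Vanilla LSTM described above is given below, using the same symbols as the equations (G for input weights, H for recurrent weights, L for peephole weights, m for biases, K_t for the input). The toy dimensions and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_lstm_step(K_t, X_prev, C_prev, G, H, L, m):
    """One time step of a peephole (Vanilla) LSTM block.

    K_t: input vector (I,); X_prev, C_prev: previous block output and cell state (N,).
    G, H, L, m: dicts of input, recurrent, peephole and bias weights keyed by 'z','i','f','o'.
    """
    Y_t = np.tanh(G['z'] @ K_t + H['z'] @ X_prev + m['z'])                   # block input
    P_t = sigmoid(G['i'] @ K_t + H['i'] @ X_prev + L['i'] * C_prev + m['i']) # input gate
    S_t = sigmoid(G['f'] @ K_t + H['f'] @ X_prev + L['f'] * C_prev + m['f']) # forget gate
    C_t = Y_t * P_t + C_prev * S_t                                           # cell state
    Q_t = sigmoid(G['o'] @ K_t + H['o'] @ X_prev + L['o'] * C_prev + m['o']) # output gate
    X_t = np.tanh(C_t) * Q_t                                                 # block output
    return X_t, C_t

# Toy dimensions: I inputs, N LSTM blocks (illustrative only)
I, N = 4, 3
rng = np.random.default_rng(0)
G = {k: rng.standard_normal((N, I)) for k in 'zifo'}
H = {k: rng.standard_normal((N, N)) for k in 'zifo'}
L = {k: rng.standard_normal(N) for k in 'ifo'}
m = {k: np.zeros(N) for k in 'zifo'}
X_t, C_t = vanilla_lstm_step(rng.standard_normal(I), np.zeros(N), np.zeros(N), G, H, L, m)
```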
5 Result and Discussion For the experimental results, real-time data was collected from various research institutes. A total of 41,000 subjects and their outcomes were considered. The results over the training data set are summarized in Fig. 1. The Vanilla LSTM shows 99% performance accuracy on the training dataset when the number of hidden nodes is 9000, corresponding to 21.2 million parameters. The sigmoid network considers only five million weight parameters; increasing the model size of the sigmoid network leads to no noticeable improvement in performance and is only responsible for increasing the complexity of the model itself. Confusion matrix result analysis is presented on the training and validation datasets. For the Vanilla scheme, correct identification of positive diagnoses reaches 98.9%, while for negative diagnoses it reaches 99.8%, which is particularly important for a diagnostic system. Our future research will focus on further improvement in the disease prediction system field. Further, other machine learning algorithms were applied to the same dataset, and their performance was found to degrade steadily as the data set size increased. This clearly shows that these machine learning approaches do not extend to larger data; as the data grows, the complexity of the machine learning algorithms also increases tremendously (Tables 1 and 2).
Fig. 1 Showing the loss and accuracy training convergence of the model
Table 1 Comparative analysis of Vanilla LSTM and other machine learning algorithms over the same data set

Algorithms | Training Acc. (%) | Testing Acc. (%)
SVM | 67 | 54
RF | 79 | 69
DT | 71 | 68
VLSTM | 98 | 99

Table 2 Classification report of Vanilla LSTM algorithm

Accuracy | Precision | Recall
0.98 | 0.99 | 0.98
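As a hedged illustration of how the metrics reported in Table 2 can be computed from labels and predictions (the arrays below are placeholders, not the study's actual outputs):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Placeholder arrays standing in for the true labels and the model's predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```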
6 Conclusion A neural network architecture was applied on a handheld manual setup. The big data collected was from kidney disease patients, and the type of network was Paisley, specially designed for big data handling. The sequence-to-sequence classification did not suffer from the conceptual problems that limit the sequence-to-sequence application of LSTM. There was a conceptual problem of the standard rate in neural network training, though the network was more robust toward data sensitivity. It was also found that the network can be used in combination with the existing architecture of a neural network and that the training can be sped up by improving its architectural layout. The Vanilla LSTM shows a significant improvement in training time. Further, improvement in the panel estimate will be made, and its application to the medical data set that has been processed by LSTM, summarized by sequence modeling, and classified using the proposed Vanilla LSTM will be explored.
References 1. Taneja, A. (2013). heart disease prediction system using data mining techniques. Oriental Scientific Publishing Co. 2. Akin Ozcift, & Arif Gulten. (2011). Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Journal of Computer Methods and Programs in Biomedicine, 104, 443–451. 3. Andrew Kusiak, & Christopher, A. Caldarone et al. (2006, January). Hypo plastic left heart syndrome knowledge discovery with a data mining approach. Journal of Computers in Biology and Medicine, 36(1), 21–40. 4. Kunc, S. Drinovec, Rucigaj, & Mrhar, A. (2010). Simulation analysis of coronary heart disease, congestive heart failure and end-stage renal disease economic burden. Mathematics and computers in simulation. 5. Swati Shilaskar et al. (2013). Feature selection for medical diagnosis: Evaluation for cardiovascular diseases. Journal of Expert System with Application, 40, 4146–4153. 6. Bahadur Patel, Ashish Kumar Sen, D, P, & Shamsher Shukla. (2013, September). A data mining technique for prediction of coronary heart disease using Neuro-fuzzy integrated approach two level. International Journal of Engineering and Computer Science. ISSN: 2319–7242, pp. 2663–2671, Vol. 2, Iss. 9. 7. Ismail Babaoglu, & Omer Kaan Baykan et al. (2009). Assessment of exercise stress testing with artificial neural network in determining coronary artery disease and predicting lesion localization. Journal of Expert System with Applications, 36, 2562–2566. 8. Usha Rani, K. (2011). Analysis of heart diseases dataset using neural network approach. International Journal of Data Mining & Knowledge Management Process 9. Anbarasi, M., Anupriya, E., & Iyenga, N. C. H. S. N. (2010). Enhanced prediction of heart disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology, 2(10), 5370–5376. 10. Matjaz’ Kukar. (1999). Analysing and improving the diagnosis of ischemic heart disease with machine learning, Elsevier. 11. Aditya Sundar, N., Pushpa Latha, P., & Rama Chandra, M. (2012, May-June). Performance analysis of classification data mining techniques over heart disease data base. International Journal of Engineering Science & Advanced Technology, 2(3), 470–478. 12. John Peter, T., & Somasundaram, K. (2012). Study and development of novel feature selection framework for heart disease prediction. International Journal of Scientific and Research Publications. 13. Wang, & Hoy, W. E. (2005). Is the Framingham coronary heart disease absolute risk function applicable to Aboriginal people? Med J Australia, 182(2), 66–69. 14. Karaolis, J.A. Moutiris, L. Pattichs “Assessment of the Risk Factors of Coronary Heart Events Based on Data Mining with Decision Trees”, IEEE Transactions on IT in Biomedicine, vol. 14, No. 3, 2010. 15. Tan, K. C., & Teoh, E. J. et al. (2009). A hybrid evolutionary algorithm for attribute selection in data mining. Journal of Expert System with Applications, 36, 8616–8630. 16. Kemal Polat, & Salih Gunes. (2009). A new feature selection method on classification of medical datasets: Kernel F-score feature selection. Journal of Expert Systems with Applications, 36, 10367–10373.
17. Pasi Luukka, & Jouni Lampinen. (2010). A classification method based on principal component analysis and differential evolution algorithm applied for prediction diagnosis from clinical EMR heart data sets. Journal of Computer Intelligence in Optimization Adaption, Learning and Optimization, 7, 263–283. 18. Chih-Lin Chi, & W. Nick Street et al. (2010). A decision support system for cost-effective diagnosis. Journal of Artificial Intelligence in Medicine, 50, 149–161. 19. Anooj, P. K. (2012). Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules and decision tree rules. Journal of Computer Sciences, 24, 27–40. 20. Nazri Mohd Nawi, & Rozaida Ghazali et al. (2010). The development of improved backpropagation neural networks algorithm for predicting patients with heart disease. In proceedings of the first international conference ICICA, Vol. 6377, pp. 317–324. 21. Evanthia E. Tripoliti, & Dimitrios I. Fotiadis et al. (2012, July). Automated diagnosis of diseases based on classification: dynamic determination of the number of trees in random forests algorithm. Journal of IEEE Transactions on Information Technology in Biomedicine, 16(4), [45]. 22. Muthukaruppan, S., & Er, M. J. (2012). A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary artery disease. Journal of Expert Systems with Applications, 39, 11657–11665 [55]. 23. Anooj, P. K. (2012). Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. Journal of Computer and Information Sciences, 24, 27–40. 24. Pfister, R., & Barnes, D. et al. (2011). Individual and cumulative effect of type 2 diabetes genetic susceptibility variants on risk of coronary heart disease. Journal of Diabetologia, 54, 2283–2287. 25. Rashedur M. Rahman, & Farhana Afroz. (2013). Comparison of various classification techniques using different data mining tools for diabetes diagnosis. Journal of Software Engineering and Applications. 26. Nidhi Bhatla, & Kiran Jyoti. (2012, October). An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering Research & Technology (IJERT), 1(8). ISSN: 2278–0181. 27. Monali Dey, & Siddharth Swarup Rautaray. (2014). Study and analysis of data mining algorithms for healthcare decision support system. International Journal of Computer Science and Information Technologies. 28. Das, R., Turkoglu, I., et al. (2009). Effective diagnosis of heart disease through neural networks ensembles. Journal of Expert System with Applications, 36, 7675–7680. 29. Tantimongcolwat, T. (2008). Thanakorn Naenna. Elsevier. 30. Hnin Wint Khaing. (2011). Data mining based fragmentation and prediction of medical data. IEEE. 31. Chauraisa, & Pal, S. (2013). Data mining approach to detect heart diseases. International Journal of Advanced Computer Science and Information Technology (IJACSIT), 2(4), 56–66. 17. 32. Debabrata Pal, & Mandana, K. M. et al. (2012). Fuzzy expert system approach for coronary artery disease screening using clinical parameters. Journal of Knowledge-Based System, 36, 162–174. 33. Markos G. Tsipouras, & Themis P. Exarchos et al. (2008, July). Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling. Journal of IEEE Transactions on Information Technology in Biomedicine, 12(4). 34. Peter C. Austin, Jack V. Tu, et al. 
Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure. 35. Son, C.-S., Kim, Y.-N., et al. (2012). Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. Journal of Biomedical Informatics, 45, 999–1008. 36. Mu-Jung Huang, & Mu-Yen Chen et al. (2007). Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Journal of Expert Systems with Applications, 32, 856–867.
37. Setiawan, N. A. et al. (2008). A comparative study of imputation methods to predict missing attribute values in coronary heart disease data set. Journal in Department of Electrical and Electronic Engineering, 21, 266–269. 38. Vahid Khatibi, & Gholam Ali Montazer. (2010). A fuzzy-evidential hybrid inference engine for coronary heart disease risk assessment. Journal of Expert Systems with Applications, 37, 8536–8542. 39. Jae-Hong Eom, & Sung-Chun Kim, et al. (2008). Apta CDSS-E: A classifier ensemble-based clinical decision support system for cardiovascular disease level prediction. Journal of Expert Systems with Applications, 34 2465, 2479. 40. Shou-En Lu, & Gloria L. Beckles et al. (2012). Evaluation of risk equations for prediction of short-term coronary heart disease events in patients with long-standing type 2 diabetes: The translating research into action for diabetes. International Journal of BMC Endocrine Disorders, 12. 41. Jaya Rama Krishnaiah, V. V., Chandra Sekhar, D. V., & Ramchand H Rao, K. (2012, May). Predicting the heart attack symptoms using Biomedical data mining techniques. The International Journal of Computer Science & Applications, 1(3), 10–18. 42. Ishtake S. H., & Prof. Sanap S. A. (2013). Intelligent heart disease prediction system using data mining techniques. International J. of Healthcare & Biomedical Research.
Non-destructive Quality Evaluation of Litchi Fruit Using e-Nose System Suparna Parua Biswas, Soumojit Roy, and Nabarun Bhattacharyya
Abstract In this study, we have developed a conveyor e-Nose system for the non-destructive quality evaluation of litchi fruit. A pair of electronic noses with an optimized array of six sensors was adopted to test two litchi fruits at a time as they pass through the two channels of the running conveyor belt and halt below the e-Nose for a few seconds. The study started with sensor array optimization using a sensitivity test. Three methods were employed for sensor optimization, and the results of all three methods were quite similar to each other. After acquiring data about the aroma volatiles of the samples from the electronic nose, Principal Component Analysis (PCA) was employed to see the pattern differences between good and rejected litchi. The experimental results showed that the patterns of the two categories are different from each other. As the clustering of the two patterns is prominent, data analysis for classification was carried out. A total of 123 samples were taken for the two categories of litchi. After acquiring aroma volatiles from the sensor array for good and rotten litchi, the data was analyzed and compared using SVM, Logistic Regression, KNN, decision tree and random forest classifier models. SVM and Logistic Regression showed the lowest accuracy rates. As the decision tree model shows the lowest error rate, it was applied and integrated into the system to allow the classification of good and rejected litchi. Keywords e-Nose · Conveyor · Sensor · Metal Oxide Semiconductor (MOS)
1 Introduction Litchi (Litchi chinensis) is one of the most important subtropical commercial cash crops in India. India holds the second rank in terms of litchi production in the world after China [1]. Litchi is well known for its delightful taste, beauty, anti-carcinogenic S. P. Biswas (B) · N. Bhattacharyya Centre for Development of Advanced Computing, Kolkata, India e-mail: [email protected] S. Roy Maulana Abul Kalam Azad University of Technology, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_15
properties and nutritionally rich composition such as proteins, vitamins, amino acids and carbohydrates in the form of sugars [2]. At ambient temperature (25 ± 2 °C), the shelf-life of harvested litchi is less than 48 h, and its quality starts decreasing due to internal metabolic reactions. This decreases the total soluble solid content in the fresh fruit, and microbial growth starts, which influences the interior structure of the litchi [3]. Commonly available detection methods for litchi quality include ultraviolet spectrophotometry, high-performance liquid chromatography, differential scanning calorimetry, mass spectrometry and other chemical detection methods [4]. However, most of these techniques are costly, specific and sensitive, usually time-consuming for regular use in the food industry, and require complex sample pretreatment and a high degree of technical expertise. Recent analytical technologies based on sensor signals are widespread in various applications including food quality detection [5]. In any sensor-based measurement system, the selection of sensors plays an important role. In recent years, multiple commercial sensors have been developed, among which metal oxide semiconductor (MOS) sensors are used extensively in machine olfaction gas detection due to their low response to moisture and high chemical stability [6]. There is a certain correlation between sensor resistance and reducing gas concentration in air, which is referred to as the sensitivity characteristic. Consequently, there is a substantial need for a non-destructive analytical instrument that could mimic human odor perception for use in various industrial applications. Techniques based on the e-Nose, which approximately mimics the human nose, are widely used for the quality evaluation of food products. Previous literature identified 2-phenylethanol, geraniol, linalool, benzyl alcohol, etc., as the major aroma-generating chemicals responsible for litchi aroma [7]. Also, more research still needs to be done, especially relating to sensor technology, processing the data, interpreting the results and validation studies. Therefore, the objective of this study was to establish optimized selection criteria for the sensor array and to develop a continuous-type e-Nose system [8] for quality evaluation by extracting suitable features and applying them in the classification process.
2 Materials and Methods 2.1 Sample Collection & Preparation Freshly harvested matured litchi samples without any pretreatment were collected from the local market of Kolkata in the months of May–June, 2019. The total number of samples collected in our study was more than 300. The major aroma-generating chemicals responsible for litchi aroma (2-phenylethanol, geraniol, linalool and benzyl alcohol) were purchased from Sigma-Aldrich (USA). Two methods were adopted for sensor selection. For the chemicals, 7 PPM vapor was created in a desiccator one after another for each of the chemicals, then 30 ml
Fig. 1 Desiccator for vapor creation
Fig. 2 Vapor testing in HEN system
of 7 PPM vapor was taken out through a syringe (Fig. 1) and exposed to each of the sensors of our pre-developed e-Nose system, named HEN (Handheld Electronic Nose, Fig. 2). Litchi samples were considered for the second method. After harvesting, litchi comes into the market in bunches with leaves and stems. At this stage, the litchi must be separated from its branch for grading. There is a technique for this separation: there is a notch just above the root of the litchi; holding the stem above the notch and twisting (revolving clockwise) the litchi with the other hand separates the litchi from its branch at the notch point, as shown in Fig. 3ii. The litchi input sample then looks like Fig. 3iii. Litchis of the same weight (10–11 g) were chosen for the experimentation.
2.2 Sensor Array Optimization: (Feature Optimization) Feature preprocessing: The initial sensor selection was done from a set of 12 commercial Figaro make MOS sensors. Several statistical methods have been applied to preprocess and extract the useful features from the sensor signals, such as the ones listed in the following Table 1. The sensitivity was checked for each of the sensors 10 times (Table 2) using our Handheld e-Nose system (HEN) and the response was
Fig. 3 Litchi sample preparation: (i) sample collection; (ii) with notch; (iii) extracted from notch
recorded [9]. The experiment was done with single litchis of 10–12 g (approx.) each (Table 2). Again, the sensitivities of each of these sensors were checked individually by exposing 30 ml of vapor of each of benzyl alcohol, linalool, geraniol and 2-phenylethanol to each of the sensors separately at 7 PPM concentration. Data were taken 3 times for the same chemical with the same sensor and the responses were recorded (Fig. 6 in the results part). In Table 3, data was taken for benzyl alcohol; in the same way, data was taken for the rest of the three chemicals. The total number of readings taken = 3 × 12 (no. of sensors) × 4 (no. of chemicals) = 144 (Fig. 3).

Feature optimization: For sensor array optimization, we employed three methods: (a) for the litchi sample, ordering the three preprocessed data sets (standard deviation, normalization and slope of the original signal responses) for the 12 sensors from largest to smallest and then selecting the six sensors with the largest preprocessed values; (b) for each of the four chemicals, ordering the three preprocessed data sets for the 12 sensors from largest to smallest and then selecting the six sensors with the largest preprocessed values (the experiment was done with three major aroma-bearing chemicals using 30 ml of vapor at 7 PPM concentration); and (c) ordering the classification rates for good and rejected litchi for the 12 sensors from largest to smallest and then selecting the six sensors with the largest classification rates.

Table 1 Features considered for preprocessing of sensor data

Sl. no. | Preprocessing method | Formula
1 | Slope | SL_s1 = {Vs1(max) − Vs1(base)} / {pos(Vs1(max)) − pos(Vs1(base))}
2 | Normalization | Norm_s1 = {Vs1(max) − Vs1(min)} / Vs1(min)
3 | Standard deviation | SD_s1 = sqrt( Σ (Vs1(i) − Vs1(mean))² / N )
4 | Difference | F_s1 = Vs1(max) − Vs1(min)
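A minimal sketch of the four preprocessing features in Table 1, computed on a single sensor response vector, is given below (the response array and baseline index are illustrative, not recorded data):

```python
import numpy as np

def sensor_features(v, base_idx=0):
    """Compute slope, normalization, standard deviation and difference of a sensor response v."""
    v = np.asarray(v, dtype=float)
    i_max = int(np.argmax(v))
    slope = (v[i_max] - v[base_idx]) / max(i_max - base_idx, 1)  # rise over sample positions
    norm = (v.max() - v.min()) / v.min()                         # relative change
    std = v.std()                                                # standard deviation
    diff = v.max() - v.min()                                     # absolute change
    return slope, norm, std, diff

# Illustrative response of one MOS sensor over time
response = [1.02, 1.05, 1.31, 1.78, 2.10, 2.05, 1.60, 1.25]
print(sensor_features(response))
```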
Table 2 Sensitivity readings taken of litchi samples using different sensors

Preliminary sensor list | No. of sensitivity readings | Statistical feature 1 | Statistical feature 2 | Statistical feature 3
TGS 813 | 10 times | Step 1: standard deviation (STD) of each reading; Step 2: average of the 10 STDs | Step 1: normalization (Norm) of each reading; Step 2: average of the 10 norms | Step 1: slope of each reading; Step 2: average of the 10 slopes
TGS 816 | 10 times | -Do- | -Do- | -Do-
TGS 821 | 10 times | -Do- | -Do- | -Do-
TGS 823 | 10 times | -Do- | -Do- | -Do-
TGS 825 | 10 times | -Do- | -Do- | -Do-
TGS 826 | 10 times | -Do- | -Do- | -Do-
TGS 830 | 10 times | -Do- | -Do- | -Do-
TGS 832 | 10 times | -Do- | -Do- | -Do-
TGS 2602 | 10 times | -Do- | -Do- | -Do-
TGS 2610 | 10 times | -Do- | -Do- | -Do-
TGS 2611 | 10 times | -Do- | -Do- | -Do-
TGS 2620 | 10 times | -Do- | -Do- | -Do-
Table 3 Sensitivity readings taken of benzyl alcohol using different sensors

Preliminary sensor list | No. of sensitivity readings | Statistical feature 1 | Statistical feature 2 | Statistical feature 3
TGS 813 | 3 times | Step 1: standard deviation (STD) of each reading; Step 2: average of the 3 STDs | Step 1: normalization (Norm) of each reading; Step 2: average of the 3 norms | Step 1: slope of each reading; Step 2: average of the 3 slopes
TGS 816 | 3 times | -Do- | -Do- | -Do-
TGS 821 | 3 times | -Do- | -Do- | -Do-
TGS 823 | 3 times | -Do- | -Do- | -Do-
TGS 825 | 3 times | -Do- | -Do- | -Do-
TGS 826 | 3 times | -Do- | -Do- | -Do-
TGS 830 | 3 times | -Do- | -Do- | -Do-
TGS 832 | 3 times | -Do- | -Do- | -Do-
TGS 2602 | 3 times | -Do- | -Do- | -Do-
TGS 2610 | 3 times | -Do- | -Do- | -Do-
TGS 2611 | 3 times | -Do- | -Do- | -Do-
TGS 2620 | 3 times | -Do- | -Do- | -Do-
Fig. 4 (i) Final sensor array; (ii) e-Nose system connected with laptop with LabVIEW s/w
In all the cases, we got almost the same result, and the six highest-response sensors were selected based on their commonality for the final sensor array for litchi quality detection (Fig. 4i).
2.3 Data Acquisition and Analysis After optimizing the most useful features, the sensor array was developed with the six final sensors and data was taken rigorously for the development of the classification model. 43 good litchis and 80 rejected litchis were selected for the training and testing analysis, out of which 93 were taken for training and 30 for testing. Signal responses were recorded using an NI DAQ card (NI-6008); each raw feature is a 1D matrix (1400 × 1) and needs transformation to a single-valued feature. So we considered four transformations, i.e., standard deviation, normalization, slope and difference, so that each feature of one sample is represented by a single value (Table 3). The two-class data sets were fed to five classification algorithms [10], i.e., SVM, Logistic Regression, KNN, Decision Tree and Random Forest Classifier, and the results were observed. Based on the accuracy score for correct classification, the decision tree was selected and integrated with the e-Nose software module. Proposed e-Nose system: After optimization of the sensor array and finalizing the classification algorithm, we developed the e-Nose system on a conveyor platform. The following figure shows the proposed e-Nose system, comprising a sensor array and an interface printed circuit board (PCB), embedded with a pattern recognition algorithm as well as a verification program. Sensor responses pass through a data acquisition card (DAQ) to a laptop with a self-developed LabVIEW program for the purpose of verifying the function of the portable e-Nose system (Fig. 4ii). A flowchart of the proposed e-Nose system [11] is given in Fig. 5.
Fig. 5 Flowchart of proposed e-Nose system
Fig. 6 Sensitivity of sensors based on chemical analysis
3 Experimental Results and Discussions 3.1 Sensor Array Optimization Sensor optimization using the litchi sample: the results for all the features are more or less the same, and the maximum similarity exists between the STD and slope features. Clustering was done with single features one by one, and the results were validated using the classification scores listed in Table 4. Using chemicals: the highest-response sensors are similar to the sensors listed in the above table. The highest-response sensors for the major aroma-bearing chemicals are observed using a 3D bar plot (Fig. 6). So we selected TGS 825, TGS 823, TGS 826, TGS 2620, TGS 832 and TGS 816 for the sensor array (Table 5).
Table 4 Highest response sensors based on different features of sensor data

Sl. no. | List of sensors | Standard deviation | Normalization | Slope | Classification rate using three features (%)
1 | TGS 825 | 0.416724 – R1 | 1.154073 – R5 | 0.00074 – R1 | 85
2 | TGS 823 | 0.269917 – R2 | 3.161045 – R3 | 0.000484 – R2 | 85
3 | TGS 826 | 0.172895 – R3 | 0.782023 – R8 | 0.00036 – R3 | 75
4 | TGS 2620 | 0.138987 – R4 | 1.063521 – R6 | 0.000253 – R6 | 80
5 | TGS 832 | 0.135854 – R5 | 3.632608 – R2 | 0.000287 – R5 | 85
6 | TGS 816 | 0.114579 – R6 | 4.935 – R1 | 0.00033 – R4 | 80
7 | TGS 2610 | 0.110892 – R7 | 1.26567 – R4 | 0.000245 – R7 | 65
8 | TGS 813 | 0.078897 – R8 | 0.901736 – R7 | 0.000202 – R9 | 65
9 | TGS 2611 | 0.078699 – R9 | 0.506375 – R10 | 0.000205 – R8 | 80
10 | TGS 830 | 0.060425 – R10 | 0.116283 – R12 | 0.000123 – R11 | 80
11 | TGS 2602 | 0.047691 – R11 | 0.743785 – R9 | 0.000197 – R10 | 70
12 | TGS 821 | 0.041034 – R12 | 0.412453 – R11 | 9.63E-05 – R12 | 85
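A small sketch of the selection step reflected in Table 4: sensors are ranked by a preprocessed feature (the average STD, with values copied from the table) and the six largest are retained, as the optimization procedure in Sect. 2.2 describes. The dictionary below simply transcribes the table:

```python
# Average STD feature per sensor (values from Table 4)
std_feature = {
    "TGS 825": 0.416724, "TGS 823": 0.269917, "TGS 826": 0.172895,
    "TGS 2620": 0.138987, "TGS 832": 0.135854, "TGS 816": 0.114579,
    "TGS 2610": 0.110892, "TGS 813": 0.078897, "TGS 2611": 0.078699,
    "TGS 830": 0.060425, "TGS 2602": 0.047691, "TGS 821": 0.041034,
}

# Rank sensors from largest to smallest feature value and keep the first six
ranked = sorted(std_feature, key=std_feature.get, reverse=True)
selected = ranked[:6]
print(selected)  # ['TGS 825', 'TGS 823', 'TGS 826', 'TGS 2620', 'TGS 832', 'TGS 816']
```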
3.2 Data Analysis 80 rejected and 43 export-quality litchis were selected manually for preparing the classification model. Sensor responses were plotted for the two classes of data and the pattern difference was observed (Fig. 7). To confirm the pattern difference, PCA was applied [12] and the result plotted (Fig. 8); the meaningful separation observed motivated the classification task. Five classification algorithms were applied to the data for implementation into the system. Out of the 123 samples, 93 were taken for preparing the model and 30 were kept as test data, and the results were compared. The experimental results (Table 6) show that the accuracy of the decision tree and random forest classifier algorithms is very promising, so we employed the decision tree in our software. Developed system and performance: 300 litchis (206 accepted, 94 rejected) were collected to test the system performance. The system performance is 92%.
Table 5 Highest response sensors based on STD features of sensor response

Sl. no. | Sensor | Benzyl alcohol | Linalool | Geraniol
1 | TGS 825 | 0.353127 (R-3) | 0.49588 (R-5) | 0.2254 (R-4)
2 | TGS 826 | 0.38157 (R-2) | 2.1896 (R-1) | 0.94024 (R-1)
3 | TGS 823 | 0.30912 (R-4) | 0.35259 (R-6) | 0.3381 (R-2)
4 | TGS 832 | 0.2254 (R-6) | 0.703033 (R-3) | 0.12397 (R-7)
5 | TGS 880 | 0.094453 (R-11) | 0.178173 (R-10) | 0.03703 (R-14)
6 | TGS 816 | 0.088013 (R-13) | 0.137387 (R-13) | 0.05313 (R-13)
7 | TGS 830 | 0.22057 (R-7) | 0.55223 (R-4) | 0.13363 (R-6)
8 | TGS 821 | 0.06118 (R-14) | 0.06118 (R-15) | 0.07245 (R-10)
9 | TGS 813 | 0.05152 (R-15) | 0.13524 (R-14) | 0.03381 (R-15)
10 | TGS 2610 | 0.24472 (R-5) | 0.32844 (R-8) | 0.07406 (R-9)
11 | TGS 2620 | 0.816807 (R-1) | 1.88209 (R-2) | 0.33649 (R-3)
12 | TGS 2600 | 0.09982 (R-10) | 0.1449 (R-12) | 0.07889 (R-8)
13 | TGS 2611 | 0.210373 (R-8) | 0.34454 (R-7) | 0.15295 (R-5)
14 | TGS 2602 | 0.18032 (R-9) | 0.17549 (R-11) | 0.06279 (R-11)
15 | TGS 2612 | 0.091233 (R-12) | 0.236133 (R-9) | 0.05796 (R-12)
Fig. 7 (i) and (iii) Signal responses of two sensors (TGS 823 and TGS 2620) for accepted litchi; (ii) and (iv) signal responses of the two sensors for rejected litchi
Fig. 8 (i) PCA plot for accepted litchi; (ii) PCA plot for rejected litchi (fresh–rot analysis with the STD feature)
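A minimal sketch of the PCA step used to produce plots like Fig. 8, assuming a feature matrix with one single-valued feature per sensor and per sample (the random matrix below is a placeholder for the real six-sensor STD features):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Placeholder feature matrix: 123 samples x 6 sensor features (e.g., STD values)
X = rng.random((123, 6))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # 2D scores used for the scatter plots

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("First two PCA scores:\n", scores[:2])
```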
Table 6 Accuracy of five classification models applied on four features of the sensor response

Accuracy (%) | SVM | Logistic regression | KNN (10) | Decision tree | Random forest classifier
STD | 87.09 | 93.54 | 93.54 | 96.77 | 96.77
Norm | 87.09 | 90.32 | 90.32 | 93.54 | 96.77
Slope | 83.87 | 90.32 | 90.32 | 96.77 | 96.77
Diff | 90.32 | 90.32 | 90.32 | 96.77 | 93.54
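A hedged sketch of the model comparison summarized in Table 6, assuming a feature matrix X with one single-valued feature per sensor and labels y for good/rejected litchi (the data below is a placeholder, and default scikit-learn hyperparameters are assumed):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((123, 6))                  # placeholder features (6 sensors)
y = np.array([0] * 43 + [1] * 80)         # 43 good, 80 rejected litchis

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=30, random_state=42, stratify=y)

models = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN (10)": KNeighborsClassifier(n_neighbors=10),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy = {acc:.2f}")
```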
Litchis are put in a lot into the hopper. The hopper is designed so that litchis are trapped in its grooves, moved down, and placed properly on the running conveyor belt under the e-Nose module (Fig. 10). Figure 11 shows our developed full system.
Confusion matrix:

         | Accepted | Rejected | Total
Accepted | 202      | 4        | 206
Rejected | 19       | 75       | 94
Total    |          |          | 300

Fig. 9 Confusion matrix for 300 litchis input to the developed Litchi Grading System
Fig. 10 Conveyor e-Nose system for Litchi Analysis
Fig. 11 Conveyor litchi grading system
4 Conclusion A continuous system has been developed for litchi quality detection. Some criteria of the FVGMR standard for e-Nose have been accommodated. The electronic nose can effectively detect the quality of litchi in controlled environments. We actually developed a fusion of e-Nose and e-Vision modules; in this paper, only the e-Nose part was discussed. There is future scope for accommodating other criteria of the FVGMR standard in the e-Nose module.
Acknowledgements The financial support provided by the Department of Science & Technology, New Delhi, India, is gratefully acknowledged.
References 1. Choudhary, J. S., Prabhakar, C. S., Das, B., & Kumar, S. (2013). Litchi stink bug (Tessaratoma javanica) outbreak in Jharkhand, India, on litchi. Phytoparasitica, 41(1), 73–77. 2. Wang, Y., Wang, H. C., Hu, Z. Q., & Chen, H. B. (2010). Litchi good for heath from skin to heart: An overview of litchi functional activities and compounds. In Proceedings of 3rd IS on Longan, Lychee & Other Fruit, Acta Hort. 863, ISHS 2010 3. Xu, S., Lü, E., Lu, H., Zhou, Z., Wang, Y., Yang, J., & Wang, Y. (2016). Quality detection of litchi stored in different environments using an electronic nose. Sensors, 16(6), 852. 4. El-Mesery, H. S., Mao, H. & El-Fatah Abomohra, A. (2019). Applications of non-destructive technologies for agricultural and food products quality inspection. Sensors, 19, 846. https:// doi.org/10.3390/s19040846. 5. Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P., & Balaguru Rayappan, J. B. (2015). Electronic noses for food quality: A review. Journal of Food Engineering, 144, 103–111. 6. Keller,P. E., Kangas, L. J., Liden, L. H., Hashem, S., & Kouzes, R. T. (1992). Electronic noses and their applications. In IEEE Northcon/Technical Applications Conference (TAC’95) in Portland. 7. Wu, Y., Zhu, B., Tu, C., Duan, C., & Pan, Q. (2011). Generation of volatile compounds in litchi wine during winemaking and short-term bottle storage. Journal of Agriculture and Food Chemistry, 59, 4923–4931. 8. Wilson, A. D., & Baietto, M. (2009). Applications and advances in electronic-nose technologies. Sensors (Basel), 9(7), 5099–5148. 9. Sharma, M., Ghosh, D., & Bhattacharya, N (March, 2013). Electronic nose—A new way for predicting the optimum point of fermentation of Black Tea. International Journal of Engineering Science Invention. ISSN (Online): 2319 – 6734, ISSN (Print): 2319 – 6726. 10. Tang, K.-T., Chiu, S.-W., Pan, C.-H., Hsieh, H.-Y., Liang, Y.-S., & Liu, S.-C. (2010). Development of a portable electronic nose system for the detection and classification of fruity odors. Sensors, 10, 9179–9193. 11. Karakaya, D., Ulucan, O., & Turkan, M. (2020). Electronic nose and its applications: A survey. International Journal of Automation and Computing. 17(2), April 2020. 12. Kusumiyati, Hadiwijaya, Y., Putri, I. E. (2019). Non-destructive classification of fruits based on Uv-Vis-Nir spectroscopy and principal component analysis. Jurnal Biodjati, 4(1):89–95, May 2019.
A Survey of Learning Methods in Deep Neural Networks (DDN) Hibah Ihsan Muhammad, Ankita Tiwari, and Gaurav Trivedi
Abstract The hyperparameters of a machine learning algorithm for a specific dataset are computationally expensive to tune, and it is challenging to find their best values. Machine learning predictive modelling algorithms are governed by hyperparameters. Artificial intelligence has become a new-age solution to most of the global challenges hitherto unravelled and unresolved. Within artificial intelligence, Machine Learning and its subset Deep Learning (DL) have brought a paradigm shift with their computational power. Currently, DL is a widely used computational approach in machine learning. The uniqueness of deep learning is its capability to learn from a large amount of data. This paper discusses the mathematical algorithms associated with each of these performance-enhancing measures, demonstrates tuning results and efficiency gains for each process and analysis, and attempts to review different types of deep learning training and learning methods, viz., supervised, unsupervised, semi-supervised, and reinforcement learning, together with the challenges each faces. Deep Learning algorithms are evaluated by applying different machine learning techniques. Keywords AI-Artificial Intelligence · BDL-Bayesian Deep Learning · DL-Deep Learning · DNN-Deep Neural Network · ML-Machine Learning · RL-Reinforcement Learning
1 Introduction Artificial Intelligence, accompanied by strong ethics, is a new-age solution to most of the pressing problems of humankind globally. Unlike programming, Machine Learning is a component of artificial intelligence and is a science of accomplishing tasks without being explicitly programmed. H. I. Muhammad (B) Kendriya Vidyalaya, Khanapara, Assam 781022, India e-mail: [email protected] A. Tiwari · G. Trivedi Indian Institute of Technology Guwahati, Assam 781039, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_16
In recent years, the field of artificial intelligence has focused on the concept of artificial neural networks (ANNs), also known as multilayer perceptrons, mostly because they present a complex algorithm that can approach almost any challenging data problem. Although there are several machine learning algorithms available to solve data problems, ANNs have become increasingly popular among data scientists due to their capability to find patterns in large and complex datasets that cannot be interpreted by humans. From the perspective of artificial intelligence, a neuron implements conditional rules (i.e., if the condition is true for the inputs, then generate the given output), and complex logic predictors combine these neurons to form a network that supports our thinking and decision-making. Recently, there has emerged a point of consensus between the artificial intelligence and neural network camps with respect to the underlying structure of intelligence. It comes in the form of a mixture model. Mixture models use a divide-and-conquer strategy in which complex tasks are assigned to a number of connected, semi-autonomous specialists (i.e., agents or experts), themselves relatively mindless. Deep learning based models that exploit the benefits of a weighted average of CNN algorithms are more frequently employed. More recently, deep learning has contributed enormously to diagnosis, diagnostics, and drug and vaccine development for COVID-19 [1]. Among various ML algorithms, DL has gained immense importance with its widespread and diverse applications across sectors. DL is modelled on neural networks resembling the structure and function of the human brain. In DL, learning involves estimating the model parameters so that the algorithm can perform an identified, definite task. In ANNs or Artificial Neural Networks, the parameters are given as weight matrices [2]. In DL, a neuron is the fundamental computational unit, which takes numerous signals as input to generate outputs. This involves linearly integrating a number of feature signals with the weights and passing the combined signal through a nonlinear function. The term "deep" in deep learning denotes the concept of several layers (hidden layers) through which the data is transformed. DL systems have credit assignment path (CAP) depth, illustrating the cause-and-effect relationship between the inputs and outputs. This is very valuable where a task involves a vast amount of data [3]. Several developments have contributed to the speedy advancement of deep learning. The most critical ones include the availability of huge labelled datasets to train large DL models with many parameters [4], the emergence and availability of fast and affordable hardware with enormous computational capability [5, 6], and the invention of several sophisticated software libraries [7]. ANNs duplicate the key aspects of the human brain, specifically its ability to learn from experience. A human brain has many characteristics that would be required in artificial models. The linear perceptron neural network (as a generalized linear model) can be written as

g^{-1}(ŷ) = w_0 + Σ_{i=1}^{d} w_i x_i
Fig. 1 Electrical linear perceptron—Generalized linear model
Per Fig. 1, the perceptron can easily be extended by swapping its discontinuous step function with a continuous output transformation known as an activation function, g(·):

ŷ = g(w_0 + Σ_{i=1}^{d} w_i x_i)

The above equation can equivalently be articulated through a link function, g^{-1}(·). A link function is the mathematical inverse of the activation function (like log and exponent, or square and square root):

g^{-1}(ŷ) = w_0 + Σ_{i=1}^{d} w_i x_i

In combination with a suitable error function, an activation function turns a neuron into a generalized linear model, capable of fitting a number of target types, including normal, binary, multinomial, and ordinal targets.
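A minimal numeric sketch of the generalized linear neuron described above, with a sigmoid activation g and its logit link g⁻¹ (the weights and inputs are illustrative):

```python
import numpy as np

def sigmoid(a):          # activation function g(.)
    return 1.0 / (1.0 + np.exp(-a))

def logit(p):            # link function g^{-1}(.), the inverse of the sigmoid
    return np.log(p / (1.0 - p))

w0 = 0.5                             # bias weight
w = np.array([0.2, -0.4, 0.1])       # input weights w_i
x = np.array([1.0, 2.0, 3.0])        # inputs x_i

linear_part = w0 + w @ x             # w0 + sum_i w_i x_i
y_hat = sigmoid(linear_part)         # y_hat = g(w0 + sum_i w_i x_i)

# The link function recovers the linear predictor: g^{-1}(y_hat) == linear_part
assert np.isclose(logit(y_hat), linear_part)
```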
2 Challenges Before Deep Learning DL is extremely data thirsty and thus requires exploration and processing of vast datasets for training to achieve a well-behaved performance model, especially in areas where datasets are currently limited. Then comes the challenge of the availability of an adequate volume of training data to obtain good performance. Imbalanced data is one of the major challenges in biological data, as sick patients are fewer in number than normal or healthy patients. Further, in disease diagnosis problems, the interpretability of data is very important both for disease diagnosis and for improving the precision of the prediction outcomes of a trained DL model. In medicine, uncertainty scaling is significant; thus, to avoid unreliable and misleading predictions, a confidence score for every query is essential. Another challenge is catastrophic forgetting in plain DL models when new information is incorporated: training with both new and old data is time-consuming and computationally exhaustive and may lead to an uneven learned representation. The study fields of health care and environmental science are data-intensive and heterogeneous, raising additional computational issues for compressing DL models. One of the significant challenges in training is overfitting, especially where the number of parameters is large and the relations are complex, which affects the model's ability to produce reliable results on the test data. Another challenge is the vanishing gradient problem, which arises when, in each training iteration, every weight of the neural network is updated in proportion to the partial derivative of the error function with respect to the current weight, and this update becomes vanishingly small so that the weights are effectively not updated. In contrast, another challenge is the exploding gradient problem, in which errors accumulated during backpropagation make the system unstable and the model loses its ability to learn effectively. Another challenge noted recently at Google is underspecification: the poor behaviour of DL models when deployed in applications such as natural language processing and medical imaging. The deep global engagement in DL has produced a number of techniques and models to overcome these challenges, and many more are in progress [8].
3 Deep Learning Architectures The Deep Learning architectures include
1. Recurrent Neural Network (RNN),
2. Long Short-Term Memory (LSTM),
3. Convolutional Neural Network (CNN),
4. Gated Recurrent Units (GRU),
5. Restricted Boltzmann Machine (RBM),
6. Deep Belief Network (DBN),
7. Generative Adversarial Network (GAN), and
8. Auto-Encoder (AE).
These deep learning architectures are very useful for classification, encoding, prediction, decoding, data generation, and many more application domains. GANs. They comprise two different neural networks, viz., Generator and Discriminator contesting each other in terms of data distribution, with an objective to have “indirect” training. GANs with their different types, viz., Vanilla GANs, Deep Convolutional GANs, Conditional GANs, and Super Resolution GANs are being used
extensively not only in detecting photorealistic images but also in producing photorealistic images useful in art, interior design, industrial design, shoes, textiles, video games, and science. In contrast, it is also a subject of ethics as it is being increasingly used in malicious applications [9, 10]. In GANs, learning involves all kinds of learnings, viz., fully supervised, semi-supervised, and unsupervised. Deep Reinforcement Learning (DRL). It combines deep learning and reinforcement learning, where learning is performed by the trial-and-error approach to train the unknown and real environment [11]. The DRL is found in diverse applications, including computer vision, natural language processing, education, and health care. Bayesian Deep Learning (BDL). It is a cross-section between DL and Bayesian conditional probability theory to quantify the uncertainty of parameters of interest. In general, BDL statistical methods start with a prior distribution for all unknown parameters, update this prior distribution in the light of the data (for example, using likelihood) to construct the posterior distribution, and then use the posterior distribution for inferential decisions. Bayes’ theorem is presented below P(A|B) =
P(B|A) P(A) / P(B)
Per the Bayes approach, we started with prior and available information (collected from “Centers for Disease Control and Prevention”—CDC data source: https://covid. cdc.gov/covid-data-tracker/#cases_casesper100klast7days, experts, and so on) about the coronavirus disease incidence rate and after gathering the information and data (binomial counts), we updated the incidence rate using the posterior distribution for an estimate of the incidence rate of coronavirus disease using a statistical model. P(A) is the prior probability of event A. It’s called prior because it does not take into account any information about event B. P(B|A) is the conditional probability of event B given event A. P(B) is the prior or marginal probability of event B. P(A|B) is the conditional probability of event A given event B. And it’s called the posterior probability because it is derived from the specified value of event B. The following steps are involved in our data analysis pointing to Bayesian inference. The probability distribution of the parameter, known as the prior distribution, is formulated. Given the observed data, we choose a statistical model (referred to as likelihood) that describes the distribution of the data given the parameters. We updated our beliefs about the parameters by combining information from prior distribution and data through the calculation of posterior distribution. This is carried out by using Bayes’ theorem (equation shown below), hence the term Bayesian analysis. p(θ |x) =
f(x|θ) π(θ) / m(x)
The above equation is often verbalized as "posterior density = (likelihood × prior) / marginal likelihood", where the marginal likelihood, the marginal density of x, is the integral m(x) = ∫ f(x|θ) π(θ) dθ. The posterior density or distribution describes the distribution of the parameter of interest given the data and the prior. The posterior distribution is necessary for probabilistic prediction and for sequential updating. As per our analysis, Bayesian methods offer alternatives to classical statistical inference. Instead of treating parameters as fixed constants, these methods treat parameters as random variables; the parameters cannot be determined exactly and are uncertain, and they are expressed through probability statements and distributions. Bayesian inference about the parameters is based on the probability distribution of the parameters. Further, it has become quintessential to the field of data science for its ability to estimate the probability of occurrence of an event and thus to address uncertainty. In addition, it has become valuable for processing multi-task problems and is known for better results in diverse applications [12]. Transfer Learning. It is a Deep Learning method where the knowledge gained in solving one task (pre-trained models) is re-purposed to solve a different but related problem. It is an optimization approach that saves training time with a jump start (instead of starting from scratch), besides improving performance. The pre-trained models can be used either directly, by leveraging feature extraction, or by fine-tuning several layers. It is widely used on image data and in natural language processing. The challenges involve the non-availability of a pre-trained model, the need for domain expertise, etc.
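As a minimal, hedged illustration of the Bayesian updating of an incidence rate from binomial counts described earlier in this section (the prior parameters and counts below are hypothetical, not actual CDC figures), a conjugate Beta-Binomial update looks like this:

```python
from scipy import stats

# Hypothetical prior belief about the incidence rate theta: Beta(a, b)
a_prior, b_prior = 2, 50

# Hypothetical observed binomial counts: k positives out of n tested
k, n = 30, 1000

# Conjugate update: posterior is Beta(a + k, b + n - k)
a_post, b_post = a_prior + k, b_prior + (n - k)
posterior = stats.beta(a_post, b_post)

print("Posterior mean incidence rate:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```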
4 Types of Neural Networks 4.1 Deep Neural Network (DNN) A deep neural network is a hierarchical (vertically stacked layers) neural network comprising an input layer, an output layer, and a minimum of one hidden layer in between [13]. The hidden layers are the ones that determine the performance and complexity of neural networks. They perform a multitude of functions, viz., data transformation and automatic feature creation. The DNN is an Artificial Neural Network (ANN) and it simulates the human brain through a set of algorithms. The word “Deep” denotes the depth of layers of the neural network. DL approaches employ regulated initialization in which the modification of the hidden weights are the function of incoming signals. We used the following mathematical formula/method for assigning the variance of hidden weights and weight
initialization. The initialization method is random uniform,

W_{i,j} ∼ U[−√(6/(m+n)), +√(6/(m+n))]

where n and m are the numbers of output and input connections (hidden units in the current layer), respectively. Weight initializations have less influence over model performance if batch normalization is used, because batch normalization standardizes the data passed among the hidden layers. The weight initializations are: constant variance, with standard deviation = 1, if we consider the equation Y = W_1 + · · · + W_25; and normalized variance, with standard deviation = √(6/(25+25)) ≈ 0.34, for the same Y = W_1 + · · · + W_25. In the neural network, input data is first accepted by the first layer of neurons (input layer) and is forward propagated into the hidden layers and so on, until it provides the final output at the output layer, without looping back. Thus, DNNs behave like feedforward networks (as shown in Fig. 1). Inputs are received by dendrites and are weighted through adjustable synapses before being added. When the result exceeds the threshold voltage, neurons communicate with each other through axons in the form of spikes. At a basic level, any neural network encompasses parts like inputs, weights, a bias or threshold, a summation function, an activation function, and an output.
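A small sketch of the normalized (Glorot-style) uniform initialization described above, for a layer with m inputs and n outputs (the layer sizes are illustrative):

```python
import numpy as np

def glorot_uniform(m, n, rng=np.random.default_rng(0)):
    """Draw an n x m weight matrix from U[-sqrt(6/(m+n)), +sqrt(6/(m+n))]."""
    limit = np.sqrt(6.0 / (m + n))
    return rng.uniform(-limit, limit, size=(n, m))

W = glorot_uniform(m=25, n=25)
# The standard deviation of U[-limit, limit] is limit/sqrt(3) (about 0.2 here);
# the ~0.34 figure quoted in the text corresponds to the limit sqrt(6/(25+25)) itself.
print(W.std(), np.sqrt(6.0 / 50))
```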
Fig. 2 Electrical activity that happens at a neuronal synapse (dendrite = input, synapse = weight, axon = output)
Each synapse "weights" the relative strength of its incoming input. The synaptically weighted contributions are added, and if the sum exceeds an adjustable threshold (or bias), the neuron sends a signal along its axon to the other neurons in the network to which it connects. The neural network combines the different inputs using weights; the weights are multiplied with the inputs and generate outputs of either zero or one. If the network is not precise in recognizing a pattern, the procedure will adjust the weights [14]. Like the brain, it segregates all incoming information into "relevant" and "not-so-relevant" categories; in ANN-based deep learning, this segregation plays an important role. It ensures that the network learns from relevant information, rather than being burdened with analysing the "not-so-relevant" part. Here, the activation function determines whether a neuron should be activated or not. In other words, in solving the problem or making the prediction, it decides whether the neuron's input to the network is important or not. A non-linear activation function allows non-linear decision boundaries or patterns in the data to be classified. However, a key finding of current neurophysiology is that node connections are flexible; they alter with knowledge. The more active the weight, the more robust the link becomes. Equally, synapses with little or no activity deteriorate and ultimately die off (degenerate). This is believed to be the foundation of learning. Even though there are branches of NN research that attempt to imitate the fundamental biological processes in detail, most NNs do not try to be physically accurate.
4.2 Recurrent Neural Network (RNN) In contrast to ANN, which are feedforward neural networks, the recurrent neural network (RNN) saves the output of processing nodes and feeds this back into the model as an input. In other words, in RNN, the information cycles through a loop [15]. Backpropagation plays a critical role. So RNNs have internal memory and thus make it very precise in predicting what’s coming next. Therefore, it is the preferred algorithm for sequential data like trend/time series finance, weather, etc. The long short-term memory (LSTM) network, an extension of RNN, empowers the RNNs to remember inputs over a long period. The way to implement the moving average component is to use a recurrent network, in which a network’s output is fed back as an input on the next iteration. This gives the neural network a “memory.” There are many variants of the recurrent neural network paradigm. The convolutional neural networks (CNNs) are space invariant ANN. During the feature extraction, it involves a series of Convolution + ReLU and Pooling, and classification. This is widely used in computer vision, image processing, and automatic speech detection and recognition [16, 17] applications.
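A minimal sketch of the recurrence that gives an RNN its "memory", feeding the previous hidden state back in at each step (the dimensions and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_x = rng.standard_normal((n_hidden, n_in))      # input-to-hidden weights
W_h = rng.standard_normal((n_hidden, n_hidden))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b): the previous output is fed back as input."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(n_hidden)
for x_t in rng.standard_normal((5, n_in)):       # a toy sequence of 5 time steps
    h = rnn_step(x_t, h)
print(h)
```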
4.3 Challenges and Merits of DNN Challenges. The two major challenges with DNN are overfitting and computation time, as it involves several training parameters, viz., size and the learning rate. To overcome these issues, a number of methods and techniques are developed and are in development. Merits. Compared to shallow learning approaches, DNNs with their inherent multiple layers are more proficient in representing variable non-linear functions [18]. In addition, the features of extraction and classification layers make DNNs more efficient.
4.4 Traditional ML Versus DL In DL, the learning of features is done automatically. It is represented hierarchically at various levels, whereas in the case of traditional ML, features are extracted using extraction algorithms and learning algorithms [11].
5 Types of Learning Methods in Deep Learning In the literature review, we briefed on different methods of learning, including supervised, semi-supervised, and unsupervised learning. In addition, there are other categories called reinforcement learning (RL) and representation learning [11, 19]. While distinctions are helpful, the boundaries are increasingly becoming fluid rather than rigid.
5.1 Supervised Learning Supervised Learning denotes prior knowledge. It is a learning technique wherein the model is trained using labelled data. The labelled training data comprises a pair consisting of an input object and a desired corresponding output. Learning means the model will build some logic of its own. The model is tested once it is ready upon completion of training. The supervised learning approaches include convolutional neural networks (CNN), deep neural networks (DNN), long short-term memory (LSTM), recurrent neural networks (RNN), and gated recurrent units (GRU) [11]. Algorithms commonly used in supervised learning are linear and logistic regression, K-nearest neighbours, support vector machines, decision tree, and random
forest. Other complex regression algorithms include support vector regression and regression trees. The simplest supervised learning algorithm, linear regression, relates a single independent feature (x) to a dependent feature (y) through a linear relationship; because of its simplicity, it generally performs well even on simple datasets. The merit of this technique is that the specifics of the classes are known before the model is trained, giving a high degree of control over the training process. In this model, we can train the classifier to have a precise decision boundary that distinguishes the different classes accurately. In contrast, the demerit is that it cannot classify data by analysing its features on its own; in other words, it cannot process an input correctly if that input does not belong to any of the classes in the training data. Overall, this technique is more straightforward and simpler [8]. Deep stacking networks (DSN), commonly referred to as deep convex networks (DCN), suggest the depth of the algorithm used to learn the network. The core idea behind this architecture is stacking. It supports parallel and scalable learning. The DSN consists of an amalgamation of units that are present in the architecture as parts of the network. It performs better than non-complex DBNs, which makes it an accepted and suitable network model [3]. In recent years, there have been calls to banish the term supervised learning, with claims that supervision is not necessary, while many have questioned these claims by reviewing insights showing that learning and generalization are not possible without supervision or inductive biases; learning in nature also requires data and supervision in various forms [20].
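As a concrete, minimal sketch of the single-feature linear case just described (using scikit-learn on synthetic data; the numbers are illustrative only and not drawn from the cited works), a regressor is fitted to labelled (x, y) pairs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic supervised data: one independent feature x, one dependent feature y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x.ravel() + 2.0 + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(x, y)       # learns from the labelled pairs
print(model.coef_[0], model.intercept_)    # recovered slope and intercept (~3.0 and ~2.0)
print(model.predict([[5.0]]))              # prediction for a new input
```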
5.2 Semi-Supervised Learning In a semi-supervised learning model, the algorithm is trained with a vast unlabelled dataset augmented with a small amount of labelled data, based on which the model is expected to learn and make predictions on new examples. It is an instance of minimal supervision. This is important for optimizing the time and cost involved in labelling the datasets for training. Often, labelled datasets are not readily available or are scarce, and labels may be difficult to obtain because of cost, the time-consuming nature of the process, or other factors; annotation is time-consuming and tedious. Thus, semi-supervised learning minimizes the amount of labelled data needed and thereby overcomes one of the problems of supervised learning, namely having enough labelled data. Such a model relies on at least one of several assumptions, viz., the continuity assumption, the cluster assumption, or the manifold assumption. It is extensively used in Internet content classification, speech analysis, protein sequence classification, etc. Generative adversarial networks (GAN) and deep reinforcement learning (DRL) are employed as semi-supervised learning methodologies. Additionally, RNNs, including GRU and LSTM, are also employed [8, 11].
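One generic way to realize this minimum-supervision setting in code (shown only as an illustration using scikit-learn's self-training wrapper; it is not one of the GAN- or RNN-based methods cited above) is to mark most labels as unknown and let a base classifier label them iteratively:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic data: a small labelled set plus a large unlabelled set,
# where the label -1 marks "unlabelled" samples.
X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(500) > 0.1] = -1      # roughly 90% of labels hidden

base = SVC(probability=True)               # base learner must output probabilities
model = SelfTrainingClassifier(base).fit(X, y_partial)
print(model.score(X, y))                   # accuracy against the full labels, for illustration
```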
5.3 Unsupervised Learning Unsupervised learning models are trained using unlabelled datasets without any supervision; the model itself detects hidden patterns and insights from the data, similar to the human brain when learning new things. In other words, the task of the unsupervised learning algorithm is to identify patterns, for example image patterns, on its own. This is particularly important, as we may not have training data for every case encountered in real life. The model is utilized for tasks like association, clustering, and dimensionality reduction. Its ability to discover similarities and dissimilarities in datasets makes it well suited for cross-selling strategies, data analysis, customer segmentation, and image recognition. Real-world applications include categorizing news stories from diverse news outlets, object recognition using computer vision, medical imaging, quality control through anomaly detection, building customer profiles and preferences, discovering data trends based on past purchase behaviour, etc. The algorithms commonly used in unsupervised learning are K-means clustering, association rules, hierarchical and probabilistic clustering, the Apriori algorithm, singular value decomposition, principal component analysis, restricted Boltzmann machines (RBM), dimensionality reduction, autoencoders, recurrent neural networks (RNN), generative adversarial networks (GAN), LSTM, and RL [11, 21]. The challenges of unsupervised learning stem from letting models run without any human intervention, viz., longer training times, less accurate results, lack of transparency in how the data is sorted, and computational complexity.
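As a minimal sketch of the clustering task mentioned above (a generic scikit-learn example on synthetic, unlabelled data; not tied to any of the cited applications), K-means discovers the grouping on its own:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled synthetic data with four natural groups.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster index assigned to each point
print(kmeans.cluster_centers_)   # learned cluster centres
```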
5.4 Reinforcement Learning (RL) Deep reinforcement learning is built upon gaining experience, rewarding desired behaviours, and/or punishing undesired ones, and is used chiefly in unknown environments. In other words, an RL agent perceives and interprets its environment and learns through trial and error. Deep RL rose to prominence in 2013 with Google DeepMind's work. RL differs from supervised learning because it requires no labelled input/output pairs and no explicit correction of sub-optimal actions; thus, this form of learning is considered more difficult. The emphasis is on finding a balance between exploration and exploitation of current knowledge. Commonly used algorithms are Q-learning, SARSA, Monte Carlo methods, Deep Q-Networks, etc. [22]. A major barrier to its deployment is its reliance on exploration of the environment; the time and intensive computing resources required for proper learning limit its usefulness. In contrast, supervised learning can deliver results faster and more efficiently if adequate and proper data is available. RL is defined by characterizing a learning problem rather than a particular learning method, unlike other general learning models [11, 23].
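To make the reward-driven, trial-and-error update concrete, here is a minimal sketch of tabular Q-learning on a toy 5-state chain environment invented purely for illustration (the agent moves left or right and is rewarded only on reaching the right-most state):

```python
import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))     # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                    # episodes
    s = 0
    while s != n_states - 1:
        # exploration vs. exploitation of current knowledge
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: reward plus discounted value of the best next action
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)   # the learned values favour moving right in every state
```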
Additionally, supervised learning deals with classification and regression tasks, while unsupervised learning deals with pattern clustering and association rules. In contrast, reinforcement learning deals with exploration and exploitation.
6 Comparison of Different Deep Learning Algorithms In Table 1, we compare the features, merits and demerits, and challenges encountered while executing deep learning algorithms. A deep neural network is a hierarchical neural network: it has an input layer, an output layer, and more than one hidden layer. These algorithms are capable of handling highly variable non-linear functions; however, two major challenges are also present, namely high computation time and overfitting. The recurrent neural network is a powerful algorithm for sequential data but is constrained by gradient vanishing and exploding. Convolutional neural networks provide very high accuracy in image processing and computer vision tasks involving spatial relationships, but they require considerable time and cost and a lot of training data.
7 Discussion and Conclusion The application of AI using DL has already made a significant impact across sectors, including in addressing the COVID-19 pandemic, where it enabled the expeditious development of a vaccine in less than one year for the first time in the history of humankind, demonstrating the power of AI. In the years ahead, humanity looks forward to the prospect of AI resolving challenges that seem insurmountable to date. To illustrate, in health care, DL is being increasingly tested for the early diagnosis of disorders and diseases, including Alzheimer's and Parkinson's diseases, developmental disorders, etc. Similarly, in their pioneering publication, Rolnick et al. [24] have explored and explained how machine learning could be of immense help in understanding and solving climate change problems, including prediction, energy efficiency, etc. Deep learning is growing exponentially, demonstrating its success and the versatility of its applications in diverse areas. In addition, the rapidly improving accuracy rates clearly exhibit the relevance and prospects of further deep learning advancement. In the evolution of DL, the hierarchy of layers, learning models, and algorithms are the critical factors for an efficacious implementation of deep learning. In this paper, an attempt is made to review various state-of-the-art DL models, learning methods, and RL, and their applications in a variety of domains, including climate change and COVID-19.
Table 1 Comparison of various deep learning algorithms

DNN
Details: Features: • It is a hierarchical neural network comprising an input layer, an output layer, and hidden layers. The hidden layers are the ones that determine the performance and complexity of the neural network; they perform a multitude of functions, viz., data transformation, automatic feature creation, etc. • The DNN is an Artificial Neural Network (ANN), and it simulates the human brain through a set of algorithms. • In the neural network, the input data is first accepted by the first layer of neurons (input layer) and is forward propagated into the hidden layers and so on, until the final output is produced at the output layer, without looping back; thus, DNNs behave like feedforward networks. • Like the brain, it segregates every piece of incoming information into the “relevant” and “not-so-relevant” categories; in ANN-based deep learning, this segregation plays an important role. It ensures that the network learns from relevant information rather than being burdened with analysing the “not-so-relevant” part. The activation function determines whether a neuron should be activated or not; in other words, it decides whether the neuron’s input to the network is important for solving the problem or prediction.
Merits/Prospects/Pros: Compared to shallow learning approaches, DNNs with their inherent multiple layers are more proficient in representing highly variable non-linear functions. In addition, the feature extraction and classification layers make DNNs more efficient. Applied in the supervised learning approach. DNNs are used to solve more traditional classification problems, such as fraud detection, and are widely used with great accuracy.
Demerits/Challenges/Cons: The two major challenges with DNN are overfitting and computation time, as training involves several parameters, viz., size and learning rate; to overcome these issues, a number of methods and techniques have been developed and continue to be developed. The learning process of the model is also very slow. Spatial relationships are not captured.

RNN
Details: Features: • In contrast to ANNs, which are feedforward NNs, the RNN saves the output of its processing nodes and feeds it back into the model as an input; in other words, in an RNN the information cycles through a loop. • Backpropagation plays a critical role, so RNNs have internal memory, which makes them very precise in predicting what comes next. • There are many variants of the recurrent neural network paradigm. Applications: • It is the preferred algorithm for sequential data like trend or time series, finance, weather, etc. • The Long Short-Term Memory (LSTM) network, an extension of the RNN, empowers RNNs to remember inputs over a long period. • One way to implement a moving average component is to use a recurrent network in which the network’s output is fed back as an input on the next iteration; this gives the neural network a “memory”.
Merits/Prospects/Pros: Powerful for sequence data: remembers information through time series. RNNs can employ their internal storage to process arbitrary sequences of inputs, which is not the case with feedforward neural networks. Applied in supervised, semi-supervised, and unsupervised learning. Model size does not increase when the input size grows.
Demerits/Challenges/Cons: Due to its recurrent nature, the computation is slow. Constrained by gradient vanishing and exploding. Training is onerous. Spatial relationships are not captured.

CNN
Details: Features: The Convolutional Neural Network (CNN) is a space-invariant ANN; feature extraction involves a series of convolution + ReLU and pooling operations, followed by classification. Applications: Widely used in image processing, computer vision, and automatic speech recognition.
Merits/Prospects/Pros: Offers very high precision in image recognition problems and can detect important features without any human supervision. Spatial relationships are captured. Very high performance, and learning of the model is fast. Computational efficiency through convolution and pooling layers.
Demerits/Challenges/Cons: Needs a lot of labelled data for classification. Convolutional neural networks can take a number of days to train a model in certain situations. High computational cost. Does not encode orientation and position of objects. Requires a lot of training data. No recurrent connections.
References 1. Arora, G., Joshi, J., Mandal, R. S., Shrivastava, N., Virmani, R. (2021). Artificial intelligence in surveillance, diagnosis, drug discovery and vaccine development against COVID19. Pathogens, 10(8), 1048. https://doi.org/10.3390/pathogens10081048, https://www.mdpi. com/2076-0817/10/8/1048. 2. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. 3. Dargan, S., Kumar, M., & Ayyagari, M. R. (2019). A survey of Deep Learning and its applications: A New Paradigm to Machine Learning: Research Gate. 4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A largescale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255). IEEE. 5. Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., et al. Dadiannao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 609–622), Cambridge, UK, December 13–17, 2014. 6. Coates, A., Huval, B., Wang, T., Wu, D. J., Catanzaro, B., & Ng, A. Y. (2013). Deep learning with COTS HPC systems. In: ICML. Google Scholar. 7. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P.A., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., & Zhang, X. (2016). TensorFlow: A system for large-scale machine learning. In: OSDI. Google Scholar. 8. Alzubaidi, A., Zhang, J., Humaidi, A. J., Al Dujaili, A., Duan, Y., Al Shamma, O., Santamaría, J., Fadhel, M. A., Al Amidie, M., Farhan, L., et al. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8, 53. https://doi.org/10.1186/s40537-021-00444-8. 9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672–2680). Cambridge, MA, USA: The MIT Press. 10. Vondrick, C., Pirsiavash, H., & Torralba, A. (2016). Generating videos with scene dynamics. In Advances in Neural Information Processing Systems (pp. 613–621). Cambridge, MA, USA: The MIT Press. 11. Alom, Md. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., Hasan, M., Van Essen, B. C., Awwal, A. A. S., & Asari, V. K. (2019). A state-of-the-art survey on deep learning theory and architectures. Electronics MDPI, 8, 292. https://doi.org/10.3390/electroni cs8030292. 12. Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems (NIPS). MIT Press. 13. Bengio, Y., LeCun, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. 14. Hof, R. D. (2018). Is artificial intelligence finally coming into its own? MIT Technology Review. Archived from the original on 31 March 2019. Retrieved July 10, 2018. 15. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y. (2016). Exploring the limits of language modeling. arXiv:1602.02410 [cs.CL]. 16. Lecun, Y., Bottou, L., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE., 86(11), 2278–2324. https://doi.org/10.1109/5.726791 17. Sainath, T. N., Mohamed, A.-R., Kingsbury, B., Ramabhadran, B. (2013). Deep convolutional neural networks for LVCSR. 
In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8614–8618). https://doi.org/10.1109/icassp.2013.6639347, ISBN 978-1-4799-0356-6. S2CID 13816461. 18. Mohamed, A.-R., Dahl, G. E., & Hinton, G. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech and Language Processing, 20, 14–22. 19. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828.
20. Hernandez-Garcia, G. (2021). Mila. Rethinking supervised learning: insights from biological learning and from calling it by its name, June 22, 2021 [2012.02526]. 21. Saeed, M. M., Al Aghbari, Z., & Alsharidah, M. (2020). Big data clustering techniques based on spark: A literature review. PeerJ Computer Science, 6, 321. 22. Li, Y. (2017). Deep reinforcement learning: An overview. arXiv, arXiv:1701.07274. 23. Zhu, F. Liao, P. Zhu, X. Yao, Y., & Huang, J. (2017). Cohesion-based online actor-critic reinforcement learning for mhealth intervention. arXiv:1703.10039. 24. Rolnick, D., Donti, P. L., Kaack, L. H., Kochanski, K., Lacoste, A., Sankaran, K., Ross, A. S., Milojevic-Dupont, N., Jaques, N., Waldman-Brown, A., Luccioni, A., Maharaj, T., Sherwin, E. D., Mukkavilli, S. K., Kording, K. P., Gomes, C., Ng, A. Y., Hassabis, D., Platt, J. C., Creutzig, F., Chayes, J., Bengio, Y. (2019). Tackling climate change with machine learning, Nov 5, 2019. arXiv:1906.05433v2 [cs.CY].
The Implementation of Object Detection Using Deep Learning for Mobility Impaired People Pashmeen Singh and Senthil Arumugam Muthukumarswamy
Abstract The Object detection technique is used to locate and identify objects in images and videos. To train a model, deep neural nets are used in conjunction with complex algorithms to detect objects in real-time using machine learning applications and deep learning. People with visual impairments have difficulties when it comes to autonomous mobility; despite walking on a well-known path, they can encounter multiple hazards along the way. In this paper, the MobileNet Single Shot Multibox Detector (SSD) algorithm is implemented using transfer learning; so that it can be integrated with a smart walking stick for mobility-impaired people using Raspberry Pi v3 B+. Keywords MobileNet SSD · Object detection · Deep learning · Raspberry Pi v3 B+
1 Introduction Object detection comprises two steps: object localization, which locates an object in an image, and object classification, which assigns the located object to the appropriate category. Deep learning methods are utilized for object detection. With the evolution of machine learning, deep learning plays a pivotal role, as neural networks are used to program machines to make accurate decisions without assistance from humans [1]. Deep learning algorithms need to be trained to work properly, and when they work as intended, they are considered by many a scientific wonder, the real backbone of artificial intelligence. This project aims to develop a practical way in which object avoidance can be made possible using deep learning for mobility-impaired
people with the help of a cost-effective robotic walking stick using Raspberry Pi v3 B+. With the steady rise in the world's population, there is an increased demand for mobility-assistance devices. The smart robotic stick presented in this paper can provide individuals with visual disabilities with independent and coherent navigation through the environment with the help of artificial intelligence [2]. To implement deep learning for autonomous object avoidance, a detailed study of existing object detection algorithms is carried out to identify which one works best for the robotic aid; a deep neural network is then chosen and used with the robotic aid for avoiding collisions and falls in real time, using COCO (Common Objects in Context), a pre-trained dataset [3]. In Fig. 1, a dog lying on the floor is observed; when provided to any deep learning classifier such as AlexNet, GoogleNet, MobileNet, or VGGNet, the image will be classified as a dog because the dog is the most salient feature of the image. The human eye would also classify the image as a dog. In Fig. 2, a person and a dog can be observed; both are salient features, and that is where object detection comes into play: both classification and localization occur for the image. To carry out successful image classification and recognition, MobileNet SSD (Single Shot Detector) v3 is used, an image classifier that outputs fast results. SSD divides images into smaller boxes, and a combination of those boxes based on their salient features is input to the classifier, which classifies the image set. For object detection, the most popular image dataset is COCO, which has 80 classes [3]. Fig. 1 Image of a dog
Fig. 2 Image of a person holding a dog
The programming language Python is used for the depiction and demonstration. OpenCV library was used with a pre-trained deep learning architecture using TensorFlow. OpenCV was used to load already pre-trained TensorFlow frozen models, and upon completion of the object detection simulation, using a Raspberry Pi module and a Raspberry Pi Camera, real-time detection for mobility-impaired individuals was carried out. Following the introduction, the remaining paper is subdivided into five sections. In Sect. 2, a summary of previous work is presented. As for Sect. 3, the necessary approaches are discussed. In Sect. 4, the findings are presented and analyzed. In Sect. 5, the paper’s conclusion and the major points that have been achieved are addressed. In Sect. 6 of the paper, the project’s future is predicted.
2 Literature Review The literature review presents sample situations where object detection has been applied using different algorithms and compares those algorithms and their applications. The Convolutional Neural Network (CNN) has a plethora of applications, including object detection, image classification, and semantic image segmentation. CNNs can deduce enough features from images to allow for image classification. CNNs perform better than traditional algorithms, as they are capable of learning advanced feature representations of images as well as maintaining image relations while reducing dimensionality. CNNs have emerged as a popular tool for machine learning, especially for image recognition [4]. Deep neural networks built with MobileNets are based on streamlined architectures and use depth-wise separable convolutions for their computations. MobileNets are often used when there is a constraint on computational power [5]. An image recognition model, Inception,
is highly feasible to utilize. With the help of computational methods, the Inception architecture can be optimized for performance in terms of accuracy and speed [6]. Residual Network (ResNet), a popular deep learning architecture due to its stellar performance in object classification tasks, makes it possible to construct complex architectures with thousands of layers that result in enhanced recognition accuracy [7]. These various algorithms are used for detection and are more than capable of detecting objects within a particular class in videos and images. Faster R-CNN uses an RPN (Region Proposal Network) that is trained to generate region proposals; an RPN can effectively predict object bounds and objectness scores [8]. The process of object detection can be optimized by looking only once at an image (YOLO) to assess the type and location of objects. YOLO is a simple and fast algorithm that trains on full images; it is a single convolutional network that generates bounding boxes and class probabilities for those boxes [9]. The SSD (Single Shot Multibox Detector), as the name suggests, detects various objects in a single shot. During the training process for an SSD, an input image and ground-truth boxes are required for each object. Results obtained on the PASCAL VOC, COCO, and ILSVRC datasets show that SSD generates fast detections and provides a systematic framework for deep learning; with its simple-to-train algorithm, implementation into programs becomes convenient. Hence, SSD can achieve great accuracy and speed. When the VOC2007 test was carried out on an Nvidia Titan X for an image of 300 × 300 input, the SSD was able to achieve a 74.3% mean average precision (mAP) at 59 FPS. When an image of 512 × 512 input was used, the mAP rose to 76.9%, showing that as the input size increased the mAP percentage also increased; SSD tends to perform well on larger objects. When the same test was carried out on the Faster R-CNN model, a result of 7 FPS and a 73.2% mAP was observed. As for YOLO, a result of 45 FPS and a mAP of 63.4% was observed. Thus, SSD again proves superior, surpassing these models and offering substantially better precision than other single-stage algorithms [10]. Implementation of machine learning models can be made convenient using ML frameworks such as Google TensorFlow, first initiated by the Google Brain team. It uses an open-source library for large numerical computations. To develop applications with TensorFlow, the programming language Python is used for the front-end API. Deep neural networks can be run and trained on TensorFlow for tasks ranging from classification and image recognition to word embeddings, sequence-to-sequence models, and even natural language processing. Object detection has been implemented in walking aids. In [11], ultrasonic sensors are used for the proximity detection module that measures closeness to objects. The ultrasonic module sends analog voltage data to the microcontroller, which converts it into the distance in centimeters. In many studies, this detection has provided reliable distance alerts. When information is gathered via an ultrasonic sensor, obstacle detection is shown on an LCD. An APR33A3 module delivers an audio message to a blind user who is unable to view the message on the LCD, so the user receives an alarm in the form of voice feedback. This intelligent walking stick also vibrates to warn impaired people away from aerial obstacle collision mishaps [12].
In mobile robotics and computer vision applications, object detection, recognition, localization, and tracking are substantial tasks. Algorithms and measuring sensors such as cameras, LIDAR, and RADAR are used to accomplish these tasks [13]. According to statistics, an average of 15 percent of mobility-impaired people encounter difficulties daily and bump into unnecessary obstacles, whereas 40 percent of mobility-impaired people fall every year due to these obstacles on their way [12]. Video-based obstacle tracking has seen a significant increase in commercial applications. Obstacle tracking has been implemented in surveillance systems, mobile robots, and medical systems as well as driver assistance systems. Traffic scenes can also be tracked using obstacle detection to obtain tracking information on whether vehicles are staying in their respective lanes, thereby preventing accidents [14]. Current state-of-the-art methods in the market that use obstacle detection to assist mobility-impaired people are limited. These individuals often still rely on methods such as generic walkers, canes, crutches, and wheelchairs that have become outdated and have minimal effect on their lives. A visually impaired user relying on a generic cane to walk in unknown environments can be in danger in this day and age. In [15], a blind stick is presented that uses ultrasonic distance sensors and infrared sensors to detect objects and provides feedback to the user with the help of buzzers. Akula et al. [16] illustrate a haptic shoe model. This sensor system is placed in the visually impaired person's shoe to detect any obstruction in their way and alert them with the help of a vibrating buzzer, informing the individual that they are closer to the object than the safe distance. Miah and Hussain [17] depict a model using ultrasonic distance sensors, an SD card module, an ATmega328P microcontroller, and a headphone that calculates the distance to object collision and communicates that value to the user through the headphone via voices recorded on the SD card module. Most existing smart canes for mobility-impaired individuals just use ultrasonic distance sensors that provide feedback in the form of vibrations only, and some feature navigation and destination tracking. These technologies do not give the user a vivid description of their surroundings. At the end of the day, visually impaired individuals want to adapt and know exactly what they could be encountering. Object detection using deep learning models integrated into a Raspberry Pi module can provide a suitable solution to the problem.
3 Methodology 3.1 Implementing Object Detection Using the MobileNet SSD V3 on Open CV A simulation of MobileNet SSD was carried out using Anaconda Navigator on Jupyter Notebook running the Python 3.7 version. Multiple libraries were imported
Fig. 3 Diagram for MobileNet SSD classifier
including OpenCV and matplotlib. The object detection model used was the MobileNet SSD according to the diagram in Fig. 3. The dataset was read from TensorFlow. The images in the dataset were filtered, predicted, and detected accordingly. A configuration file tested in OpenCV was used for the MobileNet SSD v3 model. The image dataset COCO which has 80 classes was pre-trained using transfer learning with a range of objects like cars, trucks, benches, people, animals, traffic lights, and fire hydrants. The image input displayed in blue, green, and red was changed to red, green, and blue color schemes, respectively (RGB). The configurations were set based on input image size. Then using cv2, the module import name for OpenCV, a rectangle frame-bounding box highlights the objects in the image and video input. As a point of reference, this bounding box around the object represents the coordinates of that box that comprises images containing objects and the background. MobileNet SSD divides the image with the help of a grid, with each cell responsible for detecting objects in a specific region.
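A minimal sketch of this detection pipeline with OpenCV's DNN module is shown below; the model, config, and image file names are assumptions for illustration, and the exact preprocessing values may differ from the authors' setup:

```python
import cv2

# Hypothetical file names for the frozen TensorFlow graph and its OpenCV config.
model = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                               "ssd_mobilenet_v3_large_coco.pbtxt")
model.setInputSize(320, 320)                 # configuration based on input image size
model.setInputScale(1.0 / 127.5)
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)                   # OpenCV reads BGR; the network expects RGB

img = cv2.imread("street.jpg")               # hypothetical test image
class_ids, confidences, boxes = model.detect(img, confThreshold=0.5)
for (x, y, w, h), conf in zip(boxes, confidences):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # bounding box
cv2.imwrite("detections.jpg", img)
```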
3.2 Object Detection Using the Raspberry Pi v3 B+ and the Pi Camera To integrate this system on a walking aid, ultrasonic distance sensors are connected to the Raspberry Pi v3 B+ microcontroller along with the Pi Camera, and the respective object detection model is trained using the MobileNet SSD algorithm. The walking aid consists of a Raspberry Pi v3 B+ microcontroller, ultrasonic sensors, a Pi Camera, and a vibrating motor module. With the help of the sensors and the Raspberry Pi module on the smart robotic stick, the user can analyze the environment along their path repeatedly. The Pi Camera detects objects from the live video feed to give the individual a sense of their surroundings. Images are subsequently processed by the Raspberry Pi's artificial intelligence module, which generates tags describing the images' contents, and audio output is then returned to the user through earphones based on these descriptions [18]. The flow diagram for object detection can be seen in Fig. 4.
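The following is a rough sketch of how such a sensing loop could be wired together on the Raspberry Pi; the GPIO pin numbers, file paths, and polling interval are assumptions for illustration, and the gpiozero and picamera libraries are used here as one possible choice rather than the authors' exact implementation:

```python
import time
from gpiozero import DistanceSensor
from picamera import PiCamera

# Assumed wiring: ultrasonic sensor echo on GPIO 24, trigger on GPIO 23.
sensor = DistanceSensor(echo=24, trigger=23, max_distance=4)
camera = PiCamera()

THRESHOLD_M = 2.0   # alert the user when an obstacle is closer than two metres

while True:
    if sensor.distance < THRESHOLD_M:       # distance is reported in metres
        camera.capture("/tmp/frame.jpg")    # grab a frame for the detector
        # The frame would then be passed to the MobileNet SSD model (as in the
        # previous sketch) and the detected labels converted to speech and
        # vibration feedback for the user.
    time.sleep(0.5)
```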
4 Results The simulation results obtained via implementing the MobileNet SSD algorithm are presented in this section. The results were fast and accurate as can be observed in the following figures below. Figure 5 depicts the image in the form of BGR which is converted in Fig. 6. The rectangular boxes mark the territory where the respective objects are detected based on the image dataset. It can be seen from Fig. 6. that objects are outlined in boxes. The objects detected by the algorithm include a bicycle and a person. It can be seen from Fig. 7 that objects are outlined in boxes. The objects detected by the algorithm include a bus and a car. It can be seen from Fig. 8 that the object is outlined in a box. The object detected by the algorithm includes a traffic light and a car. It can be seen from Fig. 10 that objects are outlined in boxes in the video. The objects detected by the algorithm include a truck and a person on the sidewalk. From the results obtained on Jupyter Notebook, it is seen that object detection has been successfully executed using the MobileNet SSD algorithm. This detects objects with a great accuracy rate thereby reducing latency and can realize real-time detection with data feedback. After the program was loaded onto a Raspberry Pi v3 B+ module for the usage of a walking cane, the Raspberry Pi camera captured moving objects in real-time providing feedback to the user through its audio output and vibratory circuits connected to the Raspberry Pi module. The raw data was read from the live video feed of the Pi Camera, and the object detection model analyzes and locates the object in the frame which was extracted with the help of the class labels in the COCO dataset. Once all this information was processed, it was converted to text, and with the help of the text-to-speech module, the
Fig. 4 Flow diagram for object detection
names of the objects were recognized and given as voice feedback to the earphone. The individual ultimately receives feedback in the form of speech. This process was then repeated for every object detected by the ultrasonic distance sensor within a threshold value set to two meters which gave the model enough time to relay the information back to the user and avoid object encounters. As observed in Figs. 6, 7, 8, 9, and 10, objects detected are highlighted by a bounding box and a text is displayed, and the matplotlib prompt used in the program marks the coordinates of the rectangular boxes around the objects detected. The image displayed in Fig. 5 is in BGR format as that’s the default color order for images in Open CV which was then converted back to RGB with the appropriate configurations as seen in Fig. 6. The smart robotic walking aid is a gift for mobility-impaired people when it comes to ease of movement. The usage of Raspberry Pi v3 B+ along with the Pi Camera and
Fig. 5 Input command and Image captured in default BGR format with respective x and y coordinates
Fig. 6 Image in RGB format with respective x and y coordinates
pre-trained dataset helps create a vivid description of the individual’s surroundings and provides them with a great sense of independence.
5 Conclusion As seen in this paper, an investigation of deep learning and its use in object avoidance is presented, and a deep neural network is chosen and used with the robotic walking
Fig. 7 Image captured with x and y coordinates, respectively
Fig. 8 Image captured of a traffic light and a car with x and y coordinates, respectively
Fig. 9 Input command to read video
aid for avoiding collisions and falls in real time. It is shown that object detection was carried out successfully and precisely enough to be implemented in a smart aid for mobility-impaired people. The results depict a solution that lets mobility-impaired individuals lead a more holistic lifestyle. With the help of object detection using MobileNet SSD,
Fig. 10 Video still of a street
object detection can be carried out in a timely fashion warning these users of what lies ahead and preventing bumps and falls.
6 Future Scope To receive a much more precise reading and avoid overload of data generation at one time, ultrasonic distance sensors can be used to predict the distance till object collision and provide a countdown. The ultrasonic module can send analog voltage data to the microcontroller that later converts it into the distance in centimeters and warns the user of the distance remaining till collision. This method can be applied to keeping up with the COVID-19 pandemic guidelines as well as making sure social distancing is being maintained when it comes to mobility-impaired individuals. An advanced version of Raspberry Pi can be implemented for faster results. Raspberry Pi 4 will show an increase in FPS (frames per second).
References 1. Gupta, A., Anpalagan, A., Guan, L., & Khwaja, A. S. (2021). Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array, 10, 100057. 2. Sudharani, P., & Koteswrara Rao, I. (2020). Deep learning-based advanced 3D-intelligent walking stick to assist the blind people. Journal of Engineering Sciences, 11(7), 1171–1172.
3. Srivastava, S., Divekar, A. V., & Anilkumar, C. (2021). Comparative analysis of deep learning image detection algorithms. Journal of Big Data, 8(1), 66. 4. Wang, Z., Peng, J., Song, W., Gao, X., & Zhang, Y. (2021). A convolutional neural networkbased classification and decision-making model for visible defect identification of high-speed train images. Journal of Sensors. 5. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., & Weyand, T. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. https:// doi.org/10.48550/arXiv.1704.04861. 6. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In 2016 International Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2818–2826), Las Vegas, NV, USA: IEEE. 7. Alom, Md. Z., Hasan, M., Yakopcic, C., Taha, T. M., Asari, V. K. (2020). Improved inception-residual convolutional neural network for object recognition. Neural Computing, and Applications, 32(12). 8. Ren, S., He, K., Girshick, R., & Sun, R. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Packt Publishing Ltd. https://doi.org/10.48550/ arXiv.1506.01497. 9. Redmon, J., Divvala,S., Girshick,R., & Farhadi, A. (2016). You only look once: unified, realtime object detection. In 2016 International Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 779–788). Las Vegas, NV, USA: IEEE. 10. Liu, W., et al. (2016). SSD: Single shot multibox detector. In: B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision ECCV 2016. Lecture Notes in Computer Science (Vol. 9905). Cham: Springer. https://doi.org/10.1007/978-3-319-46448-0_2. 11. Gangopadhyay, S., et al. (2013). Intelligent gesture-controlled wireless wheelchair for the physically handicapped. International Journal of Electrical, Electronics and Data Communication, 1(7). ISSN: 2320-2084. 12. Chang, W., Chen, L., et al. (2020). Design and implementation of an intelligent assistive system for visually impaired people for aerial obstacle avoidance and fall detection. Sensors Journal, 20(17), 10199–10210. IEEE. 13. Lecrosnier, L., et al. (2020). Deep learning-based object detection, localization, and tracking for smart wheelchair healthcare mobility. International Journal of Environmental Research and Public Health, 18(1), 91. 14. Naeem, H., Ahmad, J., & Tayyab, M. (2013). Real-time object detection and tracking. In: INMIC (pp. 148–153). IEEE. 15. Loganathan, N., et al. (2020). Smart stick for blind people. In: 6th International Conference on Advanced Computing and Communication Systems (ICACCS) (pp. 65–67). IEEE. 16. Akula, R., et al. (2019). Efficient obstacle detection and guidance system for the blind (haptic shoe). In Learning and Analytics in Intelligent Systems (pp. 266–271). 17. Miah, Md. R., & Hussain, Md. S. A. (2018). Unique smart eyeglass for visually impaired people. In: International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE) (pp. 1–4). Gazipur, Bangladesh: IEEE. 18. Degaonkar, S., et al. (2019). A smart walking stick powered by artificial intelligence for the visually impaired. International Journal of Computer Applications, 178, 7–10.
A Study on Deep Learning Frameworks for Opinion Summarization Sandhya Ramakrishnan
and L. D. Dhinesh Babu
Abstract Opinion summarization has gained interest among researchers due to the usefulness and challenges in extracting insights from the unprecedented amount of opinionated data available on the internet. Deep learning has been employed in various natural language processing tasks and produced state-of-the-art results. In this paper, we present a comprehensive review of various deep learning approaches in the context of opinion summarization. This paper presents a review of the notable state-of-the-art models with recommendations for future research. Keywords Opinion summarization · Deep learning · Neural network · Natural language processing
1 Introduction The volume of opinionated data on the internet is enormous and keeps increasing at an unprecedented rate. The massive amount of user-generated data makes it tedious to consume the relevant information. Automatic summarization is essential to reduce the time and effort needed to discover the relevant information from the overwhelming amount of data [1–3]. The task of opinion summarization is to generate summaries automatically from a large set of opinions related to a specific entity [4]. The work proposed by Hu et al. [5] is considered to be one of the first works towards automatically summarizing opinions. Summarizing opinions can be considered a type of multi-document summarization [6]. However, opinion summarization has distinct features that distinguish it
from multi-document summarization. Due to opinions being subjective, opinion summaries do not use the same concept of salient information as traditional text summaries [7]. Opinion polarity should be considered while generating the summary [8]. Opinion summarizers need to be scalable to handle the massive number of reviews and controllable to address user preferences [9]. The success of neural-based approaches in traditional text summarization [10–14] has triggered increased interest among researchers in experimenting with deep learning approaches to generate opinion summaries in textual form [6, 15–18]. An important aspect of deep learning is automatic feature extraction [19]. Opinion summarization techniques can be classified into two broad categories: extractive and abstractive summarization techniques. In extractive summarization, the most representative segments of the opinion set are selected to create the opinion summary. In abstractive summarization, summaries are generated using new text segments. Most of the earlier works in the area of opinion summarization are extractive based [2]. Selecting the most representative sentences is a difficult task in extractive summarization [18]. The extractive approach may result in incoherent summaries with redundant opinions [20]. Abstractive approaches are similar to the way humans summarize [21]. The availability of large-scale datasets, together with advances in neural architectures, has attracted increased attention towards abstractive summarization, which can generate a more concise and coherent summary [16–18, 22, 23]. Both approaches can be handled in supervised and unsupervised scenarios. The training process for supervised learning requires a large number of review-summary pairs. Most of the recent works perform opinion summarization with unsupervised learning [16–18, 24]. In this paper, we review the various deep learning techniques applied to opinion summarization so that researchers will get a quick idea about the current trends and the open issues to be resolved. The reviewed papers include high-quality works from 2016 to 2021. The rest of the article is structured as follows. Section 2 describes the review of frameworks applied to opinion summarization. Section 3 presents the evaluation metrics and Sect. 4 describes the commonly used datasets. Section 5 is a discussion of the various approaches and future research directions. Section 6 concludes the article.
2 Deep Learning-Based Opinion Summarization Methods 2.1 Recurrent Neural Networks-Based Models Recurrent neural networks (RNN) [25] are a widely used framework for sequential data processing. An RNN effectively processes sequential data through its recurrent hidden state. At each time step, the hidden state is activated based on the current input and the output from the previous state. RNNs are difficult to train [26] due to the vanishing gradient and the exploding gradient problems [27]. Because
of these problems, many variants have been proposed. Some of the most popular variants are Long Short-Term Memory (LSTM) [28], Bi-directional Long ShortTerm Memory(BiLSTM) [29], and Gated Recurrent Unit (GRU) [30]. This section discusses some of the opinion summarization methods based on Recurrent Neural Networks. Wang et al. [31] proposed an attention-based LSTM network for generating abstractive opinion summarization. An attention-based encoder encodes the input text units sampled using an importance score. The decoder then decodes the latent representation of the input to generate a summary based on an importance score. Yang et al. [32] proposed abstractive review summarization designed to work in a cross-domain setting. Initially, the domain classification is done by an LSTM. An LDA model learns the domain-specific aspect and sentiment information. Finally, the attention mechanism identifies the important aspects and the sentiment to generate the summary. MARS [33] proposed by Yang et al. is an encoder–decoder-based aspect/sentiment-aware Abstractive Review Summarization Model. The encoder having multiple attentions learns the aspect, sentiment, and context word. The decoder produces the summary using attention fusion. Ding et al. [34] proposed an encoder-decoder framework with multi-attention mechanism to generate a summary of user reviews. The model has a preprocessing stage that rearranges review using attention and selects the top -n important sentences. GRU-based encoder-decoder generates the expert review. Amplayo et al. [6] proposed a condense-abstract framework that utilizes all the reviews for generating the summary. The input reviews are first encoded in a condensed form by the condense framework. The encoded representations are fused by the abstract model to generate the opinion summary. Condense framework is implemented by a BiLSTM-based autoencoder and abstract framework is implemented with LSTM decoder with attention and copy mechanism. Chu et al. [16] proposed MeanSum, the neural network-based end-to-end abstractive opinion summarization framework, that generates summaries in an unsupervised manner. The model is constructed using a pair of LSTM-based encoder-decoders which constitute the autoencoder module and the summarization module, where the mean of the review encodings is decoded as the summary. CopyCat [17] proposed by Brazinskas et al. is an unsupervised abstractive opinion summarization system, based on a hierarchical organization of variational autoencoder model, an extension of the VAE [35] model proposed by Bowman et al. The central part of the model is a conditional language model that predicts a review conditioned on the other reviews of a product. The model is trained using auto-encoders which allows backpropagating gradient through differential samples. The mean value of the latent variables is used to limit the novelty in the summary. The summary is generated depending on the mean of the review’s latent code. Amplayo et al. [24] proposed a solution to the non-availability of training data in supervised learning. They proposed a framework known as DenoiseSum for opinion summarization by noising and denoising. A synthetic dataset is created by applying
noising process. Reviews are sampled from the user review database to act as pseudo-summaries. Token-level noising, chunk-level noising, and document-level noising are applied to create pseudo-reviews from each pseudo-summary. Each pseudo-review is encoded by a multi-source encoder. Opinions are summarized by denoising the dataset: the noise is removed from these encodings by an explicit denoising module, and the denoised encodings are then aggregated into fused encodings. The fused encoding is utilized as input to the decoder, with attention and copy mechanisms, to regenerate the original review. During the testing phase, the model accepts real reviews and outputs a summary with the important features. Amplayo et al. [36] proposed a content plan induction model for unsupervised opinion summarization. A content plan based on aspect and sentiment probability distributions is used to create a synthetic dataset during training and also to generate summaries. Bražinskas et al. [37] proposed a few-shot framework for abstractive opinion summarization that utilizes a small set of annotated data to effectively switch an unsupervised model into a summarizer. In the unsupervised pre-training phase, a conditional language model is trained based on an encoder-generator architecture, using the leave-one-out objective. The model is also trained on review properties. In the fine-tuning phase, a plug-in module is fine-tuned to predict the property values on a handful of summaries. Thus, the generator is switched to summarization mode.
2.2 Convolutional Neural Networks-Based Models Convolutional Neural Network (CNN) is a popular deep learning architecture inspired by the vision processing in living beings [38]. This section discusses some of the opinion summarization methods based on Convolutional Neural Networks. Wu et al. [39] proposed two convolution-based models, cascaded CNN and multitask CNN, for aspect-based opinion summarization. Cascaded CNN consists of multiple CNN for aspect mapping and a single CNN for sentiment classification arranged in a cascaded manner. Multitask CNN consists of multi-channel CNN and single-channel CNN sharing the same word embedding. Multi-channel CNN handles the aspect mapping, and single-channel CNN performs the sentiment classification. Finally, an aggregator generates the summary of different aspects with the positive and negative sentiment counts. Li et al. [40] proposed an opinion summarization model for Chinese microblogs using CNN. CNN is used to learn the features which form the input to the text rank algorithm that constructs the feature vector graph. Representative features, selected by the maximum marginal relevance, are used to generate the summary.
2.3 Transformer-Based Models In the past three years, transformer-based models have gained popularity because of their capacity to capture semantic and syntactic features and because they are context-dependent. This section discusses some of the opinion summarization methods based on Transformers. Suhara et al. [18] proposed a weakly supervised, customizable, abstractive opinion summarization framework using a transformer-based encoder-decoder model. The framework first extracts the opinion phrases from a review using a pre-trained aspect-based sentiment analysis model [41]. The transformer model is then trained to reconstruct the original review from the extracted phrases. During the summarization phase, opinion phrases are extracted from multiple reviews and are clustered to select the most popular ones. The selected opinion phrases are fed into the trained transformer model, which generates the opinion summary. The authors highlighted as the main advantage that the model does not require any gold-standard summary for training. Also, the user can customize the opinion summary by filtering the input opinions using aspects and/or sentiment polarity. Angelidis et al. [9] proposed an unsupervised opinion summarizer based on a quantized transformer model. Each input sentence is encoded by a transformer encoder into a multi-head representation. Each head vector is then mapped to a mixture of discrete latent codes by a vector quantizer. The transformer sentence decoder reconstructs the sentence from the quantized head vectors of the sentence. Amplayo et al. [23] proposed the Aspect Controllable Summarizer (AceSum) to generate abstractive aspect-specific summaries, with aspect controllers that yield three types of summaries: generic, single-aspect-specific, and multi-aspect-specific summaries. Aspect keywords, review sentences, and aspect codes together constitute the aspect controllers, which are predicted based on the multiple-instance learning model proposed by [42]. A pre-trained T5 transfer transformer model [43] is fine-tuned to generate the summary. Aspect controllers are the key factors that facilitate synthetic dataset creation for self-supervision and controllable summary generation. Wang et al. [44] proposed a self-supervised framework for opinion summarization using aspect and sentiment embeddings. A transformer-based encoder-decoder architecture is used to generate summaries.
2.4 Hybrid Models Angelidis et al. [15] proposed a hybrid deep learning model for opinion summarization. An attention-based aspect encoder predicts the aspects at the segment level. Segment’s sentiments, encoded by the CNN, are given as input to the attention-based GRU. A summary is created based on the opinions that scored the highest in each aspect.
Hong et al. [45] proposed a deep neural network and topic mining-based opinion summarization. POS phrases are extracted from product reviews using POS grammar rules. LDA model is used to filter semantics phrases from POS phrases. LSTM-based summarization module generates personalized summaries. Abd et al. [46] proposed an extractive opinion summarization of multi-documents. The model has sentiment analysis embedding space, text summarization embedding spaces, and an opinion summarization module. Sentiment analysis embedding space utilizes LSTM network and text summarization embedding space utilizes Restricted Boltzmann Machine. The opinion summarization module consists of sentence classification and sentence selection.
2.5 Summary of Various Approaches Table 1 depicts the summary of various deep learning models in opinion summarization.
3 Evaluation Metrics Evaluation metrics are required to evaluate the effectiveness of the summary objectively. This session discusses the evaluation metrics used in the opinion summarization task.
3.1 Rouge ROUGE [47] is a collection of evaluation metrics widely used to estimate the quality of opinion summarization. ROUGE stands for Recall-Oriented Understudy of Gusting Evaluation. ROUGE generates the performance scores by comparing automatic summaries against human written reference summaries. There are different variants [47] for the ROUGE score. ROUGE-N and ROUGE-L are the most commonly used performance metrics.
3.2 Rouge-N ROUGE with n-gram co-occurrence statistics (ROUGE-N) metrics proposed by [21] is based on the overlap of n-grams. ROUGE-N, an n-gram recall measure, is calculated as follows:
Table 1 DL approaches on opinion summarization

Research work | Model | Dataset | Category | R1 | R2 | RL
NNAS [31] | RNN | Rotten Tomatoes | Abstractive, Supervised, Generic Summary | R-SU4: 24.88 | – | –
CASAS [32] | LSTM | Amazon-el | Abstractive, Weakly Supervised, Aspect-Specific Summary | 83.25 | 64.45 | 85.98
MeanSum [16] | RNN | Yelp | Abstractive, Unsupervised, Generic Summary | 28.86 | 3.66 | 15.91
AOS [39] | M-CNN | ASR | Extractive, Unsupervised, Aspect-Specific Summary | F1: 76.4 | – | –
[40] | CNN | COAE2014 | Extractive, Unsupervised, Generic Summary | AUC: 0.936 | – | –
[33] | LSTM | Amazon | Abstractive, Unsupervised, Aspect-Specific Summary | 84.13 | 68.28 | 86.15
[15] | Autoencoder | OPOSUM | Weakly Supervised, Extractive, Generic Summary | 44.1 | 22.8 | 43.3
CondaSum [6] | LSTM-based Autoencoder | Rotten Tomatoes | Unsupervised, Abstractive, Aspect-Specific Summary | 22.49 | 7.65 | 18.47
CopyCat [17] | GRU-based Autoencoder | Amazon / Yelp | Unsupervised, Abstractive, Generic Summary | 31.97 / 29.47 | 5.81 / 5.26 | 20.16 / 18.09
DenoiseSum [24] | BiLSTM | Rotten Tomatoes / Yelp | Unsupervised, Abstractive, Generic Summary | 21.26 / 30.14 | 4.61 / 4.99 | 16.27 / 17.65
OpinionDigest [18] | Transformers | Yelp | Abstractive, Weakly Supervised, Generic Summary | 29.30 | 5.77 | 18.56
QT [9] | Transformers | SPACE / Yelp / Amazon | Extractive, Unsupervised, General + Aspect-Specific Summaries | 38.6 / 28.40 / 34.04 | 10.22 / 3.97 / 7.03 | 21.90 / 15.27 / 18.08
MILNET + MATE + MT [15] | CNN + GRU | OpoSum | Weakly Supervised, Extractive, Aspect-Based Opinion Summarization | 44.1 | 21.8 | 43.3
AceSum [23] | Transformer-based transfer learning | SPACE / OpoSum | Abstractive, Self-Supervised, Both Generic and Aspect-Specific Summary | 40.37 / 32.98 | 11.5 / 10.7 | 23.23 / 20.27
FewSum [37] | Transformer | Amazon / Yelp | Unsupervised, Abstractive, Generic Summary | 33.56 / 37.29 | 7.17 / 9.92 | 21.49 / 22.76
PlanSum [36] | BiLSTM | Yelp / Amazon | Unsupervised, Generic Abstractive Summary | 34.79 / 32.87 | 7.01 / 6.12 | 19.74 / 19.05
TransSum [44] | Transformer | Yelp / Amazon | Unsupervised, Abstractive, Generic Summary | 36.62 / 34.24 | 8.41 / 7.24 | 20.31 / 20.49

ROUGE-N, an n-gram recall measure, is calculated as follows:

$$\mathrm{ROUGE}\text{-}N = \frac{\sum_{S \in \{R\}} \sum_{gram_n \in S} \mathrm{Count}_{match}(gram_n)}{\sum_{S \in \{R\}} \sum_{gram_n \in S} \mathrm{Count}(gram_n)} \qquad (1)$$
where R is the set of reference summaries and S is an individual reference summary. Count_match(gram_n) measures the number of matching n-grams between the reference summary and the model-generated summary, and Count(gram_n) refers to the total count of n-grams in the reference summary. ROUGE-N is generally a recall measure; however, it is possible to set the denominator to the total count of n-grams in the automatic summary to measure precision. ROUGE-1 and ROUGE-2 are the most commonly used ROUGE-N measures; they measure the overlap of unigrams and bigrams, respectively.
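A minimal sketch of the ROUGE-N recall computation in Eq. (1) is shown below (illustrative only; whitespace tokenization and a single reference summary are simplifying assumptions, and the official ROUGE toolkit [47] applies additional preprocessing):

```python
# Minimal ROUGE-N recall following Eq. (1): overlapping n-grams / n-grams in reference.
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(reference, candidate, n=1):
    ref_counts = Counter(ngrams(reference.split(), n))
    cand_counts = Counter(ngrams(candidate.split(), n))
    # Count_match: clipped overlap between candidate and reference n-grams.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())  # total n-grams in the reference summary
    return overlap / total if total else 0.0


ref = "the room was clean and the staff was friendly"
cand = "clean room and friendly staff"
print(rouge_n_recall(ref, cand, n=1))  # ROUGE-1 recall
print(rouge_n_recall(ref, cand, n=2))  # ROUGE-2 recall
```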
3.3 Rouge-L ROUGE with Longest Common Subsequence (ROUGE-L) measures the longest matching word sequence using the longest common subsequence algorithm.
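A small sketch of the longest-common-subsequence length underlying ROUGE-L follows (a plain dynamic-programming illustration on whitespace tokens, not the official scorer):

```python
# Longest common subsequence length used by ROUGE-L (dynamic programming).
def lcs_length(ref_tokens, cand_tokens):
    m, n = len(ref_tokens), len(cand_tokens)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref_tokens[i - 1] == cand_tokens[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


ref = "the battery life is very good".split()
cand = "battery life is good".split()
print(lcs_length(ref, cand))             # 4 matching words in order
print(lcs_length(ref, cand) / len(ref))  # ROUGE-L recall
```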
3.4 Rouge-S The ROUGE-S skip-bigram co-occurrence metric measures the co-occurrence of pairs of words that appear in sentence order but may be separated by one or more other words, between the automatic summary and the reference summary. Though ROUGE is popular, it has some disadvantages. It does not consider matches between words of similar meaning, as it does not measure semantics. Hence, most of the previous works are supported by human evaluation as well.
4 Datasets 4.1 Oposum OPOSUM [15] contains reviews from Amazon for six product domains. This collection includes not only the reviews and summaries, but also information on the domain and the polarity of products.
4.2 Space SPACE [9] is a single-domain dataset that consists of hotel reviews from TripAdvisor. SPACE is a large review corpus that includes gold-standard abstractive summaries to assess general and aspect-specific opinion summarization.
4.3 Rotten Tomatoes Rotten Tomatoes [31] is a well-known US film review aggregation website that includes both professional and user reviews. The Rotten Tomatoes dataset is made up of movie reviews gathered from the Rotten Tomatoes website.
4.4 Opinosis The Opinosis dataset [48] contains sentences taken from user reviews on a specific topic obtained from different sources, including Tripadvisor, Edmunds.com, and Amazon.com. There are 51 topics in all, each with around 100 sentences on an average. Gold standard summaries are also included in the dataset.
4.5 AmaSum AmaSum [49] developed by Brazinskas et al. is the biggest abstractive opinion summarization dataset based on more than 33,000 Amazon product reviews. On average, 320 reviews are paired with a summary. Summaries consist of verdicts, advantages, and disadvantages.
4.6 Yelp The Yelp dataset proposed by Chu et al. [16] is based on the Yelp Dataset Challenge. It consists of customer reviews with five-star ratings. The authors used Amazon Mechanical Turk (AMT) to obtain 100 manually written summaries for model evaluation.
5 Discussion and Future Works Many of the early works focused on extractive summarization methods. Due to the advancement of neural architectures, abstractive methods have started to attract more attention. A major challenge here is the limited availability of annotated data for training, since generating review-summary pairs is highly expensive. Hence, the task has commonly been approached in an unsupervised or weakly supervised manner. Due to the lack of annotated training examples, one of the main approaches used for unsupervised opinion summarization was the auto-encoder architecture, often enhanced with attention and copy mechanisms. As a general method, salient information was encoded into an aggregated representation, and this representation was then used to generate the summary. Another solution proposed by some of the recent works is to create synthetic datasets to train the model in a weakly supervised manner. Such datasets are created by randomly sampling a review as a pseudo-summary or by adding noise to the reviews. The summarizer is then trained in a supervised way to predict the pseudo-summaries from the input reviews. This self-supervised approach has proved effective in generating summaries in some of the recent works.
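A rough sketch of the synthetic-pair idea described above is given below; leave-one-out sampling of a review as the pseudo-summary is one common recipe, and the data format, pair size, and field names used here are assumptions for illustration only:

```python
# Sketch: build (input reviews -> pseudo-summary) pairs for self-supervised training.
import random


def make_synthetic_pairs(reviews_per_product, inputs_per_pair=8, seed=0):
    rng = random.Random(seed)
    pairs = []
    for reviews in reviews_per_product.values():
        if len(reviews) <= inputs_per_pair:
            continue
        pseudo_summary = rng.choice(reviews)           # one review acts as the "summary"
        pool = [r for r in reviews if r is not pseudo_summary]
        source = rng.sample(pool, inputs_per_pair)     # the reviews to be summarized
        pairs.append({"source_reviews": source, "pseudo_summary": pseudo_summary})
    return pairs


data = {"product_1": [f"review {i} about product 1" for i in range(20)]}
print(make_synthetic_pairs(data)[0]["pseudo_summary"])
```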
Because of the limitations of current hardware, most of the authors proposed an extract-abstract framework, where a subset of the reviews is first selected and then the summary is generated [31]. Amplayo et al. [6] argued that such an approach can result in less informative summaries, as only a subset of reviews is considered, and that extract-abstract approaches limit the customizability of the summary. As a solution, they proposed a condense-abstract framework, in which condensed reviews serve as input to the abstractive model. During the pre-generation stage, the reviews are condensed using a simple autoencoder instead of pre-selecting salient reviews. The condensed reviews are then fed into the decoder to generate the summary. Information loss is minimized as no reviews are filtered out. If abstractive summarizers are not exposed to human-written summaries, they may include superficial and unimportant information. Because only a few human-written summaries are available, several few-shot models have been developed. By using specialized mechanisms such as parameter-subset fine-tuning and summary candidate ranking, these models alleviate annotated data scarcity. In addition, most recent opinion summarization solutions include advanced features as well [23, 37]. Most of the works treat popular opinions, or opinions that are redundant across multiple reviews, as the salient opinions to be included in the summary. Amplayo et al. argue that salient information depends on user interest; by controlling opinion summaries to generate aspect-specific content based on user requests, users can be helped to make decisions more effectively. As e-commerce grows in popularity, the number of reviews also grows rapidly. Scalability is the ability to handle a large number of input reviews; therefore, being able to summarize a large number of reviews has become an important but difficult requirement for opinion summarization solutions. Multimodality refers to the ability of the model to handle data in multiple modalities, such as text and non-text data. While most opinion summarizers take reviews as a single input source, recent solutions have started to include non-text sources of input as well, and multimodal solutions are expected to be a focus of future work. Abstractive summaries are mostly fluent but are still affected by issues like text degeneration [50], hallucinations [22], and inappropriate use of first-person narrative. Generally, opinion summarization methods have not looked at the quantitative side, that is, the proportion of positive and negative opinions. Despite some existing works, the topic of comparative opinion summarization has not been studied extensively. Hence, opinion summarization is still an active research area with many challenges.
6 Conclusion In recent years, the importance of the opinion summarization process has increased due to the vast amount of opinionated data available on the internet. We provided a comprehensive review of different approaches based on deep learning. The major problems with opinionated data are volume and volatility. Most of the previous works on opinion summarization were extractive. The availability of large-scale datasets
and the advancement in neural architectures have triggered interest in abstractive-based systems. Due to the limited availability of annotated text corpora, researchers have proposed solutions such as creating synthetic datasets and few-shot learning. Advanced features like controllability, scalability, and multimodality are incorporated in many recent opinion summarization solutions. However, these solutions are still in their developing stage. In short, opinion summarization using deep learning is under-explored, with many challenges. We hope that this study will provide future researchers with new insights into the field of opinion summarization using deep learning.
References 1. Balahur, A., Kabadjov, M., Steinberger, J., & Steinberger, R. (2012). Challenges and solutions in the opinion summarization of user-generated content. Journal of Intelligent Information Systems, 39(2), 375–398. https://doi.org/10.1007/s10844-011-0194-z. 2. Kim, H. D., Ganesan, K., Sondhi, P., & Zhai, C. (2011). Comprehensive review of opinion summarization. UIUC. 3. Ding, Y., & Jiang, J. (2015). Towards opinion summarization from online forums. In Proceedings of Recent Advances in Natural Language Processing: 10th RANLP 2015 (pp. 138–146). 4. Conrad, J. G., Leidner, J. L., Schilder, F., & Kondadadi, R. (2009). Query-based opinion summarization for legal blog entries. In ICAIL ‘09: Proceedings of the 12th International Conference on Artificial Intelligence and Law (pp. 167–176). 5. Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In KDD ‘04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 168–177). 6. Amplayo, R. K., & Lapata, M. (2021). Informative and controllable opinion summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational (pp. 2662–2672). Association for Computational Linguistics. 7. Peyrard, M. (2019). A simple theoretical model of importance for summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 8. Ganesan, K., Zhai, C., & Han, J. :Opinosis (2010). A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 340–348). 9. Angelidis, S., Amplayo, R. K., Suhara, Y., & Wang, X. (2021). Extractive opinion summarization in quantized transformer spaces. Transactions of the Association for Computational Linguistics, 277–293. https://doi.org/10.1162/tacl_a_00366. 10. Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 11. See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-generator network. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 1073–1083). 12. Narayan, S., Cohen, S. B., & Lapata, M. (2018). Ranking sentences for extractive summarization with reinforcement learning. In Proceedings of the 2018 Conference of Computational Linguistics: Human Language Technologies (pp. 1747–1759). 13. Liu, P. J., Saleh, M., Pot, E., & Goodrich, B. (2018). Generating Wikipedia by summarizing long sequences. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018.
14. Perez-Beltrachini, L., Liu, Y., & Lapata, M. (2019). Generating summaries with topic templates and structured convolutional decoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 5107–5116). 15. Angelidis, S., & Lapata, M. (2018). Summarizing opinions: Aspect extraction meets sentiment prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3675–3686). Association for Computational Linguistics, Brussels, Belgium. https://doi.org/10.18653/v1/D18-1403. 16. Chu, E., & Liu, P. J. (2019). MeanSum: A neural model for unsupervised multi-document abstractive summarization. In Proceedings of the 36th International Conference on Machine Learning (ICML) (pp. 1223–1232). 17. Bražinskas, A., Lapata„ M., & Titov, I. (2020). Unsupervised opinion summarization as copycat-review generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5151–5169). https://doi.org/10.18653/v1/2020.acl-main.461. 18. Suhara, Y., Wang, X., Angelidis, S., & Tan, W.C. (2020). OPINIONDIGEST: A simple framework for opinion summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5789–5798). https://doi.org/10.18653/v1/2020.acl-mai n.513. 19. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 436–444. 20. Jiang, W., Chen, J., Ding, X., Wu, J., He, J., & Wang, G. (2021). Review summary generation in online systems: Frameworks for supervised and unsupervised scenarios. ACM Transactions on the Web, 15, 1–33. 21. Wang, L., Raghavan, H., Castelli, V., Florian, R., & Cardie, C. (2016). A sentence compression based framework to query-focused multi-document summarization (pp. 1384–1394). 22. Zhao, Z., Cohen, S. B., & Webber, B. L. (2020). Reducing quantity hallucinations in abstractive summarization. In Findings of the Association for Computational Linguistics: EMNLP 2020. 23. Amplayo, R. K., Angelidis, S., & Lapata, M. (2021). Aspect-controllable opinion summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 6578–6593). 24. Amplayo, R. K., & Lapata, M. (2020). Unsupervised opinion summarization with noising and denoising. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 1934–1945). Association for Computational. 25. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211. 26. Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning. 27. Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 157–166. 28. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780. 29. Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991. 30. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning. 31. Wang, L., & Ling, W. (2016). Neural network-based abstract generation for opinions and arguments. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 32. 
Yang, M., Qu, Q., Zhu, J., & Shen, Y. (2018). Cross-domain aspect/sentiment-aware abstractive review summarization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1531–1534). Torino. 33. Yang, M., Qu, Q., Shen, Y., & Liu, Q. (2018). Aspect and sentiment aware abstractive review summarization. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 1110–1120). Association for Computational Linguistics. 34. Ding, X., Jiang, W., & He‡, J. (2018). Generating Expert’s review from the Crowds’: Integrating a multi-attention mechanism with encoder-decoder framework. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing &
Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (pp. 954–961). https://doi.org/10.1109/SmartWorld.2018.00170. 35. Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., & Bengio, S. (2016). Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (pp. 10–21). https://doi.org/10.18653/v1/K16-1002. 36. Amplayo, R. K., Angelidis, S., & Lapata, M. (2020). Unsupervised opinion summarization with content planning. In Proceedings of the 35th Conference on Artificial Intelligence (pp. 12489–12497). 37. Bražinskas, A., Lapata, M., & Titov, I. (2020). Few-shot learning for opinion summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.337. 38. Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, L., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 354–377. 39. Wu, H., Gu, Y., Sun, S., & Gu, X. (2016). Aspect-based opinion summarization with convolutional neural networks. In 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 3157–3163). https://doi.org/10.1109/IJCNN.2016.7727602. 40. Li, Q., Jin, Z., Wang, C., & Zeng, D. D. (2016). Mining opinion summarizations using convolutional neural networks in Chinese microblogging systems. Knowledge-Based Systems, 107, 289–300. https://doi.org/10.1016/j.knosys.2016.06.017. 41. Miao, Z., Li, Y., Wang, X., & Tan, W. C. (2020). Snippext: Semi-supervised opinion mining with augmented data. In WWW '20 (pp. 617–628). 42. Keeler, J., & Rumelhart, D. E. (1991). A self-organizing integrated segmentation and recognition neural net. In Proceedings of the 4th International Conference on Neural Information Processing Systems (pp. 496–503). 43. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., & Li, W. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research. 44. Wang, K., & Wan, X. (2021). TransSum: Translating aspect and sentiment embeddings for self-supervised opinion summarization. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 729–742). 45. Hong, M., & Wang, H. (2021). Research on customer opinion summarization using topic mining and deep neural network. Mathematics and Computers in Simulation, 185, 88–114. https://doi.org/10.1016/j.matcom.2020.12.009. 46. Abdi, A., Hasan, S., Shamsuddin, S. M., Idris, N., & Piran, J. (2021). A hybrid deep learning architecture for opinion-oriented multi-document summarization based on multi-feature fusion. Knowledge-Based Systems, 213. 47. Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. 48. Ganesan, K., Zhai, C., & Han, J. (2010). Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010) (pp. 340–348). 49. Bražinskas, A., Lapata, M., & Titov, I. (2021). Learning opinion summarizers by selecting informative reviews. arXiv:2109.04325. 50. Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020). The curious case of neural text degeneration. In Proceedings of the 8th International Conference on Learning Representations.
Improvisation of Information System Security Posture Through Continuous Vulnerability Assessment Navdeep S. Chahal, Preeti Abrol, and P. K. Khosla
Abstract As information technologies continue to expand, especially for assets, networks, mobile, and web applications, the Internet has become an integral part of modern corporate information systems. With the development of web technologies, the popularity of web/mobile-based applications has grown tremendously. Government organizations make extensive use of websites for information dissemination, much of it critical, and hackers target these government web applications. In this paper, the Continuous Vulnerability Assessment (CVA) security dashboard is proposed for vulnerability management, monitoring, identification, visualization, reporting, mitigation, and remediation based on a mathematical model of the Risk Score Index (RSI). This dashboard tackles the challenging issue of developing an orchestrated interface that supports future data analytics, as it addresses the requirements of all the automated security processes by evaluating real-time data from 535 state government web applications over the last 6 years (2015–2021). The findings of the study have been implemented on state government networks to examine the confidentiality concerns of the cybersecurity vulnerability status. The experimental results indicate that there is an improvement in the security features of the state organization's network as well as applications in comparison to Bubblenet, the Blockchain Signaling System (BloSS), and conventional security systems (CSS). Keywords Application security · Cybersecurity · Vulnerabilities · Security dashboard · Mitigation · Remediation
N. S. Chahal (B) · P. Abrol · P. K. Khosla C-DAC, Mohali, India e-mail: [email protected] P. Abrol e-mail: [email protected] P. K. Khosla e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_19
1 Introduction Today, mobile and web-based applications are used in security-critical operational environments such as ministries and state government departments. Besides, these web and mobile applications are also used for commercial purposes such as shopping using a credit card, online transactions, and net banking. All these applications are prone to vulnerabilities that can be exploited by malicious software and attackers [1, 2]. In the past few years, an increase in attention to application security issues has swept the software industry. In a recent survey, 28% of cybercriminal activities are executed through external attackers [3]. The threats posed to government bodies, institutions, organizations, and businesses are a captious concern. Presently, analysis conducted by the researchers unambiguously witnesses the prevalence and severity of the threat [4]. As per the 2021 global report [5, 6], the average global threats cost increased by 37% in two consecutive years to $14.67 million, along with the occurrence of incidents raised by 49% during this period. This challenge can be tackled with the vulnerability assessment as well as the analysis of such incidents. As per the Threat Landscape Report 2020 [2], the prevalence of cyber-attacks has spiked up to 59.5%, where 30% of organizations have witnessed various attacks. Hence, it highlights the immense requirement for major protective steps to be taken to fight against the threat. Typically, organizations desire to explore the security issues for strengthening their network as well as assets against vulnerabilities and attacks. However, the deployment of protection measures against potential threats fails due to attacks. Attacker abuse authorizes access to gain control of critical systems and eventually defaces the entire data system [1]. It attacks not only private organizations, but also government bodies for malicious intent or monetary gain, other motives such as industrial espionage, sabotage, and business advantage [5]. Such types of threats are enhancing in frequency and scope. Therefore, a critical requirement for the implementation of secure code techniques and practices is emphasized as shown in Fig. 1. In a conventional security system (CSS), the Security Auditor analyzes the structural risks, engages in ethical hacking to test for weaknesses, or crawls the entire network and web/mobile application for vulnerabilities and provides secure code solutions also. Further, they defend their organizations from cyber-attacks by Vulnerability assessment and penetration testing of assets, networks, and web/mobile applications from potential breaches. Then the remediation and mitigation are performed by taking countermeasures, and vulnerabilities are fixed and corrected, followed by the generation of security audit reports. This process is followed in several iterations until the network/asset and application are completely secure. Open Web Application Security Project (OWASP) is a worldwide free, open community that focuses on enlightening application security. It is a Non-Profit Charitable Organization with the mission to make security vulnerabilities visible to persons and organizations [7–9]. Many security solutions following OWASP standards exist such as Apache Metron, MozDef, OSS EC, Dashboard, Certstation, Citrix application security dashboard, Security Brigade Dashboard, Gauntlet, Infosec dashboard,
Fig. 1 Flowchart of the vulnerability assessment process
etc. Various commercial and open-source vulnerability scanners also compete for excellence. For example Acunetix, burp-suite, and Splunk are commercial vulnerability scanners, and W3af, wapiti, arachni, OWASP Zap, etc. are a few open-source tools. Despite the proliferation of application services, it faces major challenges that may result in frustrating users and the organization. The existing security dashboards lag in the domain of an orchestrated project management system. It can visualize the vulnerabilities only but cannot perform vulnerability analysis. A security dashboard is required that has high resilience against attacks and anomalous activities. Its criticality poses various challenges to developing robust systems for ensuring protection along with reduced overall as well as development cost and time at the same time. The dynamic complexity of the security dashboard needs to be manifested with the
emergence of unexpected and insecure asset behaviors in response to the dynamic environment as well as the unpredictable fluctuations both in assets and network needed for the application’s execution. Therefore, identifying, understanding, and analyzing complex interactions, and interdependence represent a challenge to the evaluation of the real vulnerability of each system in consequence of a malicious event. Various challenges of the security dashboard can be highlighted as follows: • Risk score index calculation is required as the existing security dashboards focus on visualizations. • It requires a project management solution wherein the applications could be allocated to the security auditors. • It desired full visibility to rapidly search and analyze security-related events. • Need to manage and review vulnerability Protection Status, as well as view and act on security alerts. • It requires several widgets, such as Vulnerability Management Summary, Vulnerability Protection Status, and Global Weekly Vulnerability Detections. • It needed to create a world-class SOC with superior response and maturity levels. • Lacked SIEM solution and in-house vulnerability scanning tool. • Require data analytic tool for vulnerability analysis. For this reason, the proposed security dashboard is expected to improve application management and decision-making by amplifying cognition and capitalizing on human perceptual capabilities. Hence, a security dashboard is required that bridges the gap between the three stakeholders, i.e. the organization, the security auditor, and the developer. The rest of this paper is organized as follows: Sect. 2 discusses the literature survey; Sect. 3 describes the CVA security dashboard. Section 4 enlightens the mathematical model of the Risk Score Index (RSI). Section 5 covers the implementation and evaluation. Finally, Sect. 6 presents conclusions.
2 Literature Review In the past decade, E-commerce has grown exponentially due to the significant increment in online transactions. It is imperative to mention that US online retail sales grew 12.6% in 2015 to reach $176.2 billion in 2019. The phenomenal growth of a 10% compound annual growth rate is spotted from 2015 to 2019 [4]. With the expansion of E-commerce, the threat and vulnerability of IT assets have increased [5]. While the State government web/mobile applications often become a target for hackers, these security gaps constitute real threats. Therefore, to build a secure and usable organization infrastructure, there is a need for an exhaustive study to know the security gaps, guidelines, experience, and suggestions for mitigation and remediation of the best development practices. Research analysis highlights that 83.2% of e-Government web/mobile applications from 200 different countries were vulnerable to Structured Query Language (SQL) injection and Cross-Site Scripting (XSS).
The studies conducted to assess the factors affecting the adoption of state government services found that security is the major challenge [6]. Several vulnerabilities exist in Government applications [1]. Hackers get attracted to these websites due to their high impact of these sites. Verizon’s 2013 data breach report showed that 52% of the information security breaches are due to web/mobile application hacking especially cross-site scripting (XSS) [2]. The statistical security analysis of these applications is another significant open challenge. Various visualization tools and techniques are designed and implemented to fit the web/mobile applications [4]. Different researchers have plotted cybersecurity data on the bar and scatter plots [7]. There is potential to combine and link multiple graphs together into a dashboard that is then evaluated against users. Cybersecurity data is tailored by various researchers through visual representations [8, 9]. Map-like visualizations of the entire Internet seek to preserve the spatial location of similar types of computers across multiple datasets [3, 10]. An aggregated sliding slice of time is discussed to support the workflow of network analysts dealing with large quantities of data [11]. Numerous cybersecurity researchers have adapted existing visualization analysis in this domain, but very little of this work has been tested for network and security analysts [12, 13]. The working of IT personnel and analysts is observed to showcase the utility of web-based visualization dashboards for network and application security, but it lags in the evaluation of users’ data [14].
3 The Continuous Vulnerability Assessment Security Dashboard An open-source orchestrated Continuous Vulnerability Assessment (CVA) Security Dashboard is proposed that manages the assets/network and web/mobile application for the threats and vulnerabilities monitoring, risk score index evaluation based on the advanced analytics methodology. The proposed dashboard presents the frontend consolidation for displaying the ongoing security processes, enabling a security analyst the reaction to the threat level. The major objective of the presented security dashboard is to perform continuous vulnerability assessment and to monitor the overall risk to grade the capability of the application for protection against threats. This security dashboard is compatible with other open-source as well as commercial vulnerability scanning tools for listing vulnerabilities. It is based on the mathematical model for evaluating the application with RSI highlighted in detail. Further, the proposed dashboard is implemented on the real-time data of 535 web applications of the state government of India for 5 years’ time period. It provides the vulnerability count and average vulnerabilities encountered per audit. The experimental results prove that the CVA security dashboard reduces the Mean-Time to Mitigate Vulnerabilities and Mean-Time to Patch and remediation. Therefore, it is called a continuous vulnerability assessment (CVA) security dashboard.
3.1 Asset and Network VA/PT The proposed security dashboard also works on the challenges and issues within the infrastructure. The organizational infrastructure can be technically categorized into assets and networks between those assets [15]. Both assets and networks pose security risks. For this reason, securing assets can be an intimidating task to gain insight into specific defects and guidelines for fast remediation with coverage of major technologies and platforms like Apache, Oracle, IIS, Java, PHP, etc. It also identifies structural and architectural defects based on assets [16]. The following are the steps involved in asset and network VAPT: • Collect all the information of assets and their OS versions and regulate vulnerabilities. • Patch the vulnerability and update the POC of the asset. • Data Centre Reviews. • Vulnerabilities will be open and the ticket will close after submitting a POC for the respective vulnerability/CVE or details for patch management. So, every device or asset within the network should be scanned. Failing to scan every asset and access point leaves the network making the asset receptive to vulnerabilities [17].
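As a minimal illustration of sweeping every asset in an inventory, the sketch below records which listed services answer on each host. The hostnames, port lists, and timeout are placeholders, and a real deployment would rely on a full vulnerability scanner rather than a raw TCP check:

```python
# Sketch: verify that every asset in the inventory is reachable and note exposed services.
import socket

ASSETS = {"web-server-01": [80, 443], "db-server-01": [3306], "app-server-01": [8080]}


def check_asset(host, ports, timeout=2.0):
    exposed = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                exposed.append(port)  # port answered: service exposed
        except OSError:
            pass                      # closed/filtered port or unreachable host
    return exposed


for host, ports in ASSETS.items():
    print(host, "open ports:", check_asset(host, ports))
```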
3.2 Mobile and Web Application VA/PT The proposed security dashboard offers an integrated basic VA Scanner along with added more than 400+ Vulnerability Datasets comprised on Web and Mobile. The security dashboard promotes VAPT. VAPT helps in achieving standards including the GDPR, ISO 27001, and PCI DSS [18]. Some of the significant challenges that are unique to the security monitoring implementation architecture include • Information gathering (footprinting) for server/platform. • Manually and automated scanning for Vulnerabilities present in respective applications and maintaining POC. • Generate and share reports with clients including all POCs. • Proactively monitor, track, and react to security violations. • Provide reports and APIs for external consumption and integration. • Ensure that the complete lifecycle of an application (such as events, incidents, and changes) is consolidated and visible to the responsible operational teams. • Maintain historical data for problem resolution along with the information necessary to perform compliance audit reviews and certifications shown in Fig. 2.
Fig. 2 Stage-wise vulnerabilities visualization
3.3 ISMS Audit Module The organization requires ISMS certificate management platforms with additional capabilities [19]. An information security management system constitutes a set of policies along with procedures for the management of sensitive data belonging to the organization systematically [20]. The main goal is the minimization of risk that ensures continuity of business by reducing the security impact. It standardizes with ISO 27001 specifications. These policies do not compel specific direction but suggest audits, documentation, continuous improvement, and preventive measures [21]. Further mitigation of vulnerabilities involves patch installation, changes in network security policy, software reconfiguration, etc. [22]. It added more than 100+ Secure Code examples for various platforms and languages to get clear direction on flaw remediation [23]. Further, action plans based on a prioritized list of high-impact issues are created.
3.4 Reporting Tool This is another significant feature of the proposed CVA security dashboard. Once an assessment is complete, the proposed security dashboard delivers a formal report and debriefs outlining key findings and a prioritized list of remedial actions to help address any identified risks and exposures of the security audits [24]. It has been
an effective way to communicate both abstract and concrete ideas through graphs, charts, etc. [25]. It constitutes the following graphical views for visualization as Stage-wise Vulnerabilities Visualization, Current vulnerability stats, Iteration stats, Overall Vulnerabilities stats, Application Stats, Last 6 Months Stats, Overall RSI, etc.
3.5 Risk Score Index It is a fully calibrated and orchestrated security solution that scores application security, assets, and network security by centralizing visibility and enabling drill down to resolution [26]. This study proposes a security dashboard-based framework that orchestrates and advocates continuous assessment leading to mitigation and remediation of vulnerabilities [27]. The proposed security dashboard diminishes the loopholes of the existing security dashboards. It includes the allocation of the asset/network or web/mobile application to the available security analyst, scanning of the entire web/mobile application through its vulnerability scanner and its mitigation and remediation along with its evaluation through the automatic risk score index (RSI) computation iteratively as shown in Fig. 3.
Fig. 3 CVA security dashboard
4 Statistical Model of RSI in CVA Security Dashboard The role-based CVA security dashboard has a crucial, secure, and reliable login system. Google reCAPTCHA is used to prevent brute force. It offers various roles like Super Administrator, Administrator, Manager, Team Leader, Developer, Auditor, etc. [28]. Hence, it speeds up the working of security auditors and analysts. Intelligent aggregation of vulnerabilities is captured by the Risk Score Index (RSI). Security analysts can identify and prioritize risky assets, observe RSI changes, and determine an appropriate course of action. The dashboard performs the calculation of RSI based on monthly policy revisions and the security index score concerning OWASP risk-calculating factors. The maximum RSI is 10; Table 1 lists the RSI calculation parameters.

Risk score index = Probability of attack × Impact

$$\delta = \left(\frac{A(s) + A(a)}{2a}\right) \times \text{Impact} \qquad (1)$$
δ is the risk score index, A(a) is the vulnerability factor, and A(s) is the attacker's strength factor, which includes the security auditor skill set, purpose of attack, leverage to attack, and team type and team size. Similarly, A(t) (technical impact factor) comprises confidentiality loss, integrity loss, availability loss, and accountability loss, and A(b) (business impact factor) consists of reputational impact, non-compliance, privacy impact, and financial impact. All these components are rated 0–10, and the average of each group gives the respective factor. The risk score index can be rewritten as shown below.

Table 1 RSI calculation parameters

A(s) attack strength | A(a) vulnerability factor | A(t) technical impact | A(b) business impact
Security auditor skillset | A(α) compliance coefficient of open vulnerabilities (vulnerability scanner accuracy) | Confidentiality loss | Reputational impact
Purpose of attack | A(β) compliance coefficient of high vulnerabilities | Integrity loss | Non-compliance
Leverage to attack | A(η) compliance coefficient of the same number of open vulnerabilities for two consecutive stages | Availability loss | Privacy impact
Team type and team size | A(γ) compliance coefficient of stages | Accountability loss | Financial impact
Probability of attack (from A(s) and A(a)) | | Impact (from A(t) and A(b)) |
$$\delta = \left(\frac{A(s) + A(a)}{2a}\right) \times \left(\frac{A(t) + A(b)}{2a}\right) \qquad (2)$$
The vulnerability factor A(a) provides data on performance indicators of RSI and metrics based on the following parameters:

• Number of stages.
• Number of high-risk vulnerabilities concerning each stage.
• Number of open vulnerabilities.
• The same number of open vulnerabilities in two consecutive stages.
• Addition in the total number of vulnerabilities in two consecutive stages.

$$A(a) = A(\alpha) + A(\beta) + A(\gamma) + A(\eta) + A(\varepsilon) \qquad (3)$$
where A(α) is the compliance coefficient of open vulnerabilities, A(β) is the compliance coefficient of high vulnerabilities, A(γ) is the compliance coefficient of stages, A(η) is the compliance coefficient of the same number of open vulnerabilities for two consecutive stages, and A(ε) is the compliance coefficient of increased open vulnerabilities for two consecutive stages. Here, the compliance coefficient of open vulnerabilities is

$$A(\alpha) = \sum_{i=1}^{i \le j+k} (i + 1) \qquad (4)$$
where the number of open vulnerabilities i = 0, j = [10, 20], and k ≥ 20. The compliance coefficient of high vulnerabilities can be computed as

$$A(\beta) = \sum_{x=1}^{x \le y+z} (x + 1) \qquad (5)$$
where the number of high vulnerabilities, x = 0 and y= [10, 20], and z ≥ 20. The compliance coefficient of stages is calculated as
$$A(\gamma) = \begin{cases} \sum\limits_{p=0}^{p \le q} p = 0, & \text{otherwise } p \le 1 \le r \\[6pt] \sum\limits_{p \ge r}^{r > p} p = 1 - e^{-\sigma A(a)} & \end{cases} \qquad (6)$$
where σ is the vulnerability fixing rate, such that σ ∈ [0, 1]. The compliance coefficient of the same number of open vulnerabilities for two consecutive stages is

$$A(\eta) = \log_{10}\left[(A(\alpha) - A(\alpha - 1)) + (A(\gamma) - A(\gamma - 1))\right] \qquad (7)$$
where A(γ) and A(γ − 1) are the compliance coefficients of the previous and current stages, respectively, and A(α) and A(α − 1) are the corresponding compliance coefficients of open vulnerabilities. The compliance coefficient of increased open vulnerabilities for two consecutive stages can be calculated as

$$A(\varepsilon) = \log_{10}\left[\frac{(A(\alpha) - A(\alpha - 1)) + (A(\gamma) - A(\gamma - 1))}{A(\alpha)\,A(\gamma)}\right] \qquad (8)$$
Substituting Eqs. 4, 5, 6, 7, and 8 in Eqs. 2 and 3, the A(a) is computed and hence the RSI is calculated. Following is the algorithm of the CVA security dashboard:
Algorithm 4.1: CVA Security Dashboard
Input: Web/mobile application, asset, or network details
1: Enter the web/mobile application, asset, or network details.
2: Sort the applications entered as per priority.
3: For (RSI != 0)
4:   Allocate the application to the manual or automated resource/vulnerability scanning tool.
5:   Call the vulnerability scanner and sort all the vulnerabilities as per their severity.
6:   Compute the total vulnerability count.
7:   Compute the probability of attack: Probability of attack = (A(s) + A(a))/2a, with A(a) = A(α) + A(β) + A(γ) + A(η) + A(ε) (refer to Eqs. (2) and (3)).
8:   Compute the impact: Impact = (A(t) + A(b))/2a, where A(t) and A(b) are the technical and business impact factors, respectively, and calculate RSI: δ = ((A(s) + A(a))/2a) × ((A(t) + A(b))/2a) (refer to Eq. (2)).
9:   Generate a report and visualizations.
10:  Automate the entire process till RSI = 0.
11: End For
Output: Secured application
The proposed CVA security dashboard is fed with the web application URLs and credentials are provided for VAPT wherein these applications are sorted on a priority basis. If RSI is not equal to zero, then the application can be allocated for manual or automated VAPT [29]. In the case of automated VAPT, the Vulnerability scanner starts its search for vulnerabilities, which are sorted as per severity. Further, the RSI for the same is computed using the mathematical model. In the end, the report generation of the same is followed by visualizations. This process is iterated till RSI = 0 and the application is not prone to vulnerabilities.
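A compact sketch of the RSI calculation used in this workflow is given below. The 0–10 component ratings shown are placeholders, the value of the normalizing constant 2a is an assumption (the excerpt does not fix it), and the compliance coefficients A(α)–A(ε) are treated as pre-computed inputs rather than re-derived from Eqs. (4)–(8):

```python
# Sketch of the Risk Score Index (RSI) from Eqs. (2) and (3).
def vulnerability_factor(alpha, beta, gamma, eta, epsilon):
    # Eq. (3): A(a) is the sum of the five compliance coefficients.
    return alpha + beta + gamma + eta + epsilon


def risk_score_index(attack_strength, vuln_factor, technical_impact, business_impact,
                     two_a=10.0):
    # Eq. (2): (probability of attack) * (impact), each scaled by the constant "2a".
    probability_of_attack = (attack_strength + vuln_factor) / two_a
    impact = (technical_impact + business_impact) / two_a
    return probability_of_attack * impact


a_a = vulnerability_factor(alpha=2.1, beta=1.4, gamma=0.8, eta=0.3, epsilon=0.2)
rsi = risk_score_index(attack_strength=6.0, vuln_factor=a_a,
                       technical_impact=7.0, business_impact=5.0)
print(round(rsi, 2))  # iterate VAPT and remediation until this value reaches 0
```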
5 Implementation and Evaluation Within this space, a security dashboard is designed to overcome the discussed gap constraints. The proposed security dashboard is developed in PHP that integrates with MySQL. SHA-256 hashing technique is used, and passwords are stored along with a variable salting technique. Evaluation and analysis highlight the CVA Security Dashboard usability, and thus is successfully deployed and used by CDAC security auditors. A sample security report of 535 State Government web applications is reviewed for 6 years, i.e. 2015–2021, to determine the nature of the proposed security dashboard. These applications are hosted on various servers and different platforms including 185 Java web applications, 150 in.NET, and 200 web applications in PHP as shown in Table 2. A detailed description of the sample web applications is considered for the vulnerability count. Along with its relative percentage is computed to calculate the cumulative vulnerability percentage. Further, the detailed analysis of the Compliance coefficient of open vulnerability in PHP, .Net, and Java web applications is shown in Fig. 4. The cumulative compliance coefficient of open vulnerability is shown in Fig. 5. A3 vulnerability is the major contributor in this sample state government website with 19% vulnerability. The other vulnerabilities are less than 18%. The compliance coefficient of high vulnerability is shown in Fig. 6 and is categorized based on vulnerability severity. The compliance coefficient of stages is shown in Fig. 7 where the total number of vulnerabilities at the various stages are in the case of network, asset, and web/mobile applications. It is observed that there is a gradual fall in the graph of the network, asset, and web/mobile application vulnerability count, which leads to a decrease in the compliance coefficient of the stage each year from 2015 to 2021. Similarly, the RSI graph is shown in Fig. 8. In the year 2015–2016, RSI of the web/mobile, asset, and network vulnerabilities are recorded as the highest, wherein web/mobile application RSI declines to 44% in 2017–2018. The peak difference percentages of the three cases (web/mobile, asset, and network) are about 66%, 75%, and 55%, respectively. Hence, RSI has decreased with each year passed due to continuous vulnerability assessment, leading to the lowest risk score in the year 2021. Therefore, all the network, assets, and web/mobile applications audited become secure as the developers and network analysts fixed all the vulnerabilities with the help of secure code offered in the CVA security dashboard, hence making the complete State government organization secure.
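The password-storage scheme mentioned above (SHA-256 with a variable, per-user salt) can be sketched as follows. This is an illustration in Python rather than the dashboard's PHP code, and the salt length and column layout are assumptions:

```python
# Sketch of salted SHA-256 password storage, mirroring the scheme described above.
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    salt = salt if salt is not None else os.urandom(16)        # per-user random salt
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex(), digest                                   # store both values


def verify_password(password, salt_hex, stored_digest):
    _, digest = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(digest, stored_digest)           # constant-time compare


salt_hex, digest = hash_password("S3cure!Pass")
print(verify_password("S3cure!Pass", salt_hex, digest))         # True
```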
170
165
HIGH
HIGH
Authorization Bypass/ Forceful browsing
The heartbleed bug A3
174
74
MEDIUM
MEDIUM
Host header attack
Vulnerable forgot password implementation
49 175
MEDIUM
MEDIUM
Poodle attack
48
Application error message
A4
171
HIGH
Insufficient Anti-automation
A2
HIGH
Error based SQL injection
155
HIGH
Blind SQL injection
135
185
87.05
87.05
88.23
57.64
76.47
82.35
83.52
56.47
64.70
41.17
100
135
141
147
97
138
134
136
27
130
17
145
70
82
94
54
76
78
72
54
60
14
90
Sample total Relative (150) percentage (%)
Sample total Relative (185) percentage (%)
HIGH
HIGH
A1
Cross-site request forgery
Vulnerability in .NET
Vulnerability in JAVA
Time-based SQL injection
Severity
OWASP Top 10
Vulnerability
Table 2 Vulnerabilities found in 535 State Government Web Applications
157
190
193
184
175
172
187
136
154
133
193
57.57
90.90
93.93
84.84
75.75
72.72
87.87
36.36
54.54
33.33%
93.93
Sample total Relative (200) percentage (%)
Vulnerability in PHP
76.19
86.30
91.07
61.90
76.19
76.19
80.95
51.78
61.30
37.50
95.83
(continued)
Cumulative vulnerability percentage (%)
178
LOW
LOW
INFO
HIGH
Session cookie without HTTP only flag set
Browser cache weakness
Password input field with autocomplete enabled
Stored cross-site scripting
A7
178
HIGH
A6
Improper redirection
54
184
165
5
HIGH
165 181
63.52
98.82
76.47
91.76
91.76
5.88
95.29
76.47
0
14
148
126
147
135
11
146
131
148
8
96
52
94
70
2
92
62
96
Sample total Relative (150) percentage (%)
Sample total Relative (185) percentage (%) 0
Vulnerability in .NET
Vulnerability in JAVA
Remote file inclusion
HIGH
MEDIUM
Severity
HIGH
A5
OWASP Top 10
Insufficient transport layer protection
File upload
ViewState is not encrypted
Vulnerability
Table 2 (continued)
145
193
148
190
106
0
193
175
0
45.45
93.93
48.48
90.90
6.06
0
93.93
75.75
0
Sample total Relative (200) percentage (%)
Vulnerability in PHP
43.45
97.02
63.69
92.26
68.45
3.57
94.04
72.02
96
(continued)
Cumulative vulnerability percentage (%)
HIGH
183 84 84
INFO
LOW
MEDIUM
Clickjacking
Vulnerable remember password
54
0
48
180
179
181
98.82
98.82
97.64
63.52
0
56.47
94.11
92.94
95.29
60
150
150
147
112
16
118
145
148
149
15
100
100
94
89
12
36
90
96
98
10
Sample total Relative (150) percentage (%)
Sample total Relative (185) percentage (%) 51
Vulnerability in .NET
Vulnerability in JAVA
Email address disclosure
HIGH
Insecure direct object reference
A10
HIGH
HIGH
302 to 200 OK vulnerability
ASP.net oracle padding vulnerability
HIGH
Possible bruteforce attack
A9
HIGH
Session fixation
A8
Session replay attack
Severity
HIGH
OWASP Top 10
Reflective cross-site scripting
Vulnerability
Table 2 (continued)
200
196
193
127
188
190
193
193
193
151
100
96.96
93.93
27.27
88
90
93.93
93.93
93.93
51.51
Sample total Relative (200) percentage (%)
Vulnerability in PHP
99.40
98.80
95.83
44.64
32
48.88
92.85
94.04
95.83
43.15
Cumulative vulnerability percentage (%)
Fig. 4 Compliance coefficient of open vulnerabilities in Java, PHP, and .NET web applications

Fig. 5 Cumulative compliance coefficient of open vulnerability

Fig. 6 Compliance coefficient of high vulnerability

Fig. 7 Compliance coefficient of the stage

Fig. 8 Risk Score Index (RSI) of CVA security dashboard
5.1 Comparative Analysis The conventional security system dashboard (CSS Dashboard) includes the traditional method of exploiting vulnerabilities and then reporting them by the security analyst [30]. There are many visualization tools [31]. In another perspective for the evaluation of the security dashboard, it is compared with Bubblenet [24], Blockchain Signaling System (BloSS) [32, 33], and CSS Dashboard based on the following parameters. (a) Efficiency Efficiency can be calculated as the average time to patch the vulnerability or the number of days between when the vulnerability was opened and when that vulnerability was closed. Shorter times to fix, particularly for significant vulnerabilities, make an asset more secure. The efficiency of the dashboard is directly dependent on the Vulnerability remediation rate as shown in Fig. 9.
Fig. 9 Efficiency of CVA security dashboard (number of days to patch versus number of applications for the CVA security dashboard, CSS Dashboard, Bubblenet, and BloSS)
The peak percentage difference in the efficiency of the CVA security dashboard is 60% more than the others. Therefore, it is observed that the proposed security dashboard outstands others. Hence, the proposed security dashboard cannot only amplify the security perspective but also orchestrates and advocates continuous assessment leading to mitigation and in some cases remediation of vulnerabilities. So, to automate security-related activities and fill the gap, a proposed security dashboard delivers significant business value by dramatically reducing the vulnerability identification time, and investigating and remediating security-related incidents. Therefore, the total annual cost of security, compliance, and ongoing operations is reduced.
6 Conclusions and Future Scope Creating an effective proposed security dashboard is a dynamic process that evolves as organizational security goals shift, as new sources as user requirements change. The proposed security dashboard enhances the security of the IT applications and infrastructure across divisions with complete visibility of vulnerabilities along with knowledge-based solutions. Automated vulnerability assessment based on opensource information is a significant work that generates risk scores and establishes the Cyber-security Maturity Index framework for the organization. It also offers ISMS certificate management. In the future, SIEM will be focused on the next area of interest.
References 1. Fischer, F., & Keim, D. A. (2014). NStreamAware: Real-time visual analytics for data streams to enhance situational awareness. In VizSec ’14. 2. Weir, C., Ware, M., Migues, S., & Williams, l. (2021, August 23–28) Infiltrating security into development: Exploring the world’s largest software security study. In ESEC/FSE ’21. 3. S.E.E. Profile and S.E.E. Profile. (2012, January). A new mathematical model for analytical risk assessment and prediction in IT systems. 4. Hao, L., Healey, C. G., & Hutchinson, S. E. (2013). Flexible web visualization for alert-based network security analytics (pp-1–8). ACM. 5. Barth, A., Rubinstein, B. I. P., Sundararajan, M., Mitchell, J. C., Song, D., & Bartlett, P. L. (2010). A learning-based approach to reactive security. In International Conference on Financial Cryptography and Data Security (pp. 192–206). Springer. 6. The information confidentiality and cyber security in medical (pp. 855–864). 7. U. Interfaces. (2019). Applying design system in cybersecurity dashboard development. 8. Yu, T., Lippmann, R., Riordan, J., & Boyer, S. (2010, September) Ember: A global perspective on extreme malicious behavior. In Proceedings of the Symposium on Visualization for Cyber Security (pp. 1–12. 3). New York, NY: ACM Press. 9. Awoleye, O. M., Ojuloge, B., & Siyanbola, W. O. (2012). Technological assessment of egovernment web presence in Nigeria (pp. 236–242). 10. Paul, C. L., Rohrer, R., Sponaugle, P., Huston, J., & Nebesh, B. (2013, October). CyberSAVI: A cyber situation awareness visual interface for mission-level. In VizSec 2013. 11. Bastos, I., Melo, V. H. C., Schwartz, W. R. (2020, March). Bubblenet: A disperse recurrent structure to recognize activities. In 2020 IEEE International Conference on Image Processing (ICIP). 12. Agutter, J., Foresti, S., Livnat, Y., & Moon, S. (2006). Visual correlation of network alerts. IEEE Computer Graphics and Applications, 26(2), 48–59, March 2014. 13. Ryan, P. Y. A. (2000, September). Mathematical models of computer security, 2014. 14. Mckenna, S., Staheli, D., & Meyer, M. (2015). Unlocking user-centered design methods for building cyber security visualizations. In 2015 IEEE Symposium on Visualization for Cybersecurity (VIZSEC) design. IEEE. 15. Akgul, Y. (2016). Web site accessibility, quality, and vulnerability assessment: A survey of government web sites in the Turkish Republic (Vol. 4, pp. 1–13). 16. Mckenna, S., Staheli, D., Fulcher, C., & Meyer, M. (2016). BubbleNet: A Cyber Security Dashboard for Visualizing Patterns. Eurographics Conference on Visualization (EuroVis), 35(3), 2016. 17. Faso, B. (2016). Vulnerabilities of government websites in a developing country—the case of Burkina Faso, December 2017. 18. Barsomo, M. (2017). A survey of automated tools for probing vulnerable web applications. 19. Idris, I., Majigi, M. U., & Olalere, M. (2017, December). Vulnerability assessment of some key Nigeria government websites vulnerability assessment of some key Nigeria government websites. 20. Elisa, N. (2017). Usability, accessibility and web security assessment of E-government in Tanzania. International Journal of Computer Applications, 164. 21. Friedman, J. (2019). Vulnerability scoring systems, remediation strategies, and taxonomies by EAS499 senior capstone thesis. 22. Ali, A. A., & Murah, M. Z. (2019, June). Security assessment of Libyan government websites. In 2018 Cyber Resilience Conference (pp. 1–4). 23. Singh, V. K., Callupe, S. P., & Govindarasu, M. (2019, October). 
Test bed-based evaluation of SIEM tool for cyber kill chain model in power grid SCADA System. In 2019 North American Power Symposium (NAPS). 24. Khalimonenko, A., Kupreev, O., Badovskaya, E. (2018, April). DDoS attacks in Q1 2018. Retrieved March 6, 2019, from https://securelist.com/ddos-report-in-q1-2018/85373/.
250
N. S. Chahal et al.
25. Mannhart, S., Rodrigues, B., Scheid, E., Kanhere, S. S., & Stiller, B. (2018, August). Toward mitigation-as-a-service in cooperative network defenses. In 3rd IEEE Cyber Science and Technology Congress (CyberSciTech 2018) (pp. 362–367), Athens, Greece. 26. Al-Dhahri, S., & Al-Sarti, M. (2017). Information security management system. International Journal of Computer Applications, 158(7), 29–33. 27. Ridley, T. (2021, June). Security management and security leadership dichotomies: Which is needed more? In Security, Risk & Management Sciences (pp. 1–15). 28. Fonseca-Herrera, O. A., Rojas, A. E., & Florez, H. (2021). A model of an information security management system based on NTC-ISO/IEC 27001 standard. IAENG International Journal of Computer Science, IJCS_48_2_01, 48(2). 29. Soomro, Z. A., Shah, M. H., & Ahmed, J. (2016). Information security management needs more holistic approach: A literature review”. International Journal of Information Management, 36(2), 215–225. 30. Park, S., & Lee, K. (2014). Advanced approach to information security management system model for industrial control system. The Scientific World Journal, (1–2), 348305. 31. Antunes, M., Maximiano, M., Gomes, R., & Pinto, D. (2021). Information security and cybersecurity management: A case study with SMEs in Portugal. Journal of Cybersecurity and Privacy, 1, 219–238. 32. Killer, C., Rodrigues, B., Stiller, B. (2019). Threat management dashboard for a blockchain collaborative defense. In Proceedings of the IEEE GLOBECOM Workshop 27th on Blockchain in Telecommunications: Emerging Technologies for the Next Decade and Beyond, February 2020. 33. Killer, C., Rodrigues, B., & Stiller, B. (2019). Security Management and Visualization in a Blockchain-Based Collaborative Defense. IEEE.
Design and Development of Micro-grid Networks for Demand Management System Using Fuzzy Logic L. Senthil, Ashok Kumar Sharma, and Piyush Sharma
Abstract Micro-grid is designed to operate with an Energy Management System (EMS), which dispatches the units, in order to optimize generation costs. One of the inputs to this system corresponds to the prediction of demand and also incorporates a demand management system. The objective of this paper is to design the demand prediction block, taking into account the non-linear behavior presented by the demand. The model is designed to deliver the predictions that EMS needs, that is, for a 2-day horizon. When deriving the model, a stability analysis based on the fuzzy theorems is included in the identification stages. The final model consists of four rules and 96 regressors, that is, the future demand depends on the demand of the previous day. As a result, a model is obtained that manages to deliver predictions for horizons of 2 days, with errors of around 14%. The prediction was also analyzed using the EMS optimizer; the fuzzy model prediction had an error 11% lower than the prediction originally used, which translated into a 15% decrease in costs for the 2-day optimization. The second objective corresponds to developing a methodology to model the variation in consumption in the face of demand management signals, using this fuzzy model. Keywords Micro-grid · Energy management system · Demand management system · Fuzzy model
L. Senthil (B) · A. K. Sharma · P. Sharma Department of Electrical Engineering, Rajasthan Technical University, Kota, India e-mail: [email protected] A. K. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_20
1 Introduction Due to the geographical conditions of India, there are still communities isolated from the electrical systems that distribute energy within the country. Smart micro-grids based on renewable energies represent a solution to this problem, taking into
account the abundance of resources available for generating non-conventional renewable energy. With this in mind, a distributed generation micro-grid was built that incorporates renewable resources (solar and wind energy). Before the installation of the micro-grid, the locality had only 10 h of electricity per day, a situation that has changed: it now has electricity 24 h a day. The installed micro-grid currently operates with an Energy Management System (EMS), which dispatches the units while minimizing generation costs. The EMS inputs are the consumption prediction, the prediction of climatic variables, and the state of charge of the batteries. The EMS has a built-in demand management system based on "demand response". This system sends light signals to consumers through traffic lights installed in their homes, asking them to reduce, increase, or maintain their consumption so that dispatch remains optimal. With regard to this system, two topics are addressed in this work: the calculation of the range in which the demand can move, and the effect that the demand management signals have on the forecast. For the first topic, fuzzy intervals will be used to determine the dynamic range from historical data. This range provides the optimizer with the limits for the load displacement factor, on which the signals sent to consumers depend. The second topic is based on the fact that consumers are expected to modify their consumption when they see the demand management signals, but it is not known by how much, which also affects the prediction; this is why a methodology will be presented to model the consumption variation using fuzzy models. Demand management strategies serve to maintain the balance between generation and demand, which is of vital importance for micro-grids, where energy resources fluctuate.
1.1 Scopes In this paper, a model for the prediction of demand in micro-networks will be developed, using as input the demand data of a past day (96 regressors) to forecast the demand over a horizon of 2 days (192 steps). The prediction will be carried out using fuzzy modeling, whose identification process will include a stability analysis stage, programmed in Matlab. The prediction model will also be programmed in Matlab and will be compared with a prediction model based on neural networks, also designed for the micro-network. In relation to demand management, two main topics will be covered in this paper: the first corresponds to calculating the dynamic range for the load displacement factor, used by the micro-grid optimizer to determine the demand management signals. To determine this range, fuzzy intervals will be used; two methods are developed to predict the intervals "j" steps ahead, since the optimizer works with a horizon of 192 steps. One of these methods, chosen for its performance, will be used to calculate the 192-step demand shift range from historical data, and this dynamic range will be tested in the energy management system optimizer simulator. The second issue in relation to demand management
corresponds to the changes produced by the signals in the demand pattern, for which a methodology will be proposed to model the effect that demand management signals have on the prediction of demand using fuzzy models.
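To make the idea of a per-step demand range concrete, the following is a minimal sketch, not the paper's fuzzy-interval method: it builds an empirical band from historical quarter-hourly demand (96 samples per day) and tiles it over the 192-step optimizer horizon. The percentile choice and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def demand_band(history, horizon=192, lower_q=10, upper_q=90):
    """Empirical per-step demand band over a forecast horizon.

    history: 2-D array of shape (days, 96) with quarter-hourly demand.
    Returns (lower, upper) arrays of length `horizon`, built by tiling
    the per-quarter-hour percentiles of the historical days.
    """
    lo = np.percentile(history, lower_q, axis=0)   # length 96
    hi = np.percentile(history, upper_q, axis=0)   # length 96
    reps = int(np.ceil(horizon / history.shape[1]))
    return np.tile(lo, reps)[:horizon], np.tile(hi, reps)[:horizon]

# Example with synthetic data: 30 days of quarter-hourly demand in kW.
rng = np.random.default_rng(0)
base = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 96))
history = base + rng.normal(0, 5, size=(30, 96))
lower, upper = demand_band(history)
print(lower.shape, upper.shape)  # (192,) (192,)
```

A band of this kind is the sort of limit the optimizer needs for the load displacement factor; the fuzzy intervals described in the paper would replace the simple percentiles used here.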
1.2 Structure of the Paper The paper is structured as follows. The first section presents the introduction to the work, setting out the scope and objectives of the paper and the way forward for its successful completion. In the second section, the problem to be addressed is presented; it begins with the presentation of the concept of micro-networks, then the installed micro-network is detailed with all its components, and the aspects that this work aims to improve are specified. The third section corresponds to the state of the art, where the two main topics of this paper, forecasting and demand management, are dealt with. Related concepts and classifications are detailed, and applied work in the field of micro-networks is emphasized, both for demand management and its prediction. Next, the fourth section deals with the demand forecasting model developed. It also covers the stability analyses of fuzzy models that have been carried out in other works on this type of model and the technique that will be included in the identification stages used in this work.
2 Background and Current Situation In this section, the concept of micro-networks is explained, and then the case study is specified. Geographical and social characteristics of the said community will be described, as well as the technical characteristics of the installed micro-network.
2.1 Micro-networks Distributed generation has gained importance due to the low emissions and low costs it entails. This technology includes gas turbines, fuel cells, micro-turbines and photovoltaic panels. To avoid the problems that these generators can cause, in [1] it is proposed to use these generators and their associated loads as subsystems that can be isolated from the distribution network. These subsystems are called micro-networks. Micro-grids correspond to a set of loads and small generators, operating as a single controllable system, that provide energy and/or heating to their associated local area. These may or may not be connected to the main network, being able, in the first case, to isolate themselves against the existence of distribution problems. The energy sources used in this type of network correspond to small generators

if (… > T) then the CH node is selected into the cluster
end
Acknowledgement is received from the destination
for j = 1 to m do
    if (j receives Acknowledgement through optimal CH) then
        NG = NG + RC;
    else if (intermediate node receives Acknowledgement through optimal path)
        Node reward
    else
        Node Penalty
end
Penalty constant, P = 1/2; Reward constant, R = 0.01; Initial goodness value, T = 0.30; η = 0.15.
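The algorithm fragment above gives only a threshold test and the reward/penalty constants for a node goodness value. The following is a minimal illustrative sketch of how such an update could be applied per acknowledgement; the exact update rule, the use of η as a blend factor, and the multiplicative penalty are assumptions, since the text lists only the constants.

```python
def update_goodness(goodness, acked_via_optimal, reward=0.01, penalty=0.5, eta=0.15):
    """Illustrative goodness update for a candidate cluster head (CH).

    goodness: current goodness value of the node (starts at T = 0.30 above).
    acked_via_optimal: True if the acknowledgement came back through the
    optimal CH or path, False otherwise.
    The additive reward, multiplicative penalty, and eta-blend are assumptions.
    """
    target = goodness + reward if acked_via_optimal else goodness * penalty
    return (1 - eta) * goodness + eta * target

g = 0.30
for ack in [True, True, False, True]:
    g = update_goodness(g, ack)
print(round(g, 4))
```

Under constants like these, repeated acknowledgements through the optimal path raise the goodness above the selection threshold, while failures pull it back down quickly.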
4 Performance Analysis The proposed protocol is demonstrated using the network simulator NS-2, with the simulation parameters tabulated in Table 1. The performance and effectiveness of the proposed RBMCA protocol are analyzed using the simulator. The Average Dissipated Energy (ADE) is calculated by the equation

ADE = (En_i − En_f) / N(D_I)    (11)
Table 1 Simulation parameters

Sl. No | Parameter | Value
1 | Simulator | NS-2
2 | Topology | Random node placement, one sink
3 | Number of nodes | 100
4 | Packet size | 3000 bits
5 | Control packet size | 300 bits
6 | Initial energy | 0.5 J
7 | Rounds | 1200
8 | Trust threshold | 0.5
9 | Protocols considered | RBMCA (proposed), EECRP (existing)
where En_i is the initial energy and En_f is the final energy of the corresponding node, and N(D_I) is the total number of data items transmitted by the corresponding node to its neighbor nodes, as shown in Fig. 6. The per-node delay covers the data transmission time and is calculated using Eq. (12):

Node Delay = Process Delay + Transmission Delay + Queue Delay + Propagation Delay    (12)
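A small sketch of Eqs. (11) and (12) as functions follows; the example values are placeholders, not results from the simulation.

```python
def average_dissipated_energy(initial_energy, final_energy, items_delivered):
    """Eq. (11): energy spent per delivered data item for one node (J/item)."""
    return (initial_energy - final_energy) / items_delivered

def node_delay(process, transmission, queue, propagation):
    """Eq. (12): total per-node delay as the sum of its components (seconds)."""
    return process + transmission + queue + propagation

print(average_dissipated_energy(0.5, 0.42, 200))   # J per data item
print(node_delay(0.002, 0.008, 0.005, 0.001))      # seconds
```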
The chosen number of clusters reduces packet loss and end-to-end delay, as shown in Fig. 7. The Packet Delivery Ratio (PDR) is calculated over the number of nodes, based on the packets transmitted from source to destination, using Eq. (13).

Fig. 6 Node versus energy
Fig. 7 Node versus delay
Node PDR = Total no. of packets received / Total no. of packets sent    (13)
A graphical representation of the mean packet delivery ratio is shown in Fig. 8. Throughput is calculated using Eq. (14); it relates the number of nodes to the time spent on data transmission in the given interval.

Throughput = Number of Nodes / Time spent for transmission    (14)
The relationship between nodes and throughput for the RBMCA technique is shown in Fig. 9. The data transmission speed and the energy consumption for packet transmission between source and destination are calculated using Eq. (15).

Energy Consumption = Total Energy / Number of Packets Transferred    (15)
The RBMCA mechanism reduces energy usage while increasing speed, as shown in Fig. 10.

Fig. 8 Node versus packet delivery ratio
Fig. 9 Node versus throughput
Fig. 10 Speed versus energy
Speed and delay are analyzed based on the packet transmission speed, which is calculated using Eq. (16).

Speed = Distance / Time    (16)
In Fig. 11, the proposed method outperforms the existing method in terms of speed; the proposed RBMCA lowers packet loss and end-to-end delay. The packet delivery ratio versus speed is calculated using Eq. (17), as the ratio of the total packet reception time to the total packet transmission time from source to destination.

Packet Delivery Ratio = Σ(Total Packet Received time) / Total Packet Sent time    (17)
The proposed method increases the packet delivery ratio with speed, as shown in Fig. 12. The RBMCA mechanism in Fig. 13 sustains a high throughput of around 1 pkt/s while also extending the network's lifetime, calculated using Eq. (18).
Fig. 11 Speed versus delay
Fig. 12 Speed versus Packet Delivery Ratio
Fig. 13 Speed versus throughput
Throughput = Σ(No. of packets transmitted × average packet size) / Transmission Time    (18)
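The remaining metrics, Eqs. (13) to (18), are simple ratios; the sketch below collects them as functions so a simulation trace can be post-processed. The numbers passed in at the end are illustrative only.

```python
def packet_delivery_ratio(received, sent):
    """Eq. (13): fraction of packets that reached the destination."""
    return received / sent

def throughput_nodes(num_nodes, transmission_time):
    """Eq. (14): nodes served per unit transmission time."""
    return num_nodes / transmission_time

def energy_per_packet(total_energy, packets_transferred):
    """Eq. (15): average energy consumed per transferred packet (J/packet)."""
    return total_energy / packets_transferred

def throughput_bits(packets_transmitted, avg_packet_size_bits, transmission_time):
    """Eq. (18): delivered bits per second."""
    return packets_transmitted * avg_packet_size_bits / transmission_time

print(packet_delivery_ratio(930, 1000))
print(energy_per_packet(12.5, 930))
print(throughput_bits(930, 3000, 600))
```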
5 Conclusion and Future Work The paper discusses the advantages and disadvantages of previous work and proposes a data aggregation routing method using RBMCA for WSNs. The performance of the proposed methodology is compared with the existing IPSO and EECRP schemes using the NS2 simulator for various scenarios; the simulation experiments indicate that the proposed scheme gives better results than the existing clustering algorithms in terms of energy consumption and delay. In future, the study aims to refine the CH selection among the selected CHs and to take steps that conserve more energy with less delay.
A Systematic Review on Underwater Image Enhancement and Object Detection Methods Chandni, Akanksha Vats, and Tushar Patnaik
Abstract In the last decade, the number of underwater image processing research has increased significantly. This is primarily due to society’s dependency on the precious resources found underwater and to protect the underwater environment. Unlike regular imaging in a normal environment, underwater images suffer from low visibility, blurriness, color casts, etc. due to light scattering, turbidity, darkness, and wavelength of light. For effective underwater exploration, excellent approaches are necessary. This review study discusses the survey of “underwater image enhancement and object detection” methods. These methods are outlined briefly with the available dataset and evaluation metrics used for underwater image enhancement. A wide range of domain applications is also highlighted. Keywords CLAHE · USM · Faster R-CNN · YOLOv3 · WaterNet · UIE-Net
Chandni (B) · A. Vats · T. Patnaik Centre for Development of Advanced Computing, Noida, India e-mail: [email protected] A. Vats e-mail: [email protected] T. Patnaik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_29
1 Introduction Water covers about 70% of the Earth's surface, making it a rich subject that has been an active research area for the last two decades. Captured underwater images are often color distorted, with a hue that matches the green and blue parts of the water. In order to solve this problem, we must first understand the fundamental physics of light transmission in water and the medium's physical attributes, which generate deterioration effects not seen in conventional images shot in air. Because source light is attenuated exponentially as it travels in the water, underwater images are characterized by poor perceptibility, resulting in scenes that are poorly contrasted
and foggy. The visibility distance is limited to roughly 20 m in transparent water and 5 m or less in muddy water due to light attenuation. Absorption (the removal of light energy) and dispersion (which changes the path of light) are the two factors that cause light attenuation. Because of these complex physical properties of the underwater environment, underwater image processing is more difficult, and the clarity of these images has become a focus of research. The main sources of underwater image distortion are briefly discussed below [1]:
Light Scattering: Light falling on objects is reflected as well as deflected multiple times by dusty and foggy particulates present in the water before actually reaching the camera, resulting in light scattering. The image's visibility and contrast suffer as a result.
● Forward Scattering. When light deviates randomly on its path from an object to the camera, underwater image details get blurred.
● Backward Scattering. Light is reflected by the water toward the camera before reaching the object, which diminishes the visual contrast of the objects in the image.
Color Change: Light propagating in the water with different wavelength bands experiences varying degrees of absorption, causing color change and giving ambient underwater areas a bluish tone.
Non-Uniformity due to Artificial Light: Artificial light illuminates the environment non-uniformly, which results in a bright spot in the image surrounded by a poorly lit area (Fig. 1).
All the above-mentioned factors also become challenges when detecting underwater objects. The objects in images are distorted by the reflective nature of light at the water's surface, and when they are very deep in water, the boundaries of the objects are hard to distinguish due to the hue of the objects and poor lighting. Image restoration and image enhancement are two different techniques used to improve underwater image quality, which in turn helps in detecting underwater objects.
1.1 Restoration of Under-Water Images The aim of image restoration is to recover the original image from a distorted image based on some idealized models. Physical models are required for underwater image restoration procedures. Developing the degradation model, computing the model's parameters, and solving the inverse problem are all part of the physical model-based methodologies. Prior knowledge and various assumptions about environmental conditions are used in model-based techniques.
Fig. 1 Underwater optical imaging schematic diagram [2]
1.2 Enhancement of Under-Water Images The aim of image enhancement is the processing of an image in such a way that the outcome is more appropriate than the actual image for a particular usage. Physical models are not required for the implementation of underwater image enhancing algorithms which means prior knowledge of the environment is not required to extract informative details from images. The following two categories can be used to categorize image enhancement methods: Spatial-Domain Method: In spatial-domain methods, we work on image pixels directly. To accomplish the desired improvement, alteration with pixel values will be done. Frequency-Domain Methods: In frequency-domain methods, at first the image is translated into the frequency-domain, whose result is ultimately used to execute all types of enhancement operations, and then the Inverse-Fourier transform is applied to obtain the enhanced output image. Mathematical models based on previous information are complicated, and the methods for estimating model parameters are computationally challenging. Therefore, we’ll go for underwater image enhancement methods instead of image restoration methods (Fig. 2).
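The spatial-domain and frequency-domain families can be illustrated with two minimal NumPy sketches: global histogram equalization (a spatial-domain operation on pixel values) and a crude high-frequency boost via the FFT (a frequency-domain operation). Parameter values and the random test image are placeholders, not settings from any of the surveyed papers.

```python
import numpy as np

def equalize_hist(img):
    """Spatial-domain enhancement: global histogram equalization of an
    8-bit grayscale image given as a 2-D uint8 array."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def highpass_boost(img, cutoff=10, gain=1.5):
    """Frequency-domain enhancement: boost high frequencies with an FFT,
    then apply the inverse transform (a crude sharpening filter)."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    rows, cols = img.shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.hypot(y - rows / 2, x - cols / 2)
    mask = np.where(dist < cutoff, 1.0, gain)           # keep low, boost high
    out = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    return np.clip(out, 0, 255).astype(np.uint8)

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(equalize_hist(img).shape, highpass_boost(img).shape)
```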
Fig. 2 Flowchart of object detection algorithms overview
2 Literature Review 2.1 Literature Survey on Underwater Image Enhancement It has been observed lately that generalized data of images for underwater image enhancement and object detection was not available along with reference images so that different enhancement and object detection algorithms can be compared. In the year 2020, Li, et al. [3], presented a research work that includes collection of underwater images, generation of their respective reference images, and proposing a basic CNN model known as WaterNet which uses a fusion technique to enhance the underwater images. This research work will help in developing a better CNN model for image enhancement but while generating the reference images they overlook the
Fig. 3 WaterNet architecture proposed by Li et al. [3]
effect of backscattering which will affect the result while enhancement and detection. WaterNet can be improved by using the backbones like U-Net architecture and residual network architecture (Fig. 3). In 2017, Pérez Soler et al. [4] developed a hybrid method in which they learned an image enhancement method from image restoration methods using an architecture of deep learning. By using a dataset of pairings of raw and restored images, the authors trained a convolutional network, which allowed the model to give restored images from degraded input images. After that, the results were compared with other image enhancement approaches, with the image restoration provided as the referenced images. The drawback of this method is that it requires a dataset of restored images to begin with. In 2017, Wang et al. [5] proposed an architecture for the enhancement of underwater images, using a network that was based on CNN known as UIE-Net(work). The proposed network trained two tasks: correction of color and removal of haziness, from which it can generate an image in which color is corrected and a transmission map, respectively. The weakness of this approach is that the author only considers two types of distortion and attempts to remove them (Fig. 4). Zheng et al. [6] in the year 2016 introduces a novel approach that was based on a single deteriorated underwater image and requires no specific hardware or prior information about the underwater environment. Contrast Enhancement and AHE (Adaptive Histogram Equalization) methods are combined in this approach and fused with USM (Un-sharped Masking) result. The author also demonstrated how the suggested algorithm and alternative techniques were performed on real underwater images. Extensive evaluation studies using these images show that the suggested method beats the present state-of-the-art till 2016. Although the results were better than previous state-of-the-art algorithms, it was not evaluated using any evaluation metrics except the histogram plotting (Fig. 5).
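A rough sketch of the CLAHE-plus-unsharp-masking (USM) fusion idea attributed to Zheng et al. [6] is given below, operating on the luminance channel with OpenCV. The clip limit, blur sigma, and fusion weight are assumptions for illustration; the paper's actual fusion rule may differ.

```python
import cv2
import numpy as np

def clahe_usm_fusion(bgr, clip=2.0, tiles=(8, 8), sigma=3, amount=1.0, w=0.5):
    """Illustrative CLAHE + USM fusion on the luminance channel."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)

    # CLAHE branch: contrast-limited adaptive histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tiles).apply(l)

    # USM branch: original plus a scaled difference from its blurred version.
    blur = cv2.GaussianBlur(l, (0, 0), sigma)
    usm = cv2.addWeighted(l, 1 + amount, blur, -amount, 0)

    # Simple weighted fusion of the two enhanced luminance images.
    fused = cv2.addWeighted(clahe, w, usm, 1 - w, 0)
    return cv2.cvtColor(cv2.merge((fused, a, b)), cv2.COLOR_LAB2BGR)

img = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)
print(clahe_usm_fusion(img).shape)
```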
Fig. 4 UIE-Net architecture proposed by Wang et al. [5]
Fig. 5 Fusion of CLAHE & USM to produce Enhanced Image proposed by Zheng et al. [6]
2.2 Literature Survey on Underwater Image Detection Athira et al. [7] in the year 2021 utilized a regression-based algorithm, YOLOv3. The aim of this paper is to provide a model that uses the YOLOv3 architecture and the darknet framework to automatically detect underwater objects; using the Fish 4 Knowledge dataset, this research investigates the feasibility of custom-trained YOLOv3-based underwater object detection algorithms. This object detection technique is found to be the fastest among all, but a comparative study with calculated accuracy is not yet presented (Fig. 6). Mahavarkar et al. [8] in the year 2020 utilized the Faster R-CNN algorithm. The authors used Faster Regions with Convolution Neural Network (Faster R-CNN) as a technique to perform underwater object detection using Machine Learning (ML) with the TensorFlow API and image processing methods. A suitable environment is constructed so that different images of the object can be trained using an ML technique. Although it gives real-time object detection in an artificial setup, it is not tested in a real environment, for example in a river or ocean. Another paper [9], which uses a single degraded underwater image, proposes an approach in which underwater raw images are enhanced by the contrast stretching technique, then segmented by adaptive thresholding, and finally the object in the image is detected using the Sobel operator (used for edge detection), which results in an image with darker and clearer boundaries than the edges obtained by Canny, Prewitt and Hule et al. Although the PSNR and SSIM values are better than comparative methods, it could still give better results if fusion techniques were applied for enhancement (Fig. 7).
Fig. 6 Method for Object Detection with detected object score proposed by Athira et al. [7]
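A minimal sketch of the contrast-stretch, adaptive-threshold, and Sobel pipeline described for [9] follows; the window size, offset, and kernel size are assumptions, not the values used in that paper.

```python
import cv2
import numpy as np

def detect_edges_underwater(gray):
    """Rough sketch: contrast stretching -> adaptive threshold -> Sobel edges."""
    # Contrast stretching to the full 8-bit range.
    stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

    # Adaptive thresholding to separate foreground from the hazy background.
    seg = cv2.adaptiveThreshold(stretched, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 31, 5)

    # Sobel gradients on the segmented image to emphasize object boundaries.
    gx = cv2.Sobel(seg, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(seg, cv2.CV_64F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return np.clip(mag, 0, 255).astype(np.uint8)

gray = (np.random.rand(100, 100) * 255).astype(np.uint8)
print(detect_edges_underwater(gray).shape)
```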
Fig. 7 Underwater Object Detection Method proposed by Saini and Biswas [9]
3 Underwater Image Quality Evaluation Image Quality Assessment (IQA). IQA methods can be distinguished into subjective (qualitative) and objective (quantitative) approaches [10].
3.1 Subjective/Qualitative IQA Because human observers are the final users in most multimedia systems, subjective testing is the most reliable way of measuring image quality. Subjective testing involves a group of people being asked to rate the quality of each image. Subjective evaluations, on the other hand, are costly and time-taking, which makes them unsuitable for real-world applications. As a result, mathematical models must be developed that can predict the quality evaluation as per normal human observance.
3.2 Objective/Quantitative IQA The aim of objective IQA is to develop quantitative models that can accurately and automatically predict quality of image. The Quantitative methods can be categorized into three groups [11]: FR-IQA: Full-Reference Image Quality Assessment (FR-IQA) means that the reference image is available and is undistorted and of perfect quality.
RR-IQA: Reduced-Reference Image Quality Assessment (RR-IQA). When the reference image is not entirely available, RR-IQA is used. In this method, some features from the referenced images are taken and used as side information to evaluate the quality of test images. NR-IQA: No-Reference Image Quality Assessment (NR-IQA) in which the reference image is not available. Because the reference image is not always available in real-world applications, NR-IQA approaches are particularly useful in practice. In addition to basic performance evaluation measures, specialized metrics have been presented in the literature to efficiently analyze underwater image quality. The following are a few examples: MSE: MSE is used as an image quality metric [11]. It’s a full reference metric. The square of the difference between every pixel in Y and the corresponding pixel in Y i , added together, and divided by the number of pixels, is the Mean Squared Error (MSE) between two pictures, therefore least value is better. n 2 1Σ (Yi − Y i ) MSE = n i=1 Λ
(1)
SSIM: SSIM is for comparing the similarity between two images [9]. The luminance, contrast, and structure metrics are extracted from a picture by the Structural Similarity Index (SSIM) measure. These three terms will make up the Structural Similarity Index Method: SSIM(x, y) = [l(x, y)]α [c(x, y)]β [s(x, y)]γ
(2)
where α > 0, β > 0, γ > 0 denote luminance, contrast and structure, respectively. PSNR: The peak signal-to-noise ratio (PSNR) is used for determining the quality of distorted image [9] compression codec reconstruction where the signal is original information of image, and the noise is the result of compression or distortion. PSNR is calculated as follows: ) ( MAXi 2 (3) PSNR = 10 MSE Here, MAX is the maximal in the image data and its value is 255 for gray-scale or 8-bit image. UCIQE: In 2015, Yang and Sowmya [12] presented a metric for evaluating underwater color image quality (i.e., UCIQE). Its firsts quantify the non-uniform color casts, blurring of image, and low contrast of image before linearly combining these 3 elements. UCIQE in formula can be defined as
368
Chandni et al.
UCIQE = c1 × σ c + c2 × conl + c3 × μs
(4)
where σ c => of chroma’s standard deviation, conl => luminance contrast and μs = > saturation’s average, and weight coefficients are c1, c2 and c3. UIQM: In 2015, Panetta, Gao, and Agaian [13] proposed the UIQM, which is a nonreference underwater IQA metric that includes three attributes: color, sharpness, and contrast. Each attribute is driven by the human visual system. UIQM is defined as UIQM = c1 × UICM + c2 × UISM + c3 × UIConM
(5)
4 Underwater Dataset Islam et al. [14] in 2021 presented the first large collection of Semantic Segmentation of Underwater Imagery (SUIM). Around 1500 images along with annotated pixels are included in the dataset for eight diverse item categories. The images were meticulously gathered and documented by human participants during oceanic research and human-robot research. Islam et al. [15] in 24 Feb 2020 presented USR-248, which is a comprehensive dataset consisting of 3-sets of underwater images with spatial resolutions of “high” (640 × 480) and “low” (80 × 60, 160 × 120, and 320 × 240). USR-248 includes paired cases for supervised SISR model training at 2×, 4×, or 8× scales. Islam et al. in 4 Feb 2020 presented UFO-120, which is the dataset to support high-scale SESR learning, with around 1500-sample training dataset and 120-sample benchmark test set. Islam et al. in 2020 [16] presented Enhancing Underwater Visual Perception (EUVP), a high-scale dataset of paired and unpaired underwater images (of "bad" and "excellent" quality) recorded using seven distinct cameras in varied visibility circumstances during deep-sea researches and human-robot cooperation research. They also conducted a number of subjective and objective assessments, indicating that the model they proposed can learn to improve underwater image quality through the trained dataset. Li et al. [3] in 2019 constructed an Underwater Image Enhancement Benchmark (UIEB). There are 950 real-world underwater images in all, with 890 of them accompanied by reference images. They classified the remaining 60 underwater images as challenging data since they were unable to collect satisfactory reference images. Porto Marques et al. [17] in 2019 introduced the OceanDark dataset. It’s made up of 183 1280 × 720-pixel underwater images recorded by video cameras at great depths utilizing artificial lighting. The dataset is designed to assist researchers in better understanding scenes with poor lighting by giving a variety of real-world samples that may be used to develop enhancement approaches.
A Systematic Review on Underwater Image Enhancement …
369
Jian et al. [18] in 2019 constructed database called as Marine Underwater Environment Database (MUED). This collection includes 8600 underwater images of 430 separate groups of significant items with complicated backgrounds, several salient objects, and nuanced changes in stance, geographical position, illumination, water turbidity, and other factors. Li et al. in 2019 set up the U45 underwater test dataset, in which color casts, weak contrast, and fog effects from underwater deterioration are all present. Berman et al. [19] in 2018 collected a dataset called Stereo Quantitative Underwater Image Dataset (SQUID). The collection includes RAW images, TIF records, camera projection data, and distance estimates. The database contains 57 stereo pairings. Liu et al. [20] in 2019 developed a huge Real-world Underwater Image Enhancement (RUIE) dataset separated into 3 sub-categories using an undersea image capture technology. The three subgroups are focused on 3 challenging areas to improve: image perceptibility, color casts, and higher-level classification. Duarte et al. [21] in 2016 proposes a dataset TURBID. They were able to manage the amount of image deterioration caused by underwater features in a simulation including 3D objects that represented seabed characteristics. TURBID is made up of five different subsets of degraded images, each with its own reference images.
5 Applications ● Underwater Resource Exploration for the development of medical drugs, food, and energy sources [22]. ● Marine Ecological Research. ● Marine environment protection. ● Naval Military applications. ● Study of flora and fauna. ● To detect the pollutants like polybags, garbage, and broken waste metallic items. ● Prospection of an Ancient Shipwreck. ● Assessment of Underwater Ecological Destruction. ● Periodic Transitions Detection.
6 Discussion The review illustrates the requirement for even more work in the domain of “underwater image enhancement and object detection.” The issues that arise as a result of the underwater environment like scattering, absorption, attenuation, non-uniformity of artificial light, and hazy particles present underwater, all of these properties are characteristics of the water medium and will result in distorted images. Various methods for enhancement of the underwater images and detection of underwater objects are
370
Chandni et al.
Table 1 A comparative study of various papers on underwater image enhancement and object detection Year
Author
Paper name
Algorithm(s)/Method used
Result obtained
2020
Li et al
“An Underwater WaterNet Image Enhancement Benchmark Dataset and Beyond”
MSE: 0.7976*103 PSNR (dB): 19.1130 SSIM: 0.7971
2017
Pérez Soler et al
“A Deep Learning Approach for Underwater Image Enhancement”
Deep Convolutional Neural Network
Error Rate: 14.1%
2017
Wang et al.
“A deep CNN method for underwater image enhancement”
UIE-Net
Entropy: 7.5735 PCQI: 1.1229
2016
Zheng et al.
“Underwater Image Fusion of CLAHE Enhancement and USM Algorithm Based on CLAHE and USM”
2021
Athira et al.
“Underwater Object Detection model based on YOLOv3 architecture using Deep Neural Networks”
YOLOv3
Accuracy: 96.17% mAP: 96.61% Average IoU: 69.28% F1 Score: 0.92
2020
Mahavarkar et al.
“Underwater Object Detection using Tensorflow”
Faster R-CNN
Accuracy: 96.31% Test Time: 0.2 s
2019
Saini et al.
“Object Detection in Sobel Operator Underwater Image by Detecting Edges using Adaptive Thresholding”
PSNR (dB): 63.028 (small fish) SSIM: 0.9984 (small fish)
explored and identified in this study on the basis of image enhancement and object detection-based approaches. Table 1 shows the summarized reviewed techniques.
7 Conclusion This study is a compilation of research in the fields of underwater image enhancement and object detection. The underwater imaging model is explored, as well as the influence of fading colors on underwater image degradation. This paper discusses related research and recent applications of underwater image enhancement and object detection approaches in order to raise awareness of the need for
underwater image enhancement and object detection, as well as to encourage future researchers with their findings. In the past few years, there have been a lot of advances in the development of algorithms for single underwater image enhancing approaches and for detection of underwater objects approaches, but only a few combined models can be used to enhance images and detect underwater objects have been developed. Although significant progress has been made in the domain of underwater image enhancement and object detection, there is still a lot of room for fusion-based image processing techniques to aid underwater exploration. Furthermore, a hybrid model for image enhancement and object detection using the most up-to-date object detection algorithms is required.
References 1. Schettini, R., & Corchs, S. (2010). Underwater image processing: State of the art of restoration and image enhancement methods. EURASIP Journal on Advances in Signal Processing, 2010, 1–14. 2. Quan, X. (2019). An underwater image enhancement method for different illumination conditions based on color-tone correction and fusion-based descattering, 19. 3. Li, C., et al. (2020). An underwater image enhancement benchmark dataset and beyond. IEEE Transactions on Image Processing, 29, 4376–4389. 4. Pérez Soler, J., et al. (2017). A deep learning approach for underwater image enhancement. In International Work-Conference on the Interplay Between Natural and Artificial Computation (pp. 183–192). 5. Wang, Y., et al. (2017). A deep CNN method for underwater image enhancement. In IEEE International Conference on Image Processing (ICIP) (pp. 1382–1386). 6. Zheng, L., et al. (2016). Underwater image enhancement algorithm based on CLAHE and USM. In IEEE International Conference on Information and Automation (ICIA) (pp. 585–590). 7. Athira, P., et al. (2021).Underwater object detection model based on YOLOv3 architecture using deep neural networks. In 7th International Conference on Advanced Computing and Communication Systems (ICACCS) (Vol. 1, pp. 40–45). 8. Mahavarkar, A., et al. (2020). Underwater object detection using tensorflow. In ITM Web of Conferences (Vol. 32, p. 5). 9. Saini, A., & Biswas, M. (2019). Object detection in underwater image by detecting edges using adaptive thresholding. In 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (pp. 628–632). 10. Mohammadi, P., et al. (2014). Subjective and objective quality assessment of image: a survey. Majlesi Journal of Electrical Engineering, 9. 11. Raveendran, S., et al. (2021). Underwater image enhancement: a comprehensive review, recent trends, challenges and applications. Artificial Intelligence Review, 54. 12. Yang, M., & Sowmya, A. (2015). An underwater color image quality evaluation metric. IEEE Transactions on Image Processing, 24, 6062–6071. 13. Panetta, K., et al. (2016). Human-visual-system-inspired underwater image quality measures. IEEE Journal of Oceanic Engineering, 41, 541–551. 14. Islam, M. J., et al. (2020). Semantic segmentation of underwater imagery: Dataset and benchmark. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1769–1776). 15. Islam, M. J., et al. (2020). Underwater image super-resolution using deep residual multipliers. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 900–906).
16. Islam, M. J., et al. (2020). Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5, 3227–3234. 17. Porto Marques, T., et al. (2019). A contrast-guided approach for the enhancement of lowlighting underwater images. Journal of Imaging, 5, article no. 79. 18. Jian, M., et al. (2019). The extended marine underwater environment database and baseline evaluations. Applied Soft Computing, 80. 19. Berman, D., et al. (2020). Underwater single image color restoration using haze-lines and a new quantitative dataset. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP, 1–1. 20. Liu, R., et al. (2019). Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Transactions on Circuits and Systems for Video Technology, 30, 4861–4875. 21. Duarte, A., et al. (2016). A dataset to evaluate underwater image restoration methods. In OCEANS 2016 (pp. 1–6), Shanghai. 22. Khurana, K. (2020). A review of image enhancement techniques for underwater images. Bioscience Biotechnology Research Communications, 13, 40–44.
IoT-based Precision Agriculture: A Review V. A. Diya, Pradeep Nandan, and Ritesh R. Dhote
Abstract Agriculture is one of the important economic sectors in India, constituting 20% of the country's gross domestic product (GDP). The use of the Internet of Things (IoT) and unmanned aerial vehicles (UAV) in agriculture helps farmers improve their productivity through better prediction, real-time monitoring, and efficient management of crops. This review aims to highlight the role of UAVs in crop health monitoring, the various critical parameters, wireless sensor networks (WSNs), and platforms used in IoT-based precision agriculture (PA), which significantly improves productivity when compared to manual farming. Keywords IoT · Precision agriculture · WSN · Zigbee · WiFi · UAV · Machine learning
1 Introduction Though agriculture is the backbone of the Indian economy, Indian farmers confront numerous challenges, including climate change, ambiguous pesticide and fertiliser use, soil degradation, and the rapid spread of diseases among crops [1]. Precision agriculture reduces expenses by means such as predicting where fertiliser needs to be applied and improving water efficiency [2]. Precision agriculture could use the Internet of Things (IoT) to assist farmers to enhance production by better estimating and analysing parameters that are crucial for crop growth in real time [3]. UAVbased health monitoring could aid in the identification of diseases and calculation of vegetation indices [4, 5].
V. A. Diya (B) · P. Nandan · R. R. Dhote Centre for Development of Advanced Computing (C-DAC), Bihar, Patna, India e-mail: [email protected] P. Nandan e-mail: [email protected] R. R. Dhote e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_30
Rice and wheat are two crops that are widely grown in India and require hectares of land for cultivation. It necessitates a large number of people, a lot of hard labour, and constant attention. Using cutting-edge technology like IoT and UAV, however, farmers may reduce their workload by automating crop monitoring, irrigation, pesticide and fertiliser application, and disease diagnosis. IoT assists by gathering and analysing data to determine where additional attention is needed in the field. It has the ability to deploy large WSNs across the field and collect real-time data. Actuators can be used to automate irrigation and other mechanical activities [6]. With the help of different camera modules like visible multispectral, hyperspectral, or thermal, UAVs assist in collecting high-quality images of the crop [7]. We could use software like Agisoft metashape software to create a three-dimensional (3D) model of the field or machine learning to analyse the data collected by the UAV [4, 8]. This study looks at all of the current precision agriculture technologies that use IoT and UAV technology. It goes through WSNs, their advantages and disadvantages, IoT solutions, and image-based field monitoring. It depicts what precision agriculture still lacks, as well as what India’s future with cutting-edge technologies might entail. The remainder of the paper is laid out as follows. Section 2 goes over several data collection techniques, like WSN and UAV image acquisition. Section 3 discusses how the data is analysed. Section 4 focuses on comparative analysis of various reviewed works. Section 5 concludes the paper.
2 Data Acquisition Techniques The foremost step in the IoT is acquiring an adequate amount of data for analysis. There are primarily two approaches to collecting the required data. First, using WSN [9, 10]. The real-time sensor data provides a clear image of the field to be observed. Some of the characteristics that can be tested and monitored are field temperature and humidity, soil moisture, temperature and humidity of soil, and pH of soil [11– 13]. The acquisition of images is the second method. The use of an unmanned aerial vehicle (UAV) [4, 14–17], satellite images [18], or images taken with a smartphone camera are all common ways to collect image data. Some research works utilised data sets that was readily available [19, 20].
2.1 WSNs WSN collects sensor data in real time and sends it to a system that can process it [21]. In some applications, such as smart irrigation, it also aids in the activation of actuators to alleviate water scarcity. With the use of WSNs efficiently, it is possible to farm from anywhere, at any time. A WSN node is depicted in Fig. 1 as a simple block diagram. Sensors, processing units, power supplies, and communication technology that allows data transfer are
Fig. 1 Block diagram of a WSN node
the major components of the WSN node, as indicated in the diagram. The number of such nodes deployed should increase in proportion to the expansion in the area that has to be monitored. Sensors are endpoints of the IoT. Temperature, pressure, and humidity are examples of physical data that they collect. The data is then converted into electrical signals, which the system can understand. This data is transferred using a communication technology like Zigbee or WiFi to a point where it can be conveniently processed [22]. Data is sometimes moved to the cloud so that it can be accessed from anywhere. The endpoint module can be powered by a variety of sources, including solar power [4] or a battery.
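To make the endpoint structure concrete, here is a minimal sketch of a WSN node loop in Python: it samples a few sensors, packages a JSON payload, and hands it to a transmit function that stands in for the Zigbee, WiFi, or LoRa link. The sensor values are simulated and the field names are assumptions for illustration.

```python
import json
import random
import time

def read_sensors():
    """Stand-in for real sensor drivers (e.g. DHT22, soil-moisture probe)."""
    return {
        "air_temp_c": round(random.uniform(18, 35), 1),
        "air_humidity_pct": round(random.uniform(40, 90), 1),
        "soil_moisture_pct": round(random.uniform(10, 60), 1),
    }

def run_node(node_id, samples=3, period_s=1.0, transmit=print):
    """Minimal WSN endpoint loop: sample, package, hand off to a radio.

    `transmit` stands in for the communication link; here it just prints the
    JSON payload that would be sent to the coordinator or gateway.
    """
    for _ in range(samples):
        payload = {"node": node_id, "ts": time.time(), **read_sensors()}
        transmit(json.dumps(payload))
        time.sleep(period_s)

run_node("field-07")
```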
2.1.1 Sensors
For each crop, there will be an ideal parameter value that produces the highest yield. As a result, precision agriculture relies heavily on reliable monitoring of these parameter values. Sensors are devices that detect physical characteristics in the environment and output signal values based on them. The problem with sensors is that they are not always properly calibrated, necessitating laboratory calibration [23]. Table 1 enlists the parameters that define crop growth as well as the sensors that are used to measure it. As given in Table 1 Shafi et al. [4] measured soil moisture and temperature, as well as air temperature and relative humidity, for real-time monitoring of crop health indicators. Tyagi et al. [11] devised a system that alerts farmers when growth-supporting parameters like temperature and humidity sensors, soil moisture sensors, and rainfall are not at their optimal levels. Jaiswal et al. [36] created a system for real-time monitoring. For this, they took measurements of soil moisture, temperature, and humidity. Estrada-López et al. [12] measured soil conductivity, temperature, and humidity to enable accurate soil monitoring. In order to establish an intelligent agricultural platform, Ma et al. [3] monitored a total of 8 parameters: CO2 concentration, temperature, air humidity, light intensity, soil temperature, soil moisture, wind direction, and wind speed.
Azimi et al. [24] created a self-regulating environmental system that maintains ideal mushroom growth conditions. Sensors were used to measure CO2, temperature, and humidity. For monitoring the growth of greenhouse vegetables and grapes, Codeluppi et al. [25] monitored temperature and humidity in both air and soil. Karimah et al. [6] acquired ambient temperature and humidity, soil moisture, and soil pH so that a given smart pot may irrigate itself as needed. Trilles et al. [26] describes a scenario in which a vineyard is monitored by measuring data such as air temperature and humidity, wind speed and direction, rain, soil moisture, and atmospheric pressure. This aids in the detection of diseases that harm vine plants. For the goal of monitoring a greenhouse, Syafarinda et al. [27] examined light intensity, air temperature, and humidity. This aids in dealing with abrupt climate changes that have an impact on crop growth. For the construction of an optimal green wall, Rivas-Sanchez et al. [28] continuously monitored soil moisture, air temperature and relative humidity, light intensity, rain, and water flow. Environmental characteristics including as air temperature and humidity, illuminance, soil humidity, pH, and CO2 concentration were measured by Guandong et al. [22] for better farming management. For better tomato development in a greenhouse environment, Erazo-Rodas et al. [29] measured air temperature, relative humidity, soil moisture, solar radiation, luminosity, and CO2. Thus, numbered parameters as shown in Table 1 define the crop health more accurately than manual monitoring.
Table 1 Parameter monitored and sensors used

Work in | Parameter monitored | Sensors used
[4, 6, 24–26] | Air temperature, air humidity | DHT11, DHT22, AM2032
[26] | Atmospheric pressure | MPL3115A2
[27] | Luminosity | BH1750, TSL2561
[26, 28] | Rain | YF-S402, YL-83, SE-WS700D
[22, 24] | CO2 concentration | MG-811, MQ135
[26, 29] | Wind speed and direction | WS-3000, SEN08942
[29] | Solar radiation | SQ-110
[3, 4, 6, 11, 26, 30, 31] | Soil moisture | VMA303, VH400, 6HL-69, ECH2O-10HS, MP406
[3, 4, 12, 22, 25, 30, 32] | Soil temperature/humidity | DS18B20, SHT-10, MCP9808
[6, 13, 22, 33–35] | pH of soil | E-201, PCE-228S
[12] | Soil conductivity | SEN0114
Table 2 Different communication technologies used

Technology | Power | Range | Data rate
Zigbee [6, 12, 22, 29, 30, 37] | Low | 10 to 100 m | Low (250 Kbps)
WiFi [22, 24, 25, 29, 36] | High | 100 m | High
GSM [4, 26] | High | Global | High
LoRa [3, 4] | – | Less than 14 Km | Low (10 Kbps)

2.1.2 Communication Technologies
WSN relies heavily on communication technology. When choosing a communication method, power, range, and data rate are all factors to consider. Because sensor data is small in size, the data rate in WSN could be low. The communication range varies depending on the application. Due to the large area of agriculture, rice or wheat based projects usually demand a higher range. Because it helps to lengthen battery life, power consumption should be kept as low as possible. Table 2 shows the types of technologies utilised for data transfer in IoT-based agricultural initiatives. Zigbee [6, 12, 22, 29] is a low power and low data rate device. It uses the IEEE 802.15.4 standard. Because the data acquired by sensors would be extremely small in size, the Zigbee data rate of order 250 Kbps is not a limitation for our use cases. When Zigbee is not in use, it can go into sleep mode to save energy. On battery power, the device can last for years. Because Zigbee uses a mesh topology, there is no single point of failure. The device’s range is between 10 and 100 m. Because Zigbee performs better in confined areas [12], it is strongly recommended. WiFi [22, 24, 25, 29, 36] is the most widely used technology among WSN technologies. The IEEE 802.1 standard is the foundation for WiFi. The issue with WiFi is that it uses a lot of power and has a short range, as illustrated in Table 2. As a result, it is preferred in applications that require a high data rate. If network difficulties are not present, GSM [4, 26] gives global coverage with a pretty good data rate. In circumstances where the farmer needs to be notified by text message, GSM can be used. GSM has the problem of consuming a lot of power. As a range is order of kilometers, LoRa [3, 4] delivers better geographical coverage. It aids in the secure construction of a network with a low data rate, such as Zigbee. However, when compared to Zigbee, LoRa consumes a lot of power. As a result, it is rarely used. Thus, Zigbee-based WSNs are better suited for large sensor network deployments since they function better in limited IoT environments like low power and low data rate.
Table 3 Types of processors and specification

Processor | CPU | RAM | Cost | Clock frequency | Connectivity
Raspberry pi [4, 25, 28] | 64 bit | 1 GB | High | 1.4 GHz | Bluetooth and WiFi
Arduino [3, 6, 28] | 8 bit | 2 KB | Low | 16 MHz | –
ESP8266 [24] | 32 bit | 64 KB | Low | 80 MHz | WiFi
ESP32 [36] | 32 bit | 520 KB | Low | 160 MHz | Bluetooth and WiFi

2.1.3 Processors
The processor could automate irrigation or send a warning to the farmer when a deviation in a parameter is identified. As a result, they serve as the brains of IoT technologies and help to keep the overall system compact. Table 3 shows the processors that are frequently used in the agricultural industry to implement WSN. Due to its low cost and effective functioning, the Arduino is the most popular microcontroller [3, 6, 28]. It has an 8 bit processor and 2KB of RAM. The Arduino is sufficient for minor applications such as coordinating Zigbee devices and smart irrigation. It features an Atmega328P microcontroller that can be programmed in C/C++. The lack of wireless communication, such as WiFi, is a major restriction on the Arduino. The Raspberry Pi [25, 28] is a high-processing-power device that can even replace a desktop computer’s CPU. It includes WiFi and Bluetooth built-in, as well as a 64 bit processor and 1 GB of RAM. It is preferred only in instances when a large amount of data must be processed due to its high cost. The ESP8266 [24] is a low-cost 32 bit microcontroller with WiFi capabilities. With a faster processing speed and additional built-in Bluetooth connectivity, the ESP32 [36] outperforms the ESP8266. In terms of performance and connection, both the ESP8266 and the ESP32 are better than the Arduino. Table ?? shows the various precision agriculture technologies employed by the researchers. As we can see, the two most common applications of IoT in agriculture are automated irrigation and real-time crop monitoring. Sabo et al. [30] monitored even insect movements and sounds for controlling insecticides. To create a smart irrigation system like the smart pot of Karimah et al. [6], sensors such as soil moisture, temperature, and humidity sensors must be used. The data collected from these sensor endpoints are then transferred through communication technology. Zigbee is the best option in restricted situations like this as it is power efficient and highly reliable. This information is sent to the cloud, where it can be analysed further. The system operates an irrigation actuator based on the results of the analysis.
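The irrigation decision described above can be sketched as a simple hysteresis rule running on the processor or in the cloud; the thresholds below are illustrative placeholders, since each crop would use its own optimal soil-moisture band.

```python
def irrigation_command(soil_moisture_pct, low=30.0, high=45.0):
    """Hysteresis-style irrigation decision from a soil-moisture reading.

    Returns "ON" below the lower bound, "OFF" above the upper bound,
    and "HOLD" inside the band to avoid rapid actuator toggling.
    """
    if soil_moisture_pct < low:
        return "ON"
    if soil_moisture_pct > high:
        return "OFF"
    return "HOLD"

for reading in (22.5, 38.0, 51.2):
    print(reading, "->", irrigation_command(reading))
```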
Table 4 Various methods used by the researchers for precision agriculture

Work of | Year | Use | Parameters | Communication methods | Processed by
[4] | 2020 | Crop health mapping | Sm & soil T, air T & H | LoRa, GSM | Raspberry Pi
[11] | 2020 | System that alerts farmer when parameters cross a threshold | T & H, Sm & rain | GSM | Atmega 328P controller IC
[25] | 2020 | Growth monitor for greenhouse vegetables and grapes | T & H of air & soil | WiFi, LoRa | Raspberry Pi, CC3200 and LoPy4
[36] | 2020 | Maintaining optimum moisture level by soil monitoring and smart irrigation | Sm, soil T & H | WiFi | ESP32
[6] | 2019 | Smart pot that irrigates itself | T & H of air, Sm & pH of soil | Zigbee | Arduino Nano, ESP8266
[28] | 2019 | Irrigation programmer for ideal green wall | Sm, air T & H, light intensity, rain, water flow | USB connection | Arduino Uno, Raspberry Pi
[24] | 2018 | Self-regulating environmental system which maintains optimum condition for mushroom growth | CO2, T & H | WiFi | ESP8266
[26] | 2018 | Vineyard monitoring | Air T & H, wind speed & direction, rain, Sm, atmospheric pressure | GSM | Particle Electron
[27] | 2018 | Monitoring of a greenhouse | Light intensity, air T & H | MiFi | Wemos D1 Mini
[29] | 2018 | Parameter monitoring for tomatoes in greenhouse | Air T & H, Sm, solar radiation, luminosity, CO2 | Zigbee, WiFi | Waspmote processing card
[12] | 2018 | Soil parameter estimation | Soil conductivity, T & H | Zigbee | MSP430-FR5969
[3] | 2018 | An intelligent agricultural platform through real-time sensor data collection and monitoring | CO2, air T & H, soil T, Sm, light intensity, wind direction & wind speed | LoRa | Arduino
[30] | 2018 | Plant health and insect monitoring | Sm, soil T & H, vibration sensor, colour sensor | Zigbee | STM32

Sm - Soil moisture, T - Temperature, H - Relative humidity
For a higher yield, each crop will have an optimal set of parameter values that must be followed. As a result, during the analysis stage, it must be determined whether the parameter value falls inside this spectrum. WSN’s successful deployment allows us to manage the field without having to be physically present. Thus, in IoT-based precision agriculture, WSNs are becoming increasingly common.
2.2 Image Acquisition

UAVs, satellites, and mobile phones are all common means for collecting images for analysis of crops. UAVs are becoming increasingly popular because they provide higher image quality than satellites. Image analysis helps to increase yield by using image processing and machine learning [20]. A UAV can be used to capture high-resolution images of a large region. Different cameras installed in them will collect enough data to forecast and monitor crop growth. Using high-quality RGB photos of leaves and their processing, one may categorise the plant as healthy or unhealthy. As Quiroz et al. [38] showed, image preprocessing such as converting RGB photos to other colour models such as YCbCr can produce superior results. Multispectral images, on the other hand, can be utilised to calculate the Normalized Difference Vegetation Index (NDVI) [39, 40] of plants. The NDVI scale measures the amount of chlorophyll in plants. If the chlorophyll level is quite high, the plant is generally healthy. The equation to calculate NDVI is given by

NDVI = (NIr − R) / (NIr + R)    (1)

where NIr is the near-infrared value and R is the red value of the bands. The SFM technique creates a 3D model of the given image set [7]. There are software packages that can create 3D models as well [41]. These gathered 3D models actually lighten the farmer's load because analysing them yields more information than human monitoring. When the agricultural field is large, satellite photos are useful. However, the accuracy of the results produced is low, and it has even been suggested that UAVs could perform better [8]. In summary, the IoT, through the deployment of WSN, aids in the collection of crucial agricultural growth information. Crop health studies also rely heavily on images gathered by UAVs. The use of multispectral cameras aids in the calculation of NDVI, whereas a visible camera system aids in 3D modelling and disease detection.
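As an illustration of Eq. (1), the short sketch below computes a per-pixel NDVI map from two image bands; the array sizes, random bands, and the 0.4 health threshold are illustrative assumptions, not values taken from the reviewed studies.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Compute NDVI = (NIr - R) / (NIr + R) per pixel, guarding against division by zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    denom[denom == 0] = 1e-9  # avoid division by zero on dark pixels
    return (nir - red) / denom

# Random 8-bit bands standing in for a multispectral capture
rng = np.random.default_rng(42)
nir_band = rng.integers(0, 256, size=(128, 128))
red_band = rng.integers(0, 256, size=(128, 128))

ndvi_map = ndvi(nir_band, red_band)                 # values lie in [-1, 1]
healthy_fraction = float((ndvi_map > 0.4).mean())   # 0.4 is an assumed health threshold
print(f"Fraction of pixels above the NDVI threshold: {healthy_fraction:.2f}")
```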
3 Data Analysis

Simple processing is required for data acquired via sensors, such as determining whether a parameter's value falls within the range of an optimum value or not. An 8-bit micro-
controller is sufficient for this task. However, for accurate findings following image acquisition, computer-assisted processing is required. Image preprocessing, segmentation, feature extraction, and classification are all processes in the traditional method of image processing [42]. The image preprocessing step improves image quality so that the final result is correct. Changing the colour model, boosting brightness, and lowering contrast are some examples. Segmentation aids in the extraction of relevant information from certain regions of a picture. Feature extraction is the process of extracting data from photographs, which is usually in the form of vectors. Finally, methods such as clustering or neural networks can be used for classification. Machine learning approaches have recently supplanted traditional methods because they provide accurate image analysis results in fewer stages [20, 43]. The most often used machine learning classification algorithms are support vector machines (SVM) and neural networks(NN). Table 5 shows the many forms of data analysis that UAV-based precision agriculture systems employ. Shafi et al. [4] employed the classifiers SVM, Naive Bayes(NB), and NN and discovered that NN outperforms the others with a 98.4% accuracy. During training, NN translates input images into a categorised output, such as whether a plant is healthy or not. We can feed 2D image matrices to NN in our applications to classify images based on the types of diseases present or to detect the presence of weeds. NB is a probabilistic classification algorithm based on Bayes’ theorem. Another machine learning algorithm that functions as a margin classifier and gives reliable results is SVM [16, 19]. However, utilising SVM alone has the disadvantage of necessitating additional procedures prior to classification, such as image preprocessing, segmentation, and feature extraction. Kitpo et al. [19] used histogram equalisation and thresholding for preprocessing and segmentation, respectively. They extracted colour and shape features and used SVM to apply them. Shafi et al. [4] creates NDVI maps using multispectral data acquired from the UAV. The health status of the crops can be evaluated by determining whether the NDVI value is greater than a threshold value. Saha et al. [14] used an SVM classifier to analyse hyperspectral data obtained from an Adafruit AMG8833 camera. With the use of UAV and IoT methods, they built a disease mapping application. In comparison to visible cameras, hyperspectral cameras provide more information since they cover a wider range of frequencies. But this creates a dilemma of increased storage requirements and a requirement for more computing power. Multispectral data, on the other hand, provides information that is more specific to agricultural applications as it covers a smaller set of frequency bands and hence is a superior choice. Saha et al. [14] also suggested a selective seed and pesticide sprayer, which makes UAVs more appealing. Guo et al. [8] studied and compared satellite data with UAV data and discovered that UAV data is sufficient for rice crop health monitoring and gives accurate results. For this study, a DJI TM Phantom 4Pro V2.0 quadcopter with a Tetracam TM ADC multispectral camera was used. Yashwanth et al. [20] demonstrated that deep learning algorithms aid in image-based crop health monitoring classification. Quiroz et al. [38] obtained an RGB image with Phantom 3 and converted it to YCrCb space for a
2020 2020
2019
2019
2019
2018 2018
2018
[20] [8]
[38]
[7]
[40]
[19] [39]
[14]
b
Camera used
Inbuilt
NA
Adafruit AMG8833, RGB-D Sensor
NIAES-MM3 Optris PI-LW SONY FDR-X3000 XR q350 pro Parrot Sequoia, Autoshot SX260 HS Used readily available dataset Custom made Mobius (modified)
ACSL-PF1
Phantom 3
Sentera multispectral imager Used readily available dataset DJI TM Phantom 4 Pro V2.0 quadcopter Tetracam TM ADC
DJI Phantom 4
UAV
Additional functions other than crop health monitoring. Neural Network. c Data not available. d Normalized Difference Vegetation Index. e Structure-from-Motion technique. f Support vector machine.
a
2020
[4]
Work of Year
Table 5 UAV involved research works in precision agriculture
multispectral, thermal, visible video visible, multispectral visible visible, multispectral hyperspectral visible
Visible
Visible multispectral
multispectral
Data type
Disease mapping NA Seed and pesticide sprayer
SVMf
NA
NA
SVM NDVI
NDVI
Gaussian Adaptive Threshold & Hough transform SFMe
Selective spraying Support satellite image analysis Crop line identification of mango trees
NAc
NNb & NDVIc NN 3D Modelling
Functionsa
Data analysis
better outcome. To observe the mango tree crop line, they used a Gaussian adaptive threshold and the Hough transform. To gather diagnostic information about an agricultural field, Inoue et al. [7] utilised three types of data: multispectral, thermal, and visible video. Data is collected using the ACSL-PF1 and Sony FDR-X3000 and Optris PI-LW cameras, and the SFM approach is used to create a 3D model of the field for health monitoring. De oca et al. [39] created a UAV using the ardupilot open source flight controller software. Two mobius cameras were employed. One was used to capture RGB data, while the other was altered to collect multispectral data. The NDVI is used to assess a crop’s health. Ya et al. [40] monitored rice crop in the same way, with differences only in the UAV and cameras utilised for data collecting. In essence, data collected by UAVs and their 3D modelling and machine learning classification algorithms assist farmers in better monitoring of crops and yields.
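To illustrate the traditional pipeline discussed in this section (preprocessing, segmentation, feature extraction, and then classification with a margin classifier such as SVM), here is a minimal sketch using scikit-learn; the synthetic feature vectors and labels are placeholders for the colour and shape features that would be extracted from real leaf images, not data from the reviewed works.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature vectors: e.g. mean colour values plus two shape descriptors per leaf region
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
labels = (features[:, 1] + 0.5 * features[:, 3] > 0).astype(int)  # 1 = diseased, 0 = healthy (synthetic)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Feature scaling followed by an SVM margin classifier
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.2f}")
```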
4 Technical Discussion

Shafi et al. [4] integrated the latest technologies such as machine learning, IoT, and UAVs for crop health monitoring. When compared to only measuring the NDVI, this provides better insight. Tyagi et al. [11] used a pre-trained ResNet34 to classify a crop image dataset and achieved a high accuracy of 99%. They created a system that alerts farmers when vital field parameters cross a certain threshold. This guarantees that the field receives the proper attention when it is required. LoRaFarm, an indoor and outdoor WSN built by Codeluppi et al. [25], displayed the collected data on a dashboard. They did not, however, incorporate any data analysis. Many works employed smart irrigation by limiting monitoring to only soil [6, 12, 28, 36]. Only if all of these components, such as irrigation, monitoring, and farmer alerting, are combined and performing properly can a beneficial model be said to be deployed. Because UAV-based analysis can outperform satellite image analysis, we could fully abandon satellite photos in favour of UAVs, as Guo et al. [8] suggested. UAVs can also be used for disease mapping and 3D modelling. Yashwanth et al. [20] attained a 96.3% efficiency with deep learning technology; however, they did it by using freely available datasets rather than real-time acquired data, like Kitpo et al. [19]. UAVs could implement crop line detection [38] and sprayers [14]. When UAVs use multispectral cameras to calculate NDVI values, they produce effective results [7, 39, 40]. Aside from that, with the visible spectrum images collected by UAVs, reliable disease diagnosis is still out of reach. As a result, while IoT technology is sophisticated enough for real-time monitoring with WSN, UAVs still lack the technology that would make them cost-effective and preferred.
5 Conclusion

Various WSN technologies and UAV-based processing for growth monitoring and smart irrigation are discussed in this review. Although WiFi is the most widely used WSN technology, Zigbee is better for our applications due to the constrained agricultural environment. For crop monitoring, more than 60% of studies use multispectral image processing and NDVI calculation with the help of UAVs. Deep learning algorithms are favoured for crop health mapping over traditional image processing approaches because they are more accurate and efficient. Custom-built UAVs will be more cost-effective and promising in the Indian agriculture industry. Obtaining high-quality photos for disease and weed identification remains a challenge. Only when readily available datasets are used can high accuracy, such as above 95%, be achieved. We could make UAVs a vital component of agriculture in the future, reducing farmer burdens through real-time identification of diseases and weeds, as well as pesticide and fertiliser spraying. If anything goes wrong in the field, such as an unexpected change in the environment, the IoT could provide an alert system for farmers. Thus, using UAVs and WSNs together will be more fruitful. The Internet of Things (IoT), unmanned aerial vehicles (UAVs), and deep learning algorithms are likely to play a big role in the agriculture industry's new era.

Acknowledgements We extend our sincere thanks and gratitude to the Centre for Development of Advanced Computing (C-DAC), Patna, for providing the administrative and technical support to the IoT lab for conducting this research work. We would like to thank Mr. Kunal Abhishek (Joint Director, C-DAC, Patna) for his inputs.
References 1. Singh, R., Singh, H., & Raghubanshi, A. S. (2019). Challenges and opportunities for agricultural sustainability in changing climate scenarios: A perspective on Indian agriculture. Tropical Ecology, 60(2), 167–185. 2. Mintert, J. R., Widmar, D., Langemeier, M., Boehlje, M., & Erickson, B. (2016). The challenges of precision agriculture: Is big data the answer? Technical report. 3. Ma, Y. W., & Chen, J. L. (2018). Toward intelligent agriculture service platform with lora-based wireless sensor network. In 2018 IEEE International Conference on Applied System Invention (ICASI) (pp. 204–207). IEEE. 4. Shafi, U., Mumtaz, R., Iqbal, N., Zaidi, S. M. H., Zaidi, S. A. R., Hussain, I., & Mahmood, Z. (2020). A multi-modal approach for crop health mapping using low altitude remote sensing, internet of things (iot) and machine learning. IEEE Access, 8, 112708–112724. 5. Bah, M. D., Dericquebourg, E., Hafiane, A., & Canals, R. (2018). Deep learning based classification system for identifying weeds using high-resolution UAV imagery. In Science and Information Conference (pp. 176–187). Springer. 6. Karimah, S. A., Rakhmatsyah, A., & Suwastika, N. A. (2019). Smart pot implementation using fuzzy logic. Journal of Physics: Conference Series, 1192, 012058. IOP Publishing. 7. Inoue, Y., & Yokoyama, M. (2019). Drone-based optical, thermal, and 3d sensing for diagnostic information in smart farming–systems and algorithms. In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, (pp. 7266–7269). IEEE.
8. Guo, Y., Jia, X., Paull, D., Zhang, J., Farooq, A., Chen, X., & Islam, M. N. (2019). A dronebased sensing system to support satellite image analysis for rice farm mapping. In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium (pp. 9376–9379). IEEE. 9. Lee, I., & Lee, K. (2015). The internet of things (IoT): Applications, investments, and challenges for enterprises. Business Horizons, 58(4), 431–440. 10. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), 1645–1660. 11. Tyagi, K., Karmarkar, A., Kaur, S., Kulkarni, S., & Das, R. (2020). Crop health monitoring system. In 2020 International Conference for Emerging Technology (INCET) (pp. 1–5). IEEE. 12. Estrada-López, J. J., Castillo-Atoche, A. A., Vázquez-Castillo, J., & Sánchez-Sinencio, E. (2018). Smart soil parameters estimation system using an autonomous wireless sensor network with dynamic power management strategy. IEEE Sensors Journal, 18(21), 8913–8923. 13. Barik, S., & Naz, S. (2021). Smart agriculture using wireless sensor monitoring network powered by solar energy. In 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (pp. 983–988). IEEE. 14. Saha, A. K., Saha, J., Ray, R., Sircar, S., Dutta, S., Chattopadhyay, S. P., & Saha, H. N. (2018). Iot-based drone for improvement of crop quality in agricultural field. In 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 612–615). IEEE. 15. Shafi, U., Mumtaz, R., Hassan, S. A., Zaidi, S. A. R., Akhtar, A., & Malik, M. M. (2020). Crop health monitoring using iot-enabled precision agriculture. In IoT Architectures, Models, and Platforms for Smart City Applications (pp. 134–154). IGI Global. 16. Uddin, M. A., Mansour, A., Le Jeune, D., Ayaz, M., & Aggoune, E. M. (2018). UAV-assisted dynamic clustering of wireless sensor networks for crop health monitoring. Sensors, 18(2), 555. 17. Bhuvaneshwari, C., Saranyadevi, G., Vani, R., & Manjunathan, A. (2021). Development of high yield farming using iot based UAV. In IOP Conference Series: Materials Science and Engineering, vol. 1055, p. 012007. IOP Publishing. 18. Kovalskyy, V., & Yang, X. (2020). Assessment of multiplatform satellite image frequency for crop health monitoring. In EGU General Assembly Conference Abstracts, p. 12328. 19. Kitpo, N., & Inoue, M. (2018). Early rice disease detection and position mapping system using drone and iot architecture. In 2018 12th South East Asian Technical University Consortium (SEATUC) (vol. 1, pp. 1–5). IEEE. 20. Yashwanth, M., Chandra, M. L., Pallavi, K., Showkat, D., & Satish Kumar, P. (2020). Agriculture automation using deep learning methods implemented using keras. In 2020 IEEE International Conference for Innovation in Technology (INOCON), pp. 1–6. IEEE. 21. Raghavendra, C. S., Sivalingam, K. M., & Znati, T. (2006). Wireless sensor networks. Springer. 22. Gao, G., Jia, Y., & Xiao, K. (2018). An IoT-based multi-sensor ecological shared farmland management system. International Journal of Online Engineering, 14(3). 23. Bychkovskiy, V., Megerian, S., Estrin, D., & Potkonjak, M. (2003). A collaborative approach to in-place sensor calibration. In Information processing in sensor networks (pp. 301–316). Springer. 24. Azimi Mahmud, M. S., Buyamin, S., Mokji, M. M., & Zainal Abidin, M. S. (2018). Internet of things based smart environmental monitoring for mushroom cultivation. 
Indonesian Journal of Electrical Engineering and Computer Science, 10(3), 847–852. 25. Codeluppi, G., Cilfone, A., Davoli, L., & Ferrari, G. (2020). Lorafarm: A lorawan-based smart farming modular iot architecture. Sensors, 20(7), 2028. 26. Trilles, S., González-Pérez, A., & Huerta, J. (2018). A comprehensive iot node proposal using open hardware: A smart farming use case to monitor vineyards. Electronics, 7(12), 419. 27. Syafarinda, Y., Akhadin, F., Fitri, Z. E., Widiawan, B., Rosdiana, E., et al. (2018). The precision agriculture based on wireless sensor network with mqtt protocol. In IOP Conference Series: Earth and Environmental Science, (vol. 207, p. 012059). IOP Publishing.
28. Rivas-Sánchez, Y. A., Moreno-Pérez, M. F., & Roldán-Cañas, J. (2019). Environment control with low-cost microcontrollers and microprocessors: Application for green walls. Sustainability, 11(3), 782. 29. Erazo-Rodas, M., Sandoval-Moreno, M., Muñoz-Romero, S., Huerta, M., Rivas-Lalaleo, D., Naranjo, C., & Rojo-Álvarez, J. (2018). Multiparametric monitoring in equatorian tomato greenhouses (i): Wireless sensor network benchmarking. Sensors, 18(8), 2555. 30. Sabo, A., & Qaisar, S. M. (2018). The event-driven power efficient wireless sensor nodes for monitoring of insects and health of plants. In 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP) (pp. 478–483). IEEE. 31. El-Magrous, A. A., Sternhagen, J. D., Hatfield, G., & Qiao, Q. (2019). Internet of things based weather-soil sensor station for precision agriculture. In 2019 IEEE International Conference on Electro Information Technology (EIT) (pp. 092–097). IEEE. 32. Hou, R., Li, T., Qiang, F., Liu, D., Li, M., Zhou, Z., Yan, J., & Zhang, S. (2020). Research on the distribution of soil water, heat, salt and their response mechanisms under freezing conditions. Soil and Tillage Research,196, 104486. 33. Wei, H., Liu, Y., Xiang, H., Zhang, J., Li, S., & Yang, J. (2020). Soil PH responses to simulated acid rain leaching in three agricultural soils. Sustainability, 12(1), 280. 34. Bhattacharyya, S., Sarkar, P., Sarkar, S., Sinha, A., & Chanda, S. (2020). Prototype model for controlling of soil moisture and PH in smart farming system. In Computational Advancement in Communication Circuits and Systems (pp. 405–411.) Springer. 35. Bhatnagar, V., & Chandra, R. (2020). Iot-based soil health monitoring and recommendation system. In Internet of Things and Analytics for Agriculture, Volume 2, pp. 1–21. Springer. 36. Jaiswal, A., Jindal, R., & Verma, A. K. (2020). Crop health monitoring system using IoT. International Research Journal Engineering Technology, 2485–2489. 37. Huang, Y., & Wang, S. (2017). Soil moisture monitoring system based on ziggbee wireless sensor network. In 2017 International Conference on Computer Systems, Electronics and Control (ICCSEC) (pp. 739–742). IEEE. 38. Quiroz, R. A. A., Guidotti, F. P., & Bedoya, A. E. (2019). A method for automatic identification of crop lines in drone images from a mango tree plantation using segmentation over ycrcb color space and hough transform. In 2019 XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA) (pp. 1–5). IEEE. 39. De Oca, A. M., Arreola, L., Flores, A., Sanchez, J., & Flores, G. (2018). Low-cost multispectral imaging system for crop monitoring. In 2018 International Conference on Unmanned Aircraft Systems (ICUAS) (pp. 443–451). IEEE. 40. Ya, N. N. C., Lee, L. S., Ismail, M. R., Razali, S. M., Roslin, N. A., & Omar, M. H. (2019). Development of rice growth map using the advanced remote sensing techniques. In 2019 International Conference on Computer and Drone Applications (IConDA) (pp. 23–28). IEEE. 41. Shafi, U., Mumtaz, R., García-Nieto, J., Hassan, S. A., Zaidi, S. A. R., & Iqbal, N. (2019). Precision agriculture techniques and practices: From considerations to applications. Sensors, 19(17), 3796. 42. Gonzalez, R. C. (2009). Digital image processing. Pearson Education India. 43. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.
Enhancing the Security of JSON Web Token Using Signal Protocol and Ratchet System Pragya Singh, Gaurav Choudhary, Shishir Kumar Shandilya, and Vikas Sihag
Abstract User authentication is crucial for every application. Initially, session variables were used. If a session ID was compromised, lost, or stolen, the user's identity was compromised. This resulted from session IDs having a long lifespan and the tendency of most users not to log out of their accounts. JSON Web Tokens (JWTs) were introduced to overcome this problem and are currently considered an industry standard. JWTs are encoded using Base64 and have a shorter lifespan as compared to session variables. Even then, they are highly susceptible to man-in-the-middle attacks, especially over unsecured networks. Internal attack vectors also pose a significant threat in this particular scenario. To counter this problem, we explore the possibility of using the Signal algorithm with a double ratchet system to encrypt our JWTs to add a new layer of security. The result is an algorithm capable of securing primary JSON payloads in an end-to-end manner. This research aims to increase the security of JWTs using a ratchet system. Keywords Authentication · JSON · Session · Signal algorithm
1 Introduction The internet has become a glue that holds our lives together. However, like most things, there is also a dark side to it. There have been countless instances of the internet being used for nefarious purposes. Data theft is one of them. Most modern P. Singh · S. K. Shandilya School of Computing Science and Engineering, VIT Bhopal University, Bhopal, India e-mail: [email protected] G. Choudhary DTU Compute, Technical University of Denmark, 2800, Kongens Lyngby, Denmark e-mail: [email protected] V. Sihag (B) Sardar Patel University of Police, Security and Criminal Justice, Jodhpur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_31
websites collect user data to enhance user experience. This may seem innocent, but there are countless ways to abuse this data. Data is the identity of the user. In the wrong hands, data can be a deadly weapon. Misuse of data can lead to social engineering attacks, password cracking, etc. [1]. To safeguard the confidentiality of data, encryption was developed. Symmetric and asymmetric encryption are some of the methods of encryption, but if they are compromised, the result is loss of user identity. The data obtained by unethical means is used for identity theft, ransom, or stealing the organization's hardware or network infrastructure. Intruders try to steal data because it has become the crux of the marketing industry. Big enterprises collect user data to understand customer needs better and further develop their products. However, more often than not, they fail to protect their data from either external or, in some cases, internal attack vectors [2]. Even big corporations like Facebook, Microsoft, Instagram, and YouTube have suffered massive data breaches. A man-in-the-middle attack is one where the attacker taps into a conversation between two users. To combat MITM (Man In The Middle) attacks, there is a dire need to enhance the current technology. JSON stands for JavaScript Object Notation. It is a widely used data format through which communication between a web-based server and an endpoint system transpires. REST APIs work exclusively on JSON queries. It can incorporate both SQL and NoSQL databases and contains its own objects. Each JSON object, in turn, consists of key-value pairs; it is essentially a mapping between keys and values. In the last decade, JSON has become very popular. Owing to its popularity, it has become an excellent target for various attacks, including MITM. Since JSON packages are widely used across applications, generating malicious JSON packets has become trivial. Attackers may use this to automate attacks using malicious JSON packets. As such attacks are arduous to detect, it became crucial to secure JSON-based communication. To counter MITM attacks, the concept of JWTs was introduced. JWT stands for JSON Web Token. The structure of a JWT comprises three distinct parts: a header, a payload, and a signature. Tokens help authenticate the user over the World Wide Web, and they expire after their lifetime duration ends. JWTs are the current industry security standard, but even JWTs are not entirely secure. They can fall victim to various attacks, especially over insecure networks (HTTP). So in this work, we tried to make them more secure with the help of a ratchet system. We took this inspiration from the ratchet system used in the Signal protocol. The Signal protocol has become one of the most secure protocols of the current decade.
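As a minimal illustration of the header–payload–signature structure just described, the sketch below creates and verifies a token with the PyJWT library; the secret, claims, and expiry time are placeholder values, not ones used later in this paper.

```python
import datetime
import jwt  # PyJWT

SECRET = "replace-with-a-real-secret"  # placeholder signing key

# Payload (claims) carried by the token; values here are illustrative only
claims = {
    "sub": "user-123",
    "role": "user",
    "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=15),  # short lifespan
}

# Encode: Base64url(header) + "." + Base64url(payload) + "." + signature
token = jwt.encode(claims, SECRET, algorithm="HS256")
print(token)

# Decode and verify the signature and expiry; raises an exception if tampered with or expired
decoded = jwt.decode(token, SECRET, algorithms=["HS256"])
print(decoded["sub"])
```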
1.1 Problem Statement and Our Contributions

Numerous applications depend on JSON tokens for communication, authorization, and performing basic operations. The applications for JSON range from IoT-based personal smart devices to industrial-scale projects like manufacturing plants. However, JSON tokens often fall prey to various attack vectors and become a potential
security vulnerability. Thus, there is a need for JSON tokens to be secured feasibly. Our goal is to implement a way to secure JSON tokens cleanly and efficiently, with minimal overhead. The implementation should be easy to integrate into any system that uses JSON-based payloads for communication. ● We integrated JWTs with the signal protocol algorithm to enhance the security of JSON Web Tokens. ● The result can be packed in a single package such that the developer can use it directly in the form of a module. This makes it easy to integrate into any system. ● The internal API system is easy to use and straightforward. The module acts as a standalone system that can handle the entire communication. ● It can also be used as a middleware in web development frameworks by making an abstraction around it. ● The algorithm works well when the client and server communicate using a persistent connection like the socket protocol. However, it will also work using typical HTTP/HTTPS methods.
2 Related Works

JSON Web Token (JWT), defined by RFC 7519, is an open standard for securely transferring claims between parties [3]. The claims are either encoded as the payload of a JSON Web Signature (JWS) or used as the plaintext in a JSON Web Encryption (JWE) structure. JSON Web Encryption is defined by RFC 7516 [4]. It represents encrypted content using JSON structures encoded in base64url encoding. The algorithms used with JWEs are specified in RFC 7518. It encrypts and protects the integrity of an arbitrary set of octets. Thus, claims can be digitally signed or used with a Message Authentication Code (MAC) to protect their integrity. JWTs are generally used to assert the identity of a user when they make requests. RFC 7518 (JSON Web Algorithms or JWA) recommends the use of RSASSA-PKCS1-v1_5 using SHA-256 and highly recommends ECDSA using P-256 and SHA-256. However, if one of the parties has a compromised device, token theft becomes a possibility. Once tokens are compromised, the entire exchange becomes compromised. The attacker doesn't need to be actively checking for tokens either. Since tokens are usually reused multiple times, there is no way to distinguish between them and the real user once the attacker has the token. Therefore, the attacker can pose as the original user and access any sensitive information they want. Thus, we need to look for an alternative way to secure JWTs. Symmetric key cryptography, or symmetric cryptography, is a technique where the same key is used for both encryption and decryption. It is often used with asymmetric cryptography techniques to add a layer of security to the system. AES, DES, and 3DES fall in this category [5–7]. However, symmetric cryptography would be a poor choice for JWTs in an unsecured environment (over HTTP). While symmetric
Table 1 The state-of-the-art algorithms and protocols for security enhancement. P1: Asymmetric, P2: Resistant Against Man In The Middle Attacks, P3: Resistant Against Brute Force Attacks, P4: Ensures Integrity, P5: Has Signature Verification, P6: Resistant Against Token Theft Attacks Authors
Algorithm/ Protocol
P1
P2
P3
P4
P5
Bradley et al. [3]
Plain JWT
Kammer et al. [4]
DES
Rijmen et al. [8]
AES
Karn et al. [5]
3DES
Rivest et al. [6]
RSA
✓
✓
✓
ElGamal [7]
ElGamal
✓
✓
Barker et al. [9]
ECDH
✓
✓
✓
✓
✓
✓
✓
Johnson et al. [10]
ECDSA
✓
✓
✓
✓
✓
Josefsson et al. [11]
EdDSA
✓
✓
✓
✓
✓
Callas et al. [12]
✓
OpenPGP
✓
✓
✓
✓
✓
Mehuron et al. [13]
DSS
✓
✓
✓
✓
✓
Cramer et al. [14]
Cramer Shoup
✓
✓
✓
Krawczyk et al. [15]
HMAC
✓
✓
✓
✓
✓
Janwa et al. [16]
McEliece
✓
✓
✓
Sarr et al. [17]
MQV
✓
✓
✓
✓
✓
Hao and Feng [18]
YAK
✓
✓
✓
✓
✓
Hofstein et al. [19]
NTRUEncrypt
✓
✓
✓
Paillier and Pascal [20]
Paillier
✓
✓
✓
Gordon et al. [21]
Signal Protocol
✓
✓
✓
✓
✓
P6
✓
✓
✓
cryptography isn’t as vulnerable as plaintext, key exchange over such an environment renders the encryption useless. Such an environment enables the possibility of manin-the-middle attacks. Moreover, token theft still poses the risk of compromising the entire exchange between two or more parties. Therefore, we cannot use these techniques in favor of ensuring integrity and privacy. The state-of-the-art algorithms and protocols for security enhancement are shown in Table 1. As opposed to symmetrical cryptography, asymmetrical key cryptography or asymmetrical cryptography uses different keys for encryption and decryption, making it an ideal solution for secure communication in a public network. MQV, YAK, and RSA are some examples of asymmetric key cryptography [9–11]. Asymmetric Cryptography would be ideal for exchanging sensitive information in an unsecured environment. Man-in-the-middle attacks are no longer a threat. However, there is the risk of forging messages. Techniques that only rely on a private and public key pair are prone to this. Including digital signatures and verifying them is one way to solve this problem. Creating a shared secret using a technique similar to Diffie-Hellman is yet another way to tackle this problem. However, just like symmetric cryptography, token theft is still an issue. Elliptical curve cryptography (ECC) is more secure compared to non-ellipticalcurve-based cryptography methods. It is based on using the algebraic structure of elliptical curves over finite fields. ECDH and ECDSA are two implementations of
ECC [12, 13]. Smaller keys in EC-based cryptography techniques seem to be comparable to larger keys in non-EC-based cryptography techniques in terms of security [14]. EC is a strong candidate for encryption, but yet again, is also susceptible to token theft. It would seem that no matter how secure a cryptography system is, if it is based on a unique token, there is no way to mitigate the threat of token theft. Modern cryptography systems are a set of protocols or algorithms used together to attain a high level of security that might not have been possible to achieve using its components individually. It enforces a set of rules and defines pipelines that lead to encryption and decryption of payload securely. Dynamic token creation is a solution to the token theft vulnerability. OpenPGP and Signal Protocol with Double Ratchet are two modern cryptography protocols; while dealing with dynamic token creation, they also encrypt messages using several algorithms [15, 16]. Because of this, these systems are very secure. The purpose of OpenPGP was to serve as a standard for email encryption, while Signal Protocol’s introduction was to ensure privacy while messaging. However, Signal Protocol has a much more linear way of generating tokens while still being secure. Moreover, Signal generates a chain of tokens, while OpenPGP generates new temporary tokens for each message. Therefore, Signal Protocol is a better candidate for securing JWTs.
3 Secure JSON Tokens

It is often necessary to have a secure way to exchange claims. Cryptography has been a crucial element in warfare for a long time. High-ranking government officials often have to communicate state secrets over long distances. With the rise in popularity of e-banking, it becomes more and more targeted by malicious agents looking for ways to exploit it. Many server-based applications rely on secure JSON claims to function. IoT-based applications also rely on JSON claims to either get or post information to or from a server. With time, more exploits and vulnerabilities get discovered, poking holes in system security. Since JSON-based tokens have numerous applications, it is imperative to reinforce them to ensure privacy and security. JSON tokens are considered by many to be a secure and robust way to exchange claims between two parties. However, once an attacker gets hold of an encryption token, the entire exchange is exposed. Moreover, the attacker can issue undesired claims in an undetectable way. Further, it becomes arduous to differentiate between genuine and forged claims, and rolling back to the time before the attacker made any changes also becomes difficult. To combat this, we implemented the Signal protocol using the Double Ratchet Algorithm to secure claim exchanges. A symmetric key ratchet, as shown in Fig. 1, is a system based on a key derivation function in which all messages are secured with a unique message key. Keys form a chain linked by an encryption function: each key is used as an encryption key to encrypt a constant, and the encrypted value is treated as the new key. This cycle goes on to derive new keys to encrypt messages.
Fig. 1 Symmetric key ratchet system
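The following is a minimal sketch of the symmetric key ratchet idea from Fig. 1, using HKDF from the Python cryptography library as the key derivation function; the chain-key labels and constants are illustrative assumptions rather than values from the paper's implementation.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def kdf_step(chain_key: bytes) -> tuple:
    """Advance the symmetric ratchet one step: derive the next chain key and a one-time message key."""
    out = HKDF(
        algorithm=hashes.SHA256(),
        length=64,                       # 32 bytes for the next chain key + 32 bytes for the message key
        salt=None,
        info=b"symmetric-ratchet-step",  # illustrative context label
    ).derive(chain_key)
    return out[:32], out[32:]


chain_key = b"\x01" * 32  # placeholder initial chain key (would come from the DH ratchet)
for i in range(3):
    chain_key, message_key = kdf_step(chain_key)
    print(f"message {i}: key = {message_key.hex()[:16]}...")  # each message gets a fresh key
```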
Diffie-Hellman ratchet, as shown in Fig. 2, is a system in which two parties communicating generate new shared secrets for every message. Because of this, each message is encrypted with a unique key. A “Ping-Pong” like behavior is established where both parties take turns generating new key pairs.
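To make the "ping-pong" key agreement concrete, here is a minimal sketch of one Diffie-Hellman ratchet step with X25519; the variable names and number of steps are illustrative and not taken from the paper's code.

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each party starts with its own X25519 key pair
alice_priv = X25519PrivateKey.generate()
bob_priv = X25519PrivateKey.generate()

# Both sides derive the same shared secret without ever transmitting it
secret_a = alice_priv.exchange(bob_priv.public_key())
secret_b = bob_priv.exchange(alice_priv.public_key())
assert secret_a == secret_b

# When Bob replies (breaking Alice's message streak), he generates a fresh key pair,
# so the next shared secret, and hence all later message keys, change
bob_priv = X25519PrivateKey.generate()
secret_next = alice_priv.exchange(bob_priv.public_key())
assert secret_next != secret_a
```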
3.1 Double Ratchet Algorithm

To secure tokens, we propose the use of the Signal protocol and the Double Ratchet Algorithm. The Double Ratchet Algorithm (formerly known as the Axolotl Ratchet), as shown in Fig. 3, is a key management algorithm used in a cryptographic protocol to provide end-to-end encryption. An initial key exchange occurs, followed by ongoing renewal and maintenance of short-lived session keys. It combines two "cryptographic ratchets": one is based on the Diffie-Hellman key exchange (DH) and the other on a key derivation function (KDF). Since it uses two "cryptographic ratchets," it is, therefore, called the Double Ratchet Algorithm. One of its greatest features is its ability to heal. It is considered to be self-healing because it deters an attacker from decrypting future messages, under certain conditions, even after having compromised one of the user's session keys [22]. Because new session keys are exchanged after a certain number of rounds of communication, the attacker has no choice but to intercept all forms of communication between the targeted parties, since they lose access as soon as a non-intercepted key exchange occurs. This trait has been termed future secrecy or post-compromise security.
Fig. 2 Diffie-Hellman ratchet system
Fig. 3 Double ratchet system
3.2 Signal Protocol

The Signal Protocol (previously known as the TextSecure Protocol) is a non-federated cryptographic protocol. Its uses include providing end-to-end encryption for voice calls, video calls, and chat systems. The protocol combines the Double Ratchet algorithm, pre-keys, and a triple Elliptic-curve Diffie-Hellman (3-DH) handshake [23], and uses Curve25519, AES-256, and HMAC-SHA-256 as primitives [24].
4 Methodologies

The process of securing JWTs is categorized into the following parts.
● Initial key exchange: The initial key exchange involves initialization. Both server and client generate their first unique public–private key pairs, and the client then requests the public key from the server. This way, the client generates the shared secret key to encrypt messages.
● Ratchet system operation: The ratchet system has two parts, the Diffie-Hellman key ratchet and the symmetric key ratchet. The Diffie-Hellman key ratchet triggers when a streak of messages is broken by either party, i.e., by their first message in a message chain. Unlike the Diffie-Hellman ratchet, the symmetric key ratchet does not require a key exchange, and it moves forward whenever the user receives or sends a message. We follow the double ratchet algorithm developed by Signal to ensure security.
● Key generation: The key depends primarily on the ratchet system. The symmetric ratchet generates the encryption key directly. While any symmetric encryption algorithm can be used, AES has a solid foundation and plenty of research backing its use. For these reasons, we will use AES for symmetric encryption. While there are many candidates for asymmetric key encryption algorithms, elliptical curve algorithms stand out from the rest. The curve x25519 has become an industry standard, and thus we will be using it to secure our key exchanges and generation. Lastly, for our key derivation function, HKDF will be our preferred choice.
● Sending and receiving a token: The generated key is used with a symmetric encryption algorithm (AES in our case) to encrypt the token to be sent. We obtain the key after moving the symmetric key ratchet forward by one step. Similar to sending the message, the generated key is used to decrypt the received message using the same symmetric encryption algorithm. Since the symmetric ratchet is deterministic, key generation leads to the same result on both ends.
Fig. 4 Initialization code
5 Implementation

The proof of concept has been implemented using a class-based pattern with object-oriented programming in Python 3.
5.1 Dependencies

We use the Python library cryptography to implement our proposed algorithm.
5.2 Initialization

Initialization happens when the object is first created: at that point, it generates a key pair using Curve25519. Other properties cannot be initialized before getting the other party's public key, so they are set to None for the time being. "keyNeedsRegeneration" is a boolean used to control when the key pair needs to be regenerated. It is initially set to False, as we have just generated a pair of keys. The initialization code sample is shown in Fig. 4.
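Since the code in Fig. 4 is not reproduced here, the following is a minimal sketch of what the described initialization could look like; the class name and all attribute names other than keyNeedsRegeneration (which the text mentions) are assumptions.

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey


class RatchetSession:
    """Holds one party's state for the JWT-securing double ratchet (illustrative sketch)."""

    def __init__(self) -> None:
        # Generate the first unique X25519 (Curve25519) key pair
        self.privateKey = X25519PrivateKey.generate()
        self.publicKey = self.privateKey.public_key()

        # These cannot be set before we learn the other party's public key
        self.peerPublicKey = None
        self.rootKey = None
        self.sendingKey = None
        self.receivingKey = None

        # Controls when the key pair has to be regenerated (DH ratchet step)
        self.keyNeedsRegeneration = False
```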
5.3 Introduction Code

As mentioned before, certain properties cannot be initialized without obtaining the other party's public key. For this, we request the other party to send us their public key. This process has been dubbed "introduction." The introduction code sample is shown in Fig. 5. The function serializePublicKey serves as an abstraction to convert the public key object into a more manageable form. The function generateIntroduction generates an unencrypted JSON payload that contains the public key of the caller. When
Fig. 5 Introduction code
the user sends a request to the server, the server will respond with an introduction created using this method. The server doesn’t need an introduction from the sender because its public key will be included in all future communications. Once the sender receives the introduction from the server, it will be passed to the acceptIntroduction function. This function extracts the public key from the introduction and passes it off to initialExchange, which is responsible for initialization. However, the key is still serialized and unusable in its current form. To deal with it, deserializeKey is called to serve as an abstraction to convert it into a usable object.
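Because Fig. 5 itself is not reproduced, here is a hedged sketch of the serialization helpers and introduction exchange described above; the exact encodings and JSON field names are assumptions.

```python
import base64
import json

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PublicKey


def serializePublicKey(public_key: X25519PublicKey) -> str:
    """Convert a public key object into a transportable text form."""
    raw = public_key.public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    )
    return base64.b64encode(raw).decode("ascii")


def deserializeKey(serialized: str) -> X25519PublicKey:
    """Convert the serialized form back into a usable public key object."""
    return X25519PublicKey.from_public_bytes(base64.b64decode(serialized))


def generateIntroduction(public_key: X25519PublicKey) -> str:
    """Unencrypted JSON payload carrying the caller's public key."""
    return json.dumps({"publicKey": serializePublicKey(public_key)})


def acceptIntroduction(introduction: str) -> X25519PublicKey:
    """Extract and deserialize the peer's public key from an introduction payload."""
    return deserializeKey(json.loads(introduction)["publicKey"])
```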
5.4 Ratchets

getSharedKey is a helper function that generates the shared secret key using the private key and the server's stored public key that we got from its introduction. HKDF is a key derivation function that is used to keep the chain of the symmetric ratchet going. rootKeyRatchet, receivingKeyRatchet, and sendingKeyRatchet are used to advance the ratchet for the rootKey, receivingKey, and the sendingKey,
Fig. 6 Ratchets code
respectively. They all return the latest value while also storing it, and older values are discarded. The Ratchets sample code is shown in Fig. 6.
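A hedged sketch of the ratchet helpers named above, assuming HKDF-SHA256 and 32-byte keys; the info labels and the way the root key is mixed with the DH secret are assumptions consistent with the description, not the paper's exact code.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey, X25519PublicKey


def getSharedKey(private_key: X25519PrivateKey, peer_public_key: X25519PublicKey) -> bytes:
    """Shared secret from our private key and the stored peer public key."""
    return private_key.exchange(peer_public_key)


def hkdf(key_material: bytes, info: bytes) -> bytes:
    """Single HKDF step used to keep each chain going."""
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=info).derive(key_material)


class Ratchets:
    def __init__(self, initial_root: bytes) -> None:
        self.rootKey = initial_root
        self.sendingKey = None
        self.receivingKey = None

    def rootKeyRatchet(self, dh_secret: bytes) -> bytes:
        # Mix fresh DH output into the root chain; keep only the latest value
        self.rootKey = hkdf(self.rootKey + dh_secret, b"root-ratchet")
        return self.rootKey

    def sendingKeyRatchet(self) -> bytes:
        self.sendingKey = hkdf(self.sendingKey or self.rootKey, b"sending-ratchet")
        return self.sendingKey

    def receivingKeyRatchet(self) -> bytes:
        self.receivingKey = hkdf(self.receivingKey or self.rootKey, b"receiving-ratchet")
        return self.receivingKey
```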
5.5 Sending and Receiving Messages

The party that receives an introduction upon request from the other party has to be the one to send the first message. Both the DH and symmetric ratchets come into play here. generateNewKeyPair is a helper function that is used to generate new key pairs. The encrypt function is responsible for encrypting a message and properly encoding it in a JSON format. It is also responsible for advancing the rootKeyRatchet and sendingKeyRatchet at appropriate times. Similarly, the decrypt function is responsible for receiving such a JSON payload and decrypting it. Like its counterpart, it is also responsible for handling the rootKeyRatchet and the receivingKeyRatchet at appropriate times. The encrypt and decrypt functions are vital to keeping the system deterministic. Compared to time-based triggers, deterministic triggers make the system more robust. The sending and receiving messages sample code is shown in Fig. 7. The entire algorithm is contained within a single class. The class handles all aspects of JSON-based communication. It is easy to integrate into any Python-based application. The code can also be freely ported into any other language based on the need. Note that this deals purely with the encryption and decryption of JSON claims. The actual transfer may take place over any secured or unsecured environment. The class should only be interacted with through its defined methods. At no point in time should
Fig. 7 Sending and receiving messages
the values of its fields be changed for any reason. The ratchet system is automatically handled too. Developers should only interact with the functions to encrypt, decrypt, generate the introduction payload, and receive the introduction payload. The intended use is to encrypt data and tokens using this module before sending them. However, it can also be used as a simple encryption module to encrypt any JSON data.
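As Fig. 7 is not reproduced, the sketch below shows one way the described encrypt/decrypt pair could work, using AES-GCM with keys produced by the symmetric ratchet; the payload layout (nonce plus ciphertext, Base64-encoded inside a JSON object) is an assumption.

```python
import base64
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt(message_key: bytes, claims: dict) -> str:
    """Encrypt a JSON claim set with a one-time message key and wrap it in a JSON payload."""
    nonce = os.urandom(12)
    ciphertext = AESGCM(message_key).encrypt(nonce, json.dumps(claims).encode(), None)
    return json.dumps({
        "nonce": base64.b64encode(nonce).decode("ascii"),
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
    })


def decrypt(message_key: bytes, payload: str) -> dict:
    """Reverse of encrypt; fails if the payload was tampered with or the wrong key is used."""
    data = json.loads(payload)
    plaintext = AESGCM(message_key).decrypt(
        base64.b64decode(data["nonce"]),
        base64.b64decode(data["ciphertext"]),
        None,
    )
    return json.loads(plaintext)


key = AESGCM.generate_key(bit_length=256)  # in the real flow this would come from the sending ratchet
token_payload = encrypt(key, {"sub": "user-123", "role": "user"})
print(decrypt(key, token_payload))
```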
6 Conclusion and Future Works

We proposed a new security mechanism that uses the Signal algorithm with a double ratchet system to encrypt JWTs and add a new layer of security. The proposed solution used the efficient Signal algorithm and considered JSON tokens for implementation. Not only do we generate new keys to encrypt every message, but the system as a whole is also deterministic. This means that it is not dependent on time in any way and works solely on events, no matter when they occur. Using the cryptography package in Python 3, we implemented the encryption techniques used in the Signal algorithm and the Double Ratchet system. The problems we were seeking to solve were preventing man-in-the-middle attacks after a leaked token and securing JSON tokens on insecure networks. The Signal algorithm successfully achieves both results. If a token leaks, the immediate leak of data is unavoidable; however, the double ratchet heals the security of the token. Using this technique to encrypt JSON objects, we have successfully implemented a way to secure JSON tokens. The algorithm can be used to secure communication over any network, be it secure or insecure. In the future, we will further extend the Signal algorithm to handle tokens arriving out of order. There is also the potential to use different sets of encryption algorithms. Moreover, the algorithm can be used to secure not only the JSON token but any JSON data.
References 1. Sihag, V., Vardhan, M., & Singh, P. (2021). Blade: Robust malware detection against obfuscation in android. Forensic Science International: Digital Investigation, 38, 301176. 2. Sinha, R., Sihag, V., Choudhary, G., Vardhan, M., & Singh, P. (2021). Forensic analysis of fitness applications on android. In Mobile Internet Security: 5th International Symposium, MobiSec 2021, Jeju Island, South Korea, October 7–9, 2021, Revised Selected Papers, p. 222. Springer Nature. 3. John, B. et al. (2015). JSON Web Token (JWT). RFC. 4. Data encryption standard 1999, Oct 1999. 5. Karn, P. R., Simpson, W. A., & Metzger, P. E. (1995). The ESP Triple DES Transform. RFC 1851. 6. Rivest, R. L., Shamir, A., Adleman, L. M. (1977). Cryptographic communications system and method. US Patent, 1977. 7. Elgamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4), 469–472. 8. Daemen, J., & Rijmen, V. (2003). Note on naming 01 2003. 9. Barker, E. B., Johnson, D., & Smid, M. E. (2007). Sp 800-56a. recommendation for pair-wise key establishment schemes using discrete logarithm cryptography (revised). Technical report, Gaithersburg, MD, USA, 2007. 10. The elliptic curve digital signature algorithm (ECDSA). (2001). 11. Edwards-curve digital signature algorithm (EdDSA). (2017). 12. Openpgp message format. (2007). 13. Mehuron. (2013). Digital signature standard (dss). 14. Cramer, R., & Shoup, V. (2006). Practical public key cryptosystem provably secure against adaptive chosen ciphertext attack, vol. 1462, pp. 13–25.
15. Canetti, R., Krawczyk, H., & Bellare, M. (1997). Hmac: Keyed-hashing for message authentication 16. Janwa, H., & Moreno, O. (1995). Mceliece public key cryptosystems using algebraic-geometric codes. In Proceedings of 1995 IEEE International Symposium on Information Theory, p. 484. 17. Sarr, A., Elbaz-Vincent, P., & Bajard, J. C. (2009). A secure and efficient authenticated diffiehellman protocol 18. Hao, F. (2010). On robust key agreement based on public key authentication, vol. 7, pp. 383– 390. 19. Silverman, J. H., Mass, N., Hofstein, J., Pipher, J., Both of Pawtucket. Public key cryptosystem method and apparatus. 20. Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes, vol. 5, pp. 223–238. 21. Cohn-Gordon, K., Cremers, C., Dowling, B., Garratt, L., & Stebila, D. (2017). A formal security analysis of the signal messaging protocol. In 2017 IEEE European Symposium on Security and Privacy (EuroS P) (pp. 451–466). 22. Marlinspike, M. (2013). Advanced cryptographic ratcheting. 23. Unger, N., Dechand, S., Bonneau, J., Fahl, S., Perl, H., Goldberg, I., & Smith, M. (2015). Sok: Secure messaging. In 2015 IEEE Symposium on Security and Privacy (pp. 232–249). 24. Frosch, T., Mainka, C., Bader, C., Bergsma, F., Schwenk, J., & Holz, T. (2016). How secure is textsecure? In 2016 IEEE European Symposium on Security and Privacy (EuroS P) (pp. 457–472).
Price Prediction of Ethereum Using Time Series and Deep Learning Techniques Preeti Sharma and R. M. Pramila
Abstract Ethereum, a blockchain platform inspired by Bitcoin, was introduced in 2015. It is a worldwide computing platform fueled by Ether (ETH), its native currency. As the demand for processing power on the Ethereum blockchain rises, so will the price of ETH. Several studies attempt to project its price based on the cryptocurrency's previous price movements, and this has become a prominent research topic all around the world. In this work, the price of ETH is predicted using a hybrid model consisting of Long Short-Term Memory (LSTM) and Vector Auto Regression (VAR). The hybrid model gave the lowest values for the evaluation metrics compared to the standalone models. Keywords Bitcoin · Cryptocurrency · Ethereum · Hybrid model · LSTM · ARIMA · VAR
1 Introduction This paper focuses on the domain of cryptocurrencies, specifically Ethereum. In the traditional financial transaction method, a mediator such as a bank handles all aspects of the transaction. It does a decent job in transacting funds but comes with the issue of security, trust, reliability, inability to transfer a considerable sum of money, etc. Bitcoin, the first decentralized cryptocurrency or digital currency, was proposed in 2009 by Nakamoto [1]. This is how the term cryptocurrencies came into existence. It is the first decentralized cryptocurrency that uses blockchain technology. Following the idea of Bitcoin, Vitalik Buterin took the blockchain concept to another level. In 2014, he proposed that blockchain technology can be used for making transactions as well as for creating decentralized applications (Dapps) using smart contracts [2]. P. Sharma (B) · R. M. Pramila Christ (Deemed to Be University), Lavasa, Pune, India e-mail: [email protected] R. M. Pramila e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_32
The native cryptocurrency of Ethereum is Ether (ETH or Ξ) [3] which fuels different Dapps running in the Ethereum network. The market capitalization of ether stands right next to that of Bitcoin. The number of users using the Ethereum network has grown exponentially since its invention. It took almost three years for Ether to attain a value of $388.05. Due to Covid-19, that had hit the world in the year 2019, the price of Ether gradually decreased till May 2020 [4]. The value gained continues to rise as the Covid-19 situation improves. The prices of cryptocurrencies are always a topic of curiosity for those who tend to invest in the network. The favorable aspects of Ethereum include security, stability, scalability, supply, decentralization, and immutability. However, volatility in the price of Ethereum is a major reason why researchers did not place much emphasis here. Due to the popularity of Bitcoin, the majority of the study in this field focuses on its price prediction. Thus, excluding Ethereum and other cryptocurrencies. However, as far as Ethereum is concerned, there are reasons such as being ranked after Bitcoin in the market, and also the volatility, which caused a lack of study for this cryptocurrency. Ethereum, being the general-purpose blockchain, is a digital universe, whereas all other cryptocurrencies are solely digital money. Transactions made on the Ethereum network are faster than Bitcoin. Because of the benefits of the Ethereum network, demand is rising and will continue to rise, much like Bitcoin. A quick spike or reduction in their prices might result in a large profit or loss. If an investor understands when to invest in the Ethereum network, then the price prediction of Ethereum can help the investor gain the maximum benefit. The goal of this study is to forecast the price of Ethereum with a high degree of accuracy and develop a suitable system for price prediction. According to some studies, Ethereum will soon surpass Bitcoin by smashing its all-time high. At the end of last year, the prices of Bitcoin and Ethereum were perceived to have doubled. We cannot ignore the fact that, while investing in cryptocurrencies has long-term benefits, it also carries significant risk due to excessive volatility, which is why any investment in these networks should be made only after thorough research. By implementing neural networks, machine learning, and time series models, some researchers have tried predicting the price of cryptocurrencies, especially Bitcoin. Neural networks can capture the complex nonlinear nature of the data, thus covering the volatility. This work proposes a hybrid model that takes the residuals from a VAR and provides it as an input to the LSTM model. The final prediction of price is obtained by summing the price and the predicted residuals. Furthermore, the predictions of the proposed approach are compared to the predictions made by standalone LSTM and VAR models. The price of cryptocurrencies is so unpredictable that we can’t predict what they’ll be worth tomorrow. Ethereum, the second most valued cryptocurrency, suffers from the same issue. Because no analytical thinking is relevant in such instances, it is difficult to anticipate its price. The price is affected not just by historical price data, but also by a number of other factors. Finding such elements is a challenging task. Even if certain enticing features are added, volatility will remain, making forecasting difficult.
However, deep learning techniques are able to learn complex patterns from the data, which results in superior predictions. Since this is a time series problem in which the price of Ethereum exhibits nonlinear behaviour, sequential models such as RNNs, specifically LSTM, can be used, which consider the sequential dependencies within the data points. LSTM, when combined with a linear time series model such as VAR, will provide price forecasts with the least amount of error. The rest of the paper is arranged as follows. Section 2 depicts a synopsis of relevant research work. The proposed model is explained in Sect. 3. Section 4 introduces the data and methods. The performance analysis and results are discussed in Sect. 5. The study is concluded in Sect. 6 with a discussion on future work.
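To make the VAR-plus-LSTM hybrid described in this introduction concrete, here is a minimal sketch under assumed inputs: the synthetic two-series DataFrame, lag order, residual window, and network size are all illustrative choices, and statsmodels and TensorFlow are assumed to be available; this is not the configuration used in the study itself.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from statsmodels.tsa.api import VAR

# Synthetic stand-in for the ETH/BTC closing-price series (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "eth_close": np.cumsum(rng.normal(0, 1, 500)) + 100,
    "btc_close": np.cumsum(rng.normal(0, 1, 500)) + 1000,
})
train = df.iloc[:-30]

# Step 1: linear part — fit a VAR and keep its residuals for the ETH series
var_model = VAR(train).fit(maxlags=5)
resid = var_model.resid["eth_close"].values

# Step 2: nonlinear part — train an LSTM to predict the next residual from a short window
window = 10
X = np.array([resid[i:i + window] for i in range(len(resid) - window)])[..., None]
y = resid[window:]

lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X, y, epochs=20, batch_size=16, verbose=0)

# Step 3: hybrid one-step forecast = VAR price forecast + LSTM-predicted residual
var_next = var_model.forecast(train.values[-var_model.k_ar:], steps=1)[0][0]
resid_next = lstm.predict(resid[-window:].reshape(1, window, 1), verbose=0)[0, 0]
hybrid_forecast = var_next + resid_next
print(f"Hybrid one-step ETH forecast: {hybrid_forecast:.2f}")
```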
2 Related Works

People are aware of the lengthy history of neural networks. First came the AI winter, followed by the ever-expanding spring, in which neural networks are the best performing technique. Implementing neural networks for time series forecasting in finance and economics is not a new concept. Kaastra and Boyd [4] proposed an eight-step procedure for performing the same by designing a neural network model. Various parameters that must be considered during data preprocessing in designing and training the model are explained. The study conducted in [5] used MLP and LSTM for the price prediction of Ethereum. They used minute, hourly, and daily data for evaluating the model, the close price being the target variable. The features are low price, high price, and open price. LSTM outperformed MLP when there was no huge difference between the prices of two consecutive data points. Poongodi et al. [6] considered more parameters such as quote volume and the weighted average. They compared support vector machine (SVM) and linear regression models for a computationally feasible approach. They used cross-validation, which gave an accuracy of 85.46% for linear regression and 96.06% for SVM. Jay et al. [7] present a technique based on random walk theory that incorporates layer-wise randomization into observed neural network feature activations to mimic market volatility. They considered Bitcoin, Ethereum, and Litecoin and compared the predictions from MLP and LSTM models with more than 2 years of mean-normalized data. Stochastic models outperformed deterministic models (with rectified linear unit (ReLU) activation function) without validation. The study conducted in [8] predicted Ethereum's price by training convolution neural network (CNN), LSTM, stacked LSTM, bidirectional LSTM (BiLSTM), and GRU models. LSTM and GRU models were able to correctly predict the price of Ether. They demonstrated a working prototype of a web-based system for predicting the closing price of Ether in USD (US Dollars) in real time. Angela and Sun [9] conducted a study analyzing the factors affecting Ethereum's price through macroeconomic aspects. Factors such as the gold price didn't show any effect, whereas the EUR (Euro)/USD rate showed a significant positive effect in the short
Bitcoin, Litecoin, and Monero showed an effect on the price of Ethereum under the autoregressive distributed lag (ARDL) test model. Bitcoin has the most robust developer ecosystem, with more applications and implementations than any other cryptocurrency, which is why researchers have focused on forecasting Bitcoin's price rather than looking at other coins. Several studies, including [10, 11], used regression models such as Huber regression and Theil-Sen regression and RNN models such as LSTM and GRU for predicting the price; linear regression and GRU were less time-consuming than the other models (99% accuracy). Shankhdhar et al. [10] also implemented an IoT-based alerting system for the value of Bitcoin using the Bolt library and mentioned that the accuracy could be increased by adding more features and applying new models. Nonlinear models such as decision trees (95.8%) are also used in [12] for a 5-day forecast, compared against linear regression (97.5%). Several models, such as the binomial generalized linear model (GLM), SVM, and random forests, are used in [13] for predicting the price using 10-min and 10-s interval time points, obtaining 50–55% accuracy. In [14], ensemble techniques such as random forest and gradient boosting are used along with linear regression, but linear regression outperformed the other two with 99% accuracy; the authors aimed to use LSTM for developing a multivariate time series forecasting model. LSTM along with leave-one-out cross-validation in [15] proved to be more accurate than a standalone LSTM by modeling a directed graph with buyers and sellers as nodes and their transactions as edges; the authors compared LSTM with SVR, and LSTM was able to capture longer-range dependencies. The study [16] examined the price trend of 42 primary cryptocurrencies together with key economic indicators. They used a market-oriented dataset for training LightGBM, SVM, and random forest, where LightGBM outperformed the other two, showing better results only for medium-term horizons such as a 2-week prediction. The popularity of Bitcoin is also an important feature considered in the study [17]; the authors trained LSTM and GRU using data from a short period, where GRU gave better predictions, whereas [18] compared LSTM with a linear regression model. Miura et al. [19] used realized volatility and trained MLP, GRU, LSTM, SVM, and Ridge regression models, comparing them with heterogeneous autoregressive realized volatility (HARRV); Ridge regression showed the best performance with an MSE of 4.6667e06. Abraham et al. [20] focused on tweet volumes and sentiment analysis for the price prediction of both Ethereum and Bitcoin. They found that sentiment analysis is ineffective when prices are falling, but both tweet volume and Google Trends data accurately reflect the overall interest in the prices. Various neural networks can also be seen, such as in [21], where a Bayesian neural network (BNN) is used and its results are compared with linear regression and SVM; the authors used blockchain information such as block size, transactions, hash rate, and difficulty, and the MAPE and RMSE of the BNN were lower than those of the other models. In [22], the authors used tenfold cross-validation and obtained an MAE of 0.0043, whereas without it the MAE went up to 0.1518; they also suggested removing noise before modeling the data for price prediction.
Some authors have tried using a convolutional neural network (CNN) for the price prediction of cryptocurrencies, such as in [23]. The authors also considered the model's training time along with the validation loss. Different activation functions were used while training the models, which played an important role in achieving better accuracy, and GRU outperformed the other models. Radityo et al. [24] compared the accuracies and complexities of different variants of ANN. Some authors have used cloud technologies such as Azure, as in [25], training boosted decision tree regression, neural network regression, Bayesian linear regression, and linear regression with data from CoinDesk and Coinmetrics.io; all the regression-based models performed well. Livieris et al. [26] focused on predicting the Bitcoin price using ensemble learning techniques such as ensemble averaging, bagging, and stacking, and proposed CNN-, LSTM-, and BiLSTM-based ensemble models. For future research, the authors pointed out the problem of outliers caused by global instability, which unsupervised algorithms could detect in order to mitigate cryptocurrency instability. Prices of altcoins such as Digital Cash (DASH) and Ripple are also forecasted in [27], along with Bitcoin, using LSTM and generalized regression neural networks (GRNN); LSTM dominates GRNN in predicting the prices. Encoders are also used for forecasting Bitcoin's price, as seen in [28]. Time series models such as ARIMA are used for modeling in [29], where the model order is determined by the correlogram method; ARIMA(4,1,4) obtained the best accuracy, and the authors concluded that ARIMA performed better for short-term predictions. In [30], the authors compared LSTM and ARIMA models while considering several factors. They used real-time one-minute interval data collected from the Coinmarketcap and blockchain.info APIs, considered social platforms such as Twitter and Reddit, and applied sentiment analysis and supervised ML techniques; LSTM with multiple features performed better than the ARIMA model. Taking a combination of two or more models for the price prediction task achieved better results than standalone models. In [31], the authors used a hybrid model consisting of LSTM and GRU for predicting altcoin prices, such as Litecoin and Monero, with high accuracy. The outputs of the GRU and LSTM networks are combined and passed through a dense final prediction layer, regression evaluation metrics are used for evaluating the results, and the method is concluded to predict short-term prices more accurately. In [32], ARIMA is used for capturing the linearity in the data, while several techniques such as a feed-forward neural network (FFNN), CNN, LSTM, and support vector regression are used for covering the nonlinearity; this provides a better prediction than a single LSTM or FFNN model. The MAPE of the ARIMA and CNN combination is the minimum, and it is considered well suited for price prediction.
3 Proposed Model

This section describes the suggested model for anticipating Ethereum's price, as shown in Fig. 1. It employs a hybrid model based on VAR and LSTM and incorporates a number of features that help in the prediction of Ethereum's price. The recurrent neural network (RNN) works well on sequential data. It is better suited than classical machine learning models since it finds complex patterns in the data and tends to overfit the training data less. In RNNs, the output from the previous step is given as input to the current step. However, RNNs suffer from the vanishing gradient problem, which is addressed by long short-term memory (LSTM) networks and gated recurrent units (GRU). The forget gate, input gate, and output gate are the three gates that update and regulate the cell state in an LSTM network. The forget gate decides which information to discard. The input gate inserts new relevant information into the cell state. The output gate ensures that the information encoded in the cell state is passed on as input to the following time step. This enables the LSTM model to carry critical information through a long chain of sequences in order to make predictions.

On the other hand, time series models help in making predictions for the future when the data is time-dependent. Models like ARIMA, ARMA, AR, and MA work on a single variable and can predict future values using only that variable. The Vector Autoregression (VAR) model can predict one variable while keeping the effect of other variables in mind, so VAR has the advantage of using several variables to predict the price of Ethereum.

The suggested technique combines the VAR and LSTM models. The goal is to combine the two models such that they cover both the linear and nonlinear components of the time series. The combined proposed model may be expressed as

y_t = l_t + n_t    (1)

where y_t is the time series value at time t, and l_t and n_t are the linear and nonlinear components. The VAR model captures the linearity of the data, and the residuals from this model contain the nonlinear component. The residuals r_t from the VAR model at time t are defined as

r_t = y_t − l_t    (2)

An LSTM model is trained on the residuals r_t to model the nonlinear component:

r_t = z(r_{t−1}, r_{t−2}, ..., r_{t−n}) + e_t    (3)

where z is the nonlinear function learned by the LSTM model and e_t is the error.

Fig. 1 Proposed model
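To make the hybrid pipeline concrete, the following is a minimal sketch of Eqs. (1)–(3) in Python, assuming statsmodels for the VAR stage and Keras for the LSTM stage; the lag order, residual window length, and network size are illustrative assumptions and not values reported in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def fit_hybrid(df: pd.DataFrame, target: str = "Close", lags: int = 3, window: int = 10):
    # 1) VAR captures the linear component l_t of the target series.
    var_res = VAR(df).fit(lags)
    linear = var_res.fittedvalues[target]            # l_t
    resid = var_res.resid[target]                    # r_t = y_t - l_t  (Eq. 2)

    # 2) An LSTM models the residual r_t from its own lags (Eq. 3).
    r = resid.to_numpy()
    X = np.stack([r[i:i + window] for i in range(len(r) - window)])[..., None]
    y = r[window:]
    lstm = Sequential([LSTM(32, input_shape=(window, 1)), Dense(1)])
    lstm.compile(optimizer="adam", loss="mse")
    lstm.fit(X, y, epochs=50, batch_size=32, verbose=0)

    # 3) The combined prediction sums both components (Eq. 1).
    y_hat = linear.iloc[window:] + lstm.predict(X, verbose=0).ravel()
    return y_hat
```

Here `df` is assumed to be the preprocessed multivariate frame containing the selected features together with the target column; forecasts produced on differenced or scaled data would still need to be transformed back to the original price scale.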
4 Methodology

4.1 Data Collection

This study makes use of four datasets, i.e., historical pricing data, basic block information data, Google Trends data, and Reddit sentiment analysis data, which are fetched from several websites and some Python APIs. All four datasets are merged within the time frame of 1st January 2017 to 31st August 2021. After merging, the final dataset contains 27 attributes and 1704 rows.

Historical price data. Yahoo Finance maintains a record of cryptocurrency prices and provides Ethereum's daily historical price dataset, which is available for analysis and investigation. It consists of 2261 rows and 7 columns: Date, Open, High, Low, Adj Close, Volume, and Close. The closing price of Ethereum, Close, is the target variable that is predicted using the proposed methodology.

Basic block information. Basic block information of the Ethereum network is fetched from the Bitinfocharts website. The data contains information related to the blocks of the Ethereum network, which can be used for the price prediction of Ethereum. The dataset has 2571 rows and 18 columns: date, average_fee_to_block, num_unique_add, median_trans_val, average_trans_value, market_cap, average_block_time, median_fee, average_trans_fees, mining_profit, num_transactions, average_price, hash_rate, sent_usd, num_unique_address_from, size, average_difficulty, and num_tweets.

Google Trends data. The data is available on the Google Trends website and is fetched using the Pytrends API for further analysis. The data is retrieved using
the terms 'Ethereum' and 'Ether.' There are 1747 rows and 3 columns: Date, ethereum_count, and ether_count.

Reddit data. The Reddit data is collected from the Ethereum subreddit using the Pushshift API in Python. After obtaining the data, Natural Language Processing (NLP) techniques are applied to derive subjectivity and polarity scores. Using the API, 150,007 rows of data are fetched, and after performing the sentiment analysis, 1892 rows are obtained with the columns date, subjectivity, and polarity.
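As an illustration of how the price and search-interest series could be pulled programmatically, the sketch below uses the yfinance and Pytrends packages; these particular libraries, the ticker symbol, and the merge step are assumptions, since the paper only names the data sources.

```python
import yfinance as yf
from pytrends.request import TrendReq

# Daily OHLCV history for Ether from Yahoo Finance
price = yf.download("ETH-USD", start="2017-01-01", end="2021-08-31")

# Search-interest counts for the two query terms used in the study
pytrends = TrendReq()
pytrends.build_payload(["Ethereum", "Ether"], timeframe="2017-01-01 2021-08-31")
trends = pytrends.interest_over_time()

# Note: for multi-year ranges Google Trends returns weekly values, so the series
# would typically be resampled/interpolated to daily before merging with prices.
merged = price.join(trends, how="inner")
```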
4.2 Data Processing

The merged data is checked for missing values, and the missing values are imputed using the linear interpolation function available in the Pandas library. The Augmented Dickey–Fuller (ADF) test is used for checking the stationarity of the series. After performing the ADF test, it is found that some features are not stationary, so the first-differencing method is used for converting the non-stationary features to stationary ones. The stationary data is then normalized so that the values range from zero to one.
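A minimal sketch of these preprocessing steps is shown below, assuming `df` is the merged dataset; the 0.05 significance threshold and the per-column handling are assumptions about the exact procedure.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import MinMaxScaler

df = df.interpolate(method="linear")                     # impute missing values

def make_stationary(series: pd.Series) -> pd.Series:
    p_value = adfuller(series.dropna())[1]               # ADF test p-value
    return series if p_value < 0.05 else series.diff()   # first differencing if non-stationary

stationary = df.apply(make_stationary).dropna()
scaled = pd.DataFrame(MinMaxScaler().fit_transform(stationary),   # values in [0, 1]
                      index=stationary.index, columns=stationary.columns)
```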
4.3 Feature Selection

The Granger causality test is used for selecting the features that would be helpful in making predictions for the price of Ethereum. This is a statistical test for finding which series helps in predicting another series. The features with p-values less than 0.05 are considered while building the model.
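One way to run this screening in Python is sketched below; the maximum lag and the choice of the ssr F-test p-value are assumptions about the exact test configuration.

```python
from statsmodels.tsa.stattools import grangercausalitytests

def granger_p(df, feature, target="Close", maxlag=3):
    # Tests whether `feature` Granger-causes `target`; returns the smallest
    # ssr F-test p-value across the tested lags.
    res = grangercausalitytests(df[[target, feature]].dropna(), maxlag=maxlag, verbose=False)
    return min(res[lag][0]["ssr_ftest"][1] for lag in res)

selected = [col for col in df.columns
            if col != "Close" and granger_p(df, col) < 0.05]
```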
4.4 Train and Test Set

The preprocessed data is then split into training and testing sets. Since it is a time series dataset, the first eighty percent of the data is assigned for training the model and the last twenty percent for testing it.
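In code this is simply a chronological split with no shuffling; `scaled` below refers to the normalized frame from the preprocessing sketch above.

```python
split_idx = int(len(scaled) * 0.8)
train, test = scaled.iloc[:split_idx], scaled.iloc[split_idx:]
```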
5 Experimental Results

This section presents the experimental results obtained by the proposed model. The LSTM, VAR, and proposed LSTM-VAR hybrid models are trained using the training set. The evaluation metrics mean squared error (MSE) [33], root mean squared error (RMSE) [34], and mean absolute error (MAE) [34] are calculated for these models on the unseen testing set; these metrics are commonly used for evaluating price prediction problems.

Table 1 Evaluation metrics

Models      MSE            RMSE      MAE
VAR (3)     1,305,186.77   1142.4    1117.93
LSTM        418,079.38     646.59    474
LSTM-VAR    61,424.04      247.83    185.64
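For reference, the reported metrics can be computed as follows; using scikit-learn here is an assumption about tooling, not something stated in the paper.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

def report(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse,
            "RMSE": np.sqrt(mse),
            "MAE": mean_absolute_error(y_true, y_pred)}
```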
5.1 Results

Table 1 shows the evaluation metrics MSE, RMSE, and MAE obtained by training VAR, LSTM, and the proposed LSTM-VAR hybrid model.

VAR. After training the standalone VAR (3) model, the MSE, RMSE, and MAE values are 1,305,186.77, 1142.4, and 1117.93, respectively. The VAR (3) model attempts to learn the linearity in the dataset, but in the case of Ethereum price prediction, which is so volatile that predicting the price at the next moment is challenging, the VAR model leaves out the data's nonlinearity. As a result, the values of the evaluation metrics are very high.

LSTM. LSTM, a sequential deep learning model, when trained alone is able to cover the volatility in the price of Ethereum to some extent. The MSE obtained by the LSTM model is 418,079.38, and the RMSE and MAE are 646.59 and 474, respectively, which are lower than the metrics obtained by the standalone VAR model. The remaining error is due to the inability of the LSTM to cover the linearity of the data; the LSTM learns the nonlinear component, i.e., the volatility of the price of Ethereum.

Hybrid LSTM-VAR model. The MSE, RMSE, and MAE obtained by the proposed model are 61,424.04, 247.83, and 185.64, which are the lowest compared to the standalone LSTM and VAR models. The VAR model captures the linear component of the data but not the nonlinear component, while the LSTM model captures the nonlinear component but fails to capture the linear one. By combining the VAR and LSTM models, both the linear and nonlinear components are incorporated.
5.2 Loss Curve

Figure 2 depicts the loss on the training data and on the validation data. In the initial epochs, the training loss is near 0.14 and decreases as the number of epochs increases. The validation loss is also seen to decrease with the increase in the number of epochs.
Fig. 2 Training and validation loss
5.3 Discussions

In the proposed study, an LSTM-VAR-based hybrid model for Ethereum price prediction is developed. The results of the proposed model are compared with some existing models, namely Linear Regression, Random Forest, Decision Trees, and Support Vector Regression. The MAE obtained from Linear Regression is 581.79, from Random Forest 1193, from Decision Trees 1379.31, and from Support Vector Regression 1683.43. Among these baselines, the Linear Regression model performed best by achieving the minimum error. The MAE of the proposed model is less than that of Linear Regression and all the other models. As a result, the suggested model outperformed numerous existing techniques and is able to predict the price of Ethereum more accurately. This is because the presence of both the LSTM and VAR models helps in capturing the nonlinear and linear components of the data, respectively.
6 Conclusion

Investing in cryptocurrencies is viewed as advantageous in the finance business, and many individuals are looking forward to investing in various cryptocurrency networks. However, as appealing and tempting as investing in cryptocurrency seems, its volatility cannot be overlooked. As a result, a model that can forecast the approximate future would be highly essential and helpful for the entire community. In this research, a long short-term memory (LSTM) and vector autoregression (VAR)-based hybrid model is developed to forecast the price of Ethereum. Several factors are considered together with the closing price of Ethereum
with the goal of achieving better predictions. The volatility, i.e., the nonlinearity in the price of Ethereum, is itself a big challenge, for which sequential deep learning techniques such as LSTM are a good fit. In addition to the current day's price, LSTM considers the previous prices and then makes predictions about the future. The general linear trend in the price is explained by the VAR model, which considers the past price data, and the relation between the price and the other features is also taken into consideration in this research. The hybrid model's metrics and predictions outperform the individual LSTM and VAR (3) models, giving better predictions with less error. Cryptocurrency prices follow a largely arbitrary process with some underlying trends that an intelligent framework must look for when making accurate and dependable projections. As a result, more advanced computational methods, alternative methodologies, and new validation measures should all be investigated. In the future, other kinds of data can be collected with which the model can be trained, and during the training of the LSTM model, the activation functions, the number of epochs, and the regularization techniques can be varied to attain finer results.
References

1. Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system.
2. Ethereum - CoinDesk site. Retrieved December 01, 2021, from https://www.coindesk.com/price/ethereum/.
3. Ethereum - Wikipedia site. Retrieved November 21, 2021, from https://en.wikipedia.org/wiki/Ethereum.
4. Kaastra, I., & Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 215–236.
5. Kumar, D., & Rath, S. K. (2020). Predicting the trends of price for ethereum using deep learning techniques. In Artificial Intelligence and Evolutionary Computations in Engineering Systems 2020 (pp. 103–114). Singapore: Springer.
6. Poongodi, M., Sharma, A., Vijayakumar, V., Bhardwaj, V., Sharma, A. P., Iqbal, R., & Kumar, R. (2020). Prediction of the price of Ethereum blockchain cryptocurrency in an industrial finance system. Computers & Electrical Engineering, 81, 106527.
7. Jay, P., Kalariya, V., Parmar, P., Tanwar, S., Kumar, N., & Alazab, M. (2020). Stochastic neural networks for cryptocurrency price prediction. IEEE Access, 8, 82804–82818.
8. Zoumpekas, T., Houstis, E., & Vavalis, M. (2020). Eth analysis and predictions utilizing deep learning. Expert Systems with Applications, 162, 113866.
9. Angela, O., & Sun, Y. (2020). Factors affecting cryptocurrency prices: Evidence from ethereum. In 2020 International Conference on Information Management and Technology (ICIMTech) (pp. 318–323). IEEE.
10. Shankhdhar, A., Singh, A. K., Naugraiya, S., & Saini, P. K. (2021). Bitcoin price alert and prediction system using various models. In IOP Conference Series: Materials Science and Engineering (Vol. 1131, No. 1, p. 012009). IOP Publishing.
11. Phaladisailoed, T., & Numnonda, T. (2018). Machine learning models comparison for bitcoin price prediction. In 2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE) (pp. 506–511). IEEE.
12. Rathan, K., Sai, S. V., & Manikanta, T. S. (2019). Crypto-currency price prediction using decision tree and regression techniques. In 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (pp. 190–194). IEEE.
13. Madan, I., Saluja, S., & Zhao, A. (2015). Automated bitcoin trading via machine learning algorithms. http://cs229.stanford.edu/proj2014/Isaac%20Madan,%20Shaurya%20Saluja,%20Aojia%20Zhao,Automated%20Bitcoin%20Trading%20via%20Machine%20Learning%20Algorithms.pdf.
14. Saad, M., Choi, J., Nyang, D., Kim, J., & Mohaisen, A. (2019). Toward characterizing blockchain-based cryptocurrencies for highly accurate predictions. IEEE Systems Journal, 14(1), 321–332.
15. Chen, Y., & Ng, H. K. T. (2019). Deep learning Ethereum token price prediction with network motif analysis. In 2019 International Conference on Data Mining Workshops (ICDMW) (pp. 232–237). IEEE.
16. Sun, X., Liu, M., & Sima, Z. (2020). A novel cryptocurrency price trend forecasting model based on LightGBM. Finance Research Letters, 32, 101084.
17. Awoke, T., Rout, M., Mohanty, L., & Satapathy, S. C. (2021). Bitcoin price prediction and analysis using deep learning models. In Communication Software and Networks (pp. 631–640). Singapore: Springer.
18. Kavitha, H., Sinha, U. K., & Jain, S. S. (2020). Performance evaluation of machine learning algorithms for bitcoin price prediction. In 2020 Fourth International Conference on Inventive Systems and Control (ICISC) (pp. 110–114). IEEE.
19. Miura, R., Pichl, L., & Kaizoji, T. (2019). Artificial neural networks for realized volatility prediction in cryptocurrency time series. In International Symposium on Neural Networks (pp. 165–172). Cham: Springer.
20. Abraham, J., Higdon, D., Nelson, J., & Ibarra, J. (2018). Cryptocurrency price prediction using tweet volumes and sentiment analysis. SMU Data Science Review, 1(3), 1.
21. Jang, H., & Lee, J. (2017). An empirical study on modeling and prediction of bitcoin prices with Bayesian neural networks based on blockchain information. IEEE Access, 6, 5427–5437.
22. Tandon, S., Tripathi, S., Saraswat, P., & Dabas, C. (2019). Bitcoin price forecasting using lstm and 10-fold cross validation. In 2019 International Conference on Signal Processing and Communication (ICSC) (pp. 323–328). IEEE.
23. Khan, A. S., & Augustine, P. (2019). Predictive analytics in cryptocurrency using neural networks: A comparative study. International Journal of Recent Technology and Engineering, 7(6), 425–429.
24. Radityo, A., Munajat, Q., & Budi, I. (2017). Prediction of bitcoin exchange rate to American dollar using artificial neural network methods. In 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS) (pp. 433–438). IEEE.
25. Fahmi, A., Samsudin, N., Mustapha, A., Razali, N., Khalid, A., & Kamal, S. (2018). Regression based analysis for bitcoin price prediction. International Journal of Engineering & Technology, 7.
26. Livieris, I. E., Pintelas, E., Stavroyiannis, S., & Pintelas, P. (2020). Ensemble deep learning models for forecasting cryptocurrency time-series. Algorithms, 13(5), 121.
27. Lahmiri, S., & Bekiros, S. (2019). Cryptocurrency forecasting with deep learning chaotic neural networks. Chaos, Solitons & Fractals, 118, 35–40.
28. Liu, M., Li, G., Li, J., Zhu, X., & Yao, Y. (2021). Forecasting the price of Bitcoin using deep learning. Finance Research Letters, 40, 101755.
29. Wirawan, I. M., Widiyaningtyas, T., & Hasan, M. M. (2019). Short term prediction on bitcoin price using ARIMA method. In 2019 International Seminar on Application for Technology of Information and Communication (iSemantic) (pp. 260–265). IEEE.
30. Raju, S. M., & Tarif, A. M. (2020). Real-time prediction of BITCOIN price using machine learning techniques and public sentiment analysis. arXiv:2006.14473.
31. Patel, M. M., Tanwar, S., Gupta, R., & Kumar, N. (2020). A deep learning-based cryptocurrency price prediction scheme for financial institutions. Journal of Information Security and Applications, 55, 102583.
32. Nguyen, D. T., & Le, H. V. (2019). Predicting the price of bitcoin using hybrid ARIMA and machine learning. In International Conference on Future Data and Security Engineering (pp. 696–704). Cham: Springer.
33. Schluchter, M. D. (2005). Mean square error. Encyclopedia of Biostatistics, 5.
34. Chai, T., & Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute error (MAE) - Arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7(3), 1247–1250.
Light Weight Approach for Agnostic Optimal Route Selection Nagendra Singh , Chintala Srujan, Dhruva J. Baruah, Divya Sharma, and Rajesh Kushwaha
Abstract Transportation plays a vital role in people's lives. A country like India has a lot of diversity, and different transport operators use their own infrastructure and utilities to provide services. Urban transport is gaining more popularity every day as most people use it: it offers good fare rates and shorter travel times than private transport. But a major problem arises when an individual wants to use it but does not know which route to take to reach the destination, as most of these operators do not have a route optimization and route recommendation facility, giving a bad travel experience to the user. This paper aims to solve the routing problem in urban transport using a modified two-way breadth-first search algorithm. Breadth-first search is a stable uninformed graph search algorithm that guarantees an optimal solution; our solution solves the routing problem and aims to reduce the response time to front-end applications. We present a lightweight approach for route recommendation that individual transport operators can apply to their own transportation networks. Dijkstra's algorithm, traditionally used for shortest-path calculation, builds on the concept of breadth-first search and guarantees optimal solutions, but since BFS searches from source to destination, it is slow in graphs with a high branching factor. We introduce a parallel hybrid two-way BFS variant that simultaneously starts BFS from both ends, i.e., the source and the destination. This variant uses the BFS top-down approach and adds it to a bottom-up parallel pipeline, resulting in a novel approach for route recommendation. We implemented this approach on the BEST Mumbai transportation dataset, analyzed its performance against the existing Dijkstra algorithm, and achieved improvements. Keywords Optimal path · Graph theory · Intelligent transport system · Route recommendation
N. Singh (B) · C. Srujan · D. J. Baruah · D. Sharma · R. Kushwaha Embedded Systems R&D, Centre for Development of Advanced Computing Noida, Noida, India e-mail: [email protected] C. Srujan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Noor et al. (eds.), Proceedings of Emerging Trends and Technologies on Intelligent Systems, Advances in Intelligent Systems and Computing 1414, https://doi.org/10.1007/978-981-19-4182-5_33
1 Introduction

Public transport plays an important role in highly urbanized and densely populated cities (like Mumbai, Delhi, Bangalore, Pune, Kolkata, etc.). A considerable part of the population depends on public transport, due to which the government is continuously upgrading it and trying to provide connectivity between all possible locations. These cities have a dense public transport network, with different transport service providers handling different areas of a city. In the case of Mumbai, there are 14 public transport operators, such as Brihanmumbai Electric Supply and Transport (BEST), Navi Mumbai Municipal Transport (NMMT), Thane Municipal Transport (TMT), and Mira-Bhayandar Municipal Transport (MBMT), operating under the MMRDA. Remembering all the operators and the routes they serve is very tough for a traveller or a visitor, and collecting all the information about routes and stations to make a journey plan is difficult. This route-agnostic ticketing system therefore provides the utility of giving suggestions and then making route plans [1] according to the user's choice, given only a source and destination pair. It gives a user-friendly environment and provides information about the possible optimal paths between source and destination. We introduce a two-way Dijkstra approach, an extended version of the existing Dijkstra algorithm [2]. Dijkstra uses a BFS-style traversal that starts from the source and searches for the destination by traversing the nodes (stations) and edges, which guarantees the best solution; still, this traversal is slightly slow in a graph with a high branching factor. To speed this up, we start BFS [3] from both ends in parallel and combine the two frontiers using a bottom-up approach. This parallel approach speeds up the existing algorithm and still guarantees an optimal solution, since the BFS algorithm remains at its core.
1.1 Motivation

Planning the best route for a journey is always a messy task: to make a journey plan, a person has to remember information about stations and routes. Since we have information on all the routes and stations, we can develop an approach to find the optimal path between two locations [4]. Searching approaches can be used for optimal path-finding, and by merging optimal paths between source and destination, we can achieve a globally optimal path for the journey. The breadth-first search [5] approach is used to search for optimal paths in graphs. BFS traverses all the nodes and edges in the graph and guarantees an optimal link between two nodes; with modifications, it can be used to find optimal paths.
2 Related Work

A search algorithm is used whenever we need to achieve a goal using a sequence of actions. To define a search problem, we need a starting state (the source in our case), an ending state (the destination), the series of actions we take (the path), and a cost function to compare how well one course of action works compared to others. The cost function may vary depending on the type of search problem; for example, in a chess game, the starting and ending states are particular board positions, and the cost function is the minimum number of moves it takes to reach from one board position to another. In a search problem where we need to find the shortest path between two points, the cost function is the least distance to reach from one point to the other. There are many different search algorithms [6, 7], all of which are mainly classified into two broad divisions.
2.1 Uninformed Search

Uninformed searches, generally known as blind searches, use only the components and information we provide to find the solution. They have no extra data, so with an uninformed search we cannot say how close we are to the goal. Uninformed searches are slow compared to informed searches, as they have no feedback system to steer them towards the correct path; as a result, they check all the existing paths until they reach the final solution. Using an uninformed search always guarantees a complete solution, which is not the case with an informed search, but it also has a high computation cost, since no hints are given for reaching the solution, and it takes long to run. Some of the most used uninformed searches are depth-first search [8], breadth-first search [5], uniform cost search, and bidirectional search [9].

Dijkstra's is a well-known algorithm used to find the shortest path from a source to all vertices in a given graph. The BEST bus network can be considered a graph with stations as vertices. Dijkstra's algorithm finds the minimum path by updating its shortest-path tree; it maintains two sets, one containing the vertices already in the shortest-path tree and the other containing the vertices not yet included. We keep adding vertices from the second set to the first set such that the added vertex is at a minimum distance from the source and does not create a closed loop in the shortest-path tree; by following these two conditions, we keep adding vertices until all vertices are covered. The resulting shortest-path tree gives the minimum path from the source to the destination. Dijkstra's is an efficient algorithm used in many path-finding applications [10], such as Google Maps, geographical maps, and telephone networks. It produces an acyclic shortest-path tree but may fail to give the shortest path when edge weights are negative.
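As a point of reference for the two-set description above, a compact heap-based Dijkstra over a station graph could look as follows; the adjacency-dict representation is illustrative and not the paper's data format.

```python
import heapq

def dijkstra(graph, source):
    # graph: {station: [(neighbour, distance), ...]}
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry, already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist                                   # shortest distance from source to every station
```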
2.2 Informed Search

As the name suggests, informed searches use information other than what we provide to compute and find the optimal solution to a search problem. The extra information is obtained through a function called the heuristic function [11]. The heuristic function takes the present state, estimates how close it is to the final state or goal, and uses this cost to select the next step; as a result, at every step the heuristic guides the search along a promising path so that we can reach the goal faster. As easy as it may sound, getting a good heuristic function is difficult, since the heuristic needs to trade off accuracy against computational complexity. It may seem that the optimal heuristic is the one that finds the actual cost between the present state and the goal state, but computing that is again an instance of the original problem; an ideal heuristic should therefore give a cost close to the actual cost, so that for a small sacrifice in accuracy we can decrease the computational complexity. An informed search obtains the solution more quickly than an uninformed search, but the answer may or may not be complete in some cases. The cost of computation of an informed search is low, and it also consumes less time. Some informed searches are the A* search algorithm [12, 13], the greedy algorithm, and graph search algorithms [14].
3 System Model

The goal is to find an optimal route from source to destination. For the optimal path, we consider the route with the shortest distance, as the shortest distance means less time and a lower fare. We have all the route and station data for BEST (Brihanmumbai Electric Supply and Transport). The data is arranged in Excel files in several forms; we use the routes-with-stages file for our implementation. This file contains the route name, which is generally a four-digit number followed by U or D, where U stands for the up route and D for the down route; most of the routes in our data have both up and down routes, although the stations encountered on the up and down routes may sometimes differ. Each row contains the route name; the stop serial number, which is the order in which the stations occur on the route; stop_name, which is the station name itself; the stop code, which is unique for each station; the stop latitude and longitude (the coordinates of the station); the distance from the origin, which is the distance of each stage from the originating stage; and lastly the fare stage number. There are 2416 stations and 1077 routes, including the up and down routes. Out of the 2416 stations, 1010 are stages; a stage is a station where multiple routes coincide. The fare for BEST is calculated based on the distance between stages instead of stations, and the first and last stations of a route are generally stages. The routes-with-stages file is read using
the pandas library, and each row and column is reorganized for our usage into a route dictionary and a station dictionary to reduce access time, since otherwise each time we need to access a particular route we would have to go through the whole routes-with-stages file.
4 Proposed Scheme

We already know the BFS and Dijkstra shortest-path algorithms [4]. We first find the path between stations without considering the weight of the edges. To do this, we start BFS from both the source and the destination node simultaneously, and BFS gives the intermediate stations, if any. If both nodes belong to the same route, it returns the route with the minimum edge weight (distance) along with other route information. If the shortest path involves an interconnection between different routes, BFS returns intermediate stations, which are guaranteed to be reachable from both the source and the destination; we then compute the shortest paths from the source and destination to these intermediate stations. If we do not get intermediate stations in one pass, we search for them recursively and fix the maximum depth of the recursion tree at four, following the observed pattern that most users do not want to interchange more than three or four times in a journey, since it would be inconvenient to do so. As we recursively find the intermediate stations from the source and destination, we compute the shortest paths to each intermediate station.
G = min H

where G is the global objective function and H = T_d + T_t + S_c + F_c. Here, T_d is the total distance travelled during the journey, T_t is the total estimated travel time to complete the journey, S_c represents the total cost of switching, and F_c represents the fare cost of the journey.

T_d = Σ_{src}^{dst} D(st_i, st_{i+1})

where D(st_i, st_{i+1}) is the distance between the i-th and (i+1)-th stations.

S_c = Σ_{switch} S_i × C_i
where S_i is a switching station and C_i is the switching cost of the i-th station. These parameters are used to compute the cost of a recommended route; the objective is to minimize this cost by managing the trade-offs between the parameters.
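A small illustrative evaluation of this cost for one candidate route is sketched below; the per-interchange switching costs and the simple estimators for travel time and fare are assumptions, since the paper does not spell out how T_t, S_c, and F_c are obtained from the data.

```python
def route_cost(leg_distances, switch_costs, avg_speed_kmph=15.0, fare_per_km=1.0):
    # leg_distances: distance of each leg of the journey (km), assumed inputs
    # switch_costs: cost assigned to each interchange on the route, assumed inputs
    T_d = sum(leg_distances)            # total distance travelled
    T_t = T_d / avg_speed_kmph          # estimated travel time (hours)
    S_c = sum(switch_costs)             # total switching cost
    F_c = fare_per_km * T_d             # fare cost
    return T_d + T_t + S_c + F_c        # H; the recommender picks the route minimising this
```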
4.1 Reforming Data

4.1.1 Station Data

To decrease the time complexity of accessing the data, a dictionary of stations is created beforehand. This dictionary has the station name as the key and a list as the value. The list contains three items: the first and second items are the latitude and longitude of the station, and the third item is a set containing all the routes on which the station is present, which is the main reason for building the dictionary. The AllRoutes() helper defined in Algorithm 1 reads from this dictionary.

stations_dict[station_name] = {latitude, longitude, routes_set}
4.1.2 Route Data

Similar to the stations dictionary, route data is accessed many times, so to decrease the computation time a routes dictionary is created. The key in this dictionary is the route name, and the value is a nested list; the AllStations() function defined in Algorithm 1 reads from it. Each item of a route's list is [station_Sr.No, station_name, dist_from_origin, fare_stage_no]. By creating this, instead of accessing all of the roughly 13,000 entries in the Excel file, we can directly look up the required route in a dictionary containing almost 2000 routes, so the computation time decreases drastically. Source and destination are visualized on the map shown in Fig. 1.
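A sketch of how the two lookup dictionaries could be built from the routes-with-stages file is shown below; the column names are paraphrased from the description above and may differ from the actual Excel headers.

```python
import pandas as pd

df = pd.read_excel("routes_with_stages.xlsx")   # assumed file name

stations_dict, routes_dict = {}, {}
for _, row in df.iterrows():
    # stations_dict[station] = [lat, lon, {routes passing through the station}]
    entry = stations_dict.setdefault(
        row["stop_name"], [row["stop_latitude"], row["stop_longitude"], set()])
    entry[2].add(row["route_name"])

    # routes_dict[route] = [[stop_sr_no, stop_name, dist_from_origin, fare_stage_no], ...]
    routes_dict.setdefault(row["route_name"], []).append(
        [row["stop_sr_no"], row["stop_name"], row["dist_from_origin"], row["fare_stage_no"]])
```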
4.2 One Leg Recommendation

A one-leg journey is a scenario where a person can travel from source to destination using a single route (i.e., the person need not change to a different route midway to reach the destination). This is the most convenient option for the traveller, since there is no need to interchange. For a one-leg journey we create two sets, one containing all routes passing through the source and the other containing all routes passing through the destination, and we then find the intersection of these two sets. These routes are the possible routes we can take to reach the destination, and from them we need to select the optimal route, i.e., the route with the least distance.
Algorithm 1: Optimal Route Finding: One-leg Journey
Input: source_station, destination_station
Output: minimum-distance route
stations_dict ← dictionary with station name as key
routes_dict ← dictionary with route number as key
def AllRoutes(station_name):
    get the set of routes passing through the station from stations_dict
    return route_set
end
def AllStations(route_number):
    find all the stations present in the route from routes_dict and add them to a set
    return station_set
end
def OneLeg(source, destination):
    IR = AllRoutes(source) ∩ AllRoutes(destination)
    S ← { }  // set of possible routes
    if IR then
        for route in IR do
            if SR_NO of source < SR_NO of destination on route then
                add route to S
            end
        end
    end
    shortest_route = route in S with minimum distance
    return shortest_route
end
For every such route we calculate the distance from the source stage to the destination stage (this is done because in buses and metros the fares are calculated based on the distances between stages instead of stations, as reflected in Algorithm 1; so for the source and destination we calculate the distance between the stage that comes before the source and the stage that comes after the destination) and return all the routes with their distances as tuples. A one-leg journey is visualized on the map in Fig. 1.
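A runnable counterpart to Algorithm 1 is sketched below, under the dictionary layout assumed earlier; for brevity it uses the stop-level distance from origin rather than the stage-based fare distance described above, and returns the direct route with the smallest travelled distance, or None when no one-leg route exists.

```python
def all_routes(station):
    return stations_dict[station][2]                 # routes passing through the station

def one_leg(source, destination):
    candidates = []
    for route in all_routes(source) & all_routes(destination):
        # map each stop on the route to (serial number, distance from origin)
        stops = {name: (sr_no, dist) for sr_no, name, dist, _ in routes_dict[route]}
        (s_sr, s_dist), (d_sr, d_dist) = stops[source], stops[destination]
        if s_sr < d_sr:                              # direction check: source before destination
            candidates.append((d_dist - s_dist, route))
    return min(candidates) if candidates else None   # (distance, route) with least distance
```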
4.3 Two-Leg Recommendation

If the journey cannot be completed in a single leg (i.e., there is no direct route from the source to the destination), we come to the second possibility, a two-leg journey. Here the traveller needs to interchange at one point and take a different route to reach the destination (i.e., the traveller travels on two different routes, hence the name two-leg journey). As defined in Algorithm 2, the two-leg journey can be imagined as two separate one-leg journeys, but we need to find the station where the person should interchange, under the condition that the whole journey is completed in as short a distance as possible (since distance is directly proportional to the time taken for the journey).
Fig. 1 OneLeg recommendation visualization
Algorithm 2: Optimal Route Finding: Two-leg Journey
def TwoLeg(source, destination):
    IS = stations present on the intermediate routes of source and destination
    for station in IS do
        f1 = OneLeg(source, station)
        f2 = OneLeg(station, destination)
        keep (f1, f2) as a candidate pair
    end
    shortest_route = candidate pair with minimum sum of distances
    return shortest_route
end
We use the previous one-leg code as a function. Similar to the one-leg case, we again take two sets, but instead of routes each set contains stations.
Fig. 2 Two-leg journey visualization
Each set contains all the stations present in every route that passes through the corresponding endpoint (i.e., all the stations of all the possible one-leg routes); we create one such set for the source station and another for the destination station. We take the intersection of these sets; any station in the intersection can be used as an interchange point, since we can travel from the source to that station and from that station to the destination using single routes, as plotted on the map in Fig. 2. To calculate the route for the whole journey, we just need to find one-legs between the source and the intersection set and between the intersection set and the destination. We take all the possible routes with their distances over the whole journey and make a list of them, from which we get our desired route.
4.4 Three-Leg Recommendation

When there is no possibility that the journey can be completed in two legs, we come to a three-leg journey. This means the journey is completed using three different routes, so the person needs to get down and interchange at two different points. Similar to the two-leg implementation, we find two sets of stations for the source and destination, but since no two-leg journey is possible, there is no intersection between the sets; therefore, we need to find a third path between some pair of stations in those two sets. As defined in Algorithm 3, the optimal stations from those two sets and the overall path are determined by calculating all the possible paths and then selecting the path with the least possible sum of distances.
Algorithm 3: Optimal Route Finding: Three-leg Journey
def ThreeLeg(source, destination):
    IS  // initialized in the TwoLeg process
    TSR = intermediate routes present in IS for the source
    TDR = intermediate routes present in IS for the destination
    S = AllStations(TSR)
    D = AllStations(TDR)
    shortest_route = OneLeg(S, D) combination with minimum total distance
    return shortest_route
end
Therefore, in the whole journey the person needs to take a path from the source to interchange point 1, then another path from interchange point 1 to interchange point 2, and finally a third path from interchange point 2 to the destination, as shown in Fig. 3.
4.5 Four-Leg Recommendation

The four-leg journey is similar in form to two two-leg journeys; we simply go one level deeper. As defined in Algorithm 4, in a four-leg journey the person needs to take four different routes and interchange at three different points to reach the destination. For the source, we create a set of stations reachable from it, and for each station in that set we add all the stations that can be reached from it to a bigger set; this is done for the destination as well, and we find the intersection of the two big sets. That intersection contains stations that can be reached from the source using two legs, and from such a station we need to reach the destination using another two-leg path. We find all the possible paths and take the minimum path as the final path. The four-leg scenario is generally rare, as most destinations can be reached using at most three routes from any station.
Fig. 3 Three leg journey visualization
Algorithm 4: Optimal Route Finding: Four-leg Journey
def FourLeg(source, destination):
    D = AllStations(AllRoutes(TSR)) ∩ AllStations(AllRoutes(TDR))
    R1 = TwoLeg(source, D)
    R2 = TwoLeg(D, destination)
    shortest_route = (R1, R2) combination with minimum sum of distances
    return shortest_route
end
Still, there are some stations that cannot be reached even with four legs. We do not consider those cases, because a traveller is generally not comfortable taking more than four different routes to reach the destination.
5 Performance and Comparison Analysis

To benchmark the performance of the proposed scheme, we implemented Dijkstra's algorithm on the BEST dataset with the same computation resources.
Table 1 Time performance analysis (average computation time in sec)

Task        Dijkstra's   Proposed scheme
One leg     0.00104      0.00112
Two leg     0.03542      0.03552
Three leg   0.08051      0.07142
Four leg    0.10743      0.09316
The results show that both algorithms have similar performance on one-leg journey suggestions. In the case of two-leg journey suggestions, the proposed scheme slightly improves the computation time, and in the case of three-leg and four-leg journeys, the proposed scheme outperforms Dijkstra on the BEST dataset. Based on the results, we can say that the proposed scheme outperforms Dijkstra's algorithm under some application-specific conditions and on specific traffic datasets. The performance comparison is shown in Table 1.
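The averages in Table 1 could be produced with a simple harness like the one below; the sample of query pairs and the repeat count are assumptions, since the paper does not describe the measurement setup.

```python
import random
import time

def average_runtime(route_fn, station_pairs, repeats=1000):
    # route_fn: e.g. the one-leg, two-leg, three-leg, or four-leg recommender
    start = time.perf_counter()
    for _ in range(repeats):
        src, dst = random.choice(station_pairs)
        route_fn(src, dst)
    return (time.perf_counter() - start) / repeats   # average seconds per query
```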
6 Summary and Future Work

Based on the results, we can say that the proposed scheme performs better on specific urban transport networks and does not require a lot of computation resources. Drawing on the general understanding that a person will not travel on a route with more than three switching stations within the same city, we bound the computation at the four-leg journey recommendation. Any transport operator can use the proposed methodology to provide route recommendations in an existing system by adding it as a separate module, independent of all other functionalities. With the help of this scheme, transport operators can add ticketing for the routes selected by users, and this can be further developed into a fully automatic electronic ticketing system. That will improve the user experience, give a transparent ticketing system, and reduce dependence on ticketing stations for buses and other transport operators.
References

1. Ahmed, L., Heyken-Soares, P., Mumford, C., & Mao, Y. (2019). Optimising bus routes with fixed terminal nodes: Comparing hyper-heuristics with NSGAII on realistic transportation networks. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1102–1110).
2. Liu, L. S., Lin, J. F., Yao, J. X., He, D. W., Zheng, J. S., Huang, J., & Shi, P. (2021). Path planning for smart car based on Dijkstra algorithm and dynamic window approach. Wireless Communications and Mobile Computing. ISSN 1530-8669. https://doi.org/10.1155/2021/8881684
3. Swaroop, K. P., Garapati, D. P., Nalli, P. K., & Duvvuri, S. S. (2021). Service restoration in distribution system using breadth-first search technique. In 2021 7th International Conference on Electrical Energy Systems (ICEES) (pp. 403–407). https://doi.org/10.1109/ICEES51510.2021.9383670
4. Di, J., & Gao, R. (2022). Research on railway transportation route based on Dijkstra algorithm. In 2021 International Conference on Big Data Analytics for Cyber-Physical System in Smart City (pp. 255–260). Springer.
5. Rahim, R., Abdullah, D., Nurarif, S., Ramadhan, M., Anwar, B., Dahria, M., Nasution, S. D., Diansyah, T. M., & Khairani, M. (2018). Breadth first search approach for shortest path solution in Cartesian area. Journal of Physics: Conference Series, 1019.
6. Panda, M., & Mishra, A. (2018). A survey of shortest-path algorithms. International Journal of Applied Engineering Research, 13, 6817–6820.
7. Pathak, M. J., Patel, R. L., & Rami, S. P. (2018). Comparative analysis of search algorithms. International Journal of Computer Applications, 179, 40–43.
8. Chen, Y. H., & Wu, C. M. (2020). An improved algorithm for searching Maze based on depth-first search. In 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan) (pp. 1–2). IEEE.
9. Li, L., Zhang, M., Hua, W., & Zhou, X. (2020). Fast query decomposition for batch shortest path processing in road networks. In 2020 IEEE 36th International Conference on Data Engineering (ICDE) (pp. 1189–1200). IEEE.
10. Susanto, W., Dennis, S., Handoko, M. B. A., & Suryaningrum, K. M. (2021). Compare the path finding algorithms that are applied for route searching in maps. In 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI).
11. Thant, T., Myint, K. N., & Than, T. (2020). Role of heuristic on informed and uninformed search. Journal of Computer Applications and Research, 1.
12. Ju, C., Luo, Q., & Yan, X. (2020). Path planning using an improved a-star algorithm. In 2020 11th International Conference on Prognostics and System Health Management (PHM-2020 Jinan) (pp. 23–26). IEEE.
13. Candra, A., Budiman, M. A., & Hartanto, K. (2020). Dijkstra's and A-Star in finding the shortest path: A tutorial. In 2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA) (pp. 28–32). IEEE.
14. Wayahdi, M. R., Ginting, S. H. N., & Syahputra, D. (2021). Greedy, A-Star, and Dijkstra's algorithms in finding shortest path. International Journal of Advances in Data and Information Systems, 2, 45–52.
Index
A Ajay Kumar, 305 Ajay S. Chouhan, 305 Akanksha Vats, 359 Akhil Madhu, 119 Alokesh Ghosh, 69 Amarjeet Singh Cheema, 85 Amit Kumar Ateria, 85 Anjali Yeole, 141 Ankita Tiwari, 189 Anusha Kabber, 295 Archana Shirke, 1 Arunalatha, J. S., 127 Ashok Kumar Sharma, 251 Ashutosh Kumar, 85 Ashwath Krishnan, 39 Avik Kundu, 155
B Benitta Mariam Babu, 1
C Chandni, 359 Chethan Kumar, B., 331 Chintala Srujan, 415
D Deepika Pantola, 271 Dhinesh Babu, L. D., 217 Dhruthi, V. M., 295 Dhruva J. Baruah, 415 Dinesha, 331 Divya Sharma, 415 Divya Srivastava, 271 Diya, V. A., 373
F Fatima Shefaq, 53
G Gaurav Choudhary, 387 Gaurav Trivedi, 189 Gigi Joseph, 305
H Hena Ray, 69 Hibah Ihsan Muhammad, 189
I Imtiaz Ahmed, Md., 53
J Jayesh Shadi, 141 Jeremy Dylan D'Souza, 119 Jitendra V. Nasriwala, 109
K Kavya P. K., 25 Khosla, P. K., 231
M Madhuri Gupta, 271 Manjiri Kherdekar, 1
N Nabarun Bhattacharyya, 69, 177 Nadir Charniya, 141 Nagaraja, 331 Nagendra Singh, 415 Natarajan, S., 25, 39, 295 Natasha Sharma, 167 Navdeep S. Chahal, 231 Naveen Aggarwal, 95 Nidhi Gupta, 17 Nonita Sharma, 95
P Pashmeen Singh, 205 Piyush Sharma, 251 Pradeep Nandan, 373 Pragya Singh, 387 Pramila, R. M., 401 Prathamesh Jadhav, 141 Preeti Abrol, 231 Preeti P. Bhatt, 109 Preeti Sharma, 401 Priya, 167 Priyesh Ranjan, 85
R Raghav Pandit, 295 Rahil N. Modi, 25 Rajesh Kushwaha, 415 Rajib Bandyopadhyay, 69 Rajneesh Kumar Gautam, 283 Rakesh R. Savant, 109 Ravi Sankar, 69 Revathi, A., 341 Rezoana Akter, 53 Richa, 319 Ritesh R. Dhote, 373 Roshni Poddar, 25
S Sachin S. Bhat, 331 Sajeesh, C. S., 305 Sakshee Sawant, 141 Sandhya Ramakrishnan, 217 Santhi, S. G., 341 Saravanakeerthana Perumal, 319 Saurabh Bilgaiyan, 155 Selin Sara Varghese, 1 Senthil Arumugam Muthukumarswamy, 205 Senthil, L., 251 Shini Renjith, 119 Shishir Kumar Shandilya, 387 Siddhi Rawal, 319 Soumojit Roy, 177 Subhash Kamble, 127 Sudeep Rai, 85 Sudhanva Rajesh, 39 Sudhir Nadda, 283 Suparna Parua Biswas, 177 Suraj Revankar, 331
T Tanav Aggarwal, 95 Tarun Kanti Ghosh, 69 Trupti Sonawane, 141 Tushar Patnaik, 359
U Umesh Gupta, 271
V Vandana Jhala, 17 Venkataravana Nayak, K., 127 Venkitesh S. Anand, 119 Venugopal, K. R., 127 Vikas Sihag, 387 Vineet Sharma, 305 Vinod K. Boppanna, 305